
Indexes in PostgreSQL — 9 (BRIN)

Published: 2023/12/16

So far we have discussed the indexing engine, the interface of access methods, and the following methods: hash indexes, B-trees, GiST, SP-GiST, GIN, and RUM. The topic of this article is BRIN indexes.

BRIN

General concept

Unlike indexes with which we've already got acquainted, the idea of BRIN is to avoid looking through definitely unsuited rows rather than quickly find the matching ones. This is always an inaccurate index: it does not contain TIDs of table rows at all.


Simplistically, BRIN works well for columns whose values correlate with their physical location in the table; in other words, for columns whose values a query without an ORDER BY clause returns virtually in increasing or decreasing order (and on which there are no indexes).

This access method was created within the scope of Axle, the European project for extremely large analytical databases, with an eye on tables several terabytes or even dozens of terabytes in size. An important feature of BRIN that enables us to create indexes on such tables is its small size and minimal maintenance overhead.

This works as follows. The table is split into ranges that are several pages large (or several blocks large, which is the same thing) — hence the name: Block Range Index, BRIN. The index stores summary information on the data in each range. As a rule, this is the minimal and maximal values, but it can be different, as shown further. Assume that a query is performed that contains a condition on a column: if the sought values do not fall into the interval, the whole range can be skipped; but if they may be there, all rows in all blocks of the range will have to be looked through to choose the matching ones among them.
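The idea can be sketched in a few lines of Python. This is a conceptual model, not PostgreSQL's implementation: pages are lists of values, and the "index" is just a (min, max) pair per range.

```python
# Conceptual model of BRIN summary data: the table is a list of pages,
# grouped into ranges of PAGES_PER_RANGE pages; the index stores only
# the minimal and maximal value of each range.

PAGES_PER_RANGE = 4

def build_summaries(pages):
    """pages: list of pages, each page a list of values.
    Returns one (min, max) summary per range."""
    summaries = []
    for start in range(0, len(pages), PAGES_PER_RANGE):
        chunk = [v for page in pages[start:start + PAGES_PER_RANGE] for v in page]
        summaries.append((min(chunk), max(chunk)))
    return summaries

# Values roughly correlated with their physical position — the case BRIN likes:
pages = [[1, 3], [2, 5], [4, 8], [7, 9], [10, 12], [11, 15], [14, 18], [17, 20]]
print(build_summaries(pages))   # [(1, 9), (10, 20)]
```

A query for the value 15 can skip the whole first range, since 15 does not fall into the interval (1, 9).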

It would not be a mistake to treat BRIN not as an index, but as an accelerator of sequential scan. We can regard BRIN as an alternative to partitioning if we consider each range as a "virtual" partition.

Now let's discuss the structure of the index in more detail.


Structure

The first page (more exactly, the zero page) contains the metadata.

Pages with the summary information are located at a certain offset from the metadata. Each index row on those pages contains summary information on one range.


Between the meta page and the summary data, pages with the reverse range map (abbreviated as "revmap") are located. Actually, this is an array of pointers (TIDs) to the corresponding index rows.

For some ranges, the pointer in "revmap" can lead to no index row (one such range is marked in gray in the figure). In such a case, the range is considered to have no summary information yet.

Scanning the index

How is the index used if it does not contain references to table rows? This access method certainly cannot return rows TID by TID, but it can build a bitmap. There can be two kinds of bitmap pages: accurate, to the row, and inaccurate, to the page. It is an inaccurate bitmap that is used here.

The algorithm is simple. The map of ranges is scanned sequentially (that is, the ranges are gone through in the order of their location in the table). The pointers are used to find the index rows with summary information for each range. If a range cannot contain the sought value, it is skipped, and if it can contain the value (or the summary information is unavailable), all pages of the range are added to the bitmap. The resulting bitmap is then used as usual.
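The scanning algorithm above can be sketched as follows (a conceptual Python model, continuing the earlier sketch; `summaries` holds one (min, max) pair per range, or None for a range that has no summary information yet):

```python
def brin_scan(summaries, pages_per_range, n_pages, value):
    """Return the set of page numbers that a query for `value` must visit —
    a model of the inaccurate (page-by-page) bitmap BRIN builds."""
    bitmap = set()
    for i, summary in enumerate(summaries):
        # A range is added wholesale if it may contain the value,
        # or if its summary information is unavailable.
        if summary is None or summary[0] <= value <= summary[1]:
            first = i * pages_per_range
            bitmap.update(range(first, min(first + pages_per_range, n_pages)))
    return bitmap

# Two summarized ranges and one unsummarized range over a 10-page table:
print(brin_scan([(1, 9), (10, 20), None], 4, 10, 12))
```

The value 12 skips the first range entirely; the second range may contain it, and the third has no summary, so pages of both are added to the bitmap.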

Updating the index

More interesting is how the index is updated when the table is changed.

When a new version of a row is added to a table page, we determine which range it is contained in and use the map of ranges to find the index row with the summary information. All these are simple arithmetic operations. Let, for instance, the size of a range be four pages, and let a row version with the value of 42 occur on page 13. The number of the range (starting with zero) is 13 / 4 = 3; therefore, in "revmap" we take the pointer with the offset of 3 (its ordinal number is four).

The minimal value for this range is 31, and the maximal one is 40. Since the new value of 42 is out of the interval, we update the maximal value (see the figure). But if the new value is still within the stored limits, the index does not need to be updated.

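The arithmetic of this example can be sketched as follows (a hypothetical Python model, not the actual C code):

```python
def brin_insert(summaries, pages_per_range, page_no, value):
    """Widen the summary of the range containing page_no, if needed.
    summaries is a list of (min, max) pairs, one per range."""
    rng = page_no // pages_per_range      # page 13, range size 4 -> range 3
    lo, hi = summaries[rng]
    summaries[rng] = (min(lo, value), max(hi, value))
    return rng

summaries = [(0, 10), (11, 20), (21, 30), (31, 40)]
rng = brin_insert(summaries, 4, 13, 42)
print(rng, summaries[rng])    # 3 (31, 42)
```

As in the text: 42 falls outside the stored interval (31, 40), so only the maximal value is updated; a value inside the interval would leave the index untouched.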

All this relates to the situation when the new row version occurs in a range for which the summary information is available. When the index is created, the summary information is computed for all ranges available, but as the table grows, new pages can appear that fall outside the summarized ranges. Two options are available here:

  • Usually the index is not updated immediately. This is not a big deal: as already mentioned, when scanning the index, the whole unsummarized range will be looked through. The actual update is done during vacuum, or it can be done manually by calling the "brin_summarize_new_values" function.
  • If we create the index with the "autosummarize" parameter, the update will be done immediately. But when pages of the range are populated with new values, updates can happen too often; therefore, this parameter is turned off by default.
When new ranges occur, the size of "revmap" can increase. Whenever the map, located between the meta page and the summary data, needs to be extended by another page, existing row versions are moved to some other pages. So the map of ranges is always located between the meta page and the summary data.

When a row is deleted… nothing happens. We may notice that sometimes the minimal or maximal value will be deleted, in which case the interval could be reduced. But to detect this, we would have to read all values in the range, and this is costly.

The correctness of the index is not affected, but a search may require looking through more ranges than actually needed. In general, summary information can be manually recalculated for such a zone (by calling the "brin_desummarize_range" and "brin_summarize_new_values" functions), but how can we detect such a need? Anyway, no conventional procedure is available to this end.

Finally, updating a row is just a deletion of the outdated version and the addition of a new one.

Example

Let's try to build our own mini data warehouse for the data from the tables of the demo database. Let's assume that for the purpose of BI reporting, a denormalized table is needed that reflects the flights departed from or landed at an airport, to the accuracy of a seat in the cabin. The data for each airport will be added to the table once a day, when it is midnight in the appropriate time zone. The data will be neither updated nor deleted.

The table will look as follows:

demo=# create table flights_bi(
  airport_code char(3),
  airport_coord point,          -- geo coordinates of airport
  airport_utc_offset interval,  -- time zone
  flight_no char(6),            -- flight number
  flight_type text,             -- flight type: departure / arrival
  scheduled_time timestamptz,   -- scheduled departure/arrival time of flight
  actual_time timestamptz,      -- actual time of flight
  aircraft_code char(3),
  seat_no varchar(4),           -- seat number
  fare_conditions varchar(10),  -- travel class
  passenger_id varchar(20),
  passenger_name text
);

We can simulate the procedure of loading the data using nested loops: an external one over days (we will consider a large database, therefore 365 days), and an internal one over time zones (from UTC+12 to UTC+02). The query is pretty long and not of particular interest, so I'll hide it under the spoiler.

Simulation of loading the data to the storage

DO $$
<<local>>
DECLARE
  curdate date := (SELECT min(scheduled_departure) FROM flights);
  utc_offset interval;
BEGIN
  WHILE (curdate <= bookings.now()::date) LOOP
    utc_offset := interval '12 hours';
    WHILE (utc_offset >= interval '2 hours') LOOP
      INSERT INTO flights_bi
        WITH flight (airport_code, airport_coord, flight_id, flight_no,
                     scheduled_time, actual_time, aircraft_code, flight_type)
        AS (
          -- departures
          SELECT a.airport_code, a.coordinates, f.flight_id, f.flight_no,
                 f.scheduled_departure, f.actual_departure, f.aircraft_code,
                 'departure'
          FROM   airports a, flights f, pg_timezone_names tzn
          WHERE  a.airport_code = f.departure_airport
          AND    f.actual_departure IS NOT NULL
          AND    tzn.name = a.timezone
          AND    tzn.utc_offset = local.utc_offset
          AND    timezone(a.timezone, f.actual_departure)::date = curdate
          UNION ALL
          -- arrivals
          SELECT a.airport_code, a.coordinates, f.flight_id, f.flight_no,
                 f.scheduled_arrival, f.actual_arrival, f.aircraft_code,
                 'arrival'
          FROM   airports a, flights f, pg_timezone_names tzn
          WHERE  a.airport_code = f.arrival_airport
          AND    f.actual_arrival IS NOT NULL
          AND    tzn.name = a.timezone
          AND    tzn.utc_offset = local.utc_offset
          AND    timezone(a.timezone, f.actual_arrival)::date = curdate
        )
        SELECT f.airport_code, f.airport_coord, local.utc_offset, f.flight_no,
               f.flight_type, f.scheduled_time, f.actual_time, f.aircraft_code,
               s.seat_no, s.fare_conditions, t.passenger_id, t.passenger_name
        FROM   flight f
          JOIN seats s
            ON s.aircraft_code = f.aircraft_code
          LEFT JOIN boarding_passes bp
            ON bp.flight_id = f.flight_id
           AND bp.seat_no = s.seat_no
          LEFT JOIN ticket_flights tf
            ON tf.ticket_no = bp.ticket_no
           AND tf.flight_id = bp.flight_id
          LEFT JOIN tickets t
            ON t.ticket_no = tf.ticket_no;
      RAISE NOTICE '%, %', curdate, utc_offset;
      utc_offset := utc_offset - interval '1 hour';
    END LOOP;
    curdate := curdate + 1;
  END LOOP;
END;
$$;

demo=# select count(*) from flights_bi;
  count
----------
 30517076
(1 row)

demo=# select pg_size_pretty(pg_total_relation_size('flights_bi'));
 pg_size_pretty
----------------
 4127 MB
(1 row)

We get 30 million rows and 4 GB. Not so large a size, but good enough for a laptop: sequential scan took me about 10 seconds.

On what columns should we create the index?

Since BRIN indexes have a small size and moderate overhead, and updates happen infrequently if at all, a rare opportunity arises to build many indexes "just in case" — for example, on all fields on which analyst users can create their ad-hoc queries. If an index doesn't prove useful — never mind; and even an index that is not very efficient will surely work better than a sequential scan. Of course, there are fields on which it is absolutely useless to build an index; pure common sense will suggest them.

But it would be odd to limit ourselves to this piece of advice, so let's try to state a more accurate criterion.

We've already mentioned that the data must somewhat correlate with its physical location. Here it makes sense to remember that PostgreSQL gathers table column statistics, which include the correlation value. The planner uses this value to choose between a regular index scan and a bitmap scan, and we can use it to estimate the applicability of a BRIN index.

In the above example, the data is evidently ordered by days (by "scheduled_time", as well as by "actual_time" — there is not much difference). This is because when rows are added to the table (without deletions and updates), they are laid out in the file one after another. In the simulation of data loading we did not even use an ORDER BY clause; therefore, timestamps within a day can, in general, be mixed up in an arbitrary way, but the overall ordering must be in place. Let's check this:

demo=# analyze flights_bi;
demo=# select attname, correlation from pg_stats
       where tablename='flights_bi' order by correlation desc nulls last;
      attname       | correlation
--------------------+-------------
 scheduled_time     |    0.999994
 actual_time        |    0.999994
 fare_conditions    |    0.796719
 flight_type        |    0.495937
 airport_utc_offset |    0.438443
 aircraft_code      |    0.172262
 airport_code       |   0.0543143
 flight_no          |   0.0121366
 seat_no            |  0.00568042
 passenger_name     |   0.0046387
 passenger_id       | -0.00281272
 airport_coord      |
(12 rows)

A value that is not too close to zero (ideally, near plus or minus one, as in this case) tells us that a BRIN index will be appropriate.

The travel class "fare_conditions" (the column contains three unique values) and the flight type "flight_type" (two unique values) unexpectedly appear in second and third place. This is an illusion: formally the correlation is high, while actually all possible values will surely be encountered on any several successive pages, which means that BRIN will not do any good.

The time zone "airport_utc_offset" goes next: in the considered example, within a day's cycle, airports are ordered by time zones "by construction".

It's these two fields, time and time zone, that we will further experiment with.

Possible weakening of the correlation

The correlation that is in place "by construction" can be easily weakened when the data is changed. And the matter here is not a change to a particular value, but the structure of multiversion concurrency control: the outdated row version is deleted on one page, but a new version may be inserted wherever free space is available. Due to this, whole rows get mixed up during updates.

We can partially control this effect by reducing the value of the "fillfactor" storage parameter, thereby leaving free space on each page for future updates. But do we want to increase the size of an already huge table? Besides, this does not resolve the issue of deletions: they also "set traps" for new rows by freeing space somewhere inside existing pages. Due to this, rows that would otherwise go to the end of the file will be inserted at some arbitrary place.

By the way, this is a curious fact. Since a BRIN index does not contain references to table rows, its presence should not hinder HOT updates at all, but it does.

So, BRIN is mainly designed for tables of large and even huge sizes that are either not updated at all or updated very slightly. However, it copes perfectly with the addition of new rows (to the end of the table). This is not surprising, since this access method was created with a view to data warehouses and analytical reporting.

What size of a range do we need to select?

If we deal with a terabyte-sized table, our main concern when selecting the size of a range will probably be to avoid making the BRIN index too large. However, in our situation, we can afford to analyze the data more accurately.

To do this, we can take the unique values of a column and see how many pages each of them occurs on. Localization of the values increases the chances of success in applying a BRIN index. Moreover, the number of pages found will suggest the size of a range. But if a value is "spread" over all pages, BRIN is useless.

Of course, we should use this technique while keeping a watchful eye on the internal structure of the data. For example, it makes no sense to consider each timestamp (which also includes the time of day) as a unique value — we need to round it to days.

Technically, this analysis can be done by looking at the value of the hidden "ctid" column, which provides the pointer to a row version (TID): the number of the page and the number of the row inside the page. Unfortunately, there is no conventional technique to decompose a TID into its two components; therefore, we have to cast types through the text representation:

demo=# select min(numblk), round(avg(numblk)) avg, max(numblk)
from (
  select count(distinct (ctid::text::point)[0]) numblk
  from flights_bi
  group by scheduled_time::date
) t;
 min  | avg  | max
------+------+------
 1192 | 1500 | 1796
(1 row)

demo=# select relpages from pg_class where relname = 'flights_bi';
 relpages
----------
   528172
(1 row)

We can see that each day is distributed across pages pretty evenly, and days are only slightly mixed up with each other (1500 × 365 = 547500, which is only a little larger than the number of pages in the table, 528172). This is actually clear "by construction" anyway.

Valuable information here is the specific number of pages. With the conventional range size of 128 pages, each day will populate 9–14 ranges. This seems realistic: with a query for a specific day, we can expect an error of around 10%.
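The estimate of 9–14 ranges follows directly from the page counts above:

```python
PAGES_PER_RANGE = 128                 # PostgreSQL's default range size
min_pages, max_pages = 1192, 1796     # pages per day, from the ctid analysis

# Number of whole 128-page ranges a day's worth of pages covers:
print(min_pages // PAGES_PER_RANGE,   # 9
      max_pages // PAGES_PER_RANGE)   # 14
```

Since a day's pages spill into boundary ranges shared with neighboring days, roughly one range out of the ~10 scanned contributes extraneous rows — hence the ~10% error expectation.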

Let's try:

demo=# create index on flights_bi using brin(scheduled_time);

The size of the index is as small as 184 KB:

demo=# select pg_size_pretty(pg_total_relation_size('flights_bi_scheduled_time_idx'));
 pg_size_pretty
----------------
 184 kB
(1 row)

In this case, it hardly makes sense to increase the size of a range at the cost of losing accuracy. But we can reduce the size if required, and the accuracy will, on the contrary, increase (along with the size of the index).

Now let's look at the time zones. Here we cannot use a brute-force approach either. Instead, all values should be divided by the number of day cycles, since the distribution repeats every day. Besides, since there are only a few time zones, we can look at the entire distribution:

demo=# select airport_utc_offset, count(distinct (ctid::text::point)[0])/365 numblk
from flights_bi
group by airport_utc_offset
order by 2;
 airport_utc_offset | numblk
--------------------+--------
 12:00:00           |      6
 06:00:00           |      8
 02:00:00           |     10
 11:00:00           |     13
 08:00:00           |     28
 09:00:00           |     29
 10:00:00           |     40
 04:00:00           |     47
 07:00:00           |    110
 05:00:00           |    231
 03:00:00           |    932
(11 rows)

On average, the data for each time zone populates 133 pages a day, but the distribution is highly non-uniform: Petropavlovsk-Kamchatskiy and Anadyr fit into as few as six pages, while Moscow and its neighborhood require hundreds of them. The default size of a range is no good here; let's, for example, set it to four pages.

demo=# create index on flights_bi using brin(airport_utc_offset) with (pages_per_range=4);
demo=# select pg_size_pretty(pg_total_relation_size('flights_bi_airport_utc_offset_idx'));
 pg_size_pretty
----------------
 6528 kB
(1 row)

Execution plan

Let's look at how our indexes work. Let's select some day, say, a week ago (in the demo database, "today" is determined by the "bookings.now" function):

demo=# \set d 'bookings.now()::date - interval \'7 days\''

demo=# explain (costs off,analyze)
  select *
  from flights_bi
  where scheduled_time >= :d and scheduled_time < :d + interval '1 day';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on flights_bi (actual time=10.282..94.328 rows=83954 loops=1)
   Recheck Cond: ...
   Rows Removed by Index Recheck: 12045
   Heap Blocks: lossy=1664
   ->  Bitmap Index Scan on flights_bi_scheduled_time_idx
       (actual time=3.013..3.013 rows=16640 loops=1)
         Index Cond: ...
 Planning time: 0.375 ms
 Execution time: 97.805 ms

As we can see, the planner used the index created. How accurate is it? The ratio of the number of rows that meet the query conditions ("rows" of the Bitmap Heap Scan node) to the total number of rows returned using the index (the same value plus Rows Removed by Index Recheck) tells us about this. In this case it is 83954 / (83954 + 12045), which is approximately 87%, as expected (this value will change from one day to another).

Where does the number 16640 in "actual rows" of the Bitmap Index Scan node come from? The thing is that this node of the plan builds an inaccurate (page-by-page) bitmap and is completely unaware of how many rows the bitmap will touch, while something needs to be shown. Therefore, each page is simply assumed to contain 10 rows. The bitmap contains 1664 pages in total (this value is shown in "Heap Blocks: lossy=1664"); so we just get 16640. Altogether, this is a meaningless number, which we should not pay attention to.

What about the airports? For example, let's take the time zone of Vladivostok, which populates 28 pages a day:

demo=# explain (costs off,analyze)
  select *
  from flights_bi
  where airport_utc_offset = interval '8 hours';
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Bitmap Heap Scan on flights_bi (actual time=75.151..192.210 rows=587353 loops=1)
   Recheck Cond: (airport_utc_offset = '08:00:00'::interval)
   Rows Removed by Index Recheck: 191318
   Heap Blocks: lossy=13380
   ->  Bitmap Index Scan on flights_bi_airport_utc_offset_idx
       (actual time=74.999..74.999 rows=133800 loops=1)
         Index Cond: (airport_utc_offset = '08:00:00'::interval)
 Planning time: 0.168 ms
 Execution time: 212.278 ms

The planner again uses the BRIN index created. The accuracy is worse (about 75% in this case), but this is expected, since the correlation is lower.

Several BRIN indexes (just like any other kind) can certainly be combined at the bitmap level. For example, the following is the data for the selected time zone over a month (notice the "BitmapAnd" node):

demo=# \set d 'bookings.now()::date - interval \'60 days\''

demo=# explain (costs off,analyze)
  select *
  from flights_bi
  where scheduled_time >= :d and scheduled_time < :d + interval '30 days'
    and airport_utc_offset = interval '8 hours';
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Bitmap Heap Scan on flights_bi (actual time=62.046..113.849 rows=48154 loops=1)
   Recheck Cond: ...
   Rows Removed by Index Recheck: 18856
   Heap Blocks: lossy=1152
   ->  BitmapAnd (actual time=61.777..61.777 rows=0 loops=1)
         ->  Bitmap Index Scan on flights_bi_scheduled_time_idx
             (actual time=5.490..5.490 rows=435200 loops=1)
               Index Cond: ...
         ->  Bitmap Index Scan on flights_bi_airport_utc_offset_idx
             (actual time=55.068..55.068 rows=133800 loops=1)
               Index Cond: ...
 Planning time: 0.408 ms
 Execution time: 115.475 ms

Comparison with B-tree

What if we create a regular B-tree index on the same field as BRIN?

demo=# create index flights_bi_scheduled_time_btree on flights_bi(scheduled_time);
demo=# select pg_size_pretty(pg_total_relation_size('flights_bi_scheduled_time_btree'));
 pg_size_pretty
----------------
 654 MB
(1 row)

It turned out to be several thousand times larger than our BRIN! However, the query is performed a little faster: the planner used the statistics to figure out that the data is physically ordered, so there is no need to build a bitmap and, mainly, no need to recheck the index condition:

demo=# explain (costs off,analyze)
  select *
  from flights_bi
  where scheduled_time >= :d and scheduled_time < :d + interval '1 day';
                           QUERY PLAN
----------------------------------------------------------------
 Index Scan using flights_bi_scheduled_time_btree on flights_bi
   (actual time=0.099..79.416 rows=83954 loops=1)
   Index Cond: ...
 Planning time: 0.500 ms
 Execution time: 85.044 ms

That's what is so wonderful about BRIN: we sacrifice some efficiency but gain a great deal of space.

Operator classes

minmax

For data types whose values can be compared with one another, the summary information consists of the minimal and maximal values. The names of the corresponding operator classes contain "minmax", for example, "date_minmax_ops". Actually, these are the data types that we have been considering so far, and most types are of this kind.

inclusive

Comparison operators are not defined for all data types. For example, they are not defined for points (the "point" type), which represent the geographical coordinates of airports. By the way, it's for this reason that the statistics do not show the correlation for this column.

demo=# select attname, correlation from pg_stats
       where tablename='flights_bi' and attname = 'airport_coord';
    attname    | correlation
---------------+-------------
 airport_coord |
(1 row)

But many such types enable us to introduce the concept of a "bounding area" — for example, a bounding rectangle for geometric shapes. We discussed in detail how a GiST index uses this feature. Similarly, BRIN also enables gathering summary information on columns with data types like these: the bounding area of all values inside a range serves as the summary value.

Unlike with GiST, the summary value for BRIN must be of the same type as the values being indexed. Therefore, we cannot build the index on points, although it is clear that coordinates could work in BRIN: the longitude is closely connected with the time zone. Fortunately, nothing hinders creating the index on an expression that transforms points into degenerate rectangles. At the same time, we will set the size of a range to one page, just to show the limit case:
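The idea behind such bounding-area summaries can be sketched like this (conceptual Python with invented helper names, not PostgreSQL's operator-class code; the summary of a range is the bounding box of its values, and a range can be skipped when its box does not intersect the query box):

```python
def bbox(points):
    """Summary value for a range: the bounding box of its points,
    as (xmin, ymin, xmax, ymax)."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def intersects(a, b):
    """True if boxes a and b overlap — i.e. the range may hold matches."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

summary = bbox([(120, 40), (135, 45), (140, 50)])
print(summary)                                   # (120, 40, 140, 50)
print(intersects(summary, (130, 45, 150, 60)))   # True: scan this range
print(intersects(summary, (0, 0, 10, 10)))       # False: skip this range
```

This mirrors the minmax logic exactly, with interval containment replaced by box intersection.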

demo=# create index on flights_bi using brin (box(airport_coord)) with (pages_per_range=1);

The size of the index is as small as 30 MB, even in such an extreme situation:

demo=# select pg_size_pretty(pg_total_relation_size('flights_bi_box_idx'));
 pg_size_pretty
----------------
 30 MB
(1 row)

Now we can write queries that limit the airports by coordinates. For example:

demo=# select airport_code, airport_name
from airports
where box(coordinates) <@ box '120,40,140,50';
 airport_code |   airport_name
--------------+------------------
 KHV          | Khabarovsk-Novyi
 VVO          | Vladivostok
(2 rows)

The planner will, however, refuse to use our index.

demo=# analyze flights_bi;
demo=# explain select * from flights_bi
       where box(airport_coord) <@ box '120,40,140,50';
                             QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on flights_bi (cost=0.00..985928.14 rows=30517 width=111)
   Filter: (box(airport_coord) <@ '(140,50),(120,40)'::box)

Why? Let's disable sequential scan and see what happens:

demo=# set enable_seqscan = off;
demo=# explain select * from flights_bi
       where box(airport_coord) <@ box '120,40,140,50';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on flights_bi (cost=14079.67..1000007.81 rows=30517 width=111)
   Recheck Cond: (box(airport_coord) <@ '(140,50),(120,40)'::box)
   ->  Bitmap Index Scan on flights_bi_box_idx
       (cost=0.00..14072.04 rows=30517076 width=0)
         Index Cond: (box(airport_coord) <@ '(140,50),(120,40)'::box)

It appears that the index can be used, but the planner supposes that the bitmap will have to be built over the whole table (look at "rows" of the Bitmap Index Scan node), and it is no wonder that the planner chooses a sequential scan in this case. The issue is that for geometric types, PostgreSQL does not gather any statistics, and the planner has to work blindly:

    demo=# select * from pg_stats where tablename = 'flights_bi_box_idx' \gx
    -[ RECORD 1 ]----------+-------------------
    schemaname             | bookings
    tablename              | flights_bi_box_idx
    attname                | box
    inherited              | f
    null_frac              | 0
    avg_width              | 32
    n_distinct             | 0
    most_common_vals       |
    most_common_freqs      |
    histogram_bounds       |
    correlation            |
    most_common_elems      |
    most_common_elem_freqs |
    elem_count_histogram   |

    Alas. But there are no complaints about the index — it does work and works fine:

    唉。 但是沒有人對該索引有任何抱怨,它確實可以正常工作:

    demo=# explain (costs off,analyze)
    select * from flights_bi
    where box(airport_coord) <@ box '120,40,140,50';
                                        QUERY PLAN
    ----------------------------------------------------------------------------------
     Bitmap Heap Scan on flights_bi (actual time=158.142..315.445 rows=781790 loops=1)
       Recheck Cond: (box(airport_coord) <@ '(140,50),(120,40)'::box)
       Rows Removed by Index Recheck: 70726
       Heap Blocks: lossy=14772
       ->  Bitmap Index Scan on flights_bi_box_idx
           (actual time=158.083..158.083 rows=147720 loops=1)
             Index Cond: (box(airport_coord) <@ '(140,50),(120,40)'::box)
     Planning time: 0.137 ms
     Execution time: 340.593 ms
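    The "lossy" heap blocks and "Rows Removed by Index Recheck" in the plan above reflect the general mechanics of a lossy bitmap scan: whole ranges whose summaries may match are marked, and every row in the marked blocks is rechecked against the real condition. A rough Python sketch of this logic (an illustration with made-up data, not PostgreSQL source code):

    ```python
    # Lossy BRIN-style scan: per-range (min, max) summaries decide which
    # ranges to visit; every row of a visited range is then rechecked.

    def brin_lossy_scan(ranges, summary_may_match, recheck):
        result = []
        for summary, rows in ranges:
            if not summary_may_match(summary):
                continue                                  # skip the whole range
            result.extend(r for r in rows if recheck(r))  # lossy: recheck all rows
        return result

    # Look for values in [15, 25]; summaries are (min, max) per range.
    ranges = [((0, 10),  [1, 5, 9]),
              ((11, 30), [12, 20, 28]),
              ((31, 40), [33, 39])]
    hits = brin_lossy_scan(ranges,
                           lambda s: s[0] <= 25 and 15 <= s[1],  # intervals overlap
                           lambda v: 15 <= v <= 25)
    print(hits)  # [20]: only the middle range is read; 12 and 28 fail the recheck
    ```
    
    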

    The conclusion must be this: if anything nontrivial is required of geometries, PostGIS is needed; unlike the built-in geometric types, it can gather statistics.

    結論必須是這樣的:如果幾何圖形有任何重要要求,則需要PostGIS。 它仍然可以收集統計信息。

    內部構造 (Internals)

    The familiar "pageinspect" extension enables us to look inside a BRIN index.

    傳統的擴展名“ pageinspect”使我們能夠查看BRIN索引的內部。

    First, the metapage tells us the size of a range and how many pages are allocated for the "revmap":

    首先,元信息將提示我們范圍的大小以及?revmap?分配了多少頁:

    demo=# select *
    from brin_metapage_info(get_raw_page('flights_bi_scheduled_time_idx',0));
       magic    | version | pagesperrange | lastrevmappage
    ------------+---------+---------------+----------------
     0xA8109CFA |       1 |           128 |              3
    (1 row)

    Pages 1–3 here are allocated for the "revmap", while the rest contain summary data. From the "revmap" we can get references to the summary data for each range. Say, the information on the first range, covering the first 128 pages, is located here:

    此處的第1至3頁分配給?revmap?,其余的則包含摘要數據。 從《 revmap》中,我們可以獲得每個范圍的摘要數據的引用。 說,第一個范圍的信息(包含前128頁)位于以下位置:

    demo=# select *
    from brin_revmap_data(get_raw_page('flights_bi_scheduled_time_idx',1)) limit 1;
      pages
    ---------
     (6,197)
    (1 row)

    And this is the summary data itself:

    這是摘要數據本身:

    demo=# select allnulls, hasnulls, value
    from brin_page_items(
      get_raw_page('flights_bi_scheduled_time_idx',6),
      'flights_bi_scheduled_time_idx'
    ) where itemoffset = 197;
     allnulls | hasnulls |                       value
    ----------+----------+----------------------------------------------------
     f        | f        | {2016-08-15 02:45:00+03 .. 2016-08-15 17:15:00+03}
    (1 row)

    Next range:

    下一個范圍:

    demo=# select *
    from brin_revmap_data(get_raw_page('flights_bi_scheduled_time_idx',1))
    offset 1 limit 1;
      pages
    ---------
     (6,198)
    (1 row)

    demo=# select allnulls, hasnulls, value
    from brin_page_items(
      get_raw_page('flights_bi_scheduled_time_idx',6),
      'flights_bi_scheduled_time_idx'
    ) where itemoffset = 198;
     allnulls | hasnulls |                       value
    ----------+----------+----------------------------------------------------
     f        | f        | {2016-08-15 06:00:00+03 .. 2016-08-15 18:55:00+03}
    (1 row)

    And so on.

    等等。
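    The arithmetic connecting heap blocks, ranges, and "revmap" entries implied above can be sketched as follows ("pagesperrange" = 128 comes from the metapage; the number of pointers per revmap page is an assumed parameter of the sketch, not the actual constant from the PostgreSQL source):

    ```python
    # Which range a heap block belongs to, and where (page, offset) its
    # revmap pointer would live, assuming `per_page` pointers fit on one
    # revmap page and revmap starts at page 1, as in the metapage above.

    PAGES_PER_RANGE = 128  # pagesperrange from brin_metapage_info

    def range_of_block(heap_block):
        return heap_block // PAGES_PER_RANGE

    def revmap_slot(range_no, per_page, first_revmap_page=1):
        return (first_revmap_page + range_no // per_page,
                range_no % per_page)

    print(range_of_block(127))  # 0: still within the first range
    print(range_of_block(128))  # 1: the second range begins
    ```
    
    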

    For "inclusion" classes, the "value" field will display something like

    對于“包含”類,“值”字段將顯示類似

    {(94.4005966186523,69.3110961914062),(77.6600036621,51.6693992614746) .. f .. f}

    The first value is the bounding rectangle, and the "f" letters at the end denote the absence of empty elements (the first one) and the absence of unmergeable values (the second one). Actually, the only unmergeable values are IPv4 and IPv6 addresses (the "inet" data type).

    第一個值是嵌入矩形,末尾的“ f”字母表示缺少空元素(第一個)和缺少不可合并的值(第二個)。 實際上,唯一不可合并的值是“ IPv4”和“ IPv6”地址(“ inet”數據類型)。
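    How such an "inclusion" summary comes about is easy to picture: as rows are added to a range, the stored bounding box is expanded to cover each new value. A Python sketch (illustration only; the real opclass also maintains the two flags shown above, and the coordinates are taken from the example value):

    ```python
    # Growing an "inclusion" summary: the union of a bounding box with a
    # new box. Boxes are (x1, y1, x2, y2); a point is a degenerate box.

    def union_box(summary, box):
        if summary is None:          # first value in the range
            return box
        sx1, sy1, sx2, sy2 = summary
        bx1, by1, bx2, by2 = box
        return (min(sx1, bx1), min(sy1, by1),
                max(sx2, bx2), max(sy2, by2))

    summary = None
    for x, y in [(77.66, 51.67), (94.40, 69.31)]:   # two airport coordinates
        summary = union_box(summary, (x, y, x, y))
    print(summary)  # (77.66, 51.67, 94.4, 69.31)
    ```
    
    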

    物產 (Properties)

    Let me remind you of the queries that have already been provided.

    提醒您已經提供的查詢。

    The following are the properties of the access method:

    以下是訪問方法的屬性:

     amname |     name      | pg_indexam_has_property
    --------+---------------+-------------------------
     brin   | can_order     | f
     brin   | can_unique    | f
     brin   | can_multi_col | t
     brin   | can_exclude   | f

    An index can be created on several columns. In this case, its own summary statistics are gathered for each column, but they are stored together for each range. Of course, such an index only makes sense if one and the same range size suits all the columns.

    可以在幾列上創建索引。 在這種情況下,將為每列收集其自己的摘要統計信息,但對于每個范圍將它們一起存儲。 當然,如果一個且相同大小的范圍適用于所有列,則此索引才有意義。

    The following index-layer properties are available:

    以下索引層屬性可用:

         name      | pg_index_has_property
    ---------------+-----------------------
     clusterable   | f
     index_scan    | f
     bitmap_scan   | t
     backward_scan | f

    Evidently, only bitmap scan is supported.

    顯然,僅支持位圖掃描。

    However, the lack of clustering support may seem confusing. Since a BRIN index is sensitive to the physical order of rows, it would seem logical to be able to cluster data according to the index. But this is not so. We can only create a "regular" index (B-tree or GiST, depending on the data type) and cluster according to that. By the way, would you really want to cluster a presumably huge table, given the exclusive lock, the execution time, and the disk space consumed during rebuilding?

    但是,缺乏群集似乎令人困惑。 看來,由于BRIN索引對行的物理順序很敏感,因此能夠根據索引對數據進行聚類是合乎邏輯的。 但是事實并非如此。 我們只能創建一個“常規”索引(B樹或GiST,取決于數據類型)并根據它進行聚類。 順便說一句,您是否要考慮到排他鎖,執行時間以及重建過程中磁盤空間的消耗,來對一個據稱龐大的表進行聚類?

    The following are the column-layer properties:

    以下是列層屬性:

            name        | pg_index_column_has_property
    --------------------+------------------------------
     asc                | f
     desc               | f
     nulls_first        | f
     nulls_last         | f
     orderable          | f
     distance_orderable | f
     returnable         | f
     search_array       | f
     search_nulls       | t

    The only available property is the ability to manipulate NULLs.

    唯一可用的屬性是操作NULL的能力。

    Read on.

    繼續閱讀。

    翻譯自: https://habr.com/en/company/postgrespro/blog/452900/
