A Brief Look at PostgreSQL Indexes

Published: 2024/4/13

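1. Properties of indexes

1.1 Faster lookups on query conditions

As a table grows, queries against it slow down. An index on the columns used in query conditions lets PostgreSQL locate the rows that may satisfy the condition quickly, without scanning every row.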

create table t(id int, info text);
insert into t select generate_series(1,10000),'lottu'||generate_series(1,10000);
create table t1 as select * from t;
create table t2 as select * from t;
create index ind_t2_id on t2(id);
lottu=# analyze t1;
ANALYZE
lottu=# analyze t2;
ANALYZE

# Without an index
lottu=# explain (analyze,buffers,verbose) select * from t1 where id < 10;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Seq Scan on lottu.t1  (cost=0.00..180.00 rows=9 width=13) (actual time=0.073..5.650 rows=9 loops=1)
   Output: id, info
   Filter: (t1.id < 10)
   Rows Removed by Filter: 9991
   Buffers: shared hit=55
 Planning time: 25.904 ms
 Execution time: 5.741 ms
(7 rows)

# With an index
lottu=# explain (analyze,verbose,buffers) select * from t2 where id < 10;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using ind_t2_id on lottu.t2  (cost=0.29..8.44 rows=9 width=13) (actual time=0.008..0.014 rows=9 loops=1)
   Output: id, info
   Index Cond: (t2.id < 10)
   Buffers: shared hit=3
 Planning time: 0.400 ms
 Execution time: 0.052 ms
(6 rows)

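In this example the same SQL statement takes 0.052 ms on t2 (with the index) and 5.741 ms on t1 (without it). The indexed plan also reads 3 shared buffers instead of 55.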

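1.2 Ordering

An index is itself ordered. A B-tree index (the default type) keeps its entries sorted, so a query that needs rows in index order can skip the sort step.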

# Without an index
lottu=# explain (analyze,verbose,buffers) select * from t1 where id > 2 order by id;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Sort  (cost=844.31..869.31 rows=9999 width=13) (actual time=8.737..11.995 rows=9998 loops=1)
   Output: id, info
   Sort Key: t1.id
   Sort Method: quicksort  Memory: 853kB
   Buffers: shared hit=55
   ->  Seq Scan on lottu.t1  (cost=0.00..180.00 rows=9999 width=13) (actual time=0.038..5.133 rows=9998 loops=1)
         Output: id, info
         Filter: (t1.id > 2)
         Rows Removed by Filter: 2
         Buffers: shared hit=55
 Planning time: 0.116 ms
 Execution time: 15.205 ms
(12 rows)

# With an index
lottu=# explain (analyze,verbose,buffers) select * from t2 where id > 2 order by id;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Index Scan using ind_t2_id on lottu.t2  (cost=0.29..353.27 rows=9999 width=13) (actual time=0.030..5.304 rows=9998 loops=1)
   Output: id, info
   Index Cond: (t2.id > 2)
   Buffers: shared hit=84
 Planning time: 0.295 ms
 Execution time: 7.027 ms
(6 rows)

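In this example, again running the same SQL statement on both tables:

t2 (with the index) takes 7.027 ms; t1 (without it) takes 15.205 ms.
The plan on t1 also needs a Sort node that uses 853 kB of memory (Sort Method: quicksort Memory: 853kB), which the indexed plan avoids entirely.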
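
Because B-tree entries are ordered, the same index can also serve top-N queries and descending order by being read from either end. The following is a minimal sketch, assuming the t2 table and ind_t2_id index created in section 1.1. The plan the planner actually picks depends on your data, statistics, and PostgreSQL version, so no output is shown.

```sql
-- Top-N query: the planner can read the first 5 entries of ind_t2_id
-- and stop, instead of sorting every row of the table.
explain select * from t2 order by id limit 5;

-- Descending order can be served by walking the same index backward;
-- look for "Index Scan Backward" in the resulting plan.
explain select * from t2 order by id desc limit 5;
```

If the plans show a Sort node instead, the planner judged sorting to be cheaper for that data set; running analyze t2 first gives it up-to-date statistics.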

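2. Index scan methods

There are three ways to scan an index.

2.1 Index Scan

Look up the index to find the ctid of each matching row, then fetch the row from the heap table by its ctid.

2.2 Bitmap Scan

Look up the index to collect the ctids of all matching rows, combine them with set operations (AND/OR) and sort them in a bitmap, and only then fetch the rows from the heap table.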
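
A hedged sketch of how a bitmap scan might be observed, assuming the t2 table and ind_t2_id index from section 1.1. Whether the planner chooses a bitmap plan on its own depends on table size and statistics; switching off the competing scan types inside a transaction is only a way to nudge it for demonstration, not something to do in production.

```sql
begin;
-- SET LOCAL lasts only until the end of this transaction.
set local enable_seqscan = off;
set local enable_indexscan = off;

-- Two ranges on the same column joined by OR: each range can be turned
-- into a bitmap of ctids, and the two bitmaps can then be OR-ed together
-- before the heap is visited.
explain select * from t2 where id < 10 or id > 9990;
rollback;
```

A plan of this shape typically contains Bitmap Index Scan nodes combined by a BitmapOr node, feeding a Bitmap Heap Scan that reads the heap pages in physical order.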

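2.3 Index Only Scan

If the index contains every column the query returns, then for heap pages that the visibility map (VM) marks as all-visible, the values are returned directly from the index without visiting the heap table.

Let us compare Index Scan and Index Only Scan.
To borrow the terminology used for Oracle index scans: an Index Scan incurs a table access after the index lookup, that is, once the index has been searched the table still has to be read. An Index Only Scan only needs to read the index. Does that mean an Index Only Scan always beats an Index Scan? Let us take a look.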

Table t already has an index ind_t_id on column id.

1. Table t has not been vacuumed, so its visibility map marks no pages as all-visible.

lottu=# \d+ t
                          Table "lottu.t"
 Column |  Type   | Modifiers | Storage  | Stats target | Description
--------+---------+-----------+----------+--------------+-------------
 id     | integer |           | plain    |              |
 info   | text    |           | extended |              |
Indexes:
    "ind_t_id" btree (id)

lottu=# explain (analyze,buffers,verbose) select id from t where id < 10;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Only Scan using ind_t_id on lottu.t  (cost=0.29..8.44 rows=9 width=4) (actual time=0.009..0.015 rows=9 loops=1)
   Output: id
   Index Cond: (t.id < 10)
   Heap Fetches: 9
   Buffers: shared hit=3
 Planning time: 0.177 ms
 Execution time: 0.050 ms
(7 rows)

# Force a different plan by hand
lottu=# set enable_indexonlyscan = off;
SET
lottu=# explain (analyze,buffers,verbose) select id from t where id < 10;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Index Scan using ind_t_id on lottu.t  (cost=0.29..8.44 rows=9 width=4) (actual time=0.008..0.014 rows=9 loops=1)
   Output: id
   Index Cond: (t.id < 10)
   Buffers: shared hit=3
 Planning time: 0.188 ms
 Execution time: 0.050 ms
(6 rows)

# The two plans are practically identical. The only difference is that the Index Only Scan reports Heap Fetches: 9.
# Heap Fetches is a count, not a time: the scan still had to visit the heap for all 9 rows to check their visibility,
# so it saves nothing over the plain Index Scan here.

2. Table t has been vacuumed, so its visibility map marks pages as all-visible.

lottu=# delete from t where id >200 and id < 500;
DELETE 299
lottu=# vacuum t;
VACUUM
lottu=# analyze t;
ANALYZE
lottu=# explain (analyze,buffers,verbose) select id from t where id < 10;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Only Scan using ind_t_id on lottu.t  (cost=0.29..4.44 rows=9 width=4) (actual time=0.008..0.012 rows=9 loops=1)
   Output: id
   Index Cond: (t.id < 10)
   Heap Fetches: 0
   Buffers: shared hit=3
 Planning time: 0.174 ms
 Execution time: 0.048 ms
(7 rows)

lottu=# set enable_indexonlyscan = off;
SET
lottu=# explain (analyze,buffers,verbose) select id from t where id < 10;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Index Scan using ind_t_id on lottu.t  (cost=0.29..8.44 rows=9 width=4) (actual time=0.012..0.022 rows=9 loops=1)
   Output: id
   Index Cond: (t.id < 10)
   Buffers: shared hit=3
 Planning time: 0.179 ms
 Execution time: 0.077 ms
(6 rows)

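Summary:

When the visibility map marks no pages as all-visible, an Index Only Scan is no faster than an Index Scan, because it still has to visit the heap page of every matching row to check visibility (Heap Fetches: 9). The difference between the two is negligible.
When the visibility map marks the pages as all-visible, an Index Only Scan is faster than an Index Scan (Heap Fetches: 0, and 0.048 ms versus 0.077 ms in the run above).

Note 1:

VM file: the visibility map file. For each heap page it records whether every row on the page is visible to all transactions, which means the page holds no rows that still need to be cleaned up. These bits are set by VACUUM.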
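
One way to see how much of a table the visibility map currently covers is to compare two counters in pg_class. This is a sketch assuming the table t used above; both columns are estimates that VACUUM and ANALYZE refresh, so they can lag behind the real state.

```sql
-- relpages:      number of heap pages in the table (estimate)
-- relallvisible: number of those pages marked all-visible in the VM
select relname, relpages, relallvisible
from pg_class
where relname = 't';

-- VACUUM sets the all-visible bit on every page that has no dead rows
-- and no rows that are still invisible to some transaction.
vacuum t;

select relname, relpages, relallvisible
from pg_class
where relname = 't';
```

The closer relallvisible is to relpages, the more rows an Index Only Scan can return without a heap fetch.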

Note 2:

The plan can be steered by hand. The enable_xxx parameters that can be set are:

enable_bitmapscan
enable_hashagg
enable_hashjoin
enable_indexonlyscan
enable_indexscan
enable_material
enable_mergejoin
enable_nestloop
enable_seqscan
enable_sort
enable_tidscan
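
A short sketch of how these switches are usually handled in a psql session, assuming the table t from the examples above. They discourage a plan type rather than forbid it: the planner can still fall back to a disabled method when no alternative exists.

```sql
show enable_indexonlyscan;         -- current value; the default is on
set enable_indexonlyscan = off;    -- affects only the current session
explain select id from t where id < 10;
reset enable_indexonlyscan;        -- back to the configured default

-- List every enable_* planner switch known to the running server.
-- The backslash escapes the underscore, which is a LIKE wildcard.
select name, setting
from pg_settings
where name like 'enable\_%'
order by name;
```

The pg_settings query is useful because the exact set of switches differs between PostgreSQL versions.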

References

德哥: 《PostgreSQL 性能優化培訓 3 DAY.pdf》 (PostgreSQL performance tuning training, 3 days)
https://www.postgresql.org/docs/9.6/static/runtime-config-query.html

3. Index types

PostgreSQL supports the following index types: B-tree, Hash, GiST, SP-GiST, GIN, and BRIN.

B-tree indexes: http://www.cnblogs.com/alianbog/p/5621749.html
Hash indexes: generally used only for simple equality lookups; not commonly used.
GiST indexes: http://www.cnblogs.com/alianbog/p/5628543.html
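
For orientation, here is a sketch of creating some of the non-default index types on the table t from section 1. The index names are made up for illustration, and the choice of column for each type is an assumption meant to show the syntax, not a tuning recommendation.

```sql
-- Hash: supports equality lookups only (info = '...').
create index ind_t_info_hash on t using hash (info);

-- BRIN: a very small index suited to large tables whose values follow
-- the physical row order (available since PostgreSQL 9.5).
create index ind_t_id_brin on t using brin (id);

-- GIN: for multi-valued data such as full-text search vectors or arrays.
-- Here the text column is indexed through to_tsvector.
create index ind_t_info_gin on t using gin (to_tsvector('simple', info));
```

Before PostgreSQL 10, hash indexes were not WAL-logged and had to be rebuilt after a crash, which is one reason they were rarely used.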

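4. Index management

4.1 Creating an index

Syntax for creating an index:

lottu=# \h create index
Command:     CREATE INDEX
Description: define a new index
Syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON table_name [ USING method ]
    ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ WITH ( storage_parameter = value [, ... ] ) ]
    [ TABLESPACE tablespace_name ]
    [ WHERE predicate ]

The examples below use table t.

1. Keyword [UNIQUE]
# Creates a unique index; a primary key is backed by a unique index.
CREATE UNIQUE INDEX ind_t_id_1 on t (id);

2. Keyword [CONCURRENTLY]
# Builds the index concurrently, which serves the same purpose as an online index build in Oracle.
# Updates, inserts, and deletes on the table are not blocked during the build, but the build takes considerably longer.
CREATE INDEX CONCURRENTLY ind_t_id_2 on t (id);

3. Keyword [IF NOT EXISTS]
# Checks whether an index with this name already exists; if it does, no error is raised and the index is not created again.
CREATE INDEX IF NOT EXISTS ind_t_id_3 on t (id);

4. Keyword [USING]
# Selects the type of index to create. The default is B-tree.
CREATE INDEX ind_t_id_4 on t using btree (id);

5. Keyword [ASC | DESC] [NULLS { FIRST | LAST }]
# Chooses ascending or descending order for the index and, if the column contains null values,
# whether nulls sort first or last. For example, descending order with nulls first:
CREATE INDEX ind_t_id_5 on t (id desc nulls first);

6. Keyword [WITH ( storage_parameter = value )]
# Sets storage parameters such as the index fill factor. For example, create an index with a fill factor of 75:
CREATE INDEX ind_t_id_6 on t (id) with (fillfactor = 75);

7. Keyword [TABLESPACE]
# Chooses the tablespace in which the index is created.
CREATE INDEX ind_t_id_7 on t (id) TABLESPACE tsp_lottu;

8. Keyword [WHERE]
# Builds a partial index: only the rows you care about are indexed, rather than every row of the table. The WHERE condition selects those rows.
CREATE INDEX ind_t_id_8 on t (id) WHERE id < 1000;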
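
The syntax above also allows an ( expression ) in place of a column name, which the numbered examples do not cover. Below is a small sketch on table t; the index name ind_t_info_lower is hypothetical. The second query is a check worth running after CREATE INDEX CONCURRENTLY, because a concurrent build that fails leaves an invalid index behind.

```sql
-- Expression index: lets queries that filter on lower(info) use an index.
create index ind_t_info_lower on t (lower(info));

-- The query must use the same expression for the index to be considered.
explain select * from t where lower(info) = 'lottu5';

-- List indexes on t that are marked invalid, for example after a failed
-- concurrent build. Such an index should be dropped and created again.
select indexrelid::regclass as index_name
from pg_index
where indrelid = 't'::regclass
  and not indisvalid;
```

On a table as small as t the planner may still prefer a sequential scan; the expression index pays off as the table grows.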

4.2 Altering an index

Syntax for altering an index:

lottu=# \h alter index
Command:     ALTER INDEX
Description: change the definition of an index
Syntax:
# Rename an index
ALTER INDEX [ IF EXISTS ] name RENAME TO new_name
# Move an index to another tablespace
ALTER INDEX [ IF EXISTS ] name SET TABLESPACE tablespace_name
# Change storage parameters of an index, such as the fill factor
ALTER INDEX [ IF EXISTS ] name SET ( storage_parameter = value [, ... ] )
# Reset storage parameters of an index, such as the fill factor, to their defaults
ALTER INDEX [ IF EXISTS ] name RESET ( storage_parameter [, ... ] )
# Move all indexes in one tablespace to a new tablespace
ALTER INDEX ALL IN TABLESPACE name [ OWNED BY role_name [, ... ] ]
    SET TABLESPACE new_tablespace [ NOWAIT ]
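
The forms above might be applied to the indexes created in section 4.1 roughly as follows. This is a sketch: the new name ind_t_id_3_renamed is made up, and pg_default is used as the target only because that tablespace exists in every cluster.

```sql
-- Rename; IF EXISTS turns a missing index into a notice instead of an error.
alter index if exists ind_t_id_3 rename to ind_t_id_3_renamed;

-- Change the fill factor. Existing index pages are not repacked by this
-- command; the index has to be rebuilt for the new value to apply to them.
alter index ind_t_id_6 set (fillfactor = 90);

-- Return to the default fill factor.
alter index ind_t_id_6 reset (fillfactor);

-- Move the index out of tsp_lottu into the default tablespace.
alter index ind_t_id_7 set tablespace pg_default;
```

Moving an index between tablespaces rewrites the index file and locks the index while it runs, so it is best scheduled for a quiet period.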

4.3 Dropping an index

Syntax for dropping an index:

lottu=# \h drop index
Command:     DROP INDEX
Description: remove an index
Syntax:
DROP INDEX [ CONCURRENTLY ] [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
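
A brief sketch of both variants, using index names from section 4.1. The CONCURRENTLY form has some restrictions worth knowing: it accepts only one index name, does not support CASCADE, and cannot be run inside a transaction block.

```sql
-- Plain drop: several indexes at once; takes an exclusive lock on the
-- table, blocking reads and writes until the drop completes.
drop index if exists ind_t_id_4, ind_t_id_5;

-- Concurrent drop: waits for conflicting transactions to finish instead
-- of blocking other sessions. One index per statement.
drop index concurrently if exists ind_t_id_2;
```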

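5. Index maintenance

Indexes speed up queries and sorting on table rows and enforce unique constraints. They also come at a cost:

Indexes take up additional storage space in the database.
Every insert, update, or delete on the table has to update the indexes as well.

5.1 Checking the size of an index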

select pg_size_pretty(pg_relation_size('ind_t_id'));
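
To put a single index size into context, the built-in size functions can be combined. This is a sketch assuming table t; the column aliases are only labels.

```sql
select pg_size_pretty(pg_table_size('t'))          as table_size,
       pg_size_pretty(pg_indexes_size('t'))        as all_indexes_size,
       pg_size_pretty(pg_total_relation_size('t')) as total_size;
```

pg_indexes_size sums every index attached to the table, so an all_indexes_size that approaches or exceeds table_size is a hint to review which indexes are really needed.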

5.2 Index usage

-- pg_stat_user_indexes.idx_scan shows how many scans have used an index, which helps identify indexes that can be dropped.
select idx_scan from pg_stat_user_indexes where indexrelname = 'ind_t_id';
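
The same view can be used to list candidates for removal across all user tables. A sketch follows; treat the result as a starting point for review rather than a drop list.

```sql
-- Indexes that have never been scanned since the statistics were last
-- reset, largest first.
select schemaname,
       relname      as table_name,
       indexrelname as index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) as index_size
from pg_stat_user_indexes
where idx_scan = 0
order by pg_relation_size(indexrelid) desc;
```

Two caveats: the counters only cover the period since the last statistics reset, and an index that backs a primary key or unique constraint can show zero scans while still being required to enforce the constraint.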

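5.3 Rebuilding an index

-- After frequent updates to a table, its indexes can become bloated and perform poorly; they then need to be rebuilt.
lottu=# select pg_size_pretty(pg_relation_size('ind_t_id_1'));
 pg_size_pretty
----------------
 2200 kB
(1 row)

lottu=# delete from t where id > 1000;
DELETE 99000
lottu=# analyze t;
ANALYZE
lottu=# select pg_size_pretty(pg_relation_size('ind_t_id_1'));
 pg_size_pretty
----------------
 2200 kB

lottu=# insert into t select generate_series(2000,100000),'lottu';
INSERT 0 98001
lottu=# select pg_size_pretty(pg_relation_size('ind_t_id_1'));
 pg_size_pretty
----------------
 4336 kB
(1 row)

lottu=# vacuum full t;
VACUUM
lottu=# select pg_size_pretty(pg_relation_size('ind_t_id_1'));
 pg_size_pretty
----------------
 2176 kB

Deleting rows does not shrink the index (2200 kB before and after the delete), and the inserts that follow grow it to 4336 kB. Only the rebuild brings it back down to 2176 kB.

Ways to rebuild:
1. reindex: in PostgreSQL 9.6, the version referenced in this article, REINDEX has no CONCURRENTLY option (it was added in PostgreSQL 12). It blocks writes to the table, and reads that would use the index, while it runs.
2. vacuum full: rewrites the table and rebuilds its indexes as well; it also locks the table.
3. Create a new index under a different name, then drop the old one.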
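
The third method can be done without blocking writes by building the replacement concurrently. Here is a sketch for the unique index ind_t_id_1 from section 4.1; the temporary name ind_t_id_1_new is made up. It assumes the index does not back a primary key or unique constraint, since such an index cannot simply be dropped.

```sql
-- 1. Build the replacement without blocking inserts, updates, or deletes.
create unique index concurrently ind_t_id_1_new on t (id);

-- 2. Drop the bloated index, again without blocking other sessions.
drop index concurrently ind_t_id_1;

-- 3. Give the new index the original name.
alter index ind_t_id_1_new rename to ind_t_id_1;
```

None of the concurrent statements can run inside a transaction block, so the three steps are not atomic. If step 1 fails, drop the invalid ind_t_id_1_new before trying again.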
