Principles and Use Cases of PostgreSQL's 9 Index Types

May 7, 2023 | 299 reads | Source: Alibaba Cloud Yunqi Community (阿里云_云栖社区)

Please click through to read the original article.


Tags

PostgreSQL , btree , hash , gin , gist , sp-gist , brin , bloom , rum , zombodb , bitmap

Background

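PostgreSQL has many open, extensible features, for example:

1. An open data type interface, which gives PG an exceptionally rich set of data types. Beyond the types that traditional databases support, it also supports extended types such as GIS, JSON, RANGE, IP, ISBN, image feature values, chemistry, and DNA, and users can add further types to fit their business needs.

2. An open operator interface, so PG supports not only the usual operators for each type but also extended ones, such as distance, logical union/intersection/difference, image similarity, and geometric operators. Users can likewise add further operators as needed.

3. An open foreign data source interface: through FDW (foreign data wrappers), PG can read and write a wide range of external sources, such as MySQL, redis, mongo, oracle, sqlserver, hive, www, hbase, and ldap. Practically any data source you can think of can be read and written through the FDW interface.

4. An open language interface, so PG can use almost any programming language for database functions and stored procedures, e.g. plpython, plperl, pljava, plR, plCUDA, and plshell. Users can extend PG's language support through a language handler.

5. An open index interface, so PG supports a very rich set of index methods, e.g. btree, hash, gin, gist, sp-gist, brin, bloom, rum, zombodb, and bitmap (a Greenplum extension). Users can choose an index according to the data type and the query pattern.

6. Internally, PG also supports the BitmapAnd and BitmapOr optimizations, which combine the results of scans on several indexes and so make data access through multiple indexes more efficient.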
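To make item 6 concrete, here is a minimal Python model of the idea, not PostgreSQL's actual TIDBitmap code: each index scan yields a bitmap of matching row ids (a Python int is used as the bitmap, an assumption for illustration), BitmapAnd/BitmapOr combine those bitmaps with bitwise AND/OR, and the table is then visited once, in row order. The table, its columns `a` and `b`, and the helper names are hypothetical.

```python
def bitmap_from_rows(row_ids):
    """Build a bitmap (a Python int) with bit i set for each matching row id i."""
    bitmap = 0
    for rid in row_ids:
        bitmap |= 1 << rid
    return bitmap


def bitmap_to_rows(bitmap):
    """Return the row ids whose bits are set, in ascending (heap visit) order."""
    rows = []
    rid = 0
    while bitmap:
        if bitmap & 1:
            rows.append(rid)
        bitmap >>= 1
        rid += 1
    return rows


# Hypothetical table: row id -> (a, b)
table = {0: (1, "x"), 1: (2, "y"), 2: (1, "y"), 3: (3, "x"), 4: (1, "x")}

# Simulated results of two independent index scans: one on a, one on b
scan_a = bitmap_from_rows(rid for rid, (a, _) in table.items() if a == 1)
scan_b = bitmap_from_rows(rid for rid, (_, b) in table.items() if b == "x")

bitmap_and = scan_a & scan_b  # WHERE a = 1 AND b = 'x'
bitmap_or = scan_a | scan_b   # WHERE a = 1 OR b = 'x'

print(bitmap_to_rows(bitmap_and))  # [0, 4]
print(bitmap_to_rows(bitmap_or))   # [0, 2, 3, 4]
```

Combining the bitmaps before touching the table means each qualifying row is fetched once, in physical order, however many indexes contributed to the condition.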

Different index methods target different data types and workloads. The sections below introduce the principle and the use cases of each index type.

I. btree

Principle

"PostgreSQL B-Tree Index Structure Explained" (《深入浅出PostgreSQL B-Tree索引结构》)

Use cases

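B-tree works with any data type that has a well-defined sort order (that is, a btree operator class). It supports sorting, and it supports searches with greater than, less than, equal to, greater than or equal to, and less than or equal to.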
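Because a b-tree keeps its keys sorted, each of these predicates reduces to locating a boundary in the sorted keys and then reading forward. The sketch below models only the leaf level, as a sorted Python list, and uses the standard `bisect` module for the boundary search; the class and method names are hypothetical, and a real b-tree reaches the boundary by descending from the root page rather than by bisecting one array.

```python
import bisect


class SortedIndex:
    """Toy stand-in for a b-tree leaf level: (key, row_id) pairs kept sorted."""

    def __init__(self, rows):
        # rows: iterable of (row_id, key); store entries sorted by key, then row id
        self.entries = sorted((key, rid) for rid, key in rows)
        self.keys = [key for key, _ in self.entries]

    def scan_range(self, low=None, high=None, low_inclusive=True, high_inclusive=True):
        """Return row ids whose key lies in the requested range, in key order."""
        if low is None:
            start = 0
        elif low_inclusive:
            start = bisect.bisect_left(self.keys, low)   # first key >= low
        else:
            start = bisect.bisect_right(self.keys, low)  # first key > low
        if high is None:
            end = len(self.keys)
        elif high_inclusive:
            end = bisect.bisect_right(self.keys, high)   # past last key <= high
        else:
            end = bisect.bisect_left(self.keys, high)    # past last key < high
        return [rid for _, rid in self.entries[start:end]]

    def scan_equal(self, key):
        """Equality is just a range whose two bounds coincide."""
        return self.scan_range(key, key)


# Hypothetical rows: (row_id, key)
idx = SortedIndex([(10, 5), (11, 3), (12, 8), (13, 3), (14, 1)])
print(idx.scan_equal(3))                                  # key = 3      -> [11, 13]
print(idx.scan_range(low=3, high=8, low_inclusive=False))  # 3 < key <= 8 -> [10, 12]
print(idx.scan_range(high=3, high_inclusive=False))        # key < 3      -> [14]
```

The same sorted order is what lets a b-tree return rows already ordered for ORDER BY without a separate sort step.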

Combined with recursive queries, a b-tree index can also provide fast sparse retrieval.

"A Few Applications of PostgreSQL Recursive SQL: How Geeks and Ordinary People Think" (《PostgreSQL 递归SQL的几个应用 – 极客与正常人的思维》)
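One pattern this likely covers (an assumption, since the linked article is not reproduced here) is emulating a loose index scan: to list the distinct values of a column that has few of them, a recursive query repeatedly asks the index for the smallest value greater than the previous one, typically a WITH RECURSIVE query that selects min(col) where col > the last value found. The Python sketch below models each such probe as one `bisect` call on a sorted key list; the function name and the data are hypothetical.

```python
import bisect


def distinct_via_skips(sorted_keys):
    """Emulate a recursive loose index scan over sorted index keys.

    Instead of reading every entry, each step jumps to the first key that is
    strictly greater than the last distinct value found (one "probe")."""
    values = []
    probes = 0
    pos = 0
    while pos < len(sorted_keys):
        value = sorted_keys[pos]
        values.append(value)
        probes += 1
        # Like "select min(col) from tbl where col > value": find the next larger key
        pos = bisect.bisect_right(sorted_keys, value, pos)
    return values, probes


# 100,000 index entries but only 3 distinct values (hypothetical data, already sorted)
keys = [1] * 50_000 + [2] * 30_000 + [7] * 20_000
values, probes = distinct_via_skips(keys)
print(values, probes)  # [1, 2, 7] 3
```

The work grows with the number of distinct values rather than the number of rows, which is where the speedup of this kind of sparse retrieval comes from.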

Example

postgres=# create table t_btree(id int, info text);
CREATE TABLE
postgres=# insert into t_btree select generate_series(1,10000), md5(random()::text);
INSERT 0 10000
postgres=# create index idx_t_btree_1 on t_btree using btree (id);
CREATE INDEX
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_btree where id=1;
                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_t_btree_1 on public.t_btree (cost=0.29..3.30 rows=1 width=37) (actual time=0.027..0.027 rows=1 loops=1)
   Output: id, info
   Index Cond: (t_btree.id = 1)
   Buffers: shared hit=1 read=2
 Planning time: 0.292 ms
 Execution time: 0.050 ms
(6 rows)

II. hash

Principle

src/backend/access/hash/README

(Hash index entries store only the hash code, not the actual data value, for each indexed item.)

Use cases

A hash index stores the hash value of the indexed column's value, so it supports only equality queries.
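The README note has two consequences that a toy model makes visible: an index entry stays the same small size however long the value is, and because only hash codes are stored, a lookup must recheck candidate rows against the table, since different values can share a hash code. The sketch below uses `zlib.crc32` as a stand-in hash function (an assumption; PostgreSQL uses its own hash functions) and a fixed number of buckets; the class and method names are hypothetical.

```python
import zlib


class ToyHashIndex:
    """Toy hash index: buckets hold (hash_code, row_id) pairs, never the value."""

    def __init__(self, nbuckets=8):
        self.nbuckets = nbuckets
        self.buckets = [[] for _ in range(nbuckets)]

    @staticmethod
    def hash_code(value):
        # Stand-in 32-bit hash (CRC32), not PostgreSQL's real hash function
        return zlib.crc32(value.encode("utf-8"))

    def insert(self, value, row_id):
        code = self.hash_code(value)
        self.buckets[code % self.nbuckets].append((code, row_id))

    def lookup(self, value, heap):
        """Equality search: match hash codes, then recheck the actual values in
        the table (heap), because two different values may collide."""
        code = self.hash_code(value)
        candidates = [rid for c, rid in self.buckets[code % self.nbuckets] if c == code]
        return [rid for rid in candidates if heap[rid] == value]


# Hypothetical table with very long text values: row_id -> value
heap = {1: "a" * 100_000, 2: "b" * 100_000, 3: "a" * 100_000}
idx = ToyHashIndex()
for rid, val in heap.items():
    idx.insert(val, rid)
print(idx.lookup("a" * 100_000, heap))  # [1, 3]
```

Because hash codes carry no ordering, the same structure cannot answer range predicates such as `>` or `<`.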

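Hash indexes are particularly suited to columns whose values are very long, such as very long strings, when users only need equality search. Such values are a poor fit for b-tree: a b-tree page must hold at least 3 entries, so very long values cannot be indexed. In that situation, a hash index is recommended.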
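The "at least 3 entries per page" rule can be turned into the concrete limit with a little arithmetic. The sketch below assumes the default 8 KB block size, 8-byte alignment, and the BTMaxItemSize definition used by PostgreSQL releases before 12 (later releases reserve a little extra room per item, so their limit is slightly lower); the page header, line pointer, and b-tree special-space sizes are the usual values.

```python
# Assumptions: 8 KB pages, 8-byte MAXALIGN, pre-PostgreSQL-12 BTMaxItemSize formula.
BLCKSZ = 8192
MAXIMUM_ALIGNOF = 8

SIZE_OF_PAGE_HEADER = 24  # fixed part of every page header
ITEM_ID_SIZE = 4          # one line pointer per item on the page
BT_PAGE_OPAQUE_SIZE = 16  # b-tree "special space" at the end of each page


def maxalign(n):
    """Round n up to the next multiple of the alignment."""
    return (n + MAXIMUM_ALIGNOF - 1) // MAXIMUM_ALIGNOF * MAXIMUM_ALIGNOF


def maxalign_down(n):
    """Round n down to a multiple of the alignment."""
    return n // MAXIMUM_ALIGNOF * MAXIMUM_ALIGNOF


# Reserve the header plus 3 line pointers and the special space, then split the
# remaining space three ways so that any page can always hold 3 items.
bt_max_item_size = maxalign_down(
    (BLCKSZ
     - maxalign(SIZE_OF_PAGE_HEADER + 3 * ITEM_ID_SIZE)
     - maxalign(BT_PAGE_OPAQUE_SIZE)) // 3
)
print(bt_max_item_size)  # 2712
```

That is (8192 - 40 - 16) / 3 = 2712 bytes, the same maximum reported in the b-tree error message of the example.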

Example

postgres=# create table t_hash (id int, info text);
CREATE TABLE
postgres=# insert into t_hash select generate_series(1,100), repeat(md5(random()::text),10000);
INSERT 0 100
-- Creating a b-tree index fails, because each index row exceeds 1/3 of an index page
postgres=# create index idx_t_hash_1 on t_hash using btree (info);
ERROR: index row size 3720 exceeds maximum 2712 for index "idx_t_hash_1"
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.
postgres=# create index idx_t_hash_1 on t_hash using hash (info);
CREATE INDEX
-- Disable hash joins so that the planner probes idx_t_hash_1 from a nested loop
postgres=# set enable_hashjoin=off;
SET
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_hash where info in (select info from t_hash limit 1);
                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Nested Loop (cost=0.03..3.07 rows=1 width=22) (actual time=0.859..0.861 rows=1 loops=1)
   Output: t_hash.id, t_hash.info
   Buffers: shared hit=11
   -> HashAggregate (cost=0.03..0.04 rows=1 width=18) (actual time=0.281..0.281 rows=1 loops=1)
         Output: t_hash_1.info
         Group Key: t_hash_1.info
         Buffers: shared hit=3
         -> Limit (cost=0.00..0.02 rows=1 width=18) (actual time=0.012..0.012 rows=1 loops=1)
               Output: t_hash_1.info
               Buffers: shared hit=1
               -> Seq Scan on public.t_hash t_hash_1 (cost=0.00..2.00 rows=100 width=18) (actual time=0.011..0.011 rows=1 loops=1)
                     Output: t_hash_1.info
                     Buffers: shared hit=1
   -> Index Scan using idx_t_hash_1 on public.t_hash (cost=0.00..3.02 rows=1 width=22) (actual time=0.526..0.527 rows=1 loops=1)
         Output: t_hash.id, t_hash.info
         Index Cond: (t_hash.info = t_hash_1.info)
         Buffers: shared hit=6
 Planning time: 0.159 ms
 Execution time: 0.898 ms
(19 rows)

III. gin

Principle

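GIN is an inverted index: it stores the values of the indexed column, or the elements within those values, each paired with a list or tree of row identifiers (TIDs).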

(col_val: (tid_list or tid_tree), col_val_elements: (tid_list or tid_tree))
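A toy inverted index shows the shape described above: each element maps to a posting list of row ids, and containment and overlap queries become intersections and unions of those lists. The sketch uses plain Python dicts and sorted lists (PostgreSQL switches a large posting list to a posting tree, which this model ignores); the function names are hypothetical, and the operators mentioned in the comments are PostgreSQL's array operators `@>` and `&&`.

```python
from collections import defaultdict


def build_gin(rows):
    """Toy inverted index: element -> sorted posting list of row ids.
    rows: dict of row_id -> iterable of elements (e.g. array items or words)."""
    postings = defaultdict(set)
    for rid, elements in rows.items():
        for elem in elements:
            postings[elem].add(rid)
    return {elem: sorted(rids) for elem, rids in postings.items()}


def contains_all(index, query):
    """Rows containing every query element (like col @> ARRAY[...])."""
    if not query:
        raise ValueError("query must contain at least one element")
    result = set(index.get(query[0], []))
    for elem in query[1:]:
        result &= set(index.get(elem, []))  # intersect posting lists
    return sorted(result)


def overlaps(index, query):
    """Rows containing at least one query element (like col && ARRAY[...])."""
    result = set()
    for elem in query:
        result |= set(index.get(elem, []))  # union posting lists
    return sorted(result)


# Hypothetical array column: row_id -> array value
rows = {1: [1, 2, 3], 2: [2, 3, 4], 3: [5], 4: [3, 5]}
gin = build_gin(rows)
print(gin[3])                     # [1, 2, 4]
print(contains_all(gin, [2, 3]))  # [1, 2]
print(overlaps(gin, [4, 5]))      # [2, 3, 4]
```

Because each distinct element is stored once with all of its row ids, the index stays compact even when many rows share the same elements.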

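"PostgreSQL GIN Index Implementation Principles" (《PostgreSQL GIN索引实现原理》)

"A Fine Sword for a Hero: Equality Queries on Any Combination of Columns, Exploring PostgreSQL's Multi-Column Expanded B-Tree (GIN)" (《宝剑赠英雄 – 任意组合字段等效查询, 探探PostgreSQL多列展开式B树 (GIN)》)

Use cases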


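    Original author: 阿里云_云栖社区 (Alibaba Cloud Yunqi Community)
    Original URL: https://www.jianshu.com/p/45046cd6800d
    This article was reposted from the web solely to share knowledge. If it infringes any rights, please contact the blogger to have it removed.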