pg_pathman 的安装与使用简介

2023年7月13日 387次阅读来源: 遥想公瑾当年

pg_pathman是pg社区一款优秀的分区管理插件，提供了分区优化机制。当前该插件只支持postgresql9.5+版本(本人测试版本不能低于9.5.3）。本文简单简单介绍下安装与应用测试。

一 pg_pathman安装

1.1 下载与安装

#下载，从https://github.com/postgrespro/pg_pathman/releases地址，获取release版本，选择一个版本下载。
[root@bogon opt]# wget https://github.com/postgrespro/pg_pathman/archive/1.5.3.tar.gz
# 解压
[root@bogon opt]# tar -zxvf v1.5.3.tar.gz
# 导入pg安装的环境变量
[root@bogon opt]#  source /home/postgres/.bashrc
[root@bogon opt]# cd pg_pathman-1.5.3
[root@bogon pg_pathman-1.5.3]# make USE_PGXS=1
[root@bogon pg_pathman-1.5.3]# make USE_PGXS=1 install
#更改pg的配置文件
[root@bogon pg_pathman]# cd /home/postgres/data
[root@bogon data]# vi postgresql.conf
#将shared_preload_libraries注释取消，将下面变量赋值进去
shared_preload_libraries = 'pg_pathman,pg_stat_statements' 
# esc退出，wq!保存退出！

1.2 创建扩展

#修改配置文件后，重启生效
[root@bogon data]# su - postgres
[postgres@bogon ~]$ pg_ctl restart -D $PGDATA

[postgres@bogon ~]$ psql test
psql (9.6.0)
Type "help" for help.

test=# create extension pg_pathman;
CREATE EXTENSION
# 查看已安装的扩展
test=# \dx
                            List of installed extensions
    Name    | Version |   Schema   |                   Description             
      
------------+---------+------------+-------------------------------------------
------
 pg_pathman | 1.1     | public     | Partitioning tool ver. 1.1
 plpgsql    | 1.0     | pg_catalog | PL/pgSQL procedural language
 uuid-ossp  | 1.1     | public     | generate universally unique identifiers (U
UIDs)
(3 rows)

1.3 插件升级

随着时间推移，新版本插件会不断开放出来，pg_pathman提供简单的升级方案如下：

正常安装新版本的pg_pathman插件
重启pg服务
执行sql版本更新命令

ALTER EXTENSION pg_pathman UPDATE;
SET pg_pathman.enable = t;

二分区管理

目前支持两种分区类型，range与hash分区。

2.1 range分区

2.1.1 函数定义

指定起始值、分区间隔、分区个数：

create_range_partitions(relation       REGCLASS,  -- 主表OID
                        attribute      TEXT,      -- 分区列名或者分区表达式
                        start_value    ANYELEMENT,  -- 开始值
                        p_interval     ANYELEMENT,  -- 间隔；任意类型，适合任意类型的分区表
                        p_count        INTEGER DEFAULT NULL,   --  分多少个区
                        partition_data BOOLEAN DEFAULT TRUE)   --  是否立即将数据从主表迁移到分区, 
--不建议这么使用, 建议使用非堵塞式的迁移( 调用partition_table_concurrently() )

指定起始值、终值、分区间隔：

create_partitions_from_range(relation       REGCLASS,  -- 主表OID
                             attribute      TEXT,      -- 分区列名或者分区表达式
                             start_value    ANYELEMENT,  -- 开始值
                             end_value      ANYELEMENT,  -- 结束值
                             p_interval     INTERVAL,    -- 间隔；interval 类型，用于时间分区表
                             partition_data BOOLEAN DEFAULT TRUE)   --  是否立即将数据从主表迁移到分区
--不建议这么使用, 建议使用非堵塞式的迁移( 调用partition_table_concurrently() )

非阻塞式迁移：

partition_table_concurrently(relation   REGCLASS,              -- 主表OID
                             batch_size INTEGER DEFAULT 1000,  -- 一个事务批量迁移多少记录
                             sleep_time FLOAT8 DEFAULT 1.0)    -- 获得行锁失败时，休眠多久再次获取，重试60次退出任务。

2.1.2 分区案例

建立测试表：

CREATE TABLE journal (
    id      SERIAL,
    dt      TIMESTAMP NOT NULL,
    level   INTEGER,
    msg     TEXT);
CREATE INDEX ON journal(dt);

插入测试数据：

INSERT INTO journal (dt, level, msg) SELECT g, random() * 6, 
md5(g::text) FROM generate_series('2019-01-01'::date, '2019-12-31'::date, '1 minute') as g;

创建分区表：

SELECT create_range_partitions(
        'journal',--主表名
        'dt',   --分区字段
        '2019-01-01'::date, --分区起始日期
        '1 day'::interval, --分区间隔
        null,     --不指定分区数量，根据时间与间隔会自动计算出数量
        false --默认tue立即迁移数据，false是不迁移数据
);

查看数据：

只统计主表数据量（分区，但数据未迁移）
select count(*) from only journal;
 count  
--------
 524161
(1 row)

非堵塞式数据迁移，并查看数据：

select partition_table_concurrently('journal',10000,1.0);
select count(*) from only journal;
 count 
-------
     0
(1 row)
#父表中数据已经为0，迁移全部完毕

#查看子表数据
select * from journal_100 limit 10;
   id   |         dt          | level |               msg                
--------+---------------------+-------+----------------------------------
 142561 | 2019-04-10 00:00:00 |     6 | 9abfac5750d9bdbe393f20fafdef1910
 142562 | 2019-04-10 00:01:00 |     2 | a6d4a432988bfe2479ba015080b78371
 142563 | 2019-04-10 00:02:00 |     1 | 1cbaf78ef1bb808b4b1e5c97ed8ab90f
 142564 | 2019-04-10 00:03:00 |     4 | 3a6b114163ccca5cb51f98d56727ebb1
 142565 | 2019-04-10 00:04:00 |     3 | b95b04fb7baa5be92031ce3d077a7ade
 142566 | 2019-04-10 00:05:00 |     5 | 58140210fe51c37de4b32a1a200c9338
 142567 | 2019-04-10 00:06:00 |     2 | 24d43e90c1cac164816f0fd9c35675a7
 142568 | 2019-04-10 00:07:00 |     3 | 889f1edb26556ddafcda315e1f3dea92
 142569 | 2019-04-10 00:08:00 |     4 | 78e37d1aa41262492856bd560a4df9f2
 142570 | 2019-04-10 00:09:00 |     3 | d354a5c8bac6224fa0b505316a814e3d
(10 rows)

查看分区表执行计划：

 explain select * from journal where dt between '2019-03-29 06:00:00' and '2019-03-29 10:00:00' ;
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..11.61 rows=242 width=49)
   ->  Seq Scan on journal  (cost=0.00..0.00 rows=1 width=49)
         Filter: ((dt >= '2019-03-29 06:00:00'::timestamp without time zone) AND (dt <= '2019-03-29 10:00:00'::timestamp without time zone))
   ->  Index Scan using journal_88_dt_idx on journal_88  (cost=0.28..10.40 rows=241 width=49)
         Index Cond: ((dt >= '2019-03-29 06:00:00'::timestamp without time zone) AND (dt <= '2019-03-29 10:00:00'::timestamp without time zone))
(5 rows)

注意:

分区列必须有not null约束
分区个数必须能覆盖已有的所有记录

2.2 hash分区

建立测试表

CREATE TABLE items (
    id       SERIAL PRIMARY KEY,
    name     TEXT,
    code     BIGINT);

插入测试数据

INSERT INTO items (id, name, code)
SELECT g, md5(g::text), random() * 100000
FROM generate_series(1, 100000) as g;

分区并迁移

SELECT create_hash_partitions('items', 'id', 100);

查询

 SELECT * FROM items WHERE id = 1234;
  id  |               name               | code  
------+----------------------------------+-------
 1234 | 81dc9bdb52d04dc20036dbd8313ed055 | 87938
(1 row)

EXPLAIN SELECT * FROM items WHERE id = 1234;
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Append  (cost=0.28..2.50 rows=1 width=44)
   ->  Index Scan using items_11_pkey on items_11  (cost=0.28..2.50 rows=1 width=44)
         Index Cond: (id = 1234)
(3 rows)

    原文作者：遥想公瑾当年
    原文地址: https://www.jianshu.com/p/7a8737250456
    本文转自网络文章，转载此文章仅为分享知识，如有侵权，请联系博主进行删除。