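Execution Plan Path Selection
During query planning, PostgreSQL represents each candidate way of executing a query as a path. After generating the paths that satisfy the query, the planner picks the one with the lowest estimated cost, converts it into a plan, and passes that plan to the executor. The planner's core job is therefore to generate multiple paths and then find the best one among them.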
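Cost Estimation
Paths are compared by their estimated cost, which the planner derives from the statistics stored in the system catalog pg_statistic. PostgreSQL estimates the cost of a plan by estimating the cost of each node in it from those statistics. To gather them, PostgreSQL analyzes each table and takes a statistical sample; this is usually done by the autovacuum daemon, which periodically runs ANALYZE and saves the results in pg_statistic and pg_class.
Parameters used for cost estimation (postgresql.conf)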
# - Planner Cost Constants -
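#seq_page_cost = 1.0            # measured on an arbitrary scale; cost of one page read during a sequential disk scan
#random_page_cost = 4.0         # same scale as above; cost of one page read by random disk access
#cpu_tuple_cost = 0.01          # same scale as above; CPU cost of processing each row
#cpu_index_tuple_cost = 0.005   # same scale as above; CPU cost of processing each index entry
#cpu_operator_cost = 0.0025     # same scale as above; CPU cost of each operator or function call
#parallel_tuple_cost = 0.1      # same scale as above; used to cost parallel processing: if the parallel plan costs more than the non-parallel one, parallelism is not used
#parallel_setup_cost = 1000.0   # same scale as above
#min_parallel_relation_size = 8MB
#effective_cache_size = 4GB     # effective size of the file-system kernel cache available during a single index scan

These values can also be viewed with SHOW ALL.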
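Because all of these constants share one arbitrary scale, only their ratios influence the planner's choices. The following is a minimal sketch that restates the defaults listed above as ratios; the names `PLANNER_DEFAULTS` and `relative_cost` are illustrative and are not part of PostgreSQL.

```python
# Default planner cost constants as listed in postgresql.conf above.
# PLANNER_DEFAULTS is a hypothetical name used only for this illustration.
PLANNER_DEFAULTS = {
    "seq_page_cost": 1.0,
    "random_page_cost": 4.0,
    "cpu_tuple_cost": 0.01,
    "cpu_index_tuple_cost": 0.005,
    "cpu_operator_cost": 0.0025,
}


def relative_cost(name, baseline="seq_page_cost"):
    """Express one constant as a multiple of another (default: one sequential page read)."""
    return PLANNER_DEFAULTS[name] / PLANNER_DEFAULTS[baseline]


if __name__ == "__main__":
    for name in PLANNER_DEFAULTS:
        print(f"{name}: {relative_cost(name):g} x seq_page_cost")
```

With the defaults, a random page read is modeled as four times as expensive as a sequential one, and one sequential page read costs as much as the CPU work for about 100 rows.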
Path Selection
-- View the table definition
db_jcxxglpt=# \d t_jcxxgl_tjaj
                Table "db_jcxx.t_jcxxgl_tjaj"
    Column    |              Type              | Modifiers
--------------+--------------------------------+-----------
 c_bh         | character(32)                  | not null
 c_xzdm       | character varying(300)         |
 c_jgid       | character(32)                  |
 c_ajbm       | character(22)                  |
 ...
Indexes:
    "t_jcxxgl_tjaj_pkey" PRIMARY KEY, btree (c_bh)
    "idx_ttjaj_cah" btree (c_ah)
    "idx_ttjaj_dslrq" btree (d_slrq)

First refresh the statistics with vacuum analyze t_jcxxgl_tjaj; inaccurate statistics are a frequent cause of abnormal execution plans.

-- Execution plan: full table scan
db_jcxxglpt=# explain (analyze,verbose,costs,buffers,timing)select c_bh,c_xzdm,c_jgid,c_ajbm from t_jcxxgl_tjaj where d_slrq >='2018-03-18';
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on db_jcxx.t_jcxxgl_tjaj  (cost=0.00..9.76 rows=3 width=96) (actual time=1.031..1.055 rows=3 loops=1)
   Output: c_bh, c_xzdm, c_jgid, c_ajbm
   Filter: (t_jcxxgl_tjaj.d_slrq >= '2018-03-18'::date)
   Rows Removed by Filter: 138
   Buffers: shared hit=8
 Planning time: 6.579 ms
 Execution time: 1.163 ms
(7 rows)

-- Execution plan: sequential scan disabled
db_jcxxglpt=# set session enable_seqscan = off;
SET
db_jcxxglpt=# explain (analyze,verbose,costs,buffers,timing)select c_bh,c_xzdm,c_jgid,c_ajbm from t_jcxxgl_tjaj where d_slrq >='2018-03-18';
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Index Scan using idx_ttjaj_dslrq on db_jcxx.t_jcxxgl_tjaj  (cost=0.14..13.90 rows=3 width=96) (actual time=0.012..0.026 rows=3 loops=1)
   Output: c_bh, c_xzdm, c_jgid, c_ajbm
   Index Cond: (t_jcxxgl_tjaj.d_slrq >= '2018-03-18'::date)
   Buffers: shared hit=4
 Planning time: 0.309 ms
 Execution time: 0.063 ms
(6 rows)

d_slrq has a btree index, yet the default execution plan does not use it. Why?
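Cost calculation:
The estimate for a path consists of three parts: the startup cost, the total cost, and the sort order of its result (pathkeys).
Cost formula: total cost = startup cost + I/O cost + CPU cost (cost = S + P + W*T)
  S: the startup cost
  P: the number of pages accessed during execution, reflecting the number of disk I/Os
  T: the number of tuples accessed during execution, reflecting the CPU overhead
  W: the weighting factor between disk I/O cost and CPU cost

Statistics: part of the statistics is the total number of entries in each table and index, together with the number of disk blocks each one occupies. This information is stored in the reltuples and relpages columns of pg_class and can be queried as follows:

-- View the statistics
db_jcxxglpt=# select relpages,reltuples from pg_class where relname ='t_jcxxgl_tjaj';
 relpages | reltuples
----------+-----------
        8 |       141
(1 row)

total_cost = 1 (seq_page_cost) * 8 (total disk pages) + 0.01 (cpu_tuple_cost) * 141 (total rows in the table) + 0.0025 (cpu_operator_cost) * 141 (total rows in the table) = 9.7625

The index scan cost of 13.90 is higher than the full table scan cost of 9.76. On a small table a full table scan is more efficient than an index scan, because an index scan needs at least two I/Os: one to read the index block and one to read the data block.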
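The arithmetic behind the 9.76 figure can be checked term by term. This is a small sketch using only the numbers quoted above (relpages = 8, reltuples = 141 and the default cost constants); the variable names are illustrative.

```python
# Figures taken from pg_class and the default planner constants quoted above.
relpages = 8           # pages in t_jcxxgl_tjaj
reltuples = 141        # rows in t_jcxxgl_tjaj
seq_page_cost = 1.0
cpu_tuple_cost = 0.01
cpu_operator_cost = 0.0025

io_cost = seq_page_cost * relpages              # reading every page sequentially
row_cost = cpu_tuple_cost * reltuples           # CPU work to process each row
filter_cost = cpu_operator_cost * reltuples     # one ">=" comparison per row

total_cost = io_cost + row_cost + filter_cost

print(f"I/O     : {io_cost:.4f}")
print(f"rows    : {row_cost:.4f}")
print(f"filter  : {filter_cost:.4f}")
print(f"total   : {total_cost:.2f}")   # matches the cost=0.00..9.76 shown by EXPLAIN
```

The page reads account for 8 of the 9.76 units, so on a table this small the I/O term dominates and any plan that touches extra index pages struggles to be cheaper.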
Sequential scan costing source code (cost_seqscan)
/*
 * cost_seqscan
 *    Determines and returns the cost of scanning a relation sequentially.
 *
 * 'baserel' is the relation to be scanned
 * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
 */
void
cost_seqscan(Path *path, PlannerInfo *root,
             RelOptInfo *baserel, ParamPathInfo *param_info)
{
    Cost        startup_cost = 0;
    Cost        cpu_run_cost;
    Cost        disk_run_cost;
    double      spc_seq_page_cost;
    QualCost    qpqual_cost;
    Cost        cpu_per_tuple;

    /* Should only be applied to base relations */
    Assert(baserel->relid > 0);
    Assert(baserel->rtekind == RTE_RELATION);

    /* Mark the path with the correct row estimate */
    if (param_info)
        path->rows = param_info->ppi_rows;
    else
        path->rows = baserel->rows;

    if (!enable_seqscan)
        startup_cost += disable_cost;

    /* fetch estimated page cost for tablespace containing table */
    get_tablespace_page_costs(baserel->reltablespace,
                              NULL,
                              &spc_seq_page_cost);

    /*
     * disk costs
     */
    disk_run_cost = spc_seq_page_cost * baserel->pages;

    /* CPU costs */
    get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);

    startup_cost += qpqual_cost.startup;
    cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
    cpu_run_cost = cpu_per_tuple * baserel->tuples;
    /* tlist eval costs are paid per output row, not per tuple scanned */
    startup_cost += path->pathtarget->cost.startup;
    cpu_run_cost += path->pathtarget->cost.per_tuple * path->rows;

    /* Adjust costing for parallelism, if used. */
    if (path->parallel_workers > 0)
    {
        double      parallel_divisor = get_parallel_divisor(path);

        /* The CPU cost is divided among all the workers. */
        cpu_run_cost /= parallel_divisor;

        /*
         * It may be possible to amortize some of the I/O cost, but probably
         * not very much, because most operating systems already do aggressive
         * prefetching.  For now, we assume that the disk run cost can't be
         * amortized at all.
         */

        /*
         * In the case of a parallel plan, the row count needs to represent
         * the number of tuples processed per worker.
         */
        path->rows = clamp_row_est(path->rows / parallel_divisor);
    }

    path->startup_cost = startup_cost;
    path->total_cost = startup_cost + cpu_run_cost + disk_run_cost;
}
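The listing adds disable_cost to the startup cost when enable_seqscan is off, which explains why the earlier session switched to the index scan after set session enable_seqscan = off. Below is a simplified Python model of that logic, offered as a sketch only. It ignores tablespace-specific page costs, target-list costs and parallelism. The value 1.0e10 for disable_cost is the one defined in costsize.c for PostgreSQL versions that use this listing; the function name `seqscan_cost` and the assumption of exactly one operator evaluation per row are mine.

```python
# Simplified model of cost_seqscan; names and defaults here are illustrative.
DISABLE_COST = 1.0e10  # disable_cost as defined in costsize.c for this code


def seqscan_cost(pages, tuples, enable_seqscan=True, qual_ops_per_tuple=1,
                 seq_page_cost=1.0, cpu_tuple_cost=0.01,
                 cpu_operator_cost=0.0025):
    """Return (startup_cost, total_cost) for a non-parallel sequential scan."""
    # A disabled scan is not forbidden, only penalized with a huge startup cost.
    startup_cost = 0.0 if enable_seqscan else DISABLE_COST

    # Disk cost: every page of the relation is read once.
    disk_run_cost = seq_page_cost * pages

    # CPU cost: per-row overhead plus the filter's operator evaluations.
    cpu_per_tuple = cpu_tuple_cost + qual_ops_per_tuple * cpu_operator_cost
    cpu_run_cost = cpu_per_tuple * tuples

    return startup_cost, startup_cost + cpu_run_cost + disk_run_cost


if __name__ == "__main__":
    index_scan_total = 13.90  # total cost reported by EXPLAIN for the index scan

    for enabled in (True, False):
        _, total = seqscan_cost(pages=8, tuples=141, enable_seqscan=enabled)
        winner = "seq scan" if total < index_scan_total else "index scan"
        print(f"enable_seqscan={enabled}: seq scan total={total:.2f} -> {winner}")
```

With the scan enabled the model yields about 9.76 and the sequential scan wins; with it disabled the penalty pushes the total past 1e10, so the index scan at 13.90 becomes the cheapest path.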
A SQL Optimization Example
Slow SQL:
select c_ajbh, c_ah, c_cbfy, c_cbrxm, d_larq, d_jarq, n_dbjg, c_yqly
from db_zxzhld.t_zhld_db dbxx
join db_zxzhld.t_zhld_ajdbxx dbaj on dbxx.c_bh = dbaj.c_dbbh
where dbxx.n_valid=1 and dbxx.n_state in (1,2,3) and dbxx.c_dbztbh='1003'
  and dbaj.c_zblx='1003' and dbaj.c_dbfy='0' and dbaj.c_gy = '2550'
  and c_ajbh in (select distinct c_ajbh from db_zxzhld.t_zhld_zbajxx where n_dbzt = 1 and c_zblx = '1003' and c_gy = '2550' )
order by d_larq asc, c_ajbh asc
limit 15 offset 0

Slow SQL elapsed time: 7s

Let's first go over what this SQL does: it joins dbxx with dbaj, requires dbaj.c_ajbh to be present in the zbaj table, sorts the result, and takes 15 rows. That is roughly all.
One shortcoming of the SQL is that it does not show which table each selected column comes from; qualifying columns as alias.column is recommended.

Row counts of the tables used by the SQL:
db_zxzhld.t_zhld_db     : 1311
db_zxzhld.t_zhld_ajdbxx : 341296
db_zxzhld.t_zhld_zbajxx : 1027619

Execution plan:
01 Limit (cost=36328.67..36328.68 rows=1 width=107) (actual time=88957.677..88957.729 rows=15 loops=1)
02   -> Sort (cost=36328.67..36328.68 rows=1 width=107) (actual time=88957.653..88957.672 rows=15 loops=1)
03        Sort Key: dbaj.d_larq, dbaj.c_ajbh
04        Sort Method: top-N heapsort Memory: 27kB
05        -> Nested Loop Semi Join (cost=17099.76..36328.66 rows=1 width=107) (actual time=277.794..88932.662 rows=8605 loops=1)
06             Join Filter: ((dbaj.c_ajbh)::text = (t_zhld_zbajxx.c_ajbh)::text)
07             Rows Removed by Join Filter: 37018710
08             -> Nested Loop (cost=0.00..19200.59 rows=1 width=107) (actual time=199.141..601.845 rows=8605 loops=1)
09                  Join Filter: (dbxx.c_bh = dbaj.c_dbbh)
10                  Rows Removed by Join Filter: 111865
11                  -> Seq Scan on t_zhld_ajdbxx dbaj (cost=0.00..19117.70 rows=219 width=140) (actual time=198.871..266.182 rows=8605 loops=1)
12                       Filter: ((n_valid = 1) AND ((c_zblx)::text = '1003'::text) AND ((c_dbfy)::text = '0'::text) AND ((c_gy)::text = '2550'::text))
13                       Rows Removed by Filter: 332691
14                  -> Materialize (cost=0.00..66.48 rows=5 width=33) (actual time=0.001..0.017 rows=14 loops=8605)
15                       -> Seq Scan on t_zhld_db dbxx (cost=0.00..66.45 rows=5 width=33) (actual time=0.044..0.722 rows=14 loops=1)
16                            Filter: ((n_valid = 1) AND ((c_dbztbh)::text = '1003'::text) AND (n_state = ANY ('{1,2,3}'::integer[])))
17                            Rows Removed by Filter: 1297
18             -> Materialize (cost=17099.76..17117.46 rows=708 width=32) (actual time=0.006..4.890 rows=4303 loops=8605)
19                  -> HashAggregate (cost=17099.76..17106.84 rows=708 width=32) (actual time=44.011..54.924 rows=8605 loops=1)
20                       Group Key: t_zhld_zbajxx.c_ajbh
21                       -> Bitmap Heap Scan on t_zhld_zbajxx (cost=163.36..17097.99 rows=708 width=32) (actual time=5.218..30.278 rows=8605 loops=1)
22                            Recheck Cond: ((n_dbzt = 1) AND ((c_zblx)::text = '1003'::text))
23                            Filter: ((c_gy)::text = '2550'::text)
24                            Rows Removed by Filter: 21849
25                            Heap Blocks: exact=960
26                            -> Bitmap Index Scan on i_tzhldzbajxx_zblx_dbzt (cost=0.00..163.19 rows=5876 width=0) (actual time=5.011..5.011 rows=30458 loops=1)
27                                 Index Cond: ((n_dbzt = 1) AND ((c_zblx)::text = '1003'::text))
28 Planning time: 1.258 ms
29 Execution time: 88958.029 ms

Reading the execution plan:
1: Lines 27 -> 21: table t_zhld_zbajxx is filtered through index i_tzhldzbajxx_zblx_dbzt, then filtered by the condition (c_gy)::text = '2550'::text, finally returning 8605 rows
2: Lines 17 -> 15: table t_zhld_db is filtered by its conditions, finally returning 14 rows
3: Lines 20 ->