Database - Postgres ORDER BY using an index incorrectly

August 19, 2023

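There are already questions about Postgres not using an index for ORDER BY, but my case is the opposite: it uses the index where it shouldn't.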

Sort without the index, a hot run after the results were cached. It takes 8.48 seconds:

explain (analyze,buffers) select * from users order by userid limit 100000;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=246372.98..246622.98 rows=100000 width=72) (actual time=8451.119..8479.138 rows=100000 loops=1)
   Buffers: shared hit=16134 read=35121
   ->  Sort  (cost=246372.98..251348.03 rows=1990021 width=72) (actual time=8451.117..8467.403 rows=100000 loops=1)
         Sort Key: userid
         Sort Method: top-N heapsort  Memory: 20207kB
         Buffers: shared hit=16134 read=35121
         ->  Seq Scan on users  (cost=0.00..71155.21 rows=1990021 width=72) (actual time=25.448..7782.830 rows=1995958 loops=1)
               Buffers: shared hit=16134 read=35121
 Planning time: 40.542 ms
 Execution time: 8487.556 ms
(10 rows)

Sort using the index on the userid column. It does more disk I/O and takes as long as 6.2 minutes:

explain (analyze,buffers) select * from users order by userid limit 100000;
                                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..12771.83 rows=100000 width=72) (actual time=35.498..372437.748 rows=100000 loops=1)
   Buffers: shared hit=60846 read=39425
   ->  Index Scan using users_userid_idx on users  (cost=0.43..255288.96 rows=1998907 width=72) (actual time=35.496..372372.192 rows=100000 loops=1)
         Buffers: shared hit=60846 read=39425
 Planning time: 0.160 ms
 Execution time: 372476.536 ms
(6 rows)
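
As an aside on why the planner picked the index scan here: as far as I understand PostgreSQL's cost model, the cost of a Limit node is interpolated linearly from its child's startup and total cost, according to the fraction of the child's estimated rows that the LIMIT needs. The sketch below reproduces the two Limit costs shown in the plans above from the child-node numbers. The function name `limit_cost` is made up for illustration, and the formula is a simplified reading of the planner, not a copy of its code.

```python
def limit_cost(startup, total, est_rows, limit_rows):
    """Interpolate a child node's cost for the fraction of rows a LIMIT fetches.

    Simplified model (an assumption): the child pays its full startup cost,
    then the run cost scales with the share of estimated rows actually needed.
    """
    fraction = min(1.0, limit_rows / est_rows)
    return startup + (total - startup) * fraction


# Child-node estimates copied from the two plans above.
index_limit = limit_cost(0.43, 255288.96, 1998907, 100000)        # Index Scan
sort_limit = limit_cost(246372.98, 251348.03, 1990021, 100000)    # Sort

print(f"index scan path: {index_limit:.2f}")
print(f"sort path:       {sort_limit:.2f}")
print(f"ratio:           {sort_limit / index_limit:.1f}x")
```

Both values land within a cent of the Limit costs in the plans (12771.83 and 246622.98). The index path is estimated to be roughly 19 times cheaper, mostly because the Sort has to consume the whole table before it can return its first row. These costs are estimates in abstract units and do not reflect what happens to be cached at run time, so a large gap between estimated cost and actual time is possible without the row estimates being wrong.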

A few things to note:

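>I ran VACUUM ANALYZE before running both queries.
>Both are hot runs, i.e. I took the numbers after running each query 3-4 times.
>There is enough work_mem, so the sort uses a top-N heapsort. The point is that the sort without the index is faster.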

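My question is not about improving the ORDER BY, but about understanding why the planner misestimates. At the time of writing, I ran these queries on Postgres 9.4 on my Mac (OS X). I don't have another machine with a different operating system to test on right now, though I may soon.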

Can anyone confirm whether this is a planner bug, or whether something is wrong with my machine?

Best answer

I am quite dismayed by what actually happened. I took the following steps:

>Restarted my Mac
>Changed shared_buffers to 256 MB (previously 128 MB)
>Restarted Postgres

After these steps, here are the new stats.

explain (analyze,buffers) select * from users order by userid limit 100000;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..12788.49 rows=100000 width=72) (actual time=0.031..78.785 rows=100000 loops=1)
   Buffers: shared hit=100271
   ->  Index Scan using users_userid_idx on users  (cost=0.43..255244.73 rows=1995958 width=72) (actual time=0.030..65.937 rows=100000 loops=1)
         Buffers: shared hit=100271
 Planning time: 0.119 ms
 Execution time: 84.985 ms
(6 rows)

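The only change in the plan is that there is no disk I/O, since everything is cached, probably because of the larger shared_buffers. Even so, the size of the change in actual time defies explanation.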
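
To put the shared_buffers sizes in perspective, the buffer counts from the sequential scan can be turned into a table size. This is a rough sketch that assumes the default 8 kB block size (the post does not state it); the helper name `mb_to_pages` is made up for illustration.

```python
BLOCK_SIZE = 8192  # bytes; PostgreSQL's default block size, assumed here


def mb_to_pages(megabytes):
    """Number of BLOCK_SIZE pages that fit in the given number of megabytes."""
    return megabytes * 1024 * 1024 // BLOCK_SIZE


# The Seq Scan touches every page of the table once: hit + read from its plan.
table_pages = 16134 + 35121
table_mb = table_pages * BLOCK_SIZE / (1024 * 1024)
print(f"table: {table_pages} pages, about {table_mb:.0f} MB")

for size_mb in (128, 256):
    pages = mb_to_pages(size_mb)
    fits = pages >= table_pages
    print(f"shared_buffers={size_mb} MB: {pages} pages, whole table fits: {fits}")
```

Under that assumption the table is about 400 MB, so it fits in neither setting, which matches the sequential scan still reporting reads in every run. The index scan only needs the pages that hold the first 100000 rows, and with 256 MB those were evidently all in shared buffers (shared hit=100271, no reads). Note that the hit and read figures count buffer accesses, not distinct pages.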

The plain top-N heapsort without the index also improved:

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=246955.09..247205.09 rows=100000 width=72) (actual time=707.350..734.954 rows=100000 loops=1)
   Buffers: shared hit=26071 read=25184
   ->  Sort  (cost=246955.09..251944.99 rows=1995958 width=72) (actual time=707.348..723.127 rows=100000 loops=1)
         Sort Key: userid
         Sort Method: top-N heapsort  Memory: 20207kB
         Buffers: shared hit=26071 read=25184
         ->  Seq Scan on users  (cost=0.00..71214.58 rows=1995958 width=72) (actual time=9.922..270.684 rows=1995958 loops=1)
               Buffers: shared hit=26071 read=25184
 Planning time: 0.090 ms
 Execution time: 743.788 ms
(10 rows)

With shared_buffers changed back to 128 MB, the results are still good:

explain (analyze,buffers) select * from users order by userid limit 100000;
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..12788.49 rows=100000 width=72) (actual time=0.098..232.314 rows=100000 loops=1)
   Buffers: shared hit=61313 read=38958
   ->  Index Scan using users_userid_idx on users  (cost=0.43..255244.73 rows=1995958 width=72) (actual time=0.096..218.272 rows=100000 loops=1)
         Buffers: shared hit=61313 read=38958
 Planning time: 0.131 ms
 Execution time: 238.861 ms
(6 rows)


explain (analyze,buffers) select * from users order by userid limit 100000;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=246955.09..247205.09 rows=100000 width=72) (actual time=722.003..749.696 rows=100000 loops=1)
   Buffers: shared hit=16192 read=35063
   ->  Sort  (cost=246955.09..251944.99 rows=1995958 width=72) (actual time=722.001..737.715 rows=100000 loops=1)
         Sort Key: userid
         Sort Method: top-N heapsort  Memory: 20207kB
         Buffers: shared hit=16192 read=35063
         ->  Seq Scan on users  (cost=0.00..71214.58 rows=1995958 width=72) (actual time=8.584..294.605 rows=1995958 loops=1)
               Buffers: shared hit=16192 read=35063
 Planning time: 0.070 ms
 Execution time: 757.495 ms
(10 rows)

I have heard people say not to take timing results on a Mac or desktop machine, but this is completely crazy.
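
One way to compare the runs above is to divide each scan node's actual time by the number of pages it reported as read. The result is only an upper bound on per-page read latency, because the node time also includes CPU work and buffer hits. The numbers are copied from the plans. The run with 256 MB is left out because its index scan reported no reads.

```python
# (label, pages reported as "read", scan node actual end time in ms)
runs = [
    ("index scan, before restart", 39425, 372372.192),
    ("seq scan, before restart", 35121, 7782.830),
    ("index scan, after restart", 38958, 218.272),
    ("seq scan, after restart", 35063, 294.605),
]

# Upper bound on time per page read: total node time / pages read.
ms_per_read = {label: elapsed_ms / pages for label, pages, elapsed_ms in runs}

for label, value in ms_per_read.items():
    print(f"{label}: {value:.4f} ms per page read")
```

Before the restart the index scan spent on the order of 9 ms per page read, against about 0.2 ms for the sequential scan. After the restart both are below 0.01 ms, even though the number of reads barely changed. In EXPLAIN (BUFFERS) output, "read" only means the page was not found in shared buffers. It may still be served from the operating system's file cache, and sub-0.01 ms reads would be consistent with that. So the plans themselves are the same before and after, and the difference looks like one of I/O behavior on the machine rather than a planner change, though the post does not contain enough information to say what was slowing the reads down.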