postgresql - Performance does not improve even after increasing work_mem size

Tags: postgresql query-performance

I am running a query with an average execution time of 170 seconds. I went through the PostgreSQL documentation, which mentions that increasing work_mem should improve performance. I increased work_mem to 1000 MB, yet performance did not improve.
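For reference, work_mem can be raised per session like this (the question does not show how the change was made; a server-wide change via postgresql.conf or ALTER SYSTEM would also work):

SET work_mem = '1000MB';   -- session-level change
SHOW work_mem;             -- verify the new value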

Note: I have indexed every field that appears in the query.

Below I have pasted the number of records in the table, the query, its result, and the execution plans.

  • Number of records in the database:
event_logs=> select count(*) from events;
  count   
----------
 18706734
(1 row)
  • Query:
select raw->'request_payload'->'source'->0 as file, 
       count(raw->'request_payload'->>'status') as count, 
       raw->'request_payload'->>'status' as status 
from events 
where client = 'NTT' 
  and to_char(datetime, 'YYYY-MM-DD') = '2019-10-31' 
  and event_name = 'wbs_indexing' 
group by raw->'request_payload'->'source'->0, 
         raw->'request_payload'->>'status';
  • Result:
    file    | count  | status
------------+--------+---------
 "xyz.csv"  |  91878 | failure
 "abc.csv"  |  91816 | failure
 "efg.csv"  | 398196 | failure
(3 rows)

  • Execution plan with the default work_mem (4 MB):
event_logs=> SHOW work_mem;
 work_mem 
----------
 4MB
(1 row)

event_logs=> explain analyze select raw->'request_payload'->'source'->0 as file, count(raw->'request_payload'->>'status') as count,  raw->'request_payload'->>'status' as status from events where to_char(datetime, 'YYYY-MM-DD') = '2019-10-31' and client = 'NTT'  and event_name = 'wbs_indexing' group by raw->'request_payload'->'source'->0, raw->'request_payload'->>'status';
                                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=3256017.54..3267087.56 rows=78474 width=72) (actual time=172547.598..172965.581 rows=3 loops=1)
   Group Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
   ->  Gather Merge  (cost=3256017.54..3264829.34 rows=65674 width=72) (actual time=172295.204..172965.630 rows=9 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial GroupAggregate  (cost=3255017.52..3256248.91 rows=32837 width=72) (actual time=172258.342..172737.534 rows=3 loops=3)
               Group Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
               ->  Sort  (cost=3255017.52..3255099.61 rows=32837 width=533) (actual time=171794.584..172639.670 rows=193963 loops=3)
                     Sort Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
                     Sort Method: external merge  Disk: 131856kB
                     ->  Parallel Seq Scan on events  (cost=0.00..3244696.75 rows=32837 width=533) (actual time=98846.155..169311.063 rows=193963 loops=3)
                           Filter: ((client = 'NTT'::text) AND (event_name = 'wbs_indexing'::text) AND (to_char(datetime, 'YYYY-MM-DD'::text) = '2019-10-31'::text))
                           Rows Removed by Filter: 6041677
 Planning time: 0.953 ms
 Execution time: 172983.273 ms
(15 rows)

  • Execution plan with the increased work_mem (1000 MB):
event_logs=> SHOW work_mem;
 work_mem 
----------
 1000MB
(1 row)

event_logs=> explain analyze select raw->'request_payload'->'source'->0 as file, count(raw->'request_payload'->>'status') as count,  raw->'request_payload'->>'status' as status from events where to_char(datetime, 'YYYY-MM-DD') = '2019-10-31' and client = 'NTT'  and event_name = 'wbs_indexing' group by raw->'request_payload'->'source'->0, raw->'request_payload'->>'status';
                                                                            QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=3248160.04..3259230.06 rows=78474 width=72) (actual time=167979.419..168189.228 rows=3 loops=1)
   Group Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
   ->  Gather Merge  (cost=3248160.04..3256971.84 rows=65674 width=72) (actual time=167949.951..168189.282 rows=9 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial GroupAggregate  (cost=3247160.02..3248391.41 rows=32837 width=72) (actual time=167945.607..168083.707 rows=3 loops=3)
               Group Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
               ->  Sort  (cost=3247160.02..3247242.11 rows=32837 width=533) (actual time=167917.891..167975.549 rows=193963 loops=3)
                     Sort Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
                     Sort Method: quicksort  Memory: 191822kB
                     ->  Parallel Seq Scan on events  (cost=0.00..3244696.75 rows=32837 width=533) (actual time=98849.936..167570.669 rows=193963 loops=3)
                           Filter: ((client = 'NTT'::text) AND (event_name = 'wbs_indexing'::text) AND (to_char(datetime, 'YYYY-MM-DD'::text) = '2019-10-31'::text))
                           Rows Removed by Filter: 6041677
 Planning time: 0.238 ms
 Execution time: 168199.046 ms
(15 rows)

  • Can anyone help me improve the performance of this query?

Best Answer

Increasing work_mem did seem to make the sort about 8 times faster: (172639.670 - 169311.063) / (167975.549 - 167570.669) ≈ 3329 ms / 405 ms ≈ 8.2. But since the sort only took up a small fraction of the overall execution time, making it even 1000 times faster cannot make things much better overall. It is the seq scan that is taking up the time.

Much of the time in the seq scan is probably spent on IO. You can check by turning on track_io_timing and then running EXPLAIN (ANALYZE, BUFFERS).
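For example, a sketch using the query from the question (track_io_timing is a superuser-settable parameter, and the SET shown here only affects the current session):

SET track_io_timing = on;

EXPLAIN (ANALYZE, BUFFERS)
select raw->'request_payload'->'source'->0 as file,
       count(raw->'request_payload'->>'status') as count,
       raw->'request_payload'->>'status' as status
from events
where client = 'NTT'
  and to_char(datetime, 'YYYY-MM-DD') = '2019-10-31'
  and event_name = 'wbs_indexing'
group by raw->'request_payload'->'source'->0,
         raw->'request_payload'->>'status';

Each node in the resulting plan then reports shared buffer hit/read counts and, with track_io_timing on, I/O read times, which shows how much of the seq scan is actually waiting on disk.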

Also, parallelizing a seq scan is often not very helpful, because the IO system is usually able to deliver its full capacity to a single reader, thanks to the magic of readahead. Sometimes parallel readers even step on each other's toes, making overall performance worse. You can disable parallelization with set max_parallel_workers_per_gather TO 0; that might make things faster, and if it doesn't, it will at least make the EXPLAIN plan easier to understand.
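A minimal sketch of that experiment:

SET max_parallel_workers_per_gather TO 0;  -- session-level; the planner stops using parallel workers

-- re-run the EXPLAIN (ANALYZE, BUFFERS) from above and compare timings

RESET max_parallel_workers_per_gather;     -- restore the server default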

You are fetching over 3% of the table: 193963 / (193963 + 6041677) ≈ 3.1%. Indexes might not be very helpful when you are fetching that large a share of it. If they are to be, you would want a combined index, not individual ones. So you would want an index on (client, event_name, date(datetime)). You would then also need to change the query to use date(datetime) rather than to_char(datetime, 'YYYY-MM-DD'). This change is needed because to_char is not immutable, so it cannot be used in an index expression.
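A sketch of the suggested index and the rewritten query (the index name is made up here; this also assumes datetime is a timestamp without time zone, since an expression index needs an immutable expression and casting timestamptz to date is not immutable):

CREATE INDEX events_client_event_date_idx
    ON events (client, event_name, date(datetime));

select raw->'request_payload'->'source'->0 as file,
       count(raw->'request_payload'->>'status') as count,
       raw->'request_payload'->>'status' as status
from events
where client = 'NTT'
  and date(datetime) = '2019-10-31'
  and event_name = 'wbs_indexing'
group by raw->'request_payload'->'source'->0,
         raw->'request_payload'->>'status';

The WHERE clause has to use exactly the same expression, date(datetime), for the planner to match it against the index.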

Regarding postgresql - performance does not improve even after increasing work_mem size, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58735307/
