PostgreSQL HashAggregate 性能不足

我在一个大型外部(列存储) 表上有这个非常简单的查询:

select distinct "Business Unit", "Application", "Application Suite", "Account Name" 
    from __tmp_l1_11259 
    where "Date" between '2017-10-01' and '2017-10-31';

因此，我使用提示 哈希组 在 SQL Server 中启动此查询，它在不到两秒内执行。 PostgreSQL 有两种方式:它使用 HashAggregate 或通过对行进行排序和执行 Unique 来对行进行分组。这是计划:

                                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=2169041.54..2169041.55 rows=1 width=128) (actual time=11671.983..11672.065 rows=407 loops=1)
  Group Key: "Business Unit", "Application", "Application Suite", "Account Name"
  ->  Foreign Scan on tmp_l1_  (cost=0.00..2164695.79 rows=434575 width=128) (actual time=6.576..4830.866 rows=14237546 loops=1)
        Filter: (("Date" >= '2017-10-01 00:00:00'::timestamp without time zone) AND ("Date" <= '2017-10-31 00:00:00'::timestamp without time zone))
        Rows Removed by Filter: 8702454
        CStore File: /datadrive/postgresql/cstore_fdw/16507/16540
        CStore File Size: 87457953966
 Planning time: 15.914 ms
 Execution time: 11672.927 ms
(9 rows)

                                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=2261840.10..2267272.29 rows=1 width=128) (actual time=44412.373..57237.559 rows=407 loops=1)
   ->  Sort  (cost=2261840.10..2262926.54 rows=434575 width=128) (actual time=44412.371..53115.637 rows=14237546 loops=1)
         Sort Key: "Business Unit", "Application", "Application Suite", "Account Name"
         Sort Method: external merge  Disk: 804440kB
         ->  Foreign Scan on tmp_l1_  (cost=0.00..2164695.79 rows=434575 width=128) (actual time=6.209..5488.539 rows=14237546 loops=1)
               Filter: (("Date" >= '2017-10-01 00:00:00'::timestamp without time zone) AND ("Date" <= '2017-10-31 00:00:00'::timestamp without time zone))
               Rows Removed by Filter: 8702454
               CStore File: /datadrive/postgresql/cstore_fdw/16507/16540
               CStore File Size: 87457953966
 Planning time: 19.011 ms
 Execution time: 76676.073 ms
(11 rows)

我可以在不更改 PostgreSQL 源中 HashAggregate 算法的情况下提高此查询的性能吗？如果是，如何？

最佳答案

如果您可以在插入数据时按日期列对数据进行排序，则由于使用了跳过索引，您的查询可以更快地运行。

关于PostgreSQL HashAggregate 性能不足，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/51441193/

PostgreSQL HashAggregate 性能不足

上一篇：sql - 如何获取每个设备的第一个和最后一个元素？

下一篇：Laravel postgresql 加入错误。 "Operator does not exist"