PostgreSQL GROUP BY date_trunc aggregate: index not used with a greater-than comparison

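I have a table with a few million rows. I have an expression index on this table (I created it in both directions to see whether that made a difference):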

CREATE INDEX ON statuses (date_trunc('hour', created_at) ASC);
CREATE INDEX ON statuses (date_trunc('hour', created_at) DESC);

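I am trying to run a query that uses GROUP BY to collect a count of statuses for each hour, but only for statuses created today (or in the past 7 days). However, when I try to exclude all entries before a specific date, the index is not used and every row is filtered instead. If I replace the greater-than sign with an equals sign, the index is used. I have put the EXPLAIN output below. I hope someone can help me make this query use the index, or at least improve its performance so that it runs in milliseconds rather than seconds.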

With equals, the index is used correctly:

=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) = '2013-02-06 00:00:00';
                                                                       QUERY PLAN                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=132.48..29443.34 rows=1653 width=8) (actual time=4.362..4.363 rows=1 loops=1)
   ->  Bitmap Heap Scan on statuses  (cost=132.48..29419.22 rows=18337 width=8) (actual time=0.209..2.159 rows=1319 loops=1)
         Recheck Cond: (date_trunc('hour'::text, created_at) = '2013-02-06 00:00:00'::timestamp without time zone)
         ->  Bitmap Index Scan on statuses_date_trunc_idx1  (cost=0.00..131.57 rows=18337 width=0) (actual time=0.178..0.178 rows=1319 loops=1)
               Index Cond: (date_trunc('hour'::text, created_at) = '2013-02-06 00:00:00'::timestamp without time zone)
 Total runtime: 4.416 ms
(6 rows)

However, as soon as I use greater than (or less than), the query filters the table without using the index:

=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) > '2013-02-06 00:00:00';
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=185386.54..185772.10 rows=110160 width=8) (actual time=2915.495..2915.774 rows=21 loops=1)
   ->  Seq Scan on statuses  (cost=0.00..184164.06 rows=1222485 width=8) (actual time=1676.827..2869.748 rows=47070 loops=1)
         Filter: (date_trunc('hour'::text, created_at) > '2013-02-06 00:00:00'::timestamp without time zone)
         Rows Removed by Filter: 3620426
 Total runtime: 2916.049 ms
(5 rows)

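In this case I can work around the problem by using IN and listing every hour in the range I want to select, but I would really like to understand why the index is not used for the greater-than query.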

=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) IN ('2013-02-06 00:00:00', '2013-02-06 01:00:00');
                                                                       QUERY PLAN                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=51988.38..51999.94 rows=3305 width=8) (actual time=7.218..7.223 rows=2 loops=1)
   ->  Bitmap Heap Scan on statuses  (cost=262.96..51951.70 rows=36675 width=8) (actual time=0.376..4.576 rows=2507 loops=1)
         Recheck Cond: (date_trunc('hour'::text, created_at) = ANY ('{"2013-02-06 00:00:00","2013-02-06 01:00:00"}'::timestamp without time zone[]))
         ->  Bitmap Index Scan on statuses_date_trunc_idx1  (cost=0.00..261.13 rows=36675 width=0) (actual time=0.341..0.341 rows=2507 loops=1)
               Index Cond: (date_trunc('hour'::text, created_at) = ANY ('{"2013-02-06 00:00:00","2013-02-06 01:00:00"}'::timestamp without time zone[]))
 Total runtime: 7.305 ms
(6 rows)

Best answer

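The planner's row estimate for the statuses table is about 26 times higher than the actual number of rows returned in the "bad" query (1222485 estimated versus 47070 actual in the Seq Scan node).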

Try running VACUUM ANALYZE statuses;
If that does not help, increase the statistics target for the statuses.created_at column with ALTER TABLE statuses ALTER created_at SET STATISTICS 500; and analyze again.

This should help.
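
The steps above could be run roughly as follows. This is a minimal sketch, not a guaranteed fix: the index name statuses_date_trunc_idx1 is taken from the EXPLAIN output in the question, 500 is the target suggested above, and the pg_stats check is an added assumption about how one might confirm that fresh statistics exist. PostgreSQL stores statistics for an indexed expression under the index's name, so both names are queried.

```sql
-- Refresh planner statistics (and clean up dead rows) for the table.
-- Note: VACUUM cannot run inside a transaction block.
VACUUM ANALYZE statuses;

-- If the estimates are still far off, sample created_at in more detail
-- and collect statistics again.
ALTER TABLE statuses ALTER COLUMN created_at SET STATISTICS 500;
ANALYZE statuses;

-- Inspect the statistics the planner now has for the table column and
-- for the indexed expression (listed under the index name).
SELECT tablename, attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename IN ('statuses', 'statuses_date_trunc_idx1');

-- Re-run the slow query and compare estimated rows with actual rows.
EXPLAIN ANALYZE
SELECT date_trunc('hour', created_at) AS hour, COUNT(*)
FROM statuses
GROUP BY hour
HAVING date_trunc('hour', created_at) > '2013-02-06 00:00:00';
```

If the estimated row count on the scan node moves close to the actual count, the planner has better information for weighing the index against a sequential scan.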

Edit: you need to check your autovacuum settings.

Read the part of the manual that covers autovacuum and check your configuration like this:

SELECT name,setting,source FROM pg_settings WHERE name ~ 'autovacuum';

If your table is very large, you can tune autovacuum_analyze_threshold and/or autovacuum_analyze_scale_factor for it using the ALTER TABLE tab SET (storage_parameter = ...) syntax.
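
As a sketch of that syntax applied to the statuses table: the two values below are illustrative assumptions, not recommendations from the answer, and would need to be chosen from the table's actual size and write rate.

```sql
-- Hypothetical values: trigger an automatic ANALYZE once roughly
-- 1000 rows plus 1% of the table have changed since the last one.
ALTER TABLE statuses SET (
    autovacuum_analyze_threshold = 1000,
    autovacuum_analyze_scale_factor = 0.01
);

-- Confirm that the per-table settings were stored.
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'statuses';

-- See when the table was last analyzed, manually or by autovacuum.
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'statuses';
```

A lower scale factor makes autovacuum analyze a large table more often, which keeps the row estimates for recent created_at values closer to reality at the cost of extra background work.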

Source: a similar question on Stack Overflow about a PostgreSQL GROUP BY date_trunc aggregate not using its index with greater-than: https://stackoverflow.com/questions/14739402/
