sql - Optimize BETWEEN date statement

I need help optimizing a Postgres query that uses a BETWEEN clause on a timestamp field.

I have 2 tables:

ONE(int id_one(PK), datetime cut_time, int f1 ...)

with about 3394 rows

TWO(int id_two(PK), int id_one(FK), int f2 ...)

with about 4000000 rows

There are btree indexes on the PKs id_one and id_two, on the FK id_one, and on cut_time.

I want to run a query like this:

select o.id_one, Date(o.cut_time), o.f1, t.f2 
from one o
inner join two t ON (o.id_one = t.id_one)
where o.cut_time between '2013-01-01' and '2013-01-31';

This query retrieves about 1,700,000 rows in about 7 seconds.

The EXPLAIN ANALYZE report is below:

Merge Join  (cost=20000000003.53..20000197562.38 rows=1680916 width=24) (actual time=0.017..741.718 rows=1692345 loops=1)
  Merge Cond: (c.coilid = hf.coilid)
  ->  Index Scan using pk_coils on coils c  (cost=10000000000.00..10000000382.13 rows=1420 width=16) (actual time=0.008..4.539 rows=1404 loops=1)
        Filter: ((cut_time >= '2013-01-01 00:00:00'::timestamp without time zone) AND (cut_time <= '2013-01-31 00:00:00'::timestamp without time zone))
        Rows Removed by Filter: 1990
  ->  Index Scan using idx_fk_lf_data on hf_data hf  (cost=10000000000.00..10000166145.90 rows=4017625 width=16) (actual time=0.003..392.535 rows=1963386 loops=1)
Total runtime: 768.473 ms

The index on the timestamp column is not used. How can I optimize this query?

Best answer

Correct DDL script

A proper setup would look like this:

db<>fiddle here
Old sqlfiddle

More about this fiddle further down.
Assuming data type timestamp for the column declared as datetime.
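
The DDL script itself lives in the linked fiddle and is not reproduced in this text. As a stand-in, here is a minimal sketch consistent with the question's description. The column names follow the queries in this answer (one_id, two_id); the constraints and index names are assumptions, not the original script.

```sql
-- Sketch only: reconstructed from the question's description, not the original fiddle.
CREATE TABLE one (
   one_id   integer PRIMARY KEY
 , cut_time timestamp NOT NULL      -- "datetime" in the question; timestamp assumed
 , f1       integer
);

CREATE TABLE two (
   two_id integer PRIMARY KEY
 , one_id integer NOT NULL REFERENCES one   -- FK to one.one_id
 , f2     integer
);

-- btree indexes mentioned in the question (index names are hypothetical)
CREATE INDEX one_cut_time_idx ON one (cut_time);
CREATE INDEX two_one_id_idx   ON two (one_id);
```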

Incorrect query

BETWEEN is almost always wrong on principle with timestamp columns. See:

Find overlapping date ranges in PostgreSQL

In your query:

SELECT o.one_id, date(o.cut_time), o.f1, t.f2 
FROM   one o
JOIN   two t USING (one_id)
WHERE  o.cut_time BETWEEN '2013-01-01' AND '2013-01-31';

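... the string constants '2013-01-01' and '2013-01-31' are coerced to the timestamps '2013-01-01 00:00' and '2013-01-31 00:00'. This excludes most of January 31. The timestamp '2013-01-31 12:00' would not qualify, which is most certainly wrong.
If you used '2013-02-01' as the upper bound instead, it would include '2013-02-01 00:00'. Still wrong.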
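
The coercion is easy to check without any tables. A quick sketch (the column aliases are made up for illustration):

```sql
-- The untyped literals are coerced to timestamp, i.e. midnight of the given day.
SELECT timestamp '2013-01-31 12:00'
          BETWEEN '2013-01-01' AND '2013-01-31' AS noon_jan_31      -- false: after 2013-01-31 00:00
     , timestamp '2013-02-01 00:00'
          BETWEEN '2013-01-01' AND '2013-02-01' AS midnight_feb_1;  -- true: BETWEEN includes the upper bound
```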

To get all timestamps of "January 2013", it needs to be:

SELECT o.one_id, date(o.cut_time), o.f1, t.f2 
FROM   one o
JOIN   two t USING (one_id)
WHERE  o.cut_time >= '2013-01-01'
AND    o.cut_time <  '2013-02-01';

Exclude the upper bound.

Optimize query

It is probably pointless to retrieve 1.7 million rows. Aggregate before you retrieve the result.
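
What to aggregate depends on the use case, which the question does not state. As a sketch, assuming a per-day row count and a per-day sum of f2 are what is actually needed (both aggregates are assumptions for illustration):

```sql
-- Returns at most one row per day of January 2013 instead of ~1.7 million rows.
SELECT o.cut_time::date AS cut_day
     , count(*)         AS rows_per_day
     , sum(t.f2)        AS f2_total
FROM   one o
JOIN   two t USING (one_id)
WHERE  o.cut_time >= '2013-01-01'
AND    o.cut_time <  '2013-02-01'
GROUP  BY 1
ORDER  BY 1;
```

The join still has to read the same rows from two, so the gain comes mainly from transferring far fewer rows to the client.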

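Since table two is so much bigger, it is crucial how many rows you get from there. When retrieving more than ~ 5 % of it, a plain index on two.one_id will typically not be used, because it is faster to scan the table sequentially right away.

Either your table statistics are outdated, or you have messed with cost constants and other parameters (which you obviously have, see below) to force Postgres to use the index anyway.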
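
To rule out stale statistics, the tables can be analyzed manually and the statistics views inspected. A small sketch, assuming the table names one and two used in this answer:

```sql
-- Refresh planner statistics for both tables.
ANALYZE one;
ANALYZE two;

-- When were the tables last analyzed, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze, n_live_tup
FROM   pg_stat_user_tables
WHERE  relname IN ('one', 'two');
```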

The only chance I see for an index on two is a covering index:

CREATE INDEX two_one_id_f2 ON two(one_id, f2);

This way, Postgres could read from the index directly, if some preconditions are met. It might be a bit faster, but not much. I did not test it.
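
One of those preconditions is that index-only scans exist at all (Postgres 9.2 or later); another is that the visibility map marks most pages of two as all-visible, which VACUUM takes care of. A hedged way to check whether the covering index is picked up (the key value 1 is hypothetical):

```sql
VACUUM ANALYZE two;   -- updates the visibility map and the statistics

EXPLAIN (ANALYZE, BUFFERS)
SELECT one_id, f2
FROM   two
WHERE  one_id = 1;    -- hypothetical key value, for illustration only
```

If the plan shows "Index Only Scan using two_one_id_f2" with a low "Heap Fetches" count, the index is being read directly. The planner may still choose a different plan for the full join.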

Strange numbers in `EXPLAIN` output

As for the strange numbers in your EXPLAIN ANALYZE output: the fiddle should explain them.

It looks like you had these debug settings:

SET enable_seqscan = off;
SET enable_indexscan = off;
SET enable_bitmapscan = off;

All of these should be on (the default) except while debugging. Leaving them off cripples performance! Check with:

SELECT * FROM pg_settings WHERE name ~~ 'enable%';
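
In Postgres versions of that era, switching a planner method off does not forbid it. It adds a penalty of 1e10 to the estimated cost, which would explain the 10000000000.00 startup costs in the plan above. A sketch for restoring the defaults in the current session and confirming where each value comes from:

```sql
-- Undo session-level overrides of the planner switches.
RESET enable_seqscan;
RESET enable_indexscan;
RESET enable_bitmapscan;

-- "source" shows whether a value is the default or comes from a config file, role, etc.
SELECT name, setting, source
FROM   pg_settings
WHERE  name IN ('enable_seqscan', 'enable_indexscan', 'enable_bitmapscan');
```

RESET only affects the current session. If the switches were turned off in postgresql.conf or per role or database, they would have to be changed there.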

Regarding "sql - Optimize BETWEEN date statement", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16039649/
