sql - PostgreSQL does not use a partial index

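I have a table in PostgreSQL 9.2 that has a text column. Let's call it text_col. The values in this column are fairly unique (a value may have 5-6 duplicates at most). The table has about 5 million rows. About half of these rows contain a null value for text_col. When I execute the following query, I expect 1-5 rows. In most cases (>80%), I expect only 1 row.

Query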

explain analyze SELECT col1,col2.. colN
FROM table 
WHERE text_col = 'my_value';

A btree index exists on text_col. The query planner never uses this index, and I am not sure why. This is the output of the query.

Plan

Seq Scan on two (cost=0.000..459573.080 rows=93 width=339) (actual time=1392.864..3196.283 rows=2 loops=1)
Filter: (victor = 'foxtrot'::text)
Rows Removed by Filter: 4077384

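I added another, partial index to exclude the rows where text_col is null, but that did not help (with or without text_pattern_ops; I do not need text_pattern_ops, since my queries contain no LIKE conditions, but such an index also matches equality).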

CREATE INDEX name_idx
  ON table
  USING btree
  (text_col COLLATE pg_catalog."default" text_pattern_ops)
  WHERE text_col IS NOT NULL;

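Even after disabling sequential scans with set enable_seqscan = off; the planner still picks the seqscan over an index_scan. In summary...

This query returns a small number of rows.
Given that the non-null rows are fairly unique, an index scan on the text column should be faster.
Vacuuming and analyzing the table did not help the optimizer pick the index.

My questions

Why does the database pick the sequential scan over the index scan?
When a table has a text column on which an equality condition is checked, are there any best practices I can follow?
How do I reduce the time taken by this query?

[Edit - More information]

The index scan is picked up on my local database, which contains about 10% of the data available in production.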

Best answer

A partial index is a good idea to exclude half the rows of the table, which you obviously do not need. Simpler:

CREATE INDEX name_idx ON table (text_col)
WHERE text_col IS NOT NULL;

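Be sure to run ANALYZE table after creating the index. (Autovacuum does that automatically after some time if you don't do it manually, but if you test right after creation, your test will fail.)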
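A minimal sketch of that step, assuming a hypothetical table name my_table (the table in the question is a placeholder, and table itself is a reserved word). It refreshes the statistics manually and then looks at when the table was last analyzed, either by hand or by autovacuum:

```sql
-- my_table is a placeholder name, not taken from the question.
-- Refresh planner statistics right away instead of waiting for autovacuum.
ANALYZE my_table;

-- Check when statistics were last gathered manually and by autovacuum.
SELECT relname, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'my_table';
```

If both timestamps are null or older than the index creation, the planner may still be working with stale statistics.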

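Then, to convince the query planner that a particular partial index can be used, repeat the WHERE condition in the query, even if it seems completely redundant: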

SELECT col1,col2, .. colN
FROM   table 
WHERE  text_col = 'my_value'
AND    text_col IS NOT NULL;  -- repeat condition

Voilà.
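One way to check whether the planner actually picks the partial index is to wrap the query in EXPLAIN ANALYZE and read the top node of the plan. This is only a sketch: my_table, col1 and col2 are placeholder names standing in for the real table and select list.

```sql
-- my_table, col1 and col2 are placeholders for the real table and columns.
EXPLAIN ANALYZE
SELECT col1, col2
FROM   my_table
WHERE  text_col = 'my_value'
AND    text_col IS NOT NULL;  -- redundant predicate matching the partial index
```

A plan that starts with an index scan or bitmap scan on name_idx rather than a Seq Scan would indicate that the index is being used. Whether that happens still depends on the planner's cost estimates.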

Per documentation:

However, keep in mind that the predicate must match the conditions used in the queries that are supposed to benefit from the index. To be precise, a partial index can be used in a query only if the system can recognize that the WHERE condition of the query mathematically implies the predicate of the index. PostgreSQL does not have a sophisticated theorem prover that can recognize mathematically equivalent expressions that are written in different forms. (Not only is such a general theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The system can recognize simple inequality implications, for example "x < 1" implies "x < 2"; otherwise the predicate condition must exactly match part of the query's WHERE condition or the index will not be recognized as usable. Matching takes place at query planning time, not at run time. As a result, parameterized query clauses do not work with a partial index.
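The inequality example from the quoted passage can be tried out with a small sketch. The table t, its columns and the index name are hypothetical and serve only as an illustration:

```sql
-- t, x, payload and t_x_small_idx are hypothetical names for illustration.
CREATE TABLE t (x integer, payload text);
CREATE INDEX t_x_small_idx ON t (x) WHERE x < 2;

-- x < 1 implies x < 2, so the partial index is a candidate for this query.
EXPLAIN SELECT * FROM t WHERE x < 1;

-- x < 3 does not imply x < 2 (x = 2 qualifies but is not in the index),
-- so the partial index cannot be used for this query.
EXPLAIN SELECT * FROM t WHERE x < 3;
```

Being a candidate does not guarantee that the index is chosen. On a tiny or empty table the planner may still prefer a sequential scan on cost grounds.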

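As for parameterized queries: again, add the (redundant) predicate of the partial index as an additional, constant WHERE condition, and it works just fine.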
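A sketch of what this could look like with a prepared statement. The statement name find_by_text and the names my_table, col1 and col2 are assumptions made for illustration:

```sql
-- find_by_text, my_table, col1 and col2 are placeholder names.
PREPARE find_by_text(text) AS
SELECT col1, col2
FROM   my_table
WHERE  text_col = $1          -- parameterized part
AND    text_col IS NOT NULL;  -- constant predicate repeated from the index

EXECUTE find_by_text('my_value');

DEALLOCATE find_by_text;
```

The comparison value is a parameter, but the IS NOT NULL condition is a constant part of the query text, so it is visible to the planner at planning time.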

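An important update in Postgres 9.6 largely improves the chances for index-only scans (which can make queries cheaper, so the query planner will more readily choose such query plans). Related: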

PostgreSQL not using index during count(*)

Regarding sql - PostgreSQL does not use a partial index, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26030354/
