postgresql - Boolean column in a multicolumn index

Tags: postgresql indexing boolean b-tree

Test table and indexes:

CREATE TABLE public.t (id serial, cb boolean, ci integer, co integer);

INSERT INTO t(cb, ci, co)
SELECT ((round(random()*1))::int)::boolean, round(random()*100), round(random()*100)
FROM generate_series(1, 1000000);

CREATE INDEX "right" ON public.t USING btree (ci, cb, co);
CREATE INDEX wrong ON public.t USING btree (ci, co);
CREATE INDEX right_hack ON public.t USING btree (ci, (cb::integer), co);

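The problem is that I cannot force PostgreSQL to use the "right" index. The following query uses the "wrong" index. That is not optimal, because the plan applies a Filter (condition: cb = TRUE), so it reads more data from memory and takes longer to execute: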

explain (analyze, buffers)
SELECT * FROM t WHERE cb = TRUE AND ci = 46 ORDER BY co LIMIT 1000;

"Limit  (cost=0.42..4063.87 rows=1000 width=13) (actual time=0.057..4.405 rows=1000 loops=1)"
"  Buffers: shared hit=1960"
"  ->  Index Scan using wrong on t  (cost=0.42..21784.57 rows=5361 width=13) (actual time=0.055..4.256 rows=1000 loops=1)"
"        Index Cond: (ci = 46)"
"        Filter: cb"
"        Rows Removed by Filter: 967"
"        Buffers: shared hit=1960"
"Planning time: 0.318 ms"
"Execution time: 4.530 ms"

But when I cast the bool column to int, it works fine. I do not understand why, because the selectivity of the two indexes (right and right_hack) is the same.

explain (analyze, buffers)
SELECT * FROM t WHERE cb::int = 1 AND ci = 46 ORDER BY co LIMIT 1000;

"Limit  (cost=0.42..2709.91 rows=1000 width=13) (actual time=0.027..1.484 rows=1000 loops=1)"
"  Buffers: shared hit=1003"
"  ->  Index Scan using right_hack on t  (cost=0.42..14525.95 rows=5361 width=13) (actual time=0.025..1.391 rows=1000 loops=1)"
"        Index Cond: ((ci = 46) AND ((cb)::integer = 1))"
"        Buffers: shared hit=1003"
"Planning time: 0.202 ms"
"Execution time: 1.565 ms"

Are there any limitations on using a boolean column in a multicolumn index?

Best answer

A conditional index (or two) does seem to work:

CREATE INDEX true_bits ON ttt (ci, co)
  WHERE cb = True ;

CREATE INDEX false_bits ON ttt (ci, co)
  WHERE cb = False ;

VACUUM ANALYZE ttt;

EXPLAIN (ANALYZE, buffers)
SELECT * FROM ttt
WHERE cb = TRUE AND ci = 46 ORDER BY co LIMIT 1000;

Plan:

                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.25..779.19 rows=1000 width=13) (actual time=0.024..1.804 rows=1000 loops=1)
   Buffers: shared hit=1001
   ->  Index Scan using true_bits on ttt  (cost=0.25..3653.46 rows=4690 width=13) (actual time=0.020..1.570 rows=1000 loops=1)
         Index Cond: (ci = 46)
         Buffers: shared hit=1001
 Planning time: 0.468 ms
 Execution time: 1.949 ms
(7 rows)

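Even so, the gain from indexing a low-cardinality column remains small. The chance that an index entry lets the scan skip a page read is very small. With an 8K page size and a row size of about 20 bytes, there are about 400 records on each page. (Almost) every page will contain at least one true record (and at least one false record), so the page has to be read anyway.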
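The page-density argument above can be checked with a little arithmetic. The query below is only a rough sketch using the answer's own figures. It assumes an 8192-byte page and about 20 bytes per row, ignores page and tuple header overhead, and assumes cb is true for half the rows and unrelated to where a row is physically stored (which matches the random test data in the question).

```sql
-- Rough page-density arithmetic; the figures are the answer's estimates,
-- not measurements taken from the table.
-- Assumptions: 8192-byte page, ~20 bytes per row, header overhead ignored,
-- cb = true for 50% of rows, independent of physical row position.
SELECT
    8192 / 20                               AS approx_rows_per_page,    -- integer division
    power(0.5::float8, (8192 / 20)::float8) AS p_page_has_no_true_row;  -- 0.5 ^ rows_per_page
```

The second column comes out vanishingly small, which is the point of the argument: under these assumptions essentially every heap page holds rows of both kinds, so an index on cb alone can hardly ever rule a page out.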

Regarding postgresql - Boolean column in a multicolumn index, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/41075036/
