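I have a table in PostgreSQL with about 50 million rows. I'm trying to select the posts with the most "likes", filtered by "tags". Both fields have b-tree indexes. For the "love" tag I get: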
EXPLAIN analyse select user_id from posts where tags @> array['love'] order by likes desc nulls last limit 12
Limit (cost=0.57..218.52 rows=12 width=12) (actual time=2.658..14.243 rows=12 loops=1)
-> Index Scan using idx_likes on posts (cost=0.57..55759782.55 rows=3070010 width=12) (actual time=2.657..14.239 rows=12 loops=1)
Filter: (tags @> '{love}'::text[])
Rows Removed by Filter: 10584
Planning time: 0.297 ms
Execution time: 14.276 ms
14 ms is fine, but if I run the same query for "tamir", it suddenly takes more than 22 seconds! The query planner is clearly doing something wrong.
EXPLAIN analyse select user_id from posts where tags @> array['tamir'] order by likes desc nulls last limit 12
Limit (cost=0.57..25747.73 rows=12 width=12) (actual time=17552.406..22839.503 rows=12 loops=1)
-> Index Scan using idx_likes on posts (cost=0.57..55759782.55 rows=25988 width=12) (actual time=17552.405..22839.484 rows=12 loops=1)
Filter: (tags @> '{tamir}'::text[])
Rows Removed by Filter: 11785083
Planning time: 0.253 ms
Execution time: 22839.569 ms
After reading the article, I added "user_id" to the ORDER BY, and "tamir" became extremely fast: 0.2 ms! It now does a Sort over a Bitmap Heap Scan instead of an Index Scan.
EXPLAIN analyse select user_id from posts where tags @> array['tamir'] order by likes desc nulls last, user_id limit 12
Limit (cost=101566.17..101566.20 rows=12 width=12) (actual time=0.237..0.238 rows=12 loops=1)
-> Sort (cost=101566.17..101631.14 rows=25988 width=12) (actual time=0.237..0.237 rows=12 loops=1)
Sort Key: likes DESC NULLS LAST, user_id
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on posts (cost=265.40..100970.40 rows=25988 width=12) (actual time=0.074..0.214 rows=126 loops=1)
Recheck Cond: (tags @> '{tamir}'::text[])
Heap Blocks: exact=44
-> Bitmap Index Scan on idx_tags (cost=0.00..258.91 rows=25988 width=0) (actual time=0.056..0.056 rows=126 loops=1)
Index Cond: (tags @> '{tamir}'::text[])
Planning time: 0.287 ms
Execution time: 0.277 ms
But what happens to "love"? It now goes from 14 ms to 2.3 seconds...
EXPLAIN analyse select user_id from posts where tags @> array['love'] order by likes desc nulls last, user_id limit 12
Limit (cost=7347142.18..7347142.21 rows=12 width=12) (actual time=2360.784..2360.786 rows=12 loops=1)
-> Sort (cost=7347142.18..7354817.20 rows=3070010 width=12) (actual time=2360.783..2360.784 rows=12 loops=1)
Sort Key: likes DESC NULLS LAST, user_id
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on posts (cost=28316.58..7276762.77 rows=3070010 width=12) (actual time=595.274..2171.571 rows=1517679 loops=1)
Recheck Cond: (tags @> '{love}'::text[])
Heap Blocks: exact=642705
-> Bitmap Index Scan on idx_tags (cost=0.00..27549.08 rows=3070010 width=0) (actual time=367.080..367.080 rows=1517679 loops=1)
Index Cond: (tags @> '{love}'::text[])
Planning time: 0.226 ms
Execution time: 2360.863 ms
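Can someone shed some light on why this happens and what the fix would be?
Update
The "tags" field has a gin index, not a b-tree; that was just a typo.
Best answer
B-tree indexes are not very useful for searching for elements in array fields. You should drop the b-tree index on the tags field and use a gin index instead: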
drop index idx_tags;
create index idx_tags on posts using gin (tags);
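Since the update to the question says the tags index is already a gin index, it may be worth confirming what is actually defined before dropping anything. A small sketch using the pg_indexes system view, assuming the table is named posts as in the question:

```sql
-- List every index on posts together with its full definition.
-- A gin index shows "USING gin" in indexdef; a b-tree shows "USING btree".
select indexname, indexdef
from pg_indexes
where tablename = 'posts';
```

If idx_tags already shows USING gin, the drop and re-create step above can be skipped.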
Also, do not add user_id to the ORDER BY: when many rows have the tag you are searching for, it rules out using your idx_likes index for the ordering.
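To see why the idx_likes plan looks cheap to the planner for both tags, the Limit costs in the question's plans can be reproduced by hand. This is only a rough sketch, assuming the planner scales the index scan's cost linearly by the fraction of estimated matching rows needed to satisfy the LIMIT; all numbers are copied from the EXPLAIN output in the question:

```sql
-- limit cost ~ startup + (total - startup) * limit_rows / estimated_matching_rows
select
  round(0.57 + (55759782.55 - 0.57) * 12 / 3070010, 2) as love_limit_cost,
  round(0.57 + (55759782.55 - 0.57) * 12 / 25988, 2)   as tamir_limit_cost;
```

Both results should land close to the 218.52 and 25747.73 reported on the Limit nodes. For "tamir", the estimate of 25988 matching rows is far above the 126 rows the bitmap scan actually found, so the planner expects to reach 12 matches much sooner than it does and ends up filtering out 11785083 rows.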
In addition, the likes field should be declared not null default 0.
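One possible way to apply that change is sketched below. It assumes that existing NULL values in likes can safely be treated as 0, which is a guess about the data and not something stated in the question. On a table of roughly 50 million rows the update and the constraint check may take a while and hold locks, so it is probably best tried on a copy first.

```sql
-- Assumption: a NULL in likes means "no likes yet", so it can become 0.
update posts set likes = 0 where likes is null;

-- New rows get 0 when likes is not supplied.
alter table posts alter column likes set default 0;

-- Reject NULLs from now on; this scans the table to verify existing rows.
alter table posts alter column likes set not null;
```

Once the column is not null, the nulls last qualifier in the ORDER BY no longer changes the result, since there are no NULLs left to place.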
Regarding postgresql - Postgres select performance with LIMIT 1, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42960913/