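There are two tables, main_table and other_table. A user may access main_table items if their user_id matches the main_table.user_id column, or if other_table contains an entry whose read_acl column (an array with a GIN index) includes their user_id. Both tables have about 2 million rows.
The query is very slow. In particular, PostgreSQL appears to run an index scan over all entries in main_table. If I remove the OR clause and rewrite the query as two separate queries, it is much faster. Can this be fixed somehow?
The current query looks like this: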
select *
from main_table
where user_id = 123 or exists (select * from other_table f
                               where main_table.main_table_id = f.main_table_id
                               and '{123}' && read_acl)
order by main_table_id limit 10;
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=0.43..170.97 rows=10 width=2590) (actual time=172.734..389.819 rows=10 loops=1)
   -> Index Scan using main_table_pk on main_table (cost=0.43..14158795.44 rows=830198 width=2590) (actual time=172.733..389.811 rows=10 loops=1)
        Filter: ((user_id = 123) OR (alternatives: SubPlan 1 or hashed SubPlan 2))
        Rows Removed by Filter: 776709
        SubPlan 1
          -> Index Scan using other_table_main_table_id_idx on other_table f (cost=0.43..8.45 rows=1 width=0) (never executed)
               Index Cond: (main_table.main_table_id = main_table_id)
               Filter: ('{123}'::text[] && read_acl)
        SubPlan 2
          -> Bitmap Heap Scan on other_table f_1 (cost=1678.58..2992.04 rows=333 width=8) (actual time=9.413..9.432 rows=12 loops=1)
               Recheck Cond: ('{123}'::text[] && read_acl)
               Heap Blocks: exact=12
               -> Bitmap Index Scan on other_table_read_acl (cost=0.00..1678.50 rows=333 width=0) (actual time=9.401..9.401 rows=12 loops=1)
                    Index Cond: ('{123}'::text[] && read_acl)
 Planning Time: 0.395 ms
 Execution Time: 389.877 ms
(16 rows)
Rewritten query 1, the first part of the OR clause:
select *
from main_table
where user_id = 123
order by main_table_id limit 10;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=482.27..482.29 rows=10 width=2590) (actual time=0.039..0.040 rows=8 loops=1)
   -> Sort (cost=482.27..482.58 rows=126 width=2590) (actual time=0.038..0.039 rows=8 loops=1)
        Sort Key: main_table_id
        Sort Method: quicksort Memory: 25kB
        -> Bitmap Heap Scan on main_table (cost=5.40..479.54 rows=126 width=2590) (actual time=0.020..0.031 rows=8 loops=1)
             Recheck Cond: (user_id = 123)
             Heap Blocks: exact=8
             -> Bitmap Index Scan on test500 (cost=0.00..5.37 rows=126 width=0) (actual time=0.015..0.015 rows=8 loops=1)
                  Index Cond: (user_id = 123)
 Planning Time: 0.130 ms
 Execution Time: 0.066 ms
(11 rows)
Rewritten query 2, the second part of the OR clause:
select *
from main_table
where exists (select * from other_table f
              where main_table.main_table_id = f.main_table_id
              and '{123}' && read_acl)
order by main_table_id limit 10;
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=5771.59..5771.62 rows=10 width=2590) (actual time=8.083..8.086 rows=10 loops=1)
   -> Sort (cost=5771.59..5772.42 rows=333 width=2590) (actual time=8.082..8.083 rows=10 loops=1)
        Sort Key: main_table.main_table_id
        Sort Method: quicksort Memory: 26kB
        -> Nested Loop (cost=2985.30..5764.40 rows=333 width=2590) (actual time=8.018..8.072 rows=12 loops=1)
             -> HashAggregate (cost=2984.88..2988.21 rows=333 width=8) (actual time=7.999..8.004 rows=12 loops=1)
                  Group Key: f.main_table_id
                  -> Bitmap Heap Scan on other_table f (cost=1670.58..2984.04 rows=333 width=8) (actual time=7.969..7.990 rows=12 loops=1)
                       Recheck Cond: ('{123}'::text[] && read_acl)
                       Heap Blocks: exact=12
                       -> Bitmap Index Scan on other_table_read_acl (cost=0.00..1670.50 rows=333 width=0) (actual time=7.957..7.958 rows=12 loops=1)
                            Index Cond: ('{123}'::text[] && read_acl)
             -> Index Scan using main_table_pk on main_table (cost=0.43..8.34 rows=1 width=2590) (actual time=0.005..0.005 rows=1 loops=12)
                  Index Cond: (main_table_id = f.main_table_id)
 Planning Time: 0.431 ms
 Execution Time: 8.137 ms
(16 rows)
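Best answer
This is expected. An OR in a WHERE condition is often a problem for the optimizer. Yes, you can usually rewrite the query as a UNION or UNION ALL of the partial queries shown in the question, but that is not something the optimizer can do automatically, because the rewrite does not always return the same result.
Consider this example: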
CREATE TABLE a (
   id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
   x integer,
   p integer
);
CREATE TABLE b (
   id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
   x integer,
   q integer
);
INSERT INTO a (x, p) VALUES
   (1, 1),
   (1, 1),
   (2, 1);
INSERT INTO b (x, q) VALUES
   (1, 3),
   (2, 3);
Now the query with OR gives:
SELECT x, a.p, b.q
FROM a JOIN b USING (x)
WHERE a.p = 1 OR b.q = 3;
x │ p │ q
═══╪═══╪═══
1 │ 1 │ 3
1 │ 1 │ 3
2 │ 1 │ 3
(3 rows)
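Rewriting the query with UNION or UNION ALL produces different results. UNION removes the legitimate duplicate row, and UNION ALL returns every row twice because each row satisfies both conditions: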
SELECT x, a.p, b.q
FROM a JOIN b USING (x)
WHERE a.p = 1
UNION
SELECT x, a.p, b.q
FROM a JOIN b USING (x)
WHERE b.q = 3;
x │ p │ q
═══╪═══╪═══
2 │ 1 │ 3
1 │ 1 │ 3
(2 rows)
SELECT x, a.p, b.q
FROM a JOIN b USING (x)
WHERE a.p = 1
UNION ALL
SELECT x, a.p, b.q
FROM a JOIN b USING (x)
WHERE b.q = 3;
x │ p │ q
═══╪═══╪═══
1 │ 1 │ 3
1 │ 1 │ 3
2 │ 1 │ 3
1 │ 1 │ 3
1 │ 1 │ 3
2 │ 1 │ 3
(6 rows)
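By contrast, once the select list carries the primary keys of both tables, no two joined rows are identical, so UNION can only remove rows that both branches return. A minimal sketch using the same tables a and b, assuming they were created and filled exactly as above (the column aliases a_id and b_id are introduced here for illustration):

```sql
-- Same join as before, but the select list now includes both primary keys.
-- Every joined row is therefore distinct, and UNION only collapses the
-- rows that are returned by both branches.
SELECT a.id AS a_id, b.id AS b_id, x, a.p, b.q
FROM a JOIN b USING (x)
WHERE a.p = 1
UNION
SELECT a.id AS a_id, b.id AS b_id, x, a.p, b.q
FROM a JOIN b USING (x)
WHERE b.q = 3
ORDER BY a_id, b_id;
```

With the sample data this returns three rows, one per joined pair, which matches the row count of the OR query.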
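The query in your question uses SELECT *, which includes the primary key, so using UNION is safe in that case. Recognizing that, however, is beyond what the optimizer can do.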
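As a sketch, a manual rewrite of the query from the question might look like the following. It assumes that main_table_id is the primary key of main_table (the index name main_table_pk in the plans suggests this) and that every column type in main_table has an equality operator, which UNION needs in order to compare rows. Neither assumption is confirmed by the question.

```sql
-- Each branch is one of the two fast queries from the question.
-- The inner ORDER BY ... LIMIT 10 keeps each branch small: any row that
-- belongs to the overall top 10 is also in the top 10 of its own branch.
(SELECT *
 FROM main_table
 WHERE user_id = 123
 ORDER BY main_table_id
 LIMIT 10)
UNION
(SELECT *
 FROM main_table
 WHERE EXISTS (SELECT 1
               FROM other_table f
               WHERE main_table.main_table_id = f.main_table_id
                 AND '{123}' && f.read_acl)
 ORDER BY main_table_id
 LIMIT 10)
-- UNION removes rows that match both conditions; then take the final top 10.
ORDER BY main_table_id
LIMIT 10;
```

Running this under EXPLAIN (ANALYZE) would show whether each branch keeps the bitmap scan plan it had as a standalone query.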
Regarding "sql - Adding an OR clause to a WHERE condition is much slower than two separate queries", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73801858/