SQL query with OR is much slower than 2 separate queries

Tags: sql postgresql relational-database query-optimization postgresql-9.3

When I run EXPLAIN on the following query:

EXPLAIN DELETE
FROM AuditTaskImpl l
WHERE l.processInstanceId IN (SELECT spl.processInstanceId
                              FROM ProcessInstanceLog spl
                              WHERE spl.status IN (2,3))
      OR NOT EXISTS (SELECT spl.processInstanceId
                     FROM ProcessInstanceLog spl
                     WHERE l.processinstanceid = spl.processinstanceid);

it produces:

Delete on audittaskimpl l  (cost=8.61..424652.49 rows=38144 width=6)
  ->  Seq Scan on audittaskimpl l  (cost=8.61..424652.49 rows=38144 width=6)
        Filter: ((hashed SubPlan 1) OR (NOT (SubPlan 2)))
        SubPlan 1
          ->  Index Scan using idx_pinstlog_status on processinstancelog spl  (cost=0.29..8.61 rows=1 width=8)
                Index Cond: (status = ANY ('{2,3}'::integer[]))
        SubPlan 2
          ->  Index Only Scan using idx_pinstlog_pinstid on processinstancelog spl_1  (cost=0.29..8.31 rows=1 width=0)
                Index Cond: (processinstanceid = l.processinstanceid)

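That is an estimated cost of roughly 400k (in planner cost units, where by default one unit corresponds to about one sequential page fetch). But since I am using OR, I could in theory run the two conditions as separate queries and combine the results. So the first one: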

EXPLAIN DELETE
FROM AuditTaskImpl l
WHERE l.processInstanceId IN (SELECT spl.processInstanceId
                              FROM ProcessInstanceLog spl
                              WHERE spl.status IN (2,3));

produces:

Delete on audittaskimpl l  (cost=8.62..2147.72 rows=1 width=12)
  ->  Hash Semi Join  (cost=8.62..2147.72 rows=1 width=12)
        Hash Cond: (l.processinstanceid = spl.processinstanceid)
        ->  Seq Scan on audittaskimpl l  (cost=0.00..2005.59 rows=50859 width=14)
        ->  Hash  (cost=8.61..8.61 rows=1 width=14)
              ->  Index Scan using idx_pinstlog_status on processinstancelog spl  (cost=0.29..8.61 rows=1 width=14)
                    Index Cond: (status = ANY ('{2,3}'::integer[]))

And the second one:

EXPLAIN DELETE
FROM AuditTaskImpl l
WHERE NOT EXISTS (SELECT spl.processInstanceId
                  FROM ProcessInstanceLog spl
                  WHERE l.processinstanceid = spl.processinstanceid);

produces:

Delete on audittaskimpl l  (cost=2666.49..5736.94 rows=1 width=12)
  ->  Hash Anti Join  (cost=2666.49..5736.94 rows=1 width=12)
        Hash Cond: (l.processinstanceid = spl.processinstanceid)
        ->  Seq Scan on audittaskimpl l  (cost=0.00..2005.59 rows=50859 width=14)
        ->  Hash  (cost=1781.66..1781.66 rows=50866 width=14)
              ->  Seq Scan on processinstancelog spl  (cost=0.00..1781.66 rows=50866 width=14)

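So the two queries together have an estimated cost of roughly 8k. Both tables contain about 50,000 rows. The database is PostgreSQL 9.3. The examples use DML (DELETE FROM ...), but DQL (SELECT ...) gives the same result.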
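For illustration, the SELECT form of the OR query would look roughly like the sketch below. Selecting `l.id` is an assumption taken from the UNION ALL example; any other column of AuditTaskImpl would do.

```sql
-- Sketch only: same WHERE clause as the DELETE above, written as a SELECT.
-- The selected column l.id is an assumption borrowed from the UNION ALL example.
EXPLAIN SELECT l.id
FROM AuditTaskImpl l
WHERE l.processInstanceId IN (SELECT spl.processInstanceId
                              FROM ProcessInstanceLog spl
                              WHERE spl.status IN (2,3))
      OR NOT EXISTS (SELECT spl.processInstanceId
                     FROM ProcessInstanceLog spl
                     WHERE l.processinstanceid = spl.processinstanceid);
```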

Here is another example, a SELECT using UNION ALL:

EXPLAIN SELECT l.id
FROM AuditTaskImpl l
WHERE NOT EXISTS (SELECT spl.processInstanceId
                  FROM ProcessInstanceLog spl
                  WHERE l.processinstanceid = spl.processinstanceid)

UNION ALL

SELECT l.id
FROM AuditTaskImpl l
WHERE l.processInstanceId IN (SELECT spl.processInstanceId
                              FROM ProcessInstanceLog spl
                              WHERE spl.status IN (2,3));

produces:

Append  (cost=2616.49..7975.41 rows=2 width=8)
  ->  Hash Anti Join  (cost=2616.49..5827.67 rows=1 width=8)
        Hash Cond: (l.processinstanceid = spl.processinstanceid)
        ->  Seq Scan on audittaskimpl l  (cost=0.00..2005.59 rows=50859 width=16)
        ->  Hash  (cost=1781.66..1781.66 rows=50866 width=8)
              ->  Seq Scan on processinstancelog spl  (cost=0.00..1781.66 rows=50866 width=8)
  ->  Hash Semi Join  (cost=8.62..2147.72 rows=1 width=8)
        Hash Cond: (l_1.processinstanceid = spl_1.processinstanceid)
        ->  Seq Scan on audittaskimpl l_1  (cost=0.00..2005.59 rows=50859 width=16)
        ->  Hash  (cost=8.61..8.61 rows=1 width=8)
              ->  Index Scan using idx_pinstlog_status on processinstancelog spl_1  (cost=0.29..8.61 rows=1 width=8)
                    Index Cond: (status = ANY ('{2,3}'::integer[]))

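So again an estimated cost of roughly 8k in total. Why is the SQL query with OR so much slower than the 2 separate queries? Could it be an optimizer problem?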

Thanks for any replies!

Best answer

Why waste time on two queries when one is enough?

DELETE
  FROM AuditTaskImpl l
  WHERE NOT EXISTS (
     SELECT NULL FROM ProcessInstanceLog spl
       WHERE spl.processInstanceId = l.processInstanceId
         AND spl.status NOT IN (2,3));
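Note that the single NOT EXISTS above matches the original OR condition only if each processInstanceId appears at most once in ProcessInstanceLog and status is never NULL; the question states neither. If those assumptions may not hold, one possible alternative is to keep both branches but combine them with UNION ALL inside one DELETE. This is only a sketch: it assumes `id` uniquely identifies a row in AuditTaskImpl (the question selects `l.id`, but does not say it is a key). Each branch has the same shape as the separate queries in the question, so the planner may be able to use the hash anti join and hash semi join shown there, but that should be verified with EXPLAIN on the real data.

```sql
-- Sketch under assumptions: AuditTaskImpl.id is unique (not confirmed by the question).
DELETE FROM AuditTaskImpl l
WHERE l.id IN (
    -- Branch 1: audit rows whose process instance has no log row at all.
    SELECT a.id
    FROM AuditTaskImpl a
    WHERE NOT EXISTS (SELECT 1
                      FROM ProcessInstanceLog spl
                      WHERE spl.processInstanceId = a.processInstanceId)
    UNION ALL
    -- Branch 2: audit rows whose process instance has a log row with status 2 or 3.
    SELECT a.id
    FROM AuditTaskImpl a
    WHERE a.processInstanceId IN (SELECT spl.processInstanceId
                                  FROM ProcessInstanceLog spl
                                  WHERE spl.status IN (2,3))
);
```

UNION ALL is enough here because an id returned by both branches does no harm inside IN. Prefixing the statement with EXPLAIN shows whether the plan actually avoids the per-row SubPlan seen in the OR version.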

Regarding "SQL query with OR is much slower than 2 separate queries", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49711926/
