mysql - 慢速依赖子查询的问题

标签 mysql sql optimization

我有一个正在 MySQL 中运行的查询。如您所见,查询的每个部分都在索引字段上。然而,查询需要永远(几十分钟,比我愿意等待的时间长)。 Connect 表由两个整数和两个索引组成(一个字段一,字段二,另一个字段二,字段一)。源和目标是具有单个索引 int 字段的表。鉴于所有索引,我预计此查询将在几秒钟内完成。关于 1 的任何建议:为什么要花这么长时间,以及 2:如何让它更快?

谢谢!

mysql> explain 
SELECT DISTINCT geneConnect.geneSymbolID FROM SNPEffectGeneConnector AS geneConnect 
  JOIN IndelSNPEffectConnector AS snpEConnect ON geneConnect.snpEffectID = snpEConnect.snpEffectID 
  JOIN InDels2 AS source ON source.id = snpEConnect.indelID 
  WHERE geneConnect.geneSymbolID NOT IN (
    SELECT geneConnect.geneSymbolID FROM SNPEffectGeneConnector AS geneConnect 
    JOIN IndelSNPEffectConnector AS snpEConnect ON geneConnect.snpEffectID = snpEConnect.snpEffectID 
    JOIN InDels3 AS target ON target.id = snpEConnect.indelID);
+----+--------------------+-------------+-------+-------------------+----------+---------+-----------------------------------------------------------------------+------+--------------------------------+
| id | select_type        | table       | type  | possible_keys     | key      | key_len | ref                                                                   | rows | Extra                          |
+----+--------------------+-------------+-------+-------------------+----------+---------+-----------------------------------------------------------------------+------+--------------------------------+
|  1 | PRIMARY            | source      | index | id                | id       | 4       | NULL                                                                  | 5771 | Using index; Using temporary   |
|  1 | PRIMARY            | snpEConnect | ref   | snpEList          | snpEList | 4       | treattablebrowser.source.id                                           |    2 | Using index                    |
|  1 | PRIMARY            | geneConnect | ref   | snpEList          | snpEList | 4       | treattablebrowser.snpEConnect.snpEffectID                             |    1 | Using where; Using index       |
|  2 | DEPENDENT SUBQUERY | geneConnect | ref   | snpEList,geneList | geneList | 4       | func                                                                  |    1 | Using index                    |
|  2 | DEPENDENT SUBQUERY | target      | index | id                | id       | 4       | NULL                                                                  | 6297 | Using index; Using join buffer |
|  2 | DEPENDENT SUBQUERY | snpEConnect | ref   | snpEList          | snpEList | 8       | treattablebrowser.target.id,treattablebrowser.geneConnect.snpEffectID |    1 | Using index                    |
+----+--------------------+-------------+-------+-------------------+----------+---------+-----------------------------------------------------------------------+------+--------------------------------+

一组 6 行(0.01 秒)

最佳答案

我想这主要是出于学术兴趣,现在 Greg 已经自己解决了。很高兴知道我对这些事情的直觉可以完全打破。我仍然可以用三种方式重写它。我认为第一个可以简化,但正如 Greg 指出的那样,简化不起作用。不确定这是否会比原来的更快,尽管它确实在我对 sql server 的测试中产生了不同的查询计划。

Select Distinct
    g1.geneSymbolID 
From
    SNPEffectGeneConnector AS g1 
        Inner Join
    IndelSNPEffectConnector AS s1 
        ON g1.snpEffectID = s1.snpEffectID 
        Inner Join
    InDels2 AS i2 ON i2.id = s1.indelID 
Where Not Exists (
    Select 'x'
        From
            SNPEffectGeneConnector As g2
                Inner Join
            IndelSNPEffectConnector AS s2 
                On g2.snpEffectID = s2.snpEffectID 
                Inner Join
            InDels3 As i3
                On i3.id = s2.indelID
        Where
            g2.geneSymbolID = g1.geneSymbolID
    );

我不是 100% 确定第二种方法,但它适用于我非常少量的测试数据。如果有效的话,它的查询计划要短得多(不一定更快,但这是一个很好的迹象):

Select
  geneSymbolID
From
  SNPEffectGeneConnector As g
    Inner Join 
  IndelSNPEffectConnector As s
    ON g.snpEffectID = s.snpEffectID 
    Left Outer Join
  InDels2 As i2 
    On i2.id = s.indelID 
    Left Outer Join
  InDels3 As i3
    On i3.id = s.indelID
Group By
    geneSymbolID
Having
    count(i2.id) > 0 And
    count(i3.id) = 0

另一种方法(为非描述性别名道歉):

Select
    g.geneSymbolID
From
    SNPEffectGeneConnector As g
        Inner Join
    IndelSNPEffectConnector AS s
        On g.snpEffectID = s.snpEffectID 
        Inner Join (
        Select 
            i2.id,
            0 As c
        From    
            InDels2 i2
        Union All
        Select
            i3.id,
            1
        From
            InDels3 i3
    ) as i23
    on s.indelID = i23.id
Group By
    g.geneSymbolID
Having
    max(i23.c) = 0;

http://sqlfiddle.com/#!2/944e1/10

关于mysql - 慢速依赖子查询的问题,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/13241119/

相关文章:

python - scipy.optimize 约束最小化 SLSQP - 无法 100% 匹配目标

php - 两个顺序 SELECT 表的最佳方法

mysql - 对我来说,分离公共(public)数据好吗?

mysql - 安全地取消 MySQL 导入

sql - 缩小数据库

java - jOOQ 中的交错/分阶段/条件 SQL 构建

java - 降低 Java 时间和空间复杂性的最佳方法是什么?

optimization - 在 MarkLogic 中优化 SPARQL 查询

mysql - 需要帮助在 sql 中连接表

mysql - Async_Http_Response_Handler 出错