mysql - Applying the outer query's conditions to a materialized subquery

Tags: mysql subquery view query-optimization

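I have the query below, which references two views, goldedRunQueries and currentGoldMarkings. My problem seems to come from the view referenced in the subquery, currentGoldMarkings. At execution time, MySQL first materializes this subquery and only then applies the WHERE conditions on queryCode and runId. As a result the query takes more than an hour, because the view references tables with millions of rows. My question is how to force those two WHERE conditions to be applied before the subquery is materialized.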

SELECT  goldedRunQueries.queryCode, goldedRunQueries.runId
    FROM  goldedRunQueries
    LEFT OUTER JOIN  
      ( SELECT  measuredRunId, queryCode, COUNT(resultId) as c
            FROM  currentGoldMarkings
            GROUP BY  measuredRunId, queryCode
      ) AS accuracy  ON accuracy.measuredRunId = goldedRunQueries.runId
      AND  accuracy.queryCode = goldedRunQueries.queryCode
    WHERE  goldedRunQueries.queryCode IN ('CH001', 'CH002', 'CH003')
      and  goldedRunQueries.runid = 5000
    ORDER BY  goldedRunQueries.runId DESC, goldedRunQueries.queryCode;

Here are the two views. Both are also used on their own, so it is not possible to build any additional clauses into them.

CREATE VIEW currentGoldMarkings
AS
SELECT  result.resultId, result.runId AS measuredRunId, result.documentId,
        result.queryCode, result.queryValue AS measuredValue,
        gold.queryValue AS goldValue,
        CASE result.queryValue WHEN gold.queryValue THEN 1 ELSE 0 END AS correct
    FROM  results AS result
    INNER JOIN  gold  ON gold.documentId = result.documentId
      AND  gold.queryCode = result.queryCode
    WHERE  gold.isCurrent = 1;

CREATE VIEW goldedRunQueries
AS
SELECT  runId, queryCode
    FROM  runQueries
    WHERE  EXISTS 
      ( SELECT  1 AS Expr1
            FROM  runs
            WHERE  (runId = runQueries.runId)
              AND  (isManual = 0)
      )
      AND  EXISTS 
      ( SELECT  1 AS Expr1
            FROM  results
            WHERE  (runId = runQueries.runId)
              AND  (queryCode = runQueries.queryCode)
              AND  EXISTS 
              ( SELECT  1 AS Expr1
                    FROM  gold
                    WHERE  (documentId = results.documentId)
                      AND  (queryCode = results.queryCode)
              )
      );

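Note: the query above reflects only part of my actual query. There are three more left outer joins, essentially similar to the subquery above, which make the problem much worse.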

Edit: as suggested, here are the table structures and some sample data.

CREATE TABLE `results`(
`resultId` int auto_increment NOT NULL,
`runId` int NOT NULL,
`documentId` int NOT NULL,
`queryCode` char(5) NOT NULL,
`queryValue` char(1) NOT NULL,
`comment` varchar(255) NULL,
 CONSTRAINT `PK_results` PRIMARY KEY 
(
`resultId`
)
);


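-- resultId is auto_increment, so the column list names only the supplied columns.
INSERT INTO results (runId, documentId, queryCode, queryValue, comment) VALUES (100, 242300, 'AC001', 'I', NULL);
INSERT INTO results (runId, documentId, queryCode, queryValue, comment) VALUES (100, 242300, 'AC001', 'S', NULL);
INSERT INTO results (runId, documentId, queryCode, queryValue, comment) VALUES (150, 242301, 'AC005', 'I', 'abc');
INSERT INTO results (runId, documentId, queryCode, queryValue, comment) VALUES (100, 242300, 'AC001', 'I', NULL);
INSERT INTO results (runId, documentId, queryCode, queryValue, comment) VALUES (109, 242301, 'PQ001', 'S', 'zzz');
INSERT INTO results (runId, documentId, queryCode, queryValue, comment) VALUES (400, 242400, 'DD006', 'I', NULL);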



CREATE TABLE `gold`(
`goldId` int auto_increment NOT NULL,
`runDate` datetime NOT NULL,
`documentId` int NOT NULL,
`queryCode` char(5) NOT NULL,
`queryValue` char(1) NOT NULL,
`comment` varchar(255) NULL,
`isCurrent` tinyint(1) NOT NULL DEFAULT 0,
CONSTRAINT `PK_gold` PRIMARY KEY 
(
`goldId`
)
);



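-- goldId is auto_increment, so the column list names only the supplied columns.
INSERT INTO gold (runDate, documentId, queryCode, queryValue, comment, isCurrent) VALUES ('2015-02-20 00:00:00', 138904, 'CH001', 'N', NULL, 1);
INSERT INTO gold (runDate, documentId, queryCode, queryValue, comment, isCurrent) VALUES ('2015-05-20 00:00:00', 138904, 'CH001', 'N', 'aaa', 1);
INSERT INTO gold (runDate, documentId, queryCode, queryValue, comment, isCurrent) VALUES ('2016-02-20 00:00:00', 138905, 'CH002', 'N', NULL, 0);
INSERT INTO gold (runDate, documentId, queryCode, queryValue, comment, isCurrent) VALUES ('2015-12-12 00:00:00', 138804, 'CH001', 'N', 'zzzz', 1);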



CREATE TABLE `runQueries`(
`runId` int NOT NULL,
`queryCode` char(5) NOT NULL,
CONSTRAINT `PK_runQueries` PRIMARY KEY 
(
`runId`,
`queryCode`
)
);


INSERT INTO runQueries VALUES (100, 'AC001');
INSERT INTO runQueries VALUES (109, 'PQ001');
INSERT INTO runQueries VALUES (400, 'DD006');



CREATE TABLE `runs`(
`runId` int auto_increment NOT NULL,
`runName` varchar(63) NOT NULL,
`isManual` tinyint(1) NOT NULL,
`runDate` datetime NOT NULL,
`comment` varchar(1023) NULL,
`folderName` varchar(63) NULL,
`documentSetId` int NOT NULL,
`pipelineVersion` varchar(50) NULL,
`isArchived` tinyint(1) NOT NULL DEFAULT 0,
`pipeline` varchar(50) NULL,
CONSTRAINT `PK_runs` PRIMARY KEY 
(
`runId`
)
);


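-- runId is auto_increment, so the column list names only the supplied columns.
INSERT INTO runs (runName, isManual, runDate, comment, folderName, documentSetId, pipelineVersion, isArchived, pipeline) VALUES ('test1', 0, '2015-08-04 06:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL);
INSERT INTO runs (runName, isManual, runDate, comment, folderName, documentSetId, pipelineVersion, isArchived, pipeline) VALUES ('test2', 1, '2015-12-04 12:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL);
INSERT INTO runs (runName, isManual, runDate, comment, folderName, documentSetId, pipelineVersion, isArchived, pipeline) VALUES ('test3', 1, '2015-06-24 10:56:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL);
INSERT INTO runs (runName, isManual, runDate, comment, folderName, documentSetId, pipelineVersion, isArchived, pipeline) VALUES ('test4', 1, '2016-05-04 11:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL);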

Best answer

First, let's try to improve performance with indexes:

results:  INDEX(runId, queryCode)  -- in either order
gold:     INDEX(documentId, queryCode, isCurrent)  -- in that order
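As a rough sketch, the two suggested indexes could be created as shown here. The index names `idx_results_run_query` and `idx_gold_doc_query_current` are made up for illustration; only the table and column names come from the question's schema.

```sql
-- Hypothetical index names; column lists follow the suggestion above.
-- results: either column order is said to work.
ALTER TABLE results
    ADD INDEX idx_results_run_query (runId, queryCode);

-- gold: keep the columns in this order.
ALTER TABLE gold
    ADD INDEX idx_gold_doc_query_current (documentId, queryCode, isCurrent);
```

On tables with millions of rows, building these indexes may take a while, so it is probably worth trying on a copy first.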

After that, update the CREATE TABLEs in the question and add the output of:

EXPLAIN EXTENDED SELECT ...;
SHOW WARNINGS;
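Spelled out for the query in the question, the request would look roughly like this. This is only the question's own statement with the EXPLAIN prefix added. It assumes a 5.6 or 5.7 server: the EXTENDED keyword was deprecated in 5.7 and removed in 8.0, where a plain EXPLAIN followed by SHOW WARNINGS serves the same purpose.

```sql
-- Assumes MySQL 5.6/5.7; on 8.0 drop the EXTENDED keyword.
EXPLAIN EXTENDED
SELECT  goldedRunQueries.queryCode, goldedRunQueries.runId
    FROM  goldedRunQueries
    LEFT OUTER JOIN
      ( SELECT  measuredRunId, queryCode, COUNT(resultId) AS c
            FROM  currentGoldMarkings
            GROUP BY  measuredRunId, queryCode
      ) AS accuracy  ON accuracy.measuredRunId = goldedRunQueries.runId
      AND  accuracy.queryCode = goldedRunQueries.queryCode
    WHERE  goldedRunQueries.queryCode IN ('CH001', 'CH002', 'CH003')
      AND  goldedRunQueries.runId = 5000;

-- Shows the statement as the optimizer rewrote it, with the views expanded.
SHOW WARNINGS;
```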

Which version are you running? You effectively have FROM ( SELECT ... ) JOIN ( SELECT ... ). Before 5.6, neither subquery would have an index; in 5.6, an index is generated on the fly.
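A quick way to answer the version question, and to see which optimizer features the server exposes, might be the following. Both statements are standard MySQL; the note about the `derived_merge` flag assumes 5.7 or later.

```sql
-- Report the server version, e.g. a 5.5.x, 5.6.x or 5.7.x string.
SELECT VERSION();

-- List the optimizer flags. On 5.7 and later the list should include
-- derived_merge, which controls whether derived tables and views
-- are merged into the outer query instead of being materialized.
SHOW VARIABLES LIKE 'optimizer_switch';
```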

It is a shame the query is built this way, since you already know which run you want: AND goldedRunQueries.runid = 5000

Bottom line: add the indexes; upgrade to 5.6 or 5.7; if that is not enough, rethink the use of VIEWs.
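One possible reading of "rethink the use of VIEWs" is sketched below. It is not taken from the answer itself. It assumes it is acceptable to copy the body of currentGoldMarkings into the derived table for this one query, so that the runId and queryCode filters can be written inside the subquery and applied before the GROUP BY. The aliases `grq` and `result` are only for readability.

```sql
-- Sketch: the view body of currentGoldMarkings is inlined so the filters
-- on runId and queryCode sit inside the derived table.
SELECT  grq.queryCode, grq.runId
    FROM  goldedRunQueries AS grq
    LEFT OUTER JOIN
      ( SELECT  result.runId AS measuredRunId, result.queryCode,
                COUNT(result.resultId) AS c
            FROM  results AS result
            INNER JOIN  gold  ON gold.documentId = result.documentId
              AND  gold.queryCode = result.queryCode
            WHERE  gold.isCurrent = 1
              -- Same conditions as the outer WHERE, repeated here on purpose.
              AND  result.runId = 5000
              AND  result.queryCode IN ('CH001', 'CH002', 'CH003')
            GROUP BY  result.runId, result.queryCode
      ) AS accuracy  ON accuracy.measuredRunId = grq.runId
      AND  accuracy.queryCode = grq.queryCode
    WHERE  grq.queryCode IN ('CH001', 'CH002', 'CH003')
      AND  grq.runId = 5000
    ORDER BY  grq.runId DESC, grq.queryCode;
```

Because the join conditions already tie the derived table's measuredRunId and queryCode to the filtered outer columns, the extra inner filters should not change the result. The trade-off is that the view definition is duplicated here and has to be kept in sync by hand.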

This question and answer, "mysql - Applying the outer query's conditions to a materialized subquery", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/38342571/
