Given a table like the one below, containing a list of names, tasks, task priorities, and task statuses:
mysql> select * from test;
+----+------+--------+----------+--------+
| id | name | task   | priority | status |
+----+------+--------+----------+--------+
|  1 | bob  | start  |        1 | done   |
|  2 | bob  | work   |        2 | NULL   |
|  3 | bob  | finish |        3 | NULL   |
|  4 | jim  | start  |        1 | done   |
|  5 | jim  | work   |        2 | done   |
|  6 | jim  | finish |        3 | NULL   |
|  7 | mike | start  |        1 | done   |
|  8 | mike | work   |        2 | failed |
|  9 | mike | finish |        3 | NULL   |
| 10 | joan | start  |        1 | NULL   |
| 11 | joan | work   |        2 | NULL   |
| 12 | joan | finish |        3 | NULL   |
+----+------+--------+----------+--------+
12 rows in set (0.00 sec)
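I want to build a query that returns only the next task to run for each name. Specifically, for each person I want the row with the lowest numeric priority whose status is NULL.
But there is a catch: I only want that row returned if every preceding task has a status of 'done'.
Given the table above and this query logic, the final result should look like this: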
+----+------+--------+----------+--------+
| id | name | task   | priority | status |
+----+------+--------+----------+--------+
|  2 | bob  | work   |        2 | NULL   |
|  6 | jim  | finish |        3 | NULL   |
+----+------+--------+----------+--------+
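As a cross-check of the rule just described, here is a minimal pure-Python sketch (not part of the question; ROWS and next_tasks are hypothetical names) that derives the expected output from the sample rows. It bakes in one interpretation that is inferred from the expected output rather than stated outright: a pending task with no predecessors, such as joan's start, is not returned.

```python
from itertools import groupby

# Sample rows from the question: (id, name, task, priority, status); None stands for NULL.
ROWS = [
    (1, "bob", "start", 1, "done"),
    (2, "bob", "work", 2, None),
    (3, "bob", "finish", 3, None),
    (4, "jim", "start", 1, "done"),
    (5, "jim", "work", 2, "done"),
    (6, "jim", "finish", 3, None),
    (7, "mike", "start", 1, "done"),
    (8, "mike", "work", 2, "failed"),
    (9, "mike", "finish", 3, None),
    (10, "joan", "start", 1, None),
    (11, "joan", "work", 2, None),
    (12, "joan", "finish", 3, None),
]


def next_tasks(rows):
    """Per name, return the lowest-priority NULL-status row, but only if it has
    at least one preceding (lower-priority) row and all of those are 'done'."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[1], r[3]))  # by name, then priority
    for _name, group in groupby(ordered, key=lambda r: r[1]):
        tasks = list(group)
        # Index of the first task (by priority) that has not been run yet.
        pending = next((i for i, r in enumerate(tasks) if r[4] is None), None)
        if pending is None or pending == 0:
            # Nothing pending, or the pending task has no predecessors
            # (joan's case): the expected output leaves these out.
            continue
        if all(r[4] == "done" for r in tasks[:pending]):
            result.append(tasks[pending])
    return sorted(result)  # sort by id for a stable order


print(next_tasks(ROWS))
```

Mike is excluded because his work task failed, and joan because her first task is still pending, which leaves the bob/work and jim/finish rows.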
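Originally this was done with a whole pile of subqueries and derived tables, which was extremely inefficient and slow. I sped it up considerably by using a couple of temporary tables to get the result I want.
In the real world this will run against a table of roughly 200k records, and multiple servers will execute the query many times per minute. My current solution takes about 2 seconds to run, which simply won't work.
Here is the DDL/DML to get the sample data: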
DROP TABLE IF EXISTS `test`;
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(20) DEFAULT NULL,
`task` varchar(20) DEFAULT NULL,
`priority` int(11) DEFAULT NULL,
`status` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `test` VALUES
(1,'bob','start',1,'done'),
(2,'bob','work',2,NULL),
(3,'bob','finish',3,NULL),
(4,'jim','start',1,'done'),
(5,'jim','work',2,'done'),
(6,'jim','finish',3,NULL),
(7,'mike','start',1,'done'),
(8,'mike','work',2,'failed'),
(9,'mike','finish',3,NULL),
(10,'joan','start',1,NULL),
(11,'joan','work',2,NULL),
(12,'joan','finish',3,NULL);
Here is what I'm currently doing to get the desired result (it works, but it's slow):
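-- Step 1: for each name, the lowest priority that is still pending (status NULL).
drop temporary table if exists tmp1;
create temporary table tmp1 as
select
    name,
    min(priority) as priority
from test t
where status is null
group by name;
create index idx_pri on tmp1(priority);
create index idx_name on tmp1(name);

-- Step 2: keep only names whose earlier (lower-priority) tasks are all 'done'.
-- Grouping on both qualified tmp1 columns avoids an ambiguous `name`
-- (it exists in both tables) and stays valid under ONLY_FULL_GROUP_BY.
drop temporary table if exists tmp2;
create temporary table tmp2 as
select tmp.name, tmp.priority
from test t
join tmp1 tmp
    on t.name = tmp.name
    and t.priority < tmp.priority
group by tmp.name, tmp.priority
having sum(
    case when t.status = 'done'
        then 0
        else 1
    end
) = 0;
create index idx_pri on tmp2(priority);
create index idx_name on tmp2(name);

-- Step 3: fetch the full task row for each surviving (name, priority) pair.
select
    t.*
from test t
join tmp2 t2
    on t.name = t2.name
    and t.priority = t2.priority;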
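The three steps above can be replayed end to end in a local sandbox that accepts DDL. This sketch rests on several assumptions: it uses Python's built-in sqlite3 module instead of MySQL, adapts the CREATE TABLE to SQLite types, and omits the secondary indexes (SQLite index names are unique per schema, so the repeated idx_pri/idx_name names would clash in the temp schema). SETUP, PIPELINE, FINAL and run_temp_table_pipeline are hypothetical names.

```python
import sqlite3

# Sample data from the question, with the MySQL DDL adapted to SQLite types.
SETUP = """
CREATE TABLE test (
    id       INTEGER PRIMARY KEY,
    name     TEXT,
    task     TEXT,
    priority INTEGER,
    status   TEXT
);
INSERT INTO test VALUES
    (1,'bob','start',1,'done'),  (2,'bob','work',2,NULL),      (3,'bob','finish',3,NULL),
    (4,'jim','start',1,'done'),  (5,'jim','work',2,'done'),    (6,'jim','finish',3,NULL),
    (7,'mike','start',1,'done'), (8,'mike','work',2,'failed'), (9,'mike','finish',3,NULL),
    (10,'joan','start',1,NULL),  (11,'joan','work',2,NULL),    (12,'joan','finish',3,NULL);
"""

# Steps 1 and 2 of the temp-table approach; secondary indexes are left out.
PIPELINE = """
CREATE TEMP TABLE tmp1 AS
SELECT name, MIN(priority) AS priority
FROM test
WHERE status IS NULL
GROUP BY name;

CREATE TEMP TABLE tmp2 AS
SELECT tmp.name, tmp.priority
FROM test t
JOIN tmp1 tmp ON t.name = tmp.name AND t.priority < tmp.priority
GROUP BY tmp.name, tmp.priority
HAVING SUM(CASE WHEN t.status = 'done' THEN 0 ELSE 1 END) = 0;
"""

# Step 3: join back to fetch the full rows (ORDER BY added for a stable order).
FINAL = """
SELECT t.*
FROM test t
JOIN tmp2 t2 ON t.name = t2.name AND t.priority = t2.priority
ORDER BY t.id;
"""


def run_temp_table_pipeline():
    """Build the sample table in memory, run the temp-table steps, return the rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        conn.executescript(PIPELINE)  # executescript runs several DDL statements at once
        return conn.execute(FINAL).fetchall()
    finally:
        conn.close()


print(run_temp_table_pipeline())
```

On the sample data this returns the bob/work and jim/finish rows, matching the expected output; it says nothing about MySQL timing, only about correctness.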
I also have the DDL/DML on SQL Fiddle, but I can't put my solution there, because creating these temporary tables is technically DDL and the site doesn't allow DDL in the query box. http://sqlfiddle.com/#!2/2d9e2/1
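Please help me come up with a better way to do this. I'm open to modifying the schema or the logic to accommodate a solution, as long as that solution is efficient.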
Best answer
You can translate the logic directly into a query like this:
select t.*
from test t
where t.status is null and
      not exists (select 1
                  from test t2
                  where t2.name = t.name and
                        t2.id < t.id and
                        (t2.status <> 'done' or
                         t2.status is null
                        )
                 ) and
      exists (select 1
              from test t2
              where t2.name = t.name and
                    t2.id < t.id and
                    t2.status = 'done'
             );
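A quick way to confirm this query against the sample data is to run it in Python's built-in sqlite3 module. Assumptions: SQLite stands in for MySQL, the schema is adapted to SQLite types, and ORDER BY t.id is added only to make the output order deterministic. SETUP, ANSWER_QUERY and run_answer_query are hypothetical names.

```python
import sqlite3

# Sample data from the question, with the MySQL DDL adapted to SQLite types.
SETUP = """
CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, task TEXT,
                   priority INTEGER, status TEXT);
INSERT INTO test VALUES
    (1,'bob','start',1,'done'),  (2,'bob','work',2,NULL),      (3,'bob','finish',3,NULL),
    (4,'jim','start',1,'done'),  (5,'jim','work',2,'done'),    (6,'jim','finish',3,NULL),
    (7,'mike','start',1,'done'), (8,'mike','work',2,'failed'), (9,'mike','finish',3,NULL),
    (10,'joan','start',1,NULL),  (11,'joan','work',2,NULL),    (12,'joan','finish',3,NULL);
"""

# The answer's query, unchanged except for ORDER BY t.id (added for a stable order).
ANSWER_QUERY = """
SELECT t.*
FROM test t
WHERE t.status IS NULL
  AND NOT EXISTS (SELECT 1
                  FROM test t2
                  WHERE t2.name = t.name
                    AND t2.id < t.id
                    AND (t2.status <> 'done' OR t2.status IS NULL))
  AND EXISTS (SELECT 1
              FROM test t2
              WHERE t2.name = t.name
                AND t2.id < t.id
                AND t2.status = 'done')
ORDER BY t.id;
"""


def run_answer_query():
    """Load the sample rows into an in-memory SQLite database and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(ANSWER_QUERY).fetchall()
    finally:
        conn.close()


print(run_answer_query())
```

The NOT EXISTS clause rejects any row with an earlier non-'done' task (the explicit IS NULL test is needed because NULL <> 'done' is not true), and the EXISTS clause is what drops joan's first task, which has no predecessors at all.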
To improve performance, create an index on test(name, id, status).
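One caveat worth checking: the query orders predecessors by id, which matches priority order only because ids happen to increase with priority inside each name in the sample data. If that is not guaranteed in the real table, comparing on priority instead, with the index changed to (name, priority, status), is a natural adjustment; this variant is an assumption, not part of the answer. The SQLite sketch below adds two hypothetical 'ann' rows (not in the question) whose ids run opposite to priority, so the two orderings disagree. SETUP, QUERY_TEMPLATE and next_task_ids are hypothetical names.

```python
import sqlite3

# Sample rows from the question plus two hypothetical 'ann' rows (not in the
# question) whose ids run opposite to their priorities.
SETUP = """
CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, task TEXT,
                   priority INTEGER, status TEXT);
INSERT INTO test VALUES
    (1,'bob','start',1,'done'),  (2,'bob','work',2,NULL),      (3,'bob','finish',3,NULL),
    (4,'jim','start',1,'done'),  (5,'jim','work',2,'done'),    (6,'jim','finish',3,NULL),
    (7,'mike','start',1,'done'), (8,'mike','work',2,'failed'), (9,'mike','finish',3,NULL),
    (10,'joan','start',1,NULL),  (11,'joan','work',2,NULL),    (12,'joan','finish',3,NULL),
    (13,'ann','finish',2,NULL),  (14,'ann','start',1,'done');
-- Index shaped for the priority-keyed lookups: name equality, then priority range.
CREATE INDEX idx_test_name_priority_status ON test(name, priority, status);
"""

# Same NOT EXISTS / EXISTS shape as the answer; {col} picks the ordering column.
QUERY_TEMPLATE = """
SELECT t.id
FROM test t
WHERE t.status IS NULL
  AND NOT EXISTS (SELECT 1 FROM test t2
                  WHERE t2.name = t.name AND t2.{col} < t.{col}
                    AND (t2.status <> 'done' OR t2.status IS NULL))
  AND EXISTS (SELECT 1 FROM test t2
              WHERE t2.name = t.name AND t2.{col} < t.{col}
                AND t2.status = 'done')
ORDER BY t.id;
"""


def next_task_ids(order_column):
    """Return the ids of the next runnable tasks, ordering predecessors by order_column."""
    if order_column not in ("id", "priority"):
        # Only known column names are ever interpolated into the SQL text.
        raise ValueError("order_column must be 'id' or 'priority'")
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        sql = QUERY_TEMPLATE.format(col=order_column)
        return [row[0] for row in conn.execute(sql)]
    finally:
        conn.close()


print(next_task_ids("id"))        # [2, 6]: ann's finish has no lower id, so it is skipped
print(next_task_ids("priority"))  # [2, 6, 13]
```

The index choice follows the same reasoning as the answer's test(name, id, status): each subquery should be able to seek on name, range-scan the ordering column, and read status from the index without touching the table rows.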
Here is a SQL Fiddle.
For "Mysql query: returning rows where all preceding rows in group match a condition", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24223245/