MySQL query: returning rows where all preceding rows in group match a condition

Given a table like the following, which holds a list of names, tasks, task priorities and task statuses:

mysql> select * from test;
+----+------+--------+----------+--------+
| id | name | task   | priority | status |
+----+------+--------+----------+--------+
|  1 | bob  | start  |        1 | done   |
|  2 | bob  | work   |        2 | NULL   |
|  3 | bob  | finish |        3 | NULL   |
|  4 | jim  | start  |        1 | done   |
|  5 | jim  | work   |        2 | done   |
|  6 | jim  | finish |        3 | NULL   |
|  7 | mike | start  |        1 | done   |
|  8 | mike | work   |        2 | failed |
|  9 | mike | finish |        3 | NULL   |
| 10 | joan | start  |        1 | NULL   |
| 11 | joan | work   |        2 | NULL   |
| 12 | joan | finish |        3 | NULL   |
+----+------+--------+----------+--------+
12 rows in set (0.00 sec)

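I want to build a query that returns only the next task to run for each name. Specifically, for each person I want the row with the lowest numeric priority whose status is NULL.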

But there is a catch: I only want to return that row if every preceding task has a status of 'done'.

Given the table and query logic above, the final result of the query should look like this:

+----+------+--------+----------+--------+
| id | name | task   | priority | status |
+----+------+--------+----------+--------+
|  2 | bob  | work   |        2 | NULL   |
|  6 | jim  | finish |        3 | NULL   |
+----+------+--------+----------+--------+
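The following minimal plain-Python sketch applies the rule to the sample rows. It is only meant to pin down the specification and does not replace the query. It makes one assumption explicit. A name whose first NULL-status task has no preceding tasks (joan) is excluded, because the expected output above excludes her, even though "all preceding tasks are done" is vacuously true in her case. The function name `next_tasks` is hypothetical.

```python
from collections import defaultdict

# Sample rows copied from the question: (id, name, task, priority, status)
ROWS = [
    (1, "bob", "start", 1, "done"),
    (2, "bob", "work", 2, None),
    (3, "bob", "finish", 3, None),
    (4, "jim", "start", 1, "done"),
    (5, "jim", "work", 2, "done"),
    (6, "jim", "finish", 3, None),
    (7, "mike", "start", 1, "done"),
    (8, "mike", "work", 2, "failed"),
    (9, "mike", "finish", 3, None),
    (10, "joan", "start", 1, None),
    (11, "joan", "work", 2, None),
    (12, "joan", "finish", 3, None),
]


def next_tasks(rows):
    """Per name, return the lowest-priority task whose status is None,
    but only if it has at least one preceding task and all of them are 'done'.

    Assumption (hypothetical helper): names with no preceding tasks are
    skipped, matching the expected output in the question.
    """
    by_name = defaultdict(list)
    for row in rows:
        by_name[row[1]].append(row)

    result = []
    for tasks in by_name.values():
        tasks.sort(key=lambda r: r[3])  # order each person's tasks by priority
        for i, row in enumerate(tasks):
            if row[4] is None:
                preceding = tasks[:i]
                if preceding and all(r[4] == "done" for r in preceding):
                    result.append(row)
                break  # only the first NULL-status task is ever a candidate
    return sorted(result)  # sort by id for stable output


print(next_tasks(ROWS))
# → [(2, 'bob', 'work', 2, None), (6, 'jim', 'finish', 3, None)]
```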

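Originally this was done with a whole pile of subqueries and derived tables, which was extremely inefficient and slow. Using a couple of temporary tables to get the result I want sped it up considerably.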

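In production this will run against a table of roughly 200k records, and several servers will execute the query multiple times per minute. My current solution takes about 2 seconds to run, which simply won't work.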

Here is the DDL/DML for the sample data:

DROP TABLE IF EXISTS `test`;
CREATE TABLE `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) DEFAULT NULL,
  `task` varchar(20) DEFAULT NULL,
  `priority` int(11) DEFAULT NULL,
  `status` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

INSERT INTO `test` VALUES 
(1,'bob','start',1,'done'),
(2,'bob','work',2,NULL),
(3,'bob','finish',3,NULL),
(4,'jim','start',1,'done'),
(5,'jim','work',2,'done'),
(6,'jim','finish',3,NULL),
(7,'mike','start',1,'done'),
(8,'mike','work',2,'failed'),
(9,'mike','finish',3,NULL),
(10,'joan','start',1,NULL),
(11,'joan','work',2,NULL),
(12,'joan','finish',3,NULL);

Here is what I am currently doing to get the desired result (it works, but it is slow):

-- Step 1: for each name, the lowest priority whose status is NULL
drop table if exists tmp1;
create temporary table tmp1 as 
select 
    name, 
    min(priority) as priority 
from test t 
where status is null 
group by name;
create index idx_pri on tmp1(priority);
create index idx_name on tmp1(name);

-- Step 2: keep only names whose lower-priority tasks are all 'done'
-- (grouping by both tmp columns keeps this valid under ONLY_FULL_GROUP_BY)
drop table if exists tmp2;
create temporary table tmp2 as 
select tmp.* 
from test t 
join tmp1 tmp 
    on t.name = tmp.name 
    and t.priority < tmp.priority 
group by tmp.name, tmp.priority
having sum(
    case when t.status = 'done' 
    then 0 
    else 1 
    end
) = 0;
create index idx_pri on tmp2(priority);
create index idx_name on tmp2(name);

-- Step 3: fetch the full row for each surviving (name, priority)
select 
    t.*
from test t 
join tmp2 t2
    on t.name = t2.name
    and t.priority = t2.priority;

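I also have the DDL/DML in SQL Fiddle. I can't put my solution there, though, because creating these temporary tables is technically DDL, and SQL Fiddle doesn't allow DDL in the query box. http://sqlfiddle.com/#!2/2d9e2/1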
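The query box rejects DDL, so one workaround is to fold tmp1 and tmp2 into derived tables. The whole thing then becomes a single SELECT, which SQL Fiddle should accept. The sketch below does this and checks the result with Python's built-in `sqlite3` module. SQLite serves here only as a stand-in for MySQL, so this says nothing about how the statement performs on MySQL with 200k rows. The variable names `SETUP` and `QUERY` are illustrative.

```python
import sqlite3

# Sample data from the question, loaded into SQLite as a stand-in for MySQL.
SETUP = """
CREATE TABLE test (
    id INTEGER PRIMARY KEY,
    name VARCHAR(20),
    task VARCHAR(20),
    priority INT,
    status VARCHAR(20)
);
INSERT INTO test VALUES
(1,'bob','start',1,'done'),
(2,'bob','work',2,NULL),
(3,'bob','finish',3,NULL),
(4,'jim','start',1,'done'),
(5,'jim','work',2,'done'),
(6,'jim','finish',3,NULL),
(7,'mike','start',1,'done'),
(8,'mike','work',2,'failed'),
(9,'mike','finish',3,NULL),
(10,'joan','start',1,NULL),
(11,'joan','work',2,NULL),
(12,'joan','finish',3,NULL);
"""

# tmp1 and tmp2 from the question, rewritten as nested derived tables.
QUERY = """
SELECT t.*
FROM test t
JOIN (
    SELECT n.name, n.priority
    FROM (
        -- tmp1: lowest NULL-status priority per name
        SELECT name, MIN(priority) AS priority
        FROM test
        WHERE status IS NULL
        GROUP BY name
    ) n
    JOIN test p
      ON p.name = n.name
     AND p.priority < n.priority
    GROUP BY n.name, n.priority
    -- tmp2: keep names whose preceding tasks are all 'done'
    HAVING SUM(CASE WHEN p.status = 'done' THEN 0 ELSE 1 END) = 0
) t2
  ON t.name = t2.name
 AND t.priority = t2.priority
ORDER BY t.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
print(rows)
# → [(2, 'bob', 'work', 2, None), (6, 'jim', 'finish', 3, None)]
```

The inner join to `p` is what drops joan. Her lowest NULL priority is 1, so no earlier rows exist to join, which matches the expected output.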

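Please help me come up with a better way to do this. I'm open to changing the schema or the logic to accommodate an out-of-the-box solution, as long as that solution is effective.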

Best answer

You can translate the logic directly into a query like this:

select t.*
from test t 
where t.status is null and
      not exists (select 1
                  from test t2
                  where t2.name = t.name and
                        t2.id < t.id and
                        (t2.status <> 'done' or
                         t2.status is null
                        )
                 ) and
      exists (select 1
              from test t2
              where t2.name = t.name and
                    t2.id < t.id and
                    t2.status = 'done'
             );

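To improve performance, create an index on test(name, id, status).

Note that this query orders tasks by id rather than by priority. That works for the sample data, where ids increase with priority within each name. If that does not hold in the real table, compare t2.priority < t.priority instead and index test(name, priority, status). The second EXISTS requires at least one preceding 'done' task. That is what excludes joan, whose first task has no predecessors, from the expected output.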
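As a quick sanity check, the sketch below loads the sample data into an in-memory SQLite database and creates the suggested index under a hypothetical name. It then runs the answer's query unchanged apart from an added ORDER BY. SQLite is assumed to be a close enough stand-in for MySQL for this plain SQL.

```python
import sqlite3

# Sample data from the question (SQLite stands in for MySQL here).
SETUP = """
CREATE TABLE test (
    id INTEGER PRIMARY KEY,
    name VARCHAR(20),
    task VARCHAR(20),
    priority INT,
    status VARCHAR(20)
);
INSERT INTO test VALUES
(1,'bob','start',1,'done'),
(2,'bob','work',2,NULL),
(3,'bob','finish',3,NULL),
(4,'jim','start',1,'done'),
(5,'jim','work',2,'done'),
(6,'jim','finish',3,NULL),
(7,'mike','start',1,'done'),
(8,'mike','work',2,'failed'),
(9,'mike','finish',3,NULL),
(10,'joan','start',1,NULL),
(11,'joan','work',2,NULL),
(12,'joan','finish',3,NULL);
"""

# The answer's query; ORDER BY added only to make the output deterministic.
QUERY = """
SELECT t.*
FROM test t
WHERE t.status IS NULL
  AND NOT EXISTS (SELECT 1 FROM test t2
                  WHERE t2.name = t.name
                    AND t2.id < t.id
                    AND (t2.status <> 'done' OR t2.status IS NULL))
  AND EXISTS (SELECT 1 FROM test t2
              WHERE t2.name = t.name
                AND t2.id < t.id
                AND t2.status = 'done')
ORDER BY t.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
# Suggested composite index; the name idx_test_name_id_status is hypothetical.
conn.execute("CREATE INDEX idx_test_name_id_status ON test (name, id, status)")
rows = conn.execute(QUERY).fetchall()
print(rows)
# → [(2, 'bob', 'work', 2, None), (6, 'jim', 'finish', 3, None)]
```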

Here is a SQL Fiddle.
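On MySQL 8.0 or later, window functions offer another route, which the answer above does not use. For each name, in priority order, count how many earlier tasks exist and how many of them are not 'done'. A row qualifies when its status is NULL, it has at least one earlier task, and every one of those earlier tasks is 'done'. A NULL row whose predecessors are all 'done' is necessarily the first NULL row for that name, so no separate MIN(priority) step is needed. This sketch assumes priorities are unique within each name, since ties would make the ROWS frame order arbitrary. It is checked here with SQLite 3.25+, which accepts the same window syntax. Whether it beats the NOT EXISTS version on the real table would need measuring. The aliases `n_before` and `not_done_before` are hypothetical.

```python
import sqlite3

# Sample data from the question (SQLite 3.25+ stands in for MySQL 8.0).
SETUP = """
CREATE TABLE test (
    id INTEGER PRIMARY KEY,
    name VARCHAR(20),
    task VARCHAR(20),
    priority INT,
    status VARCHAR(20)
);
INSERT INTO test VALUES
(1,'bob','start',1,'done'),
(2,'bob','work',2,NULL),
(3,'bob','finish',3,NULL),
(4,'jim','start',1,'done'),
(5,'jim','work',2,'done'),
(6,'jim','finish',3,NULL),
(7,'mike','start',1,'done'),
(8,'mike','work',2,'failed'),
(9,'mike','finish',3,NULL),
(10,'joan','start',1,NULL),
(11,'joan','work',2,NULL),
(12,'joan','finish',3,NULL);
"""

QUERY = """
SELECT id, name, task, priority, status
FROM (
    SELECT t.*,
           -- number of earlier tasks for this name (0 for the first task)
           COUNT(*) OVER (PARTITION BY name ORDER BY priority
                          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
               AS n_before,
           -- earlier tasks that are not 'done' (NULL counts as not done)
           SUM(CASE WHEN status = 'done' THEN 0 ELSE 1 END)
               OVER (PARTITION BY name ORDER BY priority
                     ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
               AS not_done_before
    FROM test t
) x
WHERE status IS NULL
  AND n_before > 0
  AND not_done_before = 0
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
print(rows)
# → [(2, 'bob', 'work', 2, None), (6, 'jim', 'finish', 3, None)]
```

`n_before > 0` plays the same role as the answer's second EXISTS: it drops joan, whose first task has an empty frame.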

Related Stack Overflow question: https://stackoverflow.com/questions/24223245/
