Consider a job run history table with the following schema:
job_runs
(
    run_id integer not null, -- identifier of the run
    job_id integer not null, -- identifier of the job
    run_number integer not null, -- job run number, run numbers increment for each job
    status text not null, -- status of the run (running, completed, killed, ...)
    primary key (run_id)
    -- ...
)
For each job, I need to fetch the last 10 runs with status != 'running' (jobs are distinguished by job_id). To do that, I wrote the following query:
SELECT
    *
FROM
    job_runs AS JR1
WHERE
    JR1.run_number IN
    (
        SELECT
            JR2.run_number
        FROM
            job_runs AS JR2
        WHERE
            JR2.job_id = JR1.job_id
            AND
            JR2.status != 'running'
        ORDER BY
            JR2.run_number
        DESC
        LIMIT
            10
    )
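It does what I need, but the query is slow even though the job_runs table has a multi-column index on the job_id and run_number fields, because it scans the job_runs table and runs the subquery once for every row. The index lets each subquery execution finish quickly, but the fact that the whole table is scanned, with the subquery run again for each row, kills performance. So how can I tune the performance of this query?
Some thoughts:
The number of jobs (distinct job_id values) is small. If SQLite had a FOR loop, it would be easy to iterate over all distinct job_id values, run the subquery with each job ID passed in place of JR1.job_id, and then UNION all the results.
Important: Please do not suggest running the loop in my application's source code. I need a pure SQL solution.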
Best answer
You can further improve the performance of the subquery by creating a covering index for it:
CREATE INDEX xxx ON job_runs(job_id, run_number, status);
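
One way to check whether SQLite actually uses such an index for the subquery is EXPLAIN QUERY PLAN. The following is only a sketch: it assumes the index is named xxx as above, uses a made-up job ID (1) in place of the correlated JR1.job_id, and recreates the table from the question's schema so that it can run in a scratch database. The exact wording of the plan output differs between SQLite versions.

```sql
-- Schema as given in the question (the elided columns are left out).
CREATE TABLE IF NOT EXISTS job_runs (
    run_id     integer not null,
    job_id     integer not null,
    run_number integer not null,
    status     text    not null,
    primary key (run_id)
);

CREATE INDEX IF NOT EXISTS xxx ON job_runs(job_id, run_number, status);

-- Hypothetical check: job_id = 1 stands in for the correlated JR1.job_id.
EXPLAIN QUERY PLAN
SELECT run_number
FROM job_runs
WHERE job_id = 1
  AND status != 'running'
ORDER BY run_number DESC
LIMIT 10;
```

If the reported plan mentions a covering index, the subquery is being answered from the index alone, without reading the table rows.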
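However, the biggest performance problem is that the subquery is executed for every row, although it only needs to run once for each unique job ID.
So, first, get just the unique job IDs: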
SELECT DISTINCT job_id
FROM job_runs
Then, for each of these IDs, determine the tenth-largest run number:
SELECT job_id,
       (SELECT run_number
        FROM job_runs
        WHERE job_id = job_ids.job_id
          AND status != 'running'
        ORDER BY run_number DESC
        LIMIT 1 OFFSET 9
       ) AS first_run_number
FROM (SELECT DISTINCT job_id
      FROM job_runs) AS job_ids
However, if a job has fewer than ten non-running runs, the subquery returns NULL, so let's replace that with a small number so that the later comparison (run_number >= first_run_number) works:
SELECT job_id,
       IFNULL((SELECT run_number
               FROM job_runs
               WHERE job_id = job_ids.job_id
                 AND status != 'running'
               ORDER BY run_number DESC
               LIMIT 1 OFFSET 9
              ), -1) AS first_run_number
FROM (SELECT DISTINCT job_id
      FROM job_runs) AS job_ids
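
To see what this threshold looks like on concrete data, here is a small sketch meant for an empty scratch database. The sample rows are made up for illustration: job 1 has 12 finished runs plus one that is still running, and job 2 has only 3 finished runs.

```sql
-- Schema as given in the question (the elided columns are left out).
CREATE TABLE IF NOT EXISTS job_runs (
    run_id     integer not null,
    job_id     integer not null,
    run_number integer not null,
    status     text    not null,
    primary key (run_id)
);

-- Hypothetical data for job 1: runs 1..12 completed, run 13 still running.
WITH RECURSIVE n(i) AS (
    SELECT 1
    UNION ALL
    SELECT i + 1 FROM n WHERE i < 13
)
INSERT INTO job_runs (run_id, job_id, run_number, status)
SELECT i, 1, i, CASE WHEN i = 13 THEN 'running' ELSE 'completed' END
FROM n;

-- Hypothetical data for job 2: only three finished runs.
INSERT INTO job_runs (run_id, job_id, run_number, status) VALUES
    (101, 2, 1, 'completed'),
    (102, 2, 2, 'killed'),
    (103, 2, 3, 'completed');

-- The threshold query from the answer, unchanged.
SELECT job_id,
       IFNULL((SELECT run_number
               FROM job_runs
               WHERE job_id = job_ids.job_id
                 AND status != 'running'
               ORDER BY run_number DESC
               LIMIT 1 OFFSET 9
              ), -1) AS first_run_number
FROM (SELECT DISTINCT job_id
      FROM job_runs) AS job_ids;
```

With this data, job 1 should get first_run_number 3 (runs 3 through 12 are its ten most recent finished runs), and job 2 should get -1 because it has fewer than ten. The order of the two result rows is not guaranteed.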
So now we have the first interesting run for each job.
Finally, join these values back to the original table:
SELECT job_runs.*
FROM job_runs
JOIN (SELECT job_id,
             IFNULL((SELECT run_number
                     FROM job_runs
                     WHERE job_id = job_ids.job_id
                       AND status != 'running'
                     ORDER BY run_number DESC
                     LIMIT 1 OFFSET 9
                    ), -1) AS first_run_number
      FROM (SELECT DISTINCT job_id
            FROM job_runs) AS job_ids
     ) AS firsts
  ON job_runs.job_id = firsts.job_id
 AND job_runs.run_number >= firsts.first_run_number
-- Repeat the status filter here; otherwise running rows at or above
-- the threshold would be returned as well.
WHERE job_runs.status != 'running';
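
As a point of comparison, and not part of the answer above, the same result can also be expressed with a window function. This is an alternative sketch that assumes SQLite 3.25.0 or later, where window functions are available. The names ranked and rn are made up for illustration. Whether it is faster than the join above would need to be measured on real data.

```sql
-- Schema as given in the question (the elided columns are left out).
CREATE TABLE IF NOT EXISTS job_runs (
    run_id     integer not null,
    job_id     integer not null,
    run_number integer not null,
    status     text    not null,
    primary key (run_id)
);

-- rn numbers each job's non-running runs from newest (1) to oldest,
-- so keeping rn <= 10 yields the last 10 finished runs per job.
SELECT run_id, job_id, run_number, status
FROM (SELECT job_runs.*,
             ROW_NUMBER() OVER (PARTITION BY job_id
                                ORDER BY run_number DESC) AS rn
      FROM job_runs
      WHERE status != 'running') AS ranked
WHERE rn <= 10;
```

The outer SELECT lists only the four columns shown in the question's schema; any further columns of the real table would have to be added to that list.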
Regarding performance - an efficient way to get the last 10 rows of each group in SQLite: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33483924/