sql - Optimizing multiple aggregates in a SELECT

Tags: sql optimization aggregation correlated-subquery

I read in the Microsoft T-SQL Performance Tuning whitepaper that correlated subqueries can be expensive on large tables:

...Compare this to the first solution that would scan the whole table and execute a correlated subquery for every row. The difference in performance is negligible on a small table. But on a large table it may amount to hours of processing time...

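Is there a general way to take a query that computes several aggregates, each under a different condition, as correlated subqueries and convert it into a single query that uses a JOIN instead?

Consider an example:

Schema setup: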

CREATE TABLE Student (
    ID INT NOT NULL PRIMARY KEY IDENTITY(1,1),
    Name NVARCHAR(255) NOT NULL
);

CREATE TABLE Grade (
    ID INT NOT NULL PRIMARY KEY IDENTITY(1,1),
    StudentID INT NOT NULL FOREIGN KEY REFERENCES Student(ID),
    Score INT NOT NULL,
    CONSTRAINT CK_Grade_Score CHECK (Score >= 0 AND Score <= 100)
);

INSERT INTO Student (Name) VALUES ('Steven');
INSERT INTO Student (Name) VALUES ('Timmy');
INSERT INTO Student (Name) VALUES ('Maria');
 
INSERT INTO Grade (StudentID, Score) VALUES (1, 90);
INSERT INTO Grade (StudentID, Score) VALUES (1, 81);
INSERT INTO Grade (StudentID, Score) VALUES (1, 82);
INSERT INTO Grade (StudentID, Score) VALUES (1, 82);

INSERT INTO Grade (StudentID, Score) VALUES (2, 99);
INSERT INTO Grade (StudentID, Score) VALUES (2, 63);
INSERT INTO Grade (StudentID, Score) VALUES (2, 97);
INSERT INTO Grade (StudentID, Score) VALUES (2, 90);

INSERT INTO Grade (StudentID, Score) VALUES (3, 66);
INSERT INTO Grade (StudentID, Score) VALUES (3, 61);
INSERT INTO Grade (StudentID, Score) VALUES (3, 60);

The correlated query:

SELECT Name,
    (SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score < 65) AS 'F',
    (SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score >= 65 AND Score < 70) AS 'D',
    (SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score >= 70 AND Score < 80) AS 'C',
    (SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score >= 80 AND Score < 90) AS 'B',
    (SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score >= 90 AND Score <= 100) AS 'A'
FROM Student

produces the following result:

Name    F     D     C     B     A
-----------------------------------------
Steven  NULL  NULL  NULL  81    90
Timmy   63    NULL  NULL  NULL  95
Maria   60    66    NULL  NULL  NULL

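I know of a technique that works with COUNT(): you run a single SELECT with a JOIN and use a CASE expression that adds 1 to a counter when the keys match across the join and the condition is true. I am looking for a similar technique that applies to other kinds of aggregates, not just COUNT.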
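The counting technique described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from the whitepaper: Python's built-in sqlite3 module and an in-memory SQLite database stand in for SQL Server so the sketch runs without a server, and the aliases FailCount and ACount are hypothetical names chosen for the example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; the CASE expressions are standard SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Grade (
    ID INTEGER PRIMARY KEY,
    StudentID INTEGER NOT NULL REFERENCES Student(ID),
    Score INTEGER NOT NULL CHECK (Score >= 0 AND Score <= 100)
);
INSERT INTO Student (Name) VALUES ('Steven'), ('Timmy'), ('Maria');
INSERT INTO Grade (StudentID, Score) VALUES
    (1, 90), (1, 81), (1, 82), (1, 82),
    (2, 99), (2, 63), (2, 97), (2, 90),
    (3, 66), (3, 61), (3, 60);
""")

# One pass over the join: each CASE contributes 1 only when its condition holds.
# FailCount and ACount are hypothetical column names.
counts = conn.execute("""
SELECT st.Name,
       SUM(CASE WHEN g.Score < 65 THEN 1 ELSE 0 END) AS FailCount,
       SUM(CASE WHEN g.Score >= 90 THEN 1 ELSE 0 END) AS ACount
FROM Student st
JOIN Grade g ON g.StudentID = st.ID
GROUP BY st.ID, st.Name
ORDER BY st.ID
""").fetchall()

for name, fail_count, a_count in counts:
    print(name, fail_count, a_count)
```

Each student is read once and every counter is filled in the same pass, which is the property the question wants to carry over to aggregates such as AVG.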

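Is there an efficient way to convert this example query to use a JOIN instead of multiple subqueries?

Best answer

Maybe I'm missing something, but the CASE approach works for other aggregates as well. Aggregates such as AVG ignore NULL, so a CASE that returns NULL for rows outside a range restricts each aggregate to its own range: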

SELECT st.Name,
       -- Each CASE returns NULL outside its range, and AVG skips NULLs
       AVG(CASE WHEN g.Score < 65 THEN g.Score ELSE NULL END) AS F,
       AVG(CASE WHEN g.Score >= 65 AND g.Score < 70 THEN g.Score ELSE NULL END) AS D,
       AVG(CASE WHEN g.Score >= 70 AND g.Score < 80 THEN g.Score ELSE NULL END) AS C,
       AVG(CASE WHEN g.Score >= 80 AND g.Score < 90 THEN g.Score ELSE NULL END) AS B,
       AVG(CASE WHEN g.Score >= 90 AND g.Score <= 100 THEN g.Score ELSE NULL END) AS A
FROM Grade g
  JOIN Student st ON g.StudentID = st.ID
GROUP BY st.Name
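
As a rough check of the claim that conditional aggregation reproduces the correlated query, the sketch below runs both forms against the question's data and compares the rows. Several things here are assumptions rather than part of the answer: SQLite (through Python's built-in sqlite3 module) stands in for SQL Server, a hypothetical student 'Nora' with no grades is added to expose the difference between JOIN and LEFT JOIN, and the query groups by st.ID as well as st.Name so that two students sharing a name would not be merged.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Grade (
    ID INTEGER PRIMARY KEY,
    StudentID INTEGER NOT NULL REFERENCES Student(ID),
    Score INTEGER NOT NULL CHECK (Score >= 0 AND Score <= 100)
);
INSERT INTO Student (Name) VALUES ('Steven'), ('Timmy'), ('Maria');
-- Hypothetical fourth student with no grades, added only for this sketch.
INSERT INTO Student (Name) VALUES ('Nora');
INSERT INTO Grade (StudentID, Score) VALUES
    (1, 90), (1, 81), (1, 82), (1, 82),
    (2, 99), (2, 63), (2, 97), (2, 90),
    (3, 66), (3, 61), (3, 60);
""")

# Original form: one correlated subquery per grade band.
CORRELATED = """
SELECT Name,
    (SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score < 65),
    (SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score >= 65 AND Score < 70),
    (SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score >= 70 AND Score < 80),
    (SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score >= 80 AND Score < 90),
    (SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score >= 90 AND Score <= 100)
FROM Student
ORDER BY Student.ID
"""

# Conditional aggregation: CASE yields NULL outside the band, and AVG skips NULLs.
# The {join} placeholder lets the same text run as an inner or a left join.
CONDITIONAL = """
SELECT st.Name,
       AVG(CASE WHEN g.Score < 65 THEN g.Score END),
       AVG(CASE WHEN g.Score >= 65 AND g.Score < 70 THEN g.Score END),
       AVG(CASE WHEN g.Score >= 70 AND g.Score < 80 THEN g.Score END),
       AVG(CASE WHEN g.Score >= 80 AND g.Score < 90 THEN g.Score END),
       AVG(CASE WHEN g.Score >= 90 AND g.Score <= 100 THEN g.Score END)
FROM Student st
{join} Grade g ON g.StudentID = st.ID
GROUP BY st.ID, st.Name
ORDER BY st.ID
"""

rows_correlated = conn.execute(CORRELATED).fetchall()
rows_inner = conn.execute(CONDITIONAL.format(join="JOIN")).fetchall()
rows_left = conn.execute(CONDITIONAL.format(join="LEFT JOIN")).fetchall()

print(rows_left == rows_correlated)          # LEFT JOIN keeps students without grades
print(len(rows_inner), len(rows_correlated)) # inner JOIN drops them
```

In this sketch the LEFT JOIN form matches the correlated query row for row, while the inner JOIN form omits the student with no grades. Note that SQLite's AVG returns a floating-point value, whereas SQL Server's AVG over an INT column returns an integer, which is why the question's output shows 81 rather than 81.67 for Steven's B average.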

Regarding sql - Optimizing multiple aggregates in a SELECT, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8002456/
