I read in the Microsoft T-SQL Performance Tuning whitepaper that correlated subqueries can be very costly on large tables:
...Compare this to the first solution that would scan the whole table and execute a correlated subquery for every row. The difference in performance is negligible on a small table. But on a large table it may amount to hours of processing time...
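Is there a general way to take a query that computes several aggregates, each under a different condition, as correlated subqueries, and convert it into a single query that uses a JOIN instead?
Consider an example.
Schema setup: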
CREATE TABLE Student (
ID INT NOT NULL PRIMARY KEY IDENTITY(1,1),
Name NVARCHAR(255) NOT NULL
);
CREATE TABLE Grade (
ID INT NOT NULL PRIMARY KEY IDENTITY(1,1),
StudentID INT NOT NULL FOREIGN KEY REFERENCES Student(ID),
Score INT NOT NULL,
CONSTRAINT CK_Grade_Score CHECK (Score >= 0 AND Score <= 100)
);
INSERT INTO Student (Name) VALUES ('Steven');
INSERT INTO Student (Name) VALUES ('Timmy');
INSERT INTO Student (Name) VALUES ('Maria');
INSERT INTO Grade (StudentID, Score) VALUES (1, 90);
INSERT INTO Grade (StudentID, Score) VALUES (1, 81);
INSERT INTO Grade (StudentID, Score) VALUES (1, 82);
INSERT INTO Grade (StudentID, Score) VALUES (1, 82);
INSERT INTO Grade (StudentID, Score) VALUES (2, 99);
INSERT INTO Grade (StudentID, Score) VALUES (2, 63);
INSERT INTO Grade (StudentID, Score) VALUES (2, 97);
INSERT INTO Grade (StudentID, Score) VALUES (2, 90);
INSERT INTO Grade (StudentID, Score) VALUES (3, 66);
INSERT INTO Grade (StudentID, Score) VALUES (3, 61);
INSERT INTO Grade (StudentID, Score) VALUES (3, 60);
The correlated query:
SELECT Name,
(SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score < 65) AS 'F',
(SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score >= 65 AND Score < 70) AS 'D',
(SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score >= 70 AND Score < 80) AS 'C',
(SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score >= 80 AND Score < 90) AS 'B',
(SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score >= 90 AND Score <= 100) AS 'A'
FROM Student
It produces the following result:
Name    F     D     C     B     A
-----------------------------------------
Steven  NULL  NULL  NULL  81    90
Timmy   63    NULL  NULL  NULL  95
Maria   60    66    NULL  NULL  NULL
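The whole-number averages in this result follow from AVG over an INT column returning an INT in T-SQL, so the fractional part is dropped. As a small sketch (the DECIMAL(5,2) cast and the column aliases are only illustrative assumptions), Steven's B-range scores of 81, 82 and 82 can be averaged both ways:

```sql
-- Sketch: integer vs. decimal averaging over Steven's scores in the B range.
-- IntAvg is truncated to a whole number; DecAvg keeps the fractional part.
SELECT AVG(Score)                          AS IntAvg,
       AVG(CAST(Score AS DECIMAL(5, 2)))   AS DecAvg
FROM Grade
WHERE StudentID = 1
  AND Score >= 80 AND Score < 90;
```

IntAvg comes back as 81, matching the table above, while DecAvg is roughly 81.67.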
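I know of a technique that works with COUNT(): you run a single SELECT with a JOIN, then use a CASE expression that conditionally adds 1 to a counter when the keys match across the join and the condition is true. I'm looking for a similar technique that can be applied to other kinds of aggregates (not just COUNT).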
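For reference, the counting technique described above might look like the following against the same schema. This is only a sketch: the band boundaries are borrowed from the query above, and the F_Count and A_Count aliases are made up for illustration.

```sql
-- Sketch: count grades per band in a single pass over the join.
-- Each CASE contributes 1 when the row falls in the band, otherwise 0.
SELECT st.Name,
       SUM(CASE WHEN g.Score < 65  THEN 1 ELSE 0 END) AS F_Count,
       SUM(CASE WHEN g.Score >= 90 THEN 1 ELSE 0 END) AS A_Count
FROM Student st
JOIN Grade g ON g.StudentID = st.ID
GROUP BY st.ID, st.Name;
```

With the sample data, Steven gets 0 and 1, Timmy gets 1 and 3, and Maria gets 2 and 0.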
Is there an efficient way to convert this example query to use a JOIN instead of multiple subqueries?
Best answer
Maybe I'm missing something, but the solution with CASE works for aggregates as well:
SELECT st.name,
avg(CASE WHEN g.score < 65 THEN g.score ELSE NULL END) as F,
avg(CASE WHEN g.score >= 65 AND g.score < 70 THEN g.score ELSE NULL END) as D,
avg(CASE WHEN g.score >= 70 AND g.score < 80 THEN g.score ELSE NULL END) as C,
avg(CASE WHEN g.score >= 80 AND g.score < 90 THEN g.score ELSE NULL END) as B,
avg(CASE WHEN g.score >= 90 AND g.score <= 100 THEN g.score ELSE NULL END) as A
FROM Grade g
JOIN Student st ON g.studentid = st.ID
GROUP BY st.name
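The CASE approach works because aggregate functions such as AVG skip NULL inputs, so each column only averages the rows that fall in its band. Two details differ from the correlated version, though: the inner join leaves out any student who has no grades at all, and grouping by name alone would merge two students who share a name. A possible variant that keeps one row per student, offered as a sketch rather than a tuned solution, uses a LEFT JOIN and groups by the primary key as well:

```sql
-- Sketch: one row per student, including students with no grades.
-- CASE without ELSE yields NULL, which AVG ignores.
SELECT st.Name,
       AVG(CASE WHEN g.Score < 65                  THEN g.Score END) AS F,
       AVG(CASE WHEN g.Score >= 65 AND g.Score < 70 THEN g.Score END) AS D,
       AVG(CASE WHEN g.Score >= 70 AND g.Score < 80 THEN g.Score END) AS C,
       AVG(CASE WHEN g.Score >= 80 AND g.Score < 90 THEN g.Score END) AS B,
       AVG(CASE WHEN g.Score >= 90                 THEN g.Score END) AS A
FROM Student st
LEFT JOIN Grade g ON g.StudentID = st.ID
GROUP BY st.ID, st.Name;
```

Whether this actually beats the correlated subqueries depends on indexes and data volume, so it is worth comparing the execution plans on the real tables.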
Regarding sql - optimizing multiple aggregates in a SELECT, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8002456/