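I'm developing a database for an after-school education company. It tracks student records, including class enrollments and student information.
What I'd like to do is write a query that returns the number of enrolled students per school, but that also groups together the schools contributing less than a certain percentage of the total. (I want to show the information in a chart, but we have a lot of schools with only 1 student each, and I don't want the chart to have 50 bars or pie slices.)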
So instead of
+-------------+------------+
| School Name | # Students |
+-------------+------------+
| School A    |         52 |
| School B    |         27 |
| School C    |         15 |
| School D    |          2 |
| School E    |          1 |
| School F    |          1 |
+-------------+------------+
I'd like
+---------------+------------+
| School Name   | # Students |
+---------------+------------+
| School A      |         52 |
| School B      |         27 |
| School C      |         15 |
| Other Schools |          4 |
+---------------+------------+
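Here is a simplified form of the query I have now. It works, but it is somewhat redundant, because several SELECTs query the same information. Is there a way to reduce the redundancy?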
SELECT @enrollmentSum := COUNT(StudentEnrollmentID) FROM StudentEnrollment;

SELECT SchoolName, COUNT(StudentEnrollmentID) ECount
FROM Student
JOIN StudentEnrollment ON StudentEnrollment.StudentID = Student.StudentID
JOIN School ON Student.SchoolID = School.SchoolID
GROUP BY SchoolName
HAVING ECount >= .025 * @enrollmentSum
UNION ALL
SELECT "Other Schools" AS SchoolName, SUM(ECount) AS ECount
FROM (
    SELECT SchoolName, COUNT(StudentEnrollmentID) ECount
    FROM Student
    JOIN StudentEnrollment ON StudentEnrollment.StudentID = Student.StudentID
    JOIN School ON Student.SchoolID = School.SchoolID
    GROUP BY SchoolName
    HAVING ECount < .025 * @enrollmentSum
) t2
ORDER BY ECount DESC
In case it helps, here is the basic structure of the relevant tables:
Student
+-----------+-------------+----------+
| StudentID | StudentName | SchoolID |
+-----------+-------------+----------+
School
+----------+------------+
| SchoolID | SchoolName |
+----------+------------+
StudentEnrollment
+---------------------+-----------+---------+
| StudentEnrollmentID | StudentID | ClassID |
+---------------------+-----------+---------+
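Thanks for your help.
Best answer
Tips:
count(x) returns the number of rows where x is not NULL, so count(primary key) is the same as count(*), and count(*) is easier to read.
"JOIN School ON Student.SchoolID = School.SchoolID" can be rewritten as "JOIN School USING (SchoolID)". That is more readable, and it also gives you only one "SchoolID" column in the result set if you use something like "select *".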
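To see both tips in action, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption made only so the example runs anywhere), and the rows are made up for illustration. The column order of "select *" under USING can differ between engines, so the sketch only looks at how many times SchoolID appears.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; table and column names follow
# the question, but the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE School  (SchoolID INTEGER PRIMARY KEY, SchoolName TEXT);
    CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, StudentName TEXT, SchoolID INTEGER);
    INSERT INTO School  VALUES (1, 'School A'), (2, 'School B');
    INSERT INTO Student VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', NULL);
""")

# COUNT(*) counts rows; COUNT(col) skips rows where col IS NULL.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(StudentID), COUNT(SchoolID) FROM Student"
).fetchone()
print(counts)  # (3, 3, 2)

# With ON, SELECT * carries SchoolID from both tables; with USING it appears once.
on_cols = [d[0] for d in conn.execute(
    "SELECT * FROM Student JOIN School ON Student.SchoolID = School.SchoolID"
).description]
using_cols = [d[0] for d in conn.execute(
    "SELECT * FROM Student JOIN School USING (SchoolID)"
).description]
print(on_cols.count("SchoolID"), using_cols.count("SchoolID"))  # 2 1
```

The third count is lower only because one made-up student has no SchoolID; for a primary key, which can never be NULL, count(column) and count(*) always agree.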
Now, the query...
-- Assumes @enrollmentSum has already been set, as in the first statement of the question.
SELECT SchoolName, SUM(cnt) ECount
FROM (
    SELECT IF(COUNT(*) >= .025 * @enrollmentSum, SchoolName, 'Others') AS SchoolName,
           COUNT(*) cnt
    FROM Student
    JOIN StudentEnrollment USING (StudentID)
    JOIN School USING (SchoolID)
    GROUP BY SchoolName
) subq
GROUP BY SchoolName
ORDER BY ECount DESC
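For every school below the threshold, IF() replaces the school name with 'Others'. Note that the IF() is evaluated after the GROUP BY, so you can use count(*) inside the selected expression. The outer GROUP BY then merges all the 'Others' rows into one.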
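The relabel-then-regroup idea can be tried out with the sample numbers from the question. This is only a sketch under a few assumptions: it runs on Python's built-in sqlite3 rather than MySQL, so it uses the standard CASE expression in place of MySQL's IF(), a named parameter :total in place of @enrollmentSum, and a separate alias Label (a hypothetical name) so the inner GROUP BY cannot be confused with the relabelled column. The helper build_db is also hypothetical and just fills the three tables.

```python
import sqlite3

def build_db(counts):
    """Create the question's three tables with counts[name] enrollments per school."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE School (SchoolID INTEGER PRIMARY KEY, SchoolName TEXT);
        CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, StudentName TEXT, SchoolID INTEGER);
        CREATE TABLE StudentEnrollment (StudentEnrollmentID INTEGER PRIMARY KEY,
                                        StudentID INTEGER, ClassID INTEGER);
    """)
    for name, n in counts.items():
        school_id = conn.execute(
            "INSERT INTO School (SchoolName) VALUES (?)", (name,)).lastrowid
        for _ in range(n):  # one student with one enrollment each
            student_id = conn.execute(
                "INSERT INTO Student (StudentName, SchoolID) VALUES ('x', ?)",
                (school_id,)).lastrowid
            conn.execute(
                "INSERT INTO StudentEnrollment (StudentID, ClassID) VALUES (?, 1)",
                (student_id,))
    return conn

# Inner query: count per school and relabel small schools as 'Others'.
# Outer query: group again so all 'Others' rows collapse into one.
QUERY = """
SELECT Label, SUM(cnt) AS ECount
FROM (
    SELECT CASE WHEN COUNT(*) >= 0.025 * :total
                THEN School.SchoolName ELSE 'Others' END AS Label,
           COUNT(*) AS cnt
    FROM Student
    JOIN StudentEnrollment USING (StudentID)
    JOIN School USING (SchoolID)
    GROUP BY School.SchoolName
) AS subq
GROUP BY Label
ORDER BY ECount DESC
"""

def grouped_counts(conn):
    # Stand-in for the statement that sets @enrollmentSum in the question.
    total = conn.execute("SELECT COUNT(*) FROM StudentEnrollment").fetchone()[0]
    return conn.execute(QUERY, {"total": total}).fetchall()

sample = {"School A": 52, "School B": 27, "School C": 15,
          "School D": 2, "School E": 1, "School F": 1}
print(grouped_counts(build_db(sample)))
# [('School A', 52), ('School B', 27), ('School C', 15), ('Others', 4)]
```

With 98 enrollments in total, the 2.5% threshold is 2.45, so Schools D, E and F (2 + 1 + 1) end up in the single 'Others' row.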
Edit
This is quite a hack, but it seems to do what you want...
SET @total=0;
SELECT IF(cnt/@total>=0.2, SchoolName, 'Others') SN, SUM(cnt)
FROM (
    SELECT SchoolName, cnt, @total:=@total+cnt
    FROM (
        SELECT SchoolName, COUNT(*) cnt FROM st GROUP BY SchoolName
    ) AS foo -- ORDER BY cnt DESC
) AS bar
GROUP BY SN
ORDER BY SUM(cnt) DESC;
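This is perverse. MySQL seems to always materialize subquery "foo" first and store the result in a buffer before processing subquery "bar". I thought the "ORDER BY cnt DESC" would be necessary, but it seems to work with it commented out as well.
The side effect of running subquery "bar" is that @total gets set to the value we want!
So by the time the outer query runs, the total is available.
The problem with this approach is that it may stop working without warning, because it is a hack.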
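If relying on the evaluation order of user variables is a concern, one alternative is to let a window function supply the total. The sketch below is not part of the original answer. It assumes MySQL 8.0 or later (window functions and CTEs), and it is run here through Python's built-in sqlite3, which needs SQLite 3.25 or later. The CTE names foo and bar are kept only to mirror the hack above, the table st with a single SchoolName column is an assumption about the test table used there, and CASE replaces IF() so the same SQL runs on both engines.

```python
import sqlite3

QUERY = """
WITH foo AS (   -- enrollments per school
    SELECT SchoolName, COUNT(*) AS cnt FROM st GROUP BY SchoolName
),
bar AS (        -- attach the grand total to every row, no user variable needed
    SELECT SchoolName, cnt, SUM(cnt) OVER () AS total FROM foo
)
SELECT CASE WHEN cnt >= 0.2 * total THEN SchoolName ELSE 'Others' END AS SN,
       SUM(cnt) AS ECount
FROM bar
GROUP BY SN
ORDER BY ECount DESC
"""

def grouped_counts(counts):
    """Load counts[name] rows per school into a throwaway table st and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE st (SchoolName TEXT)")
    for name, n in counts.items():
        conn.executemany("INSERT INTO st VALUES (?)", [(name,)] * n)
    return conn.execute(QUERY).fetchall()

sample = {"School A": 52, "School B": 27, "School C": 15,
          "School D": 2, "School E": 1, "School F": 1}
print(grouped_counts(sample))
# [('School A', 52), ('School B', 27), ('Others', 19)]
```

With the 20% threshold used in the hack, only Schools A and B clear 19.6 of 98 enrollments, so the other four schools (15 + 2 + 1 + 1) are merged into 'Others'.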
On the topic of mysql - grouping together rows whose Count() is below a percentage of the total, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45708751/