我正在构建一个对评级数据执行一些过滤的查询。
假设我有一个名为 ratings
的简单表,如下所示,存储来自在线评级工具的数据:
+----------------+----------------+--------+ | page_title | timestamp | rating | +----------------+----------------+--------+ | Abc | 20110417092134 | 1 | | Abc | 20110418110831 | 2 | | Def | 20110417092205 | 3 | +----------------+----------------+--------+
I need to extract pages with a high frequency of low values in the latest 10 ratings, and limit this query to pages that produced a volume of at least 20 ratings in the preceding week. This is the ridiculously long query I came up with:
SELECT a1.page_title, COUNT(*) AS rvol, AVG(a1.rating) AS theavg,
(
SELECT COUNT(*) FROM
(
SELECT * FROM ratings a2 WHERE a2.page_title = a1.page_title
AND DATE(timestamp) <= '2011-04-24' ORDER BY timestamp DESC LIMIT 10
)
AS latest WHERE rating >=1 AND rating <=2 ORDER BY timestamp DESC
)
AS lowest FROM ratings a1
WHERE DATE(a1.timestamp) <= "2011-04-24" AND DATE(a1.timestamp) >= "2011-04-17"
GROUP BY a1.page_title HAVING COUNT(*) > 20
顶级查询查找在 2011-04-24 结束的一周内评分超过 20 的页面,子查询应该从最近的 10 个评分中检索值在 [1,2] 之间的评分数量顶级查询中的每篇文章。
MySQL提示子子查询的WHERE子句中的a1.page_title是一个未知列,我怀疑这是因为a1在二级查询中没有定义为别名,而只在顶级查询中,但是我不知道如何解决这个问题。
(已编辑)
我在上面添加了关于跨级别引用另一个绝对正常的查询的怀疑的解释,请注意,这里 a1 未在子查询中定义,但它在直接父查询中:
SELECT a1.page_title, COUNT(*) AS rvol, AVG(a1.rating) AS theavg,
(
SELECT COUNT(*) FROM ratings a2 WHERE DATE(timestamp) <= '2011-04-24'
AND DATE(timestamp) >= '2011-04-17' AND rating >=1
AND rating <=2 AND a2.page_title = a1.page_title
) AS lowest FROM ratings a1
WHERE DATE(a1.timestamp) <= '2011-04-17' AND DATE(a1.aa_timestamp) >= '2011-04-11'
GROUP BY a1.page_title HAVING COUNT(*) > 20
最佳答案
我认为您可能会考虑加入两个在线 View ,这可能会使事情变得更容易。
SELECT *
FROM (SELECT COUNT(*),
a2.page_title
FROM ratings a2
WHERE DATE(timestamp) <= '2011-04-24'
AND DATE(timestamp) >= '2011-04-17'
AND rating >= 1
AND rating <= 2
GROUP BY a2.page_title) current
JOIN
(SELECT a1.page_title,
COUNT(*) AS rvol,
AVG(a1.rating) AS theavg
FROM ratings a1
WHERE DATE(a1.timestamp) <= '2011-04-17'
AND DATE(a1.a_timestamp) >= '2011-04-11'
GROUP BY a1.page_title
HAVING COUNT(*) > 20) morethan20
ON current .page_title = morethan20.page_title
关于MySQL子查询引用父查询中的字段,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/5781349/