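I have two tables. The first table (users) is a simple "id, username" with 100,00 rows, and the second table (stats) is "id, date, stat" with 20M rows.
I am trying to find out which usernames went up the most in stats, and this is my query. On a powerful machine, it takes several minutes to complete. Is there a better way to write it to speed it up?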
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON (b.id=a.id)
INNER JOIN stats AS c ON (c.id=a.id)
WHERE b.date = '2016-01-10'
AND c.date = '2016-01-13'
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
Another approach I tried, although it does not seem to be optimal either:
SELECT a.id, a.username,
(SELECT b.stat FROM stats AS b WHERE b.id = a.id AND b.date = '2016-01-10') AS start,
(SELECT c.stat FROM stats AS c WHERE c.id = a.id AND c.date = '2016-01-14') AS end,
((SELECT b.stat FROM stats AS b WHERE b.id = a.id AND b.date = '2016-01-10') -
(SELECT c.stat FROM stats AS c WHERE c.id = a.id AND c.date = '2016-01-14')) AS stat_diff
FROM users AS a
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
Best answer
Introduction
Suppose we rewrite the query like this:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON
b.date = STR_TO_DATE('2016-01-10', '%Y-%m-%d' ) and b.id=a.id
INNER JOIN stats AS c ON
c.date = STR_TO_DATE('2016-01-13', '%Y-%m-%d' ) and c.id=a.id
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
And we make sure that:
The users table has an index on the id field.
The stats table has an index on the composite field (date, id):
CREATE INDEX stats_idx_d_i ON stats (date, id);
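To check whether these indexes are already in place, a minimal sketch using MySQL's SHOW INDEX statement is shown here. The second part is an optional variant that is not in the answer: it assumes that adding stat to the index lets MySQL read the value from the index itself, and the index name stats_idx_d_i_s is hypothetical.

```sql
-- List the indexes that currently exist on each table.
-- For users, an index on id is usually the primary key.
SHOW INDEX FROM users;
SHOW INDEX FROM stats;

-- Optional variant (an assumption, not part of the answer):
-- a covering index that also stores stat, so the date lookups
-- may not need to touch the table rows at all.
-- The name stats_idx_d_i_s is made up for this example.
CREATE INDEX stats_idx_d_i_s ON stats (date, id, stat);
```

Whether the wider index pays off depends on the write load on stats, so it is worth measuring before keeping it.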
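Then
The database optimizer can use the indexes to select the restricted set of data ('RSD'), that is, the rows matching the filtered dates. This is fast.
But
You are sorting by a calculated field:
(b.stat - c.stat) AS stat_diff #<-- calculated
ORDER BY stat_diff DESC #<-- this forces it to be calculated for every row
No index can optimize this sort, because the value has to be computed one by one for every row of the 'RSD' before the rows can be ordered.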
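The sort itself cannot be avoided, but the work around it can sometimes be reduced. The following is only a sketch and not part of the original answer. It assumes that stats holds at most one row per (id, date), so the GROUP BY is dropped, and that every id in stats has a matching row in users. The aliases d, start_stat and end_stat are made up for the example.

```sql
-- Compute and sort the difference on stats alone, keep the top 100,
-- and only then look up the usernames for those 100 ids.
SELECT u.id, u.username, d.start_stat, d.end_stat, d.stat_diff
FROM (
    SELECT b.id,
           b.stat AS start_stat,
           c.stat AS end_stat,
           (b.stat - c.stat) AS stat_diff
    FROM stats AS b
    INNER JOIN stats AS c
            ON c.id = b.id
           AND c.date = '2016-01-13'
    WHERE b.date = '2016-01-10'
    ORDER BY stat_diff DESC   -- still computed for every row of the RSD
    LIMIT 100
) AS d
INNER JOIN users AS u ON u.id = d.id
ORDER BY d.stat_diff DESC;
```

The calculated sort still runs over the whole 'RSD'; what changes is that users is joined for 100 rows instead of for every matching row.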
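Conclusion
The question is: how many rows are in your 'RSD'? If it is only a few hundred rows, your query will probably run fast; otherwise, it will be slow.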
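One way to answer that question is to count the rows for each filtered date. This is a minimal sketch that uses the dates from the query; the column aliases rows_start and rows_end are made up for the example.

```sql
-- Size of the restricted set of data ('RSD') for each date.
SELECT COUNT(*) AS rows_start FROM stats WHERE date = '2016-01-10';
SELECT COUNT(*) AS rows_end   FROM stats WHERE date = '2016-01-13';
```

With an index on (date, id) in place, these counts should themselves be cheap, since they can be answered from the index.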
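In any case, you should make sure that the first step of the query (without the sort) is resolved through indexes and involves no full scan. Use the EXPLAIN command to verify this.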
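As a sketch of that check, EXPLAIN can be put in front of the query with the ORDER BY and LIMIT removed. In the MySQL output, a type of ALL on a row means a full table scan of that table, and the key column shows which index was chosen (stats_idx_d_i, if the index suggested earlier exists).

```sql
-- First step only: joins and date filters, no sort.
EXPLAIN
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b
        ON b.date = '2016-01-10' AND b.id = a.id
INNER JOIN stats AS c
        ON c.date = '2016-01-13' AND c.id = a.id;
```

Running EXPLAIN again with the ORDER BY restored shows how the plan changes; the Extra column is where MySQL reports sort-related work such as "Using filesort".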
Regarding "mysql - How can I speed up a multiple inner join query?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/37827727/