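I have a table that currently holds about 1 million rows. The following query takes about 5 seconds to complete. How would you suggest optimizing its speed?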
# Thread_id: 14 Schema: defrop_defrop QC_hit: No
# Query_time: 5.573048 Lock_time: 0.591625 Rows_sent: 0 Rows_examined: 1006391
# Rows_affected: 1
UPDATE `backlinks` as a
INNER JOIN(SELECT b.`id` as bid
FROM `backlinks` b
WHERE b.`googlebot_id` IS NULL AND b.`used_time` IS NULL AND
b.`campaign_id` IN (SELECT `id` FROM `campaigns` WHERE `status`=true) GROUP BY b.`campaign_id` ORDER BY RAND() limit 1
) as c
ON (a.id = c.bid)
SET a.`crawler_id` = '10.0.0.13', a.`used_time`=NOW();
campaign_id and googlebot_id are foreign keys and are indexed. used_time and crawler_id are indexed as well.
[Screenshot of the table in phpMyAdmin]
Best answer
Here is the query, formatted so that I can read it more easily:
UPDATE backlinks bl JOIN
       (SELECT bl2.id as bid
        FROM backlinks bl2
        WHERE bl2.googlebot_id IS NULL AND
              bl2.used_time IS NULL AND
              bl2.campaign_id IN (SELECT c.id FROM campaigns c WHERE c.status = true)
        GROUP BY bl2.campaign_id
        ORDER BY RAND()
        LIMIT 1
       ) bl2
       ON bl.id = bl2.bid
    SET bl.crawler_id = '10.0.0.13',
        bl.used_time = NOW();
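First, the GROUP BY in the subquery is not needed (dropping it means the random pick is made over all eligible rows rather than over one row per campaign). I would also replace the IN with EXISTS: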
UPDATE backlinks bl JOIN
       (SELECT bl2.id as bid
        FROM backlinks bl2
        WHERE bl2.googlebot_id IS NULL AND
              bl2.used_time IS NULL AND
              EXISTS (SELECT 1 FROM campaigns c WHERE bl2.campaign_id = c.id AND c.status = true)
        ORDER BY RAND()
        LIMIT 1
       ) bl2
       ON bl.id = bl2.bid
    SET bl.crawler_id = '10.0.0.13',
        bl.used_time = NOW();
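This will help a little, but probably not much. My guess is that the performance issue is the size of the outer sort (or, equivalently, the size of the data needed for the GROUP BY in your query).
You can also get rid of the subquery entirely: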
UPDATE backlinks bl
    SET bl.crawler_id = '10.0.0.13',
        bl.used_time = NOW()
    WHERE bl.googlebot_id IS NULL AND
          bl.used_time IS NULL AND
          EXISTS (SELECT 1 FROM campaigns c WHERE bl.campaign_id = c.id AND c.status = true)
    ORDER BY RAND()
    LIMIT 1;
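This has very little impact on performance, but it cleans up the logic a bit.
My guess is that the WHERE conditions are not very selective, so optimizing them will not help much.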
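The selectivity guess can be checked rather than assumed. The following is a minimal sketch, not something from the original answer: it reuses the table and column names from the queries above, `eligible_rows` is just an illustrative alias, and the exact EXPLAIN output depends on the server version.

```sql
-- Sketch only: names come from the queries above; eligible_rows is a made-up alias.
-- Count the rows that pass the filters, i.e. the rows that ORDER BY RAND() has to sort.
SELECT COUNT(*) AS eligible_rows
FROM backlinks bl
WHERE bl.googlebot_id IS NULL
  AND bl.used_time IS NULL
  AND EXISTS (SELECT 1
              FROM campaigns c
              WHERE c.id = bl.campaign_id
                AND c.status = true);

-- Inspect the plan of the equivalent SELECT. The "rows" column estimates how
-- many rows are read; a sort step usually shows up in the "Extra" column.
EXPLAIN
SELECT bl.id
FROM backlinks bl
WHERE bl.googlebot_id IS NULL
  AND bl.used_time IS NULL
  AND EXISTS (SELECT 1
              FROM campaigns c
              WHERE c.id = bl.campaign_id
                AND c.status = true)
ORDER BY RAND()
LIMIT 1;
```

If the count comes back close to the table size (the slow log above reports roughly a million rows examined), the filters are doing little and the sort is the likely cost.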
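At this point, the problem is the ORDER BY RAND(). If you know roughly how many rows the subquery returns, you can use RAND() to pre-filter them. For example, suppose at least 1,000 rows are returned. Then: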
UPDATE backlinks bl
    SET bl.crawler_id = '10.0.0.13',
        bl.used_time = NOW()
    WHERE bl.googlebot_id IS NULL AND
          bl.used_time IS NULL AND
          EXISTS (SELECT 1 FROM campaigns c WHERE bl.campaign_id = c.id AND c.status = true) AND
          RAND() < 0.01 -- keep about 1/100
    ORDER BY RAND()
    LIMIT 1;
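This speeds up the sort significantly, because it sorts only about 1/100th of the data. However, if not enough rows match the conditions, it can filter out all of the rows.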
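To put a rough number on that risk, here is a back-of-the-envelope estimate. It assumes (my assumption, not stated in the answer) that each of the N eligible rows passes the `RAND() < p` filter independently with probability p; N and p are symbols introduced only for this illustration.

```latex
% N = number of rows matching the other WHERE conditions (assumed known)
% p = keep probability used in RAND() < p (0.01 in the query above)
\[
  \mathbb{E}[\text{rows kept}] = pN,
  \qquad
  P(\text{no row kept}) = (1 - p)^{N}
\]
% With N = 1000 and p = 0.01:
\[
  pN = 10, \qquad 0.99^{1000} \approx 4.3 \times 10^{-5}
\]
% With only N = 100 eligible rows and the same p:
\[
  pN = 1, \qquad 0.99^{100} \approx 0.37
\]
```

So with 1,000 or more eligible rows the update almost always finds a row, while with about 100 it would come up empty roughly a third of the time. In that case the statement reports zero affected rows, and one option is to rerun it without the `RAND() < 0.01` condition.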
Regarding "mysql - How to optimize this inner join query to reduce query time", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55665748/