mysql - Optimizing a query with ORDER BY on a DATETIME column

Tags: mysql sql

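The following query, which depends on the table links with about 4k rows and the table comments with about 40k rows, currently takes around 0.2 seconds. That seems rather slow considering how little data there is.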

SELECT
    t1.id, t1.url, t1.dateAdded
FROM links AS t1 LEFT JOIN
comments AS t2
ON (t1.id = t2.linkId)
WHERE
    COALESCE(t2.dateAdded, t1.dateAdded) <= "2020-03-22 20:04:45"
GROUP BY t1.id
ORDER BY
    COALESCE(
        (
            SELECT
                MAX(dateAdded)
            FROM comments
            WHERE
                linkId = t1.id AND
                dateAdded <= "2020-03-22 20:04:45"
        ),
        t1.dateAdded
    ) DESC,
    t1.id DESC
    LIMIT 10

t1.id is the primary key and t2.linkId is a foreign key. I also tried adding an index on dateAdded in both tables, but that did not seem to help.

To find the bottleneck, I reduced the query to the following and noticed that it takes 0.12 s when ordering by t1.dateAdded, but only 0.003 s when ordering by t1.id:

SELECT
    t1.id, t1.url, t1.dateAdded
FROM links AS t1 LEFT JOIN
comments AS t2
ON (t1.id = t2.linkId)
WHERE
    COALESCE(t2.dateAdded, t1.dateAdded) <= "2020-03-22 20:04:45"
GROUP BY t1.id
ORDER BY
    t1.id DESC -- here I tried both t1.dateAdded and t1.id

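So I tried using EXPLAIN to find the difference. The only difference seems to be in the Extra field: for ORDER BY t1.id it is empty, and for ORDER BY t1.dateAdded it is Using temporary; Using filesort (note that I do have an index on t1.dateAdded). Unfortunately, I am somewhat stuck on interpreting what this means and, more generally, on how to optimize the original query. Note that id is INT(10) and dateAdded is DATETIME.

In general, what I want to achieve is to order the links so that the newest links, or the links with the newest comments, are at the top, where "newest" is relative to the provided time (i.e. ignoring links/comments added after it).

Thanks in advance for any help or hints.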

Edit: added more details

EXPLAIN for the simplified query ordered by t1.id:

+------+-------------+-------+-------+---------------+------------+---------+--------------+------+-------------+
| id   | select_type | table | type  | possible_keys | key        | key_len | ref          | rows | Extra       |
+------+-------------+-------+-------+---------------+------------+---------+--------------+------+-------------+
|    1 | SIMPLE      | t1    | index | NULL          | PRIMARY    | 4       | NULL         | 3674 |             |
|    1 | SIMPLE      | t2    | ref   | fk_link_id    | fk_link_id | 5       | db1.t1.id    |    8 | Using where |
+------+-------------+-------+-------+---------------+------------+---------+--------------+------+-------------+

EXPLAIN for the simplified query ordered by t1.dateAdded:

+------+-------------+-------+-------+---------------+------------+---------+--------------+------+---------------------------------+
| id   | select_type | table | type  | possible_keys | key        | key_len | ref          | rows | Extra                           |
+------+-------------+-------+-------+---------------+------------+---------+--------------+------+---------------------------------+
|    1 | SIMPLE      | t1    | index | NULL          | PRIMARY    | 4       | NULL         | 3674 | Using temporary; Using filesort |
|    1 | SIMPLE      | t2    | ref   | fk_link_id    | fk_link_id | 5       | db1.t1.id    |    8 | Using where                     |
+------+-------------+-------+-------+---------------+------------+---------+--------------+------+---------------------------------+

Information about the links table:

CREATE TABLE `links` (
  `id` int(10) UNSIGNED NOT NULL,
  `url` varchar(2083) CHARACTER SET utf8mb4 NOT NULL,
  `dateAdded` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

ALTER TABLE `links`
  ADD PRIMARY KEY (`id`),
  ADD KEY `dateAdded` (`dateAdded`);

Information about the comments table:

CREATE TABLE `comments` (
  `id` int(10) UNSIGNED NOT NULL,
  `linkId` int(10) UNSIGNED DEFAULT NULL,
  `userId` int(10) UNSIGNED NOT NULL,
  `content` varchar(2000) CHARACTER SET utf8mb4 NOT NULL,
  `dateAdded` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

ALTER TABLE `comments`
  ADD PRIMARY KEY (`id`),
  ADD KEY `fk_link_id` (`linkId`);

ALTER TABLE `comments`
  ADD CONSTRAINT `fk_link_id` FOREIGN KEY (`linkId`) REFERENCES `links` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;

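Best answer

I might start by pointing out that the GROUP BY in your query is unnecessary (though not wrong), since you are not selecting any aggregates. Beyond that, I think your life would be easier if you just used MAX() as an analytic function and then ordered by it. Consider this version: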

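WITH cte AS (
    SELECT t1.id, t1.url, t1.dateAdded,
        -- Newest matching comment date per link; NULL when the link has no comments.
        MAX(t2.dateAdded) OVER (PARTITION BY t1.id) AS maxDateAdded
    FROM links AS t1
    LEFT JOIN comments AS t2 ON t1.id = t2.linkId
    WHERE
        (t2.dateAdded IS NOT NULL AND t2.dateAdded <= '2020-03-22 20:04:45') OR
        (t2.dateAdded IS NULL AND t1.dateAdded <= '2020-03-22 20:04:45')
)

-- Note: rows are not collapsed per link here, so a link with several matching
-- comments appears once per comment.
SELECT id, url, dateAdded
FROM cte
-- The alias t1 is not visible outside the CTE, so refer to id directly.
-- COALESCE falls back to the link's own date when it has no comments.
ORDER BY COALESCE(maxDateAdded, dateAdded) DESC, id DESC
LIMIT 10;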

This answer assumes you are using MySQL 8+. With a bit more effort, it could be rewritten for earlier versions of MySQL.
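For versions before MySQL 8, one possible rewrite is sketched below. It swaps the window function for a derived table that groups comments per link; the aliases l and c are illustrative and not part of the original post. Two assumptions to flag: grouping inside the derived table yields exactly one row per link, and, unlike the original WHERE clause, this version keeps links whose comments are all newer than the cutoff (ordering them by their own dateAdded), which appears to match the goal stated in the question.

```sql
-- Sketch for MySQL versions without CTEs or window functions (before 8.0).
-- Derived table c: per link, the newest comment date not later than the cutoff.
SELECT l.id, l.url, l.dateAdded
FROM links AS l
LEFT JOIN (
    SELECT linkId, MAX(dateAdded) AS maxDateAdded
    FROM comments
    WHERE dateAdded <= '2020-03-22 20:04:45'
    GROUP BY linkId
) AS c ON c.linkId = l.id
-- Ignore links that were added after the cutoff.
WHERE l.dateAdded <= '2020-03-22 20:04:45'
-- Links without qualifying comments fall back to their own dateAdded.
ORDER BY COALESCE(c.maxDateAdded, l.dateAdded) DESC, l.id DESC
LIMIT 10;
```

The sort expression still cannot be served directly from an index, so a filesort over the links rows is to be expected; the aggregation over comments, however, can read from an index on (linkId, dateAdded).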

To optimize the above query, the following indexes might help:

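CREATE INDEX idx2 ON comments (linkId, dateAdded);
-- url is VARCHAR(2083) utf8mb4, which is longer than InnoDB's maximum index key
-- length, so this statement may need a prefix such as url(191) to be accepted.
CREATE INDEX idx1 ON links (dateAdded, url, id);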

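These indexes, if used, should speed up the join and also allow the calls to MAX to be evaluated quickly. Note that I rewrote the WHERE clause to be sargable, avoiding the call to COALESCE.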
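Whether the optimizer actually picks these indexes can be checked by running EXPLAIN again and comparing the result with the plans shown in the question. The statement below is only a sketch that reuses the join and the rewritten WHERE clause from the query above:

```sql
-- Re-run EXPLAIN after creating the indexes and compare it with the earlier plans.
EXPLAIN
SELECT t1.id, t1.url, t1.dateAdded
FROM links AS t1
LEFT JOIN comments AS t2 ON t1.id = t2.linkId
WHERE
    (t2.dateAdded IS NOT NULL AND t2.dateAdded <= '2020-03-22 20:04:45') OR
    (t2.dateAdded IS NULL AND t1.dateAdded <= '2020-03-22 20:04:45');
-- In the output, the `key` column names the index chosen for each table, and
-- "Using index" in `Extra` means the rows were read from the index alone.
```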

Regarding mysql - Optimizing a query with ORDER BY on a DATETIME column, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/60817361/
