MySQL: difference between ordering by an aggregate in a query vs. in a subquery

Tags: mysql aggregate

I have 2 queries that order data:

Query 1:

SELECT  * FROM    (
    SELECT      idprovince, COUNT(*) total
    FROM        cities
    JOIN        persons USE INDEX (index_5) USING (idcity)
    WHERE       is_tutor = 'Y'
    GROUP BY    idprovince
) A
ORDER BY total DESC

Query 2:

SELECT      idprovince, COUNT(*) total
FROM        cities
JOIN        persons USE INDEX (index_5) USING (idcity)
WHERE       is_tutor = 'Y'
GROUP BY    idprovince
ORDER BY    total DESC

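Query 1 returns data much faster than query 2. My question is: what is the difference between using ORDER BY directly in the query and applying it outside a subquery?

Note: my database version is mysql-5.0.96-x64. The persons table has about 400,000 rows and the cities table about 500.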

Update: output of the MySQL EXPLAIN command:

Query 1:

mysql> EXPLAIN
    -> SELECT  *
    -> FROM    (
    ->     SELECT      idprovince, COUNT(*) total
    ->     FROM        cities
    ->     JOIN        persons USE INDEX (index_5) USING (idcity)
    ->     WHERE       is_tutor = 'Y'
    ->     GROUP BY    idprovince
    -> ) A
    -> ORDER BY total DESC
    -> ;
+----+-------------+------------+--------+---------------+---------+---------+------------------------------------+--------+----------------------------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref                                | rows   | Extra                                        |
+----+-------------+------------+--------+---------------+---------+---------+------------------------------------+--------+----------------------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL          | NULL    | NULL    | NULL                               |     34 | Using filesort                               |
|  2 | DERIVED     | persons    | ref    | index_5       | index_5 | 2       |                                    | 163316 | Using where; Using temporary; Using filesort |
|  2 | DERIVED     | cities     | eq_ref | PRIMARY       | PRIMARY | 4       | _myproject_lesaja_2.persons.idcity |      1 |                                              |
+----+-------------+------------+--------+---------------+---------+---------+------------------------------------+--------+----------------------------------------------+
3 rows in set (1.22 sec)

Query 2:

mysql> EXPLAIN
    ->     SELECT      idprovince, COUNT(*) total
    ->     FROM        cities
    ->     JOIN        persons USE INDEX (index_5) USING (idcity)
    ->     WHERE       is_tutor = 'Y'
    ->     GROUP BY    idprovince
    ->     ORDER BY    total DESC;
+----+-------------+---------+-------+---------------+-------------+---------+-------+--------+----------------------------------------------+
| id | select_type | table   | type  | possible_keys | key         | key_len | ref   | rows   | Extra                                        |
+----+-------------+---------+-------+---------------+-------------+---------+-------+--------+----------------------------------------------+
|  1 | SIMPLE      | cities  | index | PRIMARY       | FK_cities_1 | 4       | NULL  |      4 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | persons | ref   | index_5       | index_5     | 2       | const | 163316 | Using where                                  |
+----+-------------+---------+-------+---------------+-------------+---------+-------+--------+----------------------------------------------+
2 rows in set (0.00 sec)

Result of query 1:

mysql> SELECT  *
    -> FROM    (
    ->     SELECT      idprovince, COUNT(*) total
    ->     FROM        cities
    ->     JOIN        persons USE INDEX (index_5) USING (idcity)
    ->     WHERE       is_tutor = 'Y'
    ->     GROUP BY    idprovince
    -> ) A
    -> ORDER BY total DESC
    -> ;
+------------+-------+
| idprovince | total |
+------------+-------+
|         35 | 15797 |
......................
......................
......................
|         76 |  2091 |
|         65 |  2018 |
+------------+-------+
34 rows in set (0.78 sec)

Result of query 2:

mysql> SELECT      idprovince, COUNT(*) total
    -> FROM        cities
    -> JOIN        persons USE INDEX (index_5) USING (idcity)
    -> WHERE       is_tutor = 'Y'
    -> GROUP BY    idprovince
    -> ORDER BY    total DESC;
+------------+-------+
| idprovince | total |
+------------+-------+
|         35 | 15797 |
|         33 | 14413 |
|         12 | 13683 |
......................
......................
......................
|         34 |  2135 |
|         76 |  2091 |
|         65 |  2018 |
+------------+-------+
34 rows in set (8 min 25.80 sec)

SHOW PROFILE output:

Query 1:

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000240 |
| Opening tables       | 0.000043 |
| System lock          | 0.000004 |
| Table lock           | 0.000392 |
| optimizing           | 0.000084 |
| statistics           | 0.004455 |
| preparing            | 0.000026 |
| Creating tmp table   | 0.000221 |
| executing            | 0.000002 |
| Copying to tmp table | 0.913722 |
| Sorting result       | 0.000065 |
| Sending data         | 0.000020 |
| removing tmp table   | 0.000145 |
| Sending data         | 0.000008 |
| init                 | 0.000017 |
| optimizing           | 0.000002 |
| statistics           | 0.000038 |
| preparing            | 0.000007 |
| executing            | 0.000001 |
| Sorting result       | 0.000012 |
| Sending data         | 0.000337 |
| end                  | 0.000002 |
| end                  | 0.000002 |
| query end            | 0.000002 |
| freeing items        | 0.000020 |
| closing tables       | 0.000001 |
| removing tmp table   | 0.000074 |
| closing tables       | 0.000003 |
| logging slow query   | 0.000001 |
| cleaning up          | 0.000003 |
+----------------------+----------+

Query 2:

+----------------------+------------+
| Status               |   Duration |
+----------------------+------------+
| starting             |   0.000195 |
| Opening tables       |   0.000029 |
| System lock          |   0.000004 |
| Table lock           |   0.000011 |
| init                 |   0.000078 |
| optimizing           |   0.000021 |
| statistics           |   0.003399 |
| preparing            |   0.000025 |
| Creating tmp table   |   0.000259 |
| Sorting for group    |   0.000007 |
| executing            |   0.000001 |
| Copying to tmp table | 506.711308 |
| Sorting result       |   0.000049 |
| Sending data         |   0.000298 |
| end                  |   0.000004 |
| removing tmp table   |   0.000150 |
| end                  |   0.000002 |
| end                  |   0.000002 |
| query end            |   0.000002 |
| freeing items        |   0.000013 |
| closing tables       |   0.000003 |
| logging slow query   |   0.000001 |
| logging slow query   |   0.000042 |
| cleaning up          |   0.000003 |
+----------------------+------------+

CREATE statements:

CREATE TABLE persons (
    idperson INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
    is_tutor ENUM('Y','N') NULL DEFAULT 'N',
    name VARCHAR(64) NOT NULL,
    ...
    idcity INT(10) UNSIGNED NOT NULL,
    ...
    PRIMARY KEY (idperson),
    UNIQUE INDEX index_3 (name) USING BTREE,
    UNIQUE INDEX index_4 (email) USING BTREE,
    INDEX index_5 (is_tutor),
    ...
    CONSTRAINT FK_persons_1 FOREIGN KEY (idcity) REFERENCES cities (idcity)
)
ENGINE=InnoDB
AUTO_INCREMENT=414738;

CREATE TABLE cities (
    idcity INT(10) UNSIGNED NOT NULL,
    idprovince INT(10) UNSIGNED NOT NULL,
    city VARCHAR(64) NOT NULL,
    PRIMARY KEY (idcity),
    UNIQUE INDEX index_3 (city),
    INDEX FK_cities_1 (idprovince),
    CONSTRAINT FK_cities_1 FOREIGN KEY (idprovince) REFERENCES provinces (idprovince)
)
ENGINE=InnoDB;

Best answer

Admittedly I am no expert on this, but looking at the ORDER BY Optimization page of the MySQL documentation, your query 2 contains not one but two uses of ORDER BY that cannot be optimized:

SELECT      idprovince, COUNT(*) total
FROM        cities
JOIN        persons USE INDEX (index_5) USING (idcity)
WHERE       is_tutor = 'Y'
GROUP BY    idprovince
ORDER BY    total DESC

First:

The key used to fetch the rows

WHERE is_tutor = 'Y'

is not the same as the one used in the ORDER BY:

ORDER BY total DESC

Second:

You have different ORDER BY and GROUP BY expressions.

GROUP BY    idprovince
ORDER BY    total DESC

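In both of the cases above, MySQL does not use an index to resolve the ORDER BY, although it can still use an index to find the rows that match the WHERE clause.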
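Whether the server needs a separate sort pass shows up as "Using filesort" in the Extra column of EXPLAIN, as in the plans posted in the question. The statements below are a minimal sketch for comparing the plan with and without the ORDER BY on the aggregate, assuming the same tables and index as in the question; no particular output is claimed here.

```sql
-- Plan for the grouping alone (no explicit ORDER BY).
EXPLAIN
SELECT      idprovince, COUNT(*) total
FROM        cities
JOIN        persons USE INDEX (index_5) USING (idcity)
WHERE       is_tutor = 'Y'
GROUP BY    idprovince;

-- Plan for the same grouping, now ordered by the aggregate.
EXPLAIN
SELECT      idprovince, COUNT(*) total
FROM        cities
JOIN        persons USE INDEX (index_5) USING (idcity)
WHERE       is_tutor = 'Y'
GROUP BY    idprovince
ORDER BY    total DESC;
```

Comparing the table order and the Extra column of the two plans shows what, if anything, the ORDER BY changes.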

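Your query 1, on the other hand, follows an optimizable form of ORDER BY, even though the ORDER BY is applied outside the subquery.

This may be why query 2 is so much slower than query 1.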
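The EXPLAIN output in the question also shows the two queries reading the tables in a different order: query 1 starts from persons and reaches cities through its primary key, whereas query 2 starts from cities. To check whether that, rather than the ORDER BY itself, accounts for the gap, one could pin the join order of query 2 with STRAIGHT_JOIN. This is only a sketch to experiment with, not a confirmed fix.

```sql
-- STRAIGHT_JOIN makes MySQL join the tables in the order they are listed,
-- so persons (filtered through index_5) is read before cities.
SELECT STRAIGHT_JOIN idprovince, COUNT(*) total
FROM        persons USE INDEX (index_5)
JOIN        cities USING (idcity)
WHERE       is_tutor = 'Y'
GROUP BY    idprovince
ORDER BY    total DESC;
```

Prefixing this statement with EXPLAIN shows whether the resulting plan resembles the derived part of query 1's plan.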

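Furthermore, in both cases the index on idcity is of almost no use for resolving the ORDER BY, because the index is on idcity while the ORDER BY clause uses total, which is an aggregate result.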
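Because total only exists after grouping, the sort runs over the grouped rows (34 in the results above), and the profiles in the question report "Sorting result" at well under a millisecond for both queries. A sketch for re-measuring where the time goes, assuming session profiling is available on the server (the profiles in the question suggest it is):

```sql
-- Enable statement profiling for the current session.
SET profiling = 1;

-- Run the query under investigation.
SELECT      idprovince, COUNT(*) total
FROM        cities
JOIN        persons USE INDEX (index_5) USING (idcity)
WHERE       is_tutor = 'Y'
GROUP BY    idprovince
ORDER BY    total DESC;

-- Show the per-stage timings of the most recently profiled statement.
SHOW PROFILE;

-- List all profiled statements with their Query_ID and total duration.
SHOW PROFILES;
```

SHOW PROFILE FOR QUERY n, with n taken from the Query_ID column of SHOW PROFILES, displays the stages of an earlier statement.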

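See also the discussion here.

A similar question about the MySQL difference between ordering by an aggregate in a query vs. in a subquery can be found on Stack Overflow: https://stackoverflow.com/questions/19737736/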
