MySql GROUP BY optimization - avoiding tmp table and/or filesort

Tags: mysql, group-by

My query is slow. Without the GROUP BY it is fast (0.1-0.3 seconds), but with the (required) GROUP BY it takes about 10-15 seconds.

The query joins two tables: events (nearly 50 million rows) and events_locations (5 million rows).

Query:

SELECT  `e`.`id` AS `event_id`,`e`.`time_stamp` AS `time_stamp`,`el`.`latitude` AS `latitude`,`el`.`longitude` AS `longitude`,
        `el`.`time_span` AS `extra`,`e`.`entity_id` AS `asset_name`, `el`.`other_id` AS `geozone_id`,
        `el`.`group_alias` AS `group_alias`,`e`.`event_type_id` AS `event_type_id`,
        `e`.`entity_type_id` AS `entity_type_id`, `el`.`some_id`
FROM events e
INNER JOIN events_locations el ON el.event_id = e.id
WHERE 1=1
    AND el.other_id = '1'
    AND e.time_stamp >= '2018-01-01'
    AND e.time_stamp <= '2019-06-02'
GROUP BY `e`.`event_type_id` , `el`.`some_id` , `el`.`group_alias`;

Table events:

CREATE TABLE `events` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `event_type_id` int(11) NOT NULL,
  `entity_type_id` int(11) NOT NULL,
  `entity_id` varchar(64) NOT NULL,
  `alias` varchar(64) NOT NULL,
  `time_stamp` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `entity_id` (`entity_id`),
  KEY `event_type_idx` (`event_type_id`),
  KEY `idx_events_time_stamp` (`time_stamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Table events_locations:

CREATE TABLE `events_locations` (
  `event_id` bigint(20) NOT NULL,
  `latitude` double NOT NULL,
  `longitude` double NOT NULL,
  `some_id` bigint(20) DEFAULT NULL,
  `other_id` bigint(20) DEFAULT NULL,
  `time_span` bigint(20) DEFAULT NULL,
  `group_alias` varchar(64) NOT NULL,
  KEY `some_id_idx` (`some_id`),
  KEY `idx_events_group_alias` (`group_alias`),
  KEY `idx_event_id` (`event_id`),
  CONSTRAINT `fk_event_id` FOREIGN KEY (`event_id`) REFERENCES `events` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

EXPLAIN:

+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| id | select_type | table | type   | possible_keys                   | key     | key_len | ref                                       | rows     | Extra                                          |
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| 1  | SIMPLE      | ea    | ALL    | 'idx_event_id'                  | NULL    | NULL    | NULL                                      | 5152834  | 'Using where; Using temporary; Using filesort' |
| 1  | SIMPLE      | e     | eq_ref | 'PRIMARY,idx_events_time_stamp' | PRIMARY | '8'     | 'name.ea.event_id'                        | 1        |                                                |
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
2 rows in set (0.08 sec)
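The first EXPLAIN row is a full scan (type ALL, about 5.1 million rows) of the locations table, even though the query filters on el.other_id, which has no index. A minimal sketch, assuming a composite index on (other_id, event_id) is acceptable: the index name idx_el_other_event is hypothetical, the schema is trimmed to the columns the filter, join and GROUP BY touch, and SQLite's in-memory engine stands in for MySQL so the example is self-contained. It only checks that the planner picks up the new index; the same CREATE INDEX statement is valid MySQL, but the real plan and timings have to be confirmed with EXPLAIN on the actual tables. It does not remove the temporary table for the GROUP BY.

```python
import sqlite3

# Trimmed stand-in schema (SQLite, not MySQL): only the columns the query needs.
SCHEMA = """
CREATE TABLE events (
  id INTEGER PRIMARY KEY,
  event_type_id INTEGER NOT NULL,
  time_stamp TEXT NOT NULL
);
CREATE INDEX idx_events_time_stamp ON events (time_stamp);
CREATE TABLE events_locations (
  event_id INTEGER NOT NULL,
  some_id INTEGER,
  other_id INTEGER,
  group_alias TEXT NOT NULL
);
CREATE INDEX idx_event_id ON events_locations (event_id);
"""

QUERY = """
SELECT e.event_type_id, el.some_id, el.group_alias
FROM events e
JOIN events_locations el ON el.event_id = e.id
WHERE el.other_id = 1
  AND e.time_stamp >= '2018-01-01' AND e.time_stamp <= '2019-06-02'
GROUP BY e.event_type_id, el.some_id, el.group_alias
"""

def plan(conn):
    """Return the detail strings of SQLite's EXPLAIN QUERY PLAN for QUERY."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
before = plan(conn)

# Hypothetical index: equality column first, join column second, so both
# the other_id filter and the event_id join lookup can use it.
conn.execute("CREATE INDEX idx_el_other_event ON events_locations (other_id, event_id)")
after = plan(conn)

print("before:", before)
print("after: ", after)
```

Putting the equality column first lets the filter become an index lookup, and appending event_id means the same index also serves the join if the planner drives from events instead.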

From the docs:

Temporary tables can be created under conditions such as these:

If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.

DISTINCT combined with ORDER BY may require a temporary table.

If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.

What I have already tried:

  • Creating an index on (el.some_id, el.group_alias)
  • Reducing the varchar size to 20
  • Increasing sort_buffer_size and read_rnd_buffer_size

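The documentation quoted above explains why the index on (some_id, group_alias) did not help: the GROUP BY mixes e.event_type_id with columns from events_locations, so no single index can deliver rows in grouping order. A minimal sketch of that effect, with SQLite standing in for MySQL (SQLite reports "USE TEMP B-TREE" where MySQL would report "Using temporary; Using filesort"); the index name idx_some_alias is hypothetical and the schema is trimmed. It compares grouping on one table's indexed columns with grouping across the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
  id INTEGER PRIMARY KEY,
  event_type_id INTEGER NOT NULL,
  time_stamp TEXT NOT NULL
);
CREATE INDEX event_type_idx ON events (event_type_id);
CREATE TABLE events_locations (
  event_id INTEGER NOT NULL,
  some_id INTEGER,
  other_id INTEGER,
  group_alias TEXT NOT NULL
);
CREATE INDEX idx_event_id ON events_locations (event_id);
-- Composite index like the one the question says was tried.
CREATE INDEX idx_some_alias ON events_locations (some_id, group_alias);
""")

def plan(sql):
    """Return the detail strings of SQLite's EXPLAIN QUERY PLAN for sql."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Grouping keys from a single table: the index can supply the order.
single_table = plan("""
SELECT some_id, group_alias, COUNT(*)
FROM events_locations
GROUP BY some_id, group_alias
""")

# Grouping keys from both tables, as in the question: no single index yields
# (event_type_id, some_id, group_alias) order, so a temporary B-tree is built.
cross_table = plan("""
SELECT e.event_type_id, el.some_id, el.group_alias, COUNT(*)
FROM events e
JOIN events_locations el ON el.event_id = e.id
GROUP BY e.event_type_id, el.some_id, el.group_alias
""")

print(single_table)
print(cross_table)
```

In MySQL terms this suggests the temporary table can only be avoided if all grouping columns come from the first table in the join order and an index covers them; otherwise the effort is better spent shrinking the number of rows that reach the grouping step.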
Any suggestions for performance tuning would be much appreciated!

Best answer

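In your case, the events table has time_stamp as an indexed column (idx_events_time_stamp). So before joining the two tables, first select from events only the records in the required date range, with just the columns you need. Then join events_locations on the relationship column (event_id).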

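Run MySQL's EXPLAIN on your query to see how it processes the table records. It shows an estimate of how many rows are examined before the required records are selected.

The number of rows scanned also drives the query execution time. The following query applies this approach to reduce the number of rows scanned.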

SELECT  
    `e`.`event_id` AS `event_id`,
    `e`.`time_stamp` AS `time_stamp`,
    `el`.`latitude` AS `latitude`,
    `el`.`longitude` AS `longitude`,
    `el`.`time_span` AS `extra`,
    `e`.`asset_name` AS `asset_name`, 
    `el`.`other_id` AS `geozone_id`,
    `el`.`group_alias` AS `group_alias`,
    `e`.`event_type_id` AS `event_type_id`,
    `e`.`entity_type_id` AS `entity_type_id`, 
    `el`.`some_id` AS `some_id`
FROM 
    (select
        `id` AS `event_id`,
        `time_stamp` AS `time_stamp`,
        `entity_id` AS `asset_name`,
        `event_type_id` AS `event_type_id`,
        `entity_type_id` AS `entity_type_id`
    from
        `events` 
    WHERE
        time_stamp >= '2018-01-01'  
        AND time_stamp <= '2019-06-02'
    ) AS `e`    
    JOIN `events_locations` `el` ON `e`.`event_id` = `el`.`event_id`
WHERE     
    `el`.`other_id` = '1'      
GROUP BY 
    `e`.`event_type_id` , 
    `el`.`some_id` , 
    `el`.`group_alias`;
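Before relying on the rewrite, it is worth confirming that it returns the same groups as the original join. A minimal sketch on a few made-up rows, with SQLite standing in for MySQL and the select list trimmed to the grouping keys (the non-aggregated columns would return arbitrary per-group values anyway). Note that MySQL 5.7 and later can merge a derived table like this back into the outer query (the derived_merge optimizer switch, on by default), so the rewrite may produce the same plan and timing as the original; compare EXPLAIN for both versions on the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, event_type_id INTEGER, time_stamp TEXT);
CREATE TABLE events_locations (event_id INTEGER, some_id INTEGER, other_id INTEGER, group_alias TEXT);
""")
# Made-up rows: event 3 falls before the date range, and one location row has other_id = 2.
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (1, 10, "2018-03-01"), (2, 10, "2018-05-01"), (3, 20, "2017-12-31"), (4, 20, "2019-01-15"),
])
conn.executemany("INSERT INTO events_locations VALUES (?, ?, ?, ?)", [
    (1, 100, 1, "a"), (2, 100, 1, "a"), (3, 200, 1, "b"), (4, 200, 2, "b"), (4, 300, 1, "c"),
])

# The question's join, filtering after the join.
ORIGINAL = """
SELECT e.event_type_id, el.some_id, el.group_alias
FROM events e JOIN events_locations el ON el.event_id = e.id
WHERE el.other_id = 1 AND e.time_stamp >= '2018-01-01' AND e.time_stamp <= '2019-06-02'
GROUP BY e.event_type_id, el.some_id, el.group_alias
"""

# The answer's shape: filter events by date in a derived table, then join.
REWRITTEN = """
SELECT e.event_type_id, el.some_id, el.group_alias
FROM (SELECT id AS event_id, event_type_id FROM events
      WHERE time_stamp >= '2018-01-01' AND time_stamp <= '2019-06-02') AS e
JOIN events_locations el ON e.event_id = el.event_id
WHERE el.other_id = 1
GROUP BY e.event_type_id, el.some_id, el.group_alias
"""

orig_groups = sorted(conn.execute(ORIGINAL).fetchall())
new_groups = sorted(conn.execute(REWRITTEN).fetchall())
print(orig_groups)  # [(10, 100, 'a'), (20, 300, 'c')]
print(new_groups)   # [(10, 100, 'a'), (20, 300, 'c')]
```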

A similar question about MySql GROUP BY optimization - avoiding tmp table and/or filesort can be found on Stack Overflow: https://stackoverflow.com/questions/55023615/
