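I have a rather large query that uses joins to gather a lot of information from several tables. The database holds GTFS transit data from a city's public transportation system.
I run the same query with different WHERE clauses, and the run time ranges from 200 milliseconds to 200 seconds.
If you don't need the explanation, scroll straight down to the problem.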
Database
The tables are:
- routes
- trips: linked to routes by route_id
- stop_times: linked to trips by trip_id
- stops: linked to stop_times by stop_id
- stop_connections: links two stop_id values
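My goal is to select journeys that use 2 connections. Here is what my query looks like on paper (diagram not reproduced here):
Explanation:
- Black items are tables, one table type per row (i.e. the first row is the trips table).
- Red items are the aliases in my query (s is stops, st is stop_times, t is trips, a is for arrival, d is for departure, 1/2/3 is the trip index).
- Green items are the list of conditions for each table.
Basically, it goes:
- [s1d] start from a given stop_id
- [st1d] get the departure times of trips leaving from that stop
- [t1] restrict those trips to the set of route_id values we want
- [st1a] get the stop arrival times
- [s1a] get the stop information (stop name)
- [cs1] connect this stop to all other stops within walking distance
Repeat this 3 times to get 3 trips (2 connections), and filter the arrival stop down to the ones I want.
Query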
select
s1d.stop_id as s1d_id, s1d.stop_name as s1d_name, s1d.stop_lat as s1d_lat, s1d.stop_lon as s1d_lon,
st1d.departure_time as st1d_dep,
t1.trip_id as t1_id, t1.trip_headsign as t1_headsign, t1.route_id as t1_route, t1.direction_id as t1_dir,
st1a.departure_time as st1a_dep,
s1a.stop_id as s1a_id, s1a.stop_name as s1a_name, s1a.stop_lat as s1a_lat, s1a.stop_lon as s1a_lon,
cs1.from_stop_id, cs1.to_stop_id,
s2d.stop_id as s2d_id, s2d.stop_name as s2d_name, s2d.stop_lat as s2d_lat, s2d.stop_lon as s2d_lon,
st2d.departure_time as st2d_dep,
t2.trip_id as t2_id, t2.trip_headsign as t2_headsign, t2.route_id as t2_route, t2.direction_id as t2_dir,
st2a.departure_time as st2a_dep,
s2a.stop_id as s2a_id, s2a.stop_name as s2a_name, s2a.stop_lat as s2a_lat, s2a.stop_lon as s2a_lon,
cs2.from_stop_id, cs2.to_stop_id,
s3d.stop_id as s3d_id, s3d.stop_name as s3d_name, s3d.stop_lat as s3d_lat, s3d.stop_lon as s3d_lon,
st3d.departure_time as st3d_dep,
t3.trip_id as t3_id, t3.trip_headsign as t3_headsign, t3.route_id as t3_route, t3.direction_id as t3_dir,
st3a.departure_time as st3a_dep,
s3a.stop_id as s3a_id, s3a.stop_name as s3a_name, s3a.stop_lat as s3a_lat, s3a.stop_lon as s3a_lon
from stops s1d
left join stop_times st1d on st1d.stop_id = s1d.stop_id
and st1d.departure_time > '07:33:00' and st1d.departure_time < '08:33:00'
left join trips t1 on t1.trip_id = st1d.trip_id
and t1.service_id in (select service_id from calendar where start_date <= 20141020 and end_date >= 20141020 and monday = 1)
and t1.route_id in ('11-0')
left join stop_times st1a on st1a.trip_id = t1.trip_id
and st1a.departure_time > st1d.departure_time
left join stops s1a on s1a.stop_id = st1a.stop_id
left join stop_connections cs1 on cs1.from_stop_id = st1a.stop_id
left join stops s2d on s2d.stop_id = cs1.to_stop_id
left join stop_times st2d on st2d.stop_id = s2d.stop_id
and st2d.departure_time > addtime(st1a.departure_time, '00:03:00')
and st2d.departure_time < addtime(st1a.departure_time, '01:03:00')
left join trips t2 on t2.trip_id = st2d.trip_id
and t2.service_id in (select service_id from calendar where start_date <= 20141020 and end_date >= 20141020 and monday = 1)
and t2.route_id in ('3-0', 'NA-0', '4-0', '2-0')
left join stop_times st2a on st2a.trip_id = t2.trip_id and st2a.departure_time > st2d.departure_time
left join stops s2a on s2a.stop_id = st2a.stop_id
left join stop_connections cs2 on cs2.from_stop_id = st2a.stop_id
left join stops s3d on s3d.stop_id = cs2.to_stop_id
left join stop_times st3d on st3d.stop_id = s3d.stop_id
and st3d.departure_time > addtime(st2a.departure_time, '00:03:00')
and st3d.departure_time < addtime(st2a.departure_time, '01:03:00')
left join trips t3 on t3.trip_id = st3d.trip_id
and t3.service_id in (select service_id from calendar where start_date <= 20141020 and end_date >= 20141020 and monday = 1)
and t3.route_id in ('36-0', '30-0', '97-0')
left join stop_times st3a on st3a.trip_id = t3.trip_id
and st3a.departure_time > st3d.departure_time
and st3a.stop_id in ('StopPoint:CLBO2',
'StopArea:CLBO',
'StopPoint:CLBO1',
'StopPoint:PLTI2',
'StopPoint:LCBU2',
'StopArea:LCBU',
'StopPoint:LCBU1',
'StopPoint:MHDI2',
'StopPoint:BILE2',
'StopArea:MHDI',
'StopPoint:MHDI1',
'StopPoint:MREZ2',
'StopArea:MRDI',
'StopPoint:MRDI1',
'StopArea:SORI',
'StopPoint:SORI1',
'StopArea:MREZ',
'StopPoint:MREZ1',
'StopPoint:SORI2',
'StopArea:BILE',
'StopPoint:BILE1',
'StopPoint:MRDI2',
'StopArea:PLTI',
'StopPoint:PLTI1',
'StopPoint:SEIL3',
'StopPoint:SEIL2',
'StopArea:SEIL',
'StopPoint:SEIL1')
left join stops s3a on s3a.stop_id = st3a.stop_id
where s1d.stop_id = 'StopPoint:DEMO1'
group by s1d_id, s3a_id
having s3a_id is not null
order by s1d_id asc, st1d_dep asc, st1a_dep asc, s1a_id asc, s2d_id asc, st2d_dep asc, st2a_dep asc, s2a_id asc, s3d_id asc, st3d_dep asc, st3a_dep asc, s3a_id asc
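Problem
I ran this query twice; the only difference is the final where clause:
- where s1d.stop_id = 'StopPoint:DEMO1': 13 rows in set (2 min 58.81 sec)
- where s1d.stop_id = 'StopPoint:ECTE2': Empty set (0.25 sec)
This looks odd to me. Here is the EXPLAIN output for both queries:
From DEMO1 (13 results, slow)
Using EXPLAIN SELECT...: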
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+-------------+
| 1 | SIMPLE | s1d | ALL | NULL | NULL | NULL | NULL | 3411 | NULL |
| 1 | SIMPLE | st1d | ref | st_stop_id_idx | st_stop_id_idx | 302 | bicou_gtfs_nantes.s1d.stop_id | 163 | Using where |
| 1 | SIMPLE | t1 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st1d.trip_id | 1 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t1.service_id | 1 | Using where |
| 1 | SIMPLE | st1a | ref | st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t1.trip_id | 14 | Using where |
| 1 | SIMPLE | s1a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st1a.stop_id | 1 | NULL |
| 1 | SIMPLE | cs1 | ref | from_to_stop_ids | from_to_stop_ids | 302 | bicou_gtfs_nantes.st1a.stop_id | 1 | Using index |
| 1 | SIMPLE | s2d | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.cs1.to_stop_id | 1 | NULL |
| 1 | SIMPLE | st2d | ref | st_stop_id_idx | st_stop_id_idx | 302 | bicou_gtfs_nantes.s2d.stop_id | 163 | Using where |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st2d.trip_id | 1 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t2.service_id | 1 | Using where |
| 1 | SIMPLE | st2a | ref | st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t2.trip_id | 14 | Using where |
| 1 | SIMPLE | s2a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st2a.stop_id | 1 | NULL |
| 1 | SIMPLE | cs2 | ref | from_to_stop_ids | from_to_stop_ids | 302 | bicou_gtfs_nantes.st2a.stop_id | 1 | Using index |
| 1 | SIMPLE | s3d | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.cs2.to_stop_id | 1 | NULL |
| 1 | SIMPLE | st3d | ref | st_stop_id_idx | st_stop_id_idx | 302 | bicou_gtfs_nantes.s3d.stop_id | 163 | Using where |
| 1 | SIMPLE | t3 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st3d.trip_id | 1 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t3.service_id | 1 | Using where |
| 1 | SIMPLE | st3a | ref | st_stop_id_idx,st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t3.trip_id | 14 | Using where |
| 1 | SIMPLE | s3a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st3a.stop_id | 1 | NULL |
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+-------------+
Using EXPLAIN EXTENDED...:
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+----------+---------------------------------+
| 1 | SIMPLE | s1d | const | PRIMARY | PRIMARY | 302 | const | 1 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | st1d | ref | st_stop_id_idx | st_stop_id_idx | 302 | const | 234 | 100.00 | Using where |
| 1 | SIMPLE | t1 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st1d.trip_id | 1 | 100.00 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t1.service_id | 1 | 100.00 | Using where |
| 1 | SIMPLE | st1a | ref | st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t1.trip_id | 14 | 100.00 | Using where |
| 1 | SIMPLE | s1a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st1a.stop_id | 1 | 100.00 | NULL |
| 1 | SIMPLE | cs1 | ref | from_to_stop_ids | from_to_stop_ids | 302 | bicou_gtfs_nantes.st1a.stop_id | 1 | 100.00 | Using index |
| 1 | SIMPLE | s2d | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.cs1.to_stop_id | 1 | 100.00 | NULL |
| 1 | SIMPLE | st2d | ref | st_stop_id_idx | st_stop_id_idx | 302 | bicou_gtfs_nantes.s2d.stop_id | 163 | 100.00 | Using where |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st2d.trip_id | 1 | 100.00 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t2.service_id | 1 | 100.00 | Using where |
| 1 | SIMPLE | st2a | ref | st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t2.trip_id | 14 | 100.00 | Using where |
| 1 | SIMPLE | s2a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st2a.stop_id | 1 | 100.00 | NULL |
| 1 | SIMPLE | cs2 | ref | from_to_stop_ids | from_to_stop_ids | 302 | bicou_gtfs_nantes.st2a.stop_id | 1 | 100.00 | Using index |
| 1 | SIMPLE | s3d | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.cs2.to_stop_id | 1 | 100.00 | NULL |
| 1 | SIMPLE | st3d | ref | st_stop_id_idx | st_stop_id_idx | 302 | bicou_gtfs_nantes.s3d.stop_id | 163 | 100.00 | Using where |
| 1 | SIMPLE | t3 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st3d.trip_id | 1 | 100.00 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t3.service_id | 1 | 100.00 | Using where |
| 1 | SIMPLE | st3a | ref | st_stop_id_idx,st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t3.trip_id | 14 | 100.00 | Using where |
| 1 | SIMPLE | s3a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st3a.stop_id | 1 | 100.00 | NULL |
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+----------+---------------------------------+
From ECTE2 (0 results, fast)
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+---------------------------------+
| 1 | SIMPLE | s1d | const | PRIMARY | PRIMARY | 302 | const | 1 | Using temporary; Using filesort |
| 1 | SIMPLE | st1d | ref | st_stop_id_idx | st_stop_id_idx | 302 | const | 234 | Using where |
| 1 | SIMPLE | t1 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st1d.trip_id | 1 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t1.service_id | 1 | Using where |
| 1 | SIMPLE | st1a | ref | st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t1.trip_id | 14 | Using where |
| 1 | SIMPLE | s1a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st1a.stop_id | 1 | NULL |
| 1 | SIMPLE | cs1 | ref | from_to_stop_ids | from_to_stop_ids | 302 | bicou_gtfs_nantes.st1a.stop_id | 1 | Using index |
| 1 | SIMPLE | s2d | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.cs1.to_stop_id | 1 | NULL |
| 1 | SIMPLE | st2d | ref | st_stop_id_idx | st_stop_id_idx | 302 | bicou_gtfs_nantes.s2d.stop_id | 163 | Using where |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st2d.trip_id | 1 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t2.service_id | 1 | Using where |
| 1 | SIMPLE | st2a | ref | st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t2.trip_id | 14 | Using where |
| 1 | SIMPLE | s2a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st2a.stop_id | 1 | NULL |
| 1 | SIMPLE | cs2 | ref | from_to_stop_ids | from_to_stop_ids | 302 | bicou_gtfs_nantes.st2a.stop_id | 1 | Using index |
| 1 | SIMPLE | s3d | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.cs2.to_stop_id | 1 | NULL |
| 1 | SIMPLE | st3d | ref | st_stop_id_idx | st_stop_id_idx | 302 | bicou_gtfs_nantes.s3d.stop_id | 163 | Using where |
| 1 | SIMPLE | t3 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st3d.trip_id | 1 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t3.service_id | 1 | Using where |
| 1 | SIMPLE | st3a | ref | st_stop_id_idx,st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t3.trip_id | 14 | Using where |
| 1 | SIMPLE | s3a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st3a.stop_id | 1 | NULL |
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+---------------------------------+
Obviously the engine does not process the two queries the same way. Why it does so is another question.
The s1d object comes from the stops table:
CREATE TABLE IF NOT EXISTS `stops` (
`stop_id` VARCHAR(100) NOT NULL,
`stop_code` VARCHAR(50) NULL DEFAULT NULL,
`stop_name` VARCHAR(255) NOT NULL,
`stop_desc` VARCHAR(255) NULL DEFAULT NULL,
`stop_lat` DECIMAL(10,6) NOT NULL,
`stop_lon` DECIMAL(10,6) NOT NULL,
`zone_id` VARCHAR(255) NULL DEFAULT NULL,
`stop_url` VARCHAR(255) NULL DEFAULT NULL,
`location_type` VARCHAR(2) NULL DEFAULT NULL,
`parent_station` VARCHAR(100) NOT NULL,
`stop_timezone` VARCHAR(50) NULL DEFAULT NULL,
`wheelchair_boarding` TINYINT(1) NULL DEFAULT NULL,
PRIMARY KEY (`stop_id`),
INDEX `zone_id` (`zone_id` ASC),
INDEX `stop_lat` (`stop_lat` ASC),
INDEX `stop_lon` (`stop_lon` ASC),
INDEX `location_type` (`location_type` ASC),
INDEX `parent_station` (`parent_station` ASC),
CONSTRAINT `stop_parent_station`
FOREIGN KEY (`parent_station`)
REFERENCES `stops` (`stop_id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
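I don't understand why the engine correctly uses the index and key when there is no data, whereas when there is data (13 rows) it uses no index or key and scans about 3,000 rows (3411 in the EXPLAIN output) instead of 1.
Is there a way to force the engine to use a key on a specific table?
Also, why does the engine behave this way?
Environment:
- OS: Mac OS X 10.10
- SQL client: mysql Ver 14.14 Distrib 5.6.17, for osx10.6 (i386) using EditLine wrapper
- SQL server: 5.6.21 MySQL Community Server (GPL)
- Hardware: MacBook Air, Intel Core i7, 8 GB RAM, 256 GB SSD (should be fast)
Table sizes: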
+-------------------+------------+
| table_name | TABLE_ROWS |
+-------------------+------------+
| agency | 0 |
| calendar | 28 |
| calendar_dates | 1005 |
| fare_attributes | 0 |
| fare_rules | 0 |
| feed_info | 0 |
| frequencies | 0 |
| route_connections | 20919 |
| routes | 60 |
| shapes | 0 |
| stop_connections | 11617 |
| stop_times | 768682 |
| stops | 3411 |
| stops_routes | 16652 |
| transfers | 0 |
| trips | 31913 |
+-------------------+------------+
Row counts after each joined table:
+---------+-------------+------------+
| Table | DEMO1 | ECTE2 |
+---------+-------------+------------+
| s1d | 1 | 1 |
| st1d | 16 | 18 |
| t1 | 16 | 18 |
| st1a | 271 | 117 |
| s1a | 271 | 117 |
| cs1 | 1286 | 495 |
| s2d | 1286 | 495 |
| st2d | 32958 | 5973 |
| t2 | 32958 | 5973 |
| st2a | 65891 | 5973 |
| s2a | 65891 | 5973 |
| cs2 | 206455 | 5973 |
| s3d | 206455 | 5973 |
| st3d | 4284871 | 5973 |
| t3 | 4284871 | 5973 |
| st3a | 4351249 | 5973 |
| s3a | 4351249 | 5973 |
| +having | 13 | 0 |
+---------+-------------+------------+
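The per-step counts above can presumably be reproduced by cutting the join chain off after the step of interest and counting rows. Below is a minimal sketch for the st1d step that reuses the first join of the query unchanged. How the numbers in the table were actually obtained is not stated, so this approach is an assumption, and the column alias rows_after_st1d is a hypothetical name.

```sql
-- Rows remaining after joining st1d (departures from the origin stop
-- within the one-hour window). Append the next joins of the main
-- query one at a time to see at which step the row count grows.
select count(*) as rows_after_st1d   -- hypothetical alias
from stops s1d
left join stop_times st1d on st1d.stop_id = s1d.stop_id
  and st1d.departure_time > '07:33:00' and st1d.departure_time < '08:33:00'
where s1d.stop_id = 'StopPoint:DEMO1';
```

In the table, the DEMO1 count grows about 26-fold at st2d (1286 to 32958) and about 21-fold at st3d (206455 to 4284871), while the ECTE2 count stays at 5973 from st2d onward. The DEMO1 run therefore has to carry several million intermediate rows into the final group by, which is likely where much of the extra time goes.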
Best answer
Two ideas come to mind:
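1) Switch some of the indexes to BTREE indexes. HASH, which is the default for some storage engines such as MEMORY, suits equality/inequality comparisons but not IN(...). Note that the stops table above is InnoDB, where indexes are already BTREE, so check the actual index types first. See here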
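2) Look at what the optimizer does with your query. Run
EXPLAIN EXTENDED SELECT ...
on both queries. This produces a warning (shown by SHOW WARNINGS) that contains the query as rewritten by the optimizer. You should see the difference there.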
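The two ideas above, plus the index hint asked about in the question, can be sketched as the statements below. This is only an illustration under assumptions: the shortened two-table select stands in for the full query, and whether a hint changes the plan or the run time on this data has not been verified.

```sql
-- 1) List the indexes; the Index_type column reports BTREE or HASH.
show index from stops;
show index from stop_times;

-- 2) Ask for the plan, then print the warning that EXPLAIN EXTENDED
--    leaves behind: it holds the query as rewritten by the optimizer.
--    (Shortened stand-in for the full query; MySQL 5.6 syntax.)
explain extended
select s1d.stop_id, st1d.departure_time
from stops s1d
left join stop_times st1d on st1d.stop_id = s1d.stop_id
  and st1d.departure_time > '07:33:00' and st1d.departure_time < '08:33:00'
where s1d.stop_id = 'StopPoint:DEMO1';
show warnings;

-- 3) Index hint: ask MySQL to use the primary key of stops for s1d,
--    then compare this plan with the unhinted one.
explain
select s1d.stop_id, st1d.departure_time
from stops s1d force index (primary)
left join stop_times st1d on st1d.stop_id = s1d.stop_id
  and st1d.departure_time > '07:33:00' and st1d.departure_time < '08:33:00'
where s1d.stop_id = 'StopPoint:DEMO1';
```

Running step 2 once with 'StopPoint:DEMO1' and once with 'StopPoint:ECTE2' gives two rewritten queries that can be compared side by side. A hint like the one in step 3 only steers index choice for that one table; it does not reduce the number of rows produced by the later joins.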
About "mysql - Why can the same query take 1000 times longer depending on the where clause?": a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26438938/