我做了一个MySQL性能优化测试,但是测试结果让我很吃惊。
首先,我准备了几个表来测试,分别是“t_worker_attendance_300w(300万数据),t_worker_attendance_1000w(1000万数据),t_worker_attendance_1y(1亿数据),t_worker_attendance_4y(4亿数据)”。
每个表都有相同的字段,相同的索引,都是复制过来的,其中4亿数据量也是从300万数据量增加而来。
在我的理解中,MySQL的性能必然会受到数据量大小的严重影响,但结果却让我困惑了整整一个星期。我几乎测试了我能想到的场景,但它们的执行时间是一样的!
这是一个新的 MySQL 5.6.16 服务器,我测试了我能想到的任何场景,包括 INNER JOIN....
A) SHOW CREATE TABLE t_worker_attendance_4y
CREATE TABLE `t_worker_attendance_4y` (
`id` bigint(20) NOT NULL ,
`attendance_id` char(32) NOT NULL,
`worker_id` char(32) NOT NULL,
`subcontractor_id` char(32) NOT NULL ,
`project_id` char(32) NOT NULL ,
`sign_date` date NOT NULL ,
`sign_type` char(2) NOT NULL ,
`latitude` double DEFAULT NULL,
`longitude` double DEFAULT NULL ,
`sign_wages` decimal(16,2) DEFAULT NULL ,
`confirm_wages` decimal(16,2) DEFAULT NULL ,
`work_content` varchar(60) DEFAULT NULL ,
`team_leader_id` char(32) DEFAULT NULL,
`sign_state` char(2) NOT NULL ,
`confirm_date` date DEFAULT NULL ,
`sign_mode` char(2) DEFAULT NULL ,
`checkin_time` datetime DEFAULT NULL ,
`checkout_time` datetime DEFAULT NULL ,
`sign_hours` decimal(6,1) DEFAULT NULL ,
`overtime` decimal(6,1) DEFAULT NULL ,
`confirm_hours` decimal(6,1) DEFAULT NULL ,
`signimg` varchar(200) DEFAULT NULL ,
`signoutimg` varchar(200) DEFAULT NULL ,
`photocheck` char(2) DEFAULT NULL ,
`machine_type` varchar(2) DEFAULT '1' ,
`project_coordinate` text ,
`floor_num` varchar(200) DEFAULT NULL ,
`device_serial_no` varchar(32) DEFAULT NULL ,
KEY `checkin_time` (`checkin_time`),
KEY `worker_id` (`worker_id`),
KEY `project_id` (`project_id`),
KEY `subcontractor_id` (`subcontractor_id`),
KEY `sign_date` (`sign_date`),
KEY `project_id_2` (`project_id`,`sign_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
B) SHOW INDEX FROM t_worker_attendance_4y
+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t_worker_attendance_4y | 1 | checkin_time | 1 | checkin_time | A | 5017494 | NULL | NULL | YES | BTREE | | |
| t_worker_attendance_4y | 1 | worker_id | 1 | worker_id | A | 1686552 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | project_id | 1 | project_id | A | 102450 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | subcontractor_id | 1 | subcontractor_id | A | 380473 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | sign_date | 1 | sign_date | A | 512643 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | project_id_2 | 1 | project_id | A | 102059 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | project_id_2 | 2 | sign_date | A | 1776104 | NULL | NULL | | BTREE | | |
+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
C) EXPLAIN SELECT SQL_NO_CACHE tw.project_id, tw.sign_date FROM t_worker_attendance_4y tw WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb' AND sign_date >= '07/01/2018' AND sign_date < '08/01/2018' ;
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+
| 1 | SIMPLE | tw | ref | project_id,sign_date,project_id_2 | project_id_2 | 96 | const | 54134596 | Using where; Using index |
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+
他们都经历了相同的联合索引。
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_300w tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sgin_date >= '07/01/2018'
AND sgin_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.02 sec
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_1000w tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sgin_date >= '07/01/2018'
AND sgin_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.01 sec
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_1y tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sgin_date >= '07/01/2018'
AND sgin_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.02 sec
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_4y tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sgin_date >= '07/01/2018'
AND sgin_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.02 sec
……
我的猜测是MySQL的查询性能会随着数据量的增加而急剧下降,但两者相差不大。所以我没有办法优化我的查询。不知道什么时候实现分表方案或者分库分表方案。
我想知道的是为什么小数据量的索引和大数据量的索引执行速度是一样的。如果你能帮助我,我非常感谢你。
最佳答案
由于 BTREE,在大数据量上的搜索性能相同指数。它有 O(log(n))
.相对而言,这意味着搜索算法必须完成:
对3m数据进行6次操作
10m数据7次操作
100m数据8次操作
400m数据8次操作
Аs 你可以看到操作的数量几乎相同。
My guess is that MySQL's query performance will decline dramatically with the increase of data volume
这对于 full table scan 是正确的例。
关于mysql - MySQL性能优化的烦恼,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/56735178/