I'm writing an application that uses MySQL to store file hash data in a simple database with a single table. I create it as follows:
CREATE DATABASE IF NOT EXISTS hash_db;
CREATE TABLE IF NOT EXISTS hash_db.main_tbl
(
sha256 CHAR(64) PRIMARY KEY ,
sha1 CHAR(40) UNIQUE KEY ,
md5 CHAR(32) UNIQUE KEY ,
created DATETIME ,
modified DATETIME ,
size BIGINT ,
ext VARCHAR(260) ,
path TEXT(32768) ,
new_record BOOL
)
ENGINE = MyISAM;
CREATE UNIQUE INDEX sha256_idx ON hash_db.main_tbl (sha256);
CREATE UNIQUE INDEX sha1_idx ON hash_db.main_tbl (sha1);
CREATE UNIQUE INDEX md5_idx ON hash_db.main_tbl (md5);
Then I only issue simple SELECTs and INSERTs of this form:
SELECT * FROM hash_db.main_tbl WHERE
sha256 = '...' OR
sha1 = '...' OR
md5 = '...';
INSERT INTO hash_db.main_tbl
(sha256, sha1, md5, created, modified, size, ext, path, new_record) VALUES
(
'...' ,
'...' ,
'...' ,
FROM_UNIXTIME(...) ,
FROM_UNIXTIME(...) ,
... ,
'...' ,
'...' ,
TRUE
);
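The data is practically random, with a very high probability of being unique (not that it should matter, or should it?). First question: is it normal for InnoDB to be much slower than MyISAM (7 times slower) for this kind of usage? I have read that it should be the other way around (I tried a 512M innodb_buffer_pool_size; it made no difference).
Second... I have tested with and without indexes (MyISAM), and the version with indexes is actually slower. These are the actual performance figures measured by my application (using performance counters in C):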
With indexes:
Selects per second: 393.7
Inserts per second: 1056.1
Without indexes:
Selects per second: 585.3
Inserts per second: 1480.9
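The figures I get are repeatable. I have also tested with an enlarged key_buffer_size (32M; the default is 8M).
What am I doing wrong, or what am I missing?
==============================================================================
Edit, following Gordon Linoff's suggestion:
I tried using UNION ALL, but performance actually dropped, to 70 selects per second to be precise. The EXPLAIN output is as follows: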
EXPLAIN EXTENDED SELECT * FROM main_hash_db.main_tbl WHERE md5 = '...'
+----+-------------+----------+-------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table    | type  | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
+----+-------------+----------+-------+---------------+------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | main_tbl | const | md5           | md5  | 97      | const |    1 |   100.00 | NULL  |
+----+-------------+----------+-------+---------------+------+---------+-------+------+----------+-------+
EXPLAIN EXTENDED SELECT * FROM main_hash_db.main_tbl WHERE md5 = '...' UNION ALL SELECT * FROM main_hash_db.main_tbl WHERE sha1 = '...'
+------+--------------+------------+-------+-----------------------+------+---------+-------+------+----------+-----------------+
| id   | select_type  | table      | type  | possible_keys         | key  | key_len | ref   | rows | filtered | Extra           |
+------+--------------+------------+-------+-----------------------+------+---------+-------+------+----------+-----------------+
|    1 | PRIMARY      | main_tbl   | const | md5                   | md5  | 97      | const |    1 |   100.00 | NULL            |
|    2 | UNION        | main_tbl   | const | sha1,sha1_idx,md5_idx | sha1 | 121     | const |    1 |   100.00 | NULL            |
| NULL | UNION RESULT | <union1,2> | ALL   | NULL                  | NULL | NULL    | NULL  | NULL |     NULL | Using temporary |
+------+--------------+------------+-------+-----------------------+------+---------+-------+------+----------+-----------------+
EXPLAIN EXTENDED SELECT * FROM main_hash_db.main_tbl WHERE md5 = '...' UNION ALL SELECT * FROM main_hash_db.main_tbl WHERE sha1 = '...' UNION ALL SELECT * FROM main_hash_db.main_tbl WHERE sha256 = '...'
+------+--------------+--------------+-------+-----------------------+---------+---------+-------+------+----------+-----------------+
| id   | select_type  | table        | type  | possible_keys         | key     | key_len | ref   | rows | filtered | Extra           |
+------+--------------+--------------+-------+-----------------------+---------+---------+-------+------+----------+-----------------+
|    1 | PRIMARY      | main_tbl     | const | md5                   | md5     | 97      | const |    1 |   100.00 | NULL            |
|    2 | UNION        | main_tbl     | const | sha1,sha1_idx,md5_idx | sha1    | 121     | const |    1 |   100.00 | NULL            |
|    3 | UNION        | main_tbl     | const | PRIMARY,sha256_idx    | PRIMARY | 192     | const |    1 |   100.00 | NULL            |
| NULL | UNION RESULT | <union1,2,3> | ALL   | NULL                  | NULL    | NULL    | NULL  | NULL |     NULL | Using temporary |
+------+--------------+--------------+-------+-----------------------+---------+---------+-------+------+----------+-----------------+
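This led me to discover a mistake in how I was creating the indexes (I was creating two separate indexes on the "sha1" column). After fixing that, it is still slow (about 70 selects per second). This is the EXPLAIN output: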
+------+--------------+--------------+-------+--------------------+---------+---------+-------+------+----------+-----------------+
| id   | select_type  | table        | type  | possible_keys      | key     | key_len | ref   | rows | filtered | Extra           |
+------+--------------+--------------+-------+--------------------+---------+---------+-------+------+----------+-----------------+
|    1 | PRIMARY      | main_tbl     | const | md5,md5_idx        | md5     | 97      | const |    1 |   100.00 | NULL            |
|    2 | UNION        | main_tbl     | const | sha1,sha1_idx      | sha1    | 121     | const |    1 |   100.00 | NULL            |
|    3 | UNION        | main_tbl     | const | PRIMARY,sha256_idx | PRIMARY | 192     | const |    1 |   100.00 | NULL            |
| NULL | UNION RESULT | <union1,2,3> | ALL   | NULL               | NULL    | NULL    | NULL  | NULL |     NULL | Using temporary |
+------+--------------+--------------+-------+--------------------+---------+---------+-------+------+----------+-----------------+
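Note that possible_keys in this output still lists two indexes per column (for example md5 and md5_idx). In MySQL, a PRIMARY KEY or UNIQUE KEY column constraint already creates its own index, so the explicit CREATE UNIQUE INDEX statements add a second index on the same column. A minimal sketch of how the duplicates could be inspected and removed, assuming the hash_db schema name and the *_idx index names from the CREATE statements at the top of the question:

```sql
-- List every index on the table. A column that shows up under two
-- different Key_name values is indexed twice.
SHOW INDEX FROM hash_db.main_tbl;

-- Drop only the explicitly created duplicates. The indexes that back the
-- PRIMARY KEY and UNIQUE KEY constraints (PRIMARY, sha1, md5) are kept.
DROP INDEX sha256_idx ON hash_db.main_tbl;
DROP INDEX sha1_idx ON hash_db.main_tbl;
DROP INDEX md5_idx ON hash_db.main_tbl;
```

Dropping the duplicates should mainly reduce the index maintenance done on each insert; whether it changes the select numbers would have to be measured.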
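==============================================================================
Third edit, after further discussion (see below). This is the EXPLAIN output for the original query (no extra indexes defined; the database was created as described above):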
explain extended select path from main_hash_db.main_tbl where sha256 = '...' or md5 = '...' or sha1 = '...' ;
+----+-------------+----------+-------------+------------------+------------------+------------+------+------+----------+--------------------------------------------+
| id | select_type | table    | type        | possible_keys    | key              | key_len    | ref  | rows | filtered | Extra                                      |
+----+-------------+----------+-------------+------------------+------------------+------------+------+------+----------+--------------------------------------------+
|  1 | SIMPLE      | main_tbl | index_merge | PRIMARY,sha1,md5 | PRIMARY,md5,sha1 | 192,97,121 | NULL |    3 |   100.00 | Using union(PRIMARY,md5,sha1); Using where |
+----+-------------+----------+-------------+------------------+------------------+------------+------+------+----------+--------------------------------------------+
Performance measured by my application:
Selects per second: 500.6
Inserts per second: 1394.8
And this is the result with 3 selects (issued individually, instead of a UNION):
Selects per second: 2525.1
Inserts per second: 1584.3
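The exact statements behind this last measurement are not shown in the post. As a sketch only, three individually issued lookups would presumably look like the following, assuming the same path column as in the EXPLAIN above and the hash_db schema name from the CREATE statements:

```sql
-- Each statement is a single-row lookup on one unique index, so it can be
-- resolved as a const access without index_merge or a temporary table.
SELECT path FROM hash_db.main_tbl WHERE sha256 = '...';
SELECT path FROM hash_db.main_tbl WHERE sha1 = '...';
SELECT path FROM hash_db.main_tbl WHERE md5 = '...';
```

The application then has to combine the three result sets itself and, if the same row matches more than one hash, discard the repeats.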
Best answer
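First, you would expect an insert without indexes to be faster. There is no mystery there: the index does not have to be maintained. In fact, when doing large inserts, a good strategy is often to drop the indexes first, do the inserts, and then rebuild the indexes.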
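A rough sketch of that drop-and-rebuild strategy for the table in the question. It assumes the secondary indexes are named sha1 and md5, as the key column of the EXPLAIN output in the question suggests; the bulk load itself is left as a placeholder:

```sql
-- 1. Drop the secondary unique indexes before the bulk load.
--    While they are gone, uniqueness of sha1 and md5 is NOT enforced.
ALTER TABLE hash_db.main_tbl
    DROP INDEX sha1,
    DROP INDEX md5;

-- 2. Run the large batch of INSERT statements here.

-- 3. Rebuild the indexes in one pass. This step fails with a duplicate-key
--    error if the load introduced repeated sha1 or md5 values.
ALTER TABLE hash_db.main_tbl
    ADD UNIQUE KEY sha1 (sha1),
    ADD UNIQUE KEY md5 (md5);
```

This only pays off for large batches. For an application that alternates single selects and inserts, as in the question, the indexes have to stay in place for the selects to use them.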
The select is more troublesome. After all, this is where you would expect the indexes to be used. Your query is:
SELECT *
FROM hash_db.main_tbl
WHERE sha256 = '...' OR
sha1 = '...' OR
md5 = '...';
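This happens to be the worst case for index usage. You need to look at the explain output to see how the indexes are being used.
My suggestion is to write the query like this: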
SELECT *
FROM hash_db.main_tbl
WHERE sha256 = '...'
UNION ALL
SELECT *
FROM hash_db.main_tbl
WHERE sha1 = '...'
UNION ALL
SELECT *
FROM hash_db.main_tbl
WHERE md5 = '...';
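(Or use union if you really want to eliminate duplicates.)
This should take advantage of the index in each subquery, and should give you the performance you need.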
A similar question about "MySQL performance slower *with* indexes?" can be found on Stack Overflow: https://stackoverflow.com/questions/25632749/