I am using the following query to extract frequently occurring short values from a mediumblob column:
select bytes, count(*) as n
from pr_value
where bytes is not null && length(bytes)<11 and variable_id=5783
group by bytes order by n desc limit 10;
My problem is that this query takes too long (about 10 seconds on a table with fewer than 1 million rows):
mysql> select bytes, count(*) as n from pr_value where bytes is not null && length(bytes)<11 and variable_id=5783 group by bytes order by n desc limit 10;
+-------+----+
| bytes | n |
+-------+----+
| 32 | 21 |
| 27 | 20 |
| 52 | 20 |
| 23 | 19 |
| 25 | 19 |
| 26 | 19 |
| 28 | 19 |
| 29 | 19 |
| 30 | 19 |
| 31 | 19 |
+-------+----+
The table looks like this (irrelevant columns omitted):
mysql> describe pr_value;
+-------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| product_id | int(11) | NO | PRI | NULL | |
| variable_id | int(11) | NO | PRI | NULL | |
| author_id | int(11) | NO | PRI | NULL | |
| bytes | mediumblob | YES | MUL | NULL | |
+-------------+---------------+------+-----+---------+-------+
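The type is mediumblob because most values are large. Fewer than 10% of them are as short as the ones I am looking for with this particular query.
I have the following indexes: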
mysql> show index from pr_value;
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| pr_value | 0 | PRIMARY | 1 | product_id | A | 8961 | NULL | NULL | | BTREE | | |
| pr_value | 0 | PRIMARY | 2 | variable_id | A | 842402 | NULL | NULL | | BTREE | | |
| pr_value | 0 | PRIMARY | 3 | author_id | A | 842402 | NULL | NULL | | BTREE | | |
| pr_value | 1 | bytes | 1 | bytes | A | 842402 | 10 | NULL | YES | BTREE | | |
| pr_value | 1 | bytes | 2 | variable_id | A | 842402 | NULL | NULL | | BTREE | | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
This is how MySQL explains my query:
mysql> explain select bytes, count(*) as n from pr_value where bytes is not null && length(bytes)<11 and variable_id=5783 group by bytes order by n desc limit 10;
+----+-------------+----------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | pr_value | range | bytes | bytes | 13 | NULL | 421201 | Using where; Using temporary; Using filesort |
+----+-------------+----------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
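Note that the condition on the length of the bytes column can be removed without changing the duration.
What can I do to make this query faster?
Of course, I would rather not have to add a column.
Best answer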
Your index on (bytes, variable_id) is not well chosen. If your queries always include a condition on variable_id, you should add an index that puts variable_id first:
(variable_id, bytes)
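How much this helps depends on how selective variable_id is, but it should help.
Another trick is to add a new indexed column that stores the result of "length(bytes)<11":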
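As a rough sketch of that first suggestion, the index could be created as follows. The index name variable_bytes is made up for illustration, and the 10-byte prefix is an assumption copied from the Sub_part value of the existing bytes index (MySQL requires a prefix length when indexing a blob column). Whether the optimizer actually picks the new index would need to be confirmed with EXPLAIN on the real data.

```sql
-- Hypothetical index name; bytes(10) mirrors the 10-byte prefix of the existing index.
ALTER TABLE pr_value ADD INDEX variable_bytes (variable_id, bytes(10));

-- Re-check the plan afterwards. The hope is that the lookup is now
-- narrowed by variable_id = 5783 first instead of ranging over bytes.
EXPLAIN
SELECT bytes, COUNT(*) AS n
FROM pr_value
WHERE bytes IS NOT NULL AND LENGTH(bytes) < 11 AND variable_id = 5783
GROUP BY bytes
ORDER BY n DESC
LIMIT 10;
```

Because the index holds only a prefix of bytes, MySQL will likely still have to read the matching rows to group on the full value, so the gain comes mainly from touching far fewer rows.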
update pr_value set small = length(bytes)<11;
Then add a new index on (small, variable_id).
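Put together, the second approach might look like the sketch below. The column type TINYINT(1) and the index name small_variable are assumptions, not something the answer specifies. Note that this does add a column, which the question hoped to avoid, and that the flag has to be kept up to date whenever bytes changes.

```sql
-- Assumed column definition: a nullable flag, 1 when bytes is shorter than 11 bytes.
ALTER TABLE pr_value ADD COLUMN small TINYINT(1) NULL;

-- Populate the flag; rows where bytes is NULL get NULL because LENGTH(NULL) is NULL.
UPDATE pr_value SET small = LENGTH(bytes) < 11;

-- Hypothetical index name.
ALTER TABLE pr_value ADD INDEX small_variable (small, variable_id);

-- The query can then filter on the flag instead of computing LENGTH(bytes) per row.
-- small = 1 already excludes NULL bytes, so the IS NOT NULL test is no longer needed.
SELECT bytes, COUNT(*) AS n
FROM pr_value
WHERE small = 1 AND variable_id = 5783
GROUP BY bytes
ORDER BY n DESC
LIMIT 10;
```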
Regarding mysql - slow GROUP BY query on a blob column, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11311235/