MySQL: Slow BTREE index path (table size: 723704015 rows)

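I have a large database, and one table in particular is consistently slow despite having an appropriate index on the column (a foreign key, actually). To be fair, the column's cardinality is low (only 5 possible values, unevenly distributed), but a LIMIT 1 still takes 9+ seconds, and only when selecting one particular path of the btree.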

Here is the table:

CREATE TABLE `locking_scripts` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `script_type_id` int(10) unsigned NOT NULL DEFAULT 1,
  `transaction_output_id` int(10) unsigned NOT NULL,
  `script` blob NOT NULL,
  `address_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `locking_scripts_uq` (`transaction_output_id`),
  KEY `locking_scripts_address_id_fk` (`address_id`),
  KEY `locking_scripts_type_id_fk` (`script_type_id`),
  CONSTRAINT `locking_scripts_address_id_fk` FOREIGN KEY (`address_id`) REFERENCES `addresses` (`id`),
  CONSTRAINT `locking_scripts_output_id_fk` FOREIGN KEY (`transaction_output_id`) REFERENCES `transaction_outputs` (`id`),
  CONSTRAINT `locking_scripts_type_id_fk` FOREIGN KEY (`script_type_id`) REFERENCES `script_types` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=748705501 DEFAULT CHARSET=utf8mb4;

The column in question is script_type_id. The contents of the table it references (script_types) are:

+----+------------------------+
| id | type                   |
+----+------------------------+
|  2 | CUSTOM_SCRIPT          |
|  3 | PAY_TO_PUBLIC_KEY      |
|  4 | PAY_TO_PUBLIC_KEY_HASH |
|  5 | PAY_TO_SCRIPT_HASH     |
|  1 | UNKNOWN                |
+----+------------------------+

The table status for both tables is:

+-------------------+--------+---------+------------+-----------+----------------+--------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+
| Name              | Engine | Version | Row_format | Rows      | Avg_row_length | Data_length  | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation          | Checksum | Create_options | Comment |
+-------------------+--------+---------+------------+-----------+----------------+--------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+
| locking_scripts   | InnoDB |      10 | Dynamic    | 726718877 |             70 |  51335135232 |               0 |  34351300608 |   7340032 |      748705501 | 2018-10-25 01:31:20 | 2018-11-18 15:30:40 | NULL       | utf8mb4_general_ci |     NULL |                |         |
| script_types      | InnoDB |      10 | Dynamic    |         5 |           3276 |        16384 |               0 |        16384 |         0 |              6 | 2018-10-24 22:22:43 | NULL                | NULL       | utf8mb4_general_ci |     NULL |                |         |
+-------------------+--------+---------+------------+-----------+----------------+--------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+

The query I am running is:

SELECT id FROM locking_scripts WHERE script_type_id = 1 LIMIT 1;

...which takes about 9.5s to execute. The EXPLAIN output for this query is:

+------+-------------+-----------------+------+----------------------------+----------------------------+---------+-------+-----------+----------+-------------+
| id   | select_type | table           | type | possible_keys              | key                        | key_len | ref   | rows      | filtered | Extra       |
+------+-------------+-----------------+------+----------------------------+----------------------------+---------+-------+-----------+----------+-------------+
|    1 | SIMPLE      | locking_scripts | ref  | locking_scripts_type_id_fk | locking_scripts_type_id_fk | 4       | const | 363359438 |   100.00 | Using index |
+------+-------------+-----------------+------+----------------------------+----------------------------+---------+-------+-----------+----------+-------------+

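EXPLAIN says it is using the index (to be fair, it estimates 363,359,438 matching rows, but with LIMIT 1 it should be quite fast, since it can exit at the first match). Interestingly, however, this query is twice as fast: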

SELECT id FROM locking_scripts WHERE script_type_id NOT IN (2, 3, 4, 5) LIMIT 1

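...which takes 4.5s to execute. (That is still a very long time.) I have other tables of similar size, and selecting with LIMIT 1 from a similar set of indexes is nearly instantaneous.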

For completeness, here is the EXPLAIN for the "faster" version:

+------+-------------+-----------------+-------+----------------------------+----------------------------+---------+------+-----------+----------+--------------------------+
| id   | select_type | table           | type  | possible_keys              | key                        | key_len | ref  | rows      | filtered | Extra                    |
+------+-------------+-----------------+-------+----------------------------+----------------------------+---------+------+-----------+----------+--------------------------+
|    1 | SIMPLE      | locking_scripts | range | locking_scripts_type_id_fk | locking_scripts_type_id_fk | 4       | NULL | 363359442 |   100.00 | Using where; Using index |
+------+-------------+-----------------+-------+----------------------------+----------------------------+---------+------+-----------+----------+--------------------------+

I can't think of a reason why I would see results like this. If anyone has any insight, I would appreciate it. Thanks.

Additional information, per the comments:

Execution plan for SELECT *:

EXPLAIN FORMAT=JSON SELECT * FROM locking_scripts WHERE script_type_id = 1 LIMIT 1;

{
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "locking_scripts",
      "access_type": "ref",
      "possible_keys": ["locking_scripts_type_id_fk"],
      "key": "locking_scripts_type_id_fk",
      "key_length": "4",
      "used_key_parts": ["script_type_id"],
      "ref": ["const"],
      "rows": 363359438,
      "filtered": 100
    }
  }
}

Execution plan for SELECT * ... NOT IN ():

EXPLAIN FORMAT=JSON SELECT * FROM locking_scripts WHERE script_type_id NOT IN (2, 3, 4, 5) LIMIT 1;

{
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "locking_scripts",
      "access_type": "range",
      "possible_keys": ["locking_scripts_type_id_fk"],
      "key": "locking_scripts_type_id_fk",
      "key_length": "4",
      "used_key_parts": ["script_type_id"],
      "rows": 363359442,
      "filtered": 100,
      "index_condition": "locking_scripts.script_type_id not in (2,3,4,5)"
    }
  }
}

Best answer

(Too big for a comment.)

So far, I'm stumped. The two JSON plans differ, but they don't tell me enough. Here is another thing to experiment with:

FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';

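This is a general technique for getting actual counts of specific operations. It may help determine whether the "1" rows turn up early or late in the scan.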
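
Applied to the two queries from the question, the experiment might look like the sketch below. It assumes a fresh session with nothing else running in it; FLUSH STATUS may also require the RELOAD privilege, depending on the server version. How to read the counters is my own interpretation, not something the plans above confirm.

```sql
-- Sketch only: run each block in the same session, one after the other.

-- 1) The slow query (equality on script_type_id).
FLUSH STATUS;  -- reset the session counters to zero
SELECT id FROM locking_scripts WHERE script_type_id = 1 LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler_read%';

-- 2) The "faster" query (NOT IN range scan).
FLUSH STATUS;
SELECT id FROM locking_scripts WHERE script_type_id NOT IN (2, 3, 4, 5) LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler_read%';
```

Handler_read_key counts requests to read a row based on a key, and Handler_read_next counts requests to read the next row in key order. If both stay tiny for a query that still takes seconds, the time is probably not being spent stepping through index entries at the handler level. A large Handler_read_next would instead suggest that many entries are read before the first row is returned.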

The original question, "MySQL: Slow BTREE index path (table size: 723704015 rows)", can be found on Stack Overflow: https://stackoverflow.com/questions/53365288/
