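I'm using MySQL/MariaDB (server version: 10.3.20-MariaDB-1:10.3.20+maria~stretch mariadb.org binary distribution).
I have about 700,000 records with these columns:
- id
- html (MEDIUMTEXT), with a very large average length: ~150000
- date
- plus 2 other small columns
The html column holds very long text (it is HTML).
Now I need to run select * from table;
to analyze this HTML, but the query takes about 0.03819 s per row (I tested on a smaller subset), so for all rows: 700000 * 0.03819 s = 26733 s, and 26733 / 60 / 60 ≈ 7.4, i.e. over 7 hours of selecting!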
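A quick way to double-check that estimate is to redo the arithmetic in a few lines of Python. This is only a sketch built from the figures quoted in the question. The total-volume line additionally assumes that the ~150000 average length is in bytes, which the question does not state.

```python
# Figures taken from the question; the per-row time was measured on a smaller sample.
ROWS = 700_000
SECONDS_PER_ROW = 0.03819
AVG_HTML_BYTES = 150_000  # ASSUMPTION: the ~150000 average length is in bytes

total_seconds = ROWS * SECONDS_PER_ROW
total_hours = total_seconds / 3600

# Rough amount of data the client would have to receive (decimal gigabytes).
total_gb = ROWS * AVG_HTML_BYTES / 1e9

print(f"estimated time: {total_hours:.2f} hours")
print(f"estimated volume: {total_gb:.0f} GB")
```

Under those assumptions the full scan moves on the order of 100 GB, which is worth keeping in mind when reading the timing.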
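I have 8 cores and 60 GB of RAM. Profiling the query shows that the "sending data" time is very, very long. How can I speed this up? Is it possible, or is this much data too much for MySQL and I need MongoDB?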
query_cache_limit = 64M
query_cache_size = 1024M
max_allowed_packet = 64M
net_buffer_length = 16384
max_connect_errors = 1000
thread_concurrency = 32
concurrent_insert = 2
read_rnd_buffer_size = 8M
bulk_insert_buffer_size = 8M
query_cache_limit = 64M
query_cache_size = 1024M
query_cache_type = 1
query_prealloc_size = 262144
query_alloc_block_size = 65536
transaction_alloc_block_size = 8192
transaction_prealloc_size = 4096
max_write_lock_count = 16
innodb_buffer_pool_size=30G
innodb_flush_log_at_trx_commit=2
innodb_thread_concurrency=16
innodb_flush_method=O_DIRECT
innodb_read_io_threads = 64
innodb_write_io_threads = 16
innodb_buffer_pool_instances = 20
MariaDB [db]> explain select id, href, html from raw limit 10;
+------+-------------+-------+------+---------------+------+---------+------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+---------------+------+---------+------+--------+-------+
| 1 | SIMPLE | raw | ALL | NULL | NULL | NULL | NULL | 658793 | |
+------+-------------+-------+------+---------------+------+---------+------+--------+-------+
1 row in set (0.227 sec)
After adding indexes:
MariaDB [db]> show index from raw;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| raw | 0 | PRIMARY | 1 | id | A | 658793 | NULL | NULL | | BTREE | | |
| raw | 1 | id | 1 | id | A | 658793 | NULL | NULL | | BTREE | | |
| raw | 1 | href | 1 | href | A | 658793 | NULL | NULL | YES | BTREE | | |
| raw | 1 | date | 1 | date | A | 131758 | NULL | NULL | YES | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (3.724 sec)
Best answer
38 ms to fetch 150 KB of data from a spinning disk is quite fast.
query_cache_size = 1024M -- that is too high. Stop at about 50M.
A PRIMARY KEY is a unique index. So, if id is the PRIMARY KEY, don't also declare KEY(id).
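To illustrate the point about the duplicate key, here is a small sketch that flags secondary indexes whose column list is just a leading prefix of the primary key. The function name redundant_indexes and the tuple layout are made up for this example. The sample rows copy the Key_name, Seq_in_index and Column_name columns from the SHOW INDEX output in the question.

```python
from collections import defaultdict


def redundant_indexes(index_rows):
    """Return names of secondary indexes whose columns are a leading
    prefix of the PRIMARY key's columns.

    index_rows: iterable of (key_name, seq_in_index, column_name) tuples,
    i.e. three of the columns printed by SHOW INDEX.
    """
    columns = defaultdict(list)
    # Sort so each index's columns are collected in Seq_in_index order.
    for key_name, _seq, column in sorted(index_rows, key=lambda r: (r[0], r[1])):
        columns[key_name].append(column)

    primary = columns.get("PRIMARY", [])
    return [
        name
        for name, cols in columns.items()
        if name != "PRIMARY" and cols == primary[: len(cols)]
    ]


# Rows transcribed from the SHOW INDEX output in the question.
rows = [
    ("PRIMARY", 1, "id"),
    ("id", 1, "id"),
    ("href", 1, "href"),
    ("date", 1, "date"),
]
print(redundant_indexes(rows))
```

For the table in the question this reports the index named id, which could then be dropped with something like ALTER TABLE raw DROP INDEX id; after checking that nothing refers to it by name.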
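"Is it possible, or is this much data too much for MySQL and I need MongoDB?"
Assuming you are running at disk speed, you cannot expect any other product to run faster.
What will the client do with a batch of 100 GB of data? MySQL will happily deliver it, but the client may choke.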
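One common way to keep the client from choking is to read the table in bounded batches ordered by primary key, instead of pulling the whole result set in one go. The answer does not prescribe this, so treat the following as a sketch of that general technique. The helper name iter_rows_in_batches is hypothetical, and an in-memory SQLite database stands in for the real server so the example runs on its own. A MariaDB driver would use its own connection call and possibly %s instead of ? as the placeholder.

```python
import sqlite3


def iter_rows_in_batches(conn, batch_size=1000):
    """Yield (id, html) rows in primary-key order, one bounded batch at a time."""
    last_id = 0  # ASSUMPTION: ids are positive integers
    while True:
        batch = conn.execute(
            "SELECT id, html FROM raw WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            return
        yield from batch
        # Resume after the highest id seen so far (keyset pagination).
        last_id = batch[-1][0]


# Demo data: an in-memory SQLite table standing in for the real `raw` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw (id INTEGER PRIMARY KEY, html TEXT)")
conn.executemany(
    "INSERT INTO raw (id, html) VALUES (?, ?)",
    [(i, f"<p>page {i}</p>") for i in range(1, 26)],
)

row_count = 0
total_chars = 0
for row_id, html in iter_rows_in_batches(conn, batch_size=10):
    row_count += 1
    total_chars += len(html)  # placeholder for the real HTML analysis

print(row_count, total_chars)
```

With this shape the client only ever holds batch_size rows in memory, and because each batch starts from the last id seen, the primary key index is used to find the starting point rather than rescanning earlier rows.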
Source: "mysql - Select all rows (700000) very long - hours", a similar question on Stack Overflow: https://stackoverflow.com/questions/59221986/