MySQL insert into a MEMORY table is very slow

Tags: mysql sql-insert database-performance

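I am trying to optimize a large (68K row) insert into a staging table. The table I created uses the MEMORY engine and has no indexes or foreign keys at all. When my ETL process starts inserting, the inserts execute, but very slowly; a full load takes over an hour.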

Here is the table definition from SHOW CREATE TABLE:

CREATE TABLE `pub_tair_germplasm` (
  `germplasm_id` int(12) DEFAULT NULL,
  `name` varchar(100) DEFAULT NULL,
  `original_name` varchar(100) DEFAULT NULL,
  `sp_growth_conditions` varchar(2048) DEFAULT NULL,
  `description` varchar(2048) DEFAULT NULL,
  `description_uc` varchar(2048) DEFAULT NULL,
  `is_mutant` varchar(1) DEFAULT NULL,
  `is_aneuploid` varchar(1) DEFAULT NULL,
  `ploidy` varchar(4) DEFAULT NULL,
  `species_variant_id` int(12) DEFAULT NULL,
  `taxon_id` int(12) DEFAULT NULL,
  `aneuploid_chromosome` int(10) DEFAULT NULL,
  `date_entered` date DEFAULT NULL,
  `date_last_modified` date DEFAULT NULL,
  `tair_object_id` bigint(19) DEFAULT NULL,
  `is_obsolete` varchar(1) DEFAULT NULL,
  `tair_object_type_id` int(12) DEFAULT NULL,
  `germplasm_type` varchar(20) DEFAULT NULL
) ENGINE=MEMORY DEFAULT CHARSET=latin1

Here is the insert:

INSERT INTO pub_tair_germplasm(
   germplasm_id,
   name,
   original_name,
   sp_growth_conditions,
   description,
   description_uc,
   is_mutant,
   is_aneuploid,
   ploidy,
   species_variant_id,
   taxon_id,
   aneuploid_chromosome,
   date_entered,
   date_last_modified,
   tair_object_id,
   is_obsolete,
   tair_object_type_id,
   germplasm_type)
VALUES (
   $germplasm_id,
   $name,
   $original_name,
   $sp_growth_conditions,
   $description,
   $description_uc,
   CASE $is_mutant WHEN 'F' THEN 'n' WHEN 'T' THEN 'y' ELSE 'y' END,
   CASE $is_aneuploid WHEN 'F' THEN 'n' WHEN 'T' THEN 'y' ELSE 'y' END,
   $ploidy,
   $species_variant_id, 
   $taxon_id,
   $aneuploid_chromosome,
   $date_entered,
   $date_last_modified,
   $tair_object_id,
   $is_obsolete,
   $tair_object_type_id,
   $type)

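This is done through Clover/ETL, which normally inserts very quickly, using JDBC batching with a batch size of 5000. The value variables are CloverETL variable references. A similar insert into a regular table on Oracle takes only a few seconds. Everything is done in a single transaction, with no commit until all rows have been inserted (an application requirement).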

While the insert is running, top shows both CPUs at 0.3% utilization.

Edit:

For the next test run, I increased the maximum heap table size to 1GB, enough to hold the entire table:

mysql> select @@max_heap_table_size;
+-----------------------+
| @@max_heap_table_size |
+-----------------------+
|             999999488 |
+-----------------------+
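
As a rough sanity check on that figure: the MySQL manual describes MEMORY tables as using a fixed-length row format, so each VARCHAR column takes its full declared width. The sketch below estimates the row width from the table definition above and multiplies it by about 68K rows. The per-type byte counts are the usual MySQL storage sizes and per-row overhead is ignored, so treat the result as an approximation (roughly 6.4 KB per row and a bit over 400 MB in total). The manual also documents that a changed max_heap_table_size does not apply to an existing MEMORY table until it is re-created, altered, or truncated, which the last statements assume.

```sql
-- Approximate fixed row width for pub_tair_germplasm (latin1, 1 byte per character).
-- Assumed sizes: INT = 4, BIGINT = 8, DATE = 3,
-- VARCHAR(n) = n + 1 length byte (n <= 255) or n + 2 length bytes (n > 255).
SELECT approx_row_bytes,
       approx_row_bytes * 68000 AS approx_table_bytes
FROM (
  SELECT 5 * 4             -- five INT columns
       + 8                 -- tair_object_id BIGINT
       + 2 * 3             -- two DATE columns
       + 2 * (100 + 1)     -- name, original_name
       + 3 * (2048 + 2)    -- sp_growth_conditions, description, description_uc
       + 3 * (1 + 1)       -- is_mutant, is_aneuploid, is_obsolete
       + (4 + 1)           -- ploidy
       + (20 + 1)          -- germplasm_type
       AS approx_row_bytes
) AS est;

-- The limit is read when the table is built, so set it first and then rebuild.
SET SESSION max_heap_table_size = 1024 * 1024 * 1024;
ALTER TABLE pub_tair_germplasm ENGINE = MEMORY;

-- Compare the estimate with what the server reports once rows are loaded.
SHOW TABLE STATUS LIKE 'pub_tair_germplasm';
```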

Process list at the start:

mysql> SHOW FULL PROCESSLIST;
+----+------+-----------+-------+---------+------+-------+-----------------------+
| Id | User | Host      | db    | Command | Time | State | Info                  |
+----+------+-----------+-------+---------+------+-------+-----------------------+
|  3 | root | localhost | mysql | Query   |    0 | NULL  | SHOW FULL PROCESSLIST |
+----+------+-----------+-------+---------+------+-------+-----------------------+
1 row in set (0.00 sec)

Process list while the job is running:

mysql> SHOW FULL PROCESSLIST;
+----+---------+--------------------------------------------+-------+---------+------+-------+-----------------------+
| Id | User    | Host                                       | db    | Command | Time | State | Info                  |
+----+---------+--------------------------------------------+-------+---------+------+-------+-----------------------+
|  4 | pubuser | c-67-188-135-136.hsd1.ca.comcast.net:55928 | pub   | Sleep   |    0 |       | NULL                  |
|  5 | root    | localhost                                  | mysql | Query   |    0 | NULL  | SHOW FULL PROCESSLIST |
+----+---------+--------------------------------------------+-------+---------+------+-------+-----------------------+
2 rows in set (0.00 sec)

I enabled the general log file; it shows the environment-setup commands issued by CloverETL, followed by the series of inserts:

150528 20:22:54     4 Connect   pubuser@c-67-188-135-136.hsd1.ca.comcast.net on pub
                    4 Query     /* mysql-connector-java-5.1.20 ( Revision: tonci.grgin@oracle.com-20111003110438-qfydx066wsbydkbw ) */SHOW VARIABLES WHERE Variable_name ='language' OR Variable_name = 'net_write_timeout' OR Variable_name = 'interactive_timeout' OR Variable_name = 'wait_timeout' OR Variable_name = 'character_set_client' OR Variable_name = 'character_set_connection' OR Variable_name = 'character_set' OR Variable_name = 'character_set_server' OR Variable_name = 'tx_isolation' OR Variable_name = 'transaction_isolation' OR Variable_name = 'character_set_results' OR Variable_name = 'timezone' OR Variable_name = 'time_zone' OR Variable_name = 'system_time_zone' OR Variable_name = 'lower_case_table_names' OR Variable_name = 'max_allowed_packet' OR Variable_name = 'net_buffer_length' OR Variable_name = 'sql_mode' OR Variable_name = 'query_cache_type' OR Variable_name = 'query_cache_size' OR Variable_name = 'init_connect'
                    4 Query     /* mysql-connector-java-5.1.20 ( Revision: tonci.grgin@oracle.com-20111003110438-qfydx066wsbydkbw ) */SELECT @@session.auto_increment_increment
                    4 Query     SHOW COLLATION
150528 20:22:55     4 Query     SET NAMES latin1
                    4 Query     SET character_set_results = NULL
                    4 Query     SET autocommit=1
                    4 Query     SET sql_mode='STRICT_TRANS_TABLES'
                    4 Query     SET autocommit=0
                    4 Query     SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED
150528 20:23:08     4 Query     INSERT INTO pub_tair_germplasm(
   germplasm_id,
   name,
   original_name,
   sp_growth_conditions,
   description,
   description_uc,
   is_mutant,
   is_aneuploid,
   ploidy,
   species_variant_id,
   taxon_id,
   aneuploid_chromosome,
   date_entered,
   date_last_modified,
   tair_object_id,
   is_obsolete,
   tair_object_type_id,
   germplasm_type)
VALUES (
   500689369,
   'CS2000002',
   'CS2000002',
   'none',
   'Sequence-indexed T-DNA insertion line; from the GABI-Kat project (German Plant Genomics Program - Koelner Arabidopsis T-DNA lines); generated using flanking sequence tags (FSTs) in the Columbia (Col-0) background; genomic DNA was isolated from T1 plants; plant sequences adjacent to T-DNA borders were amplified by adapter-ligation PCR; automated purification and sequencing of PCR product were conducted followed by computational trimming of the resulting sequence files; for details, see the GABI-Kat web site: http://www.gabi-kat.de; this is a T4 generation single-plant line potentially homozygous for the insertion. May be segregating for phenotypes that are not linked to the insertion; may have additional insertions potentially segregating.',
   'SEQUENCE-INDEXED T-DNA INSERTION LINE; FROM THE GABI-KAT PROJECT (GERMAN PLANT GENOMICS PROGRAM - KOELNER ARABIDOPSIS T-DNA LINES); GENERATED USING FLANKING SEQUENCE TAGS (FSTS) IN THE COLUMBIA (COL-0) BACKGROUND; GENOMIC DNA WAS ISOLATED FROM T1 PLANTS; PLANT SEQUENCES ADJACENT TO T-DNA BORDERS WERE AMPLIFIED BY ADAPTER-LIGATION PCR; AUTOMATED PURIFICATION AND SEQUENCING OF PCR PRODUCT WERE CONDUCTED FOLLOWED BY COMPUTATIONAL TRIMMING OF THE RESULTING SEQUENCE FILES; FOR DETAILS, SEE THE GABI-KAT WEB SITE: HTTP://WWW.GABI-KAT.DE; THIS IS A T4 GENERATION SINGLE-PLANT LINE POTENTIALLY HOMOZYGOUS FOR THE INSERTION. MAY BE SEGREGATING FOR PHENOTYPES THAT ARE NOT LINKED TO THE INSERTION; MAY HAVE ADDITIONAL INSERTIONS POTENTIALLY SEGREGATING.',
   CASE null WHEN 'F' THEN 'n' WHEN 'T' THEN 'y' ELSE 'y' END,
   CASE 'F' WHEN 'F' THEN 'n' WHEN 'T' THEN 'y' ELSE 'y' END,
   '2',
   null, 
   1,
   null,
   '2015-01-06 10:49:21',
   '2015-01-06 10:40:55',
   6530679980,
   'F',
   200016,
   'individual_line')

The problem persists.
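
For anyone reproducing the trace above, the general query log can be switched on and off at runtime on MySQL 5.1 and later. This is a minimal sketch, not necessarily how the log was enabled in this question; the log file path is a made-up example, and the statements need a suitably privileged account.

```sql
-- Send the general query log to a file (path is hypothetical).
SET GLOBAL log_output = 'FILE';
SET GLOBAL general_log_file = '/tmp/mysql-general.log';
SET GLOBAL general_log = 'ON';

-- ... run the ETL job, then inspect the file ...

-- Turn logging off again; it records every statement and grows quickly.
SET GLOBAL general_log = 'OFF';
```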

Best answer

Well, I don't know what the specific problem is, but I uploaded a tab-delimited version of the data to the MySQL server and ran this:

LOAD DATA LOCAL INFILE '/tmp/pub_tair_grm_insert.csv' INTO TABLE pub_tair_germplasm;
Query OK, 68932 rows affected, 65535 warnings (1.26 sec)
Records: 68932  Deleted: 0  Skipped: 0  Warnings: 6
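
Two follow-ups may be worth running in the same session; they are sketched here as assumptions, not as part of the original answer. With no FIELDS or LINES clause, LOAD DATA reads tab-separated fields and newline-terminated lines, so the statement above should be equivalent to the spelled-out form below. The output also reports warnings, and SHOW WARNINGS, issued right after the load, lists them (the server keeps at most max_error_count of them).

```sql
-- Same load with the documented defaults written out explicitly.
-- Run this instead of the short form, not in addition to it:
-- the table has no keys, so a second load would duplicate every row.
LOAD DATA LOCAL INFILE '/tmp/pub_tair_grm_insert.csv'
INTO TABLE pub_tair_germplasm
FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
LINES TERMINATED BY '\n' STARTING BY '';

-- Inspect the first few warnings before another statement replaces the list.
SHOW WARNINGS LIMIT 10;

-- Confirm the row count matches the "Records:" line of the load output.
SELECT COUNT(*) FROM pub_tair_germplasm;
```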

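LOAD DATA is clearly the answer, whatever the problem is. Something in the JDBC batching done by Clover/ETL must be slowing the inserts down significantly. I will look into it if I get the chance, but for now LOAD DATA gives me what I need.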
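
The general log in the question shows each row arriving as its own INSERT statement. One common way to cut per-statement overhead, offered only as an untested guess at what might help here, is to send several rows per statement. The sketch below uses a shortened column list and made-up placeholder values purely for illustration. MySQL Connector/J also documents a rewriteBatchedStatements connection property aimed at batched inserts; whether CloverETL exposes it, and whether it would fix this case, is not established by the post.

```sql
-- Multi-row INSERT: one statement, several rows.
-- Placeholder values, not real data; columns not listed default to NULL.
-- Try this on a scratch copy of the table, not on the real staging table.
INSERT INTO pub_tair_germplasm (germplasm_id, name, germplasm_type)
VALUES
  (1, 'EXAMPLE_A', 'individual_line'),
  (2, 'EXAMPLE_B', 'individual_line'),
  (3, 'EXAMPLE_C', 'individual_line');
```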

This question and its answer come from Stack Overflow: https://stackoverflow.com/questions/30489720/
