PHP + MySQL : Big CSV file import

Tags: php mysql sql csv zend-framework

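I'm trying to speed up this script, which imports a CSV file into MySQL. Loading 1,000 rows takes 130 seconds. When I tried it with 30,000 rows, it timed out after 20 minutes, having loaded only 8,681 rows.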

The CSV looks like this (it can have any number of columns, in any order):

email;param1;..;paramX
test@test.com;something;..;value 

MySQL create statement for mail_queue:

CREATE TABLE IF NOT EXISTS `mail_queue` (
  `mail_queue_id` INT NOT NULL AUTO_INCREMENT,
  `mailer_batch_id` INT NOT NULL,
  `to` VARCHAR(100) NOT NULL,
  `priority` INT NOT NULL DEFAULT 0,
  `created` DATETIME NOT NULL DEFAULT NOW(),
  `mail_status_id` INT NOT NULL,
  PRIMARY KEY (`mail_queue_id`),
  INDEX `fk_mail_queue_mailer_batch1_idx` (`mailer_batch_id` ASC),
  INDEX `fk_mail_queue_mail_status1_idx` (`mail_status_id` ASC),
  CONSTRAINT `fk_mail_queue_mailer_batch1`
    FOREIGN KEY (`mailer_batch_id`)
    REFERENCES `mailer_batch` (`mailer_batch_id`)
    ON DELETE CASCADE
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_mail_queue_mail_status1`
    FOREIGN KEY (`mail_status_id`)
    REFERENCES `mail_status` (`mail_status_id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

MySQL create statement for mail_param:

CREATE TABLE IF NOT EXISTS `mail_param` (
 `mail_param_id` INT NOT NULL AUTO_INCREMENT,
 `mail_queue_id` INT NOT NULL,
 `param_key` VARCHAR(45) NOT NULL,
 `param_value` VARCHAR(45) NOT NULL,
 PRIMARY KEY (`mail_param_id`),
 INDEX `fk_mail_param_mail_queue1_idx` (`mail_queue_id` ASC),
  CONSTRAINT `fk_mail_param_mail_queue1`
  FOREIGN KEY (`mail_queue_id`)
  REFERENCES `mail_queue` (`mail_queue_id`)
  ON DELETE CASCADE
  ON UPDATE NO ACTION)
 ENGINE = InnoDB;

The code (Zend Framework). It works, but it is slow:

    if (($handle = fopen($this->filepath, 'r')) !== false)
    {
        // DB
        $mailQueueTable = new Application_Model_DbTable_MailQueue();
        $mailParamTable = new Application_Model_DbTable_MailParam();

        // Get header
        $header = \ForceUTF8\Encoding::toUTF8(fgetcsv($handle, 0, ';'));

        while(($data = fgetcsv($handle, 0, ';')) !== false)
        {
            // Save e-mail to e-mail queue 
            $mailQueueRow = $mailQueueTable->createRow();
            $mailQueueRow->mailer_batch_id = $mailerBatchId;
            $mailQueueRow->to = $data[$this->emailColumn];
            $mailQueueRow->priority = 0;
            $mailQueueRow->created = $created->toString('yyyy-MM-dd HH:mm:ss');
            $mailQueueRow->mail_status_id = 1;
            $mailQueueId = $mailQueueRow->save();

            // Save e-mail params
            foreach ($data as $key => $value) {
                $mailParamRow = $mailParamTable->createRow();
                $mailParamRow->mail_queue_id = $mailQueueId;
                $mailParamRow->param_key = $header[$key];
                $mailParamRow->param_value = \ForceUTF8\Encoding::toUTF8($value);
                $mailParamRow->save();
            }

            unset($data);
        }
        fclose($handle);

    }

I tried LOAD DATA INFILE, but I can't use it directly because of the structure of the mail_param table.

1) Create a temporary table (OK)

    $columns = "";
    foreach ($this->header as $item) {
        if ($columns == "") {
            $columns = "`" . $item . "` VARCHAR(45)";
        } else {
            $columns .= ", `" . $item . "` VARCHAR(45)";
        }
    }

    $query = 'CREATE TEMPORARY TABLE `tmp_csv_import` (
    `id` int AUTO_INCREMENT PRIMARY KEY,
    ' . $columns . '
    ) ENGINE MyISAM;';   

2) LOAD DATA INFILE (OK)

    $query = "LOAD DATA INFILE '" . $this->filepath . "'
    INTO TABLE `tmp_csv_import`
    FIELDS TERMINATED BY ';'
    ENCLOSED BY '\"'
    LINES TERMINATED BY '\n'
    IGNORE 1 ROWS;";

3) Insert into mail_queue (OK)

// $mailerBatchId from last_insert_id()
$query = "INSERT INTO `mail_queue` (`mailer_batch_id`, `to`, `priority`, `created`, `mail_status_id`)
SELECT " . $mailerBatchId . ", `email`, 0, NOW(), 1 FROM `tmp_csv_import`";

4) Insert into mail_param (???)

I don't know what to write here. I need to insert one new row for each column of tmp_csv_import, and I need the mail_queue_id, which is the foreign key in mail_param.

$query = "INSERT INTO mail_param (mail_queue_id, param_key, param_value) 
SELECT ??? FROM `tmp_csv_import`";

Is it possible to do this in MySQL, or should I do it a different way?
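
For what step 4 asks, one possible shape is to run one INSERT ... SELECT per CSV column, joining the temporary table back to mail_queue on the e-mail address. MySQL cannot refer to a TEMPORARY table more than once in a single query, so a single statement that UNION ALLs every column is not an option here. The sketch below only builds the statements; the function name `buildMailParamInserts` and the `$emailColumn` parameter are made up for illustration, and it assumes each address appears only once per batch (duplicates would multiply the joined rows).

```php
<?php
/**
 * Build one parameterised INSERT ... SELECT per CSV column (hypothetical helper).
 * Each statement copies a single tmp_csv_import column into mail_param,
 * looking up mail_queue_id by e-mail address within the given batch.
 */
function buildMailParamInserts(array $header, int $mailerBatchId, string $emailColumn = 'email'): array
{
    // Quote an identifier by wrapping it in backticks and doubling embedded ones.
    $quote = static function (string $name): string {
        return '`' . str_replace('`', '``', $name) . '`';
    };

    $statements = [];
    foreach ($header as $column) {
        $sql = 'INSERT INTO `mail_param` (`mail_queue_id`, `param_key`, `param_value`) '
             . 'SELECT q.`mail_queue_id`, ?, t.' . $quote($column) . ' '
             . 'FROM `tmp_csv_import` t '
             . 'JOIN `mail_queue` q ON q.`to` = t.' . $quote($emailColumn) . ' '
             . 'WHERE q.`mailer_batch_id` = ?';

        // The column name and the batch id are bound as parameters, not concatenated.
        $statements[] = ['sql' => $sql, 'params' => [$column, $mailerBatchId]];
    }

    return $statements;
}
```

Each returned entry would then be executed through whatever prepared-statement API the application already uses, binding `params` in order.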

Best answer

I found a solution to my problem. There is no need for a temporary table.

1) The data for `mail_queue` is loaded with this code:

$query = "LOAD DATA INFILE '" . $this->filepath . "'
INTO TABLE `mail_queue`
FIELDS TERMINATED BY ';'
ENCLOSED BY '\"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
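(@dummy, @dummy, `to`, @dummy)
SET `mailer_batch_id` = " . (int) $mailerBatchId . ",
    `priority` = 0,
    `created` = NOW(),
    `mail_status_id` = 1;";

Only one column is imported. The column list is generated from the CSV header: the e-mail column is loaded into `to`, and every other column is read into @dummy and discarded:

(@dummy, @dummy, `to`, @dummy)

The remaining values are set by the SET clause.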
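
The answer does not show how that column list is produced from the header. A minimal sketch, assuming the position of the e-mail column is already known (the question's code keeps it in `$this->emailColumn`) and that the address belongs in the `to` column of mail_queue; the function name `buildLoadDataColumnList` is made up for illustration:

```php
<?php
/**
 * Build the LOAD DATA column list from the CSV header (hypothetical helper).
 * The e-mail column is mapped to `to`; every other column goes to @dummy.
 */
function buildLoadDataColumnList(array $header, int $emailIndex): string
{
    $parts = [];
    $count = count($header);
    for ($i = 0; $i < $count; $i++) {
        $parts[] = ($i === $emailIndex) ? '`to`' : '@dummy';
    }

    return '(' . implode(', ', $parts) . ')';
}

echo buildLoadDataColumnList(['param1', 'param2', 'email', 'param3'], 2), "\n";
// prints: (@dummy, @dummy, `to`, @dummy)
```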

2) I select the `mail_queue_id` of each inserted row and store it in an array keyed by e-mail address:

array('to' => 'mail_queue_id');
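
One way to build that array, sketched under the assumption that the rows of the batch were fetched as associative arrays with a query along the lines of SELECT `mail_queue_id`, `to` FROM `mail_queue` WHERE `mailer_batch_id` = ?. The helper name `buildEmailToIdMap` is made up for illustration.

```php
<?php
/**
 * Turn fetched mail_queue rows into an e-mail => mail_queue_id map (hypothetical helper).
 * If the same address occurs more than once, the last row wins.
 */
function buildEmailToIdMap(array $rows): array
{
    // array_column(input, value column, key column)
    return array_column($rows, 'mail_queue_id', 'to');
}

$rows = [
    ['mail_queue_id' => 10, 'to' => 'a@test.com'],
    ['mail_queue_id' => 11, 'to' => 'b@test.com'],
];
$emailToId = buildEmailToIdMap($rows);
// $emailToId is ['a@test.com' => 10, 'b@test.com' => 11]
```

Because the map is keyed by address, a CSV that lists the same address twice would need a different key (for example, row order) to keep both ids.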

3) I create a temporary CSV file from the source file. Structure:

mail_queue_id;key;value
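
A rough sketch of this step, assuming the e-mail => mail_queue_id array from step 2 and the same `;` delimiter as the source file. The function name `writeParamCsv` and its parameters are made up for illustration, and the ForceUTF8 conversion from the question's code is left out.

```php
<?php
/**
 * Rewrite the source CSV as "mail_queue_id;key;value" lines (hypothetical helper).
 * Returns the number of lines written to the target file.
 */
function writeParamCsv(string $sourcePath, string $targetPath, array $emailToId, int $emailIndex): int
{
    $in = fopen($sourcePath, 'r');
    $out = fopen($targetPath, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open the CSV files');
    }

    $header = fgetcsv($in, 0, ';', '"', '\\');
    $written = 0;

    while (($data = fgetcsv($in, 0, ';', '"', '\\')) !== false) {
        $email = $data[$emailIndex] ?? null;
        // Skip blank lines and rows whose address was not inserted in step 1.
        if ($email === null || !isset($emailToId[$email])) {
            continue;
        }
        foreach ($data as $i => $value) {
            if (!isset($header[$i])) {
                continue; // more fields than header columns
            }
            fputcsv($out, [$emailToId[$email], $header[$i], $value], ';', '"', '\\');
            $written++;
        }
    }

    fclose($in);
    fclose($out);

    return $written;
}
```

Like the loop in the question, this writes one line per column, including the e-mail column itself.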

4) The data is loaded into the `mail_param` table:

$query = "LOAD DATA INFILE '" . $tmpFilepath . "'
    INTO TABLE `mail_param`
    FIELDS TERMINATED BY ';'
    ENCLOSED BY '\"'
    LINES TERMINATED BY '\n'
    (`mail_queue_id`, `param_key`, `param_value`);";

5) Delete the temporary file:

unlink($tmpFilepath);

6) Done. I tried loading a CSV with 30,000 rows, and it was dramatically faster (< 1 second).

Source: the question "PHP + MySQL : Big CSV file import" on Stack Overflow: https://stackoverflow.com/questions/25361761/
