php - 如何在 PHP 中处理超过 100k 行的 CSV?

标签 php mysql csv

我有一个超过 100.000 行的 CSV 文件,每行有 3 个值,用分号分隔。总文件大小约为。 5MB。

CSV 文件采用以下格式:

stock_id;product_id;amount
==========================
1;1234;0
1;1235;1
1;1236;0
...
2;1234;3
2;1235;2
2;1236;13
...
3;1234;0
3;1235;2
3;1236;0
...

我们有 10 只股票,在 CSV 中的索引为 1-10。在数据库中,我们将它们保存为 22-31。

CSV 按 stock_id、product_id 排序,但我认为这无关紧要。

我有什么

<?php

session_start();

require_once ('db.php');

echo '<meta charset="iso-8859-2">';

// convert table: `CSV stock id => DB stock id`
$stocks = array(
    1  => 22,
    2  => 23,
    3  => 24,
    4  => 25,
    5  => 26,
    6  => 27,
    7  => 28,
    8  => 29,
    9  => 30,
    10 => 31
);

$sql = $mysqli->query("SELECT product_id FROM table WHERE fielddef_id = 1");

while ($row = $sql->fetch_assoc()) {
    $products[$row['product_id']] = 1;
}

$csv = file('export.csv');

// go thru CSV file and prepare SQL UPDATE query
foreach ($csv as $row) {
    $data = explode(';', $row);
    // $data[0] - stock_id
    // $data[1] - product_id
    // $data[2] - amount

    if (isset($products[$data[1]])) {
        // in CSV are products which aren't in database
        // there is echo which should show me queries
        echo "  UPDATE t 
                SET value = " . (int)$data[2] . " 
                WHERE   fielddef_id = " . (int)$stocks[$data[0]] . " AND 
                        product_id = '" . $data[1] . "' -- product_id isn't just numeric
                LIMIT 1<br>";
    }
}

问题是用 echo 写下 100k 行太慢了,需要很长时间。我不确定 MySQL 会做什么,如果它会更快,或者花费 ± 相同的时间。我这里没有测试机,所以我担心在生产服务器上测试。

我的想法是将 CSV 文件加载到更多变量(更好的数组)中,如下所示,但我不知道为什么。

$csv[0] = lines 0      - 10.000;
$csv[1] = lines 10.001 - 20.000;
$csv[2] = lines 20.001 - 30.000;
$csv[3] = lines 30.001 - 40.000;
etc. 

我发现例如。 Efficiently counting the number of lines of a text file. (200mb+) ,但我不确定它如何帮助我。

当我用 print_r 替换 foreach 时,我在 < 1 秒内得到了转储。任务是使带有数据库更新的 foreach 循环更快。

知道如何更新数据库中的这么多记录吗?
谢谢。

最佳答案

类似这样的东西(请注意这是 100% 未经测试的,我的头脑可能需要一些调整才能实际工作:))

//define array may (probably better ways of doing this
$stocks = array(
    1  => 22,
    2  => 23,
    3  => 24,
    4  => 25,
    5  => 26,
    6  => 27,
    7  => 28,
    8  => 29,
    9  => 30,
    10 => 31
);

$handle = fopen("file.csv", "r")); //open file
while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
    //loop through csv

    $updatesql = "UPDATE t SET `value` = ".$data[2]." WHERE   fielddef_id = ".$stocks[$data[0]]." AND product_id = ".$data[1];
   echo "$updatesql<br>";//for debug only comment out on live
}

没有必要进行初始选择,因为您只是在代码中将产品数据设置为 1,并且从您的描述中可以看出,您的产品 ID 始终是正确的,它只是包含 map 的 fielddef 列.

另外,不要忘记将实际的 mysqli 执行命令放入 $updatesql 中;

给你一个与实际使用代码的比较(我可以做基准测试!) 这是我用于上传文件导入器的一些代码(它并不完美,但它完成了它的工作)

if (isset($_POST['action']) && $_POST['action']=="beginimport") {
            echo "<h4>Starting Import</h4><br />";
            // Ignore user abort and expand time limit 
            //ignore_user_abort(true);
            set_time_limit(60);
                if (($handle = fopen($_FILES['clientimport']['tmp_name'], "r")) !== FALSE) {
                    $row = 0;
                    //defaults 
                    $sitetype = 3;
                    $sitestatus = 1;
                    $startdate = "2013-01-01 00:00:00";
                    $enddate = "2013-12-31 23:59:59";
                    $createdby = 1;
                    //loop and insert
                    while (($data = fgetcsv($handle, 10000, ",")) !== FALSE) {  // loop through each line of CSV. Returns array of that line each time so we can hard reference it if we want.
                        if ($row>0) {
                            if (strlen($data[1])>0) {
                                $clientshortcode = mysqli_real_escape_string($db->mysqli,trim(stripslashes($data[0])));
                                $sitename = mysqli_real_escape_string($db->mysqli,trim(stripslashes($data[0]))." ".trim(stripslashes($data[1])));
                                $address = mysqli_real_escape_string($db->mysqli,trim(stripslashes($data[1])).",".trim(stripslashes($data[2])).",".trim(stripslashes($data[3])));
                                $postcode = mysqli_real_escape_string($db->mysqli,trim(stripslashes($data[4])));
                                //look up client ID
                                $client = $db->queryUniqueObject("SELECT ID FROM tblclients WHERE ShortCode='$clientshortcode'",ENABLE_DEBUG);

                                if ($client->ID>0 && is_numeric($client->ID)) {
                                    //got client ID so now check if site already exists we can trust the site name here since we only care about double matching against already imported sites.
                                    $sitecount = $db->countOf("tblsites","SiteName='$sitename'");
                                    if ($sitecount>0) {
                                        //site exists
                                        echo "<strong style=\"color:orange;\">SITE $sitename ALREADY EXISTS SKIPPING</strong><br />";
                                    } else {
                                        //site doesn't exist so do import
                                        $db->execute("INSERT INTO tblsites (SiteName,SiteAddress,SitePostcode,SiteType,SiteStatus,CreatedBy,StartDate,EndDate,CompanyID) VALUES 
                                        ('$sitename','$address','$postcode',$sitetype,$sitestatus,$createdby,'$startdate','$enddate',".$client->ID.")",ENABLE_DEBUG);
                                        echo "IMPORTED - ".$data[0]." - ".$data[1]."<br />";
                                    }
                                } else {
                                    echo "<strong style=\"color:red;\">CLIENT $clientshortcode NOT FOUND PLEASE ENTER AND RE-IMPORT</strong><br />";
                                }
                                fcflush();
                                set_time_limit(60); // reset timer on loop
                            }
                        } else {
                            $row++;
                        }
                    } 
                    echo "<br />COMPLETED<br />";
                }
                fclose($handle);
                unlink($_FILES['clientimport']['tmp_name']);
            echo "All Imports finished do not reload this page";
        }

在大约 10 秒内导入了 15 万行

关于php - 如何在 PHP 中处理超过 100k 行的 CSV?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/29794468/

相关文章:

php - Symfony - 通过 ID 获取用户提供者

PHP: Mysql 连接

php - 用一个变量查询多个字段的PDO最佳实践

python制表符分隔检索列并删除空行

python - 去除字符的所有尾随出现

php - 找不到用户名和密码-php

php - PHP 中的 NOW() 函数

php - MySQL如何通过票数获取用户排名

php - 错误 1045 : access denied for user 'root' @'localhost' using password NO. .. 通过扩展类连接时,其他

javascript - 从 mvc web api httpresponse 生成 csv 并通过 angularjs 接收它以供下载