php - Processing a large file into a database with PHP

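I have a text file (essentially a CSV without an extension) with 150,000 lines. I need to remove duplicates by key and then insert the rows into a database. I'm using fgetcsv to read it line by line, but I don't want to run 150,000 queries. Here is what I've come up with so far (keep in mind I'm using Laravel):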

    $count = 0;
    $insert = [];

    if (($handle = fopen("myHUGEfile.txt", "r")) !== FALSE) {
        while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
            $count++;

            //See if this is the top row, which in this case are column headers
            if ($count == 1) continue;

            //Get the parts needed for the new part
            $quantity = $data[0];
            $part_number = $data[1];
            $manufacturer = $data[2];

            $new_part = [
                'manufacturer' => $manufacturer,
                'part_number' => $part_number,
                'stock' => $quantity,
                'price' => '[]',
                'approved' => 0,
            ];

            $insert[] = $new_part;

        }
        fclose($handle);
    } else {
        throw new Exception('Could not open file for reading.');
    }

    //Remove duplicates
    $newRows = [];
    $parsedCount = 0;
    foreach ($insert as $row) {
        $x = 0;
        foreach ($newRows as $n) {
            if (strtoupper($row['part_number']) === strtoupper($n['part_number'])) {
                $x++;
            }
        }
        if ($x == 0) {
            $parsedCount++;
            $newRows[] = $row;
        }
    }
    $parsed_rows = array_chunk($newRows, 1000, true);

    $x = 0;
    foreach ($parsed_rows as $chunk) {
        //Insert
        if (count($chunk) > 0)
            if (DB::table('search_parts')->insert($chunk))
                $x++;
    }

    echo $x . " chunks inserted.<br/>" . $count . " parts started with<br/>" . $parsedCount . " rows after duplicates removed.";

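But it's very clunky. I've only tested it with a little over 1,000 rows, and it works on localhost. I'm afraid that if I push it to production, it won't be able to handle all 150,000 rows. The file is about 4 MB.

Can someone show me a better, more efficient way to do this?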

Best answer

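Right now, you're keeping the first record of each set of duplicates. If you're OK with keeping the last duplicate instead, you can change

    $insert[] = $new_part;

to

    $insert[strtoupper($part_number)] = $new_part;

That way, your $insert array will hold only one value for each $part_number. Adding rows to the array will be slightly slower, but you can drop all the code that checks for duplicates, which looks very, very slow.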
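A minimal sketch of this idea, assuming the same three-column layout (quantity, part number, manufacturer) as in the question. The helper name `readUniqueParts` and its `$keepFirst` flag are made up for illustration, and the in-memory sample only stands in for the real file. The flag shows how the original keep-first behaviour could be kept with an `isset()` check instead of the nested loop. The Laravel insert is only indicated in a comment.

```php
<?php
// Hypothetical helper: reads CSV rows from an open handle and returns the
// parts keyed by upper-cased part number, so each part number appears once.
function readUniqueParts($handle, bool $keepFirst = false): array
{
    $parts = [];
    $isHeader = true;

    while (($data = fgetcsv($handle, 1000, ",", "\"", "\\")) !== false) {
        // The first row holds the column headers, so skip it.
        if ($isHeader) {
            $isHeader = false;
            continue;
        }

        // Skip blank or short lines that do not have all three columns.
        if (count($data) < 3) {
            continue;
        }

        $key = strtoupper($data[1]);

        // Keep-first variant: ignore later rows with an already seen key.
        if ($keepFirst && isset($parts[$key])) {
            continue;
        }

        // Default: a later row overwrites the earlier one (keep-last).
        $parts[$key] = [
            'manufacturer' => $data[2],
            'part_number'  => $data[1],
            'stock'        => $data[0],
            'price'        => '[]',
            'approved'     => 0,
        ];
    }

    return $parts;
}

// Small in-memory sample standing in for the real file (an assumption).
$handle = fopen('php://memory', 'r+');
fwrite($handle, "qty,part,mfr\n5,abc-1,Acme\n7,ABC-1,Acme\n2,xyz-9,Bolt\n");
rewind($handle);

$parts = readUniqueParts($handle);
fclose($handle);

// array_values() drops the string keys before the rows are chunked.
$chunks = array_chunk(array_values($parts), 1000);

foreach ($chunks as $chunk) {
    // In the Laravel app, DB::table('search_parts')->insert($chunk) would go here.
    echo count($chunk) . " rows in chunk\n";
}
```

Because the lookup is a single array access per row, the deduplication cost grows linearly with the number of rows, whereas the nested loop in the question compares every row against every row already kept.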

This page is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/23957687/
