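PHP - Create a new file containing all lines from file1 except those containing any text from the lines in file2

I have read a lot of posts on StackExchange but couldn't find what I need. Note: this is not just about removing duplicates. I need to go through File1.csv and create a new file, Results.csv, containing every line that does not contain any of the lines in File2.txt.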

File1.csv contains personal details and an email address, one record per line:

"mr","Happy","Man","mrhappy@example.com"
"mr","Sad","Man","mrsad@example.com"
"mr","Grumpy","Man","mrgrumpy@example.com"
"mr","Strong","Man","mrstrong@example.com"

File2.txt contains email addresses, one per line:

mrhappy@example.com
mrsomeoneelse@example.com
mrsomeoneelse2@example.com

Expected result: Results.csv should contain:

"mr","Sad","Man","mrsad@example.com"
"mr","Grumpy","Man","mrgrumpy@example.com"
"mr","Strong","Man","mrstrong@example.com"

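Confusingly, my code works as expected when File2.txt contains a single line. When it contains multiple lines, however, Results.csv contains all the lines from File1.csv (including the ones that should have been removed), each repeated several times (as many times as there are lines in File2.txt). I have a feeling I'm close, but I can't figure it out.

My code: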

<?php
$to_be_searched = "File1.csv";

$items_to_catch = file("File2.txt");

// create empty array to store lines we want to keep - i.e. lines that dont contain emails we're checking for
$good_lines = array();

// open $to_be_searched
$handle = fopen($to_be_searched, "r");
if ($handle) {
  // go line by line until end of file
  while (($line = fgets($handle)) !== false) {
    // check if line contains any items from $items_to_catch
    foreach($items_to_catch as $key => $value) {
      if(strpos($line, $value) === false) {
        // email wasn't found on the line so we want this line in the results file, therefore add to $good_lines array
        $good_lines[] = $line;
      } 
    }
  }
  fclose($handle);
} else {
  echo "Couldn't open " . $to_be_searched;
  exit();
}

// write $array_of_good_lines into new file
$new_file = "Results.csv";
foreach($good_lines as $key => $value) {
    file_put_contents($new_file, $value, FILE_APPEND | LOCK_EX);
}

?>

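What am I doing wrong?

Best answer

It currently doesn't work because, inside the foreach, you add the same line to $good_lines multiple times: once for every entry in $items_to_catch that the line does not contain.

To fix this, you can add a flag variable to the loop.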

while (($line = fgets($handle)) !== false) {
    // Declare our flag variable as false by default
    $found = false;

    // Loop through each item to see if the email has been found
    foreach($items_to_catch as $key => $value) {
        // If the email was found, stop looping in the second file
        if(strpos($line, $value) !== false){
            $found = true;
            break;
        } 
    }

    // If the email was not found in the second file, add it to the good_lines array
    if(!$found)
        $good_lines[] = $line;
}
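As a rough, self-contained illustration of the flag approach above, the loop can be wrapped in a function and tried on in-memory data. The function name filter_lines and the sample arrays are hypothetical (they only mimic the data in the question), and the check for an empty needle is an added assumption, since strpos() treats an empty needle differently across PHP versions.

```php
<?php
// Hypothetical helper, not part of the original answer: returns only the
// lines that contain none of the given needles.
function filter_lines(array $lines, array $needles): array
{
    $good_lines = [];
    foreach ($lines as $line) {
        // Assume the line is good until one of the needles is found in it
        $found = false;
        foreach ($needles as $needle) {
            // Skip empty needles (assumption: blank entries should never match)
            if ($needle !== '' && strpos($line, $needle) !== false) {
                $found = true;
                break; // one match is enough, stop checking this line
            }
        }
        // The line is added at most once, after all needles were checked
        if (!$found) {
            $good_lines[] = $line;
        }
    }
    return $good_lines;
}

// Sample data modeled on the question (no trailing newlines here)
$lines = [
    '"mr","Happy","Man","mrhappy@example.com"',
    '"mr","Sad","Man","mrsad@example.com"',
    '"mr","Grumpy","Man","mrgrumpy@example.com"',
    '"mr","Strong","Man","mrstrong@example.com"',
];
$needles = [
    'mrhappy@example.com',
    'mrsomeoneelse@example.com',
    'mrsomeoneelse2@example.com',
];

$result = filter_lines($lines, $needles);
print_r($result); // the Sad, Grumpy and Strong rows remain
```

The key difference from the code in the question is that the decision to keep a line is made once per line, after the inner loop, rather than once per needle.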

Update

Besides the loop, there is another problem when you read File2.txt: file() keeps the newline at the end of each string, so the later comparison with strpos fails. To fix this:

$items_to_catch = file("File2.txt", FILE_IGNORE_NEW_LINES);

Here is the var_dump of $items_to_catch without the flag:

array (size=3)
    0 => string 'mrhappy@example.com
    ' (length=20)
    1 => string 'mrsomeoneelse@example.com
    ' (length=26)
    2 => string 'mrsomeoneelse2@example.com
    ' (length=27)

Here is the var_dump of $items_to_catch with the flag:

array (size=3)
    0 => string 'mrhappy@example.com' (length=19)
    1 => string 'mrsomeoneelse@example.com' (length=25)
    2 => string 'mrsomeoneelse2@example.com' (length=26)

Note the extra character in each email address when the flag is not used: it is the newline.
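The effect of the trailing newline on strpos can be checked with a small sketch like the one below. The temporary file and its contents are stand-ins for File2.txt, chosen only for illustration.

```php
<?php
// Write a small stand-in for File2.txt to a temporary file (hypothetical data)
$path = tempnam(sys_get_temp_dir(), 'items');
file_put_contents($path, "mrhappy@example.com\nmrsomeoneelse@example.com\n");

// A row as it would appear in File1.csv
$line = '"mr","Happy","Man","mrhappy@example.com"';

// Without the flag, every array entry keeps its trailing newline
$with_newlines = file($path);
// With the flag, the trailing newline is stripped from each entry
$without_newlines = file($path, FILE_IGNORE_NEW_LINES);

// The needle is "mrhappy@example.com\n", but in $line the address is
// followed by a double quote, so the needle is not found
var_dump(strpos($line, $with_newlines[0]) !== false);    // bool(false)

// The needle is "mrhappy@example.com", which does occur in $line
var_dump(strpos($line, $without_newlines[0]) !== false); // bool(true)

unlink($path);
```

This may also explain why a File2.txt with a single line appeared to work: if that line has no trailing newline, the entry returned by file() is already clean. If File2.txt can contain blank lines, combining the flag with FILE_SKIP_EMPTY_LINES is one way to keep empty strings out of the array.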

Regarding this topic, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37123613/
