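PHP - Create a new file containing all lines from file1 except those containing any text from the lines in file2

Tags: php

I've read a lot of posts on StackExchange but can't find what I need. Note: this isn't just about removing duplicates. I need to go through File1.csv and create a new file, Results.csv, containing every line that does NOT contain any of the lines in File2.txt.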

File1.csv contains personal details and an email address, one record per line:

"mr","Happy","Man","mrhappy@example.com"
"mr","Sad","Man","mrsad@example.com"
"mr","Grumpy","Man","mrgrumpy@example.com"
"mr","Strong","Man","mrstrong@example.com"

File2.txt contains email addresses, one per line:

mrhappy@example.com
mrsomeoneelse@example.com
mrsomeoneelse2@example.com

Expected result: Results.csv should contain:

"mr","Sad","Man","mrsad@example.com"
"mr","Grumpy","Man","mrgrumpy@example.com"
"mr","Strong","Man","mrstrong@example.com"

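Confusingly, my code works as expected when File2.txt contains a single line. But when it contains multiple lines, Results.csv contains all the lines from File1.csv (including the ones that should have been removed), each repeated as many times as there are lines in File2.txt. I have a feeling I'm close, but I can't figure it out.

My code: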

<?php
$to_be_searched = "File1.csv";

$items_to_catch = file("File2.txt");

// create empty array to store lines we want to keep - i.e. lines that dont contain emails we're checking for
$good_lines = array();

// open $to_be_searched
$handle = fopen($to_be_searched, "r");
if ($handle) {
  // go line by line until end of file
  while (($line = fgets($handle)) !== false) {
    // check if line contains any items from $items_to_catch
    foreach($items_to_catch as $key => $value) {
      if(strpos($line, $value) === false) {
        // email wasn't found on the line so we want this line in the results file, therefore add to $good_lines array
        $good_lines[] = $line;
      } 
    }
  }
  fclose($handle);
} else {
  echo "Couldn't open " . $to_be_searched;
  exit();
}

// write $array_of_good_lines into new file
$new_file = "Results.csv";
foreach($good_lines as $key => $value) {
    file_put_contents($new_file, $value, FILE_APPEND | LOCK_EX);
}

?>

What am I doing wrong?

Best answer

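It currently doesn't work because, inside the foreach, you add the same line to $good_lines multiple times: once for every entry in $items_to_catch that the line does not contain.

To fix this, you can add a flag variable to the loop.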

while (($line = fgets($handle)) !== false) {
    // Declare our flag variable as false by default
    $found = false;

    // Loop through each item to see if the email has been found
    foreach($items_to_catch as $key => $value) {
        // If the email was found, stop looping in the second file
        if(strpos($line, $value) !== false){
            $found = true;
            break;
        } 
    }

    // If the email was not found in the second file, add it to the good_lines array
    if(!$found)
        $good_lines[] = $line;
}
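
To see why the original loop duplicated lines, here is a small sketch that runs both versions over in-memory arrays instead of files. The function names collectBuggy and collectWithFlag are made up for this illustration, and the needles are assumed to carry no trailing newline.

```php
<?php
// Hypothetical helper names; the arrays stand in for File1.csv and File2.txt.

// Original logic: appends the line once for every needle it does NOT contain.
function collectBuggy(array $lines, array $needles): array
{
    $good = [];
    foreach ($lines as $line) {
        foreach ($needles as $needle) {
            if (strpos($line, $needle) === false) {
                $good[] = $line;
            }
        }
    }
    return $good;
}

// Flag logic: appends the line at most once, and only if no needle matched.
function collectWithFlag(array $lines, array $needles): array
{
    $good = [];
    foreach ($lines as $line) {
        $found = false;
        foreach ($needles as $needle) {
            if (strpos($line, $needle) !== false) {
                $found = true;
                break;
            }
        }
        if (!$found) {
            $good[] = $line;
        }
    }
    return $good;
}

$lines = [
    '"mr","Happy","Man","mrhappy@example.com"',
    '"mr","Sad","Man","mrsad@example.com"',
];
$needles = ['mrhappy@example.com', 'mrsomeoneelse@example.com'];

echo count(collectBuggy($lines, $needles)), "\n";    // 3: Happy once, Sad twice
echo count(collectWithFlag($lines, $needles)), "\n"; // 1: only the Sad line
```

With a single needle the two versions return the same result, which would explain why the original code looked correct for a one-line File2.txt.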

Update

Besides the loop, there is another problem in how you read File2.txt: file() leaves the newline at the end of each string, so the later strpos comparison fails. To fix this:

$items_to_catch = file("File2.txt", FILE_IGNORE_NEW_LINES);
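
The effect of the trailing newline can be reproduced without any files. This is a minimal sketch, assuming the sample data from the question; the variable names are illustrative only.

```php
<?php
// A CSV line as fgets() would return it: the email is followed by a quote, then "\n".
$line = "\"mr\",\"Happy\",\"Man\",\"mrhappy@example.com\"\n";

// What file() returns for an entry when FILE_IGNORE_NEW_LINES is not set.
$rawNeedle = "mrhappy@example.com\n";

// The same entry with the line ending stripped by hand.
$cleanNeedle = rtrim($rawNeedle, "\r\n");

$rawFound   = strpos($line, $rawNeedle) !== false;
$cleanFound = strpos($line, $cleanNeedle) !== false;

var_dump($rawFound);   // bool(false): the line has com"\n, never com\n
var_dump($cleanFound); // bool(true)
```

If the last line of File2.txt has no newline after it, that one entry is already clean, which is probably why a single-line File2.txt seemed to work.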

Here is the var_dump of $items_to_catch without the flag:

array (size=3)
    0 => string 'mrhappy@example.com
    ' (length=20)
    1 => string 'mrsomeoneelse@example.com
    ' (length=26)
    2 => string 'mrsomeoneelse2@example.com
    ' (length=27)

Here is the var_dump of $items_to_catch with the flag:

array (size=3)
    0 => string 'mrhappy@example.com' (length=19)
    1 => string 'mrsomeoneelse@example.com' (length=25)
    2 => string 'mrsomeoneelse2@example.com' (length=26)

Note the extra character in each email in the first dump: that is the newline.
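
Putting both fixes together, the whole task might look like the sketch below. It is not the asker's exact script: writeFilteredCsv is a hypothetical helper name, both files are assumed to be small enough to load with file(), and FILE_SKIP_EMPTY_LINES is an extra precaution not mentioned above (in PHP 8, strpos() treats an empty needle as found, so a blank line in File2.txt would otherwise remove every row).

```php
<?php
// Hypothetical helper: copies $sourcePath to $targetPath, skipping every
// line that contains any non-empty line of $needlePath.
// Returns the number of lines written.
function writeFilteredCsv(string $sourcePath, string $needlePath, string $targetPath): int
{
    // Strip line endings and drop blank lines so that no needle is empty.
    $needles = file($needlePath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    // Keep line endings here so the kept lines can be written back unchanged.
    $lines = file($sourcePath);
    if ($needles === false || $lines === false) {
        throw new RuntimeException("Couldn't read the input files");
    }

    $good = [];
    foreach ($lines as $line) {
        $found = false;
        foreach ($needles as $needle) {
            if (strpos($line, $needle) !== false) {
                $found = true;
                break;
            }
        }
        if (!$found) {
            $good[] = $line;
        }
    }

    // A single write without FILE_APPEND overwrites the target,
    // so re-running the script does not pile up old results.
    file_put_contents($targetPath, implode('', $good), LOCK_EX);

    return count($good);
}
```

With the file names from the question, the call would be writeFilteredCsv("File1.csv", "File2.txt", "Results.csv"). Note that this is still a substring match on the whole line; if an address in File2.txt could be part of a longer address in File1.csv, comparing the email column exactly would be safer.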

Regarding "PHP - Create a new file containing all lines from file1 except those containing any text from the lines in file2", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37123613/
