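I've read a lot of posts on StackExchange but can't find what I need. Note: this is not just about removing duplicates. I need to go through File1.csv and create a new file, Results.csv, containing every line of File1.csv that does not contain any of the lines from File2.txt.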
File1.csv contains personal details and an email address, one record per line:
"mr","Happy","Man","mrhappy@example.com"
"mr","Sad","Man","mrsad@example.com"
"mr","Grumpy","Man","mrgrumpy@example.com"
"mr","Strong","Man","mrstrong@example.com"
File2.txt contains email addresses, one per line:
mrhappy@example.com
mrsomeoneelse@example.com
mrsomeoneelse2@example.com
Expected result: Results.csv should contain:
"mr","Sad","Man","mrsad@example.com"
"mr","Grumpy","Man","mrgrumpy@example.com"
"mr","Strong","Man","mrstrong@example.com"
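Confusingly, my code works as expected when File2.txt contains a single line. When it contains several lines, however, Results.csv ends up with every line from File1.csv (including the ones that should have been removed), each repeated as many times as there are lines in File2.txt. I feel I'm close, but I can't figure it out.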
My code:
<?php
$to_be_searched = "File1.csv";
$items_to_catch = file("File2.txt");
// create empty array to store lines we want to keep - i.e. lines that don't contain emails we're checking for
$good_lines = array();
// open $to_be_searched
$handle = fopen($to_be_searched, "r");
if ($handle) {
    // go line by line until end of file
    while (($line = fgets($handle)) !== false) {
        // check if line contains any items from $items_to_catch
        foreach ($items_to_catch as $key => $value) {
            if (strpos($line, $value) === false) {
                // email wasn't found on the line so we want this line in the results file, therefore add to $good_lines array
                $good_lines[] = $line;
            }
        }
    }
    fclose($handle);
} else {
    echo "Couldn't open " . $to_be_searched;
    exit();
}
// write $good_lines into new file
$new_file = "Results.csv";
foreach ($good_lines as $key => $value) {
    file_put_contents($new_file, $value, FILE_APPEND | LOCK_EX);
}
?>
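What am I doing wrong?
Best answer
It doesn't work at the moment because, inside the foreach, the same line is added to $good_lines several times: once for every entry in $items_to_catch that it does not contain.
To fix this, you can add a flag variable to the loop.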
while (($line = fgets($handle)) !== false) {
    // Declare our flag variable as false by default
    $found = false;
    // Loop through each item to see if the email has been found
    foreach ($items_to_catch as $key => $value) {
        // If the email was found, stop looping through the second file's entries
        if (strpos($line, $value) !== false) {
            $found = true;
            break;
        }
    }
    // If the email was not found in the second file, add the line to the good_lines array
    if (!$found) {
        $good_lines[] = $line;
    }
}
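If the flag variable feels clumsy, the same check can be moved into a small function that returns as soon as a match is found. This is only a sketch: lineContainsAny() is a name made up for illustration, and the sample arrays stand in for the two files.

```php
<?php
// Hypothetical helper (not part of the original code): returns true as soon
// as any needle occurs in $line, so no flag variable is needed.
function lineContainsAny(string $line, array $needles): bool
{
    foreach ($needles as $needle) {
        // Skip empty needles: in PHP 8, strpos() reports an empty needle as
        // found at offset 0, which would match every line.
        if ($needle !== '' && strpos($line, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Sample data standing in for File2.txt and File1.csv.
$items_to_catch = ['mrhappy@example.com', 'mrsomeoneelse@example.com'];
$lines = [
    '"mr","Happy","Man","mrhappy@example.com"' . "\n",
    '"mr","Sad","Man","mrsad@example.com"' . "\n",
];

$good_lines = [];
foreach ($lines as $line) {
    // Each line is tested once, so it is added at most once.
    if (!lineContainsAny($line, $items_to_catch)) {
        $good_lines[] = $line;
    }
}

echo implode('', $good_lines);
```

Inside the original while loop, the whole foreach would then shrink to a single `if (!lineContainsAny($line, $items_to_catch))` test.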
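Update
Besides the loop, there is a second problem in the way File2.txt is read: by default file() keeps the newline at the end of each string, so the later strpos comparison never matches. To fix that: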
$items_to_catch = file("File2.txt", FILE_IGNORE_NEW_LINES);
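To see why the newline matters, here is a minimal check with hard-coded strings in place of the files (the values are the sample data from the question). In File1.csv the email is followed by a closing quote, never directly by a newline, so a needle that still carries "\n" cannot be found.

```php
<?php
// One line of File1.csv as fgets() would return it.
$line = '"mr","Happy","Man","mrhappy@example.com"' . "\n";

// An entry as returned by file("File2.txt"): the newline is still attached.
$raw = "mrhappy@example.com\n";
// The same entry as returned with FILE_IGNORE_NEW_LINES.
$clean = "mrhappy@example.com";

var_dump(strpos($line, $raw) !== false);   // bool(false)
var_dump(strpos($line, $clean) !== false); // bool(true)
```

This may also be why the script appeared to work with a single entry: assuming that one-line File2.txt did not end with a newline, its only string would have had nothing extra attached.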
Here is the var_dump of $items_to_catch without the FILE_IGNORE_NEW_LINES flag:
array (size=3)
  0 => string 'mrhappy@example.com
' (length=20)
  1 => string 'mrsomeoneelse@example.com
' (length=26)
  2 => string 'mrsomeoneelse2@example.com
' (length=27)
Here is the var_dump of $items_to_catch with the FILE_IGNORE_NEW_LINES flag:
array (size=3)
  0 => string 'mrhappy@example.com' (length=19)
  1 => string 'mrsomeoneelse@example.com' (length=25)
  2 => string 'mrsomeoneelse2@example.com' (length=26)
Note the one extra character in each email in the first dump: that is the newline.
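Putting the two fixes together, the whole job could look roughly like the sketch below. The filterLines() function, its parameters and the temporary files are assumptions made for the demo; it also reads File1.csv with file() rather than fgets(), which assumes the file fits comfortably in memory.

```php
<?php
// Hypothetical wrapper (name and signature are illustrative only).
// Writes to $dest every line of $source that contains none of the strings
// listed in $blocklist, and returns the number of lines kept.
function filterLines(string $source, string $blocklist, string $dest): int
{
    $needles = file($blocklist, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $lines = file($source);
    if ($needles === false || $lines === false) {
        throw new RuntimeException("Couldn't read the input files");
    }
    // Extra precaution: trim stray whitespace and drop entries left empty.
    $needles = array_filter(array_map('trim', $needles), 'strlen');

    $good_lines = array_filter($lines, function (string $line) use ($needles): bool {
        foreach ($needles as $needle) {
            if (strpos($line, $needle) !== false) {
                return false; // a listed email was found, drop the line
            }
        }
        return true; // no listed email on this line, keep it
    });

    // One write without FILE_APPEND, so a rerun overwrites the old results
    // instead of appending to them.
    file_put_contents($dest, implode('', $good_lines), LOCK_EX);
    return count($good_lines);
}

// Demo: temporary files stand in for File1.csv, File2.txt and Results.csv.
$file1 = tempnam(sys_get_temp_dir(), 'f1_');
$file2 = tempnam(sys_get_temp_dir(), 'f2_');
$results = tempnam(sys_get_temp_dir(), 'res');
file_put_contents($file1, implode("\n", [
    '"mr","Happy","Man","mrhappy@example.com"',
    '"mr","Sad","Man","mrsad@example.com"',
    '"mr","Grumpy","Man","mrgrumpy@example.com"',
    '"mr","Strong","Man","mrstrong@example.com"',
]) . "\n");
file_put_contents($file2, "mrhappy@example.com\nmrsomeoneelse@example.com\nmrsomeoneelse2@example.com\n");

$kept = filterLines($file1, $file2, $results);
$output = file_get_contents($results);
echo $kept, " lines kept\n", $output;
```

With the sample data from the question, this should leave the Sad, Grumpy and Strong rows in the results file.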
Source: the Stack Overflow question "PHP - create new file containing all lines from file1 except those containing any text from lines in file2": https://stackoverflow.com/questions/37123613/