Python Pandas: Compare two CSV files and delete lines from both files by matching a column

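If a value in the first column of one file does not exist in the other file, the row containing it must be deleted. This applies to both files.

Consider two CSV files: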

file1.csv:
yrdi_391    111    1.11    1.0    1.1    111.0
yfyrn_9132  222    2.22    2.0    2.2    222.0
kdkfke_392  999    9.99    9.0    9.9    999.0
hfeisk_3    333    3.33    3.0    3.3    333.0

file2.csv:
yrdi_391    444    4.44    4.0    4.4    444.0
yfyrn_9132  555    5.55    5.0    5.5    555.0
hfeisk_3    666    6.66    6.0    6.6    666.0
fhedn_271   888    8.88    8.0    8.8    888.0

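Here, the entire row starting with kdkfke_392 must be deleted from file1.csv, because file2.csv has no such row.

Likewise, the entire row starting with fhedn_271 must be deleted from file2.csv, because it does not exist in file1.csv.

Expected result: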

file1.csv:
yrdi_391    111    1.11    1.0    1.1    111.0
yfyrn_9132  222    2.22    2.0    2.2    222.0
hfeisk_3    333    3.33    3.0    3.3    333.0

file2.csv:
yrdi_391    444    4.44    4.0    4.4    444.0
yfyrn_9132  555    5.55    5.0    5.5    555.0
hfeisk_3    666    6.66    6.0    6.6    666.0

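The rows in file1.csv and file2.csv are currently unsorted. If necessary, we can sort them first and then do the deletion.

Pandas CSV operations are preferred, because both files have headers that need to be preserved.

I'm new to Python scripting!

Any help would be greatly appreciated!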

Best answer

You can use isin().
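As a quick illustration of what isin() does, here is a minimal sketch that uses only the first-column values from the question. The names keys1, keys2, mask and kept are made up for this example. isin() returns a boolean Series, and indexing with that Series keeps only the matching rows.

```python
import pandas as pd

# Hypothetical stand-ins for the first column of each file (values from the question).
keys1 = pd.Series(["yrdi_391", "yfyrn_9132", "kdkfke_392", "hfeisk_3"])
keys2 = pd.Series(["yrdi_391", "yfyrn_9132", "hfeisk_3", "fhedn_271"])

# isin() returns a boolean Series: True where the value also appears in keys2.
mask = keys1.isin(keys2)
print(mask.tolist())   # [True, True, False, True]

# Boolean indexing keeps only the entries marked True.
kept = keys1[mask]
print(kept.tolist())   # ['yrdi_391', 'yfyrn_9132', 'hfeisk_3']
```

The same mask can be applied to a whole DataFrame, which is what the filtering on column 0 in this answer does.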

print (df)

            0    1     2    3    4      5
0    yrdi_391  111  1.11  1.0  1.1  111.0
1  yfyrn_9132  222  2.22  2.0  2.2  222.0
2  kdkfke_392  999  9.99  9.0  9.9  999.0
3    hfeisk_3  333  3.33  3.0  3.3  333.0

print (df1)

            0    1     2    3    4      5
0    yrdi_391  444  4.44  4.0  4.4  444.0
1  yfyrn_9132  555  5.55  5.0  5.5  555.0
2    hfeisk_3  666  6.66  6.0  6.6  666.0
3   fhedn_271  888  8.88  8.0  8.8  888.0


csv_df = df[df[0].isin(df1[0])]

print (csv_df)
            0    1     2    3    4      5
0    yrdi_391  111  1.11  1.0  1.1  111.0
1  yfyrn_9132  222  2.22  2.0  2.2  222.0
3    hfeisk_3  333  3.33  3.0  3.3  333.0

csv_df1 = df1[df1[0].isin(df[0])]

print (csv_df1)
            0    1     2    3    4      5
0    yrdi_391  444  4.44  4.0  4.4  444.0
1  yfyrn_9132  555  5.55  5.0  5.5  555.0
2    hfeisk_3  666  6.66  6.0  6.6  666.0

csv_df.to_csv('temp.csv', index=False)
csv_df1.to_csv('temp1.csv', index=False)
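The frames shown above have integer column labels (0 to 5), which suggests they were read without a header row. Since the question says both files have headers that must be kept, the following is a sketch of the same isin() approach with named columns. The header names (id, a, b), the comma-separated layout, the shortened rows and the helper keep_common_rows are all assumptions made for illustration. The data is held in memory so the example runs on its own.

```python
import io
import pandas as pd

# Hypothetical file contents: header names and comma separators are invented here.
FILE1 = """id,a,b
yrdi_391,111,1.11
yfyrn_9132,222,2.22
kdkfke_392,999,9.99
hfeisk_3,333,3.33
"""
FILE2 = """id,a,b
yrdi_391,444,4.44
yfyrn_9132,555,5.55
hfeisk_3,666,6.66
fhedn_271,888,8.88
"""

def keep_common_rows(left, right, key):
    """Return both frames restricted to rows whose key appears in the other frame."""
    left_kept = left[left[key].isin(right[key])]
    right_kept = right[right[key].isin(left[key])]
    return left_kept, right_kept

# read_csv treats the first line as the header by default, so column names survive.
df = pd.read_csv(io.StringIO(FILE1))
df1 = pd.read_csv(io.StringIO(FILE2))

csv_df, csv_df1 = keep_common_rows(df, df1, key="id")

# Optional: sort both results by the key so the rows line up.
csv_df = csv_df.sort_values("id")
csv_df1 = csv_df1.sort_values("id")

# to_csv writes the header row by default; index=False omits the row numbers.
# With no path argument it returns the CSV text instead of writing a file.
out1 = csv_df.to_csv(index=False)
out2 = csv_df1.to_csv(index=False)
print(out1)
print(out2)
```

With real files, the io.StringIO(...) arguments would be replaced by the file paths, and to_csv would be given an output path as in the answer. If the files are whitespace-separated like the sample in the question, passing sep=r"\s+" to read_csv is one option.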

A similar question about "Python Pandas: Compare two CSV files and delete lines from both files by matching a column" can be found on Stack Overflow: https://stackoverflow.com/questions/53283531/
