Given the following two DataFrames:
import pandas as pd

# Create the two DataFrame objects
df1 = pd.DataFrame([('Stuti', 28, 'Varanasi'),
                    ('Saumya', 32, 'Delhi'),
                    ('Aaditya', 25, 'Mumbai'),
                    ('Saumya', 32, 'Delhi')],
                   columns=['Name', 'Score', 'City'])
df2 = pd.DataFrame([('Saumya', 32, 'Delhi'),
                    ('Saumya', 32, 'Mumbai'),
                    ('Aaditya', 40, 'Mumbai'),
                    ('Seema', 32, 'Delhi')],
                   columns=['Name', 'Score', 'City'])
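How can I create a mask for df2 that flags rows duplicated in df1, comparing only the Name and City columns? If the same (Name, City) pair exists in df1, the Check column should contain Duplicated; otherwise it should contain None.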
The expected result looks like this:
Name Score City Check
0 Saumya 32 Delhi Duplicated
1 Saumya 32 Mumbai None
2 Aaditya 40 Mumbai Duplicated
3 Seema 32 Delhi None
Update, the code I have tried so far:
df = pd.concat([df1, df2])
df[df.duplicated(['Name', 'City'])]
Output:
Name Score City
3 Saumya 32 Delhi
0 Saumya 32 Delhi
2 Aaditya 40 Mumbai
Best answer
In [65]: import numpy as np
    ...: (df2.merge(df1[['Name', 'City']].drop_duplicates(), how='left', indicator='Check')
    ...:     .assign(Check=lambda x: np.where(x['Check'] == 'both', 'Duplicated', None)))
Out[65]:
Name Score City Check
0 Saumya 32 Delhi Duplicated
1 Saumya 32 Mumbai None
2 Aaditya 40 Mumbai Duplicated
3 Seema 32 Delhi None
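The one-liner above can be unpacked into separate steps, which also yields the boolean mask the question asks for. This is only a sketch of the same left-merge idea, not a different method: the names key_cols, pairs, merged and mask are illustrative, the join columns are passed explicitly via on=, and it assumes df2 has a default RangeIndex so that the merged rows line up positionally with df2.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [("Stuti", 28, "Varanasi"), ("Saumya", 32, "Delhi"),
     ("Aaditya", 25, "Mumbai"), ("Saumya", 32, "Delhi")],
    columns=["Name", "Score", "City"],
)
df2 = pd.DataFrame(
    [("Saumya", 32, "Delhi"), ("Saumya", 32, "Mumbai"),
     ("Aaditya", 40, "Mumbai"), ("Seema", 32, "Delhi")],
    columns=["Name", "Score", "City"],
)

key_cols = ["Name", "City"]  # illustrative name: the columns to compare on

# Unique (Name, City) pairs from df1. Dropping duplicates first keeps the
# left join from producing more than one output row per df2 row.
pairs = df1[key_cols].drop_duplicates()

# indicator=True adds a "_merge" column: "both" when the pair was found in
# df1, "left_only" when it exists only in df2.
merged = df2.merge(pairs, on=key_cols, how="left", indicator=True)

# Boolean mask, positionally aligned with df2 (assumes a default RangeIndex).
mask = (merged["_merge"] == "both").to_numpy()

# Label the rows; unmatched rows get a missing value (None).
result = df2.assign(Check=np.where(mask, "Duplicated", None))
print(result)
```

The mask can also be used on its own, for example df2[mask] to keep only the rows whose (Name, City) pair appears in df1, or df2[~mask] to keep the rest.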
This post is based on the Stack Overflow question "python - Filter duplicated rows based on selected columns and compare with another DataFrame in pandas": https://stackoverflow.com/questions/65559950/