I have two dataframes:
import pandas as pd
df1 = pd.DataFrame(data={'Invoice': [1, 2, 3, 4, 5], 'Value': [10, 25, 40, 10, 15]})
df2 = pd.DataFrame(data={'Invoice': [2, 3, 5, 2], 'Value': [25, 11, 15, 25], 'TestData': ['A', 'B', 'C', 'D']})
I merged them and got df3:
df3=pd.merge(df1,df2, left_on=["Invoice","Value"], right_on=["Invoice","Value"])
df3 output:
Invoice Value TestData
0 2 25 A
1 2 25 D
2 5 15 C
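My question is how to merge the dataframes "one-to-one". I mean: when invoice number 2 appears only once (or, in general, fewer times) in one of the two dataframes, the merged dataframe should not get an extra row for invoice 2. I would like to get something like this: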
Invoice Value TestData
0 2 25 A
1 5 15 C
Or this:
Invoice Value TestData
0 2 25 D
1 5 15 C
I have only tried left and right merges, but that doesn't work: there are always two rows with invoice number 2.
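The extra row appears because the key (Invoice 2, Value 25) occurs twice in df2, and merge pairs every matching left row with every matching right row. A minimal sketch that rebuilds the frames from the question and checks this: `groupby(...).size()` counts how often each key occurs, and merge's `validate='one_to_one'` argument makes pandas raise `pandas.errors.MergeError` instead of silently duplicating rows.

```python
import pandas as pd

df1 = pd.DataFrame({'Invoice': [1, 2, 3, 4, 5], 'Value': [10, 25, 40, 10, 15]})
df2 = pd.DataFrame({'Invoice': [2, 3, 5, 2], 'Value': [25, 11, 15, 25],
                    'TestData': ['A', 'B', 'C', 'D']})

# Count how often each (Invoice, Value) key occurs in df2.
key_counts = df2.groupby(['Invoice', 'Value']).size()
print(key_counts[key_counts > 1])  # keys that occur more than once

# validate='one_to_one' requires unique keys on both sides, so this merge raises.
try:
    pd.merge(df1, df2, on=['Invoice', 'Value'], validate='one_to_one')
    duplicate_keys = False
except pd.errors.MergeError as err:
    duplicate_keys = True
    print('MergeError:', err)
```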
Thank you,
亚里克
Best answer
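Use drop_duplicates on df2 with the key column names specified. By default (keep='first') the first of the duplicated rows is kept; pass keep='last' to keep the last one instead: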
df2_first = df2.drop_duplicates(["Invoice","Value"])
# same as (keep='first' is the default)
#df2_first = df2.drop_duplicates(["Invoice","Value"], keep='first')
df3 = pd.merge(df1, df2_first, on=["Invoice","Value"])
print (df3)
Invoice Value TestData
0 2 25 A
1 5 15 C
# deduplicate the original df2 again, keeping the last duplicate this time
df2_last = df2.drop_duplicates(["Invoice","Value"], keep='last')
df3 = pd.merge(df1, df2_last, on=["Invoice","Value"])
print (df3)
Invoice Value TestData
0 2 25 D
1 5 15 C
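To see which rows drop_duplicates removes for each `keep` value, the same keys can be passed to `DataFrame.duplicated`: `drop_duplicates(keys, keep=k)` keeps exactly the rows where `duplicated(keys, keep=k)` is False. A short sketch on the question's frames; note that keep=False flags every row of a repeated key, so dropping with it would remove invoice 2 altogether, which is not what the question asks for. After deduplication each key occurs at most once in the right frame, so `validate='many_to_one'` passes.

```python
import pandas as pd

df1 = pd.DataFrame({'Invoice': [1, 2, 3, 4, 5], 'Value': [10, 25, 40, 10, 15]})
df2 = pd.DataFrame({'Invoice': [2, 3, 5, 2], 'Value': [25, 11, 15, 25],
                    'TestData': ['A', 'B', 'C', 'D']})
keys = ['Invoice', 'Value']

# Rows that drop_duplicates(keys, keep=...) would remove for each setting.
removed = {}
for keep in ['first', 'last', False]:
    removed[keep] = df2[df2.duplicated(keys, keep=keep)]['TestData'].tolist()
    print(keep, removed[keep])

# With duplicates removed, the right-hand keys are unique,
# so a many-to-one merge validates and cannot multiply rows of df1.
merged = pd.merge(df1, df2.drop_duplicates(keys, keep='last'), on=keys,
                  validate='many_to_one')
print(merged)
```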
EDIT:
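If all duplicated rows need to be matched pairwise (the first occurrence of a key in df1 with the first occurrence in df2, the second with the second, and so on), add a new column that numbers the rows within each key group, so that the merge keys become unique: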
df1['g'] = df1.groupby(['Invoice','Value']).cumcount()
df2['g'] = df2.groupby(['Invoice','Value']).cumcount()
print (df1)
Invoice Value g
0 1 10 0
1 2 25 0
2 3 40 0
3 4 10 0
4 5 15 0
print (df2)
Invoice Value TestData g
0 2 25 A 0
1 3 11 B 0
2 5 15 C 0
3 2 25 D 1
df3=pd.merge(df1,df2, on=["Invoice","Value", "g"]).drop('g', axis=1)
print (df3)
Invoice Value TestData
0 2 25 A
1 5 15 C
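A minimal sketch that wraps the cumcount approach in a reusable function. The name `merge_one_to_one` and the helper column `_occurrence` are hypothetical, and the sketch assumes no existing column is called `_occurrence`. It works on copies via `assign`, so df1 and df2 are left without a helper column (unlike the in-place `df1['g'] = ...` assignment above), and it adds `validate='one_to_one'` as a sanity check that the counter really made the keys unique.

```python
import pandas as pd


def merge_one_to_one(left, right, on):
    """Pair the n-th occurrence of each key in `left` with the n-th in `right`.

    Hypothetical helper; `on` is a list of key column names.
    """
    occurrence = '_occurrence'  # assumed not to clash with an existing column
    # assign() returns new frames, so the caller's frames are not modified.
    left = left.assign(**{occurrence: left.groupby(on).cumcount()})
    right = right.assign(**{occurrence: right.groupby(on).cumcount()})
    keys = list(on) + [occurrence]
    # With the counter added, every key is unique on both sides.
    return (pd.merge(left, right, on=keys, validate='one_to_one')
              .drop(columns=occurrence))


df1 = pd.DataFrame({'Invoice': [1, 2, 3, 4, 5], 'Value': [10, 25, 40, 10, 15]})
df2 = pd.DataFrame({'Invoice': [2, 3, 5, 2], 'Value': [25, 11, 15, 25],
                    'TestData': ['A', 'B', 'C', 'D']})

df3 = merge_one_to_one(df1, df2, ['Invoice', 'Value'])
print(df3)

# If df1 also contained invoice 2 twice, both df2 rows (A and D) would be used.
df1b = pd.concat([df1, pd.DataFrame({'Invoice': [2], 'Value': [25]})],
                 ignore_index=True)
df3b = merge_one_to_one(df1b, df2, ['Invoice', 'Value'])
print(df3b)
```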
Regarding python - Merge "one-to-one" dataframes in Pandas, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/47094551/