I want to join/merge two pandas DataFrames, but I am not getting the right result. I have the following DataFrames:
df1
   Username      | User_trim
-------------------------------
0  Maria M       | Maria
1  FakeName      | N/A
2  Achim B       | Achim
3  FlashMaster11 | N/A
4  Fakename2     | N/A
5  Gustav W      | Gustav
df2
   0        | 1      | 2
---------------------------------
0  Maria M  | Maria  | female
2  Achim B  | Achim  | male
5  Gustav W | Gustav | male
I want the following resulting DataFrame:
   Username      | User_trim | Gender
---------------------------------
0  Maria M       | Maria     | female
1  FakeName      | N/A       | N/A
2  Achim B       | Achim     | male
3  FlashMaster11 | N/A       | N/A
4  Fakename2     | N/A       | N/A
5  Gustav W      | Gustav    | male
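I tried the following code:
result = pd.concat([df1, df2], axis=1, ignore_index=True)
This gave me a wrong result, although the table had the correct length. So I tried this:
df1.merge(df2, how='outer', left_on='Username', right_on=0)
This seems to produce the right values, but the resulting table has more rows than df1.
I have no problem with the merge bringing in all the columns; I can drop them afterwards. The problem is merging frames of different lengths so that the values end up in the right rows.
Can anyone advise me on how to get the result table?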
Best answer
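I think you need a left join in merge. With how='left', every row of df1 is kept and the columns coming from df2 are filled with NaN where there is no match, whereas how='outer' also keeps rows of df2 whose key does not appear in df1, which can make the result longer than df1: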
df = df1.merge(df2, how='left', left_on='Username', right_on=0)
print(df)
        Username  User_trim         0       1       2
0        Maria M      Maria   Maria M   Maria  female
1       FakeName        NaN       NaN     NaN     NaN
2        Achim B      Achim   Achim B   Achim    male
3  FlashMaster11        NaN       NaN     NaN     NaN
4      Fakename2        NaN       NaN     NaN     NaN
5       Gustav W     Gustav  Gustav W  Gustav    male
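To make the difference between the two join types concrete, here is a minimal sketch. The frames are rebuilt from the tables in the question (assuming "N/A" stands for a missing value and df2 keeps the index 0, 2, 5), and df2_extra is a hypothetical variant with one extra row whose key does not occur in df1:

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question's tables (an assumption:
# "N/A" is treated as a missing value, df2 keeps the index 0, 2, 5).
df1 = pd.DataFrame({
    "Username": ["Maria M", "FakeName", "Achim B",
                 "FlashMaster11", "Fakename2", "Gustav W"],
    "User_trim": ["Maria", np.nan, "Achim", np.nan, np.nan, "Gustav"],
})
df2 = pd.DataFrame(
    [["Maria M", "Maria", "female"],
     ["Achim B", "Achim", "male"],
     ["Gustav W", "Gustav", "male"]],
    index=[0, 2, 5],
)

# Hypothetical extra row whose key is not present in df1.
df2_extra = pd.concat(
    [df2, pd.DataFrame([["Unknown U", "Unknown", "male"]], index=[9])]
)

# A left join keeps exactly the keys of df1; an outer join also keeps
# the unmatched row from df2_extra, so it ends up one row longer.
left = df1.merge(df2_extra, how="left", left_on="Username", right_on=0)
outer = df1.merge(df2_extra, how="outer", left_on="Username", right_on=0)

print(len(df1), len(left), len(outer))  # prints: 6 6 7
```

If the outer join on the real data is longer than df1, unmatched keys in df2 (for example because of stray whitespace or different capitalisation) are one possible cause worth checking.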
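If you want merge to add only the new columns, so that no unnecessary columns have to be removed afterwards, first rename at least the column used for the join (here, so that it is called Username in both DataFrames), and then select only the necessary columns (always the join column plus all the new columns):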
df22 = df2.rename(columns={0: 'Username', 2: 'Gender'})[['Username', 'Gender']]
print(df22)
   Username  Gender
0   Maria M  female
1   Achim B    male
2  Gustav W    male
df = df1.merge(df22, how='left', on='Username')
print(df)
        Username  User_trim  Gender
0        Maria M      Maria  female
1       FakeName        NaN     NaN
2        Achim B      Achim    male
3  FlashMaster11        NaN     NaN
4      Fakename2        NaN     NaN
5       Gustav W     Gustav    male
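Note that even a left join can return more rows than df1 when the join key is duplicated in the right-hand frame. The sketch below uses a hypothetical lookup table in which "Maria M" appears twice (this duplicate is not in the question's data) and shows how the validate parameter of merge can be used to detect that case:

```python
import pandas as pd

df1 = pd.DataFrame({"Username": ["Maria M", "FakeName", "Achim B"]})

# Hypothetical lookup table in which "Maria M" appears twice.
df22 = pd.DataFrame({
    "Username": ["Maria M", "Maria M", "Achim B"],
    "Gender": ["female", "female", "male"],
})

# Without a check, the duplicated key yields one output row per match.
grown = df1.merge(df22, how="left", on="Username")
print(len(df1), len(grown))  # prints: 3 4

# validate="many_to_one" makes pandas raise if the right-hand keys
# are not unique.
try:
    df1.merge(df22, how="left", on="Username", validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)  # prints: True

# Dropping duplicate keys first restores one output row per df1 row.
deduped = df22.drop_duplicates(subset=["Username"])
safe = df1.merge(deduped, how="left", on="Username", validate="many_to_one")
print(len(safe))  # prints: 3
```

Whether dropping duplicates is the right fix depends on the data; if the duplicated rows disagree, they probably need to be resolved by hand first.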
If you only need to add one new column, use map with a Series created by set_index:
df1['Gender'] = df1['Username'].map(df2.set_index(0)[2])
print(df1)
        Username  User_trim  Gender
0        Maria M      Maria  female
1       FakeName        NaN     NaN
2        Achim B      Achim    male
3  FlashMaster11        NaN     NaN
4      Fakename2        NaN     NaN
5       Gustav W     Gustav    male
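The map approach can also be written as a self-contained sketch that leaves df1 untouched. The frames are again rebuilt from the question's tables (assuming "N/A" means a missing value), the names lookup and result are only illustrative, and the drop_duplicates call is a precaution added here so that each username maps to a single value:

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question's tables (an assumption:
# "N/A" is treated as a missing value).
df1 = pd.DataFrame({
    "Username": ["Maria M", "FakeName", "Achim B",
                 "FlashMaster11", "Fakename2", "Gustav W"],
    "User_trim": ["Maria", np.nan, "Achim", np.nan, np.nan, "Gustav"],
})
df2 = pd.DataFrame(
    [["Maria M", "Maria", "female"],
     ["Achim B", "Achim", "male"],
     ["Gustav W", "Gustav", "male"]],
    index=[0, 2, 5],
)

# Lookup Series: index = column 0 (username), values = column 2 (gender).
# Duplicated usernames are dropped first so the index is unique.
lookup = df2.drop_duplicates(subset=[0]).set_index(0)[2]

# assign returns a new frame; df1 itself is not modified.
# Usernames missing from the lookup get NaN.
result = df1.assign(Gender=df1["Username"].map(lookup))
print(result)
```

Because map works row by row on df1['Username'], the result always has exactly as many rows as df1.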
Regarding "python - How to merge/join two pandas DataFrames of different lengths?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49956302/