python - Concatenate DataFrame rows and match rows when the keys are the same

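I have two DataFrames, df1 and df2, and I am trying to find a way to produce df3, as shown in the screenshot:

The goal is to keep all rows of df1 and append the rows of df2 below them. However, where the name, latitude, and longitude match, I want a single row. So Name, Lat, and Lon act as the keys.

There is also the matter of the ZIP column. For rows that are joined, I want to keep the ZIP value from df1.

I tried: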

df3=pandas.merge(df1,df2,on=['Name','Lat','Lon'],how='outer')

This produces something close to what I want:

As you can see, the DataFrame above ends up with two separate ZIP columns and two separate Address columns.
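
The duplicated columns come from the default suffixes that pandas applies to non-key columns present in both frames. A minimal sketch with hypothetical data (the original screenshots are not available, so every value below is an assumption) reproduces the effect:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({
    "Name": ["A", "B"],
    "Lat": [1.0, 2.0],
    "Lon": [10.0, 20.0],
    "ZIP": ["11111", "22222"],
    "Address": ["1 First St", "2 Second St"],
})
df2 = pd.DataFrame({
    "Name": ["B", "C"],
    "Lat": [2.0, 3.0],
    "Lon": [20.0, 30.0],
    "ZIP": ["99999", "33333"],
    "Address": ["2 Second St", "3 Third St"],
})

# ZIP and Address exist in both frames but are not merge keys,
# so the result carries ZIP_x/ZIP_y and Address_x/Address_y.
merged = pd.merge(df1, df2, on=["Name", "Lat", "Lon"], how="outer")
print(list(merged.columns))
```

The `_x` columns hold the values from df1 and the `_y` columns hold the values from df2.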

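Any ideas on how to get a clean df3 DataFrame?

Best answer

I don't think "merge" is the right tool for this task (that is, joining the left DF to the right DF), because what you are really doing is putting one DF on top of the other and then dropping the duplicates. So you could try something like this: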

#put one DF 'on top' of the other (like-named columns should drop into place)
df3 = pandas.concat([df1, df2])
#get rid of any duplicates
df3.drop_duplicates(inplace = True)
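
Note that drop_duplicates with no arguments only removes rows that are identical in every column, so a matched row whose ZIP differs between df1 and df2 would survive twice. A possible variation, sketched here with hypothetical data, restricts the comparison to the key columns and keeps the first occurrence, which is the df1 row because df1 comes first in the concat:

```python
import pandas as pd

# Hypothetical sample data; row "B" appears in both frames with a different ZIP.
df1 = pd.DataFrame({
    "Name": ["A", "B"],
    "Lat": [1.0, 2.0],
    "Lon": [10.0, 20.0],
    "ZIP": ["11111", "22222"],
})
df2 = pd.DataFrame({
    "Name": ["B", "C"],
    "Lat": [2.0, 3.0],
    "Lon": [20.0, 30.0],
    "ZIP": ["99999", "33333"],
})

keys = ["Name", "Lat", "Lon"]

# Stack df1 on top of df2, then keep only the first row for each key combination.
df3 = (
    pd.concat([df1, df2], ignore_index=True)
    .drop_duplicates(subset=keys, keep="first")
    .reset_index(drop=True)
)
print(df3)
```

This keeps the df1 row as-is, so a NaN in a df1 column is not filled from the matching df2 row.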

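Edit

Based on your feedback, I realize a dirtier solution is needed. You would use merge and then fill the NaNs from the duplicated columns. Something like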

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'test': [1, 2, 3, 6, np.nan, np.nan]})
df2 = pd.DataFrame({'test': [np.nan, np.nan, 3, 6, 10, 24]})

# some merge statement to get them together into the variable 'df';
# columns with the same name get the default '_x' / '_y' suffixes
df = pd.merge(df1, df2, left_index=True, right_index=True)

# collect the _x columns
original_cols = [x for x in df.columns if x.endswith('_x')]

for col in original_cols:
    # use the duplicate column to fill the NaNs of the original column
    duplicate = col[:-2] + '_y'
    # assign the result back: fillna(inplace=True) on df[col] may act on a copy
    df[col] = df[col].fillna(df[duplicate])

    # drop the duplicate
    df = df.drop(columns=duplicate)

    # rename the original to remove the trailing '_x'
    df = df.rename(columns={col: col[:-2]})
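
As an alternative to the suffix loop, the same "prefer df1, fall back to df2" rule can be expressed with DataFrame.combine_first after moving the key columns into the index. This is only a sketch with hypothetical data, and it assumes each key combination occurs at most once in each frame:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; row "A" is in both frames and lacks an Address in df1.
df1 = pd.DataFrame({
    "Name": ["A", "B"],
    "Lat": [1.0, 2.0],
    "Lon": [10.0, 20.0],
    "ZIP": ["11111", "22222"],
    "Address": [np.nan, "2 Second St"],
})
df2 = pd.DataFrame({
    "Name": ["A", "C"],
    "Lat": [1.0, 3.0],
    "Lon": [10.0, 30.0],
    "ZIP": ["99999", "33333"],
    "Address": ["1 First St", "3 Third St"],
})

keys = ["Name", "Lat", "Lon"]

# combine_first keeps df1's values and uses df2 only where df1 is missing,
# aligning rows on the key index and adding keys that exist only in df2.
df3 = df1.set_index(keys).combine_first(df2.set_index(keys)).reset_index()
print(df3)
```

Here the matched row keeps the ZIP from df1 while its missing Address is filled from df2. The result is ordered by key rather than by the original row order, so re-sort it if the order of df1 matters.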

Let me know if this works.

Regarding "python - Concatenate DataFrame rows and match rows when the keys are the same", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35704619/
