How to compare and merge two DataFrames based on a common column that contains different dictionaries?
I have the following two DataFrames:
import pandas as pd
df1 = pd.DataFrame({'name':['tom','keith','sam','joe'],'assets':[{'laptop':1,'scanner':2},{'laptop':1,'printer':3}, {'car':12,'keys':34},{'power-cables':24}]})
df2 = pd.DataFrame({'place':['ca','bal-vm'],'default_assets':[{'laptop':4,'printer':3,'scanner':2,'bag':8},{'car':12,'keys':34,'mat':24,'holder':45}]})
df1:
name assets
0 tom {'laptop':1,'scanner':2}
1 keith {'laptop':1,'printer':3}
2 sam {'car':12,'keys':34}
3 joe {'power-cables':24}
df2:
place default_assets
0 ca {'laptop':4,'printer':3,'scanner':2,'bag':8}
1 bal-vm {'car':12,'keys':34,'mat':24,'holder':45}
df2 should be merged into df1 when all keys of df1.assets are present in df2.default_assets; otherwise None should be filled in.
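The matching rule can be illustrated on plain dictionaries. This is a minimal sketch, not part of the original question: the helper name `keys_covered` is hypothetical, and it relies on the fact that `dict.keys()` returns a set-like view that supports subset comparison.

```python
def keys_covered(assets, default_assets):
    """Return True when every key of `assets` also appears in `default_assets`."""
    # dict.keys() views are set-like, so <= is a subset test on the keys only;
    # the values (1 vs 4 for 'laptop', say) play no role in the comparison.
    return assets.keys() <= default_assets.keys()


tom = {'laptop': 1, 'scanner': 2}
joe = {'power-cables': 24}
ca = {'laptop': 4, 'printer': 3, 'scanner': 2, 'bag': 8}

print(keys_covered(tom, ca))  # True
print(keys_covered(joe, ca))  # False
```

Note that `<` instead of `<=` would test for a strict subset, so a row whose keys are exactly the same as the default keys would not match.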
So the resulting df should be:
df:
name place assets default_assets
0 tom ca {'laptop':1,'scanner':2} {'laptop':4,'printer':3,'scanner':2,'bag':8}
1 keith ca {'laptop':1,'printer':3} {'laptop':4,'printer':3,'scanner':2,'bag':8}
2 sam bal-vm {'car':12,'keys':34} {'car':12,'keys':34,'mat':24,'holder':45}
3 joe None {'power-cables':24} None
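Best answer
You can do the following:
- Cross join every row of df1 with df2 (Cartesian product).
- Then filter out the rows where not all keys of df1.assets are in df2.default_assets.
- Add back the rows of df1 that were filtered out, using pandas.concat.
For example: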
# cross join
merged = df1.assign(key=1).merge(df2.assign(key=1), on='key').drop('key', axis=1)
# mask to filter
# keys() views are set-like; <= tests that every asset key is in default_assets
mask = [asset.keys() <= default.keys() for asset, default in zip(merged['assets'], merged['default_assets'])]
# add those not in the mask
result = pd.concat([merged.loc[mask], df1], sort=True).drop_duplicates('name')
# print in full
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(result)
Output
assets \
0 {'laptop': 1, 'scanner': 2}
2 {'laptop': 1, 'printer': 3}
5 {'car': 12, 'keys': 34}
3 {'power-cables': 24}
default_assets name place
0 {'laptop': 4, 'printer': 3, 'scanner': 2, 'bag... tom ca
2 {'laptop': 4, 'printer': 3, 'scanner': 2, 'bag... keith ca
5 {'car': 12, 'keys': 34, 'mat': 24, 'holder': 45} sam bal-vm
3 NaN joe NaN
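The output above differs from the requested frame in two ways: the columns are sorted alphabetically, and unmatched cells hold NaN rather than None. The following is a sketch of one possible variant, not the answer's own method. It assumes pandas 1.2 or later (for `how='cross'`), that `name` is unique in df1, and that the first matching place should win when several match. The variable names `pairs`, `covered`, and `matches` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['tom', 'keith', 'sam', 'joe'],
                    'assets': [{'laptop': 1, 'scanner': 2}, {'laptop': 1, 'printer': 3},
                               {'car': 12, 'keys': 34}, {'power-cables': 24}]})
df2 = pd.DataFrame({'place': ['ca', 'bal-vm'],
                    'default_assets': [{'laptop': 4, 'printer': 3, 'scanner': 2, 'bag': 8},
                                       {'car': 12, 'keys': 34, 'mat': 24, 'holder': 45}]})

# Cross join: every row of df1 paired with every row of df2 (pandas >= 1.2)
pairs = df1.merge(df2, how='cross')

# Keep only the pairs where every asset key appears in the default assets
covered = [a.keys() <= d.keys()
           for a, d in zip(pairs['assets'], pairs['default_assets'])]
matches = pairs.loc[covered, ['name', 'place', 'default_assets']].drop_duplicates('name')

# Left merge back onto df1 so unmatched names (joe) are kept
result = df1.merge(matches, on='name', how='left')
result = result[['name', 'place', 'assets', 'default_assets']]

# Replace the missing values with None, as the question asks
result = result.astype(object).where(result.notna(), None)

print(result)
```

The left merge replaces the concat plus drop_duplicates step, which keeps the original row order and index of df1.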
Regarding python - Pandas : Merge 2 dataframe based on common column which contains dictionary, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58858564/