I have two DataFrames like this:
import pandas as pd
df1 = pd.DataFrame({'period':['2021-02'], 'customer':['A'], 'product':['Apple'], 'sales_flag': ['Yes']})
df2 = pd.DataFrame({'period':['2021-03', '2021-04', '2021-04'], 'customer':['A', 'A', 'A'], 'product':['Banana', 'Apple', 'Tangerine'], 'feedback_flag': ['Yes', 'Yes', 'No']})
I want to join them so that the result has one row per period, customer and product, like this:
period customer product sales_flag feedback_flag
'2021-02' A Apple Yes NULL
'2021-02' A Banana NULL NULL
'2021-02' A Tangerine NULL NULL
'2021-03' A Apple NULL NULL
'2021-03' A Banana NULL Yes
'2021-03' A Tangerine NULL NULL
'2021-04' A Apple NULL Yes
'2021-04' A Banana NULL NULL
'2021-04' A Tangerine NULL No
This is my code, but it does not give that result:
df3 = df1.merge(df2, on = ['period', 'customer', 'product'], how = 'outer')
Do you know how to make it work?
Best answer
Try an outer merge, followed by groupby apply to reindex each group on the unique products, then ffill + bfill to fill in period and customer:
def reindex_group(g, idx):
    # Give each (period, customer) group one row per product in idx
    g = g.set_index('product').reindex(idx)
    # Rows created by reindex have no period/customer; copy them from the existing rows
    g[['period', 'customer']] = g[['period', 'customer']].ffill().bfill()
    return g

df3 = df1.merge(df2, on=['period', 'customer', 'product'], how='outer')
products = df3['product'].unique()
df3 = (
    df3.groupby(['period', 'customer'], as_index=False)
       .apply(reindex_group, idx=products)
       .reset_index()
       .drop(columns='level_0')
)[['period', 'customer', 'product', 'sales_flag', 'feedback_flag']]
df3:
period customer product sales_flag feedback_flag
0 2021-02 A Apple Yes NaN
1 2021-02 A Banana NaN NaN
2 2021-02 A Tangerine NaN NaN
3 2021-03 A Apple NaN NaN
4 2021-03 A Banana NaN Yes
5 2021-03 A Tangerine NaN NaN
6 2021-04 A Apple NaN Yes
7 2021-04 A Banana NaN NaN
8 2021-04 A Tangerine NaN No
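As an alternative, here is a minimal sketch that builds the full period/customer/product grid explicitly and then left-merges the flags onto it. It is not part of the original answer; the names `keys`, `merged`, `pairs`, `grid` and `full` are chosen only for illustration, and `how='cross'` assumes pandas 1.2 or later. It avoids groupby apply, which may emit a deprecation warning in recent pandas versions when the function touches the grouping columns.

```python
import pandas as pd

df1 = pd.DataFrame({'period': ['2021-02'], 'customer': ['A'],
                    'product': ['Apple'], 'sales_flag': ['Yes']})
df2 = pd.DataFrame({'period': ['2021-03', '2021-04', '2021-04'],
                    'customer': ['A', 'A', 'A'],
                    'product': ['Banana', 'Apple', 'Tangerine'],
                    'feedback_flag': ['Yes', 'Yes', 'No']})

keys = ['period', 'customer', 'product']

# The outer merge only keeps key combinations that occur in df1 or df2.
merged = df1.merge(df2, on=keys, how='outer')

# Cross every observed (period, customer) pair with every observed product.
pairs = merged[['period', 'customer']].drop_duplicates()
products = merged[['product']].drop_duplicates()
grid = pairs.merge(products, how='cross')

# Attach the flags to the full grid; combinations without data stay NaN.
full = (
    grid.merge(merged, on=keys, how='left')
        .sort_values(keys)
        .reset_index(drop=True)
)
print(full)
```

With the sample data, `merged` has only the 4 combinations present in the inputs, while `full` has 3 periods x 3 products = 9 rows. If several customers are involved and each should get only the products that customer has actually bought or rated, the grid would need to be built per customer instead.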
Source: the Stack Overflow question "python - How to join DataFrames?": https://stackoverflow.com/questions/67737660/