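I have two tables, df1 and df2.
df1 is a sales list.
df2 is a list of combination products.
I want to use df1 and df2 to build a third table, df3.
df3 is the sales list broken down into single products.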
df1 (think of it as the sales list)
df2 (think of it as the combination product list)
df3 (think of it as the single-product sales list)
Code:
import pandas as pd

data1 = [["Banana", "1"],
         ["Apple", "2"],
         ["Milk", "3"],
         ["Banana_milk", "1"],
         ["Apple_milk", "1"],
         ["Watermelon_milk", "2"]]
df1 = pd.DataFrame(data=data1, columns=['Part_No', 'Quantity'])
print(df1)
data2 = [["Banana_milk", "Banana", "1"],
         ["Banana_milk", "Milk", "1"],
         ["Apple_milk", "Apple", "1"],
         ["Apple_milk", "Milk", "1"],
         ["Watermelon_milk", "Watermelon", "2"],
         ["Watermelon_milk", "Milk", "1"]]
df2 = pd.DataFrame(data=data2, columns=['Combination_Part_No', 'Part_No', 'Quantity'])
print(df2)
Best answer
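First use DataFrame.merge with a left join. Then fill the missing values in the Part_No column that comes from df2 with the values from df1, multiply the two Quantity columns with Series.mul, and finally aggregate with sum: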
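# Quantities are stored as strings, so convert them before doing arithmetic
df1['Quantity'] = df1['Quantity'].astype(int)
df2['Quantity'] = df2['Quantity'].astype(int)

# Left join: plain products get NaN in the columns that come from df2
df = df1.merge(df2,
               left_on='Part_No',
               right_on='Combination_Part_No',
               how='left')

# Use the component name where there is one, else keep the sold product name
df['Part_No'] = df['Part_No_y'].fillna(df['Part_No_x'])
# Component quantity times sold quantity; a missing component quantity counts as 1
df['Quantity'] = df['Quantity_y'].mul(df['Quantity_x'], fill_value=1).astype(int)
print(df)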
         Part_No_x  Quantity_x  Combination_Part_No   Part_No_y  Quantity_y  \
0           Banana           1                  NaN         NaN         NaN
1            Apple           2                  NaN         NaN         NaN
2             Milk           3                  NaN         NaN         NaN
3      Banana_milk           1          Banana_milk      Banana         1.0
4      Banana_milk           1          Banana_milk        Milk         1.0
5       Apple_milk           1           Apple_milk       Apple         1.0
6       Apple_milk           1           Apple_milk        Milk         1.0
7  Watermelon_milk           2      Watermelon_milk  Watermelon         2.0
8  Watermelon_milk           2      Watermelon_milk        Milk         1.0

      Part_No  Quantity
0      Banana         1
1       Apple         2
2        Milk         3
3      Banana         1
4        Milk         1
5       Apple         1
6        Milk         1
7  Watermelon         4
8        Milk         2
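The fill_value=1 argument is what keeps the quantities of plain products intact: rows without a match in df2 have NaN in Quantity_y, and Series.mul substitutes 1 for the missing side before multiplying. Here is a minimal sketch of that behaviour in isolation. The values and the names combo_qty and sold_qty are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical values: NaN stands for a row with no component match after the left join
combo_qty = pd.Series([np.nan, 2.0, 1.0])
sold_qty = pd.Series([3, 2, 2])

# A NaN on one side is replaced by 1 before the multiplication,
# so the first row keeps the sold quantity unchanged
result = combo_qty.mul(sold_qty, fill_value=1).astype(int)
print(result.tolist())  # [3, 4, 2]
```

If both sides were NaN at the same position the result would stay NaN, which does not happen here because every row from df1 has a quantity.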
df = df.groupby('Part_No', as_index=False)['Quantity'].sum()
print(df)
      Part_No  Quantity
0       Apple         3
1      Banana         2
2        Milk         7
3  Watermelon         4
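For reuse, the same steps could be wrapped in a function that leaves its inputs unmodified. The following is only a sketch under the assumptions of this question (the column names Part_No, Quantity and Combination_Part_No, and quantities stored as strings); the function name expand_combinations and the variable names sales and combos are hypothetical and not from the original answer.

```python
import pandas as pd


def expand_combinations(sales, combos):
    """Expand combination products in `sales` into per-product totals."""
    # astype returns copies, so the caller's frames are not changed
    sales = sales.astype({"Quantity": int})
    combos = combos.astype({"Quantity": int})

    # Left join: plain products get NaN in the columns coming from `combos`
    merged = sales.merge(combos,
                         left_on="Part_No",
                         right_on="Combination_Part_No",
                         how="left")

    # Component name where available, otherwise the sold product itself
    merged["Part_No"] = merged["Part_No_y"].fillna(merged["Part_No_x"])
    # Component quantity times sold quantity; missing component quantity counts as 1
    merged["Quantity"] = (merged["Quantity_y"]
                          .mul(merged["Quantity_x"], fill_value=1)
                          .astype(int))

    return merged.groupby("Part_No", as_index=False)["Quantity"].sum()


sales = pd.DataFrame(
    [["Banana", "1"], ["Apple", "2"], ["Milk", "3"],
     ["Banana_milk", "1"], ["Apple_milk", "1"], ["Watermelon_milk", "2"]],
    columns=["Part_No", "Quantity"])
combos = pd.DataFrame(
    [["Banana_milk", "Banana", "1"], ["Banana_milk", "Milk", "1"],
     ["Apple_milk", "Apple", "1"], ["Apple_milk", "Milk", "1"],
     ["Watermelon_milk", "Watermelon", "2"], ["Watermelon_milk", "Milk", "1"]],
    columns=["Combination_Part_No", "Part_No", "Quantity"])

df3 = expand_combinations(sales, combos)
print(df3)
```

Note that this expands only one level: if a component listed in df2 were itself a combination product, it would not be broken down further.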
This question, about how to expand a pandas DataFrame using another DataFrame, comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/59084198/