Consider two DataFrames:
>> import pandas as pd
>> df1 = pd.DataFrame({"category": ["foo", "foo", "bar", "bar", "bar"], "quantity": [1,2,1,2,3]})
>> print(df1)
  category  quantity
0      foo         1
1      foo         2
2      bar         1
3      bar         2
4      bar         3
>> df2 = pd.DataFrame({
       "category": ["foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "bar", "bar"],
       "item": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
   })
>> print(df2)
  category item
0      foo    A
1      foo    B
2      foo    C
3      foo    D
4      bar    E
5      bar    F
6      bar    G
7      bar    H
8      bar    I
9      bar    J
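How can I create a new column in df1 (producing a new DataFrame named df3) that joins on the category column of df1 and assigns values from the item column of df2? That is, create something like this: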
>> df3 = pd.DataFrame({
       "category": ["foo", "foo", "bar", "bar", "bar"],
       "quantity": [1,2,1,2,3],
       "item": ["A", "B,C", "E", "F,G", "H,I,J"]
   })
  category  quantity   item
0      foo         1      A
1      foo         2    B,C
2      bar         1      E
3      bar         2    F,G
4      bar         3  H,I,J
Best answer
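You can build a helper DataFrame by repeating the rows of df1 according to the quantity column, using Index.repeat with DataFrame.loc, and converting the index to a column so the original row labels are not lost. Then create a helper column g in both DataFrames with GroupBy.cumcount, which numbers the rows within each category so that the repeated categories can be matched one to one. Finally, use DataFrame.merge and aggregate with join: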
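# Repeat each row of df1 `quantity` times, keep the original row label in
# an "index" column, and number the rows within each category.
df11 = (df1.loc[df1.index.repeat(df1['quantity'])].reset_index()
           .assign(g = lambda x: x.groupby('category').cumcount()))
# Number the items within each category the same way.
df22 = df2.assign(g = df2.groupby('category').cumcount())
# Match the n-th repeated row to the n-th item of its category, then join
# the items that belong to the same original row of df1.
df = (df11.merge(df22, on=['g','category'], how='left')
          .groupby(['index','category','quantity'])['item']
          .agg(lambda x: ','.join(x.dropna()))
          .droplevel(0)
          .reset_index())
print(df)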
  category  quantity   item
0      foo         1      A
1      foo         2    B,C
2      bar         1      E
3      bar         2    F,G
4      bar         3  H,I,J
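To make the steps easier to reuse and check, here is a minimal sketch that wraps the same approach in a function. The function name assign_items and the smaller df2_short frame are hypothetical, introduced only for illustration; the sketch also assumes df1 has a default, unnamed index, so that reset_index() produces a column called "index".

```python
import pandas as pd


def assign_items(df1, df2):
    """Sketch of the answer's approach; `assign_items` is a hypothetical name."""
    # One row per requested unit; "index" remembers the original df1 row
    # (assumes df1 has an unnamed index, so reset_index() names it "index").
    expanded = (
        df1.loc[df1.index.repeat(df1["quantity"])]
        .reset_index()
        .assign(g=lambda x: x.groupby("category").cumcount())
    )
    # Position of each item within its category.
    numbered = df2.assign(g=df2.groupby("category").cumcount())
    # The left merge keeps requested units that have no matching item (NaN).
    merged = expanded.merge(numbered, on=["g", "category"], how="left")
    return (
        merged.groupby(["index", "category", "quantity"])["item"]
        .agg(lambda x: ",".join(x.dropna()))
        .droplevel(0)
        .reset_index()
    )


df1 = pd.DataFrame({"category": ["foo", "foo", "bar", "bar", "bar"],
                    "quantity": [1, 2, 1, 2, 3]})
df2 = pd.DataFrame({
    "category": ["foo"] * 4 + ["bar"] * 6,
    "item": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
})
result = assign_items(df1, df2)
print(result)

# Hypothetical case with too few items: rows that cannot be served in full
# receive only the items that are left, possibly an empty string.
df2_short = pd.DataFrame({"category": ["foo", "foo", "bar"],
                          "item": ["A", "B", "E"]})
short = assign_items(df1, df2_short)
print(short)
```

Because the merge uses how='left' and the aggregation drops missing values, a shortage of items in df2 does not raise an error; the affected rows simply end up with fewer items than requested. Note also that a row of df1 whose quantity is 0 is repeated zero times and therefore would not appear in the result at all.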
Regarding pandas (assigning values between two pandas DataFrames), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/75258877/