pandas - Distribute values between two pandas DataFrames

Consider two DataFrames:

>> import pandas as pd
>> df1 = pd.DataFrame({"category": ["foo", "foo", "bar", "bar", "bar"], "quantity": [1,2,1,2,3]})
>> print(df1)

    category    quantity
0   foo         1
1   foo         2
2   bar         1
3   bar         2
4   bar         3

>> df2 = pd.DataFrame({
            "category": ["foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "bar", "bar"], 
            "item": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
        })
>> print(df2)
      category item
0      foo      A
1      foo      B
2      foo      C
3      foo      D
4      bar      E
5      bar      F
6      bar      G
7      bar      H
8      bar      I
9      bar      J

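How can I create a new column in df1 (giving a new DataFrame called df3) that joins on the category column of df1 and distributes the item column of df2, so that each row receives as many items as its quantity? That is, create something like: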

>> df3 = pd.DataFrame({
           "category": ["foo", "foo", "bar", "bar", "bar"], 
           "quantity": [1,2,1,2,3],
           "item": ["A", "B,C", "E", "F,G", "H,I,J"] 
})

     category  quantity   item
0      foo         1      A
1      foo         2      B,C
2      bar         1      E
3      bar         2      F,G
4      bar         3      H,I,J

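Best answer

You can create a helper DataFrame by repeating the rows according to the quantity column with Index.repeat and DataFrame.loc, converting the index to a column so that it is not lost. Then create a helper column g in both DataFrames with GroupBy.cumcount, so that the repeated categories can be matched position by position, and finally use DataFrame.merge followed by an aggregation with join: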

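# Repeat each df1 row `quantity` times; reset_index() keeps the original row label
# in an "index" column, and g numbers the repeated rows within each category.
df11 = (df1.loc[df1.index.repeat(df1['quantity'])].reset_index()
           .assign(g = lambda x: x.groupby('category').cumcount()))

# Number the items within each category the same way.
df22 = df2.assign(g = df2.groupby('category').cumcount())

# Pair repeated rows with items on (g, category), then join the items per original row.
df = (df11.merge(df22, on=['g','category'], how='left')
          .groupby(['index','category','quantity'])['item']
          .agg(lambda x: ','.join(x.dropna()))
          .droplevel(0)
          .reset_index())
print(df)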
  category  quantity   item
0      foo         1      A
1      foo         2    B,C
2      bar         1      E
3      bar         2    F,G
4      bar         3  H,I,J
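
To see why the helper column g lines the two frames up, it may help to look at the intermediate frames. The following is a minimal sketch using the same data as the question; the names expanded, items and merged are only illustrative and do not appear in the answer above.

```python
import pandas as pd

df1 = pd.DataFrame({"category": ["foo", "foo", "bar", "bar", "bar"],
                    "quantity": [1, 2, 1, 2, 3]})
df2 = pd.DataFrame({"category": ["foo"] * 4 + ["bar"] * 6,
                    "item": list("ABCDEFGHIJ")})

# One row per requested item; the "index" column remembers which
# df1 row each repeated row came from.
expanded = df1.loc[df1.index.repeat(df1["quantity"])].reset_index()

# Position of each repeated row within its category: 0, 1, 2, ...
expanded["g"] = expanded.groupby("category").cumcount()

# Position of each item within its category, numbered the same way.
items = df2.assign(g=df2.groupby("category").cumcount())

# The n-th repeated row of a category is paired with the n-th item of it.
merged = expanded.merge(items, on=["g", "category"], how="left")
print(merged)
```

Because the merge uses how='left', a category that runs out of items in df2 leaves NaN in the item column for the surplus rows; the dropna() in the aggregation then discards those, so such a row simply ends up with fewer items (possibly an empty string).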

Regarding "pandas - Distribute values between two pandas DataFrames", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/75258877/
