python - pandas: groupby multiple columns, concatenating one column while adding another

If I have the following df:

      amount   name   role    desc
0        1.0    a      x       f
1        2.0    a      y       g
2        3.0    b      y       h
3        4.0    b      y       j
4        5.0    c      x       k
5        6.0    c      x       l
6        6.0    c      y       p

I would like to group by the name and role columns, sum amount, and concatenate desc with a comma:

      amount   name   role    desc
0        1.0    a      x       f
1        2.0    a      y       g
2        7.0    b      y       h,j
4        11.0   c      x       k,l
6        6.0    c      y       p

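What is the right way to approach this?

Side question: suppose df was read from a .csv that also has other, unrelated columns. How do I do this calculation and then write a new .csv that includes those other columns, with the same schema as the file that was read?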

Best answer

This may not be an exact duplicate, but there are many existing questions about groupby agg.

df.groupby(['name', 'role'], as_index=False)\
.agg({'amount':'sum', 'desc':lambda x: ','.join(x)})


    name    role    amount  desc
0   a       x       1.0     f
1   a       y       2.0     g
2   b       y       7.0     h,j
3   c       x       11.0    k,l
4   c       y       6.0     p
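For anyone who wants to run the answer end to end, here is a minimal, self-contained sketch. It rebuilds the sample frame from the question by hand (the answer itself assumes df already exists) and applies the same groupby and agg call.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "amount": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0],
    "name": ["a", "a", "b", "b", "c", "c", "c"],
    "role": ["x", "y", "y", "y", "x", "x", "y"],
    "desc": ["f", "g", "h", "j", "k", "l", "p"],
})

# Sum amount and comma-join desc within each (name, role) group.
# as_index=False keeps name and role as ordinary columns.
result = (
    df.groupby(["name", "role"], as_index=False)
      .agg({"amount": "sum", "desc": lambda x: ",".join(x)})
)

print(result)
```

Note that the join only works when desc holds strings; a column with missing or numeric values would need to be cleaned or cast first.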

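Edit: If the DataFrame has other columns, you can either aggregate them with 'first' or 'last', or, if their values are identical within each group, include them in the grouping.

Option 1: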

df.groupby(['name', 'role'], as_index=False).agg({'amount':'sum', 'desc':lambda x: ','.join(x), 'other1':'first', 'other2':'first'})
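For the side question about reading from and writing back to a .csv, Option 1 could be sketched roughly as follows. The io.StringIO objects stand in for real files, and the other1 and other2 columns and their values are made up for illustration. The aggregation dict is built from the columns so that every unrelated column defaults to 'first', and the result is reordered to match the input before writing.

```python
import io
import pandas as pd

# Stand-in for a file on disk; other1/other2 are hypothetical extra columns.
csv_in = io.StringIO(
    "amount,name,role,desc,other1,other2\n"
    "1.0,a,x,f,u,10\n"
    "2.0,a,y,g,v,20\n"
    "3.0,b,y,h,w,30\n"
    "4.0,b,y,j,w,30\n"
)
df = pd.read_csv(csv_in)

keys = ["name", "role"]

# Default every non-key column to "first", then override the two
# columns that need a different aggregation.
how = {col: "first" for col in df.columns if col not in keys}
how["amount"] = "sum"
how["desc"] = lambda x: ",".join(x)

result = df.groupby(keys, as_index=False).agg(how)

# Restore the column order of the input before writing.
result = result[df.columns]

# Stand-in for the output file; index=False avoids an extra index column.
csv_out = io.StringIO()
result.to_csv(csv_out, index=False)
print(csv_out.getvalue())
```

Because the joined desc values contain commas, to_csv quotes those fields by default, so the file still parses back into the same columns. With real files, the two StringIO objects would be replaced by file paths.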

Option 2:

df.groupby(['name', 'role', 'other1', 'other2'], as_index=False).agg({'amount':'sum', 'desc':lambda x: ','.join(x)})
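The two options only agree when the extra columns really are constant within each (name, role) group. A small sketch of what happens otherwise, using a hypothetical other1 column whose values differ inside one group:

```python
import pandas as pd

df = pd.DataFrame({
    "amount": [3.0, 4.0],
    "name": ["b", "b"],
    "role": ["y", "y"],
    "desc": ["h", "j"],
    "other1": ["w", "z"],  # hypothetical column that differs within the group
})


def join_desc(values):
    """Comma-join the desc strings of one group."""
    return ",".join(values)


# Option 1: one row per (name, role); other1 keeps the first value seen.
opt1 = df.groupby(["name", "role"], as_index=False).agg(
    {"amount": "sum", "desc": join_desc, "other1": "first"}
)

# Option 2: other1 is part of the key, so differing values split the group.
opt2 = df.groupby(["name", "role", "other1"], as_index=False).agg(
    {"amount": "sum", "desc": join_desc}
)

print(len(opt1))  # one row, amount summed across both input rows
print(len(opt2))  # two rows, one per distinct other1 value
```

Option 2 also turns every extra column into a group key, and groupby drops rows whose key is NaN by default, so Option 1 is probably the safer choice when the extra columns may contain missing values.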

A similar question appears on Stack Overflow: https://stackoverflow.com/questions/52546386/
