python - Aggregating a column in a groupby statement

Tags: python pandas

When I use group by on a DataFrame, can I collect the values of a particular column into a list for each group?

I'm not sure whether this detail is relevant here, but PostgreSQL has a function, array_agg(columnname), that serves the same purpose.

I also looked through the API documentation for details, without success.

train
Out[6]: 
    TripType  VisitNumber Weekday  ScanCount  DepartmentDescription
1         30            7  Friday          1                  SHOES
2         30            7  Friday          1          PERSONAL CARE
3         26            8  Friday          2  PAINT AND ACCESSORIES
4         26            8  Friday          2  PAINT AND ACCESSORIES
5         26            8  Friday          2  PAINT AND ACCESSORIES
6         26            8  Friday          1  PAINT AND ACCESSORIES
7         26            8  Friday          1  PAINT AND ACCESSORIES
8         26            8  Friday          1  PAINT AND ACCESSORIES
9         26            8  Friday         -1  PAINT AND ACCESSORIES
10        26            8  Friday          1            DSD GROCERY
11        26            8  Friday          2  PAINT AND ACCESSORIES
12        26            8  Friday          1  MEAT - FRESH & FROZEN
13        26            8  Friday          1  PAINT AND ACCESSORIES
14        26            8  Friday         -1  PAINT AND ACCESSORIES
15        26            8  Friday          2  PAINT AND ACCESSORIES
16        26            8  Friday          1  PAINT AND ACCESSORIES
17        26            8  Friday          1  PAINT AND ACCESSORIES
18        26            8  Friday          2                  DAIRY
19        26            8  Friday          1      PETS AND SUPPLIES

train.groupby(['VisitNumber','Weekday','TripType']).count()
Out[7]: 
                              ScanCount  DepartmentDescription
VisitNumber Weekday TripType                                  
7           Friday  30                2                      2
8           Friday  26               17                     17

What I mean is that the result for the first group of rows should look like this:

                              ScanCount  DepartmentDescription
VisitNumber Weekday TripType                                  
7           Friday  30                2                     [SHOES,PERSONAL CARE]

Dataset:

{'DepartmentDescription': {1: 'SHOES',
  2: 'PERSONAL CARE',
  3: 'PAINT AND ACCESSORIES',
  4: 'PAINT AND ACCESSORIES',
  5: 'PAINT AND ACCESSORIES'},
 'ScanCount': {1: 1, 2: 1, 3: 2, 4: 2, 5: 2},
 'TripType': {1: 30, 2: 30, 3: 26, 4: 26, 5: 26},
 'VisitNumber': {1: 7, 2: 7, 3: 8, 4: 8, 5: 8},
 'Weekday': {1: 'Friday', 2: 'Friday', 3: 'Friday', 4: 'Friday', 5: 'Friday'}}

Accepted answer

IIUC, you want the following: select the column after grouping and apply list to each group.

In [248]:
df.groupby(['VisitNumber','Weekday','TripType'])['DepartmentDescription'].apply(list)

Out[248]:
VisitNumber  Weekday  TripType
7            Friday   30                                     [SHOES, PERSONAL CARE]
8            Friday   26          [PAINT AND ACCESSORIES, PAINT AND ACCESSORIES,...
Name: DepartmentDescription, dtype: object
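
The answer above returns only the list column. If the count from the question's mock-up is wanted alongside it, one possible approach is to pass a dict to agg so that each column gets its own aggregation. This is a minimal sketch, assuming a reasonably recent pandas version in which agg accepts the built-in list as an aggregation function. It rebuilds the frame from the five-row sample in the question, under the question's name train; the variable names keys and result are introduced here purely for illustration.

```python
import pandas as pd

# Five-row sample copied from the "Dataset" section of the question.
train = pd.DataFrame({
    'DepartmentDescription': {1: 'SHOES',
                              2: 'PERSONAL CARE',
                              3: 'PAINT AND ACCESSORIES',
                              4: 'PAINT AND ACCESSORIES',
                              5: 'PAINT AND ACCESSORIES'},
    'ScanCount': {1: 1, 2: 1, 3: 2, 4: 2, 5: 2},
    'TripType': {1: 30, 2: 30, 3: 26, 4: 26, 5: 26},
    'VisitNumber': {1: 7, 2: 7, 3: 8, 4: 8, 5: 8},
    'Weekday': {1: 'Friday', 2: 'Friday', 3: 'Friday', 4: 'Friday', 5: 'Friday'},
})

# Grouping keys used in the question (illustrative variable name).
keys = ['VisitNumber', 'Weekday', 'TripType']

# Count the ScanCount rows and collect DepartmentDescription values
# into a Python list, one row per group.
result = train.groupby(keys).agg({
    'ScanCount': 'count',
    'DepartmentDescription': list,
})

print(result)
```

On this sample, the first group (VisitNumber 7) has a ScanCount of 2 and the list [SHOES, PERSONAL CARE], which matches the layout asked for in the question. Unlike PostgreSQL's array_agg, the resulting column has dtype object and holds ordinary Python lists.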

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/33415585/
