python - How to collapse rows after a groupby in pandas

I have performed a groupby on my DataFrame.

grouped = data_df.groupby(['Cluster','Visit Number Final'])['Visitor_ID'].count()

I get the following output:

data_df.groupby(['Cluster','Visit Number Final'])['Visitor_ID'].count()

Out[81]:

    Cluster  Visit Number Final
    0        1                     21846
             2                      1485
             3                       299
             4                        95
             5                        24
             6                         8
             7                         3
    1        1                     33600
             2                      2283
             3                       404
             4                       117
             5                        34
             6                         7
    2        1                      5858
             2                       311
             3                        55
             4                        14
             5                         6
             6                         3
             7                         1
    3        1                     19699
             2                      1101
             3                       214
             4                        78
             5                        14
             6                         8
             7                         3
    4        1                     10086
             2                       344
             3                        59
             4                        14
             5                         3
             6                         1
    Name: Visitor_ID, dtype: int64

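Now I want to collapse the rows where Visit Number Final is greater than 3, that is, add one new row per cluster holding the sum of the counts for Visit Number Final 4 and above (4, 5, 6, 7). I tried groupby.filter but did not get the expected output. My final output should look something like this: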

    Cluster  Visit Number Final
    0        1                     21846
             2                      1485
             3                       299
             >=4                     130
    1        1                     33600
             2                      2283
             3                       404
             >=4                     158
    2        1                      5858
             2                       311
             3                        55
             >=4                      24
    3        1                     19699
             2                      1101
             3                       214
             >=4                     103
    4        1                     10086
             2                       344
             3                        59
             >=4                      18

Best answer

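The simplest approach is to replace the "Visit Number Final" values greater than 3 before grouping the DataFrame (`df` here is the question's `data_df`):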

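# Cast to object first so the integer column can hold the '>=4' string label.
df['Visit Number Final'] = df['Visit Number Final'].astype(object)
df.loc[df['Visit Number Final'] > 3, 'Visit Number Final'] = '>=4'
df.groupby(['Cluster', 'Visit Number Final'])['Visitor_ID'].count()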
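
If only the grouped result is available, the same collapse can also be done after the groupby, which is what the question literally asks for. The following is a minimal sketch, not the answer author's method: the sample rows are invented for illustration (only the column names come from the question), and the helper column name `count` is an assumption. It flattens the grouped Series, relabels every visit number above 3 as '>=4', and sums the counts again per cluster.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data_df = pd.DataFrame({
    'Cluster':            [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    'Visit Number Final': [1, 1, 2, 3, 4, 5, 1, 3, 4, 6],
    'Visitor_ID':         list('abcdefghij'),
})

# The grouped Series from the question.
grouped = data_df.groupby(['Cluster', 'Visit Number Final'])['Visitor_ID'].count()

# Turn the MultiIndex back into columns; 'count' is an illustrative column name.
flat = grouped.reset_index(name='count')

# Keep visit numbers 1..3 as strings and replace everything above 3 with '>=4'.
visits = flat['Visit Number Final']
flat['Visit Number Final'] = visits.astype(str).where(visits <= 3, '>=4')

# Regroup on the new labels and add up the counts that now share a label.
collapsed = flat.groupby(['Cluster', 'Visit Number Final'])['count'].sum()
print(collapsed)
```

Converting the labels to strings keeps the index level a single type, and '>=4' happens to sort after '1', '2' and '3', so the collapsed row lands last within each cluster. With ten or more distinct labels, string sorting would no longer match numeric order, so an ordered categorical would be a safer choice in that case.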

A similar question to "python - How to collapse rows after a groupby in pandas" can be found on Stack Overflow: https://stackoverflow.com/questions/54787448/