python - Bucket times into a 24-hour clock and sum the amounts for each bucket

Tags: python pandas pandas-groupby

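I want to bucket the Date_Time counts by hour on a 24-hour clock, and also sum the associated Amt for each bucket. This is only done for the book with the highest volume. The code that buckets into 24 hours already works; I only need help computing the Amt sum.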

DataFrame:

import pandas as pd
import numpy as np

df_Highest_Traded_Away_Book = [
                                ('Book', ['A', 'A','A','A','B','C','C','C']),
                                ('Amt', ['10', '10', '10', '10', '20', '30', '30', '30']),
                                ('Date_Time', ['2018-09-03 01:06:09', '2018-09-08 01:23:29',
                                                          '2018-09-15 02:23:29','2018-09-20 03:23:29',
                                                          '2018-09-20 00:23:29','2018-09-25 01:23:29',
                                                          '2018-09-25 02:23:29','2018-09-30 02:23:29',])
                              ]

Get the book with the highest volume:

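# DataFrame.from_items was removed in pandas 1.0; a dict built from the (column, values) pairs gives the same frame
df_Highest_Traded_Away_Book = pd.DataFrame(dict(df_Highest_Traded_Away_Book))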
df_Highest_Traded_Away_Book['Date_Time'] = pd.to_datetime(df_Highest_Traded_Away_Book['Date_Time'])
df_Highest_Traded_Away_Book['Time_in_GMT'] =  df_Highest_Traded_Away_Book['Date_Time'].dt.hour
print(df_Highest_Traded_Away_Book)

df_Highest_Book =  df_Highest_Traded_Away_Book.groupby(['Book']).size().idxmax()
print(df_Highest_Book)
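
For what it's worth, the "most frequent book" lookup can also be written with value_counts. This is just an equivalent alternative, not something the question requires; the names books, via_groupby and via_value_counts are made up for this sketch.

```python
import pandas as pd

# Toy column mirroring the 'Book' values from the question.
books = pd.Series(['A', 'A', 'A', 'A', 'B', 'C', 'C', 'C'], name='Book')

# size() counts the rows per book; idxmax() returns the label with the largest count.
via_groupby = books.groupby(books).size().idxmax()

# value_counts() produces the same per-label counts in a single call.
via_value_counts = books.value_counts().idxmax()

print(via_groupby, via_value_counts)
```

Both expressions pick book A here, because it has four rows against three for C and one for B.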

Bucket the times into 24 hours:

df_Highest_Traded_Away_Book = (df_Highest_Traded_Away_Book['Book']
              .eq(df_Highest_Book)
              .groupby(df_Highest_Traded_Away_Book['Time_in_GMT'])
              .sum()
              .astype(int)
              .reindex(np.arange(25), fill_value=0)
              .to_frame(df_Highest_Book))

print(df_Highest_Traded_Away_Book )
             A
Time_in_GMT   
0            0
1            2
2            1
3            1
4            0
5            0
6            0
7            0
8            0
9            0
10           0
11           0
12           0
13           0
14           0
15           0
16           0
17           0
18           0
19           0
20           0
21           0
22           0
23           0
24           0
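
To spell out why the counting step works: eq produces a boolean flag per row, and summing booleans within each hour bucket counts the True values. Below is a minimal sketch on a made-up four-row frame (df, is_target and counts are illustrative names only). Since dt.hour only yields 0 through 23, the sketch reindexes over np.arange(24); the np.arange(25) in the question additionally creates a row 24 that can never be filled.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data.
df = pd.DataFrame({
    'Book': ['A', 'A', 'B', 'A'],
    'Time_in_GMT': [1, 1, 0, 3],
})

is_target = df['Book'].eq('A')                    # True where the row belongs to book 'A'
counts = (is_target
          .groupby(df['Time_in_GMT'])             # bucket the flags by hour
          .sum()                                  # True counts as 1, so this is a row count
          .astype(int)
          .reindex(np.arange(24), fill_value=0))  # hours 0..23; hours with no rows become 0

print(counts.loc[[0, 1, 3]].tolist())
```

Hour 0 only contains a row for book B, so its flag is False and the count stays at 0, while hours 1 and 3 get counts of 2 and 1.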

Desired output:

             A
Time_in_GMT      Sum Amt
0            0   0
1            2   20
2            1   10
3            1   10
4            0   0
5            0   0
6            0   0
7            0   0
8            0   0
9            0   0
10           0   0
11           0   0
12           0   0
13           0   0
14           0   0
15           0   0
16           0   0
17           0   0
18           0   0
19           0   0
20           0   0
21           0   0
22           0   0
23           0   0
24           0   0

Best answer

First keep only the rows for df_Highest_Book using boolean indexing, then aggregate with agg, using size for the count and sum for the amount:

#convert column to integers
df_Highest_Traded_Away_Book['Amt'] = df_Highest_Traded_Away_Book['Amt'].astype(int)
df_Highest_Traded_Away_Book = (df_Highest_Traded_Away_Book[df_Highest_Traded_Away_Book['Book']
                                  .eq(df_Highest_Book)]
                                  .groupby('Time_in_GMT')
                                  .agg({'Time_in_GMT':'size','Amt':'sum'})
                                  .reindex(np.arange(25), fill_value=0)
                                  .rename(columns={'Time_in_GMT':'Count','Amt':'Sum Amt'})
                                  )
print(df_Highest_Traded_Away_Book)
             Count  Sum Amt
Time_in_GMT                
0                0        0
1                2       20
2                1       10
3                1       10
4                0        0
5                0        0
6                0        0
7                0        0
8                0        0
9                0        0
10               0        0
11               0        0
12               0        0
13               0        0
14               0        0
15               0        0
16               0        0
17               0        0
18               0        0
19               0        0
20               0        0
21               0        0
22               0        0
23               0        0
24               0        0
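
The same idea can also be sketched as one self-contained script using named aggregation (available since pandas 0.25), which avoids aggregating the grouping column itself. This is a variant under assumptions, not the answer's exact code: the names df, top_book and summary are invented here, and the index is reindexed over hours 0 to 23 rather than 0 to 24.

```python
import numpy as np
import pandas as pd

# Same data as in the question, built directly as a DataFrame.
df = pd.DataFrame({
    'Book': ['A', 'A', 'A', 'A', 'B', 'C', 'C', 'C'],
    'Amt': ['10', '10', '10', '10', '20', '30', '30', '30'],
    'Date_Time': pd.to_datetime([
        '2018-09-03 01:06:09', '2018-09-08 01:23:29',
        '2018-09-15 02:23:29', '2018-09-20 03:23:29',
        '2018-09-20 00:23:29', '2018-09-25 01:23:29',
        '2018-09-25 02:23:29', '2018-09-30 02:23:29',
    ]),
})
df['Amt'] = df['Amt'].astype(int)            # the amounts arrive as strings
df['Time_in_GMT'] = df['Date_Time'].dt.hour  # hour of day, 0..23

# Book with the most rows.
top_book = df['Book'].value_counts().idxmax()

summary = (df[df['Book'].eq(top_book)]       # boolean indexing: keep only that book
           .groupby('Time_in_GMT')
           .agg(Count=('Amt', 'size'), Sum_Amt=('Amt', 'sum'))
           .reindex(np.arange(24), fill_value=0)
           .rename(columns={'Sum_Amt': 'Sum Amt'}))

print(summary.head(4))
```

For hours 0 to 3 this gives the same Count and Sum Amt values as the output above; only the always-empty row 24 is missing.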

Regarding "python - Bucket times into a 24-hour clock and sum the amounts for each bucket", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53879682/
