I have a DataFrame like this:
Product_ID Quantity Year Quarter
1 100 2021 1
1 100 2021 2
1 50 2021 3
1 100 2021 4
1 100 2022 1
2 100 2021 1
2 100 2021 2
3 100 2021 1
3 100 2021 2
For each Product_ID, I want the sum of Quantity over the previous three quarters (excluding the current one).
So I tried this:
df['Qty_Sum_3qrts'] = (df.groupby('Product_ID')['Quantity'].shift(1, fill_value=0)
.rolling(3).sum().reset_index(0,drop=True)
)
# Shifting 1, because I want to exclude the current row.
# Rolling 3, because I want to have the 3 'rows' before
# Grouping by, because I want to have the calculation PER product
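My code fails because it does not restrict the sum to each product: it also pulls in quantities from other products (for example, for product 2, quarter 1, it sums rows that belong to product 1).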
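A minimal sketch of why the attempt leaks across products, assuming the sample data above is rebuilt by hand (the names `shifted` and `leaky` are only illustrative): `groupby(...).shift(...)` returns an ordinary Series on the original row index, so the `.rolling(3)` that follows is no longer grouped and slides over the whole column.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "Product_ID": [1, 1, 1, 1, 1, 2, 2, 3, 3],
    "Quantity": [100, 100, 50, 100, 100, 100, 100, 100, 100],
    "Year": [2021, 2021, 2021, 2021, 2022, 2021, 2021, 2021, 2021],
    "Quarter": [1, 2, 3, 4, 1, 1, 2, 1, 2],
})

# The shift itself is done per product, but the result is a plain
# (ungrouped) Series indexed 0..8.
shifted = df.groupby("Product_ID")["Quantity"].shift(1, fill_value=0)

# This rolling window therefore runs across product boundaries.
leaky = shifted.rolling(3).sum()

# Row 5 is product 2, quarter 1: its window still contains shifted
# values that came from product 1.
print(leaky.iloc[5])
```

In this sketch the first row of product 2 comes out as 150 rather than the expected 0, because its window covers the last two shifted values of product 1.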
My expected result:
Product_ID Quantity Year Quarter Qty_Sum_3qrts
1 100 2021 1 0 # no historical data for this id
1 100 2021 2 100 # sum of the last quarter of this product
1 50 2021 3 200 # sum of the last 2 quarters of this product
1 100 2021 4 250 # sum of the last 3 quarters of this product
1 100 2022 1 250 # sum of the last 3 quarters of this product
2 100 2021 1 0 # no historical data for this id
2 100 2021 2 100 # sum of the last quarter of this product
3 100 2021 1 0 # etc
3 100 2021 2 100 # etc
Best answer
You need to compute the rolling sum within each group. You can do that with apply:
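# group_keys=False keeps the original row index, so the result aligns with df
# (recent pandas versions otherwise prepend Product_ID to the index)
df['Qty_Sum_3qrts'] = (df.groupby('Product_ID', group_keys=False)['Quantity']
.apply(lambda s: s.shift(1, fill_value=0)
.rolling(3, min_periods=1).sum())
)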
Output:
Product_ID Quantity Year Quarter Qty_Sum_3qrts
0 1 100 2021 1 0.0
1 1 100 2021 2 100.0
2 1 50 2021 3 200.0
3 1 100 2021 4 250.0
4 1 100 2022 1 250.0
5 2 100 2021 1 0.0
6 2 100 2021 2 100.0
7 3 100 2021 1 0.0
8 3 100 2021 2 100.0
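As a self-contained variation on the same idea, here is a sketch that uses `transform` instead of `apply`. This is a swapped-in technique, not the answer's own code: `transform` returns a result on the original row index, so it can be assigned straight back to the frame. The helper name `prev_window_sum` is hypothetical, and the explicit sort is an added assumption that "previous" means earlier (Year, Quarter) rows within the same product.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "Product_ID": [1, 1, 1, 1, 1, 2, 2, 3, 3],
    "Quantity": [100, 100, 50, 100, 100, 100, 100, 100, 100],
    "Year": [2021, 2021, 2021, 2021, 2022, 2021, 2021, 2021, 2021],
    "Quarter": [1, 2, 3, 4, 1, 1, 2, 1, 2],
})

# Assumption: rows must be in chronological order within each product.
df = df.sort_values(["Product_ID", "Year", "Quarter"])


def prev_window_sum(s, window=3):
    """Sum of up to `window` preceding rows, excluding the current row."""
    # shift(1) drops the current row from the window; min_periods=1 lets
    # the first rows of a group use whatever history exists so far.
    return s.shift(1, fill_value=0).rolling(window, min_periods=1).sum()


# transform calls the helper once per product and keeps the original index.
df["Qty_Sum_3qrts"] = df.groupby("Product_ID")["Quantity"].transform(prev_window_sum)

print(df)
```

On the sample data this should reproduce the Qty_Sum_3qrts column shown in the output above.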
Regarding python - getting a rolling sum per group, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/72377016/