这是我的 Pandas :
df = pd.DataFrame({
'location': ['USA','USA','USA','USA', 'France','France','France','France'],
'date':['2020-11-20','2020-11-21','2020-11-22','2020-11-23', '2020-11-20','2020-11-21','2020-11-22','2020-11-23'],
'dm':[5.,4.,2.,2.,17.,3.,3.,7.]
})
对于精确位置(因此需要 groupby),我想要 dm 超过 2 天的平均值。如果我使用这个:df['rolling']=df.groupby('location').dm.rolling(2).mean().values
我得到了这个不正确的 Pandas location date dm rolling
0 USA 2020-11-20 5.0 NaN
1 USA 2020-11-21 4.0 10.0
2 USA 2020-11-22 2.0 3.0
3 USA 2020-11-23 2.0 5.0
4 France 2020-11-20 17.0 NaN
5 France 2020-11-21 3.0 4.5
6 France 2020-11-22 3.0 3.0
7 France 2020-11-23 7.0 2.0
虽然它应该是: location date dm rolling
0 USA 2020-11-20 5.0 NaN
1 USA 2020-11-21 4.0 4.5
2 USA 2020-11-22 2.0 3.0
3 USA 2020-11-23 2.0 2.0
4 France 2020-11-20 17.0 NaN
5 France 2020-11-21 3.0 10
6 France 2020-11-22 3.0 3.0
7 France 2020-11-23 7.0 5.0
两个问题:最佳答案
这是问题groupby
创造新的水平MultiIndex
,因此为了匹配原始索引值,必须通过 Series.reset_index
将其删除与 drop=True
, 如果使用 .value
then 不是按索引对齐,因此顺序应该与此处不同:
df['rolling']=df.groupby('location').dm.rolling(2).mean().reset_index(level=0, drop=True)
print (df)
location date dm rolling
0 USA 2020-11-20 5.0 NaN
1 USA 2020-11-21 4.0 4.5
2 USA 2020-11-22 2.0 3.0
3 USA 2020-11-23 2.0 2.0
4 France 2020-11-20 17.0 NaN
5 France 2020-11-21 3.0 10.0
6 France 2020-11-22 3.0 3.0
7 France 2020-11-23 7.0 5.0
详情 :print (df.groupby('location').dm.rolling(2).mean())
location
France 4 NaN
5 10.0
6 3.0
7 5.0
USA 0 NaN
1 4.5
2 3.0
3 2.0
Name: dm, dtype: float64
print (df.groupby('location').dm.rolling(2).mean().reset_index(level=0, drop=True))
4 NaN
5 10.0
6 3.0
7 5.0
0 NaN
1 4.5
2 3.0
3 2.0
Name: dm, dtype: float64
关于python - pandas groupby 滚动行为,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/64987852/