python - Pandas rolling on a shifted dataframe

Tags: python pandas

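Here is a piece of code. I don't understand why the first 4 items of the last column, rm-5, are NaN.

I understand that the first 4 items of the rm column are not filled because there is not yet enough data. But if I shift the column first, shouldn't the calculation go through?

Likewise, I don't understand why 5 items at the end of the rm-5 column are NaN, rather than 4.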

import pandas as pd
import numpy as np

index = pd.date_range('2000-1-1', periods=100, freq='D')
df = pd.DataFrame(data=np.random.randn(100), index=index, columns=['A'])

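# pd.rolling_mean no longer exists in current pandas; .rolling(5).mean() is the equivalent call
df['rm'] = df['A'].rolling(5).mean()
df['rm-5'] = df['A'].shift(-5).rolling(5).mean()

print(df.head(n=8))
print(df.tail(n=8))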

                   A        rm      rm-5
2000-01-01  0.109161       NaN       NaN
2000-01-02 -0.360286       NaN       NaN
2000-01-03 -0.092439       NaN       NaN
2000-01-04  0.169439       NaN       NaN
2000-01-05  0.185829  0.002341  0.091736
2000-01-06  0.432599  0.067028  0.295949
2000-01-07 -0.374317  0.064222  0.055903
2000-01-08  1.258054  0.334321 -0.132972
                   A        rm      rm-5
2000-04-02  0.499860 -0.422931 -0.140111
2000-04-03 -0.868718 -0.458962 -0.182373
2000-04-04  0.081059 -0.443494 -0.040646
2000-04-05  0.500275 -0.093048       NaN
2000-04-06 -0.253915 -0.008288       NaN
2000-04-07 -0.159256 -0.140111       NaN
2000-04-08 -1.080027 -0.182373       NaN
2000-04-09  0.789690 -0.040646       NaN

Best answer

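You can change the order of operations. Right now you shift first and then take the mean.

A 5-period rolling window needs 5 values by default. So the first 4 rows of any 5-period rolling mean are NaN, no matter what was shifted beforehand.

On top of that, shift(-5) moves every value up by 5 rows and leaves 5 NaNs at the bottom. Every window that ends in one of those last 5 rows contains a NaN, which is why 5 rows (not 4) come out NaN at the end.

If you take the mean first and then shift, the 4 leading NaNs are pushed out of the frame instead. Only the 5 NaNs at the end remain.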
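The index bookkeeping behind this can be written out explicitly. Let $A_t$ be the value in row $t$ ($t = 0, \dots, 99$). Assume the default rule that a 5-period window yields a value only when all 5 entries are non-NaN. $B$ is the shifted column, $S$ is "shift first", $R$ is the plain rolling mean, and $M$ is "mean first":

```latex
\begin{aligned}
B_t &= A_{t+5}, \quad 0 \le t \le 94 \qquad (B_t = \text{NaN for } 95 \le t \le 99) \\
\text{shift first:}\quad S_t &= \frac{1}{5}\sum_{k=0}^{4} B_{t-k} = \frac{1}{5}\sum_{j=t+1}^{t+5} A_j, \quad \text{defined for } 4 \le t \le 94 \\
R_t &= \frac{1}{5}\sum_{j=t-4}^{t} A_j, \quad \text{defined for } 4 \le t \le 99 \\
\text{mean first:}\quad M_t &= R_{t+5} = \frac{1}{5}\sum_{j=t+1}^{t+5} A_j, \quad \text{defined for } 0 \le t \le 94
\end{aligned}
```

So $S$ and $M$ hold the same number wherever both exist. $S$ is missing rows 0 to 3 and 95 to 99 (4 + 5 NaNs), while $M$ is missing only rows 95 to 99.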

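import pandas as pd
import numpy as np

index = pd.date_range('2000-1-1', periods=100, freq='D')
df = pd.DataFrame(data=np.random.randn(100), index=index, columns=['A'])

df['rm'] = df['A'].rolling(5).mean()
df['shift'] = df['A'].shift(-5)
# Shift first, then average: the first 4 rows still lack a full window,
# and the 5 NaNs that shift(-5) leaves at the bottom spoil the last 5 windows
df['rm-5-shift_first'] = df['A'].shift(-5).rolling(5).mean()
# Average first, then shift: the 4 leading NaNs of the rolling mean
# are shifted out of the frame, so only the last 5 rows are NaN
df['rm-5-mean_first'] = df['A'].rolling(5).mean().shift(-5)

print(df.head(n=8))
print(df.tail(n=8))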

                   A        rm     shift  rm-5-shift_first  rm-5-mean_first
2000-01-01 -0.120808       NaN  0.830231               NaN         0.184197
2000-01-02  0.029547       NaN  0.047451               NaN         0.187778
2000-01-03  0.002652       NaN  1.040963               NaN         0.395440
2000-01-04 -1.078656       NaN -1.118723               NaN         0.387426
2000-01-05  1.137210 -0.006011  0.469557          0.253896         0.253896
2000-01-06  0.830231  0.184197 -0.390506          0.009748         0.009748
2000-01-07  0.047451  0.187778 -1.624492         -0.324640        -0.324640
2000-01-08  1.040963  0.395440 -1.259306         -0.784694        -0.784694
                   A        rm     shift  rm-5-shift_first  rm-5-mean_first
2000-04-02 -1.283123 -0.270381  0.226257          0.760370         0.760370
2000-04-03  1.369342  0.288072  2.367048          0.959912         0.959912
2000-04-04  0.003363  0.299997  1.143513          1.187941         1.187941
2000-04-05  0.694026  0.400442       NaN               NaN              NaN
2000-04-06  1.508863  0.458494       NaN               NaN              NaN
2000-04-07  0.226257  0.760370       NaN               NaN              NaN
2000-04-08  2.367048  0.959912       NaN               NaN              NaN
2000-04-09  1.143513  1.187941       NaN               NaN              NaN
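The two result columns above agree everywhere except the first four rows. You can confirm this, and count where the NaNs land, by swapping the random data for a deterministic series. The sketch below does that. The values 0..99 are an arbitrary choice for illustration, on the assumption that any 100-row series produces the same NaN pattern.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random column 'A' (illustrative assumption)
index = pd.date_range('2000-1-1', periods=100, freq='D')
a = pd.Series(np.arange(100, dtype=float), index=index, name='A')

shift_first = a.shift(-5).rolling(5).mean()
mean_first = a.rolling(5).mean().shift(-5)

# Shift-first loses 4 rows at the top and 5 at the bottom;
# mean-first loses only the 5 rows at the bottom.
print(shift_first.isna().sum())           # 9
print(mean_first.isna().sum())            # 5
print(shift_first.iloc[:4].isna().all())  # True

# Where both are defined they match: each row t holds the mean of A[t+1..t+5].
both = shift_first.notna() & mean_first.notna()
print(np.allclose(shift_first[both], mean_first[both]))  # True
print(mean_first.iloc[0])                 # 3.0, the mean of 1, 2, 3, 4, 5
```

With real data the numbers change, but the NaN counts (9 versus 5) and the agreement on the overlapping rows stay the same.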

For more, see:

http://pandas.pydata.org/pandas-docs/stable/computation.html#moving-rolling-statistics-moments

http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.shift.html
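The rolling-statistics documentation linked above also covers the `min_periods` argument of `rolling`. This is the knob behind "the first 4 rows are NaN": with an integer window, `min_periods` defaults to the window size. The sketch below assumes partial-window averages at the start are acceptable for your use case; it uses a small illustrative series rather than the question's data.

```python
import numpy as np
import pandas as pd

index = pd.date_range('2000-1-1', periods=10, freq='D')
a = pd.Series(np.arange(10, dtype=float), index=index)

# Default: a window of 5 needs 5 non-NaN values, so rows 0 to 3 are NaN
strict = a.rolling(5).mean()
# min_periods=1: average whatever is available, giving partial-window means
partial = a.rolling(5, min_periods=1).mean()

print(strict.head(5).tolist())   # [nan, nan, nan, nan, 2.0]
print(partial.head(5).tolist())  # [0.0, 0.5, 1.0, 1.5, 2.0]
```

Note that the leading values with `min_periods=1` are means over fewer than 5 observations, so they are not directly comparable to the full-window values.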

A similar question about python - Pandas rolling on a shifted dataframe can be found on Stack Overflow: https://stackoverflow.com/questions/27479800/
