python - Why is pandas df.diff(2) different from df.diff().diff()?

Tags: python pandas dataframe

According to Enders' Applied Econometric Time Series, the second difference of a variable y is defined as: [image: double differencing]

Pandas provides the diff function, which takes "periods" as an argument. Nevertheless, df.diff(2) gives a different result from df.diff().diff().

A code excerpt showing this:

In [8]: df
Out[8]:
       C.1   C.2    C.3     C.4     C.5   C.6
C.0
1990  16.0   6.0  256.0   216.0   65536  4352
1991  17.0   7.0  289.0   343.0  131072  5202
1992   6.0  -4.0   36.0   -64.0      64   252
1993   7.0  -3.0   49.0   -27.0     128   392
1994   8.0  -2.0   64.0    -8.0     256   576
1995  13.0   3.0  169.0    27.0    8192  2366
1996  10.0   0.5  100.0     0.5    1024  1100
1997  11.0   1.0  121.0     1.0    2048  1452
1998   4.0  -6.0   16.0  -216.0      16    80
1999   5.0  -5.0   25.0  -125.0      32   150
2000  18.0   8.0  324.0   512.0  262144  6156
2001   3.0  -7.0    9.0  -343.0       8    36
2002   0.5 -10.0    0.5 -1000.0      48    20
2003   1.0  -9.0    1.0  -729.0       2     2
2004  14.0   4.0  196.0    64.0   16384  2940
2005  15.0   5.0  225.0   125.0   32768  3600
2006  12.0   2.0  144.0     8.0    4096  1872
2007   9.0  -1.0   81.0    -1.0     512   810
2008   2.0  -8.0    4.0  -512.0       4    12
2009  19.0   9.0  361.0   729.0  524288  7220

In [9]: df.diff(2)
Out[9]:
       C.1   C.2    C.3     C.4       C.5     C.6
C.0
1990   NaN   NaN    NaN     NaN       NaN     NaN
1991   NaN   NaN    NaN     NaN       NaN     NaN
1992 -10.0 -10.0 -220.0  -280.0  -65472.0 -4100.0
1993 -10.0 -10.0 -240.0  -370.0 -130944.0 -4810.0
1994   2.0   2.0   28.0    56.0     192.0   324.0
1995   6.0   6.0  120.0    54.0    8064.0  1974.0
1996   2.0   2.5   36.0     8.5     768.0   524.0
1997  -2.0  -2.0  -48.0   -26.0   -6144.0  -914.0
1998  -6.0  -6.5  -84.0  -216.5   -1008.0 -1020.0
1999  -6.0  -6.0  -96.0  -126.0   -2016.0 -1302.0
2000  14.0  14.0  308.0   728.0  262128.0  6076.0
2001  -2.0  -2.0  -16.0  -218.0     -24.0  -114.0
2002 -17.5 -18.0 -323.5 -1512.0 -262096.0 -6136.0
2003  -2.0  -2.0   -8.0  -386.0      -6.0   -34.0
2004  13.5  14.0  195.5  1064.0   16336.0  2920.0
2005  14.0  14.0  224.0   854.0   32766.0  3598.0
2006  -2.0  -2.0  -52.0   -56.0  -12288.0 -1068.0
2007  -6.0  -6.0 -144.0  -126.0  -32256.0 -2790.0
2008 -10.0 -10.0 -140.0  -520.0   -4092.0 -1860.0
2009  10.0  10.0  280.0   730.0  523776.0  6410.0

In [10]: df.diff().diff()
Out[10]:
       C.1   C.2    C.3     C.4       C.5      C.6
C.0
1990   NaN   NaN    NaN     NaN       NaN      NaN
1991   NaN   NaN    NaN     NaN       NaN      NaN
1992 -12.0 -12.0 -286.0  -534.0 -196544.0  -5800.0
1993  12.0  12.0  266.0   444.0  131072.0   5090.0
1994   0.0   0.0    2.0   -18.0      64.0     44.0
1995   4.0   4.0   90.0    16.0    7808.0   1606.0
1996  -8.0  -7.5 -174.0   -61.5  -15104.0  -3056.0
1997   4.0   3.0   90.0    27.0    8192.0   1618.0
1998  -8.0  -7.5 -126.0  -217.5   -3056.0  -1724.0
1999   8.0   8.0  114.0   308.0    2048.0   1442.0
2000  12.0  12.0  290.0   546.0  262096.0   5936.0
2001 -28.0 -28.0 -614.0 -1492.0 -524248.0 -12126.0
2002  12.5  12.0  306.5   198.0  262176.0   6104.0
2003   3.0   4.0    9.0   928.0     -86.0     -2.0
2004  12.5  12.0  194.5   522.0   16428.0   2956.0
2005 -12.0 -12.0 -166.0  -732.0       2.0  -2278.0
2006  -4.0  -4.0 -110.0  -178.0  -45056.0  -2388.0
2007   0.0   0.0   18.0   108.0   25088.0    666.0
2008  -4.0  -4.0  -14.0  -502.0    3076.0    264.0
2009  24.0  24.0  434.0  1752.0  524792.0   8006.0

In [11]: df.diff(2) - df.diff().diff()
Out[11]:
       C.1   C.2    C.3     C.4       C.5      C.6
C.0
1990   NaN   NaN    NaN     NaN       NaN      NaN
1991   NaN   NaN    NaN     NaN       NaN      NaN
1992   2.0   2.0   66.0   254.0  131072.0   1700.0
1993 -22.0 -22.0 -506.0  -814.0 -262016.0  -9900.0
1994   2.0   2.0   26.0    74.0     128.0    280.0
1995   2.0   2.0   30.0    38.0     256.0    368.0
1996  10.0  10.0  210.0    70.0   15872.0   3580.0
1997  -6.0  -5.0 -138.0   -53.0  -14336.0  -2532.0
1998   2.0   1.0   42.0     1.0    2048.0    704.0
1999 -14.0 -14.0 -210.0  -434.0   -4064.0  -2744.0
2000   2.0   2.0   18.0   182.0      32.0    140.0
2001  26.0  26.0  598.0  1274.0  524224.0  12012.0
2002 -30.0 -30.0 -630.0 -1710.0 -524272.0 -12240.0
2003  -5.0  -6.0  -17.0 -1314.0      80.0    -32.0
2004   1.0   2.0    1.0   542.0     -92.0    -36.0
2005  26.0  26.0  390.0  1586.0   32764.0   5876.0
2006   2.0   2.0   58.0   122.0   32768.0   1320.0
2007  -6.0  -6.0 -162.0  -234.0  -57344.0  -3456.0
2008  -6.0  -6.0 -126.0   -18.0   -7168.0  -2124.0
2009 -14.0 -14.0 -154.0 -1022.0   -1016.0  -1596.0

Why are they different? Which one corresponds to the definition in Enders' book?

Best answer

It is precisely because

$$\Delta^2 y_t = y_t - 2y_{t-1} + y_{t-2} \neq y_t - y_{t-2}$$

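The left-hand side is df.diff().diff(), the first difference applied twice; the right-hand side is df.diff(2), which only subtracts the value from two periods earlier. For the difference of differences, you want the left-hand side.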
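
The two expressions can be checked numerically. The following is a minimal sketch, assuming pandas is installed; it uses the first five values of column C.1 from the question, and the variable names (y, second_diff, lag2_diff, expanded, first_diff) are illustrative rather than taken from the original post. It compares each pandas call with its expansion in terms of shifted values.

```python
import pandas as pd

# First five values of column C.1 from the question (1990-1994).
y = pd.Series([16.0, 17.0, 6.0, 7.0, 8.0],
              index=[1990, 1991, 1992, 1993, 1994], name="C.1")

# Second difference: the first difference applied twice.
second_diff = y.diff().diff()

# Lag-2 difference: each value minus the value two periods earlier.
lag2_diff = y.diff(2)

# Expansion of the second difference: y_t - 2*y_{t-1} + y_{t-2}.
expanded = y - 2 * y.shift(1) + y.shift(2)

# The lag-2 difference is instead the sum of two consecutive
# first differences: (y_t - y_{t-1}) + (y_{t-1} - y_{t-2}).
first_diff = y.diff()
summed = first_diff + first_diff.shift(1)

print(pd.DataFrame({
    "diff().diff()": second_diff,
    "y - 2y(-1) + y(-2)": expanded,
    "diff(2)": lag2_diff,
    "dy + dy(-1)": summed,
}))
```

For 1992 to 1994 the first two columns agree with the C.1 values in Out[10] and the last two with those in Out[9], so diff(2) is a sum of adjacent first differences rather than a difference of them.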

This question, "python - Why is pandas df.diff(2) different from df.diff().diff()?", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/50162212/
