python - Stationarity of time series data

Tags: python r time-series statsmodels

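I am trying to model time series data with ARIMA in Python. I ran the function statsmodels.tsa.stattools.arma_order_select_ic on the default data series and got p and q values of 2 and 2, respectively. The code is as follows: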

import pandas as pd
from pandas import Series
import statsmodels.api as sm

dates=pd.date_range('2010-11-1','2011-01-30')
dataseries=Series([22,624,634,774,726,752,38,534,722,678,750,690,686,26,708,606,632,632,632,584,28,576,474,536,512,464,436,24,448,408,528,
          602,638,640,26,658,548,620,534,422,482,26,616,612,622,598,614,614,24,644,506,522,622,526,26,22,738,582,592,408,466,568,
          44,680,652,598,642,714,562,38,778,796,742,460,610,42,38,732,650,670,618,574,42,22,610,456,22,630,408,390,24],index=dates)
df=pd.DataFrame({'Consumption':dataseries})
df

sm.tsa.arma_order_select_ic(df, max_ar=4, max_ma=2, ic='aic')

The result is as follows:

{'aic':              0            1            2
 0  1262.244974  1264.052640  1264.601342
 1  1264.098325  1261.705513  1265.604662
 2  1264.743786  1265.015529  1246.347400
 3  1265.427440  1266.378709  1266.430373
 4  1266.358895  1267.674168          NaN, 'aic_min_order': (2, 2)}

But when I run the Augmented Dickey-Fuller test, the result indicates that the series is not stationary.

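d_order0=sm.tsa.adfuller(dataseries)
print('adf: ', d_order0[0])
print('p-value: ', d_order0[1])
print('Critical values: ', d_order0[4])

# A test statistic above the 5% critical value means the unit-root null cannot be rejected
if d_order0[0]> d_order0[4]['5%']:
    print('Time Series is  nonstationary')
    print(d)  # d is not defined in this snippet; it was set elsewhere in the session
else:
    print('Time Series is stationary')
    print(d)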

The output is as follows:

adf:  -1.96448506629
p-value:  0.302358888762
Critical values:  {'5%': -2.8970475206326833, '1%': -3.5117123057187376, '10%': -2.5857126912469153}
Time Series is  nonstationary
1

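When I cross-checked the result with R, it indicated that the default series is stationary. So why does the Augmented Dickey-Fuller test report a non-stationary series?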

Best answer

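Your data clearly has some seasonality: the autocorrelations shown in the R session of this answer spike at lags 7 and 14, which points to a weekly pattern. Both the ARMA model selection and the stationarity test therefore need to be done with care.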
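The seasonality claim can also be checked from Python. The following is a minimal sketch, not part of the original answer: it computes the sample autocorrelation with plain Python on the values copied from the question, and the helper name `sample_acf` is hypothetical.

```python
# Daily consumption values copied from the question (91 observations).
values = [
    22, 624, 634, 774, 726, 752, 38, 534, 722, 678, 750, 690, 686, 26, 708,
    606, 632, 632, 632, 584, 28, 576, 474, 536, 512, 464, 436, 24, 448, 408,
    528, 602, 638, 640, 26, 658, 548, 620, 534, 422, 482, 26, 616, 612, 622,
    598, 614, 614, 24, 644, 506, 522, 622, 526, 26, 22, 738, 582, 592, 408,
    466, 568, 44, 680, 652, 598, 642, 714, 562, 38, 778, 796, 742, 460, 610,
    42, 38, 732, 650, 670, 618, 574, 42, 22, 610, 456, 22, 630, 408, 390, 24,
]


def sample_acf(series, max_lag):
    """Return the sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [v - mean for v in series]
    denom = sum(d * d for d in deviations)
    return [
        sum(deviations[t] * deviations[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


acf = sample_acf(values, 19)

# Lag (>= 1) with the largest autocorrelation; a weekly cycle would put it at 7.
peak_lag = max(range(1, len(acf)), key=lambda k: acf[k])
print("peak lag:", peak_lag, "autocorrelation:", round(acf[peak_lag], 3))
```

A peak at lag 7 with a second bump at lag 14 is what a weekly cycle in daily data would look like, which is why a seasonal term or seasonal differencing may be worth considering before fitting an ARMA model.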

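The ADF results differ between Python and R because each package uses a different default number of lags. The Python default is based on 12*(nobs/100)^(1/4), which is about 12 for this series, while the R default is trunc((nobs-1)^(1/3)), which is 4. In the runs that follow, the test rejects a unit root with 4 lags but not with 7 or 12 lags.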

> (nobs=length(dataseries))
[1] 91
> 12*(nobs/100)^(1/4)  #python default
[1] 11.72038
> trunc((nobs-1)^(1/3)) #R default
[1] 4
> acf(coredata(dataseries),plot = F)

Autocorrelations of series ‘coredata(dataseries)’, by lag

     0      1      2      3      4      5      6      7      8      9     10     11 
 1.000  0.039 -0.116 -0.124 -0.094 -0.148  0.083  0.645 -0.072 -0.135 -0.138 -0.146 
    12     13     14     15     16     17     18     19 
-0.185  0.066  0.502 -0.097 -0.151 -0.165 -0.195 -0.160 
> adf.test(dataseries,k=12)

    Augmented Dickey-Fuller Test

data:  dataseries
Dickey-Fuller = -2.6172, Lag order = 12, p-value = 0.322
alternative hypothesis: stationary

> adf.test(dataseries,k=4)

    Augmented Dickey-Fuller Test

data:  dataseries
Dickey-Fuller = -6.276, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary

Warning message:
In adf.test(dataseries, k = 4) : p-value smaller than printed p-value
> adf.test(dataseries,k=7)

    Augmented Dickey-Fuller Test

data:  dataseries
Dickey-Fuller = -2.2571, Lag order = 7, p-value = 0.4703
alternative hypothesis: stationary

Regarding python - Stationarity of time series data, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31008084/
