python - How to reproduce the statsmodels ARIMA filter?

Tags: python statistics time-series statsmodels arima

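I am trying to reproduce the filters used in an ARIMA model with the statsmodels functions recursive_filter and convolution_filter. (My ultimate goal is to use these filters to prewhiten an exogenous series.)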

I started with an AR model and the recursive filter. Here is a simplified experimental setup:

import numpy as np
import statsmodels.api as sm

np.random.seed(42)

# sample data: AR(2) process with lag coefficients 0.2 and 0.5
series = sm.tsa.arma_generate_sample(ar=(1, -0.2, -0.5), ma=(1,), nsample=100)

# fit an AR(2) model (a constant is included by default)
model = sm.tsa.arima.ARIMA(series, order=(2, 0, 0)).fit()
print(model.summary())

It runs cleanly and produces the following, which looks reasonable:

                               SARIMAX Results                                
==============================================================================
Dep. Variable:                      y   No. Observations:                  100
Model:                 ARIMA(2, 0, 0)   Log Likelihood                -131.991
Date:                Wed, 07 Apr 2021   AIC                            271.982
Time:                        12:58:39   BIC                            282.403
Sample:                             0   HQIC                           276.200
                                - 100                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.3136      0.266     -1.179      0.238      -0.835       0.208
ar.L1          0.2135      0.084      2.550      0.011       0.049       0.378
ar.L2          0.4467      0.101      4.427      0.000       0.249       0.645
sigma2         0.8154      0.126      6.482      0.000       0.569       1.062
===================================================================================
Ljung-Box (L1) (Q):                   0.10   Jarque-Bera (JB):                 0.53
Prob(Q):                              0.75   Prob(JB):                         0.77
Heteroskedasticity (H):               0.98   Skew:                            -0.16
Prob(H) (two-sided):                  0.96   Kurtosis:                         2.85
===================================================================================

I fit an AR(2) model and obtain the lag 1 and lag 2 coefficients from the SARIMAX results. My intuition for reproducing this model with statsmodels.tsa.filters.filtertools.recursive_filter is the following:

from statsmodels.tsa.filters.filtertools import recursive_filter
filtered = recursive_filter(series, ar_coeff=(-0.2135, -0.4467))

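(perhaps also adding the constant from the regression results). However, a direct comparison of the results shows that the recursive filter does not replicate the AR model: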

import matplotlib.pyplot as plt

# ARIMA residuals
plt.plot(model.resid, label='ARIMA residuals')

# Calculated residuals using recursive filter outcome
plt.plot(filtered, label='recursive_filter output')
plt.legend()
plt.show()

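Is my approach wrong? Should I use a different filter function? My next step is to do the same for an MA model, so that I can add the results together to obtain a full ARMA filter for prewhitening.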

Note: this question may be valuable to somebody searching for "how can I prewhiten timeseries data?" particularly in Python using statsmodels.

Best answer

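I think you should use convolution_filter for the AR part and recursive_filter for the MA part. Applying these in sequence will work for ARMA models. Alternatively, you can use arma_innovations, an exact approach that handles both the AR and MA components. Here are some examples: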

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.innovations import arma_innovations

AR(2)

np.random.seed(42)
series = sm.tsa.arma_generate_sample(ar=(1, -0.2, -0.5), ma=(1,), nsample=100)

res = sm.tsa.arima.ARIMA(series, order=(2, 0, 0), trend='n').fit()
print(pd.DataFrame({
    'ARIMA resid': res.resid,
    'arma_innovations': arma_innovations.arma_innovations(
        series, ar_params=res.params[:-1])[0],
    'convolution filter': sm.tsa.filters.convolution_filter(
        series, np.r_[1, -res.params[:-1]], nsides=1)}))

which gives:

    ARIMA resid  arma_innovations  convolution filter
0      0.496714          0.496714                 NaN
1     -0.254235         -0.254235                 NaN
2      0.666326          0.666326            0.666326
3      1.493315          1.493315            1.493315
4     -0.256708         -0.256708           -0.256708
..          ...               ...                 ...
95    -1.438670         -1.438670           -1.438670
96     0.323470          0.323470            0.323470
97     0.218243          0.218243            0.218243
98     0.012264          0.012264            0.012264
99    -0.245584         -0.245584           -0.245584
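
The first two rows of the convolution filter column are NaN because a one-sided filter of length 3 needs two lagged observations, while the other two columns also provide residuals for the initial observations. As a rough illustration of what the convolution computes for an AR(2) model without a constant, e_t = y_t - phi_1*y_{t-1} - phi_2*y_{t-2}, here is a NumPy-only sketch. The coefficients in phi are made up for the example (they are not the fitted values above), and pre-sample values are assumed to be zero:

```python
import numpy as np

rng = np.random.default_rng(0)
phi = np.array([0.2, 0.5])        # hypothetical AR(2) coefficients (illustration only)
eps = rng.standard_normal(200)    # true innovations

# Simulate y_t = phi_1*y_{t-1} + phi_2*y_{t-2} + eps_t with zero pre-sample values.
y = np.zeros_like(eps)
for t in range(len(eps)):
    y[t] = eps[t]
    for i, p in enumerate(phi, start=1):
        if t - i >= 0:
            y[t] += p * y[t - i]

# Whitening an AR process is a finite filter: apply the lag polynomial [1, -phi_1, -phi_2].
lag_poly = np.r_[1.0, -phi]
resid = np.convolve(y, lag_poly)[:len(y)]   # zero-padded at the start

print(np.allclose(resid, eps))
```

Because np.convolve pads the start with zeros, the first two values are defined here, whereas convolution_filter with nsides=1 leaves them as NaN in the table above.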

MA(1)

np.random.seed(42)
series = sm.tsa.arma_generate_sample(ar=(1,), ma=(1, 0.2), nsample=100)

res = sm.tsa.arima.ARIMA(series, order=(0, 0, 1), trend='n').fit()
print(pd.DataFrame({
    'ARIMA resid': res.resid,
    'arma_innovations': arma_innovations.arma_innovations(
        series, ma_params=res.params[:-1])[0],
    'recursive filter': sm.tsa.filters.recursive_filter(series, -res.params[:-1])}))

which gives:

    ARIMA resid  arma_innovations    recursive filter
0      0.496714          0.496714            0.496714
1     -0.132893         -0.132893           -0.136521
2      0.646110          0.646110            0.646861
3      1.525620          1.525620            1.525466
4     -0.229316         -0.229316           -0.229286
..          ...               ...                 ...
95    -1.464786         -1.464786           -1.464786
96     0.291233          0.291233            0.291233
97     0.263055          0.263055            0.263055
98     0.005637          0.005637            0.005637
99    -0.234672         -0.234672           -0.234672
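
In this table the recursive filter column differs slightly from the other two in the first few rows and then converges. A plausible reading (my interpretation, not something stated in the answer) is that the plain recursion starts from a pre-sample residual of zero and uses the fixed coefficient theta from the first step, whereas the exact one-step-ahead errors use a time-varying coefficient that only approaches theta. The sketch below compares the two for an MA(1) using the textbook innovations-algorithm recursion. The value of theta is made up, and the code does not call statsmodels:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 0.2                       # hypothetical MA(1) coefficient (illustration only)
eps = rng.standard_normal(101)
y = eps[1:] + theta * eps[:-1]    # y_t = eps_t + theta * eps_{t-1}

# Approximate residuals: e_t = y_t - theta * e_{t-1}, starting from e_{-1} = 0.
approx = np.zeros_like(y)
prev = 0.0
for t in range(len(y)):
    approx[t] = y[t] - theta * prev
    prev = approx[t]

# Exact one-step-ahead errors via the innovations algorithm for an MA(1),
# with variances expressed in units of the innovation variance:
#   v_0 = 1 + theta^2,  theta_t = theta / v_{t-1},  v_t = 1 + theta^2 - theta^2 / v_{t-1}
exact = np.zeros_like(y)
v = 1.0 + theta**2
exact[0] = y[0]
for t in range(1, len(y)):
    exact[t] = y[t] - (theta / v) * exact[t - 1]
    v = 1.0 + theta**2 - theta**2 / v

gap = np.abs(exact - approx)
print(gap[:5], gap[-1])
```

The gap is zero at t = 0, non-zero at t = 1, and shrinks roughly geometrically afterwards, which matches the pattern in the table.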

ARMA(1, 1)

np.random.seed(42)
series = sm.tsa.arma_generate_sample(ar=(1, 0.5), ma=(1, 0.2), nsample=100)

res = sm.tsa.arima.ARIMA(series, order=(1, 0, 1), trend='n').fit()
a = res.resid

# Apply the recursive then convolution filter
tmp = sm.tsa.filters.recursive_filter(series, -res.params[1:2])
filtered = sm.tsa.filters.convolution_filter(tmp, np.r_[1, -res.params[:1]], nsides=1)

print(pd.DataFrame({
    'ARIMA resid': res.resid,
    'arma_innovations': arma_innovations.arma_innovations(
        series, ar_params=res.params[:1], ma_params=res.params[1:2])[0],
    'combined filters': filtered}))

which gives:

    ARIMA resid  arma_innovations    combined filters
0      0.496714          0.496714                 NaN
1     -0.134253         -0.134253           -0.136915
2      0.668094          0.668094            0.668246
3      1.507288          1.507288            1.507279
4     -0.193560         -0.193560           -0.193559
..          ...               ...                 ...
95    -1.448784         -1.448784           -1.448784
96     0.268421          0.268421            0.268421
97     0.212966          0.212966            0.212966
98     0.046281          0.046281            0.046281
99    -0.244725         -0.244725           -0.244725
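
Since the stated goal is prewhitening, the same ARMA filter would typically be applied to a second (exogenous) series as well. Below is a minimal sketch of a direct recursion that should match the recursive-then-convolution combination up to the handling of initial values. The function name arma_whiten is hypothetical, the coefficients in the check are made up, and it assumes no constant and zero pre-sample values:

```python
import numpy as np

def arma_whiten(x, ar_params=(), ma_params=()):
    """Hypothetical helper: e_t = x_t - sum_i ar_i*x_{t-i} - sum_j ma_j*e_{t-j},
    with all pre-sample values taken as zero."""
    x = np.asarray(x, dtype=float)
    e = np.zeros_like(x)
    for t in range(len(x)):
        acc = x[t]
        for i, phi in enumerate(ar_params, start=1):
            if t - i >= 0:
                acc -= phi * x[t - i]
        for j, theta in enumerate(ma_params, start=1):
            if t - j >= 0:
                acc -= theta * e[t - j]
        e[t] = acc
    return e

# Check on a simulated ARMA(1, 1): y_t = phi*y_{t-1} + eps_t + theta*eps_{t-1}.
rng = np.random.default_rng(2)
phi, theta = 0.5, 0.2             # made-up coefficients
eps = rng.standard_normal(150)
y = np.zeros_like(eps)
for t in range(len(eps)):
    y[t] = eps[t]
    if t >= 1:
        y[t] += phi * y[t - 1] + theta * eps[t - 1]

recovered = arma_whiten(y, ar_params=[phi], ma_params=[theta])
print(np.allclose(recovered, eps))
```

With a fitted model, the AR and MA arguments would be the estimated coefficients (res.params[:1] and res.params[1:2] in the example above), and the same call could then be repeated on the exogenous series.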

SARIMA(1, 0, 1)x(1, 0, 0, 3)

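Seasonal models are a bit more complicated, because they require multiplying the lag polynomials. For more details, see the example notebook in the Statsmodels documentation.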

np.random.seed(42)
ar_poly = [1, -0.5]
sar_poly = [1, 0, 0, -0.1]
ar = np.polymul(ar_poly, sar_poly)
series = sm.tsa.arma_generate_sample(ar=ar, ma=(1, 0.2), nsample=100)

res = sm.tsa.arima.ARIMA(series, order=(1, 0, 1), seasonal_order=(1, 0, 0, 3), trend='n').fit()
a = res.resid

# Apply the recursive then convolution filter
tmp = sm.tsa.filters.recursive_filter(series, -res.polynomial_reduced_ma[1:])
filtered = sm.tsa.filters.convolution_filter(tmp, res.polynomial_reduced_ar, nsides=1)

print(pd.DataFrame({
    'ARIMA resid': res.resid,
    'arma_innovations': arma_innovations.arma_innovations(
        series, ar_params=-res.polynomial_reduced_ar[1:],
        ma_params=res.polynomial_reduced_ma[1:])[0],
    'combined filters': filtered}))

which gives:

    ARIMA resid  arma_innovations combined filters
0      0.496714          0.496714              NaN
1     -0.100303         -0.100303              NaN
2      0.625066          0.625066              NaN
3      1.557418          1.557418              NaN
4     -0.209256         -0.209256        -0.205201
..          ...               ...              ...
95    -1.476702         -1.476702        -1.476702
96     0.269118          0.269118         0.269118
97     0.230697          0.230697         0.230697
98    -0.004561         -0.004561        -0.004561
99    -0.233466         -0.233466        -0.233466
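
The four leading NaN values in the combined filters column are consistent with the reduced AR polynomial having degree 4 (one regular lag times one seasonal lag of period 3). As a small illustration, the product of the two generating polynomials from the simulation above can be checked with NumPy alone. The variable names reduced_ar and ar_params are only for this sketch:

```python
import numpy as np

ar_poly = [1, -0.5]          # (1 - 0.5 L), as in the simulation above
sar_poly = [1, 0, 0, -0.1]   # (1 - 0.1 L^3), seasonal period 3

# Multiplying lag polynomials amounts to convolving their coefficient sequences.
reduced_ar = np.polymul(ar_poly, sar_poly)

# Same sign convention as the ar_params argument used above:
# minus the polynomial terms after the leading 1.
ar_params = -reduced_ar[1:]

print(reduced_ar)
print(ar_params)
```

The product is 1 - 0.5 L - 0.1 L^3 + 0.05 L^4, so the one-sided convolution needs four lagged observations before it can produce a value.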

Regarding python - How to reproduce the statsmodels ARIMA filter?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66993263/
