python - Forecasting a time series with multiple seasonalities using auto_arima (SARIMAX) and Fourier terms

Tags: python time-series fft forecasting pmdarima

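I am trying to forecast a time series in Python using auto_arima with Fourier terms added as exogenous features. The data come from Kaggle's Store Item Demand Forecasting Challenge. They consist of long-format time series for 10 stores and 50 items, which gives 500 time series stacked on top of each other. What makes these series particular is that the data are daily and show both weekly and annual seasonality.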

To capture these two levels of seasonality, I first used TBATS, as recommended by Rob J Hyndman in Forecasting with daily data. It actually works quite well.

I also followed this Medium article by the creator of the TBATS Python library, who compares it with SARIMAX + Fourier terms (also recommended by Hyndman).

But now, when I try the second approach, pmdarima's auto_arima with Fourier terms as exogenous features, I get unexpected results.

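In the code below, I use only the train.csv file, split into training and test data (the last year is held out for forecasting), and I set the maximum order of the Fourier terms to K = 2.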
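For reference, the Fourier terms of order K for a seasonal period m are the pairs sin(2πkt/m) and cos(2πkt/m) for k = 1, ..., K, so K = 2 yields four exogenous columns. Below is a minimal, dependency-free sketch of that construction. The helper name fourier_terms is hypothetical, and it assumes that time is counted from t = 1; pmdarima's FourierFeaturizer may differ in column naming and in how it offsets time.

```python
import math


def fourier_terms(n_obs, period, k_max):
    """Return one row per time step t = 1..n_obs with 2 * k_max columns:
    sin(2*pi*k*t/period), cos(2*pi*k*t/period) for k = 1..k_max.
    (Hypothetical helper; counting time from 1 is an assumption.)"""
    rows = []
    for t in range(1, n_obs + 1):
        row = []
        for k in range(1, k_max + 1):
            angle = 2.0 * math.pi * k * t / period
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows


# Two years of daily steps, annual period, order K = 2 -> 4 columns per row.
terms = fourier_terms(n_obs=730, period=365.25, k_max=2)
print(len(terms), len(terms[0]))
```

Because the period need not be an integer, this is how a non-integer annual period such as 365.25 can be handled alongside the integer weekly period m = 7 in the seasonal ARIMA part.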

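My problem is that I get a smooth forecast (see the figure below) that does not seem to capture the weekly seasonality, unlike the result shown at the end of the article. Is something wrong with my code?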

Full code:

# imports
import pandas as pd
from pmdarima.preprocessing import FourierFeaturizer
from pmdarima import auto_arima
import matplotlib.pyplot as plt

# Load the data: a long-format table of multiple time series stacked on top of each other
# There are 10 (stores) * 50 (items) = 500 time series
train_data = pd.read_csv('train.csv', index_col='date', parse_dates=True)

# Select only one time series for store 1 and item 1 for the purpose of the example
train_data = train_data.query('store == 1 and item == 1').sales

# Prepare the fourier terms to add as exogenous features to auto_arima
# Annual seasonality covered by fourier terms
four_terms = FourierFeaturizer(365.25, 2)
y_prime, exog = four_terms.fit_transform(train_data)
exog['date'] = y_prime.index # attach the dates so the Fourier features can be indexed by date
exog = exog.set_index(exog['date'])
exog.index.freq = 'D'
exog = exog.drop(columns=['date'])


# Split the time series as well as exogenous features data into train and test splits 
y_to_train = y_prime.iloc[:(len(y_prime)-365)]
y_to_test =  y_prime.iloc[(len(y_prime)-365):] # last year for testing

exog_to_train = exog.iloc[:(len(exog)-365)]
exog_to_test = exog.iloc[(len(exog)-365):]


# Fit model
# Weekly seasonality covered by SARIMAX
arima_exog_model = auto_arima(y=y_to_train, exogenous=exog_to_train, seasonal=True, m=7)

# Forecast
y_arima_exog_forecast = arima_exog_model.predict(n_periods=365, exogenous=exog_to_test)
y_arima_exog_forecast = pd.DataFrame(y_arima_exog_forecast , index = pd.date_range(start='2017-01-01', end= '2017-12-31'))


# Plots
plt.plot(y_to_test, label='Actual data')
plt.plot(y_arima_exog_forecast, label='Forecast')
plt.legend()

Actual data and forecasts over the last year of data
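One way to put a number on what the plot suggests is to compare the day-of-week averages of the actual values and of the forecast: a nearly flat forecast profile means the weekly seasonality was not captured. The sketch below uses synthetic stand-in values, not the Kaggle series, and the helper name weekday_profile is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekday_profile(dates, values):
    """Mean value per weekday (0 = Monday ... 6 = Sunday)."""
    buckets = defaultdict(list)
    for day, value in zip(dates, values):
        buckets[day.weekday()].append(value)
    return {wd: sum(v) / len(v) for wd, v in sorted(buckets.items())}


# Synthetic stand-ins for a test year (assumption: 52 full weeks of daily data).
dates = [date(2017, 1, 1) + timedelta(days=i) for i in range(364)]
actual = [20 + 3 * d.weekday() for d in dates]  # clear weekly pattern
smooth_forecast = [29.0 for _ in dates]         # no weekly pattern at all

actual_profile = weekday_profile(dates, actual)
forecast_profile = weekday_profile(dates, smooth_forecast)

# Spread between the highest and lowest weekday mean.
actual_range = max(actual_profile.values()) - min(actual_profile.values())
forecast_range = max(forecast_profile.values()) - min(forecast_profile.values())
print(actual_range, forecast_range)
```

With pandas objects such as y_to_test, the same profile can be obtained by grouping on the index's dayofweek attribute and taking the mean.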

Thanks in advance for your answers!

Best answer

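Here is the answer, in case anyone is interested. Thanks again to Flavia Giammarino. Compared with the code in the question, there are two changes: the Fourier order is reduced from 2 to 1 (FourierFeaturizer(365.25, 1)), and D=1 is passed to auto_arima so that one seasonal difference at m=7 is applied.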

# imports
import pandas as pd
from pmdarima.preprocessing import FourierFeaturizer
from pmdarima import auto_arima
import matplotlib.pyplot as plt

# Load the data: a long-format table of multiple time series stacked on top of each other
# There are 10 (stores) * 50 (items) time series
train_data = pd.read_csv('train.csv', index_col='date', parse_dates=True)

# Select only one time series for store 1 and item 1 for the purpose of the example
train_data = train_data.query('store == 1 and item == 1').sales

# Prepare the fourier terms to add as exogenous features to auto_arima
# Annual seasonality covered by fourier terms
four_terms = FourierFeaturizer(365.25, 1)
y_prime, exog = four_terms.fit_transform(train_data)
exog['date'] = y_prime.index # attach the dates so the Fourier features can be indexed by date
exog = exog.set_index(exog['date'])
exog.index.freq = 'D'
exog = exog.drop(columns=['date'])


# Split the time series as well as exogenous features data into train and test splits 
y_to_train = y_prime.iloc[:(len(y_prime)-365)]
y_to_test =  y_prime.iloc[(len(y_prime)-365):] # last year for testing

exog_to_train = exog.iloc[:(len(exog)-365)]
exog_to_test = exog.iloc[(len(exog)-365):]


# Fit model
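# Weekly seasonality covered by SARIMAX; D=1 forces one seasonal difference at lag m=7
# Note: newer pmdarima releases name the exogenous argument X instead of exogenous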
arima_exog_model = auto_arima(y=y_to_train, D=1, exogenous=exog_to_train, seasonal=True, m=7)

# Forecast
y_arima_exog_forecast = arima_exog_model.predict(n_periods=365, exogenous=exog_to_test)
y_arima_exog_forecast = pd.DataFrame(y_arima_exog_forecast , index = pd.date_range(start='2017-01-01', end= '2017-12-31'))


# Plots
plt.plot(y_to_test, label='Actual data')
plt.plot(y_arima_exog_forecast, label='Forecast')
plt.legend()

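To illustrate what D=1 with m=7 means: one seasonal difference replaces the series y_t with y_t - y_(t-7), which cancels any weekly pattern that repeats exactly. The sketch below shows this on a synthetic series (not the Kaggle data); the helper name seasonal_difference is hypothetical, and this is only the differencing step, not what auto_arima does internally end to end.

```python
def seasonal_difference(values, lag):
    """Return values[t] - values[t - lag] for every t >= lag."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


# Synthetic series: a fixed weekly pattern plus a linear trend.
weekly_pattern = [10, 12, 13, 15, 18, 25, 22]
trend_per_day = 0.5
series = [weekly_pattern[t % 7] + trend_per_day * t for t in range(28)]

# After differencing at lag 7 the weekly pattern cancels out and only
# the trend accumulated over one week (7 * 0.5 = 3.5) remains.
diffed = seasonal_difference(series, lag=7)
print(diffed)
```

In the differenced series every entry equals 3.5, so the remaining ARMA terms no longer have to explain the weekly cycle; the forecast regains it when the differencing is undone.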

Source: the original question, "Forecasting a time series with multiple seasonalities using auto_arima (SARIMAX) and Fourier terms", is on Stack Overflow: https://stackoverflow.com/questions/68923679/
