python - Forecasting sales with a 6-year dataset

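I am trying to forecast demand from a 6-year dataset, 1/1/2014 ==> 1/1/2020. First, I regrouped the demand by month, so I ended up with a dataset of 2 columns (month and sales) and 72 rows (12 months * 6 years). P.S.: I am using Python.

My first question: given that I only have 72 rows, is that enough to get a forecast for the next year (2020)?

My second question: can you suggest a model that would give me good accuracy?

I tried an ARIMA model with seasonality (SARIMAX) and an LSTM, but neither worked, and I am not sure I did it right.

My third question: is there a test in Python that tells you whether seasonality is present?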

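    import pandas as pd

    # Shrink the dataset to one product and one region
    # (data is the original DataFrame, not shown here)
    dataa = data[(data['Produit'] == 'ACP NOR/STD') & (data['Région'] == 'Europe')]

    # Sum 'Chargé (T)' per month
    gb2 = dataa.groupby(by=[dataa['Mois'].dt.strftime('%Y, %m')])['Chargé (T)'].sum().reset_index()
    gb2.Mois = pd.to_datetime(gb2.Mois, format='%Y, %m')

    # Create a time series indexed by month
    series = pd.Series(gb2['Chargé (T)'].values, index=gb2.Mois)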


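    # Decompose the series into three parts: trend, seasonality and noise
    from pylab import rcParams
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    rcParams['figure.figsize'] = 18, 8
    # period=12 states the yearly cycle explicitly, since the index carries no frequency
    decomposition = sm.tsa.seasonal_decompose(series, model='additive', period=12)
    fig = decomposition.plot()
    plt.show()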


    # Plot the ACF and PACF to decide which AR and MA orders to use
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
    from matplotlib import pyplot

    pyplot.figure()
    pyplot.subplot(211)
    plot_acf(series, ax=pyplot.gca())
    pyplot.subplot(212)
    plot_pacf(series, ax=pyplot.gca())
    pyplot.show()

    import itertools

    # Candidate (p, d, q) orders, each term ranging over 0..4
    p = d = q = range(0, 5)
    pdq = list(itertools.product(p, d, q))
    # Candidate seasonal (P, D, Q, s) orders with a 12-month period
    seasonal_pdq = [(x[0], x[1], x[2], 12) for x in pdq]
    print('Examples of parameter combinations for Seasonal ARIMA...')
    print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[1]))
    print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[2]))
    print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[3]))
    print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[4]))


    import warnings
    warnings.filterwarnings("ignore")

    # Fit every combination of orders and print its AIC
    for param in pdq:
        for param_seasonal in seasonal_pdq:
            try:
                mod = sm.tsa.statespace.SARIMAX(series,
                                                order=param,
                                                seasonal_order=param_seasonal,
                                                enforce_stationarity=False,
                                                enforce_invertibility=False)

                results = mod.fit()

                # param_seasonal already contains the period (12)
                print('ARIMA{}x{} - AIC:{}'.format(param, param_seasonal, results.aic))
            except Exception:
                # Skip combinations that fail to fit
                continue

    # Fit the chosen model
    mod = sm.tsa.statespace.SARIMAX(series,
                                    order=(0, 1, 2),
                                    seasonal_order=(0, 4, 0, 12),
                                    enforce_stationarity=False,
                                    enforce_invertibility=False)

    results = mod.fit()

    print(results.summary().tables[1])
    results.plot_diagnostics(figsize=(16, 8))
    plt.show()

    # Get one-step-ahead predictions from January 2019 onward
    pred = results.get_prediction(start=pd.to_datetime('2019-01-01'), dynamic=False)
    pred_ci = pred.conf_int()

    # Plot the observed series, the predictions and their confidence interval
    ax = series['2014':].plot(label='observed')
    pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.8, figsize=(14, 7))

    ax.fill_between(pred_ci.index,
                    pred_ci.iloc[:, 0],
                    pred_ci.iloc[:, 1], color='k', alpha=.2)

    ax.set_xlabel('Date')
    ax.set_ylabel('Chargé (T)')
    plt.legend()

    plt.show()

The predictions have nothing to do with reality... I would really appreciate any help.

Best answer

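As far as I know, this much data is enough to produce a meaningful forecast (it means you have 6 data points for each calendar month to fit the model), but use as much data as you can get, since more data should generally improve accuracy.
A time series almost always has some seasonality and, on top of that, a trend. So you need to decompose your original series into trend, seasonal and residual components; the model is fitted to the residuals, and the trend and seasonal components are then added back to obtain the final forecast. As for the model, ARIMA is sufficient for forecasting a time series; to make it more accurate, tune its parameters (p and q) using the PACF and ACF plots.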
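In other words, we decompose the series to make it stationary by extracting its residuals (the model should be trained only on stationary data). You should check for stationarity rather than for seasonality; the ADF (Augmented Dickey-Fuller) test does that.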

I have done a lot of research on this and have a project on time series forecasting; here is an example in which all the steps are described:

Regarding "python - Forecasting sales with a 6-year dataset", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60185181/
