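Python: simple OLS super dictionary

Tags: python arrays pandas for-loop statsmodels

I am trying to build a super dictionary that contains a number of lower-level dictionaries.

Concept

I have my retail bank's interest rates for the past 12 years, and I am trying to model those rates using a portfolio of different bonds.

Regression formula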

Y_i - Y_i-1 = A + B(X_i - X_i-1) + E

In other words, Y_Lag = alpha + beta(X_Lag) + error term
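The differenced terms in the formula can be illustrated on a tiny example. This is only a sketch: the `toy` frame and its values are made up for illustration and are not the bank's data. It shows that `Series.diff(1)` gives the same result as subtracting a shifted copy, and that the first row becomes NaN and has to be dropped before fitting.

```python
import pandas as pd

# Hypothetical toy series standing in for the historic rate (Y) and one bond yield (X)
toy = pd.DataFrame({
    "Historic Rate": [1.0, 1.5, 1.25, 2.0, 2.5],
    "Overnight": [0.5, 0.75, 0.5, 1.0, 1.5],
})

# Y_i - Y_{i-1}: Series.diff(1) is equivalent to s - s.shift(1)
y_lag = toy["Historic Rate"].diff(1)
x_lag = toy["Overnight"].diff(1)

# The first row has no predecessor, so it is NaN and must be dropped before fitting
y_lag = y_lag.dropna()
x_lag = x_lag.dropna()

print(y_lag.tolist())  # [0.5, -0.25, 0.75, 0.5]
print(x_lag.tolist())  # [0.25, -0.25, 0.5, 0.5]
```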

Data

Note: Y = Historic Rate

df = pd.DataFrame(np.random.randint(low=0, high=10, size=(100,17)), 
              columns=['Historic Rate', 'Overnight', '1M', '3M', '6M','1Y','2Y','3Y','4Y','5Y','6Y','7Y','8Y','9Y','10Y','12Y','15Y'])

Code so far

#Import packages required for the analysis

import pandas as pd
import numpy as np
import statsmodels.api as sm

def Simulation(TotalSim,j):
    #super dictionary to hold all iterations of the loop
    Super_fit_d = {}
    for i in range(1,TotalSim):
        #Create a introductory loop to run the first set of regressions
        #Each loop produces a univariate regression
        #Each loop has a fixed lag of i

        fit_d = {}  # This will hold all of the fit results and summaries
        for col in [x for x in df.columns if x != 'Historic Rate']:
            Y = df['Historic Rate'] - df['Historic Rate'].shift(1)
            # Need to remove the NaN for fit
            Y = Y[Y.notnull()]

            X = df[col] - df[col].shift(i)
            X = X[X.notnull()]
            #Y now has more observations than X due to lag, drop rows to match
            Y = Y.drop(Y.index[0:i-1])

            if j = 1:
                X = sm.add_constant(X)  # Add a constant to the fit

            fit_d[col] = sm.OLS(Y,X).fit()
        #append the dictionary for each lag onto the super dictionary
        Super_fit_d[lag_i] = fit_d

#Check the output for one column
fit_d['Overnight'].summary()

#Check the output for one column in one segment of the super dictionary
Super_fit_d['lag_5'].fit_d['Overnight'].summary()

Simulation(11,1)

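Problem

I seem to be overwriting my dictionary on every pass of the loop, and I am not evaluating i correctly to index the iterations as lag_1, lag_2, lag_3, and so on. How can I fix this?

Thanks in advance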

Best answer

There are a few issues here:

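  1. You use i in some places and lag_i in others, but only i is defined. For consistency, I changed them all to lag_i.
  2. if j = 1 is not valid syntax. You need if j == 1.
  3. You need to return the dictionary from the function so that it persists after the loop. The code below returns Super_fit_d, which holds the fit_d of every lag.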
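The nesting pattern behind these points can be shown without any regression at all. The following is a minimal sketch, not the asker's code: `build_super_dict` and its placeholder values are hypothetical names introduced for illustration, and the string keys (`lag_1`, `lag_2`, ...) are an assumption based on the naming the question asks for.

```python
def build_super_dict(total_sim, columns):
    """Return {'lag_1': {col: ...}, 'lag_2': {...}, ...} for lags 1..total_sim-1."""
    super_d = {}
    for lag in range(1, total_sim):
        inner = {}  # fresh dictionary per lag, so earlier lags are not overwritten
        for col in columns:
            # Placeholder value; the real code would store sm.OLS(Y, X).fit() here
            inner[col] = (lag, col)
        # Assigned once per lag, after the inner loop has finished
        super_d[f"lag_{lag}"] = inner
    return super_d  # returning the outer dictionary keeps the results after the call


result = build_super_dict(4, ["Overnight", "1M"])
print(list(result))           # ['lag_1', 'lag_2', 'lag_3']
print(result["lag_2"]["1M"])  # (2, '1M')
```

Note that a nested result is read with two subscripts, for example `result["lag_2"]["1M"]`, rather than with attribute access such as `.fit_d`.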

I got it working by applying these changes:

import pandas as pd
import numpy as np
import statsmodels.api as sm

df = pd.DataFrame(np.random.randint(low=0, high=10, size=(100,17)), 
              columns=['Historic Rate', 'Overnight', '1M', '3M', '6M','1Y','2Y','3Y','4Y','5Y','6Y','7Y','8Y','9Y','10Y','12Y','15Y'])

def Simulation(TotalSim,j):
    Super_fit_d = {}
    for lag_i in range(1,TotalSim):
        # Run one set of univariate regressions per lag
        # Each inner iteration fits one column against the target
        # Every fit in this pass uses the same fixed lag, lag_i

        fit_d = {}  # This will hold all of the fit results and summaries
        for col in [x for x in df.columns if x != 'Historic Rate']:
            Y = df['Historic Rate'] - df['Historic Rate'].shift(1)
            # Need to remove the NaN for fit
            Y = Y[Y.notnull()]

            X = df[col] - df[col].shift(lag_i)
            X = X[X.notnull()]
            #Y now has more observations than X due to lag, drop rows to match
            Y = Y.drop(Y.index[0:lag_i-1])

            if j == 1:
                X = sm.add_constant(X)  # Add a constant to the fit

            fit_d[col] = sm.OLS(Y,X).fit()
        # Store this lag's dictionary of fits in the super dictionary
        Super_fit_d[lag_i] = fit_d
    return Super_fit_d



test_dict = Simulation(11,1)

First lag

test_dict[1]['Overnight'].summary()

Out[76]: 
<class 'statsmodels.iolib.summary.Summary'>
"""
                            OLS Regression Results                            
==============================================================================
Dep. Variable:          Historic Rate   R-squared:                       0.042
Model:                            OLS   Adj. R-squared:                  0.033
Method:                 Least Squares   F-statistic:                     4.303
Date:                Fri, 28 Sep 2018   Prob (F-statistic):             0.0407
Time:                        11:15:13   Log-Likelihood:                -280.39
No. Observations:                  99   AIC:                             564.8
Df Residuals:                      97   BIC:                             570.0
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0048      0.417     -0.012      0.991      -0.833       0.823
Overnight      0.2176      0.105      2.074      0.041       0.009       0.426
==============================================================================
Omnibus:                        1.449   Durbin-Watson:                   2.756
Prob(Omnibus):                  0.485   Jarque-Bera (JB):                1.180
Skew:                           0.005   Prob(JB):                        0.554
Kurtosis:                       2.465   Cond. No.                         3.98
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
"""

Second lag

test_dict[2]['Overnight'].summary()

Out[77]: 
<class 'statsmodels.iolib.summary.Summary'>
"""
                            OLS Regression Results                            
==============================================================================
Dep. Variable:          Historic Rate   R-squared:                       0.001
Model:                            OLS   Adj. R-squared:                 -0.010
Method:                 Least Squares   F-statistic:                   0.06845
Date:                Fri, 28 Sep 2018   Prob (F-statistic):              0.794
Time:                        11:15:15   Log-Likelihood:                -279.44
No. Observations:                  98   AIC:                             562.9
Df Residuals:                      96   BIC:                             568.0
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0315      0.428      0.074      0.941      -0.817       0.880
Overnight      0.0291      0.111      0.262      0.794      -0.192       0.250
==============================================================================
Omnibus:                        2.457   Durbin-Watson:                   2.798
Prob(Omnibus):                  0.293   Jarque-Bera (JB):                1.735
Skew:                           0.115   Prob(JB):                        0.420
Kurtosis:                       2.391   Cond. No.                         3.84
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
"""

Source: a similar question about "Python: simple OLS super dictionary" on Stack Overflow: https://stackoverflow.com/questions/52551247/
