Python Pandas - Rolling regression for multiple columns in a dataframe

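I have a large dataframe containing daily price time series for 10,000 columns (stocks) over a 20-year period (5000 rows x 10000 columns). Missing observations are represented by NaN.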

            0      1      2      3      4      5       6      7      8      \
31.12.2009  30.75  66.99    NaN    NaN    NaN    NaN  393.87  57.04    NaN   
01.01.2010  30.75  66.99    NaN    NaN    NaN    NaN  393.87  57.04    NaN   
04.01.2010  31.85  66.99    NaN    NaN    NaN    NaN  404.93  57.04    NaN   
05.01.2010  33.26  66.99    NaN    NaN    NaN    NaN  400.00  58.75    NaN   
06.01.2010  33.26  66.99    NaN    NaN    NaN    NaN  400.00  58.75    NaN

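Now I want to run a rolling regression with a 250-day window on each column over the entire sample period and save the coefficients in another dataframe.

Iterating over columns and rows with two for loops is not very efficient, so I tried the following, but I get the error message shown below the code: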

def regress(start, end):
    y = df_returns.iloc[start:end].values

    if np.isnan(y).any() == False:
        X = np.arange(len(y))
        X = sm.add_constant(X, has_constant="add")
        model = sm.OLS(y,X).fit()

        return model.params[1]

    else:
        return np.nan


regression_window = 250

for t in (regression_window, len(df_returns.index)):

    df_coef[t] = df_returns.apply(regress(t-regression_window, t), axis=1)

TypeError: ("'float' object is not callable", 'occurred at index 31.12.2009')

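Best answer

Here is my version, which uses df.rolling() instead and iterates over the columns. I'm not entirely sure this is what you are looking for, so feel free to comment.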

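import numpy as np
import pandas as pd
import statsmodels.regression.linear_model as sm
import statsmodels.tools.tools as sm2

# Small sample frame: a trending column, a constant column and an all-NaN column
df_returns = pd.DataFrame({'0': [30, 30, 31, 32, 32], '1': [60, 60, 60, 60, 60], '2': [np.nan] * 5})


def regress(y, Z):
    # Slope of an OLS fit of the window y on Z; NaN if the window has missing values
    if not np.isnan(y).any():
        model = sm.OLS(y, Z).fit()
        return model.params[1]  # params[0] is the intercept, params[1] the slope
    else:
        return np.nan


regression_window = 3
# Regressor shared by every window: time index 0..window-1 plus a constant column
Z = np.arange(regression_window)
Z = sm2.add_constant(Z, has_constant="add")
df_coef = pd.DataFrame()
for col in df_returns.columns:
    # raw=True passes each window as a NumPy array, so params is a plain array
    df_coef[col] = df_returns[col].rolling(window=regression_window).apply(lambda w: regress(w, Z), raw=True)
df_coef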
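
With 10,000 columns, fitting one OLS model per window and per column may be slow. Since the regressor is the same time index 0..window-1 in every window, the slope has the closed form sum((x - mean(x)) * y) / sum((x - mean(x))^2), which can be evaluated for all columns at once. The following is only a sketch of that idea, not part of the original answer: `rolling_slope` is a hypothetical helper name, it assumes the frame has at least `window` rows, and it returns only the slope (no intercept or other statistics).

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_slope(df, window):
    """Slope of y regressed on 0..window-1 for every column and rolling window."""
    x = np.arange(window, dtype=float)
    xc = x - x.mean()  # centered regressor, identical for every window
    values = df.to_numpy(dtype=float)
    # View of shape (n_rows - window + 1, n_cols, window); no data is copied here
    windows = sliding_window_view(values, window, axis=0)
    # Closed-form OLS slope; a NaN anywhere in a window makes its slope NaN
    slopes = windows @ xc / (xc @ xc)
    out = np.full(values.shape, np.nan)
    out[window - 1:] = slopes  # the first window - 1 rows have no full window
    return pd.DataFrame(out, index=df.index, columns=df.columns)


df_returns = pd.DataFrame({'0': [30, 30, 31, 32, 32],
                           '1': [60, 60, 60, 60, 60],
                           '2': [np.nan] * 5})
df_coef = rolling_slope(df_returns, 3)
```

For the full 5000 x 10000 frame with a 250-day window, the matrix product may need a lot of memory, so processing the columns in chunks could be necessary.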

For more on Python Pandas - rolling regression for multiple columns in a dataframe, see the similar question on Stack Overflow: https://stackoverflow.com/questions/58435374/
