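I want to loop over several specifications of a linear regression and save the results of each model in a Python dictionary. The code below partly works, but the dictionary ends up containing extra text (such as dtype information) that makes it unreadable. Also, for the confidence interval I would like two separate columns, one for the upper bound and one for the lower bound, but I haven't managed to do that.
Code: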
import pandas as pd
import patsy
import statsmodels.api as sm
from collections import defaultdict

# `data` is my DataFrame holding GRADE_PT_103 and the college columns
colleges = ['ARC_g',u'CCSF_g',u'DAC_g',u'DVC_g',u'LC_g',u'NVC_g',u'SAC_g', u'SRJC_g',u'SC_g',u'SCC_g']
results = defaultdict(lambda: defaultdict(int))
for exog in colleges:
    exog = exog.encode('ascii')  # Python 2 only; on Python 3 this yields bytes, so drop this line
    f1 = 'GRADE_PT_103 ~ %s -1' % exog
    y, X = patsy.dmatrices(f1, data, return_type='dataframe')
    mod = sm.OLS(y, X)  # Describe model
    res = mod.fit()     # Fit model
    results[exog]['beta'] = res.params
    # I'd like the confidence interval to be separated into two columns ('upper' and 'lower')
    results[exog]['CI'] = res.conf_int()
    results[exog]['rsq'] = res.rsquared
pd.DataFrame(results)
Current output:
ARC_g | CCSF_g | ...
beta | ARC_g 0.79304 dtype: float64 | CCSF_g 0.833644 dtype: float64
CI | 0 1 ARC_g 0.557422 1.0... 0 1| CCSF_g 0.655746 1...
rsq | 0.122551 | 0.213053
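The extra text in each cell appears to come from storing whole pandas objects (`res.params` is a Series, `res.conf_int()` is a DataFrame) rather than single numbers. A minimal sketch of the difference, using hand-built stand-ins for those two objects; the numbers are made up for illustration and are not real regression output:

```python
import pandas as pd

# Stand-ins for what a one-regressor fit returns (illustrative values only).
params = pd.Series({'ARC_g': 0.8})                               # like res.params
conf_int = pd.DataFrame({0: [0.5], 1: [1.1]}, index=['ARC_g'])   # like res.conf_int()

# The repr of the Series carries the label and dtype text seen in the table.
print(repr(params))

# Pulling out scalars gives one plain number per cell, and the interval
# splits naturally into 'lower' (column 0) and 'upper' (column 1).
flat = pd.DataFrame({'ARC_g': {
    'beta': params['ARC_g'],
    'lower': conf_int.loc['ARC_g', 0],
    'upper': conf_int.loc['ARC_g', 1],
    'rsq': 0.12,  # placeholder for res.rsquared
}})
print(flat)
```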
Best answer
Here is how I would summarize what you've shown. Hopefully it gives you some ideas.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Random example data: Y is the response, A-D are the regressors
data = pd.DataFrame(np.random.randn(30, 5), columns=list('YABCD'))

results = {}
for c in data.columns[1:]:
    f = 'Y ~ {}'.format(c)
    r = smf.ols(formula=f, data=data).fit()
    ci = r.conf_int()
    # One row per parameter, with the interval split into lower/upper columns
    coef = pd.concat([r.params, ci.iloc[:, 0], ci.iloc[:, 1]],
                     axis=1, keys=['coef', 'lower', 'upper'])
    coef.index = ['Intercept', 'Beta']
    results[c] = dict(coef=coef, rsq=r.rsquared)

keys = data.columns[1:]
summary = pd.concat([results[k]['coef'].stack() for k in keys], axis=1, keys=keys)
summary.index = summary.index.to_series().str.join(' - ')
# DataFrame.append was removed in pandas 2.0; concatenate the R-squared row instead
rsq = pd.Series([results[k]['rsq'] for k in keys], index=keys, name='R Squared')
summary = pd.concat([summary, rsq.to_frame().T])
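If one row per regressor is preferred, with the slope, both interval bounds, and R-squared as plain columns, a variation along these lines may work. It is only a sketch: the random data, the seed, and the names `rows` and `table` are assumptions for illustration, and the `- 1` in the formula drops the intercept as in the question.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative random data (assumption): Y is the response, A-D the regressors.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.standard_normal((30, 5)), columns=list('YABCD'))

rows = {}
for c in data.columns[1:]:
    # No-intercept model, so each fit has a single parameter named after c
    r = smf.ols('Y ~ {} - 1'.format(c), data=data).fit()
    lower, upper = r.conf_int().loc[c]  # column 0 is the lower bound, 1 the upper
    rows[c] = {'beta': r.params[c], 'lower': lower, 'upper': upper,
               'rsq': r.rsquared}

# Store scalars only, so every cell is a single number
table = pd.DataFrame.from_dict(rows, orient='index')
print(table.round(3))
```

Transposing with `table.T` gives the original orientation, with one column per regressor.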
Regarding "python - How can I better format the output I'm trying to save from several regressions?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38316727/