python - How to get standardized (beta) coefficients for multiple linear regression using statsmodels

Tags: python pandas regression statsmodels coefficients

When fitting an OLS regression with statsmodels on a pandas DataFrame, the output of .summary() includes the following fields:

coef    std err          t      P>|t|      [0.025      0.975]

How can I get the standardized coefficients (excluding the intercept), similar to what SPSS reports?

Best answer

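You just need to standardize the original DataFrame first by converting every column to z-scores (subtract the column mean, divide by the column standard deviation), and then run the linear regression on the standardized data.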

Suppose your DataFrame is named df and has independent variables x1, x2 and x3 and a dependent variable y. Consider the following code:

import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.formula.api as smf

# standardizing dataframe
df_z = df.select_dtypes(include=[np.number]).dropna().apply(stats.zscore)

# fitting regression
formula = 'y ~ x1 + x2 + x3'
result = smf.ols(formula, data=df_z).fit()

# checking results (print is needed outside an interactive session)
print(result.summary())

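The coef column now shows the standardized (beta) coefficients, so you can compare the relative influence of each predictor on the dependent variable. Because y is standardized as well, the intercept is essentially zero.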
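As a sanity check on the approach above, the following minimal sketch compares the coefficients from a fit on z-scored data with the raw coefficients rescaled as b_j * sd(x_j) / sd(y), which is the usual definition of a standardized coefficient. The data are synthetic and purely illustrative (the column names match the answer, but the values and the seed are assumptions, not part of the original question).

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

# Synthetic data, for illustration only (not from the original question)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "x1": rng.normal(0, 1, n),
    "x2": rng.normal(5, 10, n),
    "x3": rng.normal(-2, 0.1, n),
})
df["y"] = (1.5 + 2.0 * df["x1"] + 0.3 * df["x2"]
           - 4.0 * df["x3"] + rng.normal(0, 1, n))

formula = "y ~ x1 + x2 + x3"

# Route 1: fit on z-scored data, as in the answer
df_z = df.apply(stats.zscore)
beta_z = smf.ols(formula, data=df_z).fit().params.drop("Intercept")

# Route 2: fit on raw data, then rescale each slope by sd(x_j) / sd(y)
raw = smf.ols(formula, data=df).fit().params.drop("Intercept")
sd = df.std(ddof=0)
beta_rescaled = raw * sd[raw.index] / sd["y"]

# The two columns should agree up to floating-point error
print(pd.DataFrame({"z_scored": beta_z, "rescaled": beta_rescaled}))
```

The choice of ddof does not matter for the rescaling route as long as it is the same for x and y, since only the ratio of standard deviations enters.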

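Notes:

Remember that you need .dropna(). Otherwise stats.zscore returns all NaN for any column that contains a missing value. Note that .dropna() removes every row that has a missing value in any of the selected columns.
Instead of using .select_dtypes(), you can select the columns manually, but make sure every selected column is numeric.
If you only care about the standardized (beta) coefficients, you can use result.params to return just those values. They are often displayed in scientific notation; you can round them with something like round(result.params, 5).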
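The first and last notes can be illustrated with a small sketch. The toy Series and the params-like Series below are hypothetical values chosen for the example; they stand in for a real column and a real result.params.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy column with one missing value
s = pd.Series([1.0, 2.0, np.nan, 4.0])

# With the default nan_policy="propagate", one NaN makes every z-score NaN
z_with_nan = stats.zscore(s)
print(np.isnan(z_with_nan).all())

# Dropping the missing value first yields finite z-scores
z_clean = stats.zscore(s.dropna())
print(np.isfinite(z_clean).all())

# Hypothetical stand-in for result.params; rounding makes it easier to read
params = pd.Series({"x1": 1.234567e-01, "x2": -4.5e-07})
print(params.round(5))
```

With a fitted model, result.params.drop("Intercept") would return the standardized coefficients without the intercept, assuming the formula interface is used so that the intercept is labeled "Intercept".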

Source: the corresponding question on Stack Overflow: https://stackoverflow.com/questions/50842397/
