python - Clustered standard errors in statsmodels with categorical variables (Python)

I want to run a regression in statsmodels that uses categorical variables and clustered standard errors.

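I have a dataset with the columns institution, treatment, year, and enroll. Treatment is a dummy, institution is a string, and the others are numeric. I have made sure to drop all null values.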

import statsmodels.formula.api as smf

df.dropna()
reg_model = smf.ols("enroll ~ treatment + C(year) + C(institution)", df).fit(
    cov_type='cluster', cov_kwds={'groups': df['institution']})

I get the following error:

ValueError: The weights and list don't have the same length.

Is there a way to fix this so that my standard errors are clustered?

Best answer

You need to specify cov_type='cluster' explicitly.

cov_type is a keyword argument; if you pass it as a positional argument, it does not end up in the correct position. See http://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLS.fit.html

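In general, statsmodels does not guarantee backward compatibility when keyword arguments are passed positionally; the position of a keyword argument may change in a future version.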
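A minimal sketch of the call with every fit option passed by keyword, run on a small made-up panel (the institution names, years, and effect size are invented for illustration and are not from the question). The cluster groups are taken from the same frame the model is built on, so their length matches the rows used in estimation:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: 6 institutions observed over 4 years (made-up data)
rng = np.random.default_rng(0)
institutions = [f"inst{i}" for i in range(6)]
years = [2015, 2016, 2017, 2018]
df = pd.DataFrame(
    [(inst, yr) for inst in institutions for yr in years],
    columns=["institution", "year"],
)
# Treatment switches on from 2017 for the first three institutions
df["treatment"] = (
    df["institution"].isin(["inst0", "inst1", "inst2"]) & (df["year"] >= 2017)
).astype(int)
df["enroll"] = 100 + 5 * df["treatment"] + rng.normal(0, 1, len(df))

# dropna() returns a new DataFrame, so assign the result back
df = df.dropna()

model = smf.ols("enroll ~ treatment + C(year) + C(institution)", data=df)
# cov_type and cov_kwds are passed by keyword, not by position;
# the groups Series comes from the same df the model was built on
res = model.fit(cov_type="cluster", cov_kwds={"groups": df["institution"]})
print(res.bse["treatment"])
```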

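However, I don't see where the ValueError comes from. Python's tracebacks are very informative; when asking a question, it helps to include the full traceback, or at least the last few lines showing where the exception was raised.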
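One cause worth ruling out (a guess, not confirmed by a traceback): df.dropna() returns a new DataFrame and leaves df unchanged unless the result is assigned back. If df still contains NaNs, the formula interface drops those rows when building the model, while df['institution'] keeps its full length. The message in the question is the one NumPy's np.bincount raises when its weights argument and its input differ in length, which is the kind of mismatch this would produce. The tiny frame below is made up to show both effects in isolation, without statsmodels:

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value in "enroll"
df = pd.DataFrame({
    "enroll": [10.0, np.nan, 12.0, 15.0],
    "institution": ["a", "a", "b", "b"],
})

df.dropna()            # returns a new DataFrame; df itself is unchanged
print(len(df))         # 4: the NaN row is still in df

clean = df.dropna()    # reassign to actually drop the row
print(len(clean))      # 3

# Mimic the mismatch: group codes from the full frame (4 rows)
# versus per-row values from the rows a model would keep (3 rows)
codes = pd.factorize(df["institution"])[0]
values = clean["enroll"].to_numpy()
msg = None
try:
    np.bincount(codes, weights=values)
except ValueError as exc:
    msg = str(exc)
print(msg)             # The weights and list don't have the same length.
```

If this is the cause, writing df = df.dropna() before building the model should keep the groups aligned with the estimation sample.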

The original question, "python - Clustered standard errors in statsmodels with categorical variables (Python)", is on Stack Overflow: https://stackoverflow.com/questions/54349525/

