python - How to cluster by entity and year in IV2SLS using linearmodels?

Tags: python pandas dataframe panel-data linearmodels

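I am working on a panel model of African countries with a democracy score, log(GDP per capita) with 3 lags, and log(rainfall), also with 3 lags. I am trying to use IV2SLS to identify economic shocks in log(GDP per capita) (and its lags) that are caused by log(rain) (and its lags). I also want to cluster by country and year.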

My index is "country", then "year". My column names are as follows:

['democracy', 'log_gdp_per_cap', 'log_gdp_per_cap_lag1', 'log_gdp_per_cap_lag2', 'log_gdp_per_cap_lag3', 'lrain_mw_l', 'lrain_mw_l2', 'lrain_mw_l3', 'lrain_mw_l4']


To be clear, the "lrain_mw_l_" columns are log(rain) and its lags.

[Image: a sample of the dataset; the last rain lags extend off the right]

The model I have written so far is as follows.

The DataFrame:

iv_log_gdp_per_cap = pd.DataFrame({"democracy": panel_df["polity2"],
                                "log_gdp_per_cap": panel_df["lgdpc"],
                                "log_gdp_per_cap_lag1": panel_df["lgpcp_l"],
                                "log_gdp_per_cap_lag2": panel_df["lgpcp_l2"],
                                "log_gdp_per_cap_lag3": panel_df["lgpcp_l3"],
                                "lrain_mw_l": panel_df["lrain_mw_l"],
                                "lrain_mw_l2": panel_df["lrain_mw_l2"],
                                "lrain_mw_l3": panel_df["lrain_mw_l3"],
                                "lrain_mw_l4": panel_df["lrain_mw_l4"]})

 
endog_vars = pd.DataFrame({"log_gdp_per_cap": panel_df["lgdpc"],
                                "log_gdp_per_cap_lag1": panel_df["lgpcp_l"],
                                "log_gdp_per_cap_lag2": panel_df["lgpcp_l2"],
                                "log_gdp_per_cap_lag3": panel_df["lgpcp_l3"]})


instrument_vars = pd.DataFrame({"lrain_mw_l": panel_df["lrain_mw_l"],
                                "lrain_mw_l2": panel_df["lrain_mw_l2"],
                                "lrain_mw_l3": panel_df["lrain_mw_l3"],
                                "lrain_mw_l4": panel_df["lrain_mw_l4"] })

The model itself:

iv_model = IV2SLS(dependent = iv_log_gdp_per_cap["democracy"],
                 exog = [iv_log_gdp_per_cap["country"], iv_log_gdp_per_cap["year"]],
                 endog = endog_vars,
                 instruments = instrument_vars)


iv_result = iv_model.fit(cov_type = "clustered",
                         clusters = [iv_log_gdp_per_cap["country"],
                                     iv_log_gdp_per_cap["year"]])

It returns the following error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2894             try:
-> 2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'country'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-64-af8102ff7296> in <module>
      1 iv_model = IV2SLS(dependent = iv_log_gdp_per_cap["democracy"],
----> 2                  exog = [iv_log_gdp_per_cap["country"], iv_log_gdp_per_cap["year"]],
      3                  endog = endog_vars,
      4                  instruments = instrument_vars)
      5 # model_3_result = mod3.fit(cov_type = "clustered", cluster_entity = True, cluster_time = True)

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2900             if self.columns.nlevels > 1:
   2901                 return self._getitem_multilevel(key)
-> 2902             indexer = self.columns.get_loc(key)
   2903             if is_integer(indexer):
   2904                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
-> 2897                 raise KeyError(key) from err
   2898 
   2899         if tolerance is not None:

KeyError: 'country'

The problem seems to be with the index, but I am not sure what it is. Can anyone advise on how to get standard errors clustered by entity and year with IV2SLS?
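A minimal sketch of why that lookup fails, using a made-up two-country panel (the country names and values are invented for illustration). When "country" and "year" form the MultiIndex, they are index levels rather than columns, so `df["country"]` raises a KeyError. The levels can be read with `index.get_level_values`, or moved into ordinary columns with `reset_index()`.

```python
import pandas as pd

# Hypothetical toy panel indexed by (country, year), mirroring the question's layout.
index = pd.MultiIndex.from_product(
    [["Kenya", "Ghana"], [2000, 2001]], names=["country", "year"]
)
panel = pd.DataFrame({"democracy": [1.0, 2.0, 3.0, 4.0]}, index=index)

# Index levels are not columns, so a column lookup fails.
try:
    panel["country"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Option 1: read the level straight from the index.
countries = panel.index.get_level_values("country")

# Option 2: move both levels into ordinary columns.
flat = panel.reset_index()

print(lookup_failed, list(flat.columns))
```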

Edit:

After resetting the index I tried the following:

year_country = pd.DataFrame({"country": iv_log_gdp_per_cap["country"],
                                "year": iv_log_gdp_per_cap["year"]})

endog_vars = pd.DataFrame({"log_gdp_per_cap": iv_log_gdp_per_cap["log_gdp_per_cap"],
                                "log_gdp_per_cap_lag1": iv_log_gdp_per_cap["log_gdp_per_cap_lag1"],
                                "log_gdp_per_cap_lag2": iv_log_gdp_per_cap["log_gdp_per_cap_lag2"],
                                "log_gdp_per_cap_lag3": iv_log_gdp_per_cap["log_gdp_per_cap_lag3"]})

instrument_vars = pd.DataFrame({"lrain_mw_l": iv_log_gdp_per_cap["lrain_mw_l"],
                                "lrain_mw_l2": iv_log_gdp_per_cap["lrain_mw_l2"],
                                "lrain_mw_l3": iv_log_gdp_per_cap["lrain_mw_l3"],
                                "lrain_mw_l4": iv_log_gdp_per_cap["lrain_mw_l4"] })


iv_model = IV2SLS(dependent = iv_log_gdp_per_cap["democracy"],
                exog = None,
                endog = endog_vars,
                instruments = instrument_vars)

iv_result = iv_model.fit(cov_type = "clustered")


iv_result

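It returns a result, although I am not very confident in it. Is it clustering by "year" and "country"? It says it is only clustering one way, and I am not sure what it is clustering by. If I am trying to use the rainfall variables as instruments for the GDP variables, to learn the effect of changes in log(GDP per capita) on democracy, am I doing this right? I am also not sure whether fixed effects are being used.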
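Some hedged context on those questions. As I understand the linearmodels documentation, when `cov_type="clustered"` is given without a `clusters` argument, each observation is treated as its own cluster, so nothing is grouped by country or year. Two-way clustering expects an array with one column per dimension (nobs x 2). IV2SLS also does not add fixed effects on its own; they only enter if dummy columns are included among the exogenous regressors. The sketch below uses pandas only and a made-up toy panel to build both pieces; integer codes are used for the cluster ids as a conservative choice.

```python
import pandas as pd

# Hypothetical toy panel with country and year as ordinary columns.
flat = pd.DataFrame({
    "country": ["Kenya", "Kenya", "Ghana", "Ghana"],
    "year": [2000, 2001, 2000, 2001],
    "democracy": [1.0, 2.0, 3.0, 4.0],
})

# One integer id per country and per year: an (nobs x 2) frame for two-way clustering.
clusters = pd.DataFrame({
    "country": pd.factorize(flat["country"])[0],
    "year": pd.factorize(flat["year"])[0],
})

# Country fixed effects as explicit dummy columns.
# drop_first avoids perfect collinearity with a constant term.
country_fe = pd.get_dummies(flat["country"], prefix="fe", drop_first=True, dtype=float)

print(clusters.to_numpy().tolist())
print(list(country_fe.columns))
```

The `clusters` frame would then be passed as `clusters=clusters` to `fit`, and the dummy columns would be joined onto the exogenous regressors.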

Best answer

OK, I found a solution.

For reference, see https://pypi.org/project/linearmodels/

In your case it would look like this:

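import pandas as pd
from linearmodels.iv import IV2SLS

# Assumes the index has been reset so "country" and "year" are ordinary columns.
data = iv_log_gdp_per_cap
# control1, control2, endog, instrument1 and instrument2 are placeholders for your own columns.
mod = IV2SLS.from_formula(
    'democracy ~ 1 + control1 + control2 + [endog ~ instrument1 + instrument2]',
    data)
# Two-way clustering: one integer-coded column per dimension (nobs x 2).
clusters = pd.DataFrame({"country": pd.factorize(data["country"])[0],
                         "year": pd.factorize(data["year"])[0]})
res = mod.fit(cov_type = "clustered", clusters = clusters)
res.summary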

Regarding python - How to cluster by entity and year in IV2SLS using linearmodels?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66572544/
