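I am working on a panel model of African countries with a democracy score, log(GDP per capita) with 3 lags, and log(rainfall), also with 3 lags. I am trying to use IV2SLS to isolate the economic shocks in log(GDP per capita) (and its lags) that are driven by log(rain) (and its lags). I would also like to cluster by country and year.
My index is "country", then "year". My column names are as follows:
['democracy', 'log_gdp_per_cap', 'log_gdp_per_cap_lag1', 'log_gdp_per_cap_lag2', 'log_gdp_per_cap_lag3', 'lrain_mw_l', 'lrain_mw_l2', 'lrain_mw_l3', 'lrain_mw_l4']
To be clear, the "lrain_mw_l_" columns are log(rain) and its lags.
The model I have written so far is as follows.
The data frames: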
import pandas as pd

# panel_df is the source panel, indexed by (country, year)
iv_log_gdp_per_cap = pd.DataFrame({"democracy": panel_df["polity2"],
                                   "log_gdp_per_cap": panel_df["lgdpc"],
                                   "log_gdp_per_cap_lag1": panel_df["lgpcp_l"],
                                   "log_gdp_per_cap_lag2": panel_df["lgpcp_l2"],
                                   "log_gdp_per_cap_lag3": panel_df["lgpcp_l3"],
                                   "lrain_mw_l": panel_df["lrain_mw_l"],
                                   "lrain_mw_l2": panel_df["lrain_mw_l2"],
                                   "lrain_mw_l3": panel_df["lrain_mw_l3"],
                                   "lrain_mw_l4": panel_df["lrain_mw_l4"]})
# Endogenous regressors: log GDP per capita and its lags
endog_vars = pd.DataFrame({"log_gdp_per_cap": panel_df["lgdpc"],
                           "log_gdp_per_cap_lag1": panel_df["lgpcp_l"],
                           "log_gdp_per_cap_lag2": panel_df["lgpcp_l2"],
                           "log_gdp_per_cap_lag3": panel_df["lgpcp_l3"]})
# Instruments: log rainfall and its lags
instrument_vars = pd.DataFrame({"lrain_mw_l": panel_df["lrain_mw_l"],
                                "lrain_mw_l2": panel_df["lrain_mw_l2"],
                                "lrain_mw_l3": panel_df["lrain_mw_l3"],
                                "lrain_mw_l4": panel_df["lrain_mw_l4"]})
The model itself:
iv_model = IV2SLS(dependent = iv_log_gdp_per_cap["democracy"],
exog = [iv_log_gdp_per_cap["country"], iv_log_gdp_per_cap["year"]],
endog = endog_vars,
instruments = instrument_vars)
iv_result = iv_model.fit(cov_type = "clustered",
clusters = [iv_log_gdp_per_cap["country"],
iv_log_gdp_per_cap["year"]])
It returns the following error message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2894 try:
-> 2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'country'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-64-af8102ff7296> in <module>
1 iv_model = IV2SLS(dependent = iv_log_gdp_per_cap["democracy"],
----> 2 exog = [iv_log_gdp_per_cap["country"], iv_log_gdp_per_cap["year"]],
3 endog = endog_vars,
4 instruments = instrument_vars)
5 # model_3_result = mod3.fit(cov_type = "clustered", cluster_entity = True, cluster_time = True)
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
-> 2897 raise KeyError(key) from err
2898
2899 if tolerance is not None:
KeyError: 'country'
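The problem seems to lie in the index, but I am not sure what it is. So, can anyone advise on how to use IV2SLS with standard errors clustered by entity and year?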
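The `KeyError: 'country'` can be reproduced on a small frame. When `country` and `year` form the index, they are index levels rather than columns, so `df["country"]` fails. A minimal sketch, using a made-up `toy` frame whose values are placeholders and not the real data, of two ways to reach them:

```python
import pandas as pd

# Hypothetical toy panel indexed by (country, year); values are placeholders.
toy = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B"],
        "year": [2000, 2001, 2000, 2001],
        "democracy": [1.0, 2.0, 3.0, 4.0],
    }
).set_index(["country", "year"])

# Index levels are not columns, so column-style lookup raises KeyError.
try:
    toy["country"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Option 1: read the level straight from the MultiIndex.
country_level = toy.index.get_level_values("country")

# Option 2: move the index levels back into ordinary columns.
flat = toy.reset_index()
```

Either option gives something that can be passed on as a cluster variable; `reset_index()` is the simpler choice when the index is not needed afterwards.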
Edit:
After resetting the index, I tried the following:
year_country = pd.DataFrame({"country": iv_log_gdp_per_cap["country"],
"year": iv_log_gdp_per_cap["year"]})
endog_vars = pd.DataFrame({"log_gdp_per_cap": iv_log_gdp_per_cap["log_gdp_per_cap"],
"log_gdp_per_cap_lag1": iv_log_gdp_per_cap["log_gdp_per_cap_lag1"],
"log_gdp_per_cap_lag2": iv_log_gdp_per_cap["log_gdp_per_cap_lag2"],
"log_gdp_per_cap_lag3": iv_log_gdp_per_cap["log_gdp_per_cap_lag3"]})
instrument_vars = pd.DataFrame({"lrain_mw_l": iv_log_gdp_per_cap["lrain_mw_l"],
"lrain_mw_l2": iv_log_gdp_per_cap["lrain_mw_l2"],
"lrain_mw_l3": iv_log_gdp_per_cap["lrain_mw_l3"],
"lrain_mw_l4": iv_log_gdp_per_cap["lrain_mw_l4"] })
iv_model = IV2SLS(dependent = iv_log_gdp_per_cap["democracy"],
exog = None,
endog = endog_vars,
instruments = instrument_vars)
iv_result = iv_model.fit(cov_type = "clustered")
iv_result
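It returns an answer, though I am not very confident in it. Is it clustering by both "year" and "country"? It says it is clustering one way only, and I am not sure on what. If I am trying to use the rainfall variables as instruments for the GDP variables, to see the effect of changes in log(GDP per capita) on democracy, am I doing this right? I am also not sure whether fixed effects are being used.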
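On the fixed-effects point: with `exog = None` the model has no exogenous regressors at all, so it contains neither a constant nor country or year effects. One common way to add them, sketched here on the assumption that `country` and `year` are ordinary columns after `reset_index()`, is to build dummy columns and pass the result as `exog`. The `toy` frame is hypothetical, and whether fixed effects belong in the specification is a modelling choice this sketch does not settle.

```python
import pandas as pd

# Hypothetical toy panel; only the identifier columns matter here.
toy = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B", "C", "C"],
        "year": [2000, 2001, 2000, 2001, 2000, 2001],
    }
)

# One dummy per country and per year, dropping the first level of each
# so the dummies are not perfectly collinear with the constant.
country_fe = pd.get_dummies(toy["country"], prefix="country", drop_first=True)
year_fe = pd.get_dummies(toy["year"], prefix="year", drop_first=True)

# Combine into a single float-valued exog frame with an explicit constant.
exog = pd.concat([country_fe, year_fe], axis=1).astype(float)
exog.insert(0, "const", 1.0)
```

A frame built this way would take the place of `exog = None` in the model call.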
Best answer
OK, I found a solution.
For reference, see https://pypi.org/project/linearmodels/
In your case it would look like this:
from linearmodels.iv import IV2SLS

data = iv_log_gdp_per_cap
# Formula layout: dependent ~ exogenous regressors + [endogenous ~ instruments]
# control1, control2, endog, instrument1, instrument2 are placeholder names
mod = IV2SLS.from_formula('democracy ~ 1 + control1 + control2 + [endog ~ instrument1 + instrument2]', data)
res = mod.fit(cov_type = "clustered",
              clusters = [iv_log_gdp_per_cap["country"],
                          iv_log_gdp_per_cap["year"]])
res.summary
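A note on the `clusters` argument: as far as the linearmodels covariance documentation describes it, `clusters` is expected to have one row per observation and one or two columns, which a Python list of two Series may not satisfy. A sketch, assuming that reading is correct and that integer codes are the safer input, of building such a two-column frame with pandas. The `toy` frame is hypothetical, and the `fit` call appears only as a comment because it has not been run here.

```python
import pandas as pd

# Hypothetical toy panel after reset_index(); values are placeholders.
toy = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B"],
        "year": [2000, 2001, 2000, 2001],
    }
)

# factorize maps each distinct label to an integer code (0, 1, 2, ...).
country_codes, _ = pd.factorize(toy["country"])
year_codes, _ = pd.factorize(toy["year"])

# One row per observation, one column per clustering dimension.
clusters = pd.DataFrame(
    {"country": country_codes, "year": year_codes},
    index=toy.index,
)

# Assumed usage (not run here):
# res = mod.fit(cov_type="clustered", clusters=clusters)
```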
Regarding python - How to cluster by entity and year in IV2SLS using linearmodels?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66572544/