我正在尝试在 R 中复制 Stata 输出。我正在使用数据集 affairs .我在复制具有强大标准错误的 probit 函数时遇到了麻烦。
Stata 代码如下所示:probit affair male age yrsmarr kids relig educ ratemarr, r
我已经开始:
probit1 <- glm(affair ~ male + age + yrsmarr + kids + relig + educ + ratemarr,
family = binomial (link = "probit"), data = mydata)
然后我尝试了对
sandwich
的各种调整包,例如:myProbit <- function(probit1, vcov = sandwich(..., adjust = TRUE)) {
print(coeftest(probit1, vcov = sandwich(probit1, adjust = TRUE)))
}
或(所有类型
HC0
到 HC5
):myProbit <- function(probit1, vcov = sandwich) {
print(coeftest(probit1, vcovHC(probit1, type = "HC0"))
}
或者,正如所建议的那样 here (我是否必须为
object
输入不同的内容?):sandwich1 <- function(object, ...) sandwich(object) * nobs(object) / (nobs(object) - 1)
coeftest(probit1, vcov = sandwich1)
这些尝试都没有导致来自 stata 输出的相同标准误差或 z 值。
希望有 build 性的意见!
提前致谢!
最佳答案
对于正在考虑加入这辆旅行车的人们,这里有一些代码演示了这个问题(数据 here):
clear
set more off
capture ssc install bcuse
capture ssc install rsource
bcuse affairs
saveold affairs, version(12) replace
rsource, terminator(XXX)
library("foreign")
library("lmtest")
library("sandwich")
mydata<-read.dta("affairs.dta")
probit1<-glm(affair ~ male + age + yrsmarr + kids + relig + educ + ratemarr, family = binomial (link = "probit"), data = mydata)
sandwich1 <- function(object,...) sandwich(object) * nobs(object)/(nobs(object) - 1)
coeftest(probit1,vcov = sandwich1)
XXX
probit affair male age yrsmarr kids relig educ ratemarr, robust cformat(%9.6f) nolog
R 给出:
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.764157 0.546692 1.3978 0.1621780
male 0.188816 0.133260 1.4169 0.1565119
age -0.024400 0.011423 -2.1361 0.0326725 *
yrsmarr 0.054608 0.019025 2.8703 0.0041014 **
kids 0.208072 0.168222 1.2369 0.2161261
relig -0.186085 0.053968 -3.4480 0.0005647 ***
educ 0.015506 0.026389 0.5876 0.5568012
ratemarr -0.272711 0.053668 -5.0814 3.746e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Stata产量:
Probit regression Number of obs = 601
Wald chi2(7) = 54.93
Prob > chi2 = 0.0000
Log pseudolikelihood = -305.2525 Pseudo R2 = 0.0961
------------------------------------------------------------------------------
| Robust
affair | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | 0.188817 0.131927 1.43 0.152 -0.069755 0.447390
age | -0.024400 0.011124 -2.19 0.028 -0.046202 -0.002597
yrsmarr | 0.054608 0.018963 2.88 0.004 0.017441 0.091775
kids | 0.208075 0.166243 1.25 0.211 -0.117754 0.533905
relig | -0.186085 0.053240 -3.50 0.000 -0.290435 -0.081736
educ | 0.015505 0.026355 0.59 0.556 -0.036150 0.067161
ratemarr | -0.272710 0.053392 -5.11 0.000 -0.377356 -0.168064
_cons | 0.764160 0.534335 1.43 0.153 -0.283117 1.811437
------------------------------------------------------------------------------
附录:
系数协方差估计的差异是由于
不同的拟合算法。在 R 中,
glm
命令使用迭代最小二乘法,而Stata的probit
使用基于 Newton-Raphson 算法的 ML 方法。你可以用 glm
来匹配 R 正在做的事情在 Stata 与 irls
选项:glm affair male age yrsmarr kids relig educ ratemarr, irls family(binomial) link(probit) robust
这产生:
Generalized linear models No. of obs = 601
Optimization : MQL Fisher scoring Residual df = 593
(IRLS EIM) Scale parameter = 1
Deviance = 610.5049916 (1/df) Deviance = 1.029519
Pearson = 619.0405832 (1/df) Pearson = 1.043913
Variance function: V(u) = u*(1-u) [Bernoulli]
Link function : g(u) = invnorm(u) [Probit]
BIC = -3183.862
------------------------------------------------------------------------------
| Semirobust
affair | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | 0.188817 0.133260 1.42 0.157 -0.072367 0.450002
age | -0.024400 0.011422 -2.14 0.033 -0.046787 -0.002012
yrsmarr | 0.054608 0.019025 2.87 0.004 0.017319 0.091897
kids | 0.208075 0.168222 1.24 0.216 -0.121634 0.537785
relig | -0.186085 0.053968 -3.45 0.001 -0.291862 -0.080309
educ | 0.015505 0.026389 0.59 0.557 -0.036216 0.067226
ratemarr | -0.272710 0.053668 -5.08 0.000 -0.377898 -0.167522
_cons | 0.764160 0.546693 1.40 0.162 -0.307338 1.835657
------------------------------------------------------------------------------
这些将是接近的,但不完全相同。我不知道如何让 R 在没有大量工作的情况下使用 NR 之类的东西。
关于在 R 中使用鲁棒错误复制 Stata Probit,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/30236487/