使用效果编码重新调整因子和 glm

我无法理解 glm 的效果编码。举个例子:

data('mpg')
mpg$trans = as.factor(mpg$trans)
levels(mpg$trans)
[1] "auto(av)"   "auto(l3)"   "auto(l4)"   "auto(l5)"   "auto(l6)"   "auto(s4)"   "auto(s5)"   "auto(s6)"   "manual(m5)" "manual(m6)"

对于效果(或虚拟)编码，glm 将第一个级别作为引用级别，因此在本例中它将是“auto(av)”。

mpg_glm = glm(hwy ~ trans, data = mpg, contrasts = list(trans = contr.sum))
summary(mpg_glm)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  24.4173     0.7318  33.366  < 2e-16 ***
trans1        3.3827     2.3592   1.434 0.153017    
trans2        2.5827     3.6210   0.713 0.476426    
trans3       -2.4534     0.9157  -2.679 0.007928 ** 
trans4       -3.6993     1.0865  -3.405 0.000784 ***
trans5       -4.4173     2.1743  -2.032 0.043375 *  
trans6        1.2494     2.9866   0.418 0.676105    
trans7        0.9160     2.9866   0.307 0.759341    
trans8        0.7702     1.4517   0.531 0.596262    
trans9        1.8758     0.9845   1.905 0.058011 .

所以我现在认为 trans1 实际上是第二级别(“auto(l3)”)，因为第一个级别是引用级别。为了测试这一点，我重新调整了因子，以便我可以看到第一个级别的系数(“auto(av)”):

mpg$trans <- relevel(mpg$trans, ref="auto(l3)")
levels(mpg$trans)
"auto(l3)"   "auto(av)"   "auto(l4)"   "auto(l5)"   "auto(l6)"   "auto(s4)"   "auto(s5)"   "auto(s6)"   "manual(m5)" "manual(m6)"

现在我期望看到第一级的系数和第二级的系数消失了(因为现在是引用级):

mpg_glm = glm(hwy ~ trans, data = mpg, contrasts = list(trans = contr.sum))
summary(mpg_glm)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  24.4173     0.7318  33.366  < 2e-16 ***
trans1        2.5827     3.6210   0.713 0.476426    
trans2        3.3827     2.3592   1.434 0.153017    
trans3       -2.4534     0.9157  -2.679 0.007928 ** 
trans4       -3.6993     1.0865  -3.405 0.000784 ***
trans5       -4.4173     2.1743  -2.032 0.043375 *  
trans6        1.2494     2.9866   0.418 0.676105    
trans7        0.9160     2.9866   0.307 0.759341    
trans8        0.7702     1.4517   0.531 0.596262    
trans9        1.8758     0.9845   1.905 0.058011 .

我看到唯一改变的是前2个系数被切换，所以以哪个级别作为引用？我在这里缺少什么？

最佳答案

您正在使用 contr.sum，其中所有级别都与最后一个级别进行比较，并且添加了所有系数(截距除外)总和为零的约束。

您可以在mpg_glm中检查它:

mpg_glm = glm(hwy ~ trans, data = mpg, contrasts = list(trans = contr.sum))
mpg_glm$contrasts

           [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
auto(av)      1    0    0    0    0    0    0    0
auto(l3)      0    1    0    0    0    0    0    0
auto(l4)      0    0    1    0    0    0    0    0
auto(l5)      0    0    0    1    0    0    0    0
auto(l6)      0    0    0    0    1    0    0    0
auto(s4)      0    0    0    0    0    1    0    0
auto(s5)      0    0    0    0    0    0    1    0
auto(s6)      0    0    0    0    0    0    0    1
manual(m5)    0    0    0    0    0    0    0    0
manual(m6)   -1   -1   -1   -1   -1   -1   -1   -1
           [,9]
auto(av)      0
auto(l3)      0
auto(l4)      0
auto(l5)      0
auto(l6)      0
auto(s4)      0
auto(s5)      0
auto(s6)      0
manual(m5)    1
manual(m6)   -1

这里有 9 列，它们是模型中的 9 个非截距系数。由于总和为零的限制，它们或多或少是每个级别的平均值减去最后一个级别的平均值。最后一级是多余的，此处未显示:

var_means = with(mpg,tapply(hwy,trans,mean))
cbind(delta_mean = var_means[-length(var_means)]-var_means[length(var_means)],
coef = coef(mpg_glm)[-1])

           delta_mean       coef
auto(av)    3.5894737  3.3827066
auto(l3)    2.7894737  2.5827066
auto(l4)   -2.2466709 -2.4534380
auto(l5)   -3.4925776 -3.6993447
auto(l6)   -4.2105263 -4.4172934
auto(s4)    1.4561404  1.2493733
auto(s5)    1.1228070  0.9160399
auto(s6)    0.9769737  0.7702066
manual(m5)  2.0825771  1.8758101

因此，当您更改第一个级别时，您仅更改它们出现的顺序，但值不会更改。您可以轻松检查对比度:

mpg$trans <- relevel(mpg$trans, ref="auto(l3)")
new_glm = glm(hwy ~ trans, data = mpg, contrasts = list(trans = contr.sum))    
new_glm$contrasts
$trans
           [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
auto(l3)      1    0    0    0    0    0    0    0
auto(av)      0    1    0    0    0    0    0    0
auto(l4)      0    0    1    0    0    0    0    0
auto(l5)      0    0    0    1    0    0    0    0
auto(l6)      0    0    0    0    1    0    0    0
auto(s4)      0    0    0    0    0    1    0    0
auto(s5)      0    0    0    0    0    0    1    0
auto(s6)      0    0    0    0    0    0    0    1
manual(m5)    0    0    0    0    0    0    0    0
manual(m6)   -1   -1   -1   -1   -1   -1   -1   -1
           [,9]
auto(l3)      0
auto(av)      0
auto(l4)      0
auto(l5)      0
auto(l6)      0
auto(s4)      0
auto(s5)      0
auto(s6)      0
manual(m5)    1
manual(m6)   -1

对于您所描述的内容，作为更改引用并具有与该引用相关的其他级别，它应该是 contr.treatment。

关于使用效果编码重新调整因子和 glm，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/60377822/

使用效果编码重新调整因子和 glm

上一篇：json - Spring boot中Json Response的继承

下一篇：javascript - useEffect 和 'react-hooks/exhaustive-deps' linting