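Why are these GLMMs so different?
Both were fit with lme4 on the same data, but one is built from successes and trials (m1bin) and the other uses only the raw accuracy data (m1). Was I completely wrong to think that lme4 works out the binomial structure from the raw data? (brms handles this fine.) Now I'm worried that some of my analyses will change.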
d:
uniqueid dim incorrectlabel accuracy
1 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental marginal 0
2 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental extreme 1
3 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 relevant marginal 1
4 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental marginal 1
5 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 relevant marginal 0
6 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental marginal 0
dbin:
uniqueid dim incorrectlabel right count
<fctr> <fctr> <fctr> <int> <int>
1 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental extreme 3 3
2 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental marginal 1 5
3 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 relevant extreme 3 4
4 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 relevant marginal 3 4
5 A16HSMUJ7C7QA7:3DY46V3X3PI4B0HROD2HN770M46557 incidental extreme 3 4
6 A16HSMUJ7C7QA7:3DY46V3X3PI4B0HROD2HN770M46557 incidental marginal 2 4
> summary(m1bin)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: cbind(right, count) ~ dim * incorrectlabel + (1 | uniqueid)
Data: dbin
AIC BIC logLik deviance df.resid
398.2 413.5 -194.1 388.2 151
Scaled residuals:
Min 1Q Median 3Q Max
-1.50329 -0.53743 0.08671 0.38922 1.28887
Random effects:
Groups Name Variance Std.Dev.
uniqueid (Intercept) 0 0
Number of obs: 156, groups: uniqueid, 39
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.48460 0.13788 -3.515 0.00044 ***
dimrelevant -0.13021 0.20029 -0.650 0.51562
incorrectlabelmarginal -0.15266 0.18875 -0.809 0.41863
dimrelevant:incorrectlabelmarginal -0.02664 0.27365 -0.097 0.92244
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) dmrlvn incrrc
dimrelevant -0.688
incrrctlblm -0.730 0.503
dmrlvnt:ncr 0.504 -0.732 -0.690
> summary(m1)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: accuracy ~ dim * incorrectlabel + (1 | uniqueid)
Data: d
AIC BIC logLik deviance df.resid
864.0 886.2 -427.0 854.0 619
Scaled residuals:
Min 1Q Median 3Q Max
-1.3532 -1.0336 0.7524 0.9350 1.1514
Random effects:
Groups Name Variance Std.Dev.
uniqueid (Intercept) 0.04163 0.204
Number of obs: 624, groups: uniqueid, 39
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.140946 0.088242 1.597 0.1102
dim1 0.155923 0.081987 1.902 0.0572 .
incorrectlabel1 0.180156 0.081994 2.197 0.0280 *
dim1:incorrectlabel1 0.001397 0.082042 0.017 0.9864
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) dim1 incrr1
dim1 0.010
incrrctlbl1 0.128 0.006
dm1:ncrrct1 0.005 0.138 0.010
I expected them to be the same. Fitting either version in brms gives the same model with the same estimates.
Best answer
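They should be the same (up to small numerical differences: see below), except for the log-likelihoods and the metrics based on them (although differences in log-likelihood/AIC/etc. among a series of models should still be the same). I think your problem is that you used cbind(right, count) rather than cbind(right, count-right). From ?glm:
For binomial ... families the response can also be specified as ... a two-column matrix with the columns giving the numbers of successes and failures.
(The point to emphasize is that the columns are not the number of successes and the total, but the numbers of successes and failures.)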
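To make the difference concrete, here is a minimal sketch using plain glm() on simulated data. The data frame toy, its grp column, and the model names f_wrong, f_fail, and f_prop are hypothetical; only the column names right and count mirror the question. With cbind(right, count) the implied success probability is right/(right + count), which can never exceed 0.5, so the estimates are pulled downward. That would be consistent with the negative intercept in summary(m1bin), though I have not run the original data.

```r
## Hypothetical toy data: 'right' successes out of 'count' trials per row
set.seed(101)
toy <- data.frame(
  grp   = factor(rep(c("a", "b"), each = 10)),
  count = rep(5L, 20)
)
toy$right <- rbinom(nrow(toy), size = toy$count,
                    prob = ifelse(toy$grp == "a", 0.4, 0.7))

## Mis-specified: the second column is the total, but glm() reads it as failures
f_wrong <- glm(cbind(right, count) ~ grp, family = binomial, data = toy)

## Correct: the second column is the number of failures
f_fail <- glm(cbind(right, count - right) ~ grp, family = binomial, data = toy)

## Equivalent: proportion of successes, weighted by the number of trials
f_prop <- glm(right / count ~ grp, family = binomial, weights = count,
              data = toy)

## Side-by-side coefficients: the last two columns should match each other
print(cbind(wrong = coef(f_wrong), fail = coef(f_fail), prop = coef(f_prop)))
```

The same two correct response forms (cbind(successes, failures), or a proportion with weights = trials) are also accepted by glmer(), so the likely fix for m1bin is cbind(right, count - right) on the left-hand side.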
Here is an example using a built-in data set, comparing fits to the aggregated and the disaggregated versions of the data:
library(lme4)
library(dplyr)
## disaggregate: expand each row into one 0/1 row per animal
cbpp_disagg <- cbpp %>% mutate(obs=seq(nrow(cbpp))) %>%
    group_by(obs,herd,period,incidence) %>%
    do(data.frame(disease=rep(c(0,1),c(.$size-.$incidence,.$incidence))))
nrow(cbpp_disagg) == sum(cbpp$size) ## check
## aggregated fit: response is cbind(successes, failures)
g1 <- glmer(cbind(incidence,size-incidence)~period+(1|herd),
            family=binomial,cbpp)
## disaggregated fit: one Bernoulli outcome per row
g2 <- glmer(disease~period+(1|herd),
            family=binomial,cbpp_disagg)
## compare results
all.equal(fixef(g1),fixef(g2),tol=1e-5)
all.equal(VarCorr(g1),VarCorr(g2),tol=1e-6)
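The remark about log-likelihoods can be checked in the same way. The sketch below is an illustration under assumptions, not part of the original answer: it rebuilds the disaggregated data with base R instead of dplyr, and the names cbpp_long, a_full, a_null, d_full, and d_null are hypothetical. I would expect the raw log-likelihoods to differ by roughly the sum of the log binomial coefficients, while the AIC difference between the full and null models should agree across the two data layouts.

```r
library(lme4)

## Disaggregate cbpp with base R: one row per animal, disease = 1 or 0
idx <- rep(seq_len(nrow(cbpp)), cbpp$size)
cbpp_long <- cbpp[idx, c("herd", "period")]
cbpp_long$disease <- unlist(Map(function(inc, n) rep(c(1, 0), c(inc, n - inc)),
                                cbpp$incidence, cbpp$size))
stopifnot(nrow(cbpp_long) == sum(cbpp$size))

## Full and null models on the aggregated data
a_full <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                family = binomial, data = cbpp)
a_null <- glmer(cbind(incidence, size - incidence) ~ 1 + (1 | herd),
                family = binomial, data = cbpp)

## The same two models on the disaggregated data
d_full <- glmer(disease ~ period + (1 | herd),
                family = binomial, data = cbpp_long)
d_null <- glmer(disease ~ 1 + (1 | herd),
                family = binomial, data = cbpp_long)

## Raw log-likelihoods differ between layouts ...
print(as.numeric(logLik(a_full)) - as.numeric(logLik(d_full)))
## ... which can be compared with the sum of log binomial coefficients
print(sum(lchoose(cbpp$size, cbpp$incidence)))

## Model comparisons within each layout should agree
aic_diff <- c(aggregated    = AIC(a_null) - AIC(a_full),
              disaggregated = AIC(d_null) - AIC(d_full))
print(aic_diff)
```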
Source: a similar question on Stack Overflow, "r - lme4 GLMMs differ when built from successes | trials vs. raw data?": https://stackoverflow.com/questions/47620938/