r - Predicting from a flexmix object (R)

Tags: r predict mixture-model

I fitted some data to a mixture of two Gaussians in flexmix:

library(flexmix)
data("NPreg", package = "flexmix")
mod <- flexmix(yn ~ x, data = NPreg, k = 2,
               model = list(FLXMRglm(yn ~ x, family = "gaussian"),
                            FLXMRglm(yn ~ x, family = "gaussian")))

The fitted model looks like this:

> mod

Call:
flexmix(formula = yn ~ x, data = NPreg, k = 2, model =    list(FLXMRglm(yn ~ x, family = "gaussian"), 
    FLXMRglm(yn ~ x, family = "gaussian")))

Cluster sizes:
  1   2 
 74 126 

convergence after 31 iterations

But how do I predict from this model?

When I do

pred <- predict(mod, NPreg)

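I get a list containing the predictions from each of the two components.

To get a single prediction, do I have to weight by the cluster sizes like this?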

single <- (74/200)* pred$Comp.1[,1] + (126/200)*pred$Comp.2[,2]

Best answer

I use flexmix for prediction in the following way:

pred = predict(mod, NPreg)
clust = clusters(mod,NPreg)
result = cbind(NPreg,data.frame(pred),data.frame(clust))
plot(result$yn,col = c("red","blue")[result$clust],pch = 16,ylab = "yn")

Clusters in NPreg

And the confusion matrix:

table(result$class,result$clust)

Confusion Matrix for NPreg

To get the predicted values for yn, I select the component value of the cluster that each data point belongs to.

for(i in 1:nrow(result)){
  result$pred_model1[i] = result[,paste0("Comp.",result$clust[i],".1")][i]
  result$pred_model2[i] = result[,paste0("Comp.",result$clust[i],".2")][i]
}
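The loop above can also be written without iteration by using matrix indexing. The following is a minimal sketch on a toy data frame: the values are hypothetical and only the column layout (Comp.1.1, Comp.2.1, clust) mimics what cbind(NPreg, data.frame(pred), data.frame(clust)) is assumed to produce.

```r
# Toy stand-in for `result` (hypothetical values, assumed column layout)
result <- data.frame(
  Comp.1.1 = c(1, 2, 3),     # component 1 predictions, model 1
  Comp.2.1 = c(10, 20, 30),  # component 2 predictions, model 1
  clust    = c(1L, 2L, 1L)   # assigned cluster per row
)

# Put the per-component predictions side by side, one column per component
comp_mat <- as.matrix(result[, paste0("Comp.", 1:2, ".1")])

# A two-column index matrix (row, column) picks, for every row,
# the prediction of the component that row was assigned to
idx <- cbind(seq_len(nrow(result)), result$clust)
result$pred_model1 <- comp_mat[idx]

print(result$pred_model1)
```

The same pattern with the ".2" columns would give pred_model2.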

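Plotting actual against predicted values shows the fit (only one is added here because your two models are identical; you would use pred_model2 for the second model).

library(ggplot2)  # qplot() and geom_abline() come from ggplot2
qplot(result$yn, result$pred_model1,xlab="Actual",ylab="Predicted") + geom_abline()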

Actual Vs Predicted

RMSE = sqrt(mean((result$yn-result$pred_model1)^2))

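This gives a root mean squared error of 5.54.

This answer is based on the many SO answers I read through while working with flexmix. It worked well for my problem.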
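On the weighted single prediction asked about in the question: one alternative to picking the assigned cluster is to average the component predictions using the estimated mixing proportions from prior() rather than hard-coded cluster sizes. Note also that the formula in the question takes column 1 from Comp.1 but column 2 from Comp.2. The sketch below makes some assumptions: it refits a simplified single-model version of the mixture (mod1 is a hypothetical name), and it relies on the aggregate argument of predict() as described in the flexmix documentation; it does not claim that the two results are identical.

```r
library(flexmix)

set.seed(1)  # EM starts from random assignments, so fix the seed
data("NPreg", package = "flexmix")

# Simplified fit (assumption): one Gaussian regression model, two components
mod1 <- flexmix(yn ~ x, data = NPreg, k = 2)

# Component-wise predictions: a list with one matrix per component
pred1 <- predict(mod1, newdata = NPreg)
comp_pred <- sapply(pred1, function(m) m[, 1])  # n x k matrix

# Weight each component by its estimated mixing proportion
w <- prior(mod1)
single <- drop(comp_pred %*% w)

# Predictions aggregated over the components by flexmix itself
agg <- predict(mod1, newdata = NPreg, aggregate = TRUE)[[1]][, 1]

head(cbind(single, agg))
```

A prior-weighted average of this kind ignores which component is more plausible for a given observation, which is why the answer above selects the prediction of the assigned cluster instead.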

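You may also be interested in visualizing the two distributions. My model is shown below; it shows some overlap, because the ratios of the components are not close to 1.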

Call:
flexmix(formula = yn ~ x, data = NPreg, k = 2, 
model = list(FLXMRglm(yn ~ x, family = "gaussian"), 
             FLXMRglm(yn ~ x, family = "gaussian")))

       prior size post>0 ratio
Comp.1 0.481  102    129 0.791
Comp.2 0.519   98    171 0.573

'log Lik.' -1312.127 (df=13)
AIC: 2650.255   BIC: 2693.133 
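As far as I understand the summary output, the ratio column is size divided by post>0, that is, the number of observations assigned to a component relative to the number of observations with a non-negligible posterior probability for it. A quick arithmetic check with the numbers printed above (plain R, no flexmix needed):

```r
# Values copied from the summary table above
size     <- c(Comp.1 = 102, Comp.2 = 98)
post_gt0 <- c(Comp.1 = 129, Comp.2 = 171)

# ratio = assigned observations / observations with posterior > 0
ratio <- round(size / post_gt0, 3)
print(ratio)
```

Ratios near 1 would indicate well-separated components; here both are clearly below 1, which matches the overlap mentioned above.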

I also generated density plots with histograms to visualize the two components. This was inspired by an SO answer from the maintainer of betareg.

a = subset(result, clust == 1)
b = subset(result, clust == 2)
hist(a$yn, col = hcl(0, 50, 80), main = "",xlab = "", freq = FALSE, ylim = c(0,0.06))
hist(b$yn, col = hcl(240, 50, 80), add = TRUE,main = "", xlab = "", freq = FALSE, ylim = c(0,0.06))
ys = seq(0, 50, by = 0.1)
lines(ys, dnorm(ys, mean = mean(a$yn), sd = sd(a$yn)), col = hcl(0, 80, 50), lwd = 2)
lines(ys, dnorm(ys, mean = mean(b$yn), sd = sd(b$yn)), col = hcl(240, 80, 50), lwd = 2)

Density of Components

# Joint Histogram
p <- prior(mod)
hist(result$yn, freq = FALSE,main = "", xlab = "",ylim = c(0,0.06))
lines(ys, p[1] * dnorm(ys, mean = mean(a$yn), sd = sd(a$yn)) +
        p[2] * dnorm(ys, mean = mean(b$yn), sd = sd(b$yn)))


Regarding r - Predicting from a flexmix object (R), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33648462/
