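r - Why do my ROC plot and AUC value look good when my random forest confusion matrix shows the model is bad at predicting disease?

I am using the randomForest package in R to build a model that classifies cases as diseased (1) or disease-free (0):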

classify_BV_100t <- randomForest(bv.disease~., data=RF_input_BV_clean, ntree = 100, localImp = TRUE)

print(classify_BV_100t)

Call:
 randomForest(formula = bv.disease ~ ., data = RF_input_BV_clean,      ntree = 100, localImp = TRUE) 
           Type of random forest: classification
                 Number of trees: 100
No. of variables tried at each split: 53

    OOB estimate of  error rate: 8.04%
Confusion matrix:
    0  1 class.error
0 510  7  0.01353965
1  39 16  0.70909091

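My confusion matrix shows that the model is good at classifying 0 (no disease) but very bad at classifying 1 (disease).

But when I plot the ROC curve, it gives the impression that the model is fairly good.

Here are the two different ways I plotted the ROC:

(Using https://stats.stackexchange.com/questions/188616/how-can-we-calculate-roc-auc-for-classification-algorithm-such-as-random-forest )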
```
library(pROC)
rf.roc<-roc(RF_input_BV_clean$bv.disease, classify_BV_100t$votes[,2])
plot(rf.roc)
auc(rf.roc)
```

(Using "How to compute ROC and AUC under ROC after training using caret in R?")

library(ROCR)
predictions <- as.vector(classify_BV_100t$votes[,2])
pred <- prediction(predictions, RF_input_BV_clean$bv.disease)

perf_AUC <- performance(pred,"auc") #Calculate the AUC value
AUC <- perf_AUC@y.values[[1]]

perf_ROC <- performance(pred,"tpr","fpr") #plot the actual ROC curve
plot(perf_ROC, main="ROC plot")
text(0.5,0.5,paste("AUC = ",format(AUC, digits=5, scientific=FALSE)))

These are the ROC plots from methods 1 and 2:

ROC plot 1

ROC plot 2

Both methods give an AUC of 0.8621593.

Does anyone know why the random forest confusion matrix results do not seem to agree with the ROC/AUC?

Best answer

I don't think there is anything wrong with your ROC plots, and your assessment of the discrepancy is correct.

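The high AUC value is a product of the very high true negative rate. The ROC curve takes into account both sensitivity, which is largely a measure of true positives, and specificity, which is a measure of true negatives. Because your specificity is very high, it effectively carries the model's lower sensitivity, and that keeps your AUC relatively high. So yes, the AUC is high, but as you mentioned, the model is only good at predicting negatives.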
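Another way to look at it: the confusion matrix reflects a single cutoff (the majority vote), whereas the ROC curve sweeps over every cutoff. The sketch below illustrates this with simulated data, not your data. The names `truth`, `score`, and `rates_at` are hypothetical, with `score` standing in for something like `classify_BV_100t$votes[,2]`, and the beta distributions are an arbitrary assumption chosen only to mimic an imbalanced problem.

```r
set.seed(1)

# Simulated stand-in data (an assumption, not the asker's data):
# roughly 10% positives, scores between 0 and 1 like vote fractions.
n     <- 1000
truth <- rbinom(n, 1, 0.1)
score <- ifelse(truth == 1, rbeta(n, 3, 5), rbeta(n, 1, 6))

# Sensitivity and specificity when predicting 1 for score >= cutoff
rates_at <- function(cutoff) {
  pred <- as.integer(score >= cutoff)
  c(cutoff      = cutoff,
    sensitivity = sum(pred == 1 & truth == 1) / sum(truth == 1),
    specificity = sum(pred == 0 & truth == 0) / sum(truth == 0))
}

# One row per cutoff; 0.5 corresponds to a two-class majority vote
res <- t(sapply(c(0.5, 0.3, 0.2, 0.1), rates_at))
print(res)

# AUC from ranks (Mann-Whitney form); it does not depend on any cutoff
n1  <- sum(truth == 1)
n0  <- sum(truth == 0)
auc <- (sum(rank(score)[truth == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
print(auc)
```

Lowering the cutoff trades specificity for sensitivity while the AUC stays the same, which is why a single confusion matrix and the AUC can tell different stories. If you want the forest itself to vote differently, `randomForest()` accepts a `cutoff` argument, though whether a lower cutoff is appropriate depends on the relative cost of false positives and false negatives in your application.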

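I would suggest computing other metrics (sensitivity, specificity, true positive rate, false positive rate...) and looking at all of them together when evaluating the model. AUC is one quality metric, but the metrics behind it tell you more.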
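As a rough illustration, these metrics can be worked out by hand from the confusion matrix printed in the question. The variable names (`cm`, `sensitivity`, and so on) are just for this sketch.

```r
# Counts copied from the print() output in the question
# (rows = actual class, columns = predicted class)
cm <- matrix(c(510,  7,
                39, 16),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

sensitivity <- cm["1", "1"] / sum(cm["1", ])  # true positive rate
specificity <- cm["0", "0"] / sum(cm["0", ])  # true negative rate
fpr         <- 1 - specificity                # false positive rate
oob_error   <- (cm["0", "1"] + cm["1", "0"]) / sum(cm)
prevalence  <- sum(cm["1", ]) / sum(cm)       # share of diseased cases

print(round(c(sensitivity = sensitivity, specificity = specificity,
              fpr = fpr, oob_error = oob_error, prevalence = prevalence), 4))
```

On these counts sensitivity is 16/55 (about 0.29) and specificity is 510/517 (about 0.99). Only about 9.6% of the cases are diseased, so a model that predicted 0 for every case would have an error rate of about 9.6%, not far from the 8.04% OOB estimate. That is why the overall error rate looks good even though most diseased cases are missed.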

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/58959104/
