python - Inverse of ROC-AUC?

Tags: python machine-learning classification roc auc

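I have a classification problem where I need to predict a binary class (0 or 1) from the given data. The dataset has more than 300 columns (features plus the target to predict) and more than 2000 rows (samples). I applied the following classifiers: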

 1. DecisionTreeClassifier()
 2. RandomForestClassifier()
 3. GradientBoostingClassifier()
 4. KNeighborsClassifier()

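Almost all of the classifiers give similar results, with an AUC of about 0.50, except for the random forest, which comes out at around 0.28. I would like to know whether it is correct to invert the RandomForest result:

 1 - 0.28 = 0.72

and report that as the AUC. Is this correct?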

Best answer

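Your intuition is not wrong: if a binary classifier really does perform worse than random (i.e. AUC < 0.5), a valid strategy is simply to invert its predictions, i.e. report 0 whenever the classifier predicts 1 and vice versa. From the relevant Wikipedia entry: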

The diagonal divides the ROC space. Points above the diagonal represent good classification results (better than random); points below the line represent bad results (worse than random). Note that the output of a consistently bad predictor could simply be inverted to obtain a good predictor.

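Nevertheless, the formally correct way to get the AUC of this inverted classifier is to first invert the model's individual probability predictions prob: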

prob_invert = 1 - prob

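and then compute the AUC from these prob_invert predictions. Arguably, this should give much the same result as the naive approach you describe of simply subtracting the AUC from 1, but I am not sure the two match exactly; see also this Quora answer.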
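The relationship between the two calculations can be checked directly. The sketch below assumes the pairwise definition of AUC: the fraction of (positive, negative) pairs in which the positive sample gets the higher score, with ties counted as one half. Under that definition, inverting the scores swaps every win for a loss, so the inverted AUC is exactly 1 - AUC. The helper `pairwise_auc` and the synthetic data are illustrative and not from the original post; with scikit-learn one would normally call `sklearn.metrics.roc_auc_score` instead.

```python
import random


def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


rng = random.Random(0)
y_true = [rng.randint(0, 1) for _ in range(200)]

# Synthetic "worse than random" scores (an assumption for illustration):
# positives are centred on 0.3, negatives on 0.7, then clipped to [0, 1].
prob = [
    round(min(1.0, max(0.0, 0.5 - 0.2 * (2 * y - 1) + rng.gauss(0, 0.3))), 2)
    for y in y_true
]
prob_invert = [1 - p for p in prob]

auc = pairwise_auc(y_true, prob)
auc_inverted = pairwise_auc(y_true, prob_invert)

print(f"AUC of original scores: {auc:.3f}")
print(f"AUC of inverted scores: {auc_inverted:.3f}")
print(f"1 - original AUC:       {1 - auc:.3f}")
```

The last two printed values should agree up to floating-point rounding, which suggests that reporting 1 - 0.28 = 0.72 is numerically equivalent to recomputing the AUC on the inverted probabilities.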

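Needless to say, all of this rests on the assumption that your whole process is correct, i.e. that you have no modelling or coding errors (building a classifier that performs worse than random is not exactly trivial).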
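One coding error that can produce an AUC well below 0.5 is scoring with the probability of the wrong class. This is only one possible cause, not a diagnosis of the case above. In scikit-learn, `predict_proba` returns one column per class in the order given by the fitted estimator's `classes_` attribute. The sketch below mimics that layout with plain lists; `positive_class_scores`, `pairwise_auc` and the toy numbers are hypothetical and only illustrate the check.

```python
def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def positive_class_scores(proba_rows, classes, positive_label=1):
    """Select the probability column that belongs to positive_label."""
    col = list(classes).index(positive_label)
    return [row[col] for row in proba_rows]


# Toy data laid out like predict_proba output: one column per entry in `classes`.
classes = [0, 1]
proba_rows = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]]
y_true = [0, 1, 0, 1]

right = positive_class_scores(proba_rows, classes, positive_label=1)
wrong = [row[0] for row in proba_rows]  # class-0 column used by mistake

auc_right = pairwise_auc(y_true, right)  # 1.0 on this toy data
auc_wrong = pairwise_auc(y_true, wrong)  # 0.0 on this toy data
print(auc_right, auc_wrong)
```

If the AUC flips to the other side of 0.5 when the column is corrected, the low score was a bookkeeping problem rather than a property of the model.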

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/54195440/
