python - roc_auc for a VotingClassifier and RandomForestClassifier in scikit-learn (sklearn)

Tags: python scikit-learn decision-tree roc ensemble-learning

I am trying to compute roc_auc for a hard voting classifier that I built, shown below as a reproducible example. I want to compute the roc_auc score and plot the ROC curve, but unfortunately I get the following error: predict_proba is not available when voting='hard'

# Voting Ensemble for Classification
import pandas
from sklearn import datasets
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer,confusion_matrix, f1_score, precision_score, recall_score, cohen_kappa_score,accuracy_score,roc_curve
import numpy as np
np.random.seed(42)
iris = datasets.load_iris()
X = iris.data[:, :4]  # take all four features
Y = iris.target
print(Y)
seed = 7
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
# create the sub models
estimators = []
model1 = LogisticRegression()
estimators.append(('logistic', model1))
model2 = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=0)
estimators.append(('RandomForest', model2))
model3 = MultinomialNB()
estimators.append(('NaiveBayes', model3))
model4=SVC(probability=True)
estimators.append(('svm', model4))
model5=DecisionTreeClassifier()
estimators.append(('Cart', model5))
# create the ensemble model
print('Majority Class Labels (Majority/Hard Voting)')
ensemble = VotingClassifier(estimators,voting='hard')
#accuracy
results = model_selection.cross_val_score(ensemble, X, Y, cv=kfold,scoring='accuracy')
y_pred = cross_val_predict(ensemble, X ,Y, cv=10)
print("Accuracy ensemble model : %0.2f (+/- %0.2f) " % (results.mean(), results.std() ))
print(results.mean())
#recall
recall_scorer = make_scorer(recall_score, pos_label=1)
recall = cross_val_score(ensemble, X, Y, cv=kfold, scoring=recall_scorer)
print('Recall', np.mean(recall), recall)
# Precision
precision_scorer = make_scorer(precision_score, pos_label=1)
precision = cross_val_score(ensemble, X, Y, cv=kfold, scoring=precision_scorer)
print('Precision', np.mean(precision), precision)
#f1_score
f1_scorer = make_scorer(f1_score, pos_label=1)
f1_score = cross_val_score(ensemble, X, Y, cv=kfold, scoring=f1_scorer)
print('f1_score ', np.mean(f1_score ),f1_score )
#roc_auc_score
roc_auc_score = cross_val_score(ensemble, X, Y, cv=kfold, scoring='roc_auc')
print('roc_auc_score ', np.mean(roc_auc_score ),roc_auc_score )

Best Answer

To compute the roc_auc metric you first need probability estimates, so replace:

ensemble = VotingClassifier(estimators, voting='hard')

with:

ensemble = VotingClassifier(estimators, voting='soft')
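
As a quick check, here is a minimal sketch (reusing estimators, X and Y from the question code) showing that the soft-voting ensemble exposes predict_proba, which is what the 'roc_auc' scorer relies on:

# With voting='soft' the ensemble averages the members' predict_proba outputs,
# so probability estimates become available for ROC/AUC computations.
ensemble = VotingClassifier(estimators, voting='soft')
ensemble.fit(X, Y)
print(ensemble.predict_proba(X[:5]))  # shape (5, 3): one probability column per class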


After that, the last two lines of code will still throw an error:

roc_auc_score = cross_val_score(ensemble, X, Y, cv=3, scoring='roc_auc')
print('roc_auc_score ', np.mean(roc_auc_score ),roc_auc_score )

ValueError: multiclass format is not supported

This is expected, because Y contains 3 classes (np.unique(Y) == array([0, 1, 2])).

You cannot use roc_auc as a single summary metric for a multiclass model. If you want, you can compute a per-class roc_auc instead.


How to fix this:

1) Use only two classes when computing roc_auc_score

2) Binarize the labels before calling roc_auc_score (see the sketch after this list)
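
A minimal sketch of both options, assuming the soft-voting ensemble defined above (the names two_class, proba and Y_bin are illustrative only and do not appear in the original code):

from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict, cross_val_score

# Option 1: keep only two of the three iris classes, so the plain 'roc_auc'
# scorer works unchanged on a binary problem.
two_class = Y < 2                         # samples of classes 0 and 1 only
auc_binary = cross_val_score(ensemble, X[two_class], Y[two_class],
                             cv=5, scoring='roc_auc')
print('binary roc_auc: %.3f' % auc_binary.mean())

# Option 2: binarize the labels and compute a one-vs-rest roc_auc per class
# from cross-validated probability estimates.
proba = cross_val_predict(ensemble, X, Y, cv=5, method='predict_proba')
Y_bin = label_binarize(Y, classes=[0, 1, 2])   # shape (n_samples, 3)
for k in range(3):
    print('class %d roc_auc: %.3f' % (k, roc_auc_score(Y_bin[:, k], proba[:, k])))

On recent scikit-learn versions (0.22+) you can also pass scoring='roc_auc_ovr' or scoring='roc_auc_ovo' to cross_val_score to get an averaged multiclass AUC directly, if your version supports it.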

Regarding "python - roc_auc for a VotingClassifier and RandomForestClassifier in scikit-learn (sklearn)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51465682/
