python - AUC calculation for a decision tree in scikit-learn

Tags: python python-2.7 machine-learning scikit-learn decision-tree

I am using scikit-learn with Python 2.7 on Windows. What is wrong with my code for calculating AUC? Thanks.

from sklearn.datasets import load_iris
from sklearn.cross_validation import cross_val_score
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()
#print cross_val_score(clf, iris.data, iris.target, cv=10, scoring="precision")
#print cross_val_score(clf, iris.data, iris.target, cv=10, scoring="recall")
print cross_val_score(clf, iris.data, iris.target, cv=10, scoring="roc_auc")

Traceback (most recent call last):
  File "C:/Users/foo/PycharmProjects/CodeExercise/decisionTree.py", line 8, in <module>
    print cross_val_score(clf, iris.data, iris.target, cv=10, scoring="roc_auc")
  File "C:\Python27\lib\site-packages\sklearn\cross_validation.py", line 1433, in cross_val_score
    for train, test in cv)
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 800, in __call__
    while self.dispatch_one_batch(iterator):
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 658, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 566, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 180, in __init__
    self.results = batch()
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "C:\Python27\lib\site-packages\sklearn\cross_validation.py", line 1550, in _fit_and_score
    test_score = _score(estimator, X_test, y_test, scorer)
  File "C:\Python27\lib\site-packages\sklearn\cross_validation.py", line 1606, in _score
    score = scorer(estimator, X_test, y_test)
  File "C:\Python27\lib\site-packages\sklearn\metrics\scorer.py", line 159, in __call__
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Edit 1: it looks like scikit-learn can determine the thresholds even without any machine learning model. I am wondering why.

import numpy as np
from sklearn.metrics import roc_curve
y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, thresholds = roc_curve(y, scores, pos_label=2)
print fpr
print tpr
print thresholds

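Best answer

The roc_auc scorer in sklearn only supports binary classification. iris.target has three classes, which is why the traceback ends with "multiclass format is not supported":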

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html

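One way to work around this is to binarize the labels and extend the classification to a one-vs-all scheme. In sklearn you can use sklearn.preprocessing.LabelBinarizer. The documentation is here: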

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelBinarizer.html
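
The one-vs-all idea can be sketched roughly as follows. This is only an illustration, not the answerer's code. It assumes a scikit-learn version that provides sklearn.model_selection (the question imports the older sklearn.cross_validation module), and averaging the per-class AUCs with equal weight is a choice made here for simplicity.

```python
from __future__ import print_function

import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold  # assumption: sklearn >= 0.18
from sklearn.preprocessing import LabelBinarizer
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# One 0/1 indicator column per class: shape (n_samples, n_classes).
y_bin = LabelBinarizer().fit_transform(y)

# Stratified folds keep every class present in each test fold, so each
# per-class binary AUC below is well defined.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_aucs = []
for train_idx, test_idx in cv.split(X, y):
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    # predict_proba columns follow clf.classes_, which is the same sorted
    # class order that LabelBinarizer uses for its columns.
    proba = clf.predict_proba(X[test_idx])
    # One binary AUC per class (that class vs. the rest), averaged equally.
    per_class = [
        roc_auc_score(y_bin[test_idx, k], proba[:, k])
        for k in range(y_bin.shape[1])
    ]
    fold_aucs.append(np.mean(per_class))

fold_aucs = np.array(fold_aucs)
print(fold_aucs)
print("mean one-vs-rest AUC: %.3f" % fold_aucs.mean())
```

Each entry of fold_aucs is the unweighted mean of three binary AUCs for one fold, so it plays the role that scoring="roc_auc" was expected to play in the original cross_val_score call.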

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/39114463/
