python - ValueError: Classification metrics can't handle a mix of unknown and binary targets

Tags: python scikit-learn

I am trying to create a multiclass, multilabel confusion matrix. I first wrote some simple code to test the idea, and it works fine!

import matplotlib
matplotlib.use('Agg')
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import numpy as np


y_true = np.array([[0,0,1], [1,1,0],[0,1,0], [0,0,1]])
y_pred = np.array([[3.11640739e-01, 7.03224633e-03, 5.24131523e-04], [1,0,1],[0,0,0],[0,1,0]])

labels = ["A", "B", "C"]

conf_mat_dict={}

for label_col in range(len(labels)):
    y_true_label = y_true[:, label_col]
    y_pred_label = y_pred[:, label_col].astype(int)
    print(len(y_pred_label))
    print(y_pred_label)
    conf_mat_dict[labels[label_col]] = confusion_matrix(y_pred=y_pred_label, y_true=y_true_label)


for label, matrix in conf_mat_dict.items():
    print("Confusion matrix for label {}:".format(label))
    print(matrix)

Now I am trying to integrate this code into my classifier, but I get this error:

Traceback (most recent call last):
  File "module/xvisionkeras.py", line 137, in <module>
    conf_mat_dict[all_labels[label_col]] = confusion_matrix(y_pred=y_pred_label, y_true=y_true_label)
  File "/home/.local/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 253, in confusion_matrix
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/home/.local/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 81, in _check_targets
    "and {1} targets".format(type_true, type_pred))
ValueError: Classification metrics can't handle a mix of unknown and binary targets

This is what y_true and y_pred look like in my classifier:

y_true = [[0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0]
 [0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0]
 [0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0]
 [0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0]
 [1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0]]
y_pred = [[3.11640739e-01 7.03224633e-03 5.24131523e-04 3.86616620e-07
  1.76620641e-04 2.04237878e-07 1.33097637e-02 9.73195732e-02
  4.40102362e-04 1.43901054e-02 1.97768782e-06 3.82719515e-03
  1.00047495e-02 2.02647328e-07]
 [9.50563997e-02 3.03714332e-04 2.36625488e-06 9.52693328e-13
  7.04080634e-08 2.23670063e-12 4.18162963e-04 8.28180760e-02
  3.15815419e-06 7.02972582e-04 8.19261742e-11 1.01527738e-04
  7.73170555e-04 1.85218228e-13]
 [2.22699329e-01 2.33753794e-03 9.19468803e-05 7.17655990e-10
  3.76326443e-06 2.58434874e-09 3.70667153e-03 1.12193748e-01
  1.60316195e-04 3.16509278e-03 1.77856236e-08 7.23963138e-04
  5.58568537e-03 3.64327679e-10]
 [2.01257914e-01 2.55549047e-03 8.14868326e-05 8.32152924e-09
  2.27710298e-05 5.02339681e-09 6.01076195e-03 6.39715046e-02
  4.62430944e-05 8.06804933e-03 8.95162486e-08 1.28999283e-03
  2.87817954e-03 2.70706768e-09]
 [1.99281245e-01 5.05847216e-04 4.23961183e-06 6.71859304e-11
  8.09664698e-07 4.37779007e-10 1.80601899e-03 2.89123088e-01
  6.22663310e-06 9.77680553e-04 3.53975094e-09 2.74123857e-04
  3.29167116e-03 3.53774961e-11]]

Can someone point out what I am doing wrong? I have been stuck on this for ages!

Best answer

You are trying to compare integer and non-integer values. (1 == 0.99) will never match unless you round the non-integer values.

from sklearn.metrics import confusion_matrix

y_true, y_pred = [0, 1], [0.7, 0.3]
confusion_matrix(y_true, y_pred)
>> ValueError: Classification metrics can't handle a mix of binary and continuous targets
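To see which target type scikit-learn infers for each array, the helper `sklearn.utils.multiclass.type_of_target` can be called directly. The following is a minimal diagnostic sketch, not part of the original answer; the array names are made up for illustration, and the object-dtype case is only an assumption about how the "unknown" type in the question's traceback might arise.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Integer 0/1 labels are recognised as a binary target.
int_labels = np.array([0, 1, 1, 0])
# Floats that are all whole numbers are still treated as binary.
float_labels = np.array([0.0, 1.0, 1.0, 0.0])
# Raw probabilities are a continuous target, which classification metrics reject.
probabilities = np.array([0.7, 0.3, 0.9, 0.1])
# Assumption: an object-dtype column (for example, one sliced out of a
# mixed-type table) is reported as "unknown".
object_labels = np.array([0.0, 1.0, 1.0, 0.0], dtype=object)

for name, values in [
    ("int_labels", int_labels),
    ("float_labels", float_labels),
    ("probabilities", probabilities),
    ("object_labels", object_labels),
]:
    # Print the dtype next to the inferred target type for comparison.
    print(name, values.dtype, type_of_target(values))
```

If `y_true_label` in the classifier reports "unknown" here, checking its `dtype` and casting it with `.astype(int)` before calling `confusion_matrix` is a reasonable first thing to try.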

Rounding y_pred fixes that. However, I don't think scikit-learn's confusion_matrix supports multiclass-multilabel targets. You will get:

>> ValueError: multilabel-indicator is not supported
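One way around this limitation is to build one binary confusion matrix per label, as the question already does, after casting the true labels to integers and thresholding the predicted scores. The sketch below is illustrative only: the data, the `y_true_raw` and `y_score` names, and the 0.5 threshold are assumptions, not values from the question. It also uses `multilabel_confusion_matrix`, which exists in scikit-learn 0.21 and later and may not be available in the older installation shown in the traceback.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, multilabel_confusion_matrix

# Hypothetical stand-ins for the question's data: true labels stored with
# object dtype, predictions as raw scores (for example, sigmoid outputs).
y_true_raw = np.array(
    [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    dtype=object,
)
y_score = np.array(
    [[0.31, 0.01, 0.80], [0.90, 0.20, 0.60], [0.10, 0.40, 0.05], [0.20, 0.70, 0.30]]
)
labels = ["A", "B", "C"]

# Cast the true labels to a plain integer array.
y_true = y_true_raw.astype(int)
# Threshold the scores; 0.5 is an assumed cut-off. Note that a bare
# .astype(int) would truncate every score below 1.0 to 0.
threshold = 0.5
y_pred = (y_score >= threshold).astype(int)

# One 2x2 matrix per label; labels=[0, 1] keeps the shape fixed even when a
# column contains only one class.
conf_mat_dict = {}
for col, name in enumerate(labels):
    conf_mat_dict[name] = confusion_matrix(
        y_true[:, col], y_pred[:, col], labels=[0, 1]
    )

# The same matrices in a single call (scikit-learn >= 0.21), shape (3, 2, 2).
per_label = multilabel_confusion_matrix(y_true, y_pred)

for name, matrix in conf_mat_dict.items():
    print("Confusion matrix for label {}:".format(name))
    print(matrix)
```

Each matrix is laid out as [[TN, FP], [FN, TP]] for that label.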

However, you can compute other metrics, such as the accuracy score (higher is better) or the Hamming loss (lower is better):

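import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss

# Random 0/1 indicator matrices standing in for real labels and rounded predictions
y_true = np.random.choice(2, (5, 14), p=[0.7, 0.3])
y_pred_round = np.random.choice(2, (5, 14), p=[0.7, 0.3])

# Accuracy score (subset accuracy): fraction of rows where every label matches
accuracy_score(y_true, y_pred_round, normalize=True, sample_weight=None)
# Equivalent manual computation
((y_pred_round == y_true).all(axis=1).sum() / y_pred_round.shape[0])

# Hamming loss: fraction of individual labels that are wrong
hamming_loss(y_true, y_pred_round)
# Equivalent manual computation
scores = (y_pred_round != y_true).sum(axis=1)
numerator = scores.sum()
denominator = y_true.shape[0] * y_true.shape[1]  # all samples times all labels
hl = (numerator / denominator)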

What else can you do with (y_pred_round != y_true) and (y_pred_round == y_true)? Sum them (.sum(axis=1), .sum(axis=0)), rearrange them, divide them, and define other metrics.
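As a rough illustration of that idea, the sketch below derives per-sample and per-label counts from the elementwise comparisons using only NumPy. The small arrays are made-up example data, and names such as `errors_per_label` and `label_accuracy` are hypothetical, chosen just for this example.

```python
import numpy as np

# Made-up indicator matrices: 4 samples, 3 labels.
y_true = np.array([[0, 0, 1], [1, 1, 0], [0, 1, 0], [0, 0, 1]])
y_pred_round = np.array([[0, 0, 1], [1, 0, 1], [0, 0, 0], [0, 1, 0]])

# Boolean mask of every wrongly predicted label.
mismatch = y_pred_round != y_true

# Summing along axis=1 counts wrong labels per sample,
# summing along axis=0 counts wrong predictions per label.
errors_per_sample = mismatch.sum(axis=1)
errors_per_label = mismatch.sum(axis=0)

# Per-label accuracy: share of samples where that label was predicted correctly.
label_accuracy = 1 - errors_per_label / y_true.shape[0]

# Per-label true positives, false positives and false negatives.
tp = ((y_pred_round == 1) & (y_true == 1)).sum(axis=0)
fp = ((y_pred_round == 1) & (y_true == 0)).sum(axis=0)
fn = ((y_pred_round == 0) & (y_true == 1)).sum(axis=0)

print("errors per sample:", errors_per_sample)
print("errors per label:", errors_per_label)
print("per-label accuracy:", label_accuracy)
print("TP, FP, FN per label:", tp, fp, fn)
```

From `tp`, `fp`, and `fn`, per-label precision and recall follow by division, with care needed for labels whose denominator is zero.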

Regarding python - ValueError: Classification metrics can't handle a mix of unknown and binary targets, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54018742/
