python - Keras model.fit log 和 Sklearn.metrics.confusion_matrix 报告的验证准确性指标彼此不匹配

问题在于，我从 Keras model.fit 历史记录中获得的报告验证准确性值明显高于我的验证准确性指标从 sklearn.metrics 函数获取。

我从 model.fit 获得的结果总结如下:

Last Validation Accuracy: 0.81
Best Validation Accuracy: 0.84

sklearn 的结果(标准化)非常不同:

True Negatives: 0.78
True Positives: 0.77

Validation Accuracy = (TP + TN) / (TP + TN + FP + FN) = 0.775 

(see confusion matrix below for reference)

Edit: this calculation is incorrect, because one can not 
use the normalized values to calculate the accuracy, since 
it does not account for differences in the total absolute 
number of points in the dataset. Thanks to the comment by desertnaut

以下是 model.fit 历史记录中验证准确度数据的图表:
这是 sklearn 生成的混淆矩阵:

我认为这个问题与这个问题有点相似Sklearn metrics values are very different from Keras values 但我已经检查过这两种方法都在同一数据池上进行验证，因此这个答案可能不足以满足我的情况。

还有，这个问题Keras binary accuracy metric gives too high accuracy似乎解决了二元交叉熵影响多类问题的方式的一些问题，但在我的情况下它可能不适用，因为它是一个真正的二元分类问题。

以下是使用的命令:

模型定义:

inputs = Input((Tx, ))
n_e = 30
embeddings = Embedding(n_x, n_e, input_length=Tx)(inputs)
out = Bidirectional(LSTM(32, recurrent_dropout=0.5, return_sequences=True))(embeddings)
out = Bidirectional(LSTM(16, recurrent_dropout=0.5, return_sequences=True))(out)
out = Bidirectional(LSTM(16, recurrent_dropout=0.5))(out)
out = Dense(3, activation='softmax')(out)
modelo = Model(inputs=inputs, outputs=out)
modelo.summary()

模型摘要:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 100)               0         
_________________________________________________________________
embedding (Embedding)        (None, 100, 30)           86610     
_________________________________________________________________
bidirectional (Bidirectional (None, 100, 64)           16128     
_________________________________________________________________
bidirectional_1 (Bidirection (None, 100, 32)           10368     
_________________________________________________________________
bidirectional_2 (Bidirection (None, 32)                6272      
_________________________________________________________________
dense (Dense)                (None, 3)                 99        
=================================================================
Total params: 119,477
Trainable params: 119,477
Non-trainable params: 0
_________________________________________________________________

模型编译:

mymodel.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])

模型拟合调用:

num_epochs = 30
myhistory = mymodel.fit(X_pad, y, epochs=num_epochs, batch_size=50, validation_data=[X_val_pad, y_val_oh], shuffle=True, callbacks=callbacks_list)

模型拟合日志:

Train on 505 samples, validate on 127 samples

Epoch 1/30
500/505 [============================>.] - ETA: 0s - loss: 0.6135 - acc: 0.6667
[...]
Epoch 10/30
500/505 [============================>.] - ETA: 0s - loss: 0.1403 - acc: 0.9633
Epoch 00010: val_acc improved from 0.77953 to 0.79528, saving model to modelo-10-melhor-modelo.hdf5
505/505 [==============================] - 21s 41ms/sample - loss: 0.1393 - acc: 0.9637 - val_loss: 0.5203 - val_acc: 0.7953
Epoch 11/30
500/505 [============================>.] - ETA: 0s - loss: 0.0865 - acc: 0.9840
Epoch 00011: val_acc did not improve from 0.79528
505/505 [==============================] - 21s 41ms/sample - loss: 0.0860 - acc: 0.9842 - val_loss: 0.5257 - val_acc: 0.7953
Epoch 12/30
500/505 [============================>.] - ETA: 0s - loss: 0.0618 - acc: 0.9900
Epoch 00012: val_acc improved from 0.79528 to 0.81102, saving model to modelo-10-melhor-modelo.hdf5
505/505 [==============================] - 21s 42ms/sample - loss: 0.0615 - acc: 0.9901 - val_loss: 0.5472 - val_acc: 0.8110
Epoch 13/30
500/505 [============================>.] - ETA: 0s - loss: 0.0415 - acc: 0.9940
Epoch 00013: val_acc improved from 0.81102 to 0.82152, saving model to modelo-10-melhor-modelo.hdf5
505/505 [==============================] - 21s 42ms/sample - loss: 0.0413 - acc: 0.9941 - val_loss: 0.5853 - val_acc: 0.8215
Epoch 14/30
500/505 [============================>.] - ETA: 0s - loss: 0.0443 - acc: 0.9933
Epoch 00014: val_acc did not improve from 0.82152
505/505 [==============================] - 21s 42ms/sample - loss: 0.0453 - acc: 0.9921 - val_loss: 0.6043 - val_acc: 0.8136
Epoch 15/30
500/505 [============================>.] - ETA: 0s - loss: 0.0360 - acc: 0.9933
Epoch 00015: val_acc improved from 0.82152 to 0.84777, saving model to modelo-10-melhor-modelo.hdf5
505/505 [==============================] - 21s 42ms/sample - loss: 0.0359 - acc: 0.9934 - val_loss: 0.5663 - val_acc: 0.8478
[...]
Epoch 30/30
500/505 [============================>.] - ETA: 0s - loss: 0.0039 - acc: 1.0000
Epoch 00030: val_acc did not improve from 0.84777
505/505 [==============================] - 20s 41ms/sample - loss: 0.0039 - acc: 1.0000 - val_loss: 0.8340 - val_acc: 0.8110

sklearn 的混淆矩阵:

from sklearn.metrics import confusion_matrix
conf_mat = confusion_matrix(y_values, predicted_values)

预测值和黄金值确定如下:

preds = mymodel.predict(X_val)
preds_ints = [[el] for el in np.argmax(preds, axis=1)]
values_pred = tokenizer_y.sequences_to_texts(preds_ints)
values_gold = tokenizer_y.sequences_to_texts(y_val)

最后，我想补充一点，我已经打印出了数据和所有预测错误，并且我相信 sklearn 值更可靠，因为它们似乎与我打印保存的预测得到的结果相匹配”最佳”模型。

另一方面，我无法理解指标为何会如此不同。由于它们都是众所周知的软件，因此我断定我是犯错误的人，但我无法确定错误的位置或方式。

最佳答案

你的问题不恰当；正如已经评论过的，您尚未计算 scikit-learn 模型的实际准确性，因此您似乎将苹果与橙子进行比较。根据归一化混淆矩阵计算 (TP + TN)/2 不能给出准确度。这是使用玩具数据的简单演示，改编自 docs 的 plot_confusion_matrix :

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# toy data
y_true = [0, 1, 0, 1, 0, 0, 0, 1]
y_pred =  [1, 1, 1, 0, 1, 1, 0, 1]
class_names=[0,1]

# plot_confusion_matrix function

def plot_confusion_matrix(y_true, y_pred, classes,
                          normalize=False,
                          title=None,
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if not title:
        if normalize:
            title = 'Normalized confusion matrix'
        else:
            title = 'Confusion matrix, without normalization'

    # Compute confusion matrix
    cm = confusion_matrix(y_true, y_pred)

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    fig, ax = plt.subplots()
    im = ax.imshow(cm, interpolation='nearest', cmap=cmap)
    ax.figure.colorbar(im, ax=ax)
    # We want to show all ticks...
    ax.set(xticks=np.arange(cm.shape[1]),
           yticks=np.arange(cm.shape[0]),
           # ... and label them with the respective list entries
           xticklabels=classes, yticklabels=classes,
           title=title,
           ylabel='True label',
           xlabel='Predicted label')

    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
             rotation_mode="anchor")

    # Loop over data dimensions and create text annotations.
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], fmt),
                    ha="center", va="center",
                    color="white" if cm[i, j] > thresh else "black")
    fig.tight_layout()
    return ax

计算归一化混淆矩阵得出:

plot_confusion_matrix(y_true, y_pred, classes=class_names, normalize=True)
# result:
Normalized confusion matrix
[[ 0.2         0.8       ]
 [ 0.33333333  0.66666667]]

根据您的不正确理由，准确度应该是:

(0.67 + 0.2)/2
# 0.435

(注意在归一化矩阵中，行如何添加到 100%，这在完整的混淆矩阵中不会发生)

但是现在让我们看看非标准化混淆矩阵的真实准确率是多少:

plot_confusion_matrix(y_true, y_pred, classes=class_names) # normalize=False by default
# result
Confusion matrix, without normalization
[[1 4]
 [1 2]]

根据准确度的定义(TP + TN)/(TP + TN + FP + FN)，我们得到:

(1+2)/(1+2+4+1)
# 0.375

当然，我们不需要混淆矩阵来获得像准确度这样基本的东西；正如评论中已经建议的，我们可以简单地使用 scikit-learn 的内置 accuracy_score 方法:

from sklearn.metrics import accuracy_score
accuracy_score(y_true, y_pred)
# 0.375

毫不奇怪，这与我们根据混淆矩阵直接计算的结果一致。

底线:

如果存在特定方法(例如 accuracy_score)，那么使用它们绝对比临时灵感更好，尤其当某些事情看起来不正确时(例如差异) Keras 和 scikit-learn 报告的准确度之间的差异)
在本例中，实际准确度低于您自己计算的准确度，这一事实显然并不能说明您报告的具体问题
如果即使在计算出正确的数据准确度后，与 Keras 的差异仍然存在，请不要根据新情况更改问题，因为这会使答案无效，尽管事实上它突出显示了您方法中的错误点 - 请提出一个新问题

关于python - Keras model.fit log 和 Sklearn.metrics.confusion_matrix 报告的验证准确性指标彼此不匹配，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/57763363/

python - Keras model.fit log 和 Sklearn.metrics.confusion_matrix 报告的验证准确性指标彼此不匹配

上一篇：python - 对于神经网络(pytorch)中的每一层，应该有多少个偏差？

下一篇：python - 如何使用 KerasClassifier 使用 LSTM 进行序列分类