python - How to determine an overfitted model based on loss, precision and recall

I wrote an LSTM network with Keras (code below):

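    import numpy as np
    import pandas as pd
    import keras_metrics  # third-party keras-metrics package
    from keras import optimizers
    from keras.layers import LSTM, Dense, Dropout, Flatten, LeakyReLU
    from keras.models import Sequential
    from sklearn.model_selection import train_test_split
    from sklearn.utils import shuffle

    df = pd.read_csv("../data/training_data.csv")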

    # Group by and pivot the data
    group_index = df.groupby('group').cumcount()
    data = (df.set_index(['group', group_index])
            .unstack(fill_value=0).stack())

    # getting np array of the data and labeling
    # on the label group we take the first label because it is the same for all
    target = np.array(data['label'].groupby(level=0).apply(lambda x: [x.values[0]]).tolist())
    data = data.loc[:, data.columns != 'label']
    data = np.array(data.groupby(level=0).apply(lambda x: x.values.tolist()).tolist())

    # shuffle the training set
    data, target = shuffle(data, target)

    # split data to train and test
    x_train, x_test, y_train, y_test = train_test_split(data, target, test_size=0.2, random_state=4)

    # ADAM Optimizer with learning rate decay
    opt = optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0001)

    # build the model
    model = Sequential()

    num_features = data.shape[2]
    num_samples = data.shape[1]

    model.add(LSTM(8, batch_input_shape=(None, num_samples, num_features), return_sequences=True, activation='sigmoid'))
    model.add(LeakyReLU(alpha=.001))
    model.add(Dropout(0.2))
    model.add(LSTM(4, return_sequences=True, activation='sigmoid'))
    model.add(LeakyReLU(alpha=.001))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))

    # f1 is a custom metric function that is not shown in the question
    model.compile(loss='binary_crossentropy', optimizer=opt,
                  metrics=['accuracy', keras_metrics.precision(), keras_metrics.recall(), f1])

    model.summary()


    # Training, getting the results history for plotting
    history = model.fit(x_train, y_train, epochs=3000, validation_data=(x_test, y_test))

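The monitored metrics are loss, accuracy, precision, recall and f1 score.

I noticed that the validation loss starts to climb at around 300 epochs, so I figured the model is overfitting. However, recall is still climbing and precision is slightly improving.

Why is that? Is my model overfitted?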

Best Answer

the validation loss metric start to climb around 300 epochs (...) recall is still climbing and precision is slightly improving. (...) Why is that?

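Precision and recall measure how well the classifier performs at predicting class labels. The model loss, on the other hand, is the cross entropy, i.e. the error in the predicted class probability:

$$L = -\left(y \log(p) + (1 - y)\log(1 - p)\right)$$

where

y = true class label (0 or 1)
p = predicted probability that the label is 1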
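A minimal sketch of the per-observation loss above in plain Python, assuming y is the true class label (0 or 1), p is the predicted probability of class 1, and the natural logarithm is used (as Keras does). The function name `binary_cross_entropy` is only illustrative.

```python
import math


def binary_cross_entropy(y: int, p: float) -> float:
    """Cross-entropy loss for a single observation.

    y -- true class label (0 or 1)
    p -- predicted probability of class 1, strictly between 0 and 1
    """
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# True label 1, predicted with probability 0.9 and then with 0.6
print(round(binary_cross_entropy(1, 0.9), 2))
print(round(binary_cross_entropy(1, 0.6), 2))
```

The lower the probability assigned to the true class, the larger the loss, even if that probability is still high enough for the label to come out right.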

For example, for a single observation, the model's (softmax) output could look like this at different epochs:

    # epoch 300
    output = [0.1, 0.9]  =>  argmax(output) = 1  (class label 1)
    loss = -(1 * log(0.9)) = 0.11

    # epoch 500
    output = [0.4, 0.6]  =>  argmax(output) = 1  (class label 1)
    loss = -(1 * log(0.6)) = 0.51

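In both cases the precision and recall metrics stay the same (the class label is still predicted correctly), but the model loss has increased. In general terms, the model has become "less certain" about its prediction, but the prediction is still correct.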
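To see the same effect on more than one observation, here is a small sketch that scores four made-up observations at two hypothetical epochs. The labels and probabilities are invented for illustration, and a 0.5 threshold is assumed for turning probabilities into class labels; the helper names `bce` and `evaluate` are hypothetical.

```python
import math


def bce(y, p):
    # Binary cross-entropy for one observation (natural log); needs 0 < p < 1
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def evaluate(y_true, probs, threshold=0.5):
    """Return (precision, recall, mean loss) for one set of predicted probabilities."""
    # Turn probabilities into class labels with a fixed threshold
    y_pred = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 1)
    fp = sum(1 for t, q in zip(y_true, y_pred) if t == 0 and q == 1)
    fn = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # The loss uses the raw probabilities, not the thresholded labels
    mean_loss = sum(bce(t, p) for t, p in zip(y_true, probs)) / len(y_true)
    return precision, recall, mean_loss


# Made-up labels and predicted probabilities of class 1 at two epochs
y_true = [1, 1, 0, 0]
epoch_300 = [0.9, 0.8, 0.2, 0.1]    # correct and confident
epoch_500 = [0.6, 0.55, 0.45, 0.4]  # still correct, but less confident

for name, probs in [("epoch 300", epoch_300), ("epoch 500", epoch_500)]:
    precision, recall, loss = evaluate(y_true, probs)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} loss={loss:.3f}")
```

Both epochs give precision and recall of 1.00, while the mean loss rises from about 0.16 to about 0.55, because every prediction stays on the correct side of the threshold but moves closer to it.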

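Note that in your model the loss is computed over all observations, not just a single one; I restricted the discussion to one observation for simplicity. The loss formula extends to n > 1 observations simply by taking the mean of the per-observation losses.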
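A vectorized NumPy sketch of that mean over n observations, assuming `y_true` and `p` have the same shape. The function name and the small `eps` clip (which keeps `log` finite when a probability is exactly 0 or 1) are illustrative assumptions, not taken from the question's code.

```python
import numpy as np


def mean_binary_cross_entropy(y_true, p, eps=1e-7):
    """Average binary cross-entropy over n observations."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so np.log never receives 0
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    per_observation = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return per_observation.mean()


# Hypothetical batch of three observations
print(mean_binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.6]))
```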

is my model overfitted?

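To determine this, you have to compare the training loss with the validation loss; you cannot tell from the validation loss alone. If the training loss keeps decreasing while the validation loss increases, your model is overfitting.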
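In Keras, `model.fit` returns a `History` object whose `history` dict holds the per-epoch `loss` and, when `validation_data` is passed, `val_loss`. A rough sketch of comparing the two curves might look like this; the helper `overfitting_onset` and its rule (training loss lower at the last epoch than at the epoch of minimum validation loss) are hypothetical illustrations, not a Keras feature.

```python
def overfitting_onset(train_loss, val_loss):
    """Return the 0-based epoch with the lowest validation loss if the training
    loss at the last epoch is lower than at that epoch, otherwise None."""
    best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i])
    if best_epoch == len(val_loss) - 1:
        return None  # validation loss is still at its minimum on the last epoch
    # Training loss kept improving after validation loss stopped improving
    if train_loss[-1] < train_loss[best_epoch]:
        return best_epoch
    return None


# With a real run these would be history.history['loss'] and
# history.history['val_loss']; here they are made-up numbers.
train = [0.70, 0.55, 0.45, 0.38, 0.33, 0.29]
val = [0.72, 0.60, 0.52, 0.50, 0.53, 0.58]
print(overfitting_onset(train, val))
```

Rather than running all 3000 epochs and inspecting the curves afterwards, Keras' `EarlyStopping` callback with `monitor='val_loss'` and a `patience` value can stop training once the validation loss stops improving.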

For the question "python - How to determine an overfitted model based on loss, precision and recall", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52831351/
