python - Training a CNN with transfer learning in Keras - image input doesn't work, but vector input does

I'm trying to do transfer learning in Keras. I set up a ResNet50 network, marked as non-trainable, followed by a few extra layers:

from keras.applications import ResNet50
from keras.layers import Dense, Dropout
from keras.models import Sequential

# Image input
model = Sequential()
model.add(ResNet50(include_top=False, pooling='avg'))  # output is 2048
model.add(Dropout(0.05))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.15))
model.add(Dense(512, activation='relu'))
model.add(Dense(7, activation='softmax'))
model.layers[0].trainable = False  # freeze the ResNet50 base
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

I then create the input data x_batch using the ResNet50 preprocess_input function, along with one-hot encoded labels y_batch, and fit the model like so:

model.fit(x_batch,
          y_batch,
          epochs=nb_epochs,
          batch_size=64,
          shuffle=True,
          validation_split=0.2,
          callbacks=[lrate])

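After 10 or so epochs, the training accuracy is close to 100%, but the validation accuracy actually drops from around 50% to 30%, and the validation loss increases steadily.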

However, if I create a network containing only the last few layers:

# Vector input
model2 = Sequential()
model2.add(Dropout(0.05, input_shape=(2048,)))
model2.add(Dense(512, activation='relu'))
model2.add(Dropout(0.15))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(7, activation='softmax'))
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model2.summary()

and feed it the output of the ResNet50 prediction:

resnet = ResNet50(include_top=False, pooling='avg')
x_batch = resnet.predict(x_batch)

then the validation accuracy reaches around 85%... What is going on? Why doesn't the image-input approach work?

Update:

This problem is really strange. If I swap ResNet50 for VGG19, it seems to work fine.

Best answer

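After a lot of googling, I found that the problem is related to the batch normalization (BN) layers in ResNet. VGGNet has no batch normalization layers, which is why it works for that topology.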
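The claim about the two architectures can be checked directly. The following is a minimal sketch, assuming a Keras installation where `keras.applications` provides `ResNet50` and `VGG19`; the helper name `count_batch_norm_layers` is made up for illustration, and `weights=None` is used because only the architecture matters here, not the pre-trained weights.

```python
from keras.applications import ResNet50, VGG19
from keras.layers import BatchNormalization


def count_batch_norm_layers(model):
    """Return how many layers of the model are BatchNormalization layers."""
    return sum(isinstance(layer, BatchNormalization) for layer in model.layers)


# weights=None builds the architecture without downloading ImageNet weights.
resnet = ResNet50(include_top=False, pooling='avg', weights=None)
vgg = VGG19(include_top=False, pooling='avg', weights=None)

print('ResNet50 BN layers:', count_batch_norm_layers(resnet))
print('VGG19 BN layers:', count_batch_norm_layers(vgg))
```

A non-zero count for ResNet50 and a zero count for VGG19 is consistent with the behaviour described in the question's update.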

There is a pull request against Keras that addresses this, and it explains the issue in more detail:

Assume we use one of the pre-trained CNNs of Keras and we want to fine-tune it. Unfortunately, we get no guarantees that the mean and variance of our new dataset inside the BN layers will be similar to the ones of the original dataset. As a result, if we fine-tune the top layers, their weights will be adjusted to the mean/variance of the new dataset. Nevertheless, during inference the top layers will receive data which are scaled using the mean/variance of the original dataset. This discrepancy can lead to reduced accuracy.

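This means that during training the frozen BN layers normalize with the statistics of the new training data, but during validation they use their original stored mean and variance. As far as I can tell, the fix is to make the frozen BN layers use the same mean and variance during training as they do at inference.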
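The mismatch can be illustrated without Keras at all. The sketch below is a simplified NumPy model of a BN layer, not Keras's implementation: the learned scale and offset are left out, and the stored statistics and the "new dataset" activations are made-up values chosen only to show the effect.

```python
import numpy as np


def batch_norm(x, mean, var, eps=1e-3):
    """Normalize x per feature with the given mean and variance."""
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)

# Hypothetical statistics stored in a pre-trained BN layer (original dataset).
moving_mean = np.zeros(4)
moving_var = np.ones(4)

# Hypothetical activations from a new dataset with a different distribution.
new_batch = rng.normal(loc=3.0, scale=2.0, size=(256, 4))

# Training mode: normalize with the statistics of the current batch.
train_out = batch_norm(new_batch, new_batch.mean(axis=0), new_batch.var(axis=0))

# Inference mode: normalize with the stored statistics of the original dataset.
infer_out = batch_norm(new_batch, moving_mean, moving_var)

print('training-mode mean: ', train_out.mean(axis=0).round(2))
print('inference-mode mean:', infer_out.mean(axis=0).round(2))
```

The layers on top are fitted to the centred, unit-scale values seen in training mode, but at validation time they receive the shifted and wider inference-mode values, which is the discrepancy the pull request describes.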

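The workaround is to precompute the ResNet outputs. In fact, this greatly reduces training time, since that part of the computation is not repeated every epoch.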
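Put together, the workaround might look like the sketch below. It reuses the layer sizes from the question; the function names `extract_features` and `build_head` and their parameters are made up for illustration, and it assumes a Keras version in which `Sequential` accepts an `Input` object as its first element.

```python
from keras.applications import ResNet50
from keras.layers import Dense, Dropout, Input
from keras.models import Sequential


def extract_features(images, weights='imagenet', batch_size=64):
    """Run preprocessed images through a headless ResNet50 once.

    predict() runs in inference mode, so the BN layers use their stored
    statistics for training and validation samples alike.
    """
    resnet = ResNet50(include_top=False, pooling='avg', weights=weights)
    return resnet.predict(images, batch_size=batch_size)


def build_head(num_classes=7, feature_dim=2048):
    """Build the small classifier that is trained on the precomputed features."""
    model = Sequential([
        Input(shape=(feature_dim,)),
        Dropout(0.05),
        Dense(512, activation='relu'),
        Dropout(0.15),
        Dense(512, activation='relu'),
        Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```

With these helpers, the features would be computed once with `extract_features(x_batch)` and then passed to `build_head().fit(...)` together with `y_batch`, using the same arguments as the `fit` call in the question.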

This question and answer originally appeared on Stack Overflow: https://stackoverflow.com/questions/55546246/
