tensorflow - How to freeze batch-norm layers during transfer learning

Tags: tensorflow keras neural-network tensorflow2.0 batch-normalization

I'm following the Transfer learning and fine-tuning guide on the official TensorFlow website. It states that during fine-tuning, batch normalization layers should be kept in inference mode:

Important notes about BatchNormalization layer

Many image models contain BatchNormalization layers. That layer is a special case on every imaginable count. Here are a few things to keep in mind.

  • BatchNormalization contains 2 non-trainable weights that get updated during training. These are the variables tracking the mean and variance of the inputs.
  • When you set bn_layer.trainable = False, the BatchNormalization layer will run in inference mode, and will not update its mean & variance statistics. This is not the case for other layers in general, as weight trainability & inference/training modes are two orthogonal concepts. But the two are tied in the case of the BatchNormalization layer.
  • When you unfreeze a model that contains BatchNormalization layers in order to do fine-tuning, you should keep the BatchNormalization layers in inference mode by passing training=False when calling the base model. Otherwise the updates applied to the non-trainable weights will suddenly destroy what the model has learned.

You'll see this pattern in action in the end-to-end example at the end of this guide.
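In code, the pattern the guide is describing looks roughly like this (the backbone, input shape, and classification head below are my illustrative assumptions, not taken from the guide):

import tensorflow as tf

# Pretrained backbone without its classification head (illustrative choice)
base_model = tf.keras.applications.EfficientNetB0(include_top=False, weights='imagenet')
base_model.trainable = True  # unfreeze the weights for fine-tuning...

inputs = tf.keras.Input(shape=(224, 224, 3))
# ...but call the base model with training=False so its BatchNormalization
# layers keep using their learned moving statistics (inference mode)
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)  # 10 classes, illustrative
model = tf.keras.Model(inputs, outputs)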

Yet other sources, such as this article (titled Transfer Learning with ResNet), say something completely different:

for layer in resnet_model.layers:
    if isinstance(layer, BatchNormalization):
        layer.trainable = True
    else:
        layer.trainable = False

In any case, I know there is a difference between the training argument and the trainable attribute in TensorFlow.
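As I understand it, the distinction boils down to this (a toy snippet of my own, not from the guide):

import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = tf.random.normal((8, 4))

bn.trainable = False       # trainable: should the optimizer update this layer's weights?
y = bn(x, training=False)  # training: should this call use the current batch statistics
                           # (True) or the stored moving statistics (False)?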

I'm loading my model from a file as follows:

model = tf.keras.models.load_model(path)

I'm unfreezing some top layers (or rather, freezing the rest) this way:

model.trainable = True

# Keep only the layers from index idx upward trainable; freeze everything below
for layer in model.layers:
    if layer not in model.layers[idx:]:
        layer.trainable = False

Now, regarding the batch normalization layers, I can do either:

for layer in model.layers:
    if isinstance(layer, keras.layers.BatchNormalization):
        layer.trainable = False

or:

for layer in model.layers:
    if layer.name.startswith('bn'):
        layer.call(layer.input, training=False)

Which one should I do? And, in the end, is it better to freeze the batch norm layers at all?

Best Answer

I'm not sure about the training vs. trainable difference, but personally I've had good results setting trainable = False.

Now, as to whether to freeze them in the first place: I've had good results with not freezing them. The reasoning is simple: the batch norm layers learn the moving statistics (mean and variance) of the original training data. That data may be cats, dogs, humans, cars, and so on. But when you do transfer learning, you may be moving to a completely different domain, and the moving statistics of this new image domain can be very different from those of the previous dataset.
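You can see those moving statistics in action with a toy snippet like this (made-up data, not my actual setup):

import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = np.random.normal(loc=5.0, scale=2.0, size=(32, 4)).astype('float32')

bn(x, training=False)          # inference mode: moving statistics are untouched
print(bn.moving_mean.numpy())  # still the initial zeros

bn(x, training=True)           # training mode: moving averages drift toward the batch stats
print(bn.moving_mean.numpy())  # now shifted slightly toward ~5.0 (momentum defaults to 0.99)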

By unfreezing these layers while keeping the CNN layers frozen, my model's accuracy improved by 6-7 percentage points (roughly 82% -> 89%). My dataset was very different from the original ImageNet dataset that EfficientNet was trained on.

P.S. Depending on how you plan to run the model after training, I would advise freezing the batch norm layers once training is complete. For some reason, if you run the model online (one image at a time), batch norm gets weird and gives irregular results. Freezing them after training fixed the problem for me.
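Something along these lines, run once training is done (a sketch; `model` stands in for your trained model):

import tensorflow as tf

# After training: put every BatchNormalization layer into inference behaviour
for layer in model.layers:
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        # For BatchNormalization, trainable=False also forces inference mode
        # on subsequent calls (TF 2.x behaviour)
        layer.trainable = False

# Recompile so the trainability change is picked up if you train or evaluate again
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])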

Regarding tensorflow - how to freeze batch-norm layers during transfer learning, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67885869/
