machine-learning - Fine-tuning a pretrained convolutional neural network

Tags: machine-learning neural-network deep-learning conv-neural-network

From what I have read and searched about fine-tuning pretrained networks, it is done in the following two steps (in short):

  1. Freeze the hidden layers, unfreeze the fully connected layers, and train.
  2. Unfreeze both groups of layers and train again (see the sketch after this list).
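For concreteness, here is a minimal sketch of those two steps, assuming tensorflow.keras, a VGG16 base, and a hypothetical binary classification task; the dataset objects (`train_ds`, `val_ds`) and all hyperparameters are placeholders, not recommendations:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

# Pretrained convolutional base (ImageNet weights) without the FC head.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))

# New fully connected head for the target task.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Step 1: freeze the convolutional base and train only the new head.
base.trainable = False
model.compile(optimizer=optimizers.Adam(1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # train_ds/val_ds: your data

# Step 2: unfreeze the base and train everything again, with a much
# smaller learning rate (recompile so the change takes effect).
base.trainable = True
model.compile(optimizer=optimizers.Adam(1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)
```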

My questions are:

  1. Is it enough to perform only the first step?

  2. If I perform only the first step, isn't that the same as the network-as-feature-extractor approach?

(By the network-as-feature-extractor approach I mean using the pretrained network to extract features and classifying them with a conventional machine-learning classification algorithm, as sketched below.)
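As a minimal sketch of that approach, assuming tensorflow.keras for the pretrained network and scikit-learn for the classifier (the random arrays merely stand in for real, preprocessed data):

```python
import numpy as np
import tensorflow as tf
from sklearn.svm import LinearSVC

# Stand-in data; replace with real, preprocessed 224x224 RGB images.
images = np.random.rand(10, 224, 224, 3).astype("float32")
labels = np.random.randint(0, 2, size=10)

# Pretrained network used purely as a frozen feature extractor;
# pooling="avg" yields one 512-D vector per image for VGG16.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   pooling="avg", input_shape=(224, 224, 3))

features = base.predict(images)           # shape: (10, 512)
clf = LinearSVC().fit(features, labels)   # conventional ML classifier on top
```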

Please let me know if you need more information to clarify the question.

Best Answer

There are some issues with your question...

First, you strongly imply that the network consists of only 2 layers, which is a far cry from the networks actually used with fine-tuning in practice today.

Second, what exactly do you mean by "enough" in your first question (enough for what)?

---

In fact, there is enough overlap between the notions of a pretrained model, a feature extractor, and fine-tuning that different people may use the terms involved in not exactly the same ways. One approach, adopted by the Stanford course CNNs for Visual Recognition, is to treat all of these as special cases of something more general called transfer learning; here is a useful excerpt from the relevant section of the aforementioned course, which arguably addresses the spirit (if not the letter) of your questions:

The three major Transfer Learning scenarios look as follows:

  • ConvNet as fixed feature extractor. Take a ConvNet pretrained on ImageNet, remove the last fully-connected layer (this layer’s outputs are the 1000 class scores for a different task like ImageNet), then treat the rest of the ConvNet as a fixed feature extractor for the new dataset. In an AlexNet, this would compute a 4096-D vector for every image that contains the activations of the hidden layer immediately before the classifier. We call these features CNN codes. It is important for performance that these codes are ReLUd (i.e. thresholded at zero) if they were also thresholded during the training of the ConvNet on ImageNet (as is usually the case). Once you extract the 4096-D codes for all images, train a linear classifier (e.g. Linear SVM or Softmax classifier) for the new dataset.
  • Fine-tuning the ConvNet. The second strategy is to not only replace and retrain the classifier on top of the ConvNet on the new dataset, but to also fine-tune the weights of the pretrained network by continuing the backpropagation. It is possible to fine-tune all the layers of the ConvNet, or it’s possible to keep some of the earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network. This is motivated by the observation that the earlier features of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that should be useful to many tasks, but later layers of the ConvNet become progressively more specific to the details of the classes contained in the original dataset. In case of ImageNet for example, which contains many dog breeds, a significant portion of the representational power of the ConvNet may be devoted to features that are specific to differentiating between dog breeds.
  • Pretrained models. Since modern ConvNets take 2-3 weeks to train across multiple GPUs on ImageNet, it is common to see people release their final ConvNet checkpoints for the benefit of others who can use the networks for fine-tuning. For example, the Caffe library has a Model Zoo where people share their network weights.

When and how to fine-tune? How do you decide what type of transfer learning you should perform on a new dataset? This is a function of several factors, but the two most important ones are the size of the new dataset (small or big), and its similarity to the original dataset (e.g. ImageNet-like in terms of the content of images and the classes, or very different, such as microscope images). Keeping in mind that ConvNet features are more generic in early layers and more original-dataset-specific in later layers, here are some common rules of thumb for navigating the 4 major scenarios:

  1. New dataset is small and similar to original dataset. Since the data is small, it is not a good idea to fine-tune the ConvNet due to overfitting concerns. Since the data is similar to the original data, we expect higher-level features in the ConvNet to be relevant to this dataset as well. Hence, the best idea might be to train a linear classifier on the CNN codes.
  2. New dataset is large and similar to the original dataset. Since we have more data, we can have more confidence that we won’t overfit if we were to try to fine-tune through the full network.
  3. New dataset is small but very different from the original dataset. Since the data is small, it is likely best to only train a linear classifier. Since the dataset is very different, it might not be best to train the classifier from the top of the network, which contains more dataset-specific features. Instead, it might work better to train the SVM classifier from activations somewhere earlier in the network.
  4. New dataset is large and very different from the original dataset. Since the dataset is very large, we may expect that we can afford to train a ConvNet from scratch. However, in practice it is very often still beneficial to initialize with weights from a pretrained model. In this case, we would have enough data and confidence to fine-tune through the entire network.
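To make the excerpt's second strategy ("keep some of the earlier layers fixed... only fine-tune some higher-level portion of the network") concrete, here is a minimal sketch assuming tensorflow.keras and a VGG16 base; the choice to unfreeze only the fifth convolutional block, the 10-class head, and the learning rate are illustrative assumptions, not prescriptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))

# Keep the earlier, more generic conv blocks frozen and fine-tune only
# the last block (VGG16 layer names run block1_* ... block5_*).
base.trainable = True
for layer in base.layers:
    if not layer.name.startswith("block5"):
        layer.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),   # hypothetical 10-class task
])
model.compile(optimizer=optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, epochs=5)   # train_ds: your new dataset
```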

Regarding machine-learning - fine-tuning a pretrained convolutional neural network, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47753911/
