machine-learning - Understanding feature extraction with a pretrained convolutional neural network

Tags: machine-learning neural-network keras deep-learning conv-neural-network

In the book Deep Learning with Python by François Chollet (the creator of Keras), section 5.3 (see the companion Jupyter notebook), the following is not clear to me:

Let's put this in practice by using the convolutional base of the VGG16 network, trained on ImageNet, to extract interesting features from our cat and dog images, and then training a cat vs. dog classifier on top of these features.

[...]

There are two ways we could proceed:

  • Running the convolutional base over our dataset, recording its output to a Numpy array on disk, then using this data as input to a standalone densely-connected classifier similar to those you have seen in the first chapters of this book. This solution is very fast and cheap to run, because it only requires running the convolutional base once for every input image, and the convolutional base is by far the most expensive part of the pipeline. However, for the exact same reason, this technique would not allow us to leverage data augmentation at all.
  • Extending the model we have (conv_base) by adding Dense layers on top, and running the whole thing end-to-end on the input data. This allows us to use data augmentation, because every input image is going through the convolutional base every time it is seen by the model. However, for this same reason, this technique is far more expensive than the first one.
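For reference, here is a minimal Keras sketch of the two approaches, along the lines of the book's section 5.3 setup (the 150×150 input size, directory layout, sample counts and hyper-parameters are illustrative, not prescribed by the quoted passage):

```python
import numpy as np
from keras.applications import VGG16
from keras.preprocessing.image import ImageDataGenerator
from keras import models, layers, optimizers

# Pretrained VGG16 convolutional base (no classifier on top)
conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))

# --- Way 1: run conv_base once, record features, train a standalone dense classifier ---
datagen = ImageDataGenerator(rescale=1. / 255)           # no augmentation here

def extract_features(directory, sample_count, batch_size=20):
    # assumes sample_count is a multiple of batch_size
    features = np.zeros((sample_count, 4, 4, 512))       # VGG16 output shape for 150x150 inputs
    labels = np.zeros((sample_count,))
    generator = datagen.flow_from_directory(
        directory, target_size=(150, 150), batch_size=batch_size, class_mode='binary')
    for i, (inputs_batch, labels_batch) in enumerate(generator):
        features[i * batch_size:(i + 1) * batch_size] = conv_base.predict(inputs_batch)
        labels[i * batch_size:(i + 1) * batch_size] = labels_batch
        if (i + 1) * batch_size >= sample_count:
            break
    return features, labels                               # saved once, reused every epoch

# --- Way 2: put Dense layers on top of conv_base and train end-to-end with augmentation ---
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
conv_base.trainable = False                               # only the new Dense layers are trained
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=2e-5),
              metrics=['acc'])
```

In the first way the conv_base forward pass happens exactly once per image; in the second it happens again every time a freshly augmented variant is drawn, which is where the extra cost comes from.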

Why couldn't we augment our data (generate more images from the existing ones), run the convolutional base over the augmented dataset (once), record its output, and then use that data as input to a standalone densely connected classifier?
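Concretely, something along these lines (a rough sketch of what I have in mind, not code from the book; the helper name, the number of copies and the directory layout are made up for illustration):

```python
import numpy as np
from keras.applications import VGG16
from keras.preprocessing.image import ImageDataGenerator

conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))

# Augmenting generator (the same kind of transformations used for the end-to-end way)
augmenter = ImageDataGenerator(
    rescale=1. / 255, rotation_range=40, width_shift_range=0.2, height_shift_range=0.2,
    shear_range=0.2, zoom_range=0.2, horizontal_flip=True)

def precompute_augmented_features(directory, sample_count, copies=5, batch_size=20):
    """Run conv_base once over `copies` augmented passes through the data and stack the features."""
    all_features, all_labels = [], []
    for _ in range(copies):                      # each pass produces differently augmented images
        generator = augmenter.flow_from_directory(
            directory, target_size=(150, 150), batch_size=batch_size, class_mode='binary')
        seen = 0
        for inputs_batch, labels_batch in generator:
            all_features.append(conv_base.predict(inputs_batch))
            all_labels.append(labels_batch)
            seen += inputs_batch.shape[0]
            if seen >= sample_count:
                break
    return np.concatenate(all_features), np.concatenate(all_labels)

# features, labels = precompute_augmented_features('cats_and_dogs_small/train', 2000, copies=5)
# The stored features would then feed a standalone densely connected classifier, as in the first way.
```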

Wouldn't it give similar results to the second alternative, but be faster?

What am I missing?

Best answer

Wouldn't it give similar results to the second alternative but be faster?

Similar results, yes; but would it really be faster?

Chollet's main point is that the second way is more expensive simply because of the larger number of images produced by the augmentation process itself; the first approach

only requires running the convolutional base once for every input image

while for the second one

every input image is going through the convolutional base every time it is seen by the model [...] for this same reason, this technique is far more expensive than the first one

the convolutional base is by far the most expensive part of the pipeline

where "every time it is seen by the model" must be understood as "in every version produced by the augmentation process" (agreed, the wording here could and should have been clearer...).

Your proposed approach does not get around this issue. It is certainly a valid alternative version of the second way, but there is no reason to believe it would actually be faster once you consider the whole end-to-end pipeline (CNN + FC) in both cases...
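To make the cost argument concrete, here is a rough back-of-the-envelope count of conv_base forward passes, the expensive part of the pipeline (the image count, epoch count and number of augmented copies are purely illustrative):

```python
# Illustrative numbers only: 2000 training images, 30 training epochs,
# and K = 5 precomputed augmented copies per image.
n_images, epochs, copies = 2000, 30, 5

passes_way_1    = n_images            # features precomputed once, no augmentation
passes_way_2    = n_images * epochs   # every (freshly augmented) image goes through conv_base each epoch
passes_proposed = n_images * copies   # each of the K stored copies goes through conv_base once

print(passes_way_1, passes_way_2, passes_proposed)   # 2000 60000 10000

# To expose the classifier to a variety of augmented views comparable to the end-to-end run,
# `copies` has to grow toward the number of distinct views seen over all the epochs, at which
# point the conv_base cost of the proposed scheme converges to that of the second way.
```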

UPDATE (after the comment):

Maybe you are right, but I still have a feeling of missing something since the author explicitly wrote that the first method "would not allow us to leverage data augmentation at all".

I think you are simply over-reading things here, although the author could and should have been clearer. As written, Chollet's argument is a bit circular (it can happen to the best of us): since we are "running the convolutional base once for every input image", it follows by definition that we are not using any augmentation... Interestingly enough, the phrasing in the book (p. 146) is slightly different (and less striking):

But for the same reason, this technique won’t allow you to use data augmentation.

And what reason is that? Why, of course, that we feed each image through the convolutional base only once...

In other words, it is not really that we are "not allowed" to augment; rather, we choose not to augment (in order, that is, to be faster)...

For machine-learning - Understanding feature extraction with a pretrained convolutional neural network, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51773119/
