python - Issues with validation and evaluation metrics in TensorFlow when doing transfer learning

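I have come across some odd behaviour when training CNNs with TensorFlow 2.0 and would appreciate any help in solving it. I am doing transfer learning (training only the classification head) with the pre-trained networks available in 'tensorflow.keras.applications', and I have noticed the following: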

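For the first epoch, the validation metrics are always zero, no matter what I do.
When training after the first epoch, the training metrics improve as you would expect, but the validation metrics are essentially random guesses, even when the exact same dataset is used as both the training and the validation dataset. It is as if it isn't using the model being trained to do its evaluation.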

I have tried VGG16, MobileNetV2, and ResNet50V2, and they all show the same behaviour.

The configurations on which I am able to reproduce this are:

Ubuntu 18.04 LTS, Nvidia RTX 2080 Ti (driver version 430.50), CUDA 10.0, TensorFlow-gpu==2.0.0
MacBook Pro, TensorFlow==2.0.0 (CPU)

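Both are running in Conda environments, and I installed TensorFlow with pip. I have put some sample code below to show the essence of my workflow, in case I am doing anything obviously wrong. Any help would be much appreciated, as I am at a loss as to how to fix it.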

import os
import tensorflow as tf

def parse_function(example_proto):
    # Each serialized example holds an integer label and an encoded image.
    image_feature_description = {
        'label': tf.io.FixedLenFeature([], tf.int64),
        'image_raw': tf.io.FixedLenFeature([], tf.string)
    }
    parsed_example = tf.io.parse_single_example(example_proto, image_feature_description)
    # Decode to a float32 RGB tensor; expand_animations=False keeps the result 3-D.
    image = tf.io.decode_image(
        parsed_example['image_raw'],
        channels=3,
        dtype=tf.float32,
        expand_animations=False
    )
    image = tf.image.per_image_standardization(image)
    # One-hot encode the label over the 24 classes.
    label = tf.one_hot(parsed_example['label'], 24, dtype=tf.float32)
    return (image, label)

def load_dataset(TFRecord_dir, record_name):
    # Collect every shard named record_name + '.tfrecords-' plus four characters.
    record_files = tf.io.matching_files(os.path.join(TFRecord_dir, record_name + '.tfrecords-????'))
    shards = tf.data.TFRecordDataset(record_files)
    # The shuffle buffer size equals the number of shard files.
    shards = shards.shuffle(tf.cast(tf.shape(record_files)[0], tf.int64))
    dataset = shards.map(map_func=parse_function)
    dataset = dataset.batch(batch_size=16, drop_remainder=True)
    dataset = dataset.prefetch(16)
    return dataset



base_model = tf.keras.applications.ResNet50V2(
    input_shape=(224, 224, 3),
    weights='imagenet',
    include_top=False
)
base_model.trainable = False

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(24, activation='softmax')
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=[
        tf.keras.metrics.CategoricalAccuracy(),
        tf.keras.metrics.TopKCategoricalAccuracy(),
        tf.keras.metrics.Precision(),
        tf.keras.metrics.Recall()
    ])

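# train_dir (the directory holding the TFRecord shards) is defined elsewhere.
train_dataset = load_dataset(train_dir, 'train')

# The same dataset is deliberately used for both training and validation.
model.fit(train_dataset,
          verbose=1,
          epochs=5,
          validation_data=train_dataset)
model.evaluate(train_dataset)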

Best answer

When training after the first epoch, the training metrics improve as you would expect, but the validation metrics essentially are random guesses, even when the EXACT same dataset is used as a training and a validation dataset. It is like it isn't using the model being trained to do its evaluation.

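This suggests that your network is not learning what it needs to and is simply overfitting. Random guessing means an accuracy of 1/n, where n is the number of classes.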
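As a quick sanity check on what "random guessing" means here, the chance-level accuracy for the 24 classes in the question can be computed directly. This is only an illustrative sketch: the helper name `chance_accuracy` is made up for this example, and it assumes roughly balanced classes.

```python
def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing (assumes balanced classes)."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    return 1.0 / num_classes

# The classification head in the question has 24 output classes.
print(f"{chance_accuracy(24):.4f}")  # prints 0.0417
```

A validation accuracy hovering around 4% would therefore be consistent with random guessing under that assumption.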

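You may want to start by lowering learning_rate (for example to 1e-5), and then even unfreeze some of the last layers of the base model (the ones closest to your GAP + Dropout + Dense head).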
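The suggestion above could be sketched roughly as follows, reusing the ResNet50V2 setup from the question. The helper name `build_finetune_model` and the default `n_unfrozen=10` are assumptions made for illustration and do not come from the answer. The 1e-5 learning rate is the value the answer mentions.

```python
import tensorflow as tf

def build_finetune_model(weights='imagenet', n_unfrozen=10,
                         num_classes=24, learning_rate=1e-5):
    """Hypothetical helper: ResNet50V2 base with only its last layers trainable."""
    base_model = tf.keras.applications.ResNet50V2(
        input_shape=(224, 224, 3),
        weights=weights,
        include_top=False
    )
    # Make the whole base trainable, then re-freeze everything
    # except the last n_unfrozen layers (those closest to the head).
    base_model.trainable = True
    n_frozen = max(len(base_model.layers) - n_unfrozen, 0)
    for layer in base_model.layers[:n_frozen]:
        layer.trainable = False

    model = tf.keras.Sequential([
        base_model,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation='softmax')
    ])
    # Compile after setting the trainable flags, with the lower learning rate.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss=tf.keras.losses.CategoricalCrossentropy(),
        metrics=[tf.keras.metrics.CategoricalAccuracy()]
    )
    return model
```

Calling `build_finetune_model()` and then `model.fit(...)` as in the question would train the head together with the unfrozen layers. How many layers to unfreeze is a tuning choice, so treat 10 as a placeholder to experiment with.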

Regarding "python - Issues with validation and evaluation metrics in TensorFlow when doing transfer learning", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59733678/
