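I wrote a script in tensorflow 2.x, but I had to downgrade it to tensorflow 1.x (tested in 1.14 and 1.15). However, the tf1 version performs very differently (accuracy on the test set is 10% lower). See also the plots of training and validation performance (figure attached below).
Looking at the changes required to migrate from tf1 to tf2, it seems that only the Adam learning rate could be a problem, but I define it explicitly (tensorflow migration).
I reproduced the same behaviour locally on GPU and CPU, and on colab. The keras used is the one built into tensorflow (tf.keras). I used the following functions (for training, validation and testing), with sparse categorical labels (integers):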
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    horizontal_flip=horizontal_flip,
    #rescale=None, #not needed for resnet50
    preprocessing_function=None,
    validation_split=None)
train_dataset = train_datagen.flow_from_directory(
    directory=train_dir,
    target_size=image_size,
    class_mode='sparse',
    batch_size=batch_size,
    shuffle=True)
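As a side note on `class_mode='sparse'`: integer labels with a sparse categorical cross-entropy should give the same loss value as one-hot labels with the ordinary categorical cross-entropy, so the label encoding alone is unlikely to explain a gap between versions. The following is a minimal plain-Python sketch of that equivalence, not the Keras implementation; the logits and the helper names are made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(label, probs):
    """Loss for an integer class label (what class_mode='sparse' yields)."""
    return -math.log(probs[label])

def categorical_crossentropy(one_hot, probs):
    """Loss for a one-hot encoded label."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

# Hypothetical logits for a 3-class problem; the true class is 0.
probs = softmax([2.0, 0.5, -1.0])
label = 0
one_hot = [1.0 if i == label else 0.0 for i in range(len(probs))]

loss_sparse = sparse_categorical_crossentropy(label, probs)
loss_dense = categorical_crossentropy(one_hot, probs)
print(loss_sparse, loss_dense)  # the two values agree
```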
The model is a simple resnet50 with a new layer on top:
IMG_SHAPE = img_size+(3,)
inputs = Input(shape=IMG_SHAPE, name='image_input',dtype = tf.uint8)
x = tf.cast(inputs, tf.float32)
# not working in this version of keras. inserted in imageGenerator
x = preprocess_input_resnet50(x)
base_model = tf.keras.applications.ResNet50(
    include_top=False,
    input_shape = IMG_SHAPE,
    pooling=None,
    weights='imagenet')
# Freeze the pretrained weights
base_model.trainable = False
x=base_model(x)
# Rebuild top
x = GlobalAveragePooling2D(data_format='channels_last',name="avg_pool")(x)
top_dropout_rate = 0.2
x = Dropout(top_dropout_rate, name="top_dropout")(x)
outputs = Dense(num_classes,activation="softmax", name="pred_out")(x)
model = Model(inputs=inputs, outputs=outputs,name="ResNet50_comp")
optimizer = tf.keras.optimizers.Adam(lr=learning_rate)
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=['accuracy'])
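Since the model applies the ResNet50 preprocessing inside the graph, it may be worth checking by hand that both environments feed the backbone the same values. As far as I know, the ResNet50 `preprocess_input` uses "caffe"-style preprocessing: the channels are reordered from RGB to BGR and the ImageNet channel means are subtracted, with no scaling. The sketch below reproduces that in plain Python on a tiny made-up image, under that assumption; the function names are hypothetical and it is meant as a reference to compare against, not as the library's code.

```python
# ImageNet channel means in BGR order, as used by "caffe"-style preprocessing.
IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)

def preprocess_pixel_caffe(rgb):
    """Convert one RGB pixel (0..255) to BGR and subtract the channel means."""
    r, g, b = (float(c) for c in rgb)
    bgr = (b, g, r)
    return tuple(c - m for c, m in zip(bgr, IMAGENET_MEAN_BGR))

def preprocess_image_caffe(image):
    """Apply the per-pixel transform to an image given as rows of RGB tuples."""
    return [[preprocess_pixel_caffe(px) for px in row] for row in image]

# Hypothetical 1x2 image: one pure red pixel and one black pixel.
image = [[(255, 0, 0), (0, 0, 0)]]
out = preprocess_image_caffe(image)
print(out)
```

If the values entering the backbone in one environment look like raw 0..255 pixels, or like twice-shifted values, the preprocessing was skipped or applied twice there.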
Then I call the fit function:
history = model.fit_generator(train_dataset,
                              steps_per_epoch=n_train_batches,
                              validation_data=validation_dataset,
                              validation_steps=n_val_batches,
                              epochs=initial_epochs,
                              verbose=1,
                              callbacks=[stopping])
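The question does not show how `n_train_batches` and `n_val_batches` are computed. One thing to rule out is that the two environments end up seeing a different number of batches per epoch. A minimal sketch of the two usual conventions follows; the function name and the sample counts are assumptions made for illustration.

```python
import math

def steps_per_epoch(num_samples, batch_size, drop_remainder=True):
    """Number of batches needed to go through num_samples once.

    drop_remainder=True ignores the last, smaller batch (floor division);
    drop_remainder=False includes it (ceiling).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_remainder:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

# Hypothetical dataset of 1000 images with batches of 32.
n_floor = steps_per_epoch(1000, 32)
n_ceil = steps_per_epoch(1000, 32, drop_remainder=False)
print(n_floor, n_ceil)
```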
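For example, I reproduced the same behaviour with the following complete script (applied to my dataset, changed to adam, and with the intermediate final dense layer removed): deep learning sandbox
The easiest way to reproduce this behaviour is to run the same script in a tf2 environment with the following line enabled or disabled. However, I also tested it in tf1 environments (1.14 and 1.15):
tf.compat.v1.disable_v2_behavior()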
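Unfortunately I cannot provide the dataset.
Update 26/11/2020
For full reproducibility, I obtained similar behaviour with the food101 dataset (101 classes), enabling tf1 behaviour with 'tf.compat.v1.disable_v2_behavior()'. Here is the script, executed with tensorflow-gpu 2.2.0: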
#%% ref https://medium.com/deeplearningsandbox/how-to-use-transfer-learning-and-fine-tuning-in-keras-and-tensorflow-to-build-an-image-recognition-94b0b02444f2
import os
import sys
import glob
import argparse
import matplotlib.pyplot as plt
import tensorflow as tf
# enable and disable this to obtain tf1 behaviour
tf.compat.v1.disable_v2_behavior()
from tensorflow.keras import __version__
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam
# since i'm using resnet50 weights from imagenet, i'm using food101 for
# similar but different categorization tasks
# pip install tensorflow-datasets if tensorflow_dataset not found
import tensorflow_datasets as tfds
(train_ds,validation_ds),info= tfds.load('food101', split=['train','validation'], shuffle_files=True, with_info=True)
assert isinstance(train_ds, tf.data.Dataset)
print(train_ds)
#%%
IM_WIDTH, IM_HEIGHT = 224, 224
NB_EPOCHS = 10
BAT_SIZE = 32
def get_nb_files(directory):
    """Get number of files by searching directory recursively"""
    if not os.path.exists(directory):
        return 0
    cnt = 0
    for r, dirs, files in os.walk(directory):
        for dr in dirs:
            cnt += len(glob.glob(os.path.join(r, dr + "/*")))
    return cnt

def setup_to_transfer_learn(model, base_model):
    """Freeze all layers and compile the model"""
    for layer in base_model.layers:
        layer.trainable = False
    model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

def add_new_last_layer(base_model, nb_classes):
    """Add last layer to the convnet
    Args:
      base_model: keras model excluding top
      nb_classes: # of classes
    Returns:
      new keras model with last layer
    """
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    #x = Dense(FC_SIZE, activation='relu')(x) #new FC layer, random init
    predictions = Dense(nb_classes, activation='softmax')(x) #new softmax layer
    model = Model(inputs=base_model.input, outputs=predictions)
    return model

def train(nb_epoch, batch_size):
    """Use transfer learning and fine-tuning to train a network on a new dataset"""
    #nb_train_samples = train_ds.cardinality().numpy()
    nb_train_samples=info.splits['train'].num_examples
    nb_classes = info.features['label'].num_classes
    classes_names = info.features['label'].names
    #nb_val_samples = validation_ds.cardinality().numpy()
    nb_val_samples = info.splits['validation'].num_examples
    #nb_epoch = int(args.nb_epoch)
    #batch_size = int(args.batch_size)

    def preprocess(features):
        #print(features['image'], features['label'])
        image = tf.image.resize(features['image'], [224,224])
        #image = tf.divide(image, 255)
        #print(image)
        # data augmentation
        image=tf.image.random_flip_left_right(image)
        image = preprocess_input(image)
        label = features['label']
        # for categorical crossentropy
        #label = tf.one_hot(label,101,axis=-1)
        #return image, tf.cast(label, tf.float32)
        return image, label

    #pre-processing the dataset to fit a specific image size and 2D labelling
    train_generator = train_ds.map(preprocess).batch(batch_size).repeat()
    validation_generator = validation_ds.map(preprocess).batch(batch_size).repeat()
    #train_generator=train_ds
    #validation_generator=validation_ds
    #fig = tfds.show_examples(validation_generator, info)
    # setup model
    base_model = ResNet50(weights='imagenet', include_top=False) #include_top=False excludes final FC layer
    model = add_new_last_layer(base_model, nb_classes)
    # transfer learning
    setup_to_transfer_learn(model, base_model)
    history = model.fit(
        train_generator,
        epochs=nb_epoch,
        steps_per_epoch=nb_train_samples//BAT_SIZE,
        validation_data=validation_generator,
        validation_steps=nb_val_samples//BAT_SIZE)
        #class_weight='auto')
    # return the History object so that the caller's `history` is not None
    return history

#execute
history = train(nb_epoch=NB_EPOCHS, batch_size=BAT_SIZE)
And the performance on the food101 dataset: [figure]
Update 27/11/2020
The difference can also be seen on the smaller oxford_flowers102 dataset:
(train_ds,validation_ds,test_ds),info= tfds.load('oxford_flowers102', split=['train','validation','test'], shuffle_files=True, with_info=True)
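Note: the figure above shows confidence bands obtained by running the same training several times and computing the mean and standard deviation, to check the effect of random weight initialization and data augmentation.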
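The script that produced those bands is not shown. A minimal sketch of how the per-epoch mean and standard deviation over repeated runs could be computed is given here; the accuracy values are hypothetical placeholders, and `summarize_per_epoch` is an illustrative name rather than something from the original code.

```python
import statistics

def summarize_per_epoch(histories):
    """Return (mean, std) per epoch across several runs.

    histories: list of runs, each a list of per-epoch accuracies
    (for example history.history['val_accuracy'] from each run).
    """
    if len(histories) < 2:
        raise ValueError("need at least two runs to estimate a spread")
    return [
        (statistics.mean(epoch_values), statistics.stdev(epoch_values))
        for epoch_values in zip(*histories)
    ]

# Hypothetical validation accuracies for 3 runs of 2 epochs each.
runs = [
    [0.50, 0.70],
    [0.54, 0.74],
    [0.52, 0.72],
]
bands = summarize_per_epoch(runs)
for epoch, (mean, std) in enumerate(bands, start=1):
    print(f"epoch {epoch}: {mean:.3f} +/- {std:.3f}")
```

If the tf1 and tf2 bands do not overlap, the gap is unlikely to come from random initialization or augmentation noise alone.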
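In addition, I tried some hyperparameter tuning on tf2, with the results shown in the following figure: [figure]
Thanks in advance for any suggestion. Here are the accuracy and validation performance on tf1 and tf2 on my dataset: [figure]
Update 14/12/2020
I am sharing a colab for one-click reproducibility on oxford_flowers:
colab script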
Best answer
I ran into something similar when doing the opposite migration (from TF1+Keras to TF2).
Run the code below:
# using TF2
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50
fe = ResNet50(include_top=False, pooling="avg")
out = fe.predict(np.ones((1,224,224,3))).flatten()
sum(out)
>>> 212.3205274187726
# using TF1+Keras
import numpy as np
from keras.applications.resnet50 import ResNet50
fe = ResNet50(include_top=False, pooling="avg")
out = fe.predict(np.ones((1,224,224,3))).flatten()
sum(out)
>>> 187.23898954353717
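You can see that the same model from the same library, in different versions, does not return the same values (using sum as a quick check). I found the explanation for this mysterious behaviour in another SO answer: ResNet model in keras and tf.keras give different output for the same image
Another suggestion I would give you: try using the pooling argument of applications.resnet50.ResNet50 itself, instead of an additional layer in your function, for simplicity and to remove a possible source of problems :)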
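The sum is only a coarse check: two feature vectors can have the same sum and still differ element by element. If the flattened `out` vectors from both environments are saved and loaded as plain lists, an element-wise comparison is more informative. The following is a minimal sketch with made-up numbers; `compare_outputs` and the tolerance are assumptions, not part of either library.

```python
def compare_outputs(a, b, atol=1e-4):
    """Element-wise comparison of two flattened feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    max_diff = max(diffs)
    return {
        "max_abs_diff": max_diff,
        "mean_abs_diff": sum(diffs) / len(diffs),
        "match": max_diff <= atol,
    }

# Hypothetical outputs: the sums are equal, but the vectors are not.
out_tf2 = [1.0, 2.0, 3.0]
out_tf1 = [3.0, 2.0, 1.0]
report = compare_outputs(out_tf2, out_tf1)
print(sum(out_tf2) == sum(out_tf1), report)
```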
Regarding tensorflow - Accuracy performance drop due to overfitting when downgrading from tensorflow 2.3.1 to tensorflow 1.14 or 1.15 on multiclass classification, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/65005842/