python - Keras model fails to reduce loss

标签 python tensorflow keras deep-learning tensorflow-datasets

I present an example in which a tf.keras model fails to learn from very simple data. I am using tensorflow-gpu==2.0.0, keras==2.3.0 and Python 3.7. At the end of the post, I give the Python code that reproduces the problem I observed.

  1. Data

The samples are Numpy arrays of shape (6, 16, 16, 16, 3). To keep things very simple, I only consider arrays filled with 1s and 0s. Arrays of 1s are given the label 1, and arrays of 0s are given the label 0. I can generate some samples (with n_samples = 240 below) using this code:

def generate_fake_data():
    for j in range(1, 240 + 1):
        if j < 120:
            yield np.ones((6, 16, 16, 16, 3)), np.array([0., 1.])
        else:
            yield np.zeros((6, 16, 16, 16, 3)), np.array([1., 0.])

To feed this data into a tf.keras model, I create an instance of tf.data.Dataset with the code below. This effectively creates shuffled batches of BATCH_SIZE = 12 samples.

def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                             output_types=(tf.float32,
                                                           tf.float32),
                                             output_shapes=(tf.TensorShape([6, 16, 16, 16, 3]),
                                                            tf.TensorShape([2])))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset
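
To make sure the pipeline delivers what the model expects, here is a quick shape check (illustrative only, not part of the original post) against a single batch:

# Sanity check: pull one batch and print the shapes the model will receive.
ds = make_tfdataset(for_training=False)
for x, y in ds.take(1):
    print(x.shape)  # expected: (12, 6, 16, 16, 16, 3)
    print(y.shape)  # expected: (12, 2)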
  2. Model

I propose the following model to classify my samples:

    def create_model(in_shape=(6, 16, 16, 16, 3)):
    
        input_layer = Input(shape=in_shape)
    
        reshaped_input = Lambda(lambda x: K.reshape(x, (-1, *in_shape[1:])))(input_layer)
    
        conv3d_layer = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2), padding='same')(reshaped_input)
    
        relu_layer_1 = ReLU()(conv3d_layer)
    
        pooling_layer = GlobalAveragePooling3D()(relu_layer_1)
    
        reshape_layer_1 = Lambda(lambda x: K.reshape(x, (-1, in_shape[0] * 64)))(pooling_layer)
    
        expand_dims_layer = Lambda(lambda x: K.expand_dims(x, 1))(reshape_layer_1)
    
        conv1d_layer = Conv1D(filters=1, kernel_size=1)(expand_dims_layer)
    
        relu_layer_2 = ReLU()(conv1d_layer)
    
        reshape_layer_2 = Lambda(lambda x: K.squeeze(x, 1))(relu_layer_2)
    
        out = Dense(units=2, activation='softmax')(reshape_layer_2)
    
        return Model(inputs=[input_layer], outputs=[out])
    

    The model is optimized with Adam (default parameters) and the categorical_crossentropy loss:

    clf_model = create_model()
    clf_model.compile(optimizer=Adam(),
                      loss='categorical_crossentropy',
                      metrics=['accuracy', 'categorical_crossentropy'])
    

    The output of clf_model.summary() is:

    Model: "model"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    input_1 (InputLayer)         [(None, 6, 16, 16, 16, 3) 0         
    _________________________________________________________________
    lambda (Lambda)              (None, 16, 16, 16, 3)     0         
    _________________________________________________________________
    conv3d (Conv3D)              (None, 8, 8, 8, 64)       98368     
    _________________________________________________________________
    re_lu (ReLU)                 (None, 8, 8, 8, 64)       0         
    _________________________________________________________________
    global_average_pooling3d (Gl (None, 64)                0         
    _________________________________________________________________
    lambda_1 (Lambda)            (None, 384)               0         
    _________________________________________________________________
    lambda_2 (Lambda)            (None, 1, 384)            0         
    _________________________________________________________________
    conv1d (Conv1D)              (None, 1, 1)              385       
    _________________________________________________________________
    re_lu_1 (ReLU)               (None, 1, 1)              0         
    _________________________________________________________________
    lambda_3 (Lambda)            (None, 1)                 0         
    _________________________________________________________________
    dense (Dense)                (None, 2)                 4         
    =================================================================
    Total params: 98,757
    Trainable params: 98,757
    Non-trainable params: 0
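
    As a cross-check, the Conv3D parameter count follows directly from its configuration: an 8x8x8 kernel over 3 input channels with 64 filters gives 8*8*8*3*64 weights plus 64 biases. A one-line verification:

    # Parameter count of the Conv3D layer, computed by hand:
    print(8 * 8 * 8 * 3 * 64 + 64)  # 98368, matching the summary above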
    
  3. Training

The model is trained for 500 epochs as follows:

    train_ds = make_tfdataset(for_training=True)
    
    history = clf_model.fit(train_ds,
                            epochs=500,
                            steps_per_epoch=ceil(240 / BATCH_SIZE),
                            verbose=1)
    
  4. The problem!

During the 500 epochs, the model loss stays around 0.69 and never goes below it. This is also true if I set the learning rate to 1e-2 instead of 1e-3. The data is very simple (just 0s and 1s). Naively, I would expect the model to do better than 0.6 accuracy. In fact, I would expect it to reach 100% accuracy quickly. What am I doing wrong?
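
For reference, a plateau at 0.69 is not arbitrary: it is chance level for two balanced classes, since a model that always predicts 50/50 incurs a cross-entropy of -ln(0.5) ≈ 0.693. A one-line check:

import numpy as np
# Cross-entropy of a coin-flip prediction over 2 balanced classes:
print(-np.log(0.5))  # 0.6931..., matching the plateaued loss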

  5. Full code...

    import numpy as np
    import tensorflow as tf
    import tensorflow.keras.backend as K
    from math import ceil
    from tensorflow.keras.layers import Input, Dense, Lambda, Conv1D, GlobalAveragePooling3D, Conv3D, ReLU
    from tensorflow.keras.models import Model
    from tensorflow.keras.optimizers import Adam
    
    BATCH_SIZE = 12
    
    
    def generate_fake_data():
        for j in range(1, 240 + 1):
            if j < 120:
                yield np.ones((6, 16, 16, 16, 3)), np.array([0., 1.])
            else:
                yield np.zeros((6, 16, 16, 16, 3)), np.array([1., 0.])
    
    
    def make_tfdataset(for_training=True):
        dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                                 output_types=(tf.float32,
                                                               tf.float32),
                                                 output_shapes=(tf.TensorShape([6, 16, 16, 16, 3]),
                                                                tf.TensorShape([2])))
        dataset = dataset.repeat()
        if for_training:
            dataset = dataset.shuffle(buffer_size=1000)
        dataset = dataset.batch(BATCH_SIZE)
        dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
        return dataset
    
    
    def create_model(in_shape=(6, 16, 16, 16, 3)):
    
        input_layer = Input(shape=in_shape)
    
        reshaped_input = Lambda(lambda x: K.reshape(x, (-1, *in_shape[1:])))(input_layer)
    
        conv3d_layer = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2), padding='same')(reshaped_input)
    
        relu_layer_1 = ReLU()(conv3d_layer)
    
        pooling_layer = GlobalAveragePooling3D()(relu_layer_1)
    
        reshape_layer_1 = Lambda(lambda x: K.reshape(x, (-1, in_shape[0] * 64)))(pooling_layer)
    
        expand_dims_layer = Lambda(lambda x: K.expand_dims(x, 1))(reshape_layer_1)
    
        conv1d_layer = Conv1D(filters=1, kernel_size=1)(expand_dims_layer)
    
        relu_layer_2 = ReLU()(conv1d_layer)
    
        reshape_layer_2 = Lambda(lambda x: K.squeeze(x, 1))(relu_layer_2)
    
        out = Dense(units=2, activation='softmax')(reshape_layer_2)
    
        return Model(inputs=[input_layer], outputs=[out])
    
    
    train_ds = make_tfdataset(for_training=True)
    clf_model = create_model(in_shape=(6, 16, 16, 16, 3))
    clf_model.summary()
    clf_model.compile(optimizer=Adam(lr=1e-3),
                      loss='categorical_crossentropy',
                      metrics=['accuracy', 'categorical_crossentropy'])
    
    history = clf_model.fit(train_ds,
                            epochs=500,
                            steps_per_epoch=ceil(240 / BATCH_SIZE),
                            verbose=1)
    

    Best answer

    Your code has a single critical problem: dimensionality shuffling. The one dimension you should never touch is the batch dimension, because, by definition, it holds independent samples of your data. In your first reshape, you mix the feature dimensions with the batch dimension:

    Tensor("input_1:0", shape=(12, 6, 16, 16, 16, 3), dtype=float32)
    Tensor("lambda/Reshape:0", shape=(72, 16, 16, 16, 3), dtype=float32)
    

    This is like feeding 72 independent samples of shape (16, 16, 16, 3). Other layers suffer similar problems.
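
    A shrunken toy version of that first reshape (illustrative only; the axis sizes are reduced for readability) shows the frame axis being merged into the batch axis:

    import numpy as np
    # Toy demo: a batch of 2 samples, each holding 3 "frames" of 1 feature.
    x = np.arange(2 * 3).reshape(2, 3, 1)  # (batch=2, frames=3, features=1)
    y = x.reshape(-1, 1)                   # frames folded into the batch axis
    print(y.shape)                         # (6, 1): 6 "independent" samples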

    Solutions:

    • Rather than reshaping at every step (for which you should use Reshape), shape your existing Conv and pooling layers so that everything works out directly.
    • Aside from the input and output layers, it is better to give each layer a short name - no clarity is lost, since each line is well-defined by the layer name.
    • GlobalAveragePooling is intended to be the final layer, as it collapses the feature dimensions - in your case, like so: (12, 16, 16, 16, 3) --> (12, 3); a Conv afterwards serves little purpose.
    • Per the above, I replaced Conv1D with Conv3D.
    • Unless you use variable batch sizes, always opt for batch_shape= over shape=, since you can then inspect layer dimensions in full (very helpful).
    • Your true batch_size here is 6, deduced from your comment reply.
    • kernel_size=1 and (especially) filters=1 make for a very weak convolution; I replaced it accordingly - you can revert it if you wish.
    • If you have only 2 classes in your intended application, I advise using Dense(1, activation='sigmoid') with binary_crossentropy loss (see the sketch right after this list).
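
    A minimal sketch of that binary-output variant (my own assumption, not code from the answer; it reuses the imports from the full code above and expects the generator to yield scalar 0/1 labels instead of one-hot pairs):

    def create_binary_model(batch_size, input_shape):
        # Same fixed architecture as below, but with a single sigmoid unit
        # and binary_crossentropy, per the last bullet above.
        ipt = Input(batch_shape=(batch_size, *input_shape))
        x   = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2),
                     activation='relu', padding='same')(ipt)
        x   = Conv3D(filters=8,  kernel_size=4, strides=(2, 2, 2),
                     activation='relu', padding='same')(x)
        x   = GlobalAveragePooling3D()(x)
        out = Dense(units=1, activation='sigmoid')(x)
        model = Model(inputs=ipt, outputs=out)
        model.compile(optimizer=Adam(lr=1e-3),
                      loss='binary_crossentropy',
                      metrics=['accuracy'])
        return model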

    As a last note: you can throw out everything above except the dimensionality-shuffling advice and still get perfect train-set performance; it was the root of the problem.

    def create_model(batch_size, input_shape):
    
        ipt = Input(batch_shape=(batch_size, *input_shape))
        x   = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2),
                                 activation='relu', padding='same')(ipt)
        x   = Conv3D(filters=8,  kernel_size=4, strides=(2, 2, 2),
                                 activation='relu', padding='same')(x)
        x   = GlobalAveragePooling3D()(x)
        out = Dense(units=2, activation='softmax')(x)
    
        return Model(inputs=ipt, outputs=out)
    
    BATCH_SIZE = 6
    INPUT_SHAPE = (16, 16, 16, 3)
    BATCH_SHAPE = (BATCH_SIZE, *INPUT_SHAPE)
    
    def generate_fake_data():
        for j in range(1, 240 + 1):
            if j < 120:
                yield np.ones(INPUT_SHAPE), np.array([0., 1.])
            else:
                yield np.zeros(INPUT_SHAPE), np.array([1., 0.])
    
    
    def make_tfdataset(for_training=True):
        dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                     output_types=(tf.float32,
                                                   tf.float32),
                                     output_shapes=(tf.TensorShape(INPUT_SHAPE),
                                                    tf.TensorShape([2])))
        dataset = dataset.repeat()
        if for_training:
            dataset = dataset.shuffle(buffer_size=1000)
        dataset = dataset.batch(BATCH_SIZE)
        dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
        return dataset
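
    For completeness, the fixed model could be wired to the pipeline and trained as follows (my own sketch mirroring the original fit call; the answer itself only shows the model and data code):

    train_ds = make_tfdataset(for_training=True)
    clf_model = create_model(BATCH_SIZE, INPUT_SHAPE)
    clf_model.compile(optimizer=Adam(lr=1e-3),
                      loss='categorical_crossentropy',
                      metrics=['accuracy'])
    history = clf_model.fit(train_ds,
                            epochs=500,
                            steps_per_epoch=ceil(240 / BATCH_SIZE),  # 40 steps
                            verbose=1)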
    

    Results:

    Epoch 28/500
    40/40 [==============================] - 0s 3ms/step - loss: 0.0808 - acc: 1.0000
    

    Regarding python - Keras model fails to reduce loss, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58237726/
