python - Building a (pretrained) CNN+LSTM network with the Keras functional API

Tags: python tensorflow keras conv-neural-network lstm

I want to build an LSTM on top of a pretrained CNN (VGG) to classify video sequences. The LSTM will be fed the features extracted by VGG's last FC layer.

The architecture looks like this:

[Image: architecture diagram, not reproduced here]

The code I wrote:

import keras

def build_LSTM_CNN_net():
    from keras.applications.vgg16 import VGG16
    from keras.models import Model
    from keras.layers import Dense, Input, Flatten, LSTM, TimeDistributed

    num_classes = 5
    # Note: these two inputs are defined but never connected to the model.
    frames = Input(shape=(5, 224, 224, 3))
    base_in = Input(shape=(224, 224, 3))

    base_model = VGG16(weights='imagenet',
                       include_top=False,
                       input_shape=(224, 224, 3))

    x = Flatten()(base_model.output)
    x = Dense(128, activation='relu')(x)
    # x has shape (None, 128) here, so the next line raises the ValueError.
    x = TimeDistributed(Flatten())(x)
    x = LSTM(units=256, return_sequences=False, dropout=0.2)(x)
    x = Dense(num_classes, activation='softmax')(x)
    return Model(base_model.input, x)

lstm_cnn = build_LSTM_CNN_net()
keras.utils.plot_model(lstm_cnn, "lstm_cnn.png", show_shapes=True)

But I get this error:

ValueError: `TimeDistributed` Layer should be passed an `input_shape ` with at least 3 dimensions, received: [None, 128]

Why does this happen, and how can I fix it?

Best answer

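Here is the correct way to build a model that classifies video sequences. Note that I wrap a model instance in TimeDistributed. That model is built first, to extract features from each frame individually. In the second part, we process the sequence of frames.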

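from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import (Input, Dense, LSTM, TimeDistributed,
                          GlobalAveragePooling2D)

frames, channels, rows, columns = 5,3,224,224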

video = Input(shape=(frames,
                     rows,
                     columns,
                     channels))
cnn_base = VGG16(input_shape=(rows,
                              columns,
                              channels),
                 weights="imagenet",
                 include_top=False)
cnn_base.trainable = False

cnn_out = GlobalAveragePooling2D()(cnn_base.output)
cnn = Model(cnn_base.input, cnn_out)
encoded_frames = TimeDistributed(cnn)(video)
encoded_sequence = LSTM(256)(encoded_frames)
hidden_layer = Dense(1024, activation="relu")(encoded_sequence)
outputs = Dense(10, activation="softmax")(hidden_layer)

model = Model(video, outputs)
model.summary()
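
To make the shape bookkeeping concrete, here is a minimal NumPy sketch of what wrapping a per-frame model in TimeDistributed amounts to conceptually. It is an illustration under assumptions, not Keras's actual implementation: `encode_frame_batch` and `time_distributed` are hypothetical names, the stand-in "CNN" only does global average pooling, and the frames are shrunk to 8x8 so the example runs quickly.

```python
import numpy as np

def encode_frame_batch(images):
    """Hypothetical stand-in for the per-frame CNN.

    Maps (n, rows, columns, channels) -> (n, channels) by averaging over
    the spatial axes, similar in spirit to GlobalAveragePooling2D.
    """
    return images.mean(axis=(1, 2))

def time_distributed(fn, video_batch):
    """Apply fn to every frame of a (batch, frames, ...) array."""
    if video_batch.ndim < 3:
        # Same requirement the Keras error message states: (batch, time, ...).
        raise ValueError("need at least 3 dimensions: (batch, time, ...)")
    batch, frames = video_batch.shape[:2]
    # Fold the time axis into the batch axis so fn sees ordinary images.
    flat = video_batch.reshape((batch * frames,) + video_batch.shape[2:])
    encoded = fn(flat)
    # Unfold back to (batch, frames, features), the 3D input an LSTM expects.
    return encoded.reshape((batch, frames) + encoded.shape[1:])

videos = np.random.rand(2, 5, 8, 8, 3)  # 2 clips, 5 frames each
encoded_frames = time_distributed(encode_frame_batch, videos)
print(encoded_frames.shape)  # (2, 5, 3)
```

The original code failed because the tensor handed to TimeDistributed had shape (None, 128), which has no time axis to fold and unfold. Feeding the 5D video input through the wrapped model keeps that axis intact.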

If you want to use VGG's 1x4096 embedding representation instead, you can simply do:

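from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Input, Dense, LSTM, TimeDistributed

frames, channels, rows, columns = 5,3,224,224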

video = Input(shape=(frames,
                     rows,
                     columns,
                     channels))
cnn_base = VGG16(input_shape=(rows,
                              columns,
                              channels),
                 weights="imagenet",
                 include_top=True) #<=== include_top=True
cnn_base.trainable = False

cnn = Model(cnn_base.input, cnn_base.layers[-3].output) # layers[-3] is fc1, the first 4096-unit FC layer (layers[-2] is fc2)
encoded_frames = TimeDistributed(cnn)(video)
encoded_sequence = LSTM(256)(encoded_frames)
hidden_layer = Dense(1024, activation="relu")(encoded_sequence)
outputs = Dense(10, activation="softmax")(hidden_layer)

model = Model(video, outputs)
model.summary()
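
One practical difference between the two variants is the size of the LSTM that sits on top. The sketch below is a rough back-of-the-envelope calculation, assuming a standard LSTM with biases (four gates, each with an input kernel, a recurrent kernel, and a bias vector), 512 features per frame from the globally pooled VGG16 output, and 4096 features per frame from the FC layer. The helper `lstm_param_count` is a hypothetical name introduced only for this illustration.

```python
def lstm_param_count(input_dim, units):
    """Trainable parameters of a standard LSTM layer with biases.

    Each of the 4 gates has an input kernel (input_dim x units),
    a recurrent kernel (units x units), and a bias (units).
    """
    return 4 * ((input_dim + units) * units + units)

# LSTM(256) on top of the pooled 512-dimensional frame features
pooled_variant = lstm_param_count(512, 256)
# LSTM(256) on top of the 4096-dimensional FC embedding
fc_variant = lstm_param_count(4096, 256)

print(pooled_variant)  # 787456
print(fc_variant)      # 4457472
```

Under these assumptions the 4096-dimensional embedding makes the LSTM roughly five to six times larger, which may matter when the video dataset is small.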

Regarding python - Building a (pretrained) CNN+LSTM network with the Keras functional API, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63809805/
