python - Tensorflow/Keras: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=2

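I am trying to implement federated training of a Keras/TensorFlow model to detect fake news in text articles, but I am having trouble with the model. When I run the code, I get the following error: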

 ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 50]

along with the following warning:

WARNING:tensorflow:Model was constructed with shape (None, 400) for input Tensor("embedding_input:0", shape=(None, 400), dtype=float32), but it was called on an input with incompatible shape (None,).

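Intuitively, I understand that the output of the embedding layer should have shape (None, 400, 50), but for some reason the LSTM layer, which expects a 3D tensor, is only receiving a 2D one. I don't know how to fix this, or how to change the input/output shapes so that they match. I have been stuck on this problem for a few days, and I am still new to machine learning and neural networks. Any advice would be much appreciated.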

The model used:

max_words = 2000
max_len = 400
embed_dim = 50
lstm_out = 64
batch_size = 32

def getTextModel():
    model = Sequential()
    model.add(Embedding(max_words, embed_dim, input_length = max_len, input_shape=preprocessed_sample_dataset.element_spec))
    model.add(LSTM(lstm_out))
    model.add(Dense(256))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, name='out_layer'))
    model.add(Activation('sigmoid'))
    return model

Model summary:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, 400, 50)           100000    
_________________________________________________________________
lstm (LSTM)                  (None, 64)                29440     
_________________________________________________________________
dense (Dense)                (None, 256)               16640     
_________________________________________________________________
activation (Activation)      (None, 256)               0         
_________________________________________________________________
dropout (Dropout)            (None, 256)               0         
_________________________________________________________________
out_layer (Dense)            (None, 1)                 257       
_________________________________________________________________
activation_1 (Activation)    (None, 1)                 0         
=================================================================
Total params: 146,337
Trainable params: 146,337
Non-trainable params: 0

Additional information:

Data preprocessing:

def preprocess(dataset):

  def batch_format_fn(element):
    """Return a batch's features and labels as an `OrderedDict`."""
    print(element['features'])
    return collections.OrderedDict(
        x=element['features'],
        y=tf.reshape(element['label'], [-1, 1])
    )
  return dataset.repeat(NUM_EPOCHS).shuffle(SHUFFLE_BUFFER).batch(
      BATCH_SIZE).map(batch_format_fn).prefetch(PREFETCH_BUFFER)

preprocessed_sample_dataset = preprocess(sample_dataset)


def make_federated_data(client_data, client_ids):
    return [preprocess(client_data.create_tf_dataset_for_client(x)) for x in client_ids]

federated_train_data = make_federated_data(train_dataset, train_dataset.client_ids)

print('Number of client datasets: {l}'.format(l=len(federated_train_data)))
print('First dataset: {d}'.format(d=federated_train_data[0]))

Dataset format:

Number of client datasets: 4
First dataset: <PrefetchDataset shapes: OrderedDict([(x, (None,)), (y, (None, 1))]), types: OrderedDict([(x, tf.string), (y, tf.int64)])>

The code that calls the function:

def model_fn():

  keras_model = getTextModel() #create_keras_model()
  input_spec_aux = preprocessed_sample_dataset.element_spec
  return tff.learning.from_keras_model(
      keras_model,
      input_spec= input_spec_aux,
      loss=tf.keras.losses.SparseCategoricalCrossentropy(),
      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

#Error occurs in iterative_process
iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.Adam(learning_rate=client_lr),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=server_lr))

print(str(iterative_process.initialize.type_signature))

state = iterative_process.initialize()

Best answer

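The dataset format says that the input x has shape (None,) (ndim/rank = 1) and dtype tf.string. The None comes from the fact that the dataset may yield batches that are not "full", so in practice the first dimension is in the range [1, BATCH_SIZE]. This shape means we have a batch of single scalar strings. That is likely the problem: an LSTM typically needs batches of sequences, e.g. a shape like (None, SEQUENCE_LENGTH).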

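The embedding layer appends an embedding dimension z after the last input dimension, e.g. it takes shape (x, y) and produces shape (x, y, z). So with an input of shape (None,), the output of the embedding layer is (None, 50) (ndim/rank = 2). Recall that the LSTM wants sequences and Keras wants batches; the error message is saying that the required shape is (None, SEQUENCE_LENGTH, 50) (ndim/rank = 3).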
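The shape arithmetic above can be checked with a minimal NumPy sketch. This is not the Keras layer itself: a random matrix stands in for the embedding weights, random integer IDs stand in for already-encoded inputs, and the names `embedding_table` and `embed` are made up for illustration.

```python
import numpy as np

max_words = 2000   # vocabulary size, as in the question
max_len = 400      # sequence length, as in the question
embed_dim = 50     # embedding dimension, as in the question
batch = 32

# Stand-in for the embedding layer's weight matrix: one row per token ID.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(max_words, embed_dim))

def embed(ids):
    """Look up one row per ID, which appends an axis of size embed_dim."""
    return embedding_table[ids]

# One ID per example: rank 1 in, rank 2 out. The LSTM rejects this shape.
one_id_per_example = rng.integers(0, max_words, size=(batch,))
rank2 = embed(one_id_per_example)

# A sequence of IDs per example: rank 2 in, rank 3 out. The LSTM expects this.
id_sequences = rng.integers(0, max_words, size=(batch, max_len))
rank3 = embed(id_sequences)

print(rank2.shape)  # (32, 50)
print(rank3.shape)  # (32, 400, 50)
```

Only the second case has the (batch, sequence, features) layout that a recurrent layer consumes.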

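I suggest going back to the dataset and working out what format element['features'] is in. Here it is probably a complete sentence, which needs to be tokenized into a sequence of words (e.g., for English, by splitting on whitespace).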
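As a rough illustration of that tokenization step, here is a plain-Python sketch. The example texts and the `tokenize` helper are hypothetical; in the actual pipeline the same split would have to happen inside the `tf.data` preprocessing, on whatever element['features'] really contains.

```python
def tokenize(sentence):
    """Lowercase a sentence and split it on whitespace."""
    return sentence.lower().split()

# Hypothetical article texts standing in for element['features'].
articles = [
    "Scientists confirm water is wet",
    "Shocking claim spreads quickly online today",
]

token_sequences = [tokenize(text) for text in articles]
for tokens in token_sequences:
    print(len(tokens), tokens)
```

The sequences come out with different lengths, so they still have to be padded or truncated to a fixed length such as max_len before they can be batched.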

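A word of warning: even after fixing the shapes, I suspect Keras will next complain that the tf.string dtype cannot be used in the embedding layer. The sequences first need to be converted to integer IDs, possibly using something from tf.lookup or tf_text.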
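The conversion to integer IDs can be sketched in plain Python as follows. This only shows the idea and is not the tf.lookup or tf_text API; the helpers `build_vocab` and `encode`, the reserved IDs, and the tiny corpus are all assumptions made for illustration.

```python
from collections import Counter

PAD_ID = 0  # reserved for padding (an assumption of this sketch)
OOV_ID = 1  # reserved for out-of-vocabulary words (also an assumption)

def build_vocab(token_sequences, max_words):
    """Map the most frequent tokens to IDs 2 .. max_words - 1."""
    counts = Counter(tok for seq in token_sequences for tok in seq)
    most_common = [tok for tok, _ in counts.most_common(max_words - 2)]
    return {tok: i + 2 for i, tok in enumerate(most_common)}

def encode(tokens, vocab, max_len):
    """Convert tokens to IDs, then truncate or right-pad to max_len."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokens[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))

# Hypothetical tokenized articles.
corpus = [["fake", "news", "spreads", "fast"], ["real", "news"]]
vocab = build_vocab(corpus, max_words=2000)

encoded = [encode(seq, vocab, max_len=6) for seq in corpus]
unseen = encode(["unknown", "news"], vocab, max_len=6)
print(encoded)
print(unseen)
```

Every encoded sequence has the same length and holds only integers below max_words, which is the kind of (None, SEQUENCE_LENGTH) integer input an embedding layer can consume.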

Some resources that may be useful:

Federated Learning for Text Generation Tutorial, in particular the dataset construction section.
Load text tutorial

Source: a similar question about this error on Stack Overflow: https://stackoverflow.com/questions/67533039/
