keras - How can I predict with a stateful LSTM model without specifying the same batch_size I trained with?

Tags: keras lstm stateful

I trained my LSTM model with stateful=True and it worked.

But I have to reshape my input to the same batch_size I set for the first layer, which a stateful RNN requires; otherwise I get an error: InvalidArgumentError: Invalid input_h shape.

I set batch_size to 64, but I only want to feed in a single starting sentence to generate text. If I have to provide input with batch_size=64, I need to prepare 64 sentences, which is absurd.

It works fine if I don't set stateful=True, but I need it to improve performance.
In that case, how can I predict with the stateful LSTM model without matching the batch_size I set?

The model I defined:

from tensorflow import keras

seq_length = 100
batch_size = 64
epochs = 3

vocab_size = len(vocab)  # 65; vocab is the set of characters in the corpus
embedding_dim = 256
rnn_units = 1024

def bi_lstm(vocab_size, embedding_dim, batch_size, rnn_units):
    model = keras.models.Sequential([
        # A stateful RNN needs a fixed batch dimension, hence batch_input_shape.
        keras.layers.Embedding(vocab_size, embedding_dim,
                               batch_input_shape=(batch_size, None)),
        keras.layers.Bidirectional(
            keras.layers.LSTM(units=rnn_units,
                              return_sequences=True,
                              stateful=True,
                              recurrent_initializer="glorot_uniform")),
        keras.layers.Dense(vocab_size),
    ])
    return model
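
The question does not show how seq_dataset is built. For reference, a plausible reconstruction (a sketch following the standard TensorFlow text-generation recipe; text and char2idx are assumed names, not from the question):

import tensorflow as tf

# Hypothetical reconstruction of seq_dataset: cut the corpus into windows of
# seq_length + 1 characters, split each window into an (input, target) pair
# shifted by one character, then batch with a fixed batch size.
ids = tf.constant([char2idx[c] for c in text])
chunks = tf.data.Dataset.from_tensor_slices(ids).batch(seq_length + 1, drop_remainder=True)
seq_dataset = (chunks
               .map(lambda chunk: (chunk[:-1], chunk[1:]))
               .batch(batch_size, drop_remainder=True))  # every batch has exactly batch_size rows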

I ran a simple test like this, and it shows me the error:

for x, y in seq_dataset.take(1):
  x = x[:-10,:] # shrink the batch from 64 to 54 rows; it works fine if I delete this line
  print(x.shape)
  pred = model(x)
  print(pred.shape)
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-98-99323ee3e09d> in <module>()
      2   x = x[:-10,:]
      3   print(x.shape)
----> 4   pred = model(x)
      5   print(pred.shape)

14 frames
/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
    889           with base_layer_utils.autocast_context_manager(
    890               self._compute_dtype):
--> 891             outputs = self.call(cast_inputs, *args, **kwargs)
    892           self._handle_activity_regularization(inputs, outputs)
    893           self._set_mask_metadata(inputs, outputs, input_masks)

/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/sequential.py in call(self, inputs, training, mask)
    254       if not self.built:
    255         self._init_graph_network(self.inputs, self.outputs, name=self.name)
--> 256       return super(Sequential, self).call(inputs, training=training, mask=mask)
    257 
    258     outputs = inputs  # handle the corner case where self.layers is empty

/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/network.py in call(self, inputs, training, mask)
    706     return self._run_internal_graph(
    707         inputs, training=training, mask=mask,
--> 708         convert_kwargs_to_constants=base_layer_utils.call_context().saving)
    709 
    710   def compute_output_shape(self, input_shape):

/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/network.py in _run_internal_graph(self, inputs, training, mask, convert_kwargs_to_constants)
    858 
    859           # Compute outputs.
--> 860           output_tensors = layer(computed_tensors, **kwargs)
    861 
    862           # Update tensor_dict.

/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/layers/wrappers.py in __call__(self, inputs, initial_state, constants, **kwargs)
    526 
    527     if initial_state is None and constants is None:
--> 528       return super(Bidirectional, self).__call__(inputs, **kwargs)
    529 
    530     # Applies the same workaround as in `RNN.__call__`

/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
    889           with base_layer_utils.autocast_context_manager(
    890               self._compute_dtype):
--> 891             outputs = self.call(cast_inputs, *args, **kwargs)
    892           self._handle_activity_regularization(inputs, outputs)
    893           self._set_mask_metadata(inputs, outputs, input_masks)

/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/layers/wrappers.py in call(self, inputs, training, mask, initial_state, constants)
    640 
    641       y = self.forward_layer(forward_inputs,
--> 642                              initial_state=forward_state, **kwargs)
    643       y_rev = self.backward_layer(backward_inputs,
    644                                   initial_state=backward_state, **kwargs)

/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/layers/recurrent.py in __call__(self, inputs, initial_state, constants, **kwargs)
    621 
    622     if initial_state is None and constants is None:
--> 623       return super(RNN, self).__call__(inputs, **kwargs)
    624 
    625     # If any of `initial_state` or `constants` are specified and are Keras

/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
    889           with base_layer_utils.autocast_context_manager(
    890               self._compute_dtype):
--> 891             outputs = self.call(cast_inputs, *args, **kwargs)
    892           self._handle_activity_regularization(inputs, outputs)
    893           self._set_mask_metadata(inputs, outputs, input_masks)

/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/layers/recurrent_v2.py in call(self, inputs, mask, training, initial_state)
    959         if can_use_gpu:
    960           last_output, outputs, new_h, new_c, runtime = cudnn_lstm(
--> 961               **cudnn_lstm_kwargs)
    962         else:
    963           last_output, outputs, new_h, new_c, runtime = standard_lstm(

/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/layers/recurrent_v2.py in cudnn_lstm(inputs, init_h, init_c, kernel, recurrent_kernel, bias, mask, time_major, go_backwards)
   1172     outputs, h, c, _ = gen_cudnn_rnn_ops.cudnn_rnn(
   1173         inputs, input_h=init_h, input_c=init_c, params=params, is_training=True,
-> 1174         rnn_mode='lstm')
   1175 
   1176   last_output = outputs[-1]

/tensorflow-2.0.0/python3.6/tensorflow_core/python/ops/gen_cudnn_rnn_ops.py in cudnn_rnn(input, input_h, input_c, params, rnn_mode, input_mode, direction, dropout, seed, seed2, is_training, name)
    107             input_mode=input_mode, direction=direction, dropout=dropout,
    108             seed=seed, seed2=seed2, is_training=is_training, name=name,
--> 109             ctx=_ctx)
    110       except _core._SymbolicException:
    111         pass  # Add nodes to the TensorFlow graph.

/tensorflow-2.0.0/python3.6/tensorflow_core/python/ops/gen_cudnn_rnn_ops.py in cudnn_rnn_eager_fallback(input, input_h, input_c, params, rnn_mode, input_mode, direction, dropout, seed, seed2, is_training, name, ctx)
    196   "is_training", is_training)
    197   _result = _execute.execute(b"CudnnRNN", 4, inputs=_inputs_flat,
--> 198                              attrs=_attrs, ctx=_ctx, name=name)
    199   _execute.record_gradient(
    200       "CudnnRNN", _inputs_flat, _attrs, _result, name)

/tensorflow-2.0.0/python3.6/tensorflow_core/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     65     else:
     66       message = e.message
---> 67     six.raise_from(core._status_to_exception(e.code, message), None)
     68   except TypeError as e:
     69     keras_symbolic_tensors = [

/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)

InvalidArgumentError: Invalid input_h shape: [1,64,1024] [1,54,1024] [Op:CudnnRNN]

Accepted answer

With stateful=True, the batch_size is indeed required by the model's logic in order to work properly.

However, your model's weights don't need to know the batch_size at all. So it would be nice if there were some set_batch_size() method, or even better, if fit() and predict() could deduce it from the input. Unfortunately, that is not the case.

But there is a workaround: just define another instance of the model and specify batch_size=1 (or whatever number you want), then assign the trained model's weights to this new model with a different batch size:

model64 = bi_lstm(vocab_size, embedding_dim, batch_size=64, rnn_units=rnn_units)
model64.fit(...)
# optional: model64.save_weights('model64_weights.hdf5')

model1 = bi_lstm(vocab_size, embedding_dim, batch_size=1, rnn_units=rnn_units)
model1.set_weights(model64.get_weights()) # or: model1.load_weights('model64_weights.hdf5')
model1.predict(...)
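
Once the weights are copied over, model1 can generate from a single seed sequence. A minimal greedy-decoding sketch (start_ids, the seed token ids, and idx2char are assumed names, not part of the original answer):

import numpy as np

model1.reset_states()                       # clear the stateful layer before a new sequence
input_ids = np.array([start_ids])           # shape (1, seed_len); the batch size is now 1
generated = []
for _ in range(200):
    logits = model1.predict(input_ids)      # (1, seq_len, vocab_size)
    next_id = int(np.argmax(logits[0, -1])) # pick the most likely next token
    generated.append(next_id)
    input_ids = np.array([[next_id]])       # feed one token; state persists between predict() calls
print("".join(idx2char[i] for i in generated))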

The weight transfer works because batch_size plays no part in the shapes of the weights, so the weights of the two models are interchangeable.
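
A quick way to convince yourself of this (a small check on the two models defined above):

for w64, w1 in zip(model64.get_weights(), model1.get_weights()):
    assert w64.shape == w1.shape            # no weight shape depends on the batch size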

Regarding "keras - How can I predict with a stateful LSTM model without specifying the same batch_size I trained with?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58799212/
