python - Tensorflow throws distributed function error

Tags: python machine-learning keras data-science tensorflow2.0

I am new to ML and TensorFlow, and I am trying to train and use a standard text generation model. When I train the model, I get this error:

Train for 155 steps
Epoch 1/5
  2/155 [..............................] - ETA: 4:49 - loss: 2.5786
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-133-d70c02ff4270> in <module>()
----> 1 model.fit(dataset, epochs=epochs, callbacks=[checkpoint_callback])

11 frames
/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)

InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  indices[58,87] = 63 is not in [0, 63)
     [[node sequential_12/embedding_12/embedding_lookup (defined at <ipython-input-131-d70c02ff4270>:1) ]]
     [[VariableShape/_24]]
  (1) Invalid argument:  indices[58,87] = 63 is not in [0, 63)
     [[node sequential_12/embedding_12/embedding_lookup (defined at <ipython-input-131-d70c02ff4270>:1) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_distributed_function_95797]

Errors may have originated from an input operation.
Input Source operations connected to node sequential_12/embedding_12/embedding_lookup:
 sequential_12/embedding_12/embedding_lookup/92192 (defined at /usr/lib/python3.6/contextlib.py:81)

Input Source operations connected to node sequential_12/embedding_12/embedding_lookup:
 sequential_12/embedding_12/embedding_lookup/92192 (defined at /usr/lib/python3.6/contextlib.py:81)

Function call stack:
distributed_function -> distributed_function

Data:
data['title'] = [['Sentence'],['Sentence2'], ...]

Data preparation:
tokenizer = keras.preprocessing.text.Tokenizer(num_words=209, lower=False, char_level=True)
tokenizer.fit_on_texts(df['title'])
df['encoded_with_keras'] = tokenizer.texts_to_sequences(df['title'])

dataset = df['encoded_with_keras']
dataset = tf.keras.preprocessing.sequence.pad_sequences(dataset, padding='post')

dataset = dataset.flatten()

dataset = tf.data.Dataset.from_tensor_slices(dataset)

sequences = dataset.batch(seq_len+1, drop_remainder=True)


def create_seq_targets(seq):
    input_txt = seq[:-1]
    target_txt = seq[1:]
    return input_txt, target_txt

dataset = sequences.map(create_seq_targets)

batch_size = 128

buffer_size = 10000

dataset = dataset.shuffle(buffer_size).batch(batch_size, drop_remainder=True)
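
To make the pipeline above easier to follow, here is a rough pure-Python sketch of what `batch(seq_len+1, drop_remainder=True)` followed by `create_seq_targets` produces. It uses plain lists instead of `tf.data`; the `chunk` helper, `seq_len = 4`, and the ids in `flat_ids` are made up for illustration only.

```python
def chunk(ids, size):
    """Split ids into consecutive chunks of exactly `size` items, dropping any remainder."""
    return [ids[i:i + size] for i in range(0, len(ids) - size + 1, size)]


def create_seq_targets(seq):
    """Input is the chunk without its last id; target is the chunk shifted by one."""
    return seq[:-1], seq[1:]


seq_len = 4  # illustrative value, not taken from the question
flat_ids = [5, 1, 2, 3, 4, 6, 0, 0, 7, 8, 9]  # made-up token ids, 0 = padding

pairs = [create_seq_targets(c) for c in chunk(flat_ids, seq_len + 1)]
for input_ids, target_ids in pairs:
    print(input_ids, "->", target_ids)
```

Each target is the input shifted one position to the right, so every id that can appear as an input can also appear as a target, padding zeros included.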

Model:
vocab_size = len(tokenizer.word_index)
embed_dim = 128
rnn_neurons = 256

epochs = 5

# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

def create_model(vocab_size, embed_dim, rnn_neurons, batch_size):
    model = Sequential()
    model.add(Embedding(vocab_size, embed_dim, batch_input_shape=[batch_size, None], mask_zero=True))
    model.add(LSTM(rnn_neurons, return_sequences=True, stateful=True))
    model.add(Dropout(0.2))
    model.add(LSTM(rnn_neurons, return_sequences=True, stateful=True))
    model.add(Dropout(0.2))
    model.compile(optimizer='adam', loss="sparse_categorical_crossentropy")
    return model

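# Build the model before fitting (this call is not shown in the original post)
model = create_model(vocab_size, embed_dim, rnn_neurons, batch_size)
model.fit(dataset, epochs=epochs, callbacks=[checkpoint_callback])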

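I have tried changing almost every model setting, and I have also tried custom tokenization and data preparation. Training starts, but at the second of the 155 steps I get this error. I don't know where to begin; any help is appreciated.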

Best answer

Try changing batch_size to something like 32, 16, or 8. Apparently there is a TensorFlow bug with the RTX 2060/70/80 that causes it to run out of memory.
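
Independently of batch size, the message in the traceback, `indices[58,87] = 63 is not in [0, 63)`, says that the embedding lookup received id 63 while the table only has rows 0 to 62. That would be consistent with `vocab_size = len(tokenizer.word_index)`: the Keras `Tokenizer` numbers tokens starting at 1 and leaves 0 for padding, so the largest id it emits equals `len(word_index)`. The sketch below mimics that numbering in plain Python; `build_char_index` and the sample titles are hypothetical stand-ins, not the Keras implementation.

```python
from collections import Counter


def build_char_index(texts):
    """Map each character to an id starting at 1 (most frequent first); 0 is left for padding."""
    counts = Counter(ch for text in texts for ch in text)
    return {ch: i for i, (ch, _) in enumerate(counts.most_common(), start=1)}


texts = ["Sentence", "Sentence2"]  # placeholder titles, as in the question
word_index = build_char_index(texts)
encoded = [[word_index[ch] for ch in text] for text in texts]

largest_id = max(max(seq) for seq in encoded)
vocab_size_as_posted = len(word_index)      # embedding rows 0 .. len-1
vocab_size_with_pad = len(word_index) + 1   # embedding rows 0 .. len

# The largest id falls outside the table in the first case and inside it in the second.
print("largest id:", largest_id)
print("fits as posted:", largest_id < vocab_size_as_posted)
print("fits with +1:", largest_id < vocab_size_with_pad)
```

If this is what is happening, passing `len(tokenizer.word_index) + 1` as the embedding's input dimension should remove the out-of-range lookup. It would also explain why training gets through the first step: only batches that happen to contain the highest-numbered character trigger the error.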

Regarding python - Tensorflow throws distributed function error, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/60082554/
