python - Character LSTM keeps generating the same character sequence

Tags: python tensorflow keras lstm recurrent-neural-network

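I'm using Keras to train a 2-layer character LSTM to generate character sequences similar to the corpus I'm training on. However, once trained, the LSTM outputs the same sequence over and over again.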

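I've seen suggestions for similar problems, including increasing the LSTM input sequence length, increasing the batch size, adding dropout layers, and increasing the amount of dropout. I've tried all of these, and none of them seems to fix the problem. The one thing that has had some success is adding a random noise vector to each vector the LSTM outputs during generation (in practice, to the input window built from those outputs). This makes sense, because the LSTM uses the output of the previous step to generate the next output. In general, however, if I add enough noise to break the LSTM out of its repetition, the quality of the output drops considerably.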
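The feedback loop described above also explains why greedy generation repeats. Suppose the next-character rule is deterministic and only sees a fixed-length window. There are then only finitely many possible windows, so the generator must eventually revisit one, and from that point it repeats forever. The sketch below illustrates this mechanism only. It uses a made-up rule, `toy_next_char`, as a hypothetical stand-in for the trained LSTM with argmax decoding.

```python
from typing import Callable, Optional, Tuple


def toy_next_char(window: str) -> str:
    # Hypothetical deterministic "model": picks the next character from a hash of the window.
    alphabet = "abc "
    return alphabet[sum(ord(c) for c in window) % len(alphabet)]


def find_cycle(seed: str, next_char: Callable[[str], str],
               steps: int = 1000) -> Optional[Tuple[int, int]]:
    """Greedily extend `seed` and report when a window repeats.

    Returns (step_first_seen, cycle_length), or None if no repeat within `steps`.
    """
    window = seed
    seen = {}  # window -> step at which it first appeared
    for step in range(steps):
        if window in seen:
            return seen[window], step - seen[window]
        seen[window] = step
        # slide the window forward by one character, as the generation loop does
        window = window[1:] + next_char(window)
    return None


print(find_cycle("abca", toy_next_char))
```

With a 4-character window over a 4-symbol alphabet there are at most 256 distinct windows. A repeat is therefore guaranteed within 257 steps. A 100-character window over the full vocabulary has vastly more states, but the trained model's argmax tends to fall into a short loop much sooner in practice.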

My LSTM training code is as follows:

import numpy
from keras.utils import np_utils

# [load data from file]
raw_text = collected_statements.lower()
# create mapping of unique chars to integers
chars = sorted(list(set(raw_text + '\b')))
char_to_int = dict((c, i) for i, c in enumerate(chars))
n_chars = len(raw_text)   # total number of characters in the corpus
n_vocab = len(chars)      # number of distinct characters
seq_length = 100
dataX = []
dataY = []
# slide a window of seq_length characters over the text; the next character is the target
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)

# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)

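from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

# define the LSTM model: two stacked 256-unit LSTM layers with dropout, softmax over the vocabulary
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]),
               return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')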

from keras.callbacks import ModelCheckpoint
from sklearn.model_selection import train_test_split

# define the checkpoint: save weights whenever the training loss improves
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1,
                             save_best_only=True, mode='min')
callbacks_list = [checkpoint]

# fix random seed for reproducibility
seed = 8
numpy.random.seed(seed)
# split into 80% for train and 20% for test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=seed)

# train the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=18,
          batch_size=256, callbacks=callbacks_list)
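The sketch below makes the shapes in the training code concrete. It repeats the same windowing, normalization, and one-hot steps on a toy string, using NumPy only. It assumes the made-up text `"hello world"` and `seq_length = 3`. It also uses `np.eye(n_vocab)[dataY]` as a dependency-free stand-in for `np_utils.to_categorical`.

```python
import numpy as np

raw_text = "hello world"
seq_length = 3

chars = sorted(set(raw_text))
char_to_int = {c: i for i, c in enumerate(chars)}
n_chars, n_vocab = len(raw_text), len(chars)

dataX, dataY = [], []
for i in range(n_chars - seq_length):
    # input: seq_length consecutive character ids; target: the id of the character after them
    dataX.append([char_to_int[c] for c in raw_text[i:i + seq_length]])
    dataY.append(char_to_int[raw_text[i + seq_length]])
n_patterns = len(dataX)

# [samples, time steps, features], with ids scaled into [0, 1)
X = np.reshape(dataX, (n_patterns, seq_length, 1)) / float(n_vocab)
# one-hot targets: row k is all zeros except a 1 at column dataY[k]
y = np.eye(n_vocab)[dataY]

print(X.shape, y.shape)  # (8, 3, 1) (8, 8)
```

One difference from the original code is worth noting. Called without `num_classes`, `to_categorical` sizes its output as `max(dataY) + 1`. The `np.eye` version always yields exactly `n_vocab` columns.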

My generation code is as follows:

import sys
import numpy

filename = "weights-improvement-18-1.5283.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')
int_to_char = dict((i, c) for i, c in enumerate(chars))
# pick a random seed (unpadded_patterns is defined elsewhere in the original code)
start = numpy.random.randint(0, len(dataX)-1)
pattern = list(unpadded_patterns[start])  # copy so the stored seed pattern is not modified
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# generate characters
for i in range(1000):
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    # normalize as in training, plus a small amount of random noise
    x = (x / float(n_vocab)) + (numpy.random.rand(1, len(pattern), 1) * 0.01)
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)  # greedy: always the single most likely character
    #print(index)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    sys.stdout.write(result)
    # slide the window: append the new character, drop the oldest one
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
print("\nDone.")

When I run the generation code, I get the same sequence over and over:

we have the best economy in the history of our country." "we have the best 
economy in the history of our country." "we have the best economy in the 
history of our country." "we have the best economy in the history of our 
country." "we have the best economy in the history of our country." "we 
have the best economy in the history of our country." "we have the best 
economy in the history of our country." "we have the best economy in the 
history of our country." "we have the best economy in the history of our 
country."

Is there anything else I can try that would help the model generate something other than the same sequence over and over?

Best answer

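For character generation, I suggest sampling from the probabilities your model outputs instead of taking the argmax directly. This is what the Keras char-rnn example does to get diversity in its output.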

Here is the code they use for sampling in the example:

import numpy as np

def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature          # rescale log-probabilities by the temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)        # re-normalize into a probability distribution
    probas = np.random.multinomial(1, preds, 1)  # one draw, returned as a one-hot row
    return np.argmax(probas)
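The sketch below shows what `temperature` does, without any randomness. It applies only the reweighting part of `sample` (take the log, divide by T, re-normalize) to a made-up three-character distribution. The helper name `reweight` is hypothetical. Subtracting the maximum before `np.exp` is an added numerical-stability step and does not change the result.

```python
import numpy as np


def reweight(preds, temperature=1.0):
    # Same transformation as in sample(), minus the final draw: p_i^(1/T) / sum_j p_j^(1/T)
    preds = np.asarray(preds, dtype="float64")
    logits = np.log(preds) / temperature
    exp_preds = np.exp(logits - logits.max())  # subtracting the max avoids overflow
    return exp_preds / exp_preds.sum()


probs = [0.7, 0.2, 0.1]
for t in (0.2, 1.0, 5.0):
    print(t, np.round(reweight(probs, t), 3))
```

At T = 1 the distribution is unchanged. Below 1 it sharpens toward the most likely character, approaching argmax as T goes to 0. Above 1 it flattens toward uniform.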

In your code you have index = numpy.argmax(prediction)

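I suggest replacing it with index = sample(prediction[0], temperature) and experimenting with the temperature. model.predict returns an array of shape (1, n_vocab), so prediction[0] gives the 1-D probability vector that np.random.multinomial expects. Keep in mind that higher temperatures make the output more random, while lower temperatures make it less random. As the temperature approaches 0, sampling behaves like argmax again.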
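Putting the suggestion together, a generation loop might look like the sketch below. To stay self-contained and runnable without a trained model, it relies on a hypothetical `fake_predict` function. That function returns a made-up `(1, n_vocab)` probability array, the same shape `model.predict` returns for a single input. In the real script you would call `model.predict(x, verbose=0)` instead. The four-character vocabulary and the `generate` helper are also illustrative assumptions.

```python
import numpy as np


def sample(preds, temperature=1.0):
    # reweight probabilities by temperature, then draw one index
    preds = np.asarray(preds).astype("float64")
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return int(np.argmax(probas))


chars = ["a", "b", "c", " "]
int_to_char = dict(enumerate(chars))
n_vocab = len(chars)


def fake_predict(x):
    # Hypothetical stand-in for model.predict: a fixed distribution of shape (1, n_vocab).
    return np.array([[0.5, 0.3, 0.15, 0.05]])


def generate(pattern, n_chars=50, temperature=0.8):
    pattern = list(pattern)  # copy so the caller's seed is not modified
    out = []
    for _ in range(n_chars):
        x = np.reshape(pattern, (1, len(pattern), 1)) / float(n_vocab)
        prediction = fake_predict(x)
        index = sample(prediction[0], temperature)  # instead of np.argmax(prediction)
        out.append(int_to_char[index])
        # slide the window: append the sampled id, drop the oldest one
        pattern.append(index)
        pattern = pattern[1:]
    return "".join(out)


np.random.seed(0)
print(generate([0, 1, 2, 3]))
```

Because each step draws from the distribution instead of always taking its peak, the generated text can leave a loop that argmax would stay in forever. Values slightly below 1.0 are a common starting point when tuning the temperature.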

A similar question on this topic (python - Character LSTM keeps generating the same character sequence) can be found on Stack Overflow: https://stackoverflow.com/questions/54030842/
