I'm having trouble with NLTK under Python, specifically the .generate() method.
generate(self, length=100)
Print random text, generated using a trigram language model.
Parameters:
* length (int) - The length of text to generate (default=100)
Here is a simplified version of what I am attempting.
import nltk
words = 'The quick brown fox jumps over the lazy dog'
tokens = nltk.word_tokenize(words)
text = nltk.Text(tokens)
print text.generate(3)
This always generates
Building ngram index...
The quick brown
None
rather than building a random phrase out of the words.
Here is my output when I run
print text.generate()
Building ngram index...
The quick brown fox jumps over the lazy dog fox jumps over the lazy
dog dog The quick brown fox jumps over the lazy dog dog brown fox
jumps over the lazy dog over the lazy dog The quick brown fox jumps
over the lazy dog fox jumps over the lazy dog lazy dog The quick brown
fox jumps over the lazy dog the lazy dog The quick brown fox jumps
over the lazy dog jumps over the lazy dog over the lazy dog brown fox
jumps over the lazy dog quick brown fox jumps over the lazy dog The
None
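Again it starts with the same text, but then varies. I have also tried using the first chapter of Orwell's 1984. Again, it always starts with the first 3 tokens (one of which is a space in this case) and only then goes on to generate text randomly.
What am I doing wrong here?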
Best answer
To generate random text, you need to use Markov chains.
Code that does this (taken from the source linked in the original answer):
import random


class Markov(object):
    """Trigram Markov-chain text generator built from an open text file."""

    def __init__(self, open_file):
        # Maps each word pair (w1, w2) to the list of words seen right after it.
        self.cache = {}
        self.open_file = open_file
        self.words = self.file_to_words()
        self.word_size = len(self.words)
        self.database()

    def file_to_words(self):
        # Read the whole file and split it on whitespace.
        self.open_file.seek(0)
        data = self.open_file.read()
        words = data.split()
        return words

    def triples(self):
        """ Generates triples from the given data string. So if our string were
        "What a lovely day", we'd generate (What, a, lovely) and then
        (a, lovely, day).
        """
        if len(self.words) < 3:
            return
        for i in range(len(self.words) - 2):
            yield (self.words[i], self.words[i+1], self.words[i+2])

    def database(self):
        for w1, w2, w3 in self.triples():
            key = (w1, w2)
            if key in self.cache:
                self.cache[key].append(w3)
            else:
                self.cache[key] = [w3]

    def generate_markov_text(self, size=25):
        if self.word_size < 3:
            raise ValueError("need at least three words to build trigrams")
        # Start from a random pair that is guaranteed to have a successor.
        seed = random.randint(0, self.word_size - 3)
        w1, w2 = self.words[seed], self.words[seed+1]
        gen_words = [w1, w2]
        # range (not xrange) works in both Python 2 and 3. Stop early when the
        # current pair has no recorded successor, which can happen for the
        # final pair of the text.
        for i in range(size - 2):
            if (w1, w2) not in self.cache:
                break
            w1, w2 = w2, random.choice(self.cache[(w1, w2)])
            gen_words.append(w2)
        return ' '.join(gen_words)
Explanation: Generating pseudo random text with Markov chains using Python
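The class above reads from a file, while the question starts from a list of tokens. As a rough sketch of the same trigram idea applied directly to a token list, something like the following might work. The names build_trigram_model and generate_text are hypothetical helpers, not part of NLTK, and str.split() stands in for nltk.word_tokenize so the example has no dependencies.

```python
import random
from collections import defaultdict


def build_trigram_model(tokens):
    """Map each pair of adjacent tokens to the tokens seen right after it."""
    model = defaultdict(list)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        model[(w1, w2)].append(w3)
    return dict(model)


def generate_text(tokens, size=25, rng=random):
    """Return at most `size` tokens, starting from a randomly chosen pair."""
    if len(tokens) < 3:
        raise ValueError("need at least three tokens")
    model = build_trigram_model(tokens)
    # Start at a random position instead of always at the first tokens.
    start = rng.randint(0, len(tokens) - 3)
    w1, w2 = tokens[start], tokens[start + 1]
    out = [w1, w2]
    # Extend the chain until it is long enough or the pair has no successor.
    while len(out) < size and (w1, w2) in model:
        w1, w2 = w2, rng.choice(model[(w1, w2)])
        out.append(w2)
    return " ".join(out[:size])


# Assumption: plain whitespace splitting instead of nltk.word_tokenize.
tokens = "The quick brown fox jumps over the lazy dog".split()
print(generate_text(tokens, size=5, rng=random.Random(0)))
```

Note that in the nine-word sentence from the question every word pair has exactly one successor, so only the starting point is random and each result is a contiguous slice of the sentence. Varied output needs a larger corpus in which word pairs repeat with different continuations.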
Regarding "python - Generating random sentences from custom text in Python's NLTK?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/1150144/