python - Generating random sentences from custom text in Python's NLTK?

I'm having trouble with NLTK under Python, specifically with the .generate() method.

generate(self, length=100)

Print random text, generated using a trigram language model.

Parameters:
   * length (int) - The length of text to generate (default=100)

This is a simplified version of what I am attempting.

import nltk

words = 'The quick brown fox jumps over the lazy dog'
tokens = nltk.word_tokenize(words)
text = nltk.Text(tokens)
print text.generate(3)

This always generates

Building ngram index...
The quick brown
None

as opposed to building a random phrase out of the words.

Here is the output I get from

print text.generate()

Building ngram index...
The quick brown fox jumps over the lazy dog fox jumps over the lazy
dog dog The quick brown fox jumps over the lazy dog dog brown fox
jumps over the lazy dog over the lazy dog The quick brown fox jumps
over the lazy dog fox jumps over the lazy dog lazy dog The quick brown
fox jumps over the lazy dog the lazy dog The quick brown fox jumps
over the lazy dog jumps over the lazy dog over the lazy dog brown fox
jumps over the lazy dog quick brown fox jumps over the lazy dog The
None

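Again it starts with the same text, but then varies. I also tried the first chapter of Orwell's 1984. Again, it always starts with the first 3 tokens (one of which is a space in this case) and only then goes on to generate text randomly.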

What am I doing wrong here?

Best answer

To generate random text, you need to use Markov chains.
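
As a minimal sketch of the idea (not code from the answer): a second-order Markov chain maps each pair of consecutive words to the list of words observed right after it. The helper name `build_table` is hypothetical, and plain `str.split()` stands in for `nltk.word_tokenize`. Run on the sentence from the question, it shows that every pair has exactly one follower, so a corpus this small leaves nothing to choose between at random.

```python
from collections import defaultdict

def build_table(tokens):
    """Map each pair of consecutive tokens to the tokens seen right after it."""
    table = defaultdict(list)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        table[(w1, w2)].append(w3)
    return dict(table)

# Assumption: whitespace splitting stands in for nltk.word_tokenize here.
tokens = 'The quick brown fox jumps over the lazy dog'.split()
table = build_table(tokens)

# Print every word pair together with the words that can follow it.
for pair, followers in table.items():
    print(pair, '->', followers)
```

A larger text, in which the same word pair is followed by different words in different places, is what gives the chain real choices.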

Code that does this (from an external source):

import random

class Markov(object):

  def __init__(self, open_file):
    self.cache = {}
    self.open_file = open_file
    self.words = self.file_to_words()
    self.word_size = len(self.words)
    self.database()


  def file_to_words(self):
    self.open_file.seek(0)
    data = self.open_file.read()
    words = data.split()
    return words


  def triples(self):
    """ Generates triples from the given data string. So if our string were
    "What a lovely day", we'd generate (What, a, lovely) and then
    (a, lovely, day).
    """

    if len(self.words) < 3:
      return

    for i in range(len(self.words) - 2):
      yield (self.words[i], self.words[i+1], self.words[i+2])

  def database(self):
    # Map each pair of consecutive words to the list of words that follow it.
    for w1, w2, w3 in self.triples():
      key = (w1, w2)
      if key in self.cache:
        self.cache[key].append(w3)
      else:
        self.cache[key] = [w3]

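  def generate_markov_text(self, size=25):
    # Start from a random position so the output does not always begin
    # with the first words of the text.
    seed = random.randint(0, self.word_size-3)
    seed_word, next_word = self.words[seed], self.words[seed+1]
    w1, w2 = seed_word, next_word
    gen_words = []
    for i in range(size):
      gen_words.append(w1)
      # The last word pair of the text may have no recorded follower; stop there.
      if (w1, w2) not in self.cache:
        break
      w1, w2 = w2, random.choice(self.cache[(w1, w2)])
    gen_words.append(w2)
    return ' '.join(gen_words)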

Explanation: Generating pseudo random text with Markov chains using Python
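
For a quick experiment without a file object, the same approach can be sketched as a single function working on a token list. This is an illustrative variant, not the answer's code: `generate_text`, its `rng` parameter, and the sample corpus are hypothetical, and `rng` is only there so that a seeded `random.Random` can make runs repeatable.

```python
import random

def generate_text(tokens, size=25, rng=random):
    """Generate at most `size` words, starting from a randomly chosen word pair."""
    if len(tokens) < 3:
        return ' '.join(tokens[:size])

    # Map each pair of consecutive tokens to the tokens seen right after it.
    table = {}
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        table.setdefault((w1, w2), []).append(w3)

    # Pick a random starting pair instead of the first tokens of the text.
    w1, w2 = rng.choice(list(table))
    out = [w1, w2]

    # Extend the text until it is long enough or the chain hits a dead end.
    while len(out) < size and (w1, w2) in table:
        w1, w2 = w2, rng.choice(table[(w1, w2)])
        out.append(w2)
    return ' '.join(out[:size])

# Hypothetical sample corpus; repeated words give the chain real choices.
corpus = ('the cat sat on the mat and the cat ran to the dog '
          'and the dog sat on the cat').split()
print(generate_text(corpus, size=12, rng=random.Random(0)))
```

Tokens from `nltk.word_tokenize` can be passed in place of the `split()` result, since the function only needs a list of strings.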

This question comes from Stack Overflow: https://stackoverflow.com/questions/1150144/
