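For the past three days I have been working with NLTK, getting familiar with it and reading the book Natural Language Processing with Python to understand what is going on. I'm curious whether someone could clarify the following for me: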
Note that the first time you run this command, it is slow because it gathers statistics about word sequences. Each time you run it, you will get different output text. Now try generating random text in the style of an inaugural address or an Internet chat room. Although the text is random, it re-uses common words and phrases from the source text and gives us a sense of its style and content. (What is lacking in this randomly generated text?)
This passage, from chapter 1, only says that it "gathers statistics" and that you will get "different output text".
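The two claims in this passage are connected. The "statistics" are counts of which words follow a given word sequence. In a trigram generator of this kind, the output varies because each next word is drawn at random in proportion to those counts. The minimal stdlib sketch below assumes a hypothetical table of successor counts for a single word pair. The numbers are made up and are not taken from Genesis or from NLTK, and sample_sequence is a hypothetical helper. It shows that unseeded sampling can change between runs, while a fixed seed reproduces the same picks:

```python
import random
from collections import Counter

# Hypothetical successor counts for the word pair ("in", "the"); invented for illustration.
successors = Counter({"beginning": 3, "garden": 2, "ark": 1})

def sample_sequence(counts, n, seed=None):
    """Draw n successors, each chosen with probability proportional to its count."""
    rng = random.Random(seed)  # seed=None seeds from OS randomness or the clock, so results vary per run
    words = list(counts)
    weights = [counts[w] for w in words]
    return rng.choices(words, weights=weights, k=n)

print(sample_sequence(successors, 5))          # may differ on every run
print(sample_sequence(successors, 5, seed=0))  # same list on every run
```

Over many draws, "beginning" should come up roughly half the time (3 of 6 counts). That is the sense in which the generated text "re-uses common words and phrases from the source text".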
What exactly does generate() do, and how does it work?
This generate() example uses text3, which is the Book of Genesis from the Bible:
In the beginning , between me and thee and in the garden thou mayest come in unto Noah into the ark , and Mibsam , And said , Is there yet any portion or inheritance for us , and make thee as Ephraim and as the sand of the dukes that came with her ; and they were come . Also he sent forth the dove out of thee , with tabret , and wept upon them greatly ; and she conceived , and called their names , by their names after the end of the womb ? And he
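Here, the generate() function seems to simply output phrases created by cutting the text at punctuation marks and recombining them at random, yet the result is still somewhat readable.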
Best answer
type(text3) will tell you that text3 is of type nltk.text.Text.
To quote the documentation of Text.generate():
Print random text, generated using a trigram language model.
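This means that NLTK builds an N-gram model of the Genesis text, counting every occurrence of each three-word sequence so that it can estimate which words are likely to follow any given pair of words in that text. N-gram models are explained in more detail in chapter 5 of the NLTK book.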
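The trigram idea can be sketched in plain Python. This is not NLTK's implementation: Text.generate() relies on NLTK's own model classes, and its details differ between NLTK versions. It is a minimal illustration, with hypothetical function names build_trigram_model and generate. It runs on the first two verses of Genesis (King James Version), tokenized by hand so that punctuation marks are separate tokens, as they are in text3:

```python
import random
from collections import Counter, defaultdict

# Genesis 1:1-2 (KJV), pre-tokenized: punctuation marks are tokens of their own.
TOKENS = ("In the beginning God created the heaven and the earth . "
          "And the earth was without form , and void ; and darkness was upon "
          "the face of the deep . And the Spirit of God moved upon the face "
          "of the waters .").split()

def build_trigram_model(tokens):
    """Map each pair of consecutive tokens to a Counter of the tokens that follow it."""
    model = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        model[(w1, w2)][w3] += 1
    return model

def generate(model, start, length=30, seed=None):
    """Extend the two-word start, sampling each next token in proportion to its trigram count."""
    rng = random.Random(seed)
    out = list(start)
    for _ in range(length):
        counts = model.get((out[-2], out[-1]))
        if not counts:  # no trigram begins with this pair, so stop early
            break
        words = list(counts)
        out.append(rng.choices(words, weights=[counts[w] for w in words], k=1)[0])
    return " ".join(out)

model = build_trigram_model(TOKENS)
print(model[("the", "earth")])  # Counter({'.': 1, 'was': 1})
print(generate(model, ("In", "the"), seed=1))
```

Punctuation tokens such as "," and ";" are counted like any other word. A generator of this kind can therefore splice phrases together around punctuation, which may be why the output in the question looks as if the text were cut at punctuation and recombined. A model built from all of Genesis has many more branching pairs, so its output wanders further from the source.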
See also the answers to this question.
A similar question, "nlp - What does generate() do when using NLTK in Python?", can be found on Stack Overflow: https://stackoverflow.com/questions/18391602/