python - Creating a POS-tagged corpus with NLTK

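I want to build a part-of-speech (POS) tagged corpus with NLTK so that I can train my model on it.

I have gone through many resources so far, but each of them only explains how to read an existing tagged corpus and how to access its words, sentences, and so on. Here is the code I have tried: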

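from nltk.corpus.reader import TaggedCorpusReader

# First argument: corpus root directory; second: the tagged file(s) to read
reader = TaggedCorpusReader('/home/abc/nltk_data/', 'pos_tagged.pos')
reader.words()         # plain tokens
reader.tagged_words()  # (word, tag) pairs
reader.sents()         # sentences as lists of tokens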

I want to place my corpus in the home/nltk_data/corpora/ folder so that I can import the corpus I created. Please guide me.

Best answer

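I found a working solution: see link for the step-by-step process.

Download the necessary files from here.

Once you run the commands from the first link (1), a pickle file is generated. It holds the tagger trained on your tagged corpus.

After the pickle file has been generated, you can check that the tagger works by running the following code: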
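The linked commands are not reproduced here. As a rough sketch of what such a step might involve, the snippet below writes a tiny tagged corpus in word/TAG format, reads it back with TaggedCorpusReader, trains a UnigramTagger on it, and pickles the result. The directory layout, the corpus name my_corpus, the file name my_tagger.pickle, and the choice of UnigramTagger are all assumptions made for illustration, not the method from the link.

```python
import os
import pickle
import tempfile

from nltk.corpus.reader import TaggedCorpusReader
from nltk.tag import UnigramTagger

# A temporary directory stands in for a real nltk_data folder (assumption).
data_root = tempfile.mkdtemp()
corpus_dir = os.path.join(data_root, "corpora", "my_corpus")  # hypothetical name
tagger_dir = os.path.join(data_root, "taggers")
os.makedirs(corpus_dir)
os.makedirs(tagger_dir)

# One sentence per line, each token written as word/TAG.
sample = "The/DT cat/NN sat/VBD ./.\nDogs/NNS bark/VBP ./.\n"
with open(os.path.join(corpus_dir, "pos_tagged.pos"), "w", encoding="utf-8") as f:
    f.write(sample)

# Read the corpus back; the second argument is a regex over file names.
reader = TaggedCorpusReader(corpus_dir, r".*\.pos")
train_sents = list(reader.tagged_sents())

# Train a simple unigram tagger on the tagged sentences.
tagger = UnigramTagger(train_sents)

# Serialize the trained tagger (hypothetical file name).
tagger_path = os.path.join(tagger_dir, "my_tagger.pickle")
with open(tagger_path, "wb") as f:
    pickle.dump(tagger, f)

print(tagger.tag(["The", "cat"]))
```

A unigram tagger only knows words it saw during training, so a real corpus would need far more than two sentences to be useful.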

import nltk.data
tagger = nltk.data.load("taggers/NAME_OF_TAGGER.pickle")
tagger.tag(['some', 'words', 'in', 'a', 'sentence'])
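If the pickle file is not on NLTK's data search path, one alternative for the same check is to open it with Python's standard pickle module. The sketch below is self-contained, so it first builds and saves a toy tagger; the training sentence, the DefaultTagger("NN") backoff, and the temporary path are assumptions for illustration only.

```python
import os
import pickle
import tempfile

from nltk.tag import DefaultTagger, UnigramTagger

# Toy training data (assumption); a real tagger would use a full corpus.
train_sents = [[("some", "DT"), ("words", "NNS"), ("in", "IN"),
                ("a", "DT"), ("sentence", "NN")]]

# Unknown words fall back to the default tag "NN" instead of None.
tagger = UnigramTagger(train_sents, backoff=DefaultTagger("NN"))

# Save to a temporary location standing in for taggers/NAME_OF_TAGGER.pickle.
path = os.path.join(tempfile.mkdtemp(), "NAME_OF_TAGGER.pickle")
with open(path, "wb") as f:
    pickle.dump(tagger, f)

# Load it back. Only unpickle files from a source you trust.
with open(path, "rb") as f:
    loaded = pickle.load(f)

tokens = ["some", "words", "in", "a", "paragraph"]
result = loaded.tag(tokens)
print(result)
```

Comparing loaded.tag(tokens) with tagger.tag(tokens) is a quick way to confirm that the tagger survived the round trip unchanged.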

A similar question about "python - Creating a POS-tagged corpus with NLTK" can be found on Stack Overflow: https://stackoverflow.com/questions/46426359/
