python - What is the fastest and most accurate POS tagger in Python (with a commercial license)?

Tags: python pos-tagger


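Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

We don't allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.

Closed 6 years ago.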


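Which POS tagger is fast and accurate and has a license that allows commercial use? For testing I used the Stanford POS tagger. It works well, but it is slow, and I ran into licensing problems.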

Best answer

You can use nltk.

>>> import nltk
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
('completely', 'RB'), ('different', 'JJ')]

Explanation:

word_tokenize first splits the sentence into word tokens. A sentence tokenizer is also available.

pos_tag then assigns a part-of-speech tag to each token.
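For text with more than one sentence, the two steps above can be chained with the sentence tokenizer. The following is only a minimal sketch: the helper name tag_document is made up for illustration, and it assumes NLTK and its tokenizer and tagger data are already installed.

```python
def tag_document(text):
    """Split text into sentences, tokenize each one, and POS-tag the tokens.

    tag_document is a hypothetical helper name, not part of NLTK.
    """
    # Imported inside the function so this module can still be imported
    # on machines where NLTK is not installed.
    import nltk

    # Step 1: split the raw text into sentences.
    sentences = nltk.sent_tokenize(text)
    # Step 2: split every sentence into word tokens.
    tokenized = [nltk.word_tokenize(sentence) for sentence in sentences]
    # Step 3: tag all sentences in one call; the result holds one list of
    # (token, tag) pairs per sentence.
    return nltk.pos_tag_sents(tokenized)
```

Calling tag_document on a paragraph should give one list of (token, tag) pairs per sentence. Tagging in one batch with pos_tag_sents avoids calling pos_tag separately for every sentence.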

More information is available here and here.

See this answer for a detailed list of POS taggers in Python.

NLTK is not perfect. In fact, no model is perfect.
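Since no tagger is perfect, it may help to measure speed and accuracy on your own data before settling on one. Below is a rough sketch using only the standard library. The function name evaluate_tagger, the toy gold sample, and the always-NN baseline are all illustrative assumptions; the only requirement on the tagger is that it takes a list of tokens and returns (token, tag) pairs, as nltk.pos_tag does.

```python
import time


def evaluate_tagger(tagger, gold_sentences):
    """Return (accuracy, tokens_per_second) for a tagger callable.

    gold_sentences is a list of sentences, each a list of (token, gold_tag)
    pairs. evaluate_tagger is a hypothetical helper, not a library function.
    """
    correct = 0
    total = 0
    elapsed = 0.0
    for sentence in gold_sentences:
        tokens = [token for token, _ in sentence]
        # Time only the tagger call, not the bookkeeping around it.
        started = time.perf_counter()
        predicted = tagger(tokens)
        elapsed += time.perf_counter() - started
        for (_, gold_tag), (_, predicted_tag) in zip(sentence, predicted):
            total += 1
            if gold_tag == predicted_tag:
                correct += 1
    accuracy = correct / total if total else 0.0
    tokens_per_second = total / elapsed if elapsed > 0 else float("inf")
    return accuracy, tokens_per_second


def always_noun(tokens):
    """Toy baseline that tags every token as NN (illustration only)."""
    return [(token, "NN") for token in tokens]


# Toy gold sample (an assumption); replace it with hand-checked sentences
# from your own domain.
gold = [[("And", "CC"), ("now", "RB"), ("for", "IN"), ("something", "NN")]]
accuracy, speed = evaluate_tagger(always_noun, gold)
print(f"accuracy={accuracy:.2f}, tokens/sec={speed:.0f}")
```

To compare real taggers, pass nltk.pos_tag (or any other tagger wrapped to the same call shape) in place of always_noun and use a larger gold sample. A handful of sentences is too small to give a stable timing or accuracy figure.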

You may first need to run

>>> import nltk; nltk.download()

to download the required tokenizer and tagger data.
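Calling nltk.download() with no arguments opens an interactive downloader. On a server or in a script it may be more convenient to request the resources by name. This is a sketch under assumptions: the helper download_tagger_data is hypothetical, and the resource names below match classic NLTK releases, while newer releases may expect "punkt_tab" and "averaged_perceptron_tagger_eng" instead.

```python
# Assumed resource names; check the error message raised by your NLTK
# version if a lookup fails, since the expected names have changed over time.
TAGGER_RESOURCES = ("punkt", "averaged_perceptron_tagger")


def download_tagger_data(resources=TAGGER_RESOURCES):
    """Download the named NLTK resources without the interactive UI.

    Returns a dict mapping each resource name to the value returned by
    nltk.download (True on success, False on failure).
    """
    # Imported lazily so that defining this helper does not require NLTK.
    import nltk

    return {name: nltk.download(name, quiet=True) for name in resources}
```

Running download_tagger_data() once per machine (or in a deployment step) should be enough; the data is cached in the local nltk_data directory.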

Regarding "python - What is the fastest and most accurate POS tagger in Python (with a commercial license)?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41346881/
