Python text processing: identify nouns from individual words

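I have a list of words and want to keep only the nouns.

This is not a duplicate of "Extracting all Nouns from a text file using nltk".

The linked question deals with a passage of text, and its accepted answer proposes a tagger. I know the various options for tagging text (nltk, textblob, spacy), but I cannot use them because my data does not consist of sentences. I only have a list of individual words: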

would
research
part
technologies
size
articles
analyzes
line

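nltk offers a variety of corpora. I found that verbnet contains a comprehensive list of verbs, but so far I have not seen anything similar for nouns. Is there something like a dictionary in which I can look up whether a word is a noun, verb, adjective, etc.?

This could probably be done with an online service. For example, Microsoft Translator returns a lot of information in its response: https://learn.microsoft.com/en-us/azure/cognitive-services/translator/reference/v3-0-dictionary-lookup?tabs=curl But it is a paid service. I would prefer a Python package.

Regarding the ambiguity of words: ideally, I would like a dictionary that tells me every function a word can have. For example, "fish" is both a noun and a verb, "eat" is only a verb, and "dog" is only a noun. I know this is not an exact science. A workable solution would simply remove all words that cannot be nouns.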

Best answer

Try using wordnet?

from nltk.corpus import wordnet  # run nltk.download('wordnet') once beforehand

words = ["would", "research", "part", "technologies", "size", "articles", "analyzes", "line"]
for w in words:
    syns = wordnet.synsets(w)
    # lexname() looks like 'noun.act'; take the category of the first synset only
    pos = syns[0].lexname().split('.')[0] if syns else None
    print((w, pos))

You should see:

('would', None)
('research', 'noun')
('part', 'noun')
('technologies', 'noun')
('size', 'noun')
('articles', 'noun')
('analyzes', 'verb')
('line', 'noun')
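
The snippet above looks only at the first synset, so it reports a single category per word. The question also asks for every function a word can have, and for a way to drop words that cannot be nouns. A minimal sketch of that, assuming the WordNet corpus is installed: the helper names `possible_pos` and `keep_possible_nouns`, the `POS_NAMES` mapping, and the optional `synsets_fn` parameter are illustrative and not part of nltk.

```python
# Map WordNet's one-letter part-of-speech codes to readable names.
# 's' marks a "satellite" adjective, treated here as a plain adjective.
POS_NAMES = {"n": "noun", "v": "verb", "a": "adjective", "s": "adjective", "r": "adverb"}


def possible_pos(word, synsets_fn=None):
    """Return the set of parts of speech found across all synsets of `word`."""
    if synsets_fn is None:
        # Imported lazily so the helper can also be used with another lookup.
        from nltk.corpus import wordnet  # needs nltk.download('wordnet') once
        synsets_fn = wordnet.synsets
    return {POS_NAMES[s.pos()] for s in synsets_fn(word)}


def keep_possible_nouns(words, synsets_fn=None):
    """Keep only the words that have at least one noun sense."""
    return [w for w in words if "noun" in possible_pos(w, synsets_fn)]
```

Calling `keep_possible_nouns(words)` on the list from the question would use the real WordNet data. Two caveats: WordNet only covers nouns, verbs, adjectives and adverbs, so function words such as "would" come back with an empty set, and it also lists rare senses, so a word the question treats as a pure noun may be reported with additional parts of speech. If only the noun check is needed, `wordnet.synsets(word, pos=wordnet.NOUN)` restricts the lookup directly.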

For more on identifying nouns from individual words in Python, see the similar question on Stack Overflow: https://stackoverflow.com/questions/53180810/
