python - WordNet selectional restrictions in NLTK

Tags: python nlp nltk wordnet

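Is there a way to get WordNet selectional restrictions (such as +animate, +human, etc.) from a synset through NLTK? Or is there any other way of obtaining semantic information about a synset? The closest I could get to it were the hypernym relations.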

Best answer

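It depends on what you mean by "selectional restrictions", or what I would call semantic features. In classic semantics there is a world of concepts, and to compare concepts we have to find

  • discriminating features (i.e. features on which the concepts differ, used to tell them apart) and
  • similarity features (i.e. features the concepts share, which is what makes it necessary to differentiate them in the first place)

For example: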

Man is [+HUMAN], [+MALE], [+ADULT]
Woman is [+HUMAN], [-MALE], [+ADULT]

[+HUMAN] and [+ADULT] are the similarity features
[+-MALE] is the discriminating feature
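
As a rough illustration of the distinction above, the feature bundles can be written as plain dictionaries and compared directly. This is only a sketch: the function name `compare_features` and the dictionary encoding are assumptions made for this example, not part of WordNet or NLTK.

```python
# Feature bundles from the example above, encoded as {feature: value}.
man = {"HUMAN": True, "MALE": True, "ADULT": True}
woman = {"HUMAN": True, "MALE": False, "ADULT": True}


def compare_features(a, b):
    """Split the features two concepts have in common into
    similarity features (same value) and discriminating features
    (different value). Hypothetical helper for illustration."""
    common = a.keys() & b.keys()
    similar = sorted(f for f in common if a[f] == b[f])
    discriminating = sorted(f for f in common if a[f] != b[f])
    return similar, discriminating


similar, discriminating = compare_features(man, woman)
print(similar)         # ['ADULT', 'HUMAN']
print(discriminating)  # ['MALE']
```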

The common problem in traditional semantics, and in applying this theory to computational semantics, is the question:

"Is there a specific list of features that we can use to compare any concepts?"

"If so, what are the features on this list?"

(For details, see www.acl.ldc.upenn.edu/E/E91/E91-1034.pdf )

Getting back to WordNet, I can suggest 2 methods for approximating "selectional restrictions".

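First, check the hypernyms for the discriminating features, but before that you must decide what the discriminating features are. To differentiate an animal from a human, let's take the discriminating features to be [+-human] and [+-animal].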

from nltk.corpus import wordnet as wn

# Concepts to compare
dog_sense = wn.synsets('dog')[0] # It's http://goo.gl/b9sg9X
jb_sense = wn.synsets('James_Baldwin')[0] # It's http://goo.gl/CQQIG9

# hypernym_paths() returns a list of paths (each path is a list of synsets
# from the root down to this synset), because a synset can have more than
# one hypernym. [0] takes only the first path.
dog_hypernyms = dog_sense.hypernym_paths()[0]
jb_hypernyms = jb_sense.hypernym_paths()[0]


# Discriminating features in terms of concepts in WordNet
human = wn.synset('person.n.01') # i.e. [+human]
animal = wn.synset('animal.n.01') # i.e. [+animal]

# Test membership directly instead of wrapping an assert in try/except
if human in jb_hypernyms and animal not in jb_hypernyms:
    print("James Baldwin is human")
else:
    print("James Baldwin is not human")

# For the dog the test is the mirror image: [+animal] and not [+human]
if animal in dog_hypernyms and human not in dog_hypernyms:
    print("Dog is an animal")
else:
    print("Dog is not an animal")
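
The check above looks only at the first hypernym path. A slightly more general sketch collects the ancestors from every path and reports each feature as true or false. The helper name `semantic_features` and the `demo` function are hypothetical and introduced only for this example; the NLTK calls used (`wn.synset`, `wn.synsets`, `Synset.hypernym_paths`, `Synset.name`) are existing API, and `demo` assumes the WordNet corpus has been downloaded.

```python
def semantic_features(synset, feature_synsets):
    """Map each feature label to True/False for the given synset.

    `feature_synsets` is a dict such as {"human": person_synset}.
    A feature holds if its synset appears on ANY hypernym path of
    `synset` (the synset itself counts, since each path ends with it).
    """
    ancestors = {node for path in synset.hypernym_paths() for node in path}
    return {label: target in ancestors
            for label, target in feature_synsets.items()}


def demo():
    # Assumes NLTK is installed and nltk.download('wordnet') has been run.
    from nltk.corpus import wordnet as wn

    features = {
        "human": wn.synset('person.n.01'),   # i.e. [+human]
        "animal": wn.synset('animal.n.01'),  # i.e. [+animal]
    }
    for word in ('dog', 'James_Baldwin'):
        sense = wn.synsets(word)[0]
        print(sense.name(), semantic_features(sense, features))
```

Calling `demo()` prints one feature dictionary per word. Which synsets serve as features remains a modelling decision; WordNet itself does not label anything as [+human].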

Second, check similarity measures, as @Jacob suggested.

from nltk.corpus import wordnet as wn

dog_sense = wn.synsets('dog')[0] # It's http://goo.gl/b9sg9X
jb_sense = wn.synsets('James_Baldwin')[0] # It's http://goo.gl/CQQIG9

# Features to check against whether the 'dubious' concept is a human or an animal
human = wn.synset('person.n.01') # i.e. [+human]
animal = wn.synset('animal.n.01') # i.e. [+animal]

# Compute each Wu-Palmer similarity once, then compare
animal_sim = dog_sense.wup_similarity(animal)
human_sim = dog_sense.wup_similarity(human)

if animal_sim > human_sim:
    print("Dog is more of an animal than human")
elif animal_sim < human_sim:
    print("Dog is more of a human than animal")
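
The same comparison can be generalised to any number of candidate features by picking the feature synset with the highest similarity. The following is a minimal sketch; `closest_feature` is a hypothetical helper, and treating a missing score as 0.0 is an assumption made here because `wup_similarity` can return `None` when no connecting path is found.

```python
def closest_feature(synset, feature_synsets):
    """Return (label, score) for the feature synset most similar to `synset`.

    `feature_synsets` is a dict such as {"human": person_synset}.
    """
    scores = {}
    for label, target in feature_synsets.items():
        score = synset.wup_similarity(target)
        # Assumption: a None score (no connecting path) counts as 0.0.
        scores[label] = score if score is not None else 0.0
    best = max(scores, key=scores.get)
    return best, scores[best]
```

With the variables defined above, `closest_feature(dog_sense, {"human": human, "animal": animal})` returns the better-matching label together with its score. Note that a similarity score is a graded measure, not a binary feature, so a threshold may still be needed before treating the result as [+animal] or [+human].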

Source: the original question "WordNet selectional restrictions in NLTK" on Stack Overflow: https://stackoverflow.com/questions/5493565/
