python-3.x - Does the WordNet python-nltk interface include any semantic relatedness measures?

Tags: python-3.x nlp nltk wordnet

I know I can use the semantic similarity measures in the nltk interface:

from nltk.corpus import wordnet as wn
sim = wn.synset(name_1).path_similarity(wn.synset(name_2))

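I also know I can use vector space models and co-occurrence matrices to assess the semantic relatedness of words, but I could not find any such solution in the nltk interface.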

Best answer

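NLTK's WordNet interface provides a number of word-similarity measures based on the WordNet taxonomy, although none of them is based on vector space models or co-occurrence matrices.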
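For contrast, the co-occurrence approach mentioned in the question can be sketched in plain Python, outside the WordNet interface. This is only an illustration under stated assumptions: the toy corpus, the window size, and the helper names `cooccurrence_vectors` and `cosine` are hypothetical and not part of NLTK.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (hypothetical); a real co-occurrence matrix needs far more text.
corpus = [
    "the cat chased the mouse".split(),
    "the dog chased the cat".split(),
    "the stock market fell sharply".split(),
]
vectors = cooccurrence_vectors(corpus)

print(cosine(vectors["cat"], vectors["dog"]))     # words with shared contexts
print(cosine(vectors["cat"], vectors["market"]))  # words with mostly different contexts
```

On this toy corpus "cat" scores closer to "dog" than to "market", because the two animal words share more context words. The taxonomy-based measures that NLTK does provide follow.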

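# Requires the NLTK data packages 'wordnet' and 'wordnet_ic' (installable with nltk.download()).
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

# WordNet information content (IC) file computed from the Brown corpus
brown_ic = wordnet_ic.ic('ic-brown.dat')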

# Take the first listed sense of each word: cat.n.01 and dog.n.01
cat = wn.synsets('cat')[0]
dog = wn.synsets('dog')[0]


'''
Path Similarity:
Return a score denoting how similar two word senses are,
based on the shortest path that connects the senses
in the is-a (hypernym/hyponym) taxonomy.
The score is in the range 0 to 1.
'''
print(wn.path_similarity(cat, dog))
# 0.2

'''
Leacock-Chodorow Similarity:
Return a score denoting how similar two word senses are,
based on the shortest path that connects the senses (as above)
and the maximum depth of the taxonomy in which the senses occur.
The relationship is given as -log(p/2d)
where p is the shortest path length and d the taxonomy depth.
'''
print(wn.lch_similarity(cat, dog))
# 2.0281482472922856

'''
Wu-Palmer Similarity:
Return a score denoting how similar two word senses are,
based on the depth of the two senses in the taxonomy
and that of their Least Common Subsumer (most specific ancestor node).
'''
print(wn.wup_similarity(cat, dog))
# 0.8571428571428571

'''
Lin Similarity:
Return a score denoting how similar two word senses are,
based on the Information Content (IC) of the Least Common Subsumer
and that of the two input Synsets.
The relationship is given by the equation 2 * IC(lcs) / (IC(s1) + IC(s2)).
'''
print(wn.lin_similarity(cat, dog, ic=brown_ic))
# 0.8768009843733973

'''
Resnik Similarity:
Return a score denoting how similar two word senses are,
based on the Information Content (IC) of the Least Common Subsumer.
Note that for any similarity measure that uses information content,
the result is dependent on the corpus used to generate the information content
and the specifics of how the information content was created.
'''
print(wn.res_similarity(cat, dog, ic=brown_ic))
# 7.911666509036577

'''
Jiang-Conrath Similarity:
Return a score denoting how similar two word senses are,
based on the Information Content (IC) of the Least Common Subsumer
and that of the two input Synsets.
The relationship is given by the equation 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)).
'''
print(wn.jcn_similarity(cat, dog, ic=brown_ic))
# 0.4497755285516739
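The three information-content scores above are tied together by the formulas quoted in the docstrings. As a rough sanity check, and assuming the Resnik score is exactly IC(lcs), the Jiang-Conrath value can be recovered from the printed Resnik and Lin values alone. The variable names below are illustrative; the numbers are copied from the output comments above.

```python
import math

# Scores printed by the script above for cat/dog with the Brown IC file.
res = 7.911666509036577    # Resnik: assumed to equal IC(lcs)
lin = 0.8768009843733973   # Lin: 2 * IC(lcs) / (IC(s1) + IC(s2))
jcn = 0.4497755285516739   # Jiang-Conrath: 1 / (IC(s1) + IC(s2) - 2 * IC(lcs))

ic_lcs = res
# Rearranging the Lin formula gives the combined IC of the two synsets.
ic_sum = 2 * ic_lcs / lin
# Plug both quantities into the Jiang-Conrath formula.
jcn_derived = 1 / (ic_sum - 2 * ic_lcs)

print(ic_sum)
print(jcn_derived)
print(math.isclose(jcn_derived, jcn, rel_tol=1e-6))
```

Because all three measures depend on the same IC file, swapping 'ic-brown.dat' for another corpus changes all three scores together.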

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/63514884/
