I need to measure the similarity between two sentences. For example:
s1 = "she is good a dog "
s2 = "she is nice a heel"
I need to show that "good" and "nice" are similar. For nouns and verbs, the path similarity measure works, as in the following pseudocode:
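from nltk.corpus import wordnet as wn

def get_max(pairs):
    # Return the highest path similarity among the given synset pairs.
    return max(a.path_similarity(b) for a, b in pairs)
get_max([(wn.synset('dog.n.01'), wn.synset('animal.n.01'))])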
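Result: 0.33, which is a fairly high value, so these words are related and can be considered similar. But for adjectives ("nice" and "good"), the value is 0.09, which is very low!
Any ideas?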
You can compute path_similarity for every synset of good and then take the maximum:
>>> from nltk.corpus import wordnet as wn
>>> n=wn.synsets('nice')
>>> g=wn.synsets('good')
>>> [i.path_similarity(n[0]) for i in g]
[0.0625, 0.06666666666666667, 0.07142857142857142, 0.09090909090909091, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
>>> max(s for s in (i.path_similarity(n[0]) for i in g) if s is not None)
0.09090909090909091
Note that a word's synsets cover several parts of speech, e.g. verb, noun, adj, ..., so you need to pick the right one!
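As a minimal sketch of picking the right part of speech, the helper below restricts both words to one POS and takes the best score over all sense pairs. The names best_similarity and word_similarity are made up for illustration; the only NLTK calls assumed are wn.synsets(word, pos=...) and the synset similarity methods. Pairs with no connecting path return None, and on Python 3 max() cannot compare None with a float, so those scores are dropped first.

```python
from itertools import product


def best_similarity(synsets_a, synsets_b, measure="path_similarity"):
    """Return the highest non-None score over all sense pairs, or None."""
    scores = (
        getattr(a, measure)(b)
        for a, b in product(synsets_a, synsets_b)
    )
    # Pairs with no connecting path score None; drop them before max(),
    # because Python 3 cannot compare None with a float.
    return max((s for s in scores if s is not None), default=None)


def word_similarity(word1, word2, pos=None, measure="path_similarity"):
    """Compare two words using only synsets of the given part of speech."""
    from nltk.corpus import wordnet as wn  # needs the WordNet corpus installed

    return best_similarity(
        wn.synsets(word1, pos=pos), wn.synsets(word2, pos=pos), measure
    )
```

For example, word_similarity('dog', 'animal', pos='n') compares noun senses only. As far as I know, WordNet adjectives are not arranged in a hypernym hierarchy, so path-based scores between different adjective synsets will usually come back as None.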
You can also use wup_similarity as an alternative:
>>> round(max(s for s in (i.wup_similarity(n[0]) for i in g) if s is not None), 1)
0.4
Wu-Palmer Similarity: Return a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node).
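To make the two measures concrete, here is a minimal sketch on a hypothetical five-node taxonomy (not WordNet data). It assumes a single-parent tree and counts the root as depth 1; NLTK's own implementation handles multiple hypernyms and other details, so its scores on real synsets can differ. PARENT, toy_path_similarity and toy_wup_similarity are illustrative names, not part of NLTK.

```python
# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "domestic_animal": "animal",
    "dog": "domestic_animal",
    "plant": "entity",
}


def path_to_root(node):
    """List of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lowest_common_subsumer(a, b):
    """Most specific ancestor shared by a and b (a node is its own ancestor)."""
    ancestors_of_b = set(path_to_root(b))
    return next(n for n in path_to_root(a) if n in ancestors_of_b)


def toy_path_similarity(a, b):
    """1 / (1 + number of edges on the shortest path between a and b)."""
    lcs = lowest_common_subsumer(a, b)
    edges = path_to_root(a).index(lcs) + path_to_root(b).index(lcs)
    return 1 / (1 + edges)


def toy_wup_similarity(a, b):
    """2 * depth(LCS) / (depth(a) + depth(b)), with the root at depth 1."""
    def depth(n):
        return len(path_to_root(n))

    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))
```

On this toy tree, dog and animal are two edges apart, so toy_path_similarity('dog', 'animal') is 1/3, while toy_wup_similarity('dog', 'animal') is 2*2/(4+2) = 2/3. Wu-Palmer rewards a deep common ancestor, which is why it tends to give larger values than path similarity for the same pair.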
Read more about Synsets at http://www.nltk.org/howto/wordnet.html