python - Word2vec - get rank of similarity

Tags: python python-3.x nlp gensim word2vec

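Given a word2vec model (trained with gensim), I want to get the similarity rank between words. For example, say I have the word "desk", and the words most similar to "desk" are: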

table 0.64

chair 0.61

book 0.59

pencil 0.52

I want to create a function such that:

f(desk, book) = 3, since book is the 3rd most similar word to desk. Does such a function exist? What is the most efficient way to do this?

Best answer

You can use rank(entity1, entity2) to get the rank of the distance, which is the same as the index.

model.wv.rank(sample_word, most_similar_word)
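
To make concrete what such a rank means, here is a minimal pure-Python sketch, not gensim's actual implementation. It assumes the rank is one plus the number of other words that are closer (by cosine similarity) to the first word than the second word is. The 2-D vectors and the names toy_vectors, cosine and toy_rank are made up for illustration; a real model has many more dimensions and words.

```python
import math

# Hypothetical toy vectors, chosen only so the ordering matches the question.
toy_vectors = {
    "desk": (1.0, 0.0),
    "table": (0.9, 0.1),
    "chair": (0.8, 0.3),
    "book": (0.6, 0.5),
    "pencil": (0.3, 0.8),
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def toy_rank(vectors, word, other):
    """1-based rank of `other` among the words most similar to `word`."""
    target_similarity = cosine(vectors[word], vectors[other])
    # Count the words (excluding the two inputs) that are strictly closer to `word`.
    closer = [
        w for w, vec in vectors.items()
        if w not in (word, other) and cosine(vectors[word], vec) > target_similarity
    ]
    return len(closer) + 1

print(toy_rank(toy_vectors, "desk", "book"))  # 3
```

Unlike a lookup in a truncated neighbor list, this kind of computation compares against every word in the vocabulary, so it always yields a rank.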

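The separate function given below is therefore not needed. It is kept for reference.

Suppose you have the list of words and their similarity scores as a list of tuples, as returned by model.wv.most_similar(sample_word):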

[('table', 0.64), ('chair', 0.61), ('book', 0.59), ('pencil', 0.52)]

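The following function takes the sample word and the most similar word as arguments and returns the rank as a one-element list (e.g. [2]) if the word appears in the output: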

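def rank_of_most_similar_word(sample_word, most_similar_word):
    # `model` is the trained gensim Word2Vec model, assumed to exist in the enclosing scope.
    neighbors = model.wv.most_similar(sample_word)
    # 1-based positions at which the target word appears (an empty list if it is absent).
    return [i + 1 for i, (word, _) in enumerate(neighbors) if word == most_similar_word]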

sample_word = 'desk'
most_similar_word = 'book'
rank_of_most_similar_word(sample_word, most_similar_word)

Note: when calling model.wv.most_similar(), pass topn=x to get the top x most similar words, as suggested in the comments.
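
The topn caveat matters because a list-based lookup can only see the words that were returned. The sketch below is an illustration under stated assumptions: it works on a plain list shaped like the most_similar() output shown above, and slicing the list stands in for a smaller topn. The helper name rank_in_neighbors is hypothetical. It returns a single integer, or None when the word falls outside the list.

```python
from typing import List, Optional, Tuple

def rank_in_neighbors(neighbors: List[Tuple[str, float]], target: str) -> Optional[int]:
    """Return the 1-based position of `target` in `neighbors`, or None if absent."""
    for position, (word, _score) in enumerate(neighbors, start=1):
        if word == target:
            return position
    return None

# Same (word, similarity) pairs as in the example output above.
neighbors = [('table', 0.64), ('chair', 0.61), ('book', 0.59), ('pencil', 0.52)]

print(rank_in_neighbors(neighbors, 'book'))      # 3
print(rank_in_neighbors(neighbors[:2], 'book'))  # None: 'book' is outside the top 2
```

If the target word may rank low, either raise topn accordingly or use the rank() call described at the top of the answer.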

Regarding python - Word2vec - get rank of similarity, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51747613/
