I am reading this paper: http://cs.stanford.edu/~quocle/paragraph_vector.pdf
It states:
"The paragraph vector and word vectors are averaged or concatenated to predict the next word in a context. In the experiments, we use concatenation as the method to combine the vectors."
How does concatenation or averaging work?
Example (if paragraph 1 contains word1 and word2):
word1 vector = [0.1, 0.2, 0.3]
word2 vector = [0.4, 0.5, 0.6]
Concat method:
does paragraph vector = [0.1+0.4, 0.2+0.5, 0.3+0.6]?
Average method:
does paragraph vector = [(0.1+0.4)/2, (0.2+0.5)/2, (0.3+0.6)/2]?
Also, from this image, it says:
The paragraph token can be thought of as another word. It acts as a memory that remembers what is missing from the current context – or the topic of the paragraph. For this reason, we often call this model the Distributed Memory Model of Paragraph Vectors (PV-DM).
Is the paragraph token equal to the paragraph vector, which is equal to "on"?

Best Answer
How does concatenation or averaging work?
You got the averaging right. Concatenation is:
[0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
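As a quick sketch with NumPy (not part of the original answer; the two vectors are the question's example values), the two combination operations look like this:

```python
import numpy as np

# Example vectors from the question
word1 = np.array([0.1, 0.2, 0.3])
word2 = np.array([0.4, 0.5, 0.6])

# Averaging: element-wise mean, output keeps the original dimensionality (3)
averaged = (word1 + word2) / 2
# → [0.25, 0.35, 0.45]

# Concatenation: vectors joined end to end, output dimensionality is 3 + 3 = 6
concatenated = np.concatenate([word1, word2])
# → [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
```

Note the practical difference: averaging keeps the input size fixed regardless of how many vectors are combined, while concatenation grows the input to the predictor with each extra vector (which is why the paper's concatenation model has a larger input layer).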
Is the paragraph token equal to the paragraph vector, which is equal to "on"?
The "paragraph token" is mapped to a vector called the "paragraph vector". It is not the same as the token "on", nor the same as the word vector that the token "on" is mapped to.
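A minimal pure-Python sketch of this distinction (all vector values and the tag name PARAGRAPH_1 are made up for illustration; a real model learns these vectors during training): the paragraph token lives in its own lookup table, separate from the word vocabulary.

```python
# Hypothetical word-vector table: one learned vector per vocabulary word
word_vectors = {
    "on":  [0.7, 0.1, 0.2],   # word vector for the token "on"
    "the": [0.3, 0.9, 0.5],
}

# Hypothetical paragraph-vector table: one learned vector per paragraph token
paragraph_vectors = {
    "PARAGRAPH_1": [0.4, 0.4, 0.8],
}

# The paragraph token is looked up in its own table, not the word table,
# so its vector is distinct from the word vector of "on".
pv = paragraph_vectors["PARAGRAPH_1"]
wv = word_vectors["on"]
```

In PV-DM both lookups feed the same predictor (combined by averaging or concatenation, as above), but the paragraph token is an extra input acting as a memory for the paragraph, not another copy of any word.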