machine-learning - doc2vec : How is PV-DBOW implemented

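I know that an implementation of PV-DBOW (paragraph vector) already exists in Python (gensim), but I am interested in how to implement it myself. The official paper explains PV-DBOW as follows: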

Another way is to ignore the context words in the input, but force the model to predict words randomly sampled from the paragraph in the output. In reality, what this means is that at each iteration of stochastic gradient descent, we sample a text window, then sample a random word from the text window and form a classification task given the Paragraph Vector.

According to the paper, word vectors are not stored, and PV-DBOW is said to work similarly to skip-gram in word2vec.

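Skip-gram is explained in word2vec Parameter Learning. In the skip-gram model, word vectors are mapped to the hidden layer, and the matrix that performs this mapping is updated during training. In PV-DBOW, the dimension of the hidden layer should be the dimension of a paragraph vector. If I want to multiply the word vector of a sampled example by the paragraph vector, the two must have the same size. The original representation of a word has size (vocabulary size x 1). Which mapping is applied to bring it to the right size (paragraph dimension x 1) in the hidden layer? And how is this mapping done when word vectors are not stored? Because of equation 26 in word2vec Parameter Learning, I assume the word and paragraph representations should have the same size in the hidden layer.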

Best answer

Yes, PV-DBOW can easily be implemented using the word2vec skip-gram model.

Suppose you have the following sentence:

Children are running in the park

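The skip-gram model learns word vectors by trying to predict the surrounding words within a fixed context window. With a window size of 2, the targets are as follows: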

word ->  context words to predict
--------------------------------
Children -> (are, running)
are -> (Children, running, in)
running -> (Children, are, in, the)
in -> (are, running, the, park)
the -> (running, in, park)
park -> (in, the)
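
The table above can be reproduced with a few lines of Python. This is only a sketch: the function name skipgram_pairs and its window parameter are made up for illustration and do not come from any library.

```python
def skipgram_pairs(tokens, window=2):
    """Pair each word with the words at most `window` positions away from it."""
    pairs = []
    for i, word in enumerate(tokens):
        left = tokens[max(0, i - window):i]      # up to `window` words before
        right = tokens[i + 1:i + 1 + window]     # up to `window` words after
        pairs.append((word, left + right))
    return pairs


sentence = "Children are running in the park".split()
for word, context in skipgram_pairs(sentence):
    print(word, "->", context)
```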

Now you can simply change how the word -> context-words-to-predict data is fed to your skip-gram implementation, like this:

word ->  context words to predict
--------------------------------
PAR#33 -> (Children, are, running, in, the, park)

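PAR#33 is just another word as far as your model is concerned (its vector has the same length), but it is actually a token that represents the whole paragraph (sentence).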
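
As a rough illustration of that data change, the paragraph token can be paired with every word of its paragraph. The helper name pv_dbow_pairs and the dictionary layout are assumptions made for this sketch, not part of gensim or any other library.

```python
def pv_dbow_pairs(paragraphs):
    """Yield (paragraph_tag, word) pairs: the tag is asked to predict each word."""
    for tag, tokens in paragraphs.items():
        for word in tokens:
            yield tag, word


# Hypothetical input: paragraph tag -> list of tokens.
paragraphs = {"PAR#33": "Children are running in the park".split()}
pairs = list(pv_dbow_pairs(paragraphs))
for tag, word in pairs:
    print(tag, "->", word)
```

During stochastic gradient descent one would typically sample from these pairs rather than enumerate them in order, which is in line with the sampling the paper describes.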

This is a kind of skip-gram model with a "paragraph-sized window".
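
To make the dimension question concrete, here is a minimal pure-Python training sketch. It is one possible reading of the idea, not the paper's or gensim's actual code. It assumes a plain full softmax (practical implementations typically use hierarchical softmax or negative sampling), and the two toy paragraphs, the names par_vecs and out_vecs, and the hyperparameters are all made up for illustration. Words appear only on the output side, as one weight vector of length dim each. No input-side word vector is ever looked up, so the hidden layer is simply the paragraph vector.

```python
import math
import random

random.seed(0)

# Toy corpus (assumption for illustration only).
paragraphs = {
    "PAR#0": "children are running in the park".split(),
    "PAR#1": "the dog is sleeping in the house".split(),
}
vocab = sorted({w for tokens in paragraphs.values() for w in tokens})
word_id = {w: i for i, w in enumerate(vocab)}
tags = list(paragraphs)
dim = 8  # size of a paragraph vector, which is also the hidden layer size

# Input side: one trainable vector per paragraph. There are no input word vectors.
par_vecs = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in tags]
# Output side: one weight vector of length `dim` per vocabulary word.
out_vecs = [[0.0] * dim for _ in vocab]


def predict(h):
    """Softmax distribution over the vocabulary given the hidden vector h."""
    scores = [sum(a * b for a, b in zip(h, v)) for v in out_vecs]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_loss():
    """Average negative log-likelihood of every word given its paragraph."""
    losses = [
        -math.log(predict(par_vecs[p])[word_id[w]])
        for p, tag in enumerate(tags)
        for w in paragraphs[tag]
    ]
    return sum(losses) / len(losses)


loss_before = mean_loss()
lr = 0.1
for _ in range(2000):
    p = random.randrange(len(tags))                       # sample a paragraph
    target = word_id[random.choice(paragraphs[tags[p]])]  # sample a word from it
    h = par_vecs[p][:]                                    # hidden layer = paragraph vector
    err = predict(h)
    err[target] -= 1.0                                    # softmax cross-entropy gradient
    for j in range(dim):                                  # update the paragraph vector
        par_vecs[p][j] -= lr * sum(err[k] * out_vecs[k][j] for k in range(len(vocab)))
    for k in range(len(vocab)):                           # update the output weights
        for j in range(dim):
            out_vecs[k][j] -= lr * err[k] * h[j]
loss_after = mean_loss()
print(f"mean loss: {loss_before:.3f} -> {loss_after:.3f}")
```

In this sketch the rows of par_vecs are the learned paragraph vectors. The out_vecs weights are needed only while training, which is one way to interpret the statement that word vectors do not have to be stored.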

A similar question about machine-learning - doc2vec : How is PV-DBOW implemented can be found on Stack Overflow: https://stackoverflow.com/questions/36001230/
