nlp - word2vec: CBOW vs. skip-gram performance relative to training dataset size

The question is simple: which of CBOW and skip-gram works better for a large dataset? (The answer for a small dataset follows from that.)

I am confused because, according to Mikolov himself, [Link]

Skip-gram: works well with small amount of the training data, represents well even rare words or phrases.

CBOW: several times faster to train than the skip-gram, slightly better accuracy for the frequent words

But according to Google TensorFlow, [Link]

CBOW smoothes over a lot of the distributional information (by treating an entire context as one observation). For the most part, this turns out to be a useful thing for smaller datasets.

However, skip-gram treats each context-target pair as a new observation, and this tends to do better when we have larger datasets. We will focus on the skip-gram model in the rest of this tutorial.
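To make "the entire context as one observation" versus "each context-target pair as a new observation" concrete, here is a minimal Python sketch that turns one tokenized sentence into training examples both ways. The helper names and the fixed symmetric window of size `window` are assumptions for illustration; real implementations (e.g. the original word2vec tool) typically also randomize the effective window size and subsample frequent words, which this sketch omits.

```python
from typing import List, Tuple


def context_of(tokens: List[str], i: int, window: int) -> List[str]:
    """Words within `window` positions of tokens[i], excluding tokens[i] itself."""
    lo = max(0, i - window)
    hi = min(len(tokens), i + window + 1)
    return [tokens[j] for j in range(lo, hi) if j != i]


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: the whole context is ONE observation used to predict the centre word."""
    examples = []
    for i, target in enumerate(tokens):
        ctx = context_of(tokens, i, window)
        if ctx:  # a word with no neighbours gives CBOW nothing to combine
            examples.append((ctx, target))
    return examples


def skipgram_examples(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: every (centre word, context word) pair is its OWN observation."""
    return [(tokens[i], c)
            for i in range(len(tokens))
            for c in context_of(tokens, i, window)]


if __name__ == "__main__":
    sentence = "the quick brown fox jumps".split()
    cbow = cbow_examples(sentence, window=1)
    sg = skipgram_examples(sentence, window=1)
    print(len(cbow), cbow[1])  # 5 (['the', 'brown'], 'quick')
    print(len(sg), sg[:3])     # 8 [('the', 'quick'), ('quick', 'the'), ('quick', 'brown')]
```

Both functions read exactly the same five words, but CBOW yields 5 observations (each averaging over its neighbours) while skip-gram yields 8 separate pairs. That multiplication of observations is one way to read the tutorial's remark about skip-gram.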

Here is a Quora post that supports the first view [Link], and another Quora post that suggests the second [Link]; both seem to derive from the credible sources above.

Or is it, as Mikolov said:

Overall, the best practice is to try few experiments and see what works the best for you, as different applications have different requirements.

But surely there is an empirical or analytical conclusion, or a final word, on this question?

Best answer

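When Mikolov says CBOW suits larger datasets and SG suits smaller ones, I think he is considering the quantity of data. Because CBOW considers one target word together with many context words, it needs a larger dataset to train the target vectors than SG does. Conversely, in SG each single context word comes with many target words, so a smaller dataset is enough.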

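The Google TensorFlow tutorial is talking about the distribution of words in the dataset needed to produce high-quality vectors, not the quantity of data used. Because the CBOW model largely sees the same context words for all target words in a sentence, it needs a larger (more widely distributed) dataset; the reverse holds for SG.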

Overall, the two statements mean the same thing:

CBOW model = a dataset with shorter sentences but many samples (a larger dataset)
SG model = a dataset with long sentences and few samples (a smaller dataset)
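The summary above ties each model to sentence length and sample count. A minimal Python sketch (the helper `count_observations` is hypothetical, and a fixed symmetric window is assumed) counts how many training observations each model extracts from a corpus without materializing them: CBOW gets one per centre word that has at least one neighbour, skip-gram gets one per (centre, context) pair, so for long sentences the skip-gram count approaches `2 * window` times the CBOW count. This only counts observations; it says nothing by itself about which model produces better vectors.

```python
from typing import Iterable, List, Tuple


def count_observations(sentences: Iterable[List[str]], window: int) -> Tuple[int, int]:
    """Return (cbow_count, skipgram_count) for a fixed symmetric window."""
    cbow = skipgram = 0
    for tokens in sentences:
        n = len(tokens)
        for i in range(n):
            # context words available to the left and right of position i
            ctx = min(i, window) + min(n - 1 - i, window)
            if ctx:
                cbow += 1        # one observation per centre word
                skipgram += ctx  # one observation per (centre, context) pair
    return cbow, skipgram


if __name__ == "__main__":
    # Both corpora contain 200 tokens in total.
    short_sentences = [["a", "b"]] * 100
    long_sentences = [list("abcdefghijklmnopqrst")] * 10  # 20 tokens each
    print(count_observations(short_sentences, window=5))  # (200, 200)
    print(count_observations(long_sentences, window=5))   # (200, 1700)
```

With the same number of tokens, short sentences give skip-gram no extra observations, while 20-token sentences give it 8.5 times as many as CBOW, which is consistent with the answer associating SG with long sentences.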

The original Stack Overflow question, nlp - word2vec: CBOW vs. skip-gram performance relative to training dataset size, is at https://stackoverflow.com/questions/39224236/