python - TensorFlow negative sampling

I am trying to follow the Udacity tutorial on TensorFlow, where I came across the following two lines for a word embedding model:

  # Look up embeddings for inputs.
  embed = tf.nn.embedding_lookup(embeddings, train_dataset)
  # Compute the softmax loss, using a sample of the negative labels each time.
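  # Keyword arguments are used because the positional order of inputs and labels
  # differs between TensorFlow versions.
  loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
      weights=softmax_weights, biases=softmax_biases, inputs=embed,
      labels=train_labels, num_sampled=num_sampled, num_classes=vocabulary_size))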

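I understand that the second statement samples negative labels. But how does it know what the negative labels are? All I pass to the second function is the current input, its corresponding labels, and the number of labels I want to (negatively) sample. Isn't there a risk of sampling from the input set itself?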

This is the full example: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/5_word2vec.ipynb

Best answer

The documentation for tf.nn.sampled_softmax_loss() is in the TensorFlow API reference. TensorFlow also provides a good explanation of candidate sampling (a PDF).

How does it know what the negative labels are?

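TensorFlow randomly selects the negative classes from among all possible classes (in your case, all possible words). Unless you pass your own sampled_values, the samples come from a log-uniform candidate sampler.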
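
To make the sampling step concrete, here is a minimal pure-Python sketch of drawing negative classes from a log-uniform distribution over class ids. It is an illustration, not TensorFlow's implementation: the helper names log_uniform_probs and sample_negatives are hypothetical, and the values 50000 and 64 are example settings. Note that the true labels are never consulted while sampling.

```python
import itertools
import math
import random


def log_uniform_probs(vocabulary_size):
    """P(k) = (log(k + 2) - log(k + 1)) / log(vocabulary_size + 1)."""
    norm = math.log(vocabulary_size + 1)
    return [(math.log(k + 2) - math.log(k + 1)) / norm
            for k in range(vocabulary_size)]


def sample_negatives(vocabulary_size, num_sampled, rng):
    """Draw num_sampled distinct class ids without looking at the true labels."""
    if num_sampled > vocabulary_size:
        raise ValueError("num_sampled cannot exceed vocabulary_size")
    cum_weights = list(itertools.accumulate(log_uniform_probs(vocabulary_size)))
    chosen = set()
    while len(chosen) < num_sampled:
        # Duplicates are rejected by the set, so the result is unique.
        chosen.add(rng.choices(range(vocabulary_size),
                               cum_weights=cum_weights, k=1)[0])
    return sorted(chosen)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
negatives = sample_negatives(vocabulary_size=50000, num_sampled=64, rng=rng)
```

Low class ids get most of the probability mass, which suits vocabularies sorted by decreasing word frequency.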

Isn't there the risk of sampling from the input set in itself?

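When you compute the softmax probability for the true label, you compute: exp(logits[true_label]) / (exp(logits[true_label]) + sum(exp(logits[negative_sampled_labels]))). Because the number of classes is huge (the vocabulary size), the probability of sampling true_label as a negative label is usually small.
In any case, I think TensorFlow removes this possibility altogether when randomly sampling. (Edit: @Alex confirms that TensorFlow does this by default; in tf.nn.sampled_softmax_loss this corresponds to the remove_accidental_hits argument, which defaults to True.)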
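
The effect of discarding such "accidental hits" can be sketched in plain Python. This is a simplified illustration under stated assumptions: the function sampled_softmax_prob and the toy logits are hypothetical, the hit is dropped from the candidate list rather than masked, and the correction for sampling probabilities that TensorFlow applies to the logits is left out.

```python
import math


def sampled_softmax_prob(logits, true_label, sampled_labels,
                         remove_accidental_hits=True):
    """Softmax probability of true_label over itself plus the sampled negatives."""
    negatives = list(sampled_labels)
    if remove_accidental_hits:
        # Drop any sampled class that coincides with the true label.
        negatives = [k for k in negatives if k != true_label]
    candidates = [true_label] + negatives
    # Subtract the max logit for numerical stability.
    top = max(logits[k] for k in candidates)
    exps = [math.exp(logits[k] - top) for k in candidates]
    return exps[0] / sum(exps)


# Toy logits for a 4-word vocabulary; class 0 is the true label and was
# also drawn as a negative sample by accident.
logits = {0: 2.0, 1: 0.5, 2: -1.0, 3: 0.0}
p_clean = sampled_softmax_prob(logits, true_label=0, sampled_labels=[1, 2, 0])
p_hit = sampled_softmax_prob(logits, true_label=0, sampled_labels=[1, 2, 0],
                             remove_accidental_hits=False)
```

Without the removal, the true class competes against itself in the denominator, so p_hit comes out lower than p_clean.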

This Q&A on python - TensorFlow negative sampling is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/37671974/
