python - Is the shard operation on a TensorFlow dataset deterministic?

TensorFlow datasets have a shard operation that creates a unique subset of a given dataset.

We can use it to partition a dataset as follows:

import tensorflow as tf
source_dataset = tf.data.Dataset.range(100)

number_of_partitions = 4
subset_one = source_dataset.shard(number_of_partitions, 0)
subset_two = source_dataset.shard(number_of_partitions, 1)
subset_three = source_dataset.shard(number_of_partitions, 2)

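Is this partitioning deterministic? That is, will the 3 subsets above always be assigned the same elements?

The documentation says the following about shard: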

Creates a Dataset that includes only 1/num_shards of this dataset.

This dataset operator is very useful when running distributed training, as it allows each worker to read a unique subset.

Best answer

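Yes, absolutely. shard is a deterministic operation: it selects elements by their position in the source dataset, keeping those whose index modulo num_shards equals the shard index.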

In the example above, subset_one contains the first element, the fifth element, and so on ([0, 4, 8, ...]), while subset_two contains [1, 5, 9, ...], and so on.
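
To make the selection rule concrete, here is a minimal pure-Python sketch of what shard does to an ordered sequence. It is a model of the behaviour described above, not TensorFlow's implementation; the helper name `shard` and the `subsets` list are illustrative assumptions, and the sketch assumes the source yields its elements in a fixed order.

```python
from itertools import islice


def shard(elements, num_shards, index):
    """Model of shard: keep the elements at positions index, index + num_shards, ..."""
    return list(islice(elements, index, None, num_shards))


source = range(100)
number_of_partitions = 4

# One subset per shard index, mirroring subset_one, subset_two, ... above.
subsets = [shard(source, number_of_partitions, i) for i in range(number_of_partitions)]

print(subsets[0][:5])  # [0, 4, 8, 12, 16]
print(subsets[1][:5])  # [1, 5, 9, 13, 17]

# Repeating the call gives the same elements, because selection depends only on position.
assert subsets[0] == shard(source, number_of_partitions, 0)

# The shards are disjoint and together cover the whole source.
assert sorted(x for s in subsets for x in s) == list(source)
```

With TensorFlow 2.x installed, the same values can be checked against the real datasets with `list(subset_one.as_numpy_iterator())`.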

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/56992725/
