tensorflow - How do I get the shape of a tf.data.Dataset?

Tags: tensorflow machine-learning deep-learning tensorflow-datasets

I know a dataset has output_shapes, but it shows something like this:

data_set: DatasetV1Adapter shapes: {item_id_hist: (?, ?), tags: (?, ?), client_platform: (?,), entrance: (?,), item_id: (?,), lable: (?,), mode: (?,), time: (?,), user_id: (?,)}, types: {item_id_hist: tf.int64, tags: tf.int64, client_platform: tf.string, entrance: tf.string, item_id: tf.int64, lable: tf.int64, mode: tf.int64, time: tf.int64, user_id: tf.int64}

How can I get the total number of records in my data?

Best answer

If the length is known, you can call:

tf.data.experimental.cardinality(dataset)
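As a rough sketch of how that call behaves, assuming TensorFlow 2.x with eager execution: the result is a scalar int64 tensor, and it can be one of two sentinel values, tf.data.experimental.INFINITE_CARDINALITY or tf.data.experimental.UNKNOWN_CARDINALITY, when the length cannot be determined statically. The helper name describe_cardinality and the toy datasets are made up for illustration.

```python
import tensorflow as tf


def describe_cardinality(dataset):
    """Return the dataset length as an int, or 'infinite' / 'unknown'.

    Hypothetical helper for illustration; assumes eager execution (TF 2.x).
    """
    n = int(tf.data.experimental.cardinality(dataset).numpy())
    if n == tf.data.experimental.INFINITE_CARDINALITY:
        return "infinite"
    if n == tf.data.experimental.UNKNOWN_CARDINALITY:
        return "unknown"
    return n


# A finite dataset whose length can be determined statically.
finite = tf.data.Dataset.range(10)
# repeat() with no argument makes the dataset infinite.
infinite = tf.data.Dataset.range(10).repeat()
# filter() hides the length: it depends on the data itself.
filtered = tf.data.Dataset.range(10).filter(lambda x: x % 2 == 0)

print(describe_cardinality(finite))
print(describe_cardinality(infinite))
print(describe_cardinality(filtered))
```

Note that on a batched dataset the cardinality is the number of batches, not the number of individual records.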

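If that fails, though, it is important to know that a TensorFlow Dataset is (usually) lazily evaluated, which means that in the general case we may have to iterate over every record before we can determine the length of the dataset.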

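For example, assuming you have eager execution enabled and it is a small "toy" dataset that fits in memory, you can simply enumerate it into a new list and take the last index (then add 1, because lists are zero-indexed):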

dataset_length = [i for i,_ in enumerate(dataset)][-1] + 1

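Of course, this is inefficient at best, and for large datasets it will fail outright, because everything has to fit in the list's memory. In that case, I don't see any alternative to iterating over the records and keeping a manual count.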
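A minimal sketch of such a manual count, assuming TensorFlow 2.x with eager execution. It keeps only a running counter, so nothing is accumulated in a list. The function names count_elements and count_elements_python are hypothetical, and the toy dataset stands in for the real one.

```python
import tensorflow as tf


def count_elements(dataset):
    """Count elements by folding over the dataset with Dataset.reduce.

    Only a scalar counter is kept in memory. Assumes a finite dataset;
    on an infinite dataset this never returns.
    """
    initial = tf.constant(0, dtype=tf.int64)
    total = dataset.reduce(initial, lambda count, _: count + 1)
    return int(total.numpy())


def count_elements_python(dataset):
    """Same idea with a plain Python loop over the dataset in eager mode."""
    return sum(1 for _ in dataset)


# Toy dataset with dict elements, loosely mirroring the structure in the question.
toy = tf.data.Dataset.from_tensor_slices(
    {"item_id": [1, 2, 3, 4, 5], "user_id": [10, 20, 30, 40, 50]}
)

print(count_elements(toy))
print(count_elements_python(toy))
```

Both versions count whatever the dataset yields, so on a batched dataset they count batches. To count individual records, run the count before batching or on dataset.unbatch().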

Source for "tensorflow - How do I get the shape of a tf.data.Dataset?": a similar question on Stack Overflow: https://stackoverflow.com/questions/56218014/
