python - Extracting images from a tfrecords file with protobuf without running a TensorFlow session

Tags: python tensorflow protocol-buffers

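I'm using TensorFlow in Python, with my data stored in a TFRecords file containing tf.train.Example protocol buffers. I'm trying to extract the fields stored in each example (height, width, and image in the code sample below) without having to run a TensorFlow session. Through trial and error, I found that the following code works: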

import numpy as np
import tensorflow as tf

def _im_feature_to_im(example, key, height, width):
    # Serialize the whole BytesList message, not just its payload
    feature_ser = example.features.feature[key].bytes_list.SerializeToString()
    # Skip the first 4 bytes (offset found by trial and error)
    feature_ser_clean = feature_ser[4:]
    # np.frombuffer replaces the deprecated np.fromstring for binary data
    image = np.frombuffer(feature_ser_clean, dtype=np.uint8).reshape((height, width))
    return image

# tfrec_filename holds the path to the TFRecords file
for serialized_example in tf.python_io.tf_record_iterator(tfrec_filename):
    example = tf.train.Example()
    example.ParseFromString(serialized_example)
    # traverse the Example format to get data
    height = example.features.feature['height'].int64_list.value[0]
    width = example.features.feature['width'].int64_list.value[0]
    image = _im_feature_to_im(example, 'image', height, width)

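So the int fields are easy to extract. My question is about extracting the image: why do I have to remove 4 bytes from the start of the byte array to get the raw image? Is there some kind of header there?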

Best answer

Those bytes are the key used by the Protocol Buffer encoding.

https://developers.google.com/protocol-buffers/docs/encoding

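You can print the bytes out and decode them by following the instructions on that page. Most likely they are some encoding of tag (field number) = 1, wire type = 2 (length-delimited), and length = height * width.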
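As a rough illustration of the decoding suggested here, the sketch below parses the header of a serialized length-delimited field in plain Python. The helper names (`read_varint`, `encode_varint`, `parse_field_header`) and the image sizes are hypothetical, and the buffer is built by hand to mimic what `bytes_list.SerializeToString()` should produce for a single value, rather than being read from a real TFRecords file.

```python
def read_varint(buf, pos=0):
    """Decode a base-128 varint starting at buf[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        value |= (byte & 0x7F) << shift  # low 7 bits carry data
        pos += 1
        if not (byte & 0x80):            # high bit clear: last byte of the varint
            return value, pos
        shift += 7


def encode_varint(n):
    """Encode a non-negative int as a varint (used only to build test input)."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def parse_field_header(buf):
    """Return (field_number, wire_type, payload_length, header_size)."""
    key, pos = read_varint(buf, 0)
    field_number, wire_type = key >> 3, key & 0x07
    if wire_type != 2:
        raise ValueError("expected a length-delimited field (wire type 2)")
    length, pos = read_varint(buf, pos)
    return field_number, wire_type, length, pos


# Hypothetical 200 x 300 uint8 image: 60000 payload bytes.
# Key byte 0x0A encodes field number 1 with wire type 2.
big = b"\x0a" + encode_varint(200 * 300) + bytes(200 * 300)
print(parse_field_header(big))    # (1, 2, 60000, 4)

# Hypothetical 10 x 10 image: the length fits in one varint byte.
small = b"\x0a" + encode_varint(10 * 10) + bytes(10 * 10)
print(parse_field_header(small))  # (1, 2, 100, 2)
```

Under these assumptions the header is 4 bytes only when the payload length needs a 3-byte varint (16,384 to 2,097,151 bytes), so a hard-coded `[4:]` would break for smaller or larger images. Reading `example.features.feature[key].bytes_list.value[0]` returns the payload bytes directly and avoids the header altogether.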

Hope that helps!

Sherry

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/39795064/