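Python example for reading multiple protobuf messages from a stream

I am working with data from spinn3r, which consists of multiple different protobuf messages serialized into a byte stream: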

http://code.google.com/p/spinn3r-client/wiki/Protostream

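"A protostream is a stream of Protocol Buffer messages, encoded on the wire as length-prefixed varints according to the Google Protocol Buffer specification. The stream has three parts: a header, the payload, and a tail marker."

This seems like a fairly standard use case for protobufs. In fact, the core protobuf distribution provides CodedInputStream for both C++ and Java. For Python, however, protobuf does not appear to provide such a tool; the "internal" tools are not set up for this kind of external use: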

https://groups.google.com/forum/?fromgroups#!topic/protobuf/xgmUqXVsK-o

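So... before I go and cobble together a Python varint parser and tools for parsing a stream of different message types: does anyone know of any existing tools for this?

Why is it missing from protobuf? (Or am I just failing to find it?)

This seems like a big gap for protobuf, especially compared with Thrift's equivalent "transport" and "protocol" tools. Am I seeing this correctly?

Best answer

The code in the other answer appears to have been lifted from elsewhere. Check that file's license before using it, but I managed to get it to read varint32 values with code like this: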

import myprotocol_pb2 as proto
import varint  # (this is the varint.py file)

with open("filename.bin", "rb") as f:
    data = f.read()                      # read the whole file as bytes
decoder = varint.decodeVarint32          # get a varint32 decoder
                                         # others are available in varint.py

pos = 0
while pos < len(data):
    msg = proto.Msg()                    # your message type
    size, pos = decoder(data, pos)       # (message size, offset just past the varint)
    msg.ParseFromString(data[pos:pos + size])

    # use parsed message

    pos += size                          # advance to the next length prefix
print("done!")

This is very simple code intended to load messages of a single type, each preceded by a varint32 that gives the size of the next message.
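
The loop above loads the whole file into memory first. For an actual stream, the same framing could be read incrementally. The following is only a minimal sketch under stated assumptions: `read_varint` and `read_delimited` are hypothetical helper names (not part of protobuf or the spinn3r client), the stream is assumed to be a buffered binary file object whose `read(n)` returns `n` bytes unless it hits end of file, and payloads are yielded as raw bytes so the caller can hand each one to `ParseFromString` on whatever message type applies.

```python
import io


def read_varint(stream):
    """Read one base-128 varint from a binary stream; return None at clean EOF."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None                  # EOF before a new varint started
            raise EOFError("stream ended inside a varint")
        value = byte[0]
        result |= (value & 0x7F) << shift    # low 7 bits carry data
        if not value & 0x80:                 # high bit clear: this was the last byte
            return result
        shift += 7
        if shift >= 64:
            raise ValueError("varint is too long")


def read_delimited(stream):
    """Yield each length-prefixed payload from the stream as bytes."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside a message")
        yield payload


# Two fake payloads of length 3 and 300 (300 encodes as the two bytes AC 02).
demo = io.BytesIO(b"\x03abc" + b"\xac\x02" + b"x" * 300)
payloads = list(read_delimited(demo))
```

Each yielded payload would then be parsed with something like `msg.ParseFromString(payload)`. A raw socket can return fewer bytes than requested, so this sketch assumes a buffered file object rather than direct `recv` calls.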

Update: the decoder can also be imported directly from the protobuf library:

from google.protobuf.internal.decoder import _DecodeVarint32
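
Because `_DecodeVarint32` is a private name (note the leading underscore and the `internal` package), it could move or change between protobuf releases. One cautious pattern, offered here as an assumption rather than anything the protobuf project recommends, is to fall back to a small local decoder with the same `(buffer, pos) -> (value, new_pos)` shape. `split_delimited` is a hypothetical helper name used only for this sketch.

```python
try:
    # Private protobuf helper; called as decode(buffer, pos) -> (value, new_pos).
    from google.protobuf.internal.decoder import _DecodeVarint32 as decode_varint32
except ImportError:
    def decode_varint32(buffer, pos):
        """Fallback: decode a base-128 varint from bytes, starting at pos."""
        result = 0
        shift = 0
        while True:
            byte = buffer[pos]
            pos += 1
            result |= (byte & 0x7F) << shift   # low 7 bits carry data
            if not byte & 0x80:                # high bit clear: last byte
                return result, pos
            shift += 7


def split_delimited(data):
    """Return the list of length-prefixed payloads found in data (bytes)."""
    payloads = []
    pos = 0
    while pos < len(data):
        size, pos = decode_varint32(data, pos)
        payloads.append(data[pos:pos + size])
        pos += size
    return payloads


chunks = split_delimited(b"\x03abc\x02hi")
```

Each entry in `chunks` is the serialized form of one message and can be passed to `ParseFromString` on the matching message class.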

A similar question about a Python example for reading multiple protobuf messages from a stream can be found on Stack Overflow: https://stackoverflow.com/questions/11484700/
