serialization - What is the storage format of protobuf data?

.proto file:

package lm;
message helloworld
{
     required int32 id = 1;
     required string str = 2;
     optional int32 opt = 3;
}

writer.cc file:

#include <iostream>
#include <string>
#include "lm.helloworld.pb.h"
#include <fstream>
using namespace std;

int main()
{
    lm::helloworld msg1;
    msg1.set_id(101000);
    msg1.set_str("helloworld,this is a protobuf writer");
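    fstream output("log", ios::out | ios::trunc | ios::binary);
    string _data;
    // Serialize to a string and print the raw wire-format bytes; the embedded
    // string field is plain text, so most of this output looks readable.
    msg1.SerializeToString(&_data);
    cout << _data << endl;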
    if(!msg1.SerializeToOstream(&output))
    {
        cerr << "Failed to write msg" << endl;
        return -1;
    }
    return 0;
}

reader.cc file:

#include <iostream>
#include <fstream>
#include <string>
#include "lm.helloworld.pb.h"
using namespace std;

void ListMsg(const lm::helloworld & msg)
{
    cout << msg.id() << endl;
    cout << msg.str() << endl;
}

int main(int argc, char* argv[])
{
    lm::helloworld msg1;
    {
        fstream input("log", ios::in | ios::binary);
        if (!msg1.ParseFromIstream(&input))
        {
            cerr << "Failed to parse msg" << endl;
            return -1;
        }
    }

    ListMsg(msg1);
    return 0;
}

This is a simple reader/writer model using protobuf. But the content of the log is the readable string typed in writer.cc rather than a "numeric format". Why is that?

Best answer

The encoding is described in the official Protocol Buffers encoding documentation.

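Without an example of what appears at the other end, this is a bit hard to answer exactly, but there are two possible explanations for what you are seeing:

You have explicitly switched to TextFormat in your code; this is unlikely, since the main use of TextFormat is debugging and the like
More likely, you are simply seeing the text data from your message inside the binary file; text is encoded as UTF-8, so if you open a protobuf file in a text editor, fragments of it will look readable enough to show what is in the file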

The real question is: what are the actual bytes in the output file? If it is something like this:

08-88-95-06-12-24-68-65-6C-6C-6F-77-6F-72-6C-64-2C-74-68-69-73-20-69-73-20-61-20-70-72-6F-74-6F-62-75-66-20-77-72-69-74-65-72

then it is the binary format; but note that most of it is simply the UTF-8 encoding of the string "helloworld,this is a protobuf writer", which dominates the message by sheer size:

68-65-6C-6C-6F-77-6F-72-6C-64-2C-74-68-69-73-20-69-73-20-61-20-70-72-6F-74-6F-62-75-66-20-77-72-69-74-65-72

So if you look at it in any text editor, it will show up as a little gibberish at the start, followed by the clearly legible helloworld,this is a protobuf writer.
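To look at the actual bytes instead of trusting a text editor, the serialized data can be printed as hex. The following is a minimal sketch; `ToHex` is a hypothetical helper written for illustration and is not part of the protobuf library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Formats every byte of `data` as two uppercase hex digits separated by '-',
// the same notation used for the byte dumps in this answer.
std::string ToHex(const std::string& data) {
    static const char kDigits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < data.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char byte = static_cast<unsigned char>(data[i]);
        if (i != 0) out += '-';
        out += kDigits[byte >> 4];
        out += kDigits[byte & 0x0F];
    }
    // Sanity check: two digits per byte plus one separator between bytes.
    assert(out.size() == (data.empty() ? 0 : data.size() * 3 - 1));
    return out;
}
```

In writer.cc, printing `ToHex(_data)` instead of `_data` would show the leading header bytes that a text editor renders as gibberish.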

The "binary" part here is the bit at the start:

08-88-95-06-12-24

This is:

08: header: field 1, varint
88-95-06: the value (decimal) 101000 as a varint
12: header: field 2, length-prefixed
24: the value (decimal) 36 as a varint (the length of the string, in bytes)
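The breakdown above can be checked by decoding the six bytes by hand. The sketch below assumes only the two wire-format rules in play here: a varint stores 7 bits per byte, least-significant group first, with the high bit marking continuation, and a field header is a varint holding (field_number << 3) | wire_type. `ReadVarint`, `ReadHeader`, and `FieldHeader` are illustrative names, not protobuf APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads one base-128 varint starting at buf[pos] and advances pos past it.
std::uint64_t ReadVarint(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    std::uint64_t value = 0;
    int shift = 0;
    while (true) {
        assert(pos < buf.size() && shift < 64);  // truncated or overlong input
        const std::uint8_t byte = buf[pos++];
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) break;  // high bit clear: last byte of the varint
        shift += 7;
    }
    return value;
}

// Decoded field header: which field follows and how its value is laid out.
struct FieldHeader {
    std::uint32_t field_number;
    std::uint32_t wire_type;  // 0 = varint, 2 = length-prefixed
};

// A header is itself a varint: the low 3 bits are the wire type,
// the remaining bits are the field number.
FieldHeader ReadHeader(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    const std::uint64_t tag = ReadVarint(buf, pos);
    return FieldHeader{static_cast<std::uint32_t>(tag >> 3),
                       static_cast<std::uint32_t>(tag & 0x7)};
}
```

Feeding it 08-88-95-06-12-24 yields field 1 with wire type 0 and value 101000, then field 2 with wire type 2 and length 36, which matches the list above.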

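Key points to note:

If your message is dominated by text, then yes, most of it will look human-readable even in binary form
Look at the overhead: it takes 6 bytes to encode everything in the message other than the string itself, and 3 of those bytes are data (101000), so only 3 bytes are really lost as overhead; now compare and contrast with xml, json, etc. to see what protobuf is doing to help you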
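The overhead figure can be reproduced by encoding the message by hand and counting bytes. This is a sketch under stated assumptions: it covers only fields 1 and 2 of the helloworld message, handles only non-negative ids, and the names `AppendVarint`, `EncodeHelloworld`, and `NonPayloadBytes` are hypothetical helpers rather than anything generated by protoc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends `value` to `out` as a base-128 varint (7 bits per byte, low group first).
void AppendVarint(std::uint64_t value, std::string& out) {
    while (value >= 0x80) {
        out += static_cast<char>((value & 0x7F) | 0x80);  // more bytes follow
        value >>= 7;
    }
    out += static_cast<char>(value);  // final byte, high bit clear
}

// Hand-encodes the helloworld message from the question (fields 1 and 2 only).
std::string EncodeHelloworld(std::int32_t id, const std::string& str) {
    assert(id >= 0);  // negative int32 values need sign extension; not handled here
    std::string out;
    AppendVarint((1u << 3) | 0u, out);  // header: field 1, wire type 0 (varint)
    AppendVarint(static_cast<std::uint64_t>(id), out);
    AppendVarint((2u << 3) | 2u, out);  // header: field 2, wire type 2 (length-prefixed)
    AppendVarint(str.size(), out);      // string length in bytes
    out += str;                         // UTF-8 payload, copied as-is
    return out;
}

// Bytes spent on anything other than the string payload.
std::size_t NonPayloadBytes(const std::string& encoded, const std::string& str) {
    return encoded.size() - str.size();
}
```

For id 101000 and the 36-byte string from the question, this produces 42 bytes, 6 of them outside the string. For comparison, the compact JSON form {"id":101000,"str":"helloworld,this is a protobuf writer"} is 58 bytes, so 22 bytes sit outside the string there.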

Regarding "serialization - What is the storage format of protobuf data?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/17287779/
