I'm trying to build an Avro schema for the following JSON:
{
"id":1234,
"my_name_field": "my_name",
"extra_data": {
"my_long_value": 1234567890,
"my_message_string": "Hello World!",
"my_int_value": 777,
"some_new_field": 1
}
}
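The "id" and "my_name_field" fields are known in advance, but the fields inside "extra_data" change dynamically and are not known ahead of time.
The Avro schema I came up with is: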
{
"name":"my_record",
"type":"record",
"fields":[
{"name":"id", "type":"int", "default":0},
{"name":"my_name_field", "type":"string", "default":"NoName"},
{ "name":"extra_data", "type":{"type":"map", "values":["null","int","long","string"]} }
]
}
My first idea was to make "extra_data" a record containing a map, but that doesn't work:
{ "name":"extra_data", "type":{"type":"map", "values":["null","int","long","string"]} }
I get:
AvroTypeException: Expected start-union. Got VALUE_NUMBER_INT
Apache provides some good examples at https://cwiki.apache.org/confluence/display/Hive/AvroSerDe, but none of them seem to do the job.
This is the unit test I'm using to check it:
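import java.io.IOException;
import java.util.HashMap;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.io.DecoderFactory;
import org.junit.Test;
// ObjectMapper (Jackson) and SchemaRegistry (Camus) imports depend on the versions in use.

public class AvroTest {
    @Test
    public void readRecord() throws IOException {
        String event="{\"id\":1234,\"my_name_field\":\"my_name\",\"extra_data\":{\"my_long_value\":1234567890,\"my_message_string\":\"Hello World!\",\"my_int_value\":777,\"some_new_field\":1}}";
        SchemaRegistry<Schema> registry = new com.linkedin.camus.schema.MySchemaRegistry();
        DecoderFactory decoderFactory = DecoderFactory.get();
        ObjectMapper mapper = new ObjectMapper();
        GenericDatumReader<GenericData.Record> reader = new GenericDatumReader<GenericData.Record>();
        Schema schema = registry.getLatestSchemaByTopic("record_topic").getSchema();
        reader.setSchema(schema);
        HashMap hashMap = mapper.readValue(event, HashMap.class);
        // The sample event has no "now" key, so guard against a null lookup.
        Object nowValue = hashMap.get("now");
        long now = nowValue == null ? 0L : Long.parseLong(nowValue.toString()) * 1000;
        GenericData.Record read = reader.read(null, decoderFactory.jsonDecoder(schema, event));
    }
}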
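Any help with this is much appreciated, thanks.
Best answer
If the list of extra-data fields really is unknown, it may help to use an array of entries with several optional value fields, as follows: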
{
"name":"my_record",
"type":"record",
"fields":[
{"name":"id", "type":"int", "default":0},
{"name":"my_name_field", "type":"string", "default":"NoName"},
{"name":"extra_data", "type": {"type": "array", "items":
{"name": "extra_data_entry", "type":"record", "fields": [
{"name":"extra_data_field_name", "type": "string"},
{"name":"extra_data_field_type", "type": "string"},
{"name":"extra_data_field_value_string", "type": ["null", "string"]},
{"name":"extra_data_field_value_int", "type": ["null", "int"]},
{"name":"extra_data_field_value_long", "type": ["null", "long"]}
]}
}}
]
}
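To populate a schema like this, each key of the original "extra_data" object would become one entry in the array. The following is only a minimal sketch using plain Java collections, not Avro API calls: the class name ExtraDataEntries, both method names, and the type labels "string", "int" and "long" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Avro: reshapes a flat extra_data map into
// one map per field, keyed like the extra_data_entry record.
final class ExtraDataEntries {

    static List<Map<String, Object>> toEntries(Map<String, Object> extraData) {
        List<Map<String, Object>> entries = new ArrayList<>();
        for (Map.Entry<String, Object> field : extraData.entrySet()) {
            Object value = field.getValue();
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("extra_data_field_name", field.getKey());
            // Every value slot starts as null; at most one is filled in below.
            entry.put("extra_data_field_type", "string");
            entry.put("extra_data_field_value_string", null);
            entry.put("extra_data_field_value_int", null);
            entry.put("extra_data_field_value_long", null);
            if (value instanceof Integer) {
                entry.put("extra_data_field_type", "int");
                entry.put("extra_data_field_value_int", value);
            } else if (value instanceof Long) {
                entry.put("extra_data_field_type", "long");
                entry.put("extra_data_field_value_long", value);
            } else if (value != null) {
                // Assumption: any other non-null value is stored as its string form.
                entry.put("extra_data_field_value_string", String.valueOf(value));
            }
            entries.add(entry);
        }
        return entries;
    }

    // Reads back whichever value slot extra_data_field_type points at.
    static Object valueOfEntry(Map<String, Object> entry) {
        return entry.get("extra_data_field_value_" + entry.get("extra_data_field_type"));
    }

    public static void main(String[] args) {
        Map<String, Object> extraData = new LinkedHashMap<>();
        extraData.put("my_long_value", 1234567890L);
        extraData.put("my_message_string", "Hello World!");
        extraData.put("my_int_value", 777);
        for (Map<String, Object> entry : toEntries(extraData)) {
            System.out.println(entry + " -> " + valueOfEntry(entry));
        }
    }
}
```

The trade-off of this layout is that new field names never require a schema change, at the cost of a more verbose payload and a lookup by name when reading.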
You can then pick the matching extra_data_field_value_* value based on each entry's extra_data_field_type.
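As for the "Expected start-union" error itself: Avro's JSON encoding writes a non-null union value as a single-key object naming the branch, for example {"int": 777}, so a bare 777 is rejected wherever the schema declares a union. This applies to the map in the question and equally to the ["null", ...] fields in the schema above whenever records are fed through jsonDecoder. The sketch below, using only plain Java collections, shows one way the values could be wrapped before serializing them back to JSON; the class UnionJsonWrapper and its methods are hypothetical names, and only the int, long and string branches from the question are handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Avro: rewrites plain values into the
// single-key form that Avro's JSON encoding uses for union branches.
final class UnionJsonWrapper {

    // null stays null; any other value becomes {"<branch name>": value}.
    static Object wrap(Object value) {
        if (value == null) {
            return null;
        }
        String branch;
        if (value instanceof Integer) {
            branch = "int";
        } else if (value instanceof Long) {
            branch = "long";
        } else if (value instanceof String) {
            branch = "string";
        } else {
            throw new IllegalArgumentException("No union branch for " + value.getClass());
        }
        Map<String, Object> wrapped = new LinkedHashMap<>();
        wrapped.put(branch, value);
        return wrapped;
    }

    // Wraps every value of a flat map, keeping the original key order.
    static Map<String, Object> wrapAll(Map<String, Object> extraData) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : extraData.entrySet()) {
            result.put(field.getKey(), wrap(field.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> extraData = new LinkedHashMap<>();
        extraData.put("my_message_string", "Hello World!");
        extraData.put("my_int_value", 777);
        System.out.println(wrapAll(extraData));
    }
}
```

The wrapped map can then be written out with whatever JSON library is already in use (the unit test in the question uses Jackson's ObjectMapper) before being handed to the decoder.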
Regarding json - creating an Avro schema for a simple JSON, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22168453/