I want to parse a nested Avro file and load it into a Hive table (the Hive table may itself be nested).
My Avro schema is as follows:
{
"type" : "record",
"name" : "NTTObject",
"namespace" : "com.test.ntt",
"fields" : [ {
"name" : "header",
"type" : {
"type" : "map",
"values" : {
"type" : "string",
"avro.java.string" : "String"
},
"avro.java.string" : "String"
},
"default" : { }
}, {
"name" : "body",
"type" : {
"type" : "string",
"avro.java.string" : "String"
},
"default" : ""
} ]
}
The sample data looks like this:
{"objectKey":"trx/Phone/2016-05-12/15-12-18/0384bdr311-32w5b-49aa-a814-379256f80ca8"} {"StatDataRequest":{"protocolVersion":"1","platform":"Android","format":"Detailed","deviceid":"0384bdr311-32w5b-49aa-a814-379256f80ca8","stats":{"clientStat":[{"contentActionStat":{"progid":"56aa31a135d1c95d77f70b533289dfc3","gen1re":"Sports/Auto/Racing/High-Def/Events/Series/Live","rating":"0","vendor":"1 1 877U3 50B","vod":"false","ppv":"false","series":"true","title":"Test Prix, Practice","description":"\"Test Prix, Practice\"","recordDate":"2016-05-26T12:00:00Z","channel":"220","channel_name":"NBCSHD","TMSID":"ABCD5544671291","channel_minor":"0","hd":"false","contentAction":"Streaming_Started","clientMode":"UNKNOWN","timestamp":"2016-05-27T03:00:28.686Z","errorReason":"36100530"}},{"contentActionStat":{"progid":"56aa31a135d1c95d77f70b533289dfc3","gen1re":"Sports/Auto/Racing/High-Def/Events/Series/Live","rating":"0","vendor":"1 1 875E3 50B","vod":"false","ppv":"false","series":"true","title":"Test Prix, Practice","description":"\"Test Prix, Practice\"","recordDate":"2016-05-26T12:00:00Z","channel":"220","channel_name":"NBCSHD","TMSID":"ABCD5544671291","channel_minor":"0","hd":"false","contentAction":"Streaming_Stopped","clientMode":"UNKNOWN","durationSeconds":"3172","timestamp":"2016-05-27T03:53:20.077Z","errorReason":"36100530"}}]}}}
Expected output for the sample data above (I am using the pipe character (|) as the column delimiter):
trx/Phone/2016-05-12/15-12-18/0384bdr311-32w5b-49aa-a814-379256f80ca8|1|Android|Detailed|0384bdr311-32w5b-49aa-a814-379256f80ca8|56aa31a135d1c95d77f70b533289dfc3|Sports/Auto/Racing/High-Def/Events/Series/Live|0|1 1 877U3 50B|false|false|true|Test Prix, Practice|\"Test Prix, Practice\"|2016-05-26T12:00:00Z|220|NBCSHD|ABCD5544671291|0|false|Streaming_Started|UNKNOWN||2016-05-27T03:00:28.686Z|36100530
trx/Phone/2016-05-12/15-12-18/0384bdr311-32w5b-49aa-a814-379256f80ca8|1|Android|Detailed|0384bdr311-32w5b-49aa-a814-379256f80ca8|56aa31a135d1c95d77f70b533289dfc3|Sports/Auto/Racing/High-Def/Events/Series/Live|0|1 1 875E3 50B|false|false|true|Test Prix, Practice|\"Test Prix, Practice\"|2016-05-26T12:00:00Z|220|NBCSHD|ABCD5544671291|0|false|Streaming_Stopped|UNKNOWN|3172|2016-05-27T03:53:20.077Z|36100530
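The row layout above can be sketched in plain Java as follows. This is only an illustration using the standard library: the class name PipeRowSketch, the method toRow, and the shortened column list are made up for this example, and it assumes that a field missing from a record (such as durationSeconds in the first record) should become an empty column.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of the original post.
class PipeRowSketch {
    // Shortened, illustrative column order; the real row has many more columns.
    static final List<String> COLUMNS =
            Arrays.asList("contentAction", "clientMode", "durationSeconds", "timestamp");

    // Joins the values in column order; a key that is absent yields an empty column.
    static String toRow(Map<String, String> record) {
        return COLUMNS.stream()
                .map(column -> record.getOrDefault(column, ""))
                .collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        Map<String, String> started = new HashMap<>();
        started.put("contentAction", "Streaming_Started");
        started.put("clientMode", "UNKNOWN");
        started.put("timestamp", "2016-05-27T03:00:28.686Z");
        // durationSeconds is absent, so two delimiters appear side by side.
        System.out.println(toRow(started));
    }
}
```

Keeping the column order in one list means every row has the same number of delimiters, which is what a delimited Hive text table would expect.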
Any small sample code in Java or Scala would be helpful.
Here is the code snippet I wrote, following the suggestion from @SANN3:
import java.util.ArrayList;
import java.util.List;
import org.json.JSONArray;
import org.json.JSONObject;
public class GenieGo_AVRO_Parsing {
String jsonStr = "{\"objectKey\":\"trx/Android/2016-05-27/15-03-59/c496555a-940d-46eb-bc6a-21ae265ddf27\"} {\"StatDataRequest\":{\"protocolVersion\":\"1\",\"platform\":\"Android\",\"format\":\"Detailed\",\"deviceid\":\"c496555a-940d-46eb-bc6a-21ae265ddf27\",\"stats\":{\"clientStat\":[{\"contentActionStat\":{\"progid\":\"481080bd93a0710e496335d9acceb6add1695e7b\",\"rating\":\"0\",\"vendor\":\"1 1 11AD3C 70\",\"vod\":\"false\",\"ppv\":\"false\",\"series\":\"true\",\"title\":\"Wienerschnitzel\",\"description\":\"Wienerschnitzel CEO Cynthia Galardi-Culpepper.\",\"recordDate\":\"2016-05-23T01:00:00Z\",\"channel\":\"11\",\"channel_name\":\"WTOL\",\"TMSID\":\"EP011584600112\",\"channel_minor\":\"65535\",\"hd\":\"false\",\"contentAction\":\"Downloading_Started\",\"clientMode\":\"UNKNOWN\",\"timestamp\":\"2016-05-26T02:44:43.511Z\"}},{\"contentActionStat\":{\"progid\":\"481080bd93a0710e496335d9acceb6add1695e7b\",\"rating\":\"0\",\"vendor\":\"1 1 11AD3C 70\",\"vod\":\"false\",\"ppv\":\"false\",\"series\":\"true\",\"title\":\"Wienerschnitzel\",\"description\":\"Wienerschnitzel CEO Cynthia Galardi-Culpepper.\",\"recordDate\":\"2016-05-23T01:00:00Z\",\"channel\":\"11\",\"channel_name\":\"WTOL\",\"TMSID\":\"EP011584600112\",\"channel_minor\":\"65535\",\"hd\":\"false\",\"contentAction\":\"Downloading_Finish\",\"clientMode\":\"UNKNOWN\",\"durationSeconds\":\"263\",\"timestamp\":\"2016-05-26T02:49:06.347Z\"}},{\"contentActionStat\":{\"progid\":\"481080bd93a0710e496335d9acceb6add1695e7b\",\"rating\":\"0\",\"vendor\":\"1 1 11AD3C 70\",\"vod\":\"false\",\"ppv\":\"false\",\"series\":\"true\",\"title\":\"Wienerschnitzel\",\"description\":\"Wienerschnitzel CEO Cynthia 
Galardi-Culpepper.\",\"recordDate\":\"2016-05-23T01:00:00Z\",\"channel\":\"11\",\"channel_name\":\"WTOL\",\"TMSID\":\"EP011584600112\",\"channel_minor\":\"65535\",\"hd\":\"false\",\"contentAction\":\"Downloading_Cancel\",\"clientMode\":\"UNKNOWN\",\"timestamp\":\"2016-05-26T02:49:06.349Z\"}},{\"contentActionStat\":{\"progid\":\"dcb1e7d2374d0c0fa35131dda7e9228421a07668\",\"rating\":\"0\",\"vendor\":\"1 1 11AD3C 71\",\"vod\":\"false\",\"ppv\":\"false\",\"series\":\"true\",\"title\":\"Golden Krust Caribbean Bakery & Grill\",\"description\":\"Golden Krust Caribbean Bakery & Grill CEO Lowell Hawthorne.\",\"recordDate\":\"2016-05-23T02:00:00Z\",\"channel\":\"11\",\"channel_name\":\"WTOL\",\"TMSID\":\"EP011584600113\",\"channel_minor\":\"65535\",\"hd\":\"false\",\"contentAction\":\"Downloading_Started\",\"clientMode\":\"UNKNOWN\",\"timestamp\":\"2016-05-26T02:49:16.382Z\"}},{\"contentActionStat\":{\"progid\":\"dcb1e7d2374d0c0fa35131dda7e9228421a07668\",\"rating\":\"0\",\"vendor\":\"1 1 11AD3C 71\",\"vod\":\"false\",\"ppv\":\"false\",\"series\":\"true\",\"title\":\"Golden Krust Caribbean Bakery & Grill\",\"description\":\"Golden Krust Caribbean Bakery & Grill CEO Lowell Hawthorne.\",\"recordDate\":\"2016-05-23T02:00:00Z\",\"channel\":\"11\",\"channel_name\":\"WTOL\",\"TMSID\":\"EP011584600113\",\"channel_minor\":\"65535\",\"hd\":\"false\",\"contentAction\":\"Downloading_Finish\",\"clientMode\":\"UNKNOWN\",\"durationSeconds\":\"254\",\"timestamp\":\"2016-05-26T02:53:30.368Z\"}},{\"contentActionStat\":{\"progid\":\"dcb1e7d2374d0c0fa35131dda7e9228421a07668\",\"rating\":\"0\",\"vendor\":\"1 1 11AD3C 71\",\"vod\":\"false\",\"ppv\":\"false\",\"series\":\"true\",\"title\":\"Golden Krust Caribbean Bakery & Grill\",\"description\":\"Golden Krust Caribbean Bakery & Grill CEO Lowell 
Hawthorne.\",\"recordDate\":\"2016-05-23T02:00:00Z\",\"channel\":\"11\",\"channel_name\":\"WTOL\",\"TMSID\":\"EP011584600113\",\"channel_minor\":\"65535\",\"hd\":\"false\",\"contentAction\":\"Downloading_Cancel\",\"clientMode\":\"UNKNOWN\",\"timestamp\":\"2016-05-26T02:53:30.373Z\"}}]}}}"; //Input JSON
// The statements are wrapped in a method so that the class compiles;
// jsonStr above stays an instance field.
public void run() {
// The input holds two JSON objects back to back; split after the first closing brace.
String json1 = jsonStr.substring(0, jsonStr.indexOf("}")+1);
String json2 = jsonStr.substring(jsonStr.indexOf("}")+1);
String out = "", header = "";
JSONObject json = new JSONObject(json1);
header = header.concat(json.getString("objectKey")).concat("|");
json = new JSONObject(json2);
JSONObject StatDataRequest = json.getJSONObject("StatDataRequest");
header = header.concat(StatDataRequest.getString("protocolVersion")).concat("|");
header = header.concat(StatDataRequest.getString("platform")).concat("|");
header = header.concat(StatDataRequest.getString("format")).concat("|");
header = header.concat(StatDataRequest.getString("deviceid")).concat("|");
JSONObject stats = StatDataRequest.getJSONObject("stats");
JSONArray clientStatArr = stats.getJSONArray("clientStat");
// Column order of the output row.
List<String> keyList = new ArrayList<String>();
keyList.add("progid");
keyList.add("gen1re");
keyList.add("rating");
keyList.add("vendor");
keyList.add("vod");
keyList.add("ppv");
keyList.add("series");
keyList.add("title");
keyList.add("description");
keyList.add("recordDate");
keyList.add("channel");
keyList.add("channel_name");
keyList.add("TMSID");
keyList.add("channel_minor");
keyList.add("hd");
keyList.add("contentAction");
keyList.add("clientMode");
// durationSeconds appears in the expected output but is present only in some records.
keyList.add("durationSeconds");
keyList.add("timestamp");
keyList.add("errorReason");
String row;
JSONObject clientStat, contentActionStat;
for (int i = 0; i < clientStatArr.length(); i++) {
clientStat = clientStatArr.getJSONObject(i);
contentActionStat = clientStat.getJSONObject("contentActionStat");
row = "";
for (String key : keyList) {
// optString returns "" for absent keys (e.g. gen1re, errorReason) instead of throwing JSONException as getString does.
row = row.concat(contentActionStat.optString(key, "")).concat("|");
}
out = out.concat(header).concat(row).concat("\n");
}
System.out.println(out);
}
public static void main(String[] args) {
new GenieGo_AVRO_Parsing().run();
}
}
Best answer
Below are two solutions; both produce your expected output for the data provided:
1. Jackson
import java.io.IOException;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
public class ParserWithJackson {
private static final String COLUMN_DELIMITER = "|";
private static final String LINE_DELIMITER = "\r\n";
private ObjectMapper mapper = new ObjectMapper();
public String parse(String input) throws JsonProcessingException, IOException {
StringBuilder output = new StringBuilder();
String json1String = input.substring(0, input.indexOf("}") + 1).trim();
String json2String = input.substring(input.indexOf("}") + 1).trim();
JsonNode json1Tree = mapper.readTree(json1String);
JsonNode json2Tree = mapper.readTree(json2String);
JsonNode statDataRequestNode = json2Tree.get("StatDataRequest");
JsonNode statsNode = statDataRequestNode.get("stats");
JsonNode clientStatArray = statsNode.get("clientStat");
String header = createHeaderData(json1Tree, json2Tree);
for (JsonNode clientStatNode : clientStatArray) {
JsonNode contentActionStatNode = clientStatNode.get("contentActionStat");
output.append(header);
output.append(getContent(contentActionStatNode, "progid"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "gen1re"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "rating"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "vendor"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "vod"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "ppv"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "series"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "title"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "description"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "recordDate"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "channel"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "channel_name"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "TMSID"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "channel_minor"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "hd"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "contentAction"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "clientMode"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "durationSeconds"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "timestamp"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "errorReason"));
output.append(LINE_DELIMITER);
}
return output.toString();
}
private Object getContent(JsonNode node, String key) {
JsonNode jsonNode = node.get(key);
if (jsonNode == null) {
return "";
}
return jsonNode.asText();
}
private String createHeaderData(JsonNode json1Tree, JsonNode json2Tree) {
StringBuilder builder = new StringBuilder();
builder.append(getContent(json1Tree, "objectKey"));
builder.append(COLUMN_DELIMITER);
JsonNode statDataRequestNode = json2Tree.get("StatDataRequest");
builder.append(getContent(statDataRequestNode, "protocolVersion"));
builder.append(COLUMN_DELIMITER);
builder.append(getContent(statDataRequestNode, "platform"));
builder.append(COLUMN_DELIMITER);
builder.append(getContent(statDataRequestNode, "format"));
builder.append(COLUMN_DELIMITER);
builder.append(getContent(statDataRequestNode, "deviceid"));
builder.append(COLUMN_DELIMITER);
return builder.toString();
}
public static void main(String[] args) throws IOException {
String input = "{\"objectKey\":\"trx/Phone/2016-05-12/15-12-18/0384bdr311-32w5b-49aa-a814-379256f80ca8\"} {\"StatDataRequest\":{\"protocolVersion\":\"1\",\"platform\":\"Android\",\"format\":\"Detailed\",\"deviceid\":\"0384bdr311-32w5b-49aa-a814-379256f80ca8\",\"stats\":{\"clientStat\":[{\"contentActionStat\":{\"progid\":\"56aa31a135d1c95d77f70b533289dfc3\",\"gen1re\":\"Sports/Auto/Racing/High-Def/Events/Series/Live\",\"rating\":\"0\",\"vendor\":\"1 1 877U3 50B\",\"vod\":\"false\",\"ppv\":\"false\",\"series\":\"true\",\"title\":\"Test Prix, Practice\",\"description\":\"\\\"Test Prix, Practice\\\"\",\"recordDate\":\"2016-05-26T12:00:00Z\",\"channel\":\"220\",\"channel_name\":\"NBCSHD\",\"TMSID\":\"ABCD5544671291\",\"channel_minor\":\"0\",\"hd\":\"false\",\"contentAction\":\"Streaming_Started\",\"clientMode\":\"UNKNOWN\",\"timestamp\":\"2016-05-27T03:00:28.686Z\",\"errorReason\":\"36100530\"}},{\"contentActionStat\":{\"progid\":\"56aa31a135d1c95d77f70b533289dfc3\",\"gen1re\":\"Sports/Auto/Racing/High-Def/Events/Series/Live\",\"rating\":\"0\",\"vendor\":\"1 1 875E3 50B\",\"vod\":\"false\",\"ppv\":\"false\",\"series\":\"true\",\"title\":\"Test Prix, Practice\",\"description\":\"\\\"Test Prix, Practice\\\"\",\"recordDate\":\"2016-05-26T12:00:00Z\",\"channel\":\"220\",\"channel_name\":\"NBCSHD\",\"TMSID\":\"ABCD5544671291\",\"channel_minor\":\"0\",\"hd\":\"false\",\"contentAction\":\"Streaming_Stopped\",\"clientMode\":\"UNKNOWN\",\"durationSeconds\":\"3172\",\"timestamp\":\"2016-05-27T03:53:20.077Z\",\"errorReason\":\"36100530\"}}]}}}";
ParserWithJackson parser = new ParserWithJackson();
String output = parser.parse(input);
System.out.println(output);
}
}
Dependencies: Jackson Databind (com.fasterxml.jackson.core:jackson-databind)
2. JSON.org
import java.io.IOException;
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;
public class ParserWithJsonOrg {
private static final String COLUMN_DELIMITER = "|";
private static final String LINE_DELIMITER = "\r\n";
public String parse(String input) {
StringBuilder output = new StringBuilder();
String json1String = input.substring(0, input.indexOf("}") + 1).trim();
String json2String = input.substring(input.indexOf("}") + 1).trim();
JSONObject json1Tree = new JSONObject(json1String);
JSONObject json2Tree = new JSONObject(json2String);
JSONObject statDataRequestNode = json2Tree.getJSONObject("StatDataRequest");
JSONObject statsNode = statDataRequestNode.getJSONObject("stats");
JSONArray clientStatArray = statsNode.getJSONArray("clientStat");
String header = createHeaderData(json1Tree, json2Tree);
for (int i = 0; i < clientStatArray.length(); i++) {
JSONObject clientStatNode = clientStatArray.getJSONObject(i);
JSONObject contentActionStatNode = clientStatNode.getJSONObject("contentActionStat");
output.append(header);
output.append(getContent(contentActionStatNode, "progid"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "gen1re"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "rating"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "vendor"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "vod"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "ppv"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "series"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "title"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "description"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "recordDate"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "channel"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "channel_name"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "TMSID"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "channel_minor"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "hd"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "contentAction"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "clientMode"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "durationSeconds"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "timestamp"));
output.append(COLUMN_DELIMITER);
output.append(getContent(contentActionStatNode, "errorReason"));
output.append(LINE_DELIMITER);
}
return output.toString();
}
private Object getContent(JSONObject jsonObject, String key) {
Object object = null;
try {
object = jsonObject.get(key);
} catch (JSONException e) {
object = "";
}
return object;
}
private String createHeaderData(JSONObject json1Tree, JSONObject json2Tree) {
StringBuilder builder = new StringBuilder();
builder.append(getContent(json1Tree, "objectKey"));
builder.append(COLUMN_DELIMITER);
JSONObject statDataRequestNode = json2Tree.getJSONObject("StatDataRequest");
builder.append(getContent(statDataRequestNode, "protocolVersion"));
builder.append(COLUMN_DELIMITER);
builder.append(getContent(statDataRequestNode, "platform"));
builder.append(COLUMN_DELIMITER);
builder.append(getContent(statDataRequestNode, "format"));
builder.append(COLUMN_DELIMITER);
builder.append(getContent(statDataRequestNode, "deviceid"));
builder.append(COLUMN_DELIMITER);
return builder.toString();
}
public static void main(String[] args) throws IOException {
String input = "{\"objectKey\":\"trx/Phone/2016-05-12/15-12-18/0384bdr311-32w5b-49aa-a814-379256f80ca8\"} {\"StatDataRequest\":{\"protocolVersion\":\"1\",\"platform\":\"Android\",\"format\":\"Detailed\",\"deviceid\":\"0384bdr311-32w5b-49aa-a814-379256f80ca8\",\"stats\":{\"clientStat\":[{\"contentActionStat\":{\"progid\":\"56aa31a135d1c95d77f70b533289dfc3\",\"gen1re\":\"Sports/Auto/Racing/High-Def/Events/Series/Live\",\"rating\":\"0\",\"vendor\":\"1 1 877U3 50B\",\"vod\":\"false\",\"ppv\":\"false\",\"series\":\"true\",\"title\":\"Test Prix, Practice\",\"description\":\"\\\"Test Prix, Practice\\\"\",\"recordDate\":\"2016-05-26T12:00:00Z\",\"channel\":\"220\",\"channel_name\":\"NBCSHD\",\"TMSID\":\"ABCD5544671291\",\"channel_minor\":\"0\",\"hd\":\"false\",\"contentAction\":\"Streaming_Started\",\"clientMode\":\"UNKNOWN\",\"timestamp\":\"2016-05-27T03:00:28.686Z\",\"errorReason\":\"36100530\"}},{\"contentActionStat\":{\"progid\":\"56aa31a135d1c95d77f70b533289dfc3\",\"gen1re\":\"Sports/Auto/Racing/High-Def/Events/Series/Live\",\"rating\":\"0\",\"vendor\":\"1 1 875E3 50B\",\"vod\":\"false\",\"ppv\":\"false\",\"series\":\"true\",\"title\":\"Test Prix, Practice\",\"description\":\"\\\"Test Prix, Practice\\\"\",\"recordDate\":\"2016-05-26T12:00:00Z\",\"channel\":\"220\",\"channel_name\":\"NBCSHD\",\"TMSID\":\"ABCD5544671291\",\"channel_minor\":\"0\",\"hd\":\"false\",\"contentAction\":\"Streaming_Stopped\",\"clientMode\":\"UNKNOWN\",\"durationSeconds\":\"3172\",\"timestamp\":\"2016-05-27T03:53:20.077Z\",\"errorReason\":\"36100530\"}}]}}}";
ParserWithJsonOrg parser = new ParserWithJsonOrg();
String output = parser.parse(input);
System.out.println(output);
}
}
Dependencies: JSON-java (org.json:json)
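Both solutions split the input at the first "}" character, which works for the sample data because the first object is flat and its values contain no braces. If that assumption does not hold, one possible alternative is to track brace depth while skipping over string literals. The following is a minimal sketch using only the standard library; the class name JsonSplitSketch and the method splitTopLevelObjects are hypothetical and not part of either library. Each returned piece can then be passed to the JSON parser of your choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Jackson or JSON.org.
class JsonSplitSketch {
    // Splits a string holding several top-level JSON objects into one string per object.
    static List<String> splitTopLevelObjects(String input) {
        List<String> parts = new ArrayList<>();
        int depth = 0;            // current nesting level of braces
        int start = -1;           // index where the current top-level object began
        boolean inString = false; // true while inside a JSON string literal
        boolean escaped = false;  // true right after a backslash inside a string
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue; // braces inside strings are ignored
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
                if (depth == 0) {
                    parts.add(input.substring(start, i + 1));
                    start = -1;
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // The first value deliberately contains a brace to show it is not treated as a delimiter.
        String input = "{\"objectKey\":\"a}b\"} {\"StatDataRequest\":{\"platform\":\"Android\"}}";
        for (String part : splitTopLevelObjects(input)) {
            System.out.println(part);
        }
    }
}
```

The sketch handles only objects at the top level and does not validate the JSON itself; malformed input is left for the parser to reject.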
Source: "java - Parsing nested Avro files" on Stack Overflow: https://stackoverflow.com/questions/38340457/