I am trying to convert JSON to Parquet format in Java, but I get an exception.
Input JSON:
{"list": [ {"mainBearingX": 0.178334,
"gearBoxZ": 0.03885,
"_t": 1560305236290000,
"mainBearingZ": 0.034438,
"gearBoxX": 0.035738,
"mainBearingY": 0.029445,
"gearBoxY": 0.040929,
"generatorX": 0.776837,
"generatorY": 0.124234,
"ts_id":"t1"
},
{"mainBearingX": 0.169478,
"gearBoxZ": 0.008242,
"_t": 1560305236311000,
"mainBearingZ": 0.007531,
"gearBoxX": 0.025647,
"mainBearingY": 0.029445,
"gearBoxY": 0.026282,
"generatorX": 0.770189,
"generatorY": 0.117464,
"ts_id": "t1"
}
]
}
Code:
public static void toConvert(OutPut output) {
    String inputFile = "test.parquetFile";
    Path dataFile = new Path(inputFile);
    Schema schema = ReflectData.AllowNull.get().getSchema(OutPut.class);
    try (ParquetWriter<OutPut> writer = AvroParquetWriter.<OutPut>builder(dataFile)
            .withSchema(schema)
            .withDataModel(ReflectData.get())
            .withConf(new Configuration())
            .withCompressionCodec(CompressionCodecName.SNAPPY)
            .withWriteMode(Mode.OVERWRITE)
            .build()) {
        writer.write(output);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

public class OutPut {
    List<Map<String, Object>> list;
}
Exception:
Exception in thread "main" org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with an empty group: required group value {}
at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:27)
at org.apache.parquet.schema.GroupType.accept(GroupType.java:226)
at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:31)
at org.apache.parquet.schema.GroupType.accept(GroupType.java:226)
at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:31)
at org.apache.parquet.schema.GroupType.accept(GroupType.java:226)
at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:31)
at org.apache.parquet.schema.GroupType.accept(GroupType.java:226)
at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:31)
at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:37)
at org.apache.parquet.schema.MessageType.accept(MessageType.java:55)
at org.apache.parquet.schema.TypeUtil.checkValidWriteSchema(TypeUtil.java:23)
at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:228)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:273)
at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:494)
Best answer
The problem is that your OutPut type uses Object as the value type of the Map:

public class OutPut {
    List<Map<String, Object>> list;
}

You are using ReflectData to infer an Avro schema for your type by introspection, but it cannot derive anything useful from the Object type. If you change the definition of OutPut to use a concrete type, for example:

public class OutPut {
    List<Map<String, Double>> list;
}

then it should work.
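The root cause can be seen with nothing but the JDK. ReflectData builds a record schema from a class's declared fields, and java.lang.Object declares no instance fields, so the inferred value schema is the empty group that Parquet rejects at write time. A minimal sketch (the ReflectDemo class below is illustrative, not part of the original code):

```java
import java.lang.reflect.Field;

// Illustrative demo: reflection over Object finds no fields, which is why
// an Avro schema inferred for Map<String, Object> has an empty value group.
// A concrete type like Double exposes fields, so a schema can be derived.
public class ReflectDemo {
    public static void main(String[] args) {
        Field[] objectFields = Object.class.getDeclaredFields();
        Field[] doubleFields = Double.class.getDeclaredFields();
        System.out.println("Object declares " + objectFields.length + " fields");
        System.out.println("Double declares " + doubleFields.length + " fields");
    }
}
```

Note that in the sample JSON, `_t` is a long and `ts_id` is a string, so a single Double value type would not cover every field; a dedicated POJO with one typed field per key is another option.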
About java - Converting JSON to Parquet in Java, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57430625/