I'm trying to load JSON data into Hive via the Flume Hive Sink.
But it fails with the following errors:
WARN org.apache.hive.hcatalog.data.JsonSerDe: Error [java.io.IOException: Field name expected] parsing json text [{"id": "12345", "url": "https://mysite", "title": ["MyTytle"]}].
INFO org.apache.flume.sink.hive.HiveWriter: Parse failed : Unable to convert byte[] record into Object : {"id": "12345", "url": "https://mysite", "title": ["MyTytle"]}
Data sample:
{"id": "12345", "url": "https://mysite", "title": ["MyTytle"]}
Hive table description:
id string
url string
title array<string>
time string
# Partitions
time string
The same setup works fine if the JSON data (and the Hive table) contains no array.
Flume version: 1.7.0 (Cloudera CDH 5.10)
Is it possible to load JSON data containing arrays via the Flume Hive sink?
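For context, a minimal Flume Hive sink configuration for this kind of table might look like the following sketch; the agent, channel, and sink names (a1, c1, k1) and the metastore URI are hypothetical placeholders, not taken from the question:

```properties
# Hedged sketch of a Flume Hive sink config; names and URIs are hypothetical.
a1.sinks.k1.type = hive
a1.sinks.k1.channel = c1
a1.sinks.k1.hive.metastore = thrift://localhost:9083
a1.sinks.k1.hive.database = default
a1.sinks.k1.hive.table = events
# Partition value for the "time" partition column (time-based escape):
a1.sinks.k1.hive.partition = %Y-%m-%d
# The JSON serializer maps JSON object names to same-named Hive columns:
a1.sinks.k1.serializer = JSON
```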
Best answer
Is it possible to load JSON data containing arrays via the Flume Hive sink?
Although I've never tried it, I believe it is possible. From:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_HDP_RelNotes/content/ch01s08s02.html
Following serializers are provided for Hive sink:
JSON: Handles UTF8 encoded Json (strict syntax) events and requires no configuration. Object names in the JSON are mapped directly to columns with the same name in the Hive table. Internally uses org.apache.hive.hcatalog.data.JsonSerDe but is independent of the Serde of the Hive table. This serializer requires HCatalog to be installed.
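Note that the Hive sink writes through the Hive streaming API, which requires the target table to be bucketed, stored as ORC, and transactional. A DDL sketch matching the table described in the question (table name `events` is assumed; bucket column and count are illustrative):

```sql
-- Hedged sketch: a Hive-sink-compatible table for the question's schema.
-- Table name, bucketing column, and bucket count are assumptions.
CREATE TABLE events (
  id    string,
  url   string,
  title array<string>
)
PARTITIONED BY (time string)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');
```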
So perhaps there is a bug in the SerDe implementation. This user solved the problem of serializing JSON with arrays by applying a regexp beforehand:
Parse json arrays using HIVE
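The idea behind that workaround can be sketched in HiveQL: land the array field as a plain string, then strip the JSON brackets and quotes and split it into an `array<string>` at query time. The table name `events_raw` and the exact regex are illustrative assumptions, not taken from the linked answer:

```sql
-- Hedged sketch: "title" is landed as a plain string like ["MyTytle","Other"],
-- then converted to array<string> by removing [ ] " and splitting on commas.
SELECT
  id,
  url,
  split(regexp_replace(title, '\\[|\\]|"', ''), ',') AS title
FROM events_raw;  -- events_raw: hypothetical landing table with title string
```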
Another thing you could try is changing the SerDe. You have at least two options (and maybe more):
(https://github.com/sheetaldolas/Hive-JSON-Serde/tree/master)
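Swapping the SerDe would look roughly like this. The jar path is a placeholder, and the class name (`org.openx.data.jsonserde.JsonSerDe`, used by the Hive-JSON-Serde family linked above) may vary by fork and version, so treat this as a sketch:

```sql
-- Hedged sketch: using a third-party JSON SerDe instead of HCatalog's.
-- Jar path and SerDe class name are assumptions; check your build.
ADD JAR /path/to/json-serde.jar;

CREATE TABLE events_json (
  id    string,
  url   string,
  title array<string>
)
PARTITIONED BY (time string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
```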
Regarding hadoop - Flume Hive Sink fails to serialize JSON with an array, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42393240/