hadoop - Flume Hive Sink fails to serialize JSON containing an array

I am trying to load JSON data into Hive through the Flume Hive sink.
It fails with the following errors:

WARN org.apache.hive.hcatalog.data.JsonSerDe: Error [java.io.IOException: Field name expected] parsing json text [{"id": "12345", "url": "https://mysite", "title": ["MyTytle"]}].
INFO org.apache.flume.sink.hive.HiveWriter: Parse failed : Unable to convert byte[] record into Object  : {"id": "12345", "url": "https://mysite", "title": ["MyTytle"]}

Sample data:

{"id": "12345", "url": "https://mysite", "title": ["MyTytle"]}

Hive table description:

id              string                                      
url             string                                      
title           array<string>                               
time            string                                      

# Partitions
time            string

If neither the JSON data nor the Hive table contains the array, the same setup works fine.

Flume version: 1.7.0 (Cloudera CDH 5.10)

Is it possible to load JSON data containing arrays through the Flume Hive sink?

Best answer

Is it possible to load JSON data containing arrays through the Flume Hive sink?

I have never tried it myself, but I think it is possible. From:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_HDP_RelNotes/content/ch01s08s02.html

Following serializers are provided for Hive sink:

JSON: Handles UTF8 encoded Json (strict syntax) events and requires no configuration. Object names in the JSON are mapped directly to columns with the same name in the Hive table. Internally uses org.apache.hive.hcatalog.data.JsonSerDe but is independent of the Serde of the Hive table. This serializer requires HCatalog to be installed.
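Since the serializer expects strict JSON and maps object names straight to column names, it can help to rule out the input itself before blaming the SerDe. The following is only a debugging sketch, not how JsonSerDe works internally: `COLUMNS` and `check_event` are hypothetical names, and the column layout is copied by hand from the table description in the question.

```python
import json

# Hypothetical column layout copied from the table description in the question.
# "time" is the partition column, so it is not expected inside the event body.
COLUMNS = {"id": str, "url": str, "title": list}


def check_event(raw: bytes) -> list:
    """Return a list of problems found in one UTF-8 JSON event (empty if none)."""
    try:
        event = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return [f"not valid UTF-8 JSON: {exc}"]
    if not isinstance(event, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    # Every object name should match a column name.
    for name in event:
        if name not in COLUMNS:
            problems.append(f"no column named {name!r}")
    # Every present value should have the Python type matching the column type.
    for name, expected in COLUMNS.items():
        if name in event and not isinstance(event[name], expected):
            problems.append(f"{name!r} is not a {expected.__name__}")
    # array<string> means every element must itself be a string.
    title = event.get("title")
    if isinstance(title, list) and not all(isinstance(t, str) for t in title):
        problems.append("'title' contains non-string elements")
    return problems


sample = b'{"id": "12345", "url": "https://mysite", "title": ["MyTytle"]}'
print(check_event(sample))  # prints []
```

The sample event from the question passes this check, which suggests the record is well formed and the failure comes from how the array is handled further down the pipeline.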

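So perhaps there is an error in how the SerDe is set up. This user worked around the problem of serializing JSON with arrays by applying a regexp beforehand: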

Parse json arrays using HIVE
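The linked question is not reproduced here, so the following is only one possible reading of that workaround: keep the array as a plain string in the event, declare the column as string instead of array<string>, and split it later inside Hive. A minimal sketch of the re-encoding step, where `stringify_arrays` is a hypothetical helper and where it would run (for example, before the events reach Flume) is left open:

```python
import json


def stringify_arrays(raw: bytes) -> bytes:
    """Re-encode every top-level array value of a JSON event as a JSON string."""
    event = json.loads(raw.decode("utf-8"))
    for key, value in event.items():
        if isinstance(value, list):
            # The array becomes ordinary string content, e.g. '["MyTytle"]'.
            event[key] = json.dumps(value)
    return json.dumps(event).encode("utf-8")


sample = b'{"id": "12345", "url": "https://mysite", "title": ["MyTytle"]}'
print(stringify_arrays(sample).decode("utf-8"))
```

After this step the event contains only string values, which is the case the question reports as working. The trade-off is that the array structure has to be recovered at query time.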

Another thing you might want to try is changing the SerDe. You have at least two options (perhaps more):

'org.apache.hive.hcatalog.data.JsonSerDe'

'org.openx.data.jsonserde.JsonSerDe'
(https://github.com/sheetaldolas/Hive-JSON-Serde/tree/master)

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/42393240/
