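Is it possible to use NiFi to load a JSON file into a structured table?
I am calling the following weather forecast data (from 6000 weather stations) and currently loading it into HDFS. Everything arrives on a single line: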
{"SiteRep":{"Wx":{"Param":[{"name":"F","units":"C","$":"Feels Like Temperature"},{"name":"G","units":"mph","$":"Wind Gust"},{"name":"H","units":"%","$":"Screen Relative Humidity"},{"name":"T","units":"C","$":"Temperature"},{"name":"V","units":"","$":"Visibility"},{"name":"D","units":"compass","$":"Wind Direction"},{"name":"S","units":"mph","$":"Wind Speed"},{"name":"U","units":"","$":"Max UV Index"},{"name":"W","units":"","$":"Weather Type"},{"name":"Pp","units":"%","$":"Precipitation Probability"}]},"DV":{"dataDate":"2017-01-12T22:00:00Z","type":"Forecast","Location":[{"i":"14","lat":"54.9375","lon":"-2.8092","name":"CARLISLE AIRPORT","country":"ENGLAND","continent":"EUROPE","elevation":"50.0","Period":{"type":"Day","value":"2017-01-13Z","Rep":{"D":"WNW","F":"-3","G":"25","H":"67","Pp":"0","S":"13","T":"2","V":"EX","W":"1","U":"1","$":"720"}}},{"i":"22","lat":"53.5797","lon":"-0.3472","name":"HUMBERSIDE AIRPORT","country":"ENGLAND","continent":"EUROPE","elevation":"24.0","Period":{"type":"Day","value":"2017-01-13Z","Rep":{"D":"NW","F":"-2","G":"43","H":"63","Pp":"3","S":"25","T":"4","V":"EX","W":"3","U":"1","$":"720"}}}, .....
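Ideally, I would like the result structured as a 6000-row table, one row per station.
I tried writing a schema to pass the above to Pig, but without success, probably because I am not familiar enough with JSON to translate it correctly.
While looking for a simple way to add some structure to the data, I found that NiFi has a PutHBaseJson processor.
Can anyone advise whether the PutHBaseJson processor can handle the data structure above? If so, could someone point me to a decent tutorial to give me a starting point for the configuration?
Any guidance is much appreciated.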
Best answer
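You probably want to use the SplitJson processor to split the JSON structure of 6000 records into 6000 individual flowfiles. If you need to "inject" the parameter definitions from the top-level response, you can use a ReplaceText or JoltTransformJSON processor to manipulate the individual JSON records. There is a good article by Yolanda Davis describing how to perform Jolt transformations (JSON -> JSON) in NiFi.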
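To make the target shape concrete, here is a minimal Python sketch of what the split-and-inject step is meant to produce. It is not NiFi code, only an illustration of the transformation: one flat record per entry of the Location array, with the short parameter codes (T, G, ...) replaced by the descriptions from the top-level Param list. The function name split_locations and the trimmed SAMPLE payload are made up for this example, and the sketch assumes Period and Rep are single objects, as in the excerpt from the question. In NiFi itself, the JsonPath expression given to SplitJson would presumably be something like $.SiteRep.DV.Location for this payload.

```python
import json

# Trimmed copy of the payload from the question (two stations, two parameters).
SAMPLE = """
{"SiteRep":{"Wx":{"Param":[
  {"name":"G","units":"mph","$":"Wind Gust"},
  {"name":"T","units":"C","$":"Temperature"}]},
 "DV":{"dataDate":"2017-01-12T22:00:00Z","type":"Forecast","Location":[
  {"i":"14","name":"CARLISLE AIRPORT",
   "Period":{"type":"Day","value":"2017-01-13Z","Rep":{"G":"25","T":"2","$":"720"}}},
  {"i":"22","name":"HUMBERSIDE AIRPORT",
   "Period":{"type":"Day","value":"2017-01-13Z","Rep":{"G":"43","T":"4","$":"720"}}}]}}}
"""


def split_locations(doc):
    """Return one flat dict per station (hypothetical helper, not a NiFi API)."""
    site = doc["SiteRep"]
    # Map parameter codes to their descriptions, e.g. "T" -> "Temperature".
    labels = {p["name"]: p["$"] for p in site["Wx"]["Param"]}

    rows = []
    for loc in site["DV"]["Location"]:
        # Assumes a single Period object with a single Rep object, as in the excerpt.
        period = loc["Period"]
        # Copy the station attributes, leaving out the nested Period object.
        row = {k: v for k, v in loc.items() if k != "Period"}
        row["period_type"] = period["type"]
        row["period_value"] = period["value"]
        # Inject readable names; codes without a definition keep their original key.
        for code, value in period["Rep"].items():
            row[labels.get(code, code)] = value
        rows.append(row)
    return rows


rows = split_locations(json.loads(SAMPLE))
for row in rows:
    print(json.dumps(row))
```

Each printed line corresponds to what one flowfile would contain after the split and the transform: a single, flat JSON object with no nested fields.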
Once you have individual flowfiles, each containing a single JSON record, putting them into HBase is straightforward. Bryan Bende wrote an article describing the necessary configurations for the PutHBaseJson processor.
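As a rough mental model of what happens to one flat record on its way into HBase, the following Python sketch maps a record to a row key plus a set of column-family:qualifier cells. It only illustrates the idea and is not how NiFi implements it. The function to_hbase_cells, the column family name "wx", and the choice of the station id field "i" as the row key are all assumptions made for the example, and rejecting nested values is a choice of this sketch rather than documented processor behavior.

```python
def to_hbase_cells(record, row_id_field="i", column_family="wx"):
    """Model a flat JSON record as (row_key, {"family:qualifier": value}).

    Hypothetical helper for illustration only; names and defaults are assumptions.
    """
    if row_id_field not in record:
        raise KeyError(f"record has no row id field {row_id_field!r}")

    row_key = str(record[row_id_field])
    cells = {}
    for field, value in record.items():
        if field == row_id_field:
            continue  # the row id becomes the key, not a column
        if isinstance(value, (dict, list)):
            # This sketch insists on flat records; flatten nested JSON first.
            raise ValueError(f"field {field!r} is nested; flatten it first")
        cells[f"{column_family}:{field}"] = str(value)
    return row_key, cells


record = {"i": "14", "name": "CARLISLE AIRPORT", "Temperature": "2"}
row_key, cells = to_hbase_cells(record)
print(row_key, cells)
```

If the same station is loaded again for a later forecast, a key built from the station id alone would write to the same row, so a key that also includes the forecast date may be worth considering.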
Regarding json - the possibility of structuring ingested JSON data with NiFi, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41625322/