hadoop - 无法获取行中array <struct <string,string >>的avro数据

标签 hadoop hive avro hive-serde

嗨,我有一个具有struct数组的avro模式,并且能够将数据另存为avro。但是在从中检索数据时

array<struct<string, string>>

我无法进入行。我在单行中获取的所有数据。

这是表的定义

CREATE EXTERNAL TABLE meterevents ROW FORMAT SERDE org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED as INPUTFORMAT org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/......' TBLPROPERTIES ('avro.schema.url'='/..../schema.avsc');


hive 表结构

nametype                struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>       from deserializer
names                   struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>  from deserializer
enddeviceeventdetails   struct<enddeviceeventdetailsname:string,enddeviceeventdetailsvalue:string>      from deserializer
enddeviceevent          struct<mrid:string,createddatetime:string,issuerid:string,issuertrackingid:string,reason:string,severity:string,userid:string,asset:struct<assetmrid:string,assetnames:array<struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>>>,enddeviceeventdetails:array<struct<enddeviceeventdetailsname:string,enddeviceeventdetailsvalue:string>>,enddeviceeventtype:string,enddeviceeventnames:array<struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>>,status:struct<statusdatetime:string,statusreason:string,statusremark:string,statusvalue:string>,usagepoint:struct<usagepointmrid:string,usagepointnames:array<struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>>>>       from deserializer
enddeviceeventtype      struct<enddeviceeventtypemrid:string,enddeviceeventtypedomain:string,enddeviceeventtypeeventoraction:string,enddeviceeventtypesubdomain:string,type:string,enddeviceeventtypenames:array<struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>>>       from deserializer
header                  struct<noun:string,context:string,verb:string,value:string,source:string,timestamp:string,correlationid:string,name:string,messageid:string,property:struct<propertyname:array<string>,propertyvalue:array<string>>>    from deserializer
payload                 struct<enddeviceevents:array<struct<mrid:string,createddatetime:string,issuerid:string,issuertrackingid:string,reason:string,severity:string,userid:string,asset:struct<assetmrid:string,assetnames:array<struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>>>,enddeviceeventdetails:array<struct<enddeviceeventdetailsname:string,enddeviceeventdetailsvalue:string>>,enddeviceeventtype:string,enddeviceeventnames:array<struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>>,status:struct<statusdatetime:string,statusreason:string,statusremark:string,statusvalue:string>,usagepoint:struct<usagepointmrid:string,usagepointnames:array<struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>>>>>,enddeviceeventtype:array<struct<enddeviceeventtypemrid:string,enddeviceeventtypedomain:string,enddeviceeventtypeeventoraction:string,enddeviceeventtypesubdomain:string,type:string,enddeviceeventtypenames:array<struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>>>>>


我在查询中使用“横向 View 爆炸”选项

select eddetails.enddeviceeventdetailsname,	eddetails.enddeviceeventdetailsvalue 
FROM meterevents_tmp 
LATERAL VIEW explode(payload.enddeviceevents.enddeviceeventdetails)  ed AS eddetails 
limit 1;


但我仍然在单行中获取数据。

enddeviceeventdetailsname       enddeviceeventdetailsvalue
["EventSequenceNumber","EventSequenceNumber","EventSequenceNumber","EventSequenceNumber"]       ["683","684","685","686"


我想将这些数据作为

enddeviceeventdetailsname       enddeviceeventdetailsvalue
EventSequenceNumber              683
EventSequenceNumber              684
EventSequenceNumber              685
EventSequenceNumber              686


我已经阅读了关于stackoverflow的另一个问题:Exploding Array of Struct using HiveQL

但是无法获得预期的输出。因为在那篇文章中,它的配置单元外部表而不是SERDE,所以我无法指定“MAP KEYS TERMINATED BY”和“COLLECTION ITEMS TERMINATED BY”

任何帮助深表感谢。

谢谢

最佳答案

我能够解决这个问题

我无法按行获取输出,因为

array<struct<string,string>> 

是父数组的一部分
array<struct<array<struct<string, string>>>

我更新了查询并使用了嵌套爆炸
select eddetails.enddeviceeventdetailsname, eddetails.enddeviceeventdetailsvalue from (select ede.enddeviceeventdetails FROM meterevents_tmp LATERAL VIEW explode(payload.enddeviceevents)  e AS ede) t LATERAL VIEW explode(t.enddeviceeventdetails)  ed AS eddetails limit 10;

我得到了期望的输出-

enddeviceeventdetailsname       enddeviceeventdetailsvalue
EventSequenceNumber     683
EventSequenceNumber     684
EventSequenceNumber     685
EventSequenceNumber     686

关于hadoop - 无法获取行中array <struct <string,string >>的avro数据,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/41773681/

相关文章:

hive - Sqoop、Avro 和 Hive

java - 未应用 boolean 值的 avro 默认值

java - Mapreduce Hadoop作业异常输出目录已存在

java - 从 POJO 到 Avro Record 的通用转换

java - Eclipse Hadoop HDFS 异常

hadoop - Hadoop简单Java程序-与MongoDB的交互

hadoop - 是否需要将 map /简化工作简化为数据集的计算机平均值?

hadoop - 在 mapreduce 作业中恢复黑名单 tasktrackers

hadoop - 是否有用于 CSV 文件的 Hive SerDe 以从文件头推断架构

hadoop - 使用 lazy simple serde hive 替换引号