I have a Hive table with the following schema:
COOKIE | PRODUCT_ID | CAT_ID | QTY
1234123 [1,2,3] [r,t,null] [2,1,null]
How can I normalize the arrays so that I get the following result?
COOKIE | PRODUCT_ID | CAT_ID | QTY
1234123 [1] [r] [2]
1234123 [2] [t] [1]
1234123 [3] null null
I tried the following:
select concat_ws('|',visid_high,visid_low) as cookie
,pid
,catid
,qty
from table
lateral view explode(productid) ptable as pid
lateral view explode(catalogId) ptable2 as catid
lateral view explode(qty) ptable3 as qty
But the result is a Cartesian product.
Best answer
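You can use the numeric_range and array_index UDFs from Brickhouse (http://github.com/klout/brickhouse) to solve this problem. There is an informative blog post that describes the approach in detail at http://brickhouseconfessions.wordpress.com/2013/03/07/exploding-multiple-arrays-at-the-same-time-with-numeric_range/
The idea is to explode a range of array indexes once, rather than exploding each array separately, and then read the element at that index from every array. Using these UDFs, the query would look something like this: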
select cookie,
array_index( product_id_arr, n ) as product_id,
array_index( catalog_id_arr, n ) as catalog_id,
array_index( qty_id_arr, n ) as qty
from table
lateral view numeric_range( size( product_id_arr )) n1 as n;
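If installing Brickhouse is not an option, the same index-based idea can be sketched with the built-in posexplode UDTF, which is available in Hive 0.13 and later. This is not part of the original answer. The table name my_table is a placeholder, and the column names are assumed to match the ones used in the query above.

```sql
-- Sketch only: my_table and the *_arr column names are assumptions.
-- posexplode emits one row per element of product_id_arr, together with
-- that element's zero-based position (pos).
select cookie,
       p.product_id,
       -- Reuse the position to read the matching element of the other arrays.
       -- A null element, or an index past the end of a shorter array, yields NULL.
       catalog_id_arr[p.pos] as catalog_id,
       qty_id_arr[p.pos]     as qty
from my_table
lateral view posexplode(product_id_arr) p as pos, product_id;
```

Because only one array is exploded, each input row produces as many output rows as product_id_arr has elements, not the product of the three array sizes.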
Regarding hive - Hive explode/lateral view of multiple arrays, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/20667473/