I have a Hive table:
col1 col2
1 ["apple", "orange"]
1 ["orange", "banana"]
1 ["mango"]
2 ["apple"]
2 ["apple", "orange"]
The column types are col1 int and col2 array<string>.
I want to run something like: select col1, concat(col2) from table group by col1;
The output should be:
1 ["apple", "orange", "banana", "mango"]
2 ["apple", "orange"]
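Is there a function in Hive that can do this? I also write this data to a CSV file, and when I read it back as a dataframe, col2 has dtype object. Is there a way to get it as an array instead?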
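A CSV file has no array type, so the column comes back as plain text, which is why the dataframe reports dtype object. A minimal stdlib sketch, assuming each col2 cell holds JSON-style text such as ["apple", "orange"] (the exact format depends on how the data was exported, so check your file), parses every cell into a real Python list. The function name read_rows and the sample data are hypothetical.

```python
import csv
import io
import json


def read_rows(text):
    """Parse CSV text whose col2 holds a JSON-encoded array of strings (assumed format)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "col1": int(record["col1"]),
            # json.loads turns '["apple", "orange"]' into ['apple', 'orange']
            "col2": json.loads(record["col2"]),
        })
    return rows


# Hypothetical sample: the array cell is quoted because it contains commas.
sample = (
    'col1,col2\n'
    '1,"[""apple"", ""orange"", ""banana"", ""mango""]"\n'
    '2,"[""apple"", ""orange""]"\n'
)

for row in read_rows(sample):
    print(row["col1"], row["col2"])
```

With pandas, the same parser can be applied per column through the converters parameter, e.g. pd.read_csv(path, converters={"col2": json.loads}). The column dtype will still be reported as object, because pandas stores lists as Python objects by default, but each cell is then a real list rather than a string.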
Best answer
Try exploding the array with lateral view explode, then group by col1 and aggregate the exploded elements with the collect_set function.
Example:
Input:
select * from table;
OK
dd.col1 dd.col2
1 ["apple","orange"]
1 ["mango"]
1 ["orange","banana"]
select col1, collect_set(tt1) as col2
from (
  select * from table
  lateral view explode(col2) tt as tt1
) cc
group by col1;
Output:
col1 col2
1 ["apple","orange","mango","banana"]
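To make the semantics concrete, here is a small Python model of what the query computes (an illustration, not Hive itself): lateral view explode emits one row per array element, and collect_set gathers the distinct elements for each col1. Like Hive's collect_set, it drops duplicates ("orange" appears twice for col1 = 1 but only once in the result). Unlike Hive, this sketch keeps first-seen order; Hive's collect_set does not guarantee element order, which is why the output above lists mango before banana. If duplicates should be kept, Hive's collect_list is the alternative. The helper names explode and collect_set_by_key are hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (col1, col2 array)
rows = [
    (1, ["apple", "orange"]),
    (1, ["orange", "banana"]),
    (1, ["mango"]),
    (2, ["apple"]),
    (2, ["apple", "orange"]),
]


def explode(rows):
    """Like LATERAL VIEW explode(col2): one (col1, element) pair per array element."""
    for col1, arr in rows:
        for element in arr:
            yield col1, element


def collect_set_by_key(pairs):
    """Like GROUP BY col1 with collect_set(element): distinct values per key.

    A dict is used as an ordered set so the result keeps first-seen order;
    Hive's collect_set makes no such ordering guarantee.
    """
    groups = defaultdict(dict)
    for key, value in pairs:
        groups[key][value] = None
    return {key: list(values) for key, values in groups.items()}


for col1, col2 in collect_set_by_key(explode(rows)).items():
    print(col1, col2)
```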
For the original question (sql - concatenation in a Hive query), see Stack Overflow: https://stackoverflow.com/questions/63491772/