sql - Concatenation in a Hive query

I have a Hive table:

col1   col2
1     ["apple", "orange"]
1     ["orange", "banana"]
1     ["mango"]
2     ["apple"]
2     ["apple", "orange"]

The data types are:

col1 int
col2 array<string>

I want to run a query along these lines:

select col1, concat(col2) from table group by col1;

The output should be:

1    ["apple", "orange", "banana", "mango"]
2    ["apple", "orange"]

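Is there any function in Hive that can do this?
I am also writing this data to CSV, and when I read it back as a dataframe, col2 has dtype object. Is there a way to get it as an array?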

Best answer

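Try exploding the array, then apply the collect_set function while grouping by col1. collect_set keeps only distinct values and does not guarantee the order of the elements.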
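To make the two steps concrete, here is a minimal plain-Python sketch of what explode followed by collect_set does to the rows from the question. It only illustrates the semantics and is not how Hive runs the query; the helper name `explode_and_collect` is hypothetical. Unlike Hive, this sketch keeps elements in first-seen order.

```python
def explode_and_collect(rows):
    """Group by col1 and collect the distinct elements of col2."""
    grouped = {}
    for col1, col2 in rows:
        # "explode": visit each array element as if it were its own row
        for item in col2:
            # "collect_set": dict keys act as an insertion-ordered set
            grouped.setdefault(col1, {})[item] = None
    return {key: list(items) for key, items in grouped.items()}


# Rows taken from the table in the question
rows = [
    (1, ["apple", "orange"]),
    (1, ["orange", "banana"]),
    (1, ["mango"]),
    (2, ["apple"]),
    (2, ["apple", "orange"]),
]
result = explode_and_collect(rows)
print(result)
```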
Example:
Input:

select * from table;
OK
dd.col1 dd.col2
1       ["apple","orange"]
1       ["mango"]
1       ["orange","banana"]

select col1, collect_set(tt1) as col2 from (
   select * from table lateral view explode(col2) tt as tt1
) cc
group by col1;

Output:

col1    col2
1       ["apple","orange","mango","banana"]
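For the CSV part of the question, a hedged sketch: a CSV file has no array type, so the column comes back as text and has to be parsed after reading. Assuming col2 was written as a JSON-style string such as ["apple","orange"] (this depends on how the export was done), the standard library can turn it back into a list. The sample data and the name `parse_rows` are hypothetical.

```python
import csv
import io
import json


def parse_rows(csv_text):
    """Read CSV text and decode the JSON-style col2 strings into lists."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(int(row["col1"]), json.loads(row["col2"])) for row in reader]


# Hypothetical export: col2 is quoted, with inner quotes doubled per CSV rules.
sample = (
    'col1,col2\n'
    '1,"[""apple"",""orange"",""mango"",""banana""]"\n'
    '2,"[""apple"",""orange""]"\n'
)
parsed = parse_rows(sample)
print(parsed)
```

With pandas, the same idea can be applied at read time by passing `converters={"col2": json.loads}` to `read_csv`. The column dtype is still reported as object, but each cell then holds a Python list rather than a string.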

Regarding sql - Concatenation in a Hive query, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/63491772/
