hadoop - Hive: How to transpose data in Hive SQL

I am trying to transpose a dataset in Hive that has the following structure:

Id1  Id2 Event
 1    1   7
 2    2   3
 2    2   7
 3    3   8
 3    3   7
 1    2   3
 1    2   7

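Some id combinations have many events (close to 20 unique events), and I need to spread them across 20 separate columns, with one row per unique combination of Id1 and Id2, for example: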

Id1 Id2 event1  event2  event3 event4 event5.......event20
1    1    7       
2    2    3        7
3    3    8        7
1    2    3        7

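If possible, I would also like to know how to transpose the data into the following form without using 20 max() functions (here the event value becomes the column suffix, and each occurrence counts as 1):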

Id1 Id2 event_7 event_3  event_8 ........
1    1    1       
2    2    1        1
3    3    1                1
1    2    1        1

Many thanks!

Best answer

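You could try the following and see whether it works.
What I do is first rank the rows within each id group, so rows that repeat the same ids get ranks 1, 2, and so on.
I then concatenate that rank onto "event_" to build labels such as "event_1" and "event_2".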
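
To make the ranking step concrete, here is a minimal sketch of that stage on its own. It assumes the rows live in a table called events_table (a placeholder name that does not come from the question), that the numbering should restart for every (id1, id2) pair, and that ordering by event is acceptable, since the sample data has no column that records arrival order.

```sql
-- Sketch of the labelling step only; events_table is a hypothetical table name.
select id1, id2, event,
       concat('event_', cast(rnk as string)) as event_label
from (
    select id1, id2, event,
           -- row_number() restarts at 1 for every (id1, id2) pair;
           -- ordering by event is an assumption made for this sketch
           row_number() over (partition by id1, id2 order by event) as rnk
    from events_table
) ranked;
```

With the sample rows, the pair (1, 2) has events 3 and 7, so they would be labelled event_1 and event_2 respectively.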

There are two options below: 1) using named_struct, 2) using to_map.
I have not tried them, so there may be syntax issues, but hopefully you get the idea.

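-- Option 1: collect (rank, event) structs per (id1, id2) and read them by position.
-- named_struct() needs constant field names, so the 'event_N' label moves to the column alias.
with data as (
    select 
        id1, id2, event, 
        -- number the events inside each (id1, id2) pair; ordering by event is an assumption
        row_number() over (partition by id1, id2 order by event) as rnk
    from events_table   -- placeholder name; `table` itself is a reserved word
    ),
collect_data as (   
    -- sort_array orders the structs by their first field (rnk);
    -- this assumes a Hive version whose sort_array accepts arrays of structs
    select id1, id2, sort_array(collect_set(named_struct('rnk', rnk, 'event', event))) as kv
    from data
    group by id1, id2
    )
select id1, id2, kv[0].event as event_1, kv[1].event as event_2   -- ... and so on for the remaining positions
from collect_data;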

Or:

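-- Option 2: aggregate the (label, event) pairs into a map and look columns up by key.
-- to_map is not a Hive built-in; this assumes a UDAF of that name has been registered
-- from an add-on library.
with data as (
    select 
        id1, id2, event, 
        -- number the events inside each (id1, id2) pair; ordering by event is an assumption
        row_number() over (partition by id1, id2 order by event) as rnk
    from events_table   -- placeholder name; `table` itself is a reserved word
    ),
collect_data as (   
    select id1, id2, to_map(concat('event_', cast(rnk as string)), event) as kv
    from data
    group by id1, id2
    )
select id1, id2, 
  kv['event_1'] as event_1,
  kv['event_2'] as event_2,
  kv['event_3'] as event_3   -- ... and so on for the remaining positions
from collect_data;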
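
If installing an extra UDF library is not an option, the same key/value idea can be sketched with Hive built-ins only: build 'key:value' strings, join them with concat_ws, and parse the result with str_to_map. The first query below targets the positional layout (event1, event2, ...); the second targets the indicator layout from the question, where the event value is the column suffix, without one max() per column. Here events_table is a placeholder name, ordering by event is an assumption, and the casts assume event is an integer column (str_to_map returns string values). Neither query has been run against the asker's data.

```sql
-- Positional layout: event_1, event_2, ... per (id1, id2).
with data as (
    select id1, id2, event,
           row_number() over (partition by id1, id2 order by event) as rnk
    from events_table                       -- hypothetical table name
),
collect_data as (
    select id1, id2,
           -- builds a string such as 'event_1:3,event_2:7' and parses it into map<string,string>
           str_to_map(
               concat_ws(',', collect_list(concat('event_', cast(rnk as string), ':', cast(event as string)))),
               ',', ':') as kv
    from data
    group by id1, id2
)
select id1, id2,
       cast(kv['event_1'] as int) as event_1,
       cast(kv['event_2'] as int) as event_2   -- repeat for the remaining positions
from collect_data;

-- Indicator layout: one column per event value, 1 if the pair has that event, NULL otherwise.
select id1, id2,
       cast(kv['event_7'] as int) as event_7,
       cast(kv['event_3'] as int) as event_3,
       cast(kv['event_8'] as int) as event_8   -- repeat for the other event values
from (
    select id1, id2,
           -- collect_set removes repeated events, so each key appears once in the map
           str_to_map(
               concat_ws(',', collect_set(concat('event_', cast(event as string), ':1'))),
               ',', ':') as kv
    from events_table
    group by id1, id2
) flags;
```

The output columns still have to be listed by hand, because the query needs a fixed set of columns when it is compiled; what the map removes is the per-column aggregate, not the column list.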

This post on transposing data in Hive SQL is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/60795214/
