hadoop - Proc Transpose 的 SAS Hive SQL (Hadoop) 版本?

标签 hadoop sas hiveql transpose proc

我想知道 SAS Hive SQL (Hadoop) 中是否有“Proc Transpose”版本?

否则我可以看到唯一的其他(冗长的)方法是创建许多单独的表然后重新连接在一起,我宁愿避免这种情况。

欢迎任何帮助!

要转置的示例表> 打算将月份放在表格的顶部,以便按月拆分费率:

+------+-------+----------+----------+-------+
| YEAR | MONTH |   Geog   | Category | Rates |
+------+-------+----------+----------+-------+
| 2018 |     1 | National | X        |    32 |
| 2018 |     1 | National | Y        |    43 |
| 2018 |     1 | National | Z        |    47 |
| 2018 |     1 | Regional | X        |    52 |
| 2018 |     1 | Regional | Y        |    38 |
| 2018 |     1 | Regional | Z        |    65 |
| 2018 |     2 | National | X        |    63 |
| 2018 |     2 | National | Y        |    14 |
| 2018 |     2 | National | Z        |    34 |
| 2018 |     2 | Regional | X        |    90 |
| 2018 |     2 | Regional | Y        |    71 |
| 2018 |     2 | Regional | Z        |    69 |
+------+-------+----------+----------+-------+

示例输出:

+------+----------+----------+----+----+
| YEAR |   Geog   | Category | 1  | 2  |
+------+----------+----------+----+----+
| 2018 | National | X        | 32 | 63 |
| 2018 | National | Y        | 43 | 14 |
| 2018 | National | Z        | 47 | 34 |
| 2018 | Regional | X        | 52 | 90 |
| 2018 | Regional | Y        | 38 | 71 |
| 2018 | Regional | Z        | 65 | 69 |
+------+----------+----------+----+----+

最佳答案

用于转置(或旋转)的典型wallpaper SQL 技术是group+transform to pivot case statements group 中的子查询折叠子查询的聚合查询。该组代表单个结果数据透视行。

比如你的组是year, geog, categorymin 用于折叠:

proc sql;
  create view want_pivot as
  select year, geog, category
  , min(rate_m1) as rate_m1
  , min(rate_m2) as rate_m2
  from
  ( select
    year, geog, category
    , case when month=1 then rates end as rate_m1
    , case when month=2 then rates end as rate_m2
    from have
  )
  group by year, geog, category
  ;

这里是相同的概念,更通用一点,其中数据在组内在细节级别重复,mean 用于折叠重复。

data have;
input id name $ value;
datalines;
1 a 1
1 a 2 
1 a 3
1 b 2
1 c 3
2 a 2
2 d 4
2 b 5
3 e 1
run;

proc sql;
  create view have_pivot as 
  select
  id
  , mean(a) as a
  , mean(b) as b
  , mean(c) as c
  , mean(d) as d
  , mean(e) as e
  from
  (
    select
      id
      , case when name='a' then value end as a
      , case when name='b' then value end as b
      , case when name='c' then value end as c
      , case when name='d' then value end as d
      , case when name='e' then value end as e
    from have
  )
  group by id
;
quit;

当列名不是先验已知时,您将需要编写一个代码生成器来传递所有数据以确定名称值,编写墙纸查询,该查询将对数据执行第二次传递以返回数据透视表。

此外,许多当代数据库都有一个 PIVOT 子句,可以通过传递来利用它。

Hadoop Mania post "TRANSPOSE/PIVOT a Table in Hive"以类似的墙纸方式显示 collect_listmap 的使用:

select b.id, b.code, concat_ws('',b.p) as p, concat_ws('',b.q) as q, concat_ws('',b.r) as r, concat_ws('',b.t) as t from
 (select id, code,
 collect_list(a.group_map['p']) as p,
 collect_list(a.group_map['q']) as q,
 collect_list(a.group_map['r']) as r,
 collect_list(a.group_map['t']) as t
 from ( select
  id, code,
  map(key,value) as group_map
  from test_sample
 ) a group by a.id, a.code) b;

关于hadoop - Proc Transpose 的 SAS Hive SQL (Hadoop) 版本?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/53478106/

相关文章:

hadoop - 获取空指针异常

excel-2007 - 64位SAS中使用libname语句与32位Excel交互

sql - 对 Hive 表执行验证和检查(可能不是重复的)

hadoop - Sqoop失败,并带有密码文件参数

hadoop - 在 docker swarm 外部访问 hdfs

macros - SAS - 从条件 1 到条件 2 循环

date - 如何在SAS中将日期格式d.m.yyyy转换为dd.mm.yyyy?

hadoop - 如何在Hive 3中启用LLAP?

hadoop - 从Hive加载数据到ES时获取 “EsHadoopIllegalArgumentException: Index [petrol/petrolCat] missing and settings”

hadoop - 无法在 hadoop 0.23.0 上启动节点管理器