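hive - How to get the first n elements of an array in Hive

Tags: hive

I create an array in Hive with the split function. How can I get the first n elements of that array? I want to iterate over the resulting sub-array.

Code example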

select col1 from table
where split(col2, ',')[0:5]

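'[0:5]' is Python-style slice syntax, and it does not work here.

Best answer

Here is a simpler approach. The Brickhouse library has a UDF, TruncateArrayUDF.java, that does what you are asking. Clone the repo from its home page and build the jar with Maven.

Sample data: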

|       col1         |
----------------------
  1,2,3,4,5,6,7
  11,12,13,14,15,16,17

Query:

add jar /complete/path/to/jar/brickhouse-0.7.0-SNAPSHOT.jar;
create temporary function trunc as 'brickhouse.udf.collect.TruncateArrayUDF';

select pos
      ,newcol
from (
      select trunc(split(col1, '\\,'), 5) as p
      from table
     ) x
lateral view posexplode(p) explodetable as pos, newcol

Output:

  pos  |  newcol  |
-------------------
  0         1
  1         2
  2         3
  3         4
  4         5
  0         11
  1         12
  2         13
  3         14
  4         15
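
If adding a jar is not an option, much the same result can probably be reached with built-in functions alone: explode the whole array with posexplode and keep only the positions below n. The query below is a minimal sketch and not part of the original answer. my_table is a hypothetical table name standing in for the table above, and t and e are arbitrary aliases. It assumes Hive 0.13 or later, where posexplode is available.

```sql
-- Sketch using only built-in functions (no UDF jar required).
-- my_table is a hypothetical name; col1 holds comma-separated strings.
select e.pos
      ,e.newcol
from my_table t
lateral view posexplode(split(t.col1, ',')) e as pos, newcol
where e.pos < 5   -- positions are 0-based, so this keeps the first 5 elements
```

The trade-off is that every element is exploded before the filter is applied, so for very long arrays the UDF, which truncates first, does less work. Row order is not guaranteed unless an order by is added.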

For more on "hive - How to get the first n elements of an array in Hive", see the similar question on Stack Overflow: https://stackoverflow.com/questions/25355916/
