hadoop - 组运算符和组与组

在http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#FOREACH的以下两个示例中，group是做什么的:

Example: Nested Projection

In this example if one of the fields in the input relation is a tuple, bag or map, we can perform a projection on that field (using a deference operator).

X = FOREACH C GENERATE group, B.b2;

DUMP X; (1,{(3)}) (4,{(6),(9)}) (8,{(9)})

In this example multiple nested columns are retained.

X = FOREACH C GENERATE group, A.(a1, a2);

DUMP X; (1,{(1,2)}) (4,{(4,2),(4,3)}) (8,{(8,3),(8,4)})

使用group和使用GROUP有什么区别吗？

例子:

Example: Flattening

In this example the FLATTEN operator is used to eliminate nesting.

X = FOREACH C GENERATE group, FLATTEN(A);

DUMP X; (1,1,2,3) (4,4,2,1) (4,4,3,3) (8,8,3,4) (8,8,4,3)

Another FLATTEN example.

X = FOREACH C GENERATE GROUP, FLATTEN(A.a3);

DUMP X; (1,3) (4,1) (4,3) (8,4) (8,3)

最佳答案

从http://pig.apache.org/docs/r0.11.0/basic.html#GROUP

The first field is named "group" (do not confuse this with the GROUP operator) and is the same type as the group key.

字段名称区分大小写，因此GROUP运算符不会创建字段“GROUP”。我认为这是 pig 文档中的一个小错误。

关于hadoop - 组运算符和组与组，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/23795661/

上一篇：hadoop - 色调浏览器中的WebHdfsException

下一篇：hadoop - 在Pig脚本中加载.gz文件时出错

相关文章：

apache-pig - 使用 pig，如何将混合格式行解析为元组和一袋元组？

hadoop - 如何在 Pig Latin 中实现 Levenshtein 算法

azure - Azure:创建R群集时是否可以选择ambari实例？

hadoop - hadoop中的数据存储在哪里

java - MapReduce canopy 聚类中心

hadoop - 将数据提取到 PIG 中的不同关系中

hadoop - 像袋子一样压扁元组

hadoop - 组织.apache.hadoop.hbase.MasterNotRunningException : null

hadoop - Hive查询时如何知道数据文件是如何压缩的

hadoop - 如何在Pig中使用自定义加载程序功能返回多个元组