hadoop - 组运算符和组与组

标签 hadoop apache-pig

http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#FOREACH的以下两个示例中,group是做什么的:

Example: Nested Projection

In this example if one of the fields in the input relation is a tuple, bag or map, we can perform a projection on that field (using a deference operator).

X = FOREACH C GENERATE group, B.b2;

DUMP X; (1,{(3)}) (4,{(6),(9)}) (8,{(9)})

In this example multiple nested columns are retained.

X = FOREACH C GENERATE group, A.(a1, a2);

DUMP X; (1,{(1,2)}) (4,{(4,2),(4,3)}) (8,{(8,3),(8,4)})



使用group和使用GROUP有什么区别吗?

例子:

Example: Flattening

In this example the FLATTEN operator is used to eliminate nesting.

X = FOREACH C GENERATE group, FLATTEN(A);

DUMP X; (1,1,2,3) (4,4,2,1) (4,4,3,3) (8,8,3,4) (8,8,4,3)

Another FLATTEN example.

X = FOREACH C GENERATE GROUP, FLATTEN(A.a3);

DUMP X; (1,3) (4,1) (4,3) (8,4) (8,3)

最佳答案

http://pig.apache.org/docs/r0.11.0/basic.html#GROUP

The first field is named "group" (do not confuse this with the GROUP operator) and is the same type as the group key.

字段名称区分大小写,因此GROUP运算符不会创建字段“GROUP”。我认为这是 pig 文档中的一个小错误。

关于hadoop - 组运算符和组与组,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/23795661/

相关文章:

apache-pig - 使用 pig,如何将混合格式行解析为元组和一袋元组?

hadoop - 如何在 Pig Latin 中实现 Levenshtein 算法

azure - Azure:创建R群集时是否可以选择ambari实例?

hadoop - hadoop中的数据存储在哪里

java - MapReduce canopy 聚类中心

hadoop - 将数据提取到 PIG 中的不同关系中

hadoop - 像袋子一样压扁元组

hadoop - 组织.apache.hadoop.hbase.MasterNotRunningException : null

hadoop - Hive查询时如何知道数据文件是如何压缩的

hadoop - 如何在Pig中使用自定义加载程序功能返回多个元组