在http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#FOREACH的以下两个示例中,group
是做什么的:
Example: Nested Projection
In this example if one of the fields in the input relation is a tuple, bag or map, we can perform a projection on that field (using a deference operator).
X = FOREACH C GENERATE group, B.b2;
DUMP X; (1,{(3)}) (4,{(6),(9)}) (8,{(9)})
In this example multiple nested columns are retained.
X = FOREACH C GENERATE group, A.(a1, a2);
DUMP X; (1,{(1,2)}) (4,{(4,2),(4,3)}) (8,{(8,3),(8,4)})
使用
group
和使用GROUP
有什么区别吗?例子:
Example: Flattening
In this example the FLATTEN operator is used to eliminate nesting.
X = FOREACH C GENERATE group, FLATTEN(A);
DUMP X; (1,1,2,3) (4,4,2,1) (4,4,3,3) (8,8,3,4) (8,8,4,3)
Another FLATTEN example.
X = FOREACH C GENERATE GROUP, FLATTEN(A.a3);
DUMP X; (1,3) (4,1) (4,3) (8,4) (8,3)
最佳答案
从http://pig.apache.org/docs/r0.11.0/basic.html#GROUP
The first field is named "group" (do not confuse this with the GROUP operator) and is the same type as the group key.
字段名称区分大小写,因此
GROUP
运算符不会创建字段“GROUP”。我认为这是 pig 文档中的一个小错误。
关于hadoop - 组运算符和组与组,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/23795661/