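I have the following:
(id: int, name: chararray)
I then group by id and create a bag of names. I see that the bag of names can contain a null. How do I remove the null from the bag?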
Best answer
You can remove tuples from the bag created by GROUP BY by using a FILTER nested inside a FOREACH.
inpt = LOAD '...' as (id: int, names: chararray);
grp = GROUP inpt BY id;
result = FOREACH grp {
    no_nulls = FILTER inpt BY names is not null;
    GENERATE group, no_nulls;
};
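The bag generated above holds whole (id, names) tuples. If the bag should contain only the names, as the question describes, projecting the column inside GENERATE should work. A minimal sketch, where the aliases id and name_bag are assumptions added for illustration and the LOAD path is left elided as in the original:

```pig
inpt = LOAD '...' AS (id: int, names: chararray);
grp = GROUP inpt BY id;
result = FOREACH grp {
    -- Keep only the tuples whose names field is set.
    no_nulls = FILTER inpt BY names IS NOT NULL;
    -- Project the names column so the bag holds single-field tuples.
    GENERATE group AS id, no_nulls.names AS name_bag;
};
```

Running DESCRIBE result; is a quick way to check the resulting schema.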
Alternatively, filter out the null names before grouping:
inpt = LOAD '...' as (id: int, names: chararray);
no_nulls = FILTER inpt BY names is not null;
grp = GROUP no_nulls BY id;
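The two approaches differ in one case: an id whose names are all null. Filtering before grouping drops that id completely, because none of its rows reach GROUP, whereas the nested FILTER keeps the id and emits an empty bag for it. A sketch of the pre-filter variant carried through to a result, where the result alias and the final projection are assumptions and not part of the original answer:

```pig
inpt = LOAD '...' AS (id: int, names: chararray);
-- Rows with a null name are removed here, so an id that has
-- only null names produces no group at all.
no_nulls = FILTER inpt BY names IS NOT NULL;
grp = GROUP no_nulls BY id;
-- Each group's bag is named after the grouped relation (no_nulls).
result = FOREACH grp GENERATE group AS id, no_nulls.names;
```

If every id must appear in the output even when it has no non-null names, the nested FILTER form is the one to use.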
Regarding hadoop - removing null values from a bag, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14670061/