hadoop - Pig filter not working

I have the following Pig script:

meta_file = LOAD 'meta_file' USING PigStorage(',');

DUMP meta_file;

meta = FOREACH meta_file GENERATE (chararray)$0 AS is_vta:chararray, (chararray)$1 AS id:long;

DUMP meta;

new_d = FILTER meta BY (is_vta == 't');
DUMP new_d;

Contents of meta_file:

"t","7181397"
"t","6331589"
"f","7266217"
"t","6051440"
"t","6901437"
"t","6805292"
"f","7144764"
"t","6820265"
"f","7515321"
"t","4777938"

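The DUMP of meta_file looks fine and matches the file contents, and so does the DUMP of meta, but new_d is empty. I can see rows in meta where is_vta is t, yet new_d still comes back empty. Why isn't meta being filtered correctly? What am I doing wrong here? I'm new to Pig Latin and can't figure out what the problem might be.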

Thanks for your help.

Best answer

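PigStorage(',') splits each line on commas but does not strip the double quotes, so is_vta holds the three-character string "t" (quotes included) and never equals 't'. The simple approach is to match on a pattern instead: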

new_d = FILTER meta BY is_vta MATCHES '.*t.*';

Another solution is to remove the quotes first and then compare:

remquotes = FOREACH meta GENERATE REPLACE($0, '\\"', '') AS is_vta:chararray, id;

new_d = FILTER remquotes BY is_vta == 't';
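
To illustrate why the original comparison never matches, here is a minimal Python sketch (a stand-in, not Pig code). It assumes the loader splits on commas and leaves the quotes in place, which is how the DUMP output in the question behaves. The helper names load and strip_quotes are hypothetical and exist only for this illustration.

```python
# Python stand-in for the Pig behaviour described above; this is not Pig code.
lines = [
    '"t","7181397"',
    '"f","7266217"',
    '"t","6051440"',
]


def load(line):
    """Split on commas only, keeping the quotes (assumed to mirror PigStorage(','))."""
    return tuple(line.split(","))


def strip_quotes(row):
    """Remove every double quote from each field of a row."""
    return tuple(field.replace('"', "") for field in row)


meta = [load(line) for line in lines]

# Comparing against a bare t fails because each field still contains its quotes.
naive = [row for row in meta if row[0] == "t"]

# After stripping the quotes, the equality test behaves as intended.
cleaned = [row for row in map(strip_quotes, meta) if row[0] == "t"]

print(naive)
print(cleaned)
```

The first list is empty and the second contains the two t rows, which matches what the question observes. Note that the REPLACE call in the second solution cleans only $0, so id would still carry its quotes unless it is cleaned the same way.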

This question, "hadoop - Pig filter not working", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/42152953/
