hadoop - 'Error compiling operator POLocalRearrange' error in Pig

Tags: hadoop apache-pig bigdata

I am practicing on the Cloudera YARN VM running in VMware Player (non-commercial use).
My Pig script is:
a1 = load '/user/training/my_hdfs/id' using PigStorage('\t') as (id:int, name:chararray, desig:chararray);
a2 = load '/user/training/my_hdfs/trips' using PigStorage('\t') as (id:int, place:chararray, no_trips:int);
a3 = join a1 by id, a2 by id;
a4 = group a3 by a1::id;
illustrate a4;
After running illustrate, it shows this message:
2017-08-21 07:52:11,926 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Exception : Error compiling operator POLocalRearrange
The data sets are:

Table id
101    aaa    executive
102    bbb    manager
104    hhh    manager
106    ccc    trainee
109    hhh    trainee

Table trips
101    pune    1
101    hyd     2
102    pune    2
102    hyd     3
102    bang    4

Best answer

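When I tried to run the script with the data provided, I also hit errors, because the delimiters in the files were inconsistent: some fields were separated by spaces and others by tabs (probably a result of copy-pasting). Once I made the delimiter uniform (tabs everywhere), everything worked.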
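The cleanup described above can also be done outside Pig before uploading the files to HDFS. The following is a minimal Python sketch, not part of the original answer; the helper names `normalize_delimiters` and `normalize_text` are hypothetical, and it assumes that no field contains an internal space (true for the sample data, but not in general).

```python
import re


def normalize_delimiters(line: str) -> str:
    """Collapse every run of spaces and/or tabs into a single tab.

    Assumption: field values never contain spaces themselves.
    """
    return re.sub(r"[ \t]+", "\t", line.strip())


def normalize_text(text: str) -> str:
    """Normalize every non-blank line and drop blank lines."""
    return "\n".join(
        normalize_delimiters(line)
        for line in text.splitlines()
        if line.strip()
    )


# Sample rows with mixed delimiters, similar to a copy-pasted file.
sample = "101 aaa\texecutive\n102\tbbb   manager\n"
cleaned = normalize_text(sample)
print(cleaned)
```

The cleaned text can then be written back to a file and loaded with PigStorage('\t') as in the original script.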

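Try dump a1 or dump a2 and check whether the data appears in the correct columns.
For me, after making the delimiter uniform, the script worked perfectly and illustrate a4 gave the following output: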
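The same column check can be approximated locally before the file reaches Pig. This is a small Python sketch under stated assumptions: the function name `lines_with_wrong_field_count` is hypothetical, and it only counts tab-separated fields per line, which mirrors splitting on '\t' but does not validate types such as int.

```python
def lines_with_wrong_field_count(text, expected_fields, delimiter="\t"):
    """Return (line_number, field_count) for each non-blank line whose
    number of delimiter-separated fields differs from expected_fields."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        count = len(line.split(delimiter))
        if count != expected_fields:
            problems.append((number, count))
    return problems


# Line 1 is tab-separated; lines 2 and 3 contain spaces instead of tabs.
trips = "101\tpune\t1\n101 hyd 2\n102\tpune 2\n"
report = lines_with_wrong_field_count(trips, expected_fields=3)
print(report)
```

An empty report suggests every line has the expected number of tab-separated columns; any entry points at a line worth inspecting.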

------------------------------------------------------------------
| a1     | id:int     | name:chararray     | desig:chararray     |
------------------------------------------------------------------
|        | 101        | aaa                | executive           |
|        | 101        | aaa                | executive           |
------------------------------------------------------------------
----------------------------------------------------------------
| a2     | id:int     | place:chararray     | no_trips:int     |
----------------------------------------------------------------
|        | 101        | pune                | 1                |
|        | 101        | hyd                 | 2                |
----------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------
| a3     | a1::id:int     | a1::name:chararray     | a1::desig:chararray     | a2::id:int     | a2::place:chararray     | a2::no_trips:int     |
------------------------------------------------------------------------------------------------------------------------------------------------
|        | 101            | aaa                    | executive               | 101            | pune                    | 1                    |
|        | 101            | aaa                    | executive               | 101            | hyd                     | 2                    |
|        | 101            | aaa                    | executive               | 101            | pune                    | 1                    |
|        | 101            | aaa                    | executive               | 101            | hyd                     | 2                    |
------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| a4     | group:int     | a3:bag{:tuple(a1::id:int,a1::name:chararray,a1::desig:chararray,a2::id:int,a2::place:chararray,a2::no_trips:int)}                                 |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|        | 101           | {(101, ..., 1), ..., (101, ..., 2)}                                                                                                               |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This question about the 'Error compiling operator POLocalRearrange' error in Pig comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/45797284/