hadoop - Weird cast error in Pig/Hadoop

Tags: hadoop apache-pig

Using Pig 0.10.1, I have the following script:

br = LOAD 'cfs:///somefile';

SPLIT br INTO s0 IF (sp == 1), not_s0 OTHERWISE;
SPLIT not_s0 INTO s1 IF (adp >= 1.0), not_s1 OTHERWISE;
SPLIT not_s1 INTO s2 IF (p > 1L), not_s2 OTHERWISE;
SPLIT not_s2 INTO s3 IF (s > 0L), s4 OTHERWISE;

tmp0 = FOREACH s0 GENERATE b, 'x' as seg;
tmp1 = FOREACH s1 GENERATE b, 'y' as seg;
tmp2 = FOREACH s2 GENERATE b, 'z' as seg;
tmp3 = FOREACH s3 GENERATE b, 'w' as seg;
tmp4 = FOREACH s4 GENERATE b, 't' as seg;

out = UNION ONSCHEMA tmp0, tmp1, tmp2, tmp3, tmp4;

dump out;

The file loaded into br was generated by an earlier Pig script and has an embedded schema (a .pig_schema file):

describe br
br: {b: chararray,p: long,afternoon: long,ddv: long,pa: long,t0002: long,t0204: long,t0406: long,t0608: long,t0810: long,t1012: long,t1214: long,t1416: long,t1618: long,t1820: long,t2022: long,t2200: long,browser_software: chararray,first_timestamp: long,last_timestamp: long,os: chararray,platform: chararray,sp: int,adp: double}

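Some irrelevant fields have been redacted from the schema above (I can't fully disclose the nature of the data right now).

The script fails with the following error: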

ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: java.lang.Integer cannot be cast to java.lang.Long

However, dumping any of s0, s1, s2, s3, s4, tmp0, tmp1, tmp2, tmp3 or tmp4 on its own works perfectly.

The Hadoop job tracker shows the following error 4 times:

java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
at java.lang.Long.compareTo(Long.java:50)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.EqualToExpr.doComparison(EqualToExpr.java:116)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.EqualToExpr.getNext(EqualToExpr.java:83)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:71)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:214)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:260)

I also tried this snippet (in place of the original dump):

x = UNION s1,s2;
y = FOREACH x GENERATE b;
dump y;

This gives a different (but, I assume, related) error:

ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: java.lang.Double cannot be cast to java.lang.Long

Job tracker error (repeated 4 times):

java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Long
at java.lang.Long.compareTo(Long.java:50)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GTOrEqualToExpr.doComparison(GTOrEqualToExpr.java:111)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GTOrEqualToExpr.getNext(GTOrEqualToExpr.java:78)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:71)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:141)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:260)

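I tried searching for known bugs involving UNION, but had no luck. This is really puzzling. Any ideas?

Best answer

After further digging, this looks like a bug. I created a ticket for it.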
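
Reading the two traces side by side: in both, the exception is thrown inside Long.compareTo, so the receiver of the comparison was a Long, while the argument was an Integer (first trace) or a Double (second trace). According to the schema, neither `sp == 1` nor `adp >= 1.0` involves a long, which would be consistent with the comparison reading a different (long) column than the one named in the script. That is only an inference from the traces, not a confirmed diagnosis. The failure mode itself can be reproduced in plain Java. The following is a minimal sketch with hypothetical names (CastDemo, compareRaw); it is not Pig's actual code:

```java
// Minimal sketch (NOT Pig's code): compare two boxed values through the raw
// Comparable interface, the way a generic expression evaluator might.
// CastDemo and compareRaw are hypothetical names used only for illustration.
class CastDemo {
    // Returns a description of the comparison result, or of the failure.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static String compareRaw(Object left, Object right) {
        try {
            // With a Long receiver, this dispatches to Long.compareTo(Object),
            // which casts its argument to Long before comparing.
            int result = ((Comparable) left).compareTo(right);
            return "compareTo returned " + result;
        } catch (ClassCastException e) {
            return "ClassCastException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Assumption: stand-in for a field value that is actually a long.
        Object longField = Long.valueOf(1L);
        // Shaped like `sp == 1`: long value against an int constant.
        System.out.println(compareRaw(longField, Integer.valueOf(1)));
        // Shaped like `adp >= 1.0`: long value against a double constant.
        System.out.println(compareRaw(longField, Double.valueOf(1.0)));
        // Matching types compare without error.
        System.out.println(compareRaw(longField, Long.valueOf(1L)));
    }
}
```

The first two calls fail with a ClassCastException naming the same pair of types as the job tracker errors (the exact message wording depends on the JDK version), while the third succeeds. If the bug is of this kind, checking which column each SPLIT branch actually reads after the UNION would be a reasonable next step.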

Source: Stack Overflow, https://stackoverflow.com/questions/24047572/