java - Pig 0.13.0 error: ERROR 2998: Unhandled internal error. org/apache/commons/io/input/ClassLoaderObjectInputStream

Tags: java hadoop apache-pig

There appears to be a version mismatch between Hadoop 1 and Hadoop 2.

Environment:
Mac OS X 10.9.5 Mavericks
Pig 0.13.0

Built Pig 0.13.0 with:

$ ant clean jar-all -Dhadoopversion=23

HADOOP_HOME=/Users/davidlaxer/hadoop-2.3.0-src
HADOOP_CONF_DIR=/Users/davidlaxer/hadoop-2.3.0-src/src/conf

(virtualenv)David-Laxers-MacBook-Pro:pig davidlaxer$ env | grep PIG
PIG_HOME=/Users/davidlaxer/pig-0.13.0
PIG_CLASSPATH=/users/davidlaxer/hadoop-2.3.0-src/src/conf

(virtualenv)David-Laxers-MacBook-Pro:pig davidlaxer$ hadoop version
Hadoop 0.21.0
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.21 -r 985326
Compiled by tomwhite on Tue Aug 17 01:02:28 EDT 2010
From source with checksum a1aeb15b4854808d152989ba76f90fac
(virtualenv)David-Laxers-MacBook-Pro:pig davidlaxer$ 

pig -secretDebugCmd
Find hadoop at /usr/local/bin/hadoop
dry run:
HADOOP_CLASSPATH: /Users/davidlaxer/pig-0.13.0/conf:/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home/lib/tools.jar:/users/davidlaxer/hadoop-2.3.0-src/src/conf:/Users/davidlaxer/hadoop-2.3.0-src/src/conf:/Users/davidlaxer/pig-0.13.0/lib/accumulo-core-1.5.0.jar:/Users/davidlaxer/pig-0.13.0/lib/accumulo-fate-1.5.0.jar:/Users/davidlaxer/pig-0.13.0/lib/accumulo-server-1.5.0.jar:/Users/davidlaxer/pig-0.13.0/lib/accumulo-start-1.5.0.jar:/Users/davidlaxer/pig-0.13.0/lib/accumulo-trace-1.5.0.jar:/Users/davidlaxer/pig-0.13.0/lib/avro-1.7.5.jar:/Users/davidlaxer/pig-0.13.0/lib/avro-mapred-1.7.5.jar:/Users/davidlaxer/pig-0.13.0/lib/avro-tools-1.7.5-nodeps.jar:/Users/davidlaxer/pig-0.13.0/lib/groovy-all-1.8.6.jar:/Users/davidlaxer/pig-0.13.0/lib/hbase-0.94.1.jar:/Users/davidlaxer/pig-0.13.0/lib/jruby-complete-1.6.7.jar:/Users/davidlaxer/pig-0.13.0/lib/js-1.7R2.jar:/Users/davidlaxer/pig-0.13.0/lib/json-simple-1.1.jar:/Users/davidlaxer/pig-0.13.0/lib/jython-standalone-2.5.3.jar:/Users/davidlaxer/pig-0.13.0/lib/piggybank.jar:/Users/davidlaxer/pig-0.13.0/lib/protobuf-java-2.4.0a.jar:/Users/davidlaxer/pig-0.13.0/lib/zookeeper-3.4.5.jar:/Users/davidlaxer/pig-0.13.0/pig-0.13.0-withouthadoop-h2.jar:
HADOOP_OPTS: -Xmx1000m  -Dpig.log.dir=/Users/davidlaxer/pig-0.13.0/logs -Dpig.log.file=pig.log -Dpig.home.dir=/Users/davidlaxer/pig-0.13.0 
HADOOP_CLIENT_OPTS: -Xmx1000m  -Dpig.log.dir=/Users/davidlaxer/pig-0.13.0/logs -Dpig.log.file=pig.log -Dpig.home.dir=/Users/davidlaxer/pig-0.13.0 
/usr/local/bin/hadoop jar /Users/davidlaxer/pig-0.13.0/pig-0.13.0-withouthadoop-h2.jar

(virtualenv)David-Laxers-MacBook-Pro:pig davidlaxer$ cat test.pig
/* Set Home Directory - where we install software */
%default HOME `echo \$HOME`

REGISTER /Users/davidlaxer/pig-0.13.0/build/ivy/lib/Pig/avro-1.7.5.jar
REGISTER /Users/davidlaxer/pig-0.13.0/build/ivy/lib/Pig/json-simple-1.1.jar
REGISTER /Users/davidlaxer/pig-0.13.0/contrib/piggybank/java/piggybank.jar

/* DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();*/

/* Load the emails in avro format (edit the path to match where you saved them) using the AvroStorage UDF from Piggybank */
messages = LOAD '/tmp/test_mbox' USING org.apache.pig.piggybank.storage.avro.AvroStorage();

DESCRIBE messages;
EXPLAIN messages;
ILLUSTRATE messages;
lmt = LIMIT messages 100;
dump messages;

STORE messages INTO '/tmp/messages' USING org.apache.pig.piggybank.storage.avro.AvroStorage();

(virtualenv)David-Laxers-MacBook-Pro:pig davidlaxer$
(virtualenv)David-Laxers-MacBook-Pro:pig davidlaxer$ !pi
pig -l /tmp -x local -w -v test.pig 
2014-10-10 17:37:45,670 INFO  [main] pig.ExecTypeProvider (ExecTypeProvider.java:selectExecType(41)) - Trying ExecType : LOCAL
2014-10-10 17:37:45,673 INFO  [main] pig.ExecTypeProvider (ExecTypeProvider.java:selectExecType(43)) - Picked LOCAL as the ExecType
2014-10-10 17:37:45,734 [main] INFO  org.apache.pig.Main - Apache Pig version 0.13.1-SNAPSHOT (rUnversioned directory) compiled Oct 10 2014, 17:26:21
2014-10-10 17:37:45,735 [main] INFO  org.apache.pig.Main - Logging error messages to: /private/tmp/pig_1412980665665.log
2014-10-10 17:37:46.007 java[87678:1003] Unable to load realm info from SCDynamicStore
2014-10-10 17:37:46,012 [main] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-10-10 17:37:46,598 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /Users/davidlaxer/.pigbootup not found
2014-10-10 17:37:46,684 [main] INFO  org.apache.pig.tools.parameters.PreprocessorContext - Executing command : echo $HOME
2014-10-10 17:37:46,844 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-10-10 17:37:46,845 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-10-10 17:37:46,847 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2014-10-10 17:37:47,012 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-10-10 17:37:47,280 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-10-10 17:37:47,330 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-10-10 17:37:47,891 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
messages: {message_id: chararray,thread_id: chararray,in_reply_to: chararray,subject: chararray,body: chararray,date: chararray,from: (real_name: chararray,address: chararray),tos: {ARRAY_ELEM: (real_name: chararray,address: chararray)},ccs: {ARRAY_ELEM: (real_name: chararray,address: chararray)},bccs: {ARRAY_ELEM: (real_name: chararray,address: chararray)},reply_tos: {ARRAY_ELEM: (real_name: chararray,address: chararray)}}
2014-10-10 17:37:48,810 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
messages: (Name: LOStore Schema: message_id#26:chararray,thread_id#27:chararray,in_reply_to#28:chararray,subject#29:chararray,body#30:chararray,date#31:chararray,from#32:tuple(real_name#33:chararray,address#34:chararray),tos#35:bag{ARRAY_ELEM#36:tuple(real_name#37:chararray,address#38:chararray)},ccs#39:bag{ARRAY_ELEM#40:tuple(real_name#41:chararray,address#42:chararray)},bccs#43:bag{ARRAY_ELEM#44:tuple(real_name#45:chararray,address#46:chararray)},reply_tos#47:bag{ARRAY_ELEM#48:tuple(real_name#49:chararray,address#50:chararray)})
|
|---messages: (Name: LOLoad Schema: message_id#26:chararray,thread_id#27:chararray,in_reply_to#28:chararray,subject#29:chararray,body#30:chararray,date#31:chararray,from#32:tuple(real_name#33:chararray,address#34:chararray),tos#35:bag{ARRAY_ELEM#36:tuple(real_name#37:chararray,address#38:chararray)},ccs#39:bag{ARRAY_ELEM#40:tuple(real_name#41:chararray,address#42:chararray)},bccs#43:bag{ARRAY_ELEM#44:tuple(real_name#45:chararray,address#46:chararray)},reply_tos#47:bag{ARRAY_ELEM#48:tuple(real_name#49:chararray,address#50:chararray)})RequiredFields:null
#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
messages: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-1
|
|---messages: Load(/tmp/test_mbox:org.apache.pig.piggybank.storage.avro.AvroStorage) - scope-0

#--------------------------------------------------
# Map Reduce Plan                                  
#--------------------------------------------------
No MR jobs. Fetch only.
2014-10-10 17:37:49,145 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-10-10 17:37:49,146 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2014-10-10 17:37:49,185 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[LoadTypeCastInserter, StreamTypeCastInserter], RULES_DISABLED=[AddForEach, ColumnMapKeyPrune, FilterLogicExpressionSimplifier, GroupByConstParallelSetter, LimitOptimizer, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter]}
2014-10-10 17:37:49,221 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-10-10 17:37:49,236 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-10-10 17:37:49,236 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-10-10 17:37:49,267 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2014-10-10 17:37:49,281 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2014-10-10 17:37:49,281 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-10-10 17:37:49,282 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2014-10-10 17:37:49,506 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: messages[11,11] C:  R: 
2014-10-10 17:37:49,509 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-10-10 17:37:49,511 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2014-10-10 17:37:49,552 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-10-10 17:37:49,556 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
2014-10-10 17:37:49,556 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at org.apache.pig.piggybank.storage.avro.PigAvroInputFormat.listStatus(PigAvroInputFormat.java:96)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:375)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:190)
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:146)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:95)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:123)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.pig.pen.LocalMapReduceSimulator.launchPig(LocalMapReduceSimulator.java:202)
    at org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:259)
    at org.apache.pig.pen.ExampleGenerator.readBaseData(ExampleGenerator.java:223)
    at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:155)
    at org.apache.pig.PigServer.getExamples(PigServer.java:1282)
    at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:810)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:802)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:381)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:228)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:608)
    at org.apache.pig.Main.main(Main.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Details also at logfile: /private/tmp/pig_1412980665665.log

Best answer

Pig runs fine on Hadoop 2.x as long as you compile it with the -Dhadoopversion switch, which you did.

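However, your script uses piggybank functions, and piggybank is not compiled when you run ant jar-all in Pig's root directory. That means you are picking up a piggybank jar built against Hadoop 1.x. JobContext is a class in Hadoop 1.x and an interface in Hadoop 2.x, hence the "Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected" exception.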
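To see which flavor of JobContext a given classpath actually provides, a small reflection check can help. This is only a minimal sketch: the class name ClassKind and its describe method are made up for illustration and are not part of Pig or Hadoop. It relies solely on the standard Class.forName and Class.isInterface calls.

```java
public class ClassKind {
    /**
     * Reports whether the named type is an interface or a class
     * as seen on the current classpath (hypothetical helper).
     */
    public static String describe(String className) throws ClassNotFoundException {
        // Load without running static initializers; only the type's kind matters here.
        Class<?> type = Class.forName(className, false, ClassKind.class.getClassLoader());
        return type.isInterface() ? "interface" : "class";
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults to the type named in the stack trace; pass another name to check it instead.
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.mapreduce.JobContext";
        System.out.println(name + " is a " + describe(name) + " on this classpath");
    }
}
```

Assuming the hadoop launcher on your PATH supports the classpath subcommand, this could be run as java -cp "$(hadoop classpath):." ClassKind after compiling with javac. It throws ClassNotFoundException if no Hadoop jars are on the classpath. A result of "interface" would indicate Hadoop 2.x style jars, so any jar compiled against the 1.x class (such as a stale piggybank.jar) would fail the way the stack trace shows.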

To fix it, you just need to build the piggybank jar with the -Dhadoopversion switch as well.

From the Pig root directory:

$ cd contrib/piggybank/java
$ ant clean
$ ant -Dhadoopversion=23

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/26309124/
