mongodb - Error when running a Pig script that integrates with MongoDB

Tags: mongodb hadoop apache-pig

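I am practicing MongoDB integration with Pig.
Hadoop version: 1.2.1
MongoDB version: 2.6.11
Apache Pig version: 0.14

Please check the code below and advise: what is going wrong?

Am I using the correct, compatible JARs?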

--Program for Connecting MongoDB with Pig  
 -----------------------------------------------  
 REGISTER /usr/local/hadoop/pig/lib/avro-1.7.5.jar  
 REGISTER /usr/local/hadoop/pig/lib/json-simple-1.1.jar  
 REGISTER /usr/local/hadoop/pig/lib/piggybank.jar  
 DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();  
 REGISTER /usr/local/hadoop/pig/mongo-hadoop-core_1.0.2-1.0.0.jar  
 REGISTER /usr/local/hadoop/pig/mongo-hadoop-pig_0.20.205.0-1.2.0.jar  
 REGISTER /usr/local/hadoop/pig/mongo-java-driver-2.10.1.jar  
 REGISTER /usr/local/hadoop/pig/hadoop-mapred-0.21.0.jar  
 data = LOAD 'mongodb://localhost/MongoPracs.employees'  
  USING com.mongodb.hadoop.pig.MongoLoader('firstName:chararray, lastName:chararray');  
 EXPLAIN data;  
 DUMP data;

I get the following error:
@ubuntu:/usr/local/hadoop/pig$ pig -x local Mongo.pig  
 15/12/14 07:29:48 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL  
 15/12/14 07:29:48 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType  
 2015-12-14 07:29:48,698 [main] INFO org.apache.pig.Main - Apache Pig version 0.14.0 (r1640057) compiled Nov 16 2014, 18:01:24  
 2015-12-14 07:29:48,698 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/local/hadoop-1.2.1/pig-0.14.0/pig_1450058388697.log  
 2015-12-14 07:29:48,809 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/gopal/.pigbootup not found  
 2015-12-14 07:29:48,884 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///  
 2015-12-14 07:29:49,909 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}  
 #-----------------------------------------------  
 # New Logical Plan:  
 #-----------------------------------------------  
 data: (Name: LOStore Schema: firstName#1:chararray,lastName#2:chararray)  
 |  
 |---data: (Name: LOLoad Schema: firstName#1:chararray,lastName#2:chararray)RequiredFields:null  
 #-----------------------------------------------  
 # Physical Plan:  
 #-----------------------------------------------  
 data: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-1  
 |  
 |---data: Load(mongodb://localhost/MongoPracs.employees:com.mongodb.hadoop.pig.MongoLoader('firstName:chararray, lastName:chararray')) - scope-0  
 2015-12-14 07:29:50,105 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false  
 2015-12-14 07:29:50,143 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1  
 2015-12-14 07:29:50,143 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1  
 #--------------------------------------------------  
 # Map Reduce Plan                   
 #--------------------------------------------------  
 MapReduce node scope-2  
 Map Plan  
 data: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-1  
 |  
 |---data: Load(mongodb://localhost/MongoPracs.employees:com.mongodb.hadoop.pig.MongoLoader('firstName:chararray, lastName:chararray')) - scope-0--------  
 Global sort: false  
 ----------------  
 2015-12-14 07:29:50,161 [main] INFO org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library  
 2015-12-14 07:29:50,188 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN  
 2015-12-14 07:29:50,197 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized  
 2015-12-14 07:29:50,198 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}  
 2015-12-14 07:29:50,199 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false  
 2015-12-14 07:29:50,199 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1  
 2015-12-14 07:29:50,200 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1  
 2015-12-14 07:29:50,246 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job  
 2015-12-14 07:29:50,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3  
 2015-12-14 07:29:50,331 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job  
 2015-12-14 07:29:50,358 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.  
 2015-12-14 07:29:50,383 [JobControl] WARN org.apache.hadoop.mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).  
 2015-12-14 07:29:50,444 [JobControl] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area file:/tmp/hadoop-gopal/mapred/staging/gopal95268095/.staging/job_local95268095_0001  
 2015-12-14 07:29:50,448 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete  
 2015-12-14 07:29:50,452 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2117: Unexpected error when launching map reduce job.  
 Details at logfile: /usr/local/hadoop-1.2.1/pig-0.14.0/pig_1450058388697.log  
 2015-12-14 07:29:50,469 [main] INFO org.apache.pig.Main - Pig script completed in 17 seconds and 77 milliseconds (17077 ms)



Pig Stack Trace
---------------
ERROR 2117: Unexpected error when launching map reduce job.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias data
    at org.apache.pig.PigServer.openIterator(PigServer.java:935)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:746)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:624)
    at org.apache.pig.Main.main(Main.java:170)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias data
    at org.apache.pig.PigServer.storeEx(PigServer.java:1038)
    at org.apache.pig.PigServer.store(PigServer.java:997)
    at org.apache.pig.PigServer.openIterator(PigServer.java:910)
    ... 12 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:388)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:280)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
    at org.apache.pig.PigServer.storeEx(PigServer.java:1034)
    ... 14 more
Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching job: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.pig.backend.hadoop20.PigJobControl.mainLoopAction(PigJobControl.java:159)
    at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:134)
    at java.lang.Thread.run(Thread.java:745)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.pig.backend.hadoop20.PigJobControl.mainLoopAction(PigJobControl.java:157)
    ... 3 more
Caused by: java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.JobContext, but interface was expected
    at com.mongodb.hadoop.MongoInputFormat.getSplits(MongoInputFormat.java:54)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
    at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    ... 8 more

    at org.apache.pig.backend.hadoop.executionengine.Launcher.setJobException(Launcher.java:293)
    at org.apache.pig.backend.hadoop.executionengine.Launcher$JobControlThreadExceptionHandler.uncaughtException(Launcher.java:282)
    at java.lang.Thread.dispatchUncaughtException(Thread.java:1986)
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.pig.backend.hadoop20.PigJobControl.mainLoopAction(PigJobControl.java:159)
    at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:134)
    at java.lang.Thread.run(Thread.java:745)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.pig.backend.hadoop20.PigJobControl.mainLoopAction(PigJobControl.java:157)
    ... 3 more
Caused by: java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.JobContext, but interface was expected
    at com.mongodb.hadoop.MongoInputFormat.getSplits(MongoInputFormat.java:54)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
    at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    ... 8 more
================================================================================

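Best answer

Your log looks incomplete, but the innermost cause visible so far is a class-compatibility problem. The MongoInputFormat class that was loaded appears to have been compiled against a Hadoop API in which org.apache.hadoop.mapreduce.JobContext is an interface (as in the Hadoop 2.x line), whereas Hadoop 1.2.1 defines it as a class. That suggests at least one of the registered mongo-hadoop JARs was built for a different Hadoop version than the one you are running.

This is derived from: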

Caused by: java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.JobContext, but interface was expected
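
One way to confirm this kind of mismatch is to ask the JVM what it actually loads. The following is only a minimal sketch, not part of the original answer: the class name ClassKindCheck and the helper describe are hypothetical, and it assumes you run it with the same Hadoop and mongo-hadoop JARs on the classpath that the Pig script registers. It reports whether a type is a class or an interface and which JAR it came from.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; the names here are illustrative only.
class ClassKindCheck {

    /** Reports whether the named type is a class or an interface, and where it was loaded from. */
    static String describe(String className) {
        try {
            // Load without running static initializers; we only inspect the type.
            Class<?> type = Class.forName(className, false, ClassKindCheck.class.getClassLoader());
            String kind = type.isInterface() ? "interface" : "class";
            CodeSource source = type.getProtectionDomain().getCodeSource();
            // JDK classes have no code source, so fall back to a placeholder.
            String origin = (source == null || source.getLocation() == null)
                    ? "bootstrap/unknown"
                    : source.getLocation().toString();
            return kind + " " + className + " from " + origin;
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found: " + className;
        }
    }

    public static void main(String[] args) {
        // Defaults are the two types named in the stack trace.
        String[] names = args.length > 0
                ? args
                : new String[] {
                    "org.apache.hadoop.mapreduce.JobContext",
                    "com.mongodb.hadoop.MongoInputFormat"
                };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

If JobContext is reported as a class while the mongo-hadoop JAR you registered targets a Hadoop line where it is an interface, the JAR and the Hadoop runtime do not match, and a mongo-hadoop build made for your Hadoop version would be the thing to try.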

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/34280277/
