mongodb - "ERROR 6000, Output location validation failed" using the PIG MongoDB-Hadoop Connector on EMR

Tags: mongodb hadoop apache-pig amazon-emr mongodb-hadoop

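I get an "Output location validation failed" exception in a Pig script on EMR. It fails when saving data back to S3. I use this simple script to narrow down the problem: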

REGISTER /home/hadoop/lib/mongo-java-driver-2.13.0.jar  
REGISTER /home/hadoop/lib/mongo-hadoop-core-1.3.2.jar
REGISTER /home/hadoop/lib/mongo-hadoop-pig-1.3.2.jar

example = LOAD 's3://xxx/example-full.bson'
         USING com.mongodb.hadoop.pig.BSONLoader();


STORE example INTO 's3n://xxx/out/example.bson' USING com.mongodb.hadoop.pig.BSONStorage();

This is the resulting stack trace:

================================================================================
Pig Stack Trace
---------------
ERROR 6000:
<line 8, column 0> Output Location Validation Failed for: 's3://xxx/out/example.bson More info to follow:
Output directory not set.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias example
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1637)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:577)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1091)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:543)
    at org.apache.pig.Main.main(Main.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 6000:
<line 8, column 0> Output Location Validation Failed for: 's3://xxx/out/example.bson More info to follow:
Output directory not set.
    at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:95)
    at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66)
    at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
    at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
    at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
    at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:317)
    at org.apache.pig.PigServer.compilePp(PigServer.java:1382)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1307)
    at org.apache.pig.PigServer.execute(PigServer.java:1299)
    at org.apache.pig.PigServer.access$400(PigServer.java:124)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1632)
    ... 13 more
Caused by: org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set.
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:138)
    at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:80)
    ... 26 more

To set up the Mongo connector, I used this bootstrap script:

#!/bin/sh

wget -P /home/hadoop/lib http://central.maven.org/maven2/org/mongodb/mongo-java-driver/2.13.0/mongo-java-driver-2.13.0.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-core-1.3.2.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-pig-1.3.2.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-hive-1.3.2.jar

cp /home/hadoop/lib/mongo* /home/hadoop/hive/lib
cp /home/hadoop/lib/mongo* /home/hadoop/pig/lib

Best answer

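The error indicates that the output directory does not exist.

The fix, then, is to create the output directory.

As a quick check, you can also point the output at the same directory as the input. If the output directory does exist, the problem may be one of permissions.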
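The checks described above could be run from the master node roughly as follows. This is a minimal sketch, assuming a Hadoop 2.x `hadoop fs` shell (for `-test -d`, `-ls -d`, and `-mkdir -p`); the path `s3n://xxx/out` reuses the placeholder from the question, and the function names `output_dir_exists` and `ensure_output_dir` are hypothetical helpers, not part of any tool.

```shell
#!/bin/sh
# Assumed output directory (placeholder from the question); override via OUT_DIR.
OUT_DIR="${OUT_DIR:-s3n://xxx/out}"

# Succeeds (exit status 0) if the given path exists and is a directory.
output_dir_exists() {
    hadoop fs -test -d "$1"
}

# Reports whether the directory exists. If it does, list it so the owner
# and permission bits are visible; if it does not, create it.
ensure_output_dir() {
    if output_dir_exists "$1"; then
        echo "exists: $1"
        hadoop fs -ls -d "$1"
    else
        echo "missing: $1, creating it"
        hadoop fs -mkdir -p "$1"
    fi
}

# Only run the check when the hadoop CLI is actually available.
if command -v hadoop >/dev/null 2>&1; then
    ensure_output_dir "$OUT_DIR"
fi
```

If the directory is listed but the job still fails, the owner and mode columns printed by `hadoop fs -ls -d` are the first place to look for a permissions mismatch.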

Source: the question "ERROR 6000, Output location validation failed" using the PIG MongoDB-Hadoop Connector on EMR, on Stack Overflow: https://stackoverflow.com/questions/29217152/
