hadoop - Connecting Hadoop+Hive to MongoDB on AWS EMR (class com/mongodb/DBObject not found)

Tags: hadoop amazon-web-services hive mongodb-java emr

I would like to connect an EMR cluster to our MongoDB over a live MongoDB connection (rather than via BSON dumps).

To do this, I created the cluster through the AWS Management Console. In the bootstrap configuration, I pointed to this file on S3:

#!/bin/sh

wget -P /home/hadoop/lib http://central.maven.org/maven2/org/mongodb/mongo-java-driver/2.13.0/mongo-java-driver-2.13.0.jar

wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-core-1.3.2.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-pig-1.3.2.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-hive-1.3.2.jar

Once the cluster was up, I SSHed into the master node and saw that the JARs had been downloaded successfully.

When I run this in the Hive shell:

CREATE TABLE nicks
( 
  id STRING,
  name STRING,
  business STRING,
  alias STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
TBLPROPERTIES('mongo.uri'='mongodb://54.93.123.123:27017/foo.aliases');

ADD JAR /home/hadoop/lib/mongo-hadoop-core-1.3.2.jar;
ADD JAR /home/hadoop/lib/mongo-hadoop-hive-1.3.2.jar;
ADD JAR /home/hadoop/lib/mongo-java-driver-2.13.0.jar;

Select * from nicks;

I get this exception:

Exception in thread "main" java.lang.NoClassDefFoundError: com/mongodb/DBObject
    at com.mongodb.hadoop.splitter.MongoSplitterFactory.getSplitterByClass(MongoSplitterFactory.java:41)
    at com.mongodb.hadoop.splitter.MongoSplitterFactory.getSplitter(MongoSplitterFactory.java:109)
    at com.mongodb.hadoop.hive.input.HiveMongoInputFormat.getSplits(HiveMongoInputFormat.java:64)
    at com.mongodb.hadoop.hive.input.HiveMongoInputFormat.getSplits(HiveMongoInputFormat.java:44)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:418)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:561)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:534)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1519)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:292)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: com.mongodb.DBObject
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 20 more

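Notes:

  • All 4 libraries are in the correct folder (I verified this over SSH).
  • The Mongo-Hive connector JAR seems to be loaded, because I hit a different exception before I fixed it by running "ADD JAR ...".
  • I checked the contents of the mongo-java-driver JAR. It looks valid (I found the DBObject class inside).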
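
The check from the last note can be reproduced from the shell. The following is a minimal sketch, assuming `unzip` is installed on the master node; `class_to_entry` and `find_class_in_dir` are hypothetical helper names, not part of Hadoop or Hive. It lists every JAR in a directory that contains a given class, which helps tell "the class is missing from the JAR" apart from "the JAR is not on the classpath".

```shell
#!/bin/sh
# Hypothetical helpers for illustration; assumes unzip is available.

# Convert a dotted class name to its entry path inside a JAR,
# e.g. com.mongodb.DBObject becomes com/mongodb/DBObject.class
class_to_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print every JAR directly under directory $1 that contains class $2.
find_class_in_dir() {
    dir=$1
    entry=$(class_to_entry "$2")
    for jar in "$dir"/*.jar; do
        # Skip the literal pattern when the directory has no JARs.
        [ -f "$jar" ] || continue
        # unzip -l lists archive entries without extracting them;
        # grep -F treats the entry path as a fixed string, not a regex.
        if unzip -l "$jar" 2>/dev/null | grep -F -q "$entry"; then
            printf '%s\n' "$jar"
        fi
    done
}

# Example call for the directory used in this question:
# find_class_in_dir /home/hadoop/lib com.mongodb.DBObject
```

If the driver JAR is printed here but Hive still throws ClassNotFoundException, the JAR itself is probably fine and the remaining question is which classpath the failing process uses.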

How can I fix this, or how can I debug the error?

Best answer

The solution is to also put the libraries in /home/hadoop/hive/lib. With this bootstrap script, it works:

#!/bin/sh

wget -P /home/hadoop/lib http://central.maven.org/maven2/org/mongodb/mongo-java-driver/2.13.0/mongo-java-driver-2.13.0.jar

wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-core-1.3.2.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-pig-1.3.2.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-hive-1.3.2.jar

cp /home/hadoop/lib/mongo* /home/hadoop/hive/lib
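
The plain `cp` in the script above fails quietly inside a bootstrap action if a download did not succeed. One possible hardening is sketched below. `install_mongo_jars` is a hypothetical helper name, and the paths are the ones used in this answer. The `hive` launcher adds the JARs in Hive's lib directory to its classpath at startup, which is presumably why the copy makes the driver visible to the CLI process shown in the stack trace.

```shell
#!/bin/sh
# Hypothetical helper for illustration, not part of EMR or Hive.

# Copy every mongo*.jar from directory $1 into directory $2.
# Returns non-zero if no matching JAR exists or if a copy fails.
install_mongo_jars() {
    src=$1
    dest=$2
    found=0
    for jar in "$src"/mongo*.jar; do
        # Skip the literal pattern when nothing matches.
        [ -f "$jar" ] || continue
        found=1
        cp "$jar" "$dest"/ || return 1
        echo "installed $(basename "$jar")"
    done
    # Succeed only if at least one JAR was copied.
    [ "$found" -eq 1 ]
}

# Example call for the layout in this answer:
# install_mongo_jars /home/hadoop/lib /home/hadoop/hive/lib || exit 1
```

A new Hive session is needed after the copy, since the launcher reads the lib directory only when it starts.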

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/28998333/
