hadoop - Apache Spark deployment issue (cluster mode) with Hive

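Edit:

I am developing a Spark application that reads data from multiple structured schemas and aggregates information across them. The application runs fine locally. When I run it on the cluster, however, I hit what is either a configuration problem (most likely hive-site.xml) or a problem with the submit-command arguments. I have looked through other related posts but could not find a solution that fits my scenario. The commands I tried and the detailed errors are listed below. I am new to Spark and may be missing something trivial, but I can provide more information to support my question.

Original question:

I have been trying to run my Spark application on a 6-node Hadoop cluster bundled with HDP 2.3 components.

Here is the component information, which may be useful when suggesting a solution:

Cluster information: 6-node cluster:

128 GB RAM
24 cores
8 TB HDD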

Components used in the application

HDP - 2.3

Spark - 1.3.1

$ hadoop version

Hadoop 2.7.1.2.3.0.0-2557
Subversion git@github.com:hortonworks/hadoop.git -r 9f17d40a0f2046d217b2bff90ad6e2fc7e41f5e1
Compiled by jenkins on 2015-07-14T13:08Z
Compiled with protoc 2.5.0
From source with checksum 54f9bbb4492f92975e84e390599b881d

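Scenario:

I am trying to use SparkContext together with HiveContext to take full advantage of Spark's real-time querying over its data structures, such as DataFrames. The dependencies used in my application are: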

<dependency> <!-- Spark dependency -->
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-csv_2.10</artifactId>
    <version>1.4.0</version>
</dependency>

Below are the submit commands I used and the error logs I got in response:

Submit command 1:

spark-submit --class working.path.to.Main \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 17 \
    --executor-cores 8 \
    --executor-memory 25g \
    --driver-memory 25g \
    --num-executors 5 \
    application-with-all-dependencies.jar

Error log 1:

User class threw exception: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

Submit command 2:

spark-submit --class working.path.to.Main \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 17 \
    --executor-cores 8 \
    --executor-memory 25g \
    --driver-memory 25g \
    --num-executors 5 \
    --files /etc/hive/conf/hive-site.xml \
    application-with-all-dependencies.jar

Error log 2:

User class threw exception: java.lang.NumberFormatException: For input string: "5s"

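Since I do not have administrative privileges, I cannot modify the configuration myself. I could contact the IT engineers and ask them to make changes, but I am looking for a solution that requires as few configuration-file changes as possible.

A configuration change was suggested here.

I then tried passing various jar files as arguments, as suggested in other discussion forums.

Submit command 3: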

spark-submit --class working.path.to.Main \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 17 \
    --executor-cores 8 \
    --executor-memory 25g \
    --driver-memory 25g \
    --num-executors 5 \
    --jars /usr/hdp/2.3.0.0-2557/spark/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/2.3.0.0-2557/spark/lib/datanucleus-core-3.2.10.jar,/usr/hdp/2.3.0.0-2557/spark/lib/datanucleus-rdbms-3.2.9.jar \
    --files /etc/hive/conf/hive-site.xml \
    application-with-all-dependencies.jar

Error log 3:

User class threw exception: java.lang.NumberFormatException: For input string: "5s"

I do not understand what happened with the following command, and I could not analyze its error log.

Submit command 4:

spark-submit --class working.path.to.Main \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 17 \
    --executor-cores 8 \
    --executor-memory 25g \
    --driver-memory 25g \
    --num-executors 5 \
    --jars /usr/hdp/2.3.0.0-2557/spark/lib/*.jar \
    --files /etc/hive/conf/hive-site.xml \
    application-with-all-dependencies.jar

Submit log 4:

Application application_1461686223085_0014 failed 2 times due to AM Container for appattempt_1461686223085_0014_000002 exited with exitCode: 10
For more detailed output, check application tracking page:http://cluster-host:XXXX/cluster/app/application_1461686223085_0014Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e10_1461686223085_0014_02_000001
Exit code: 10
Stack trace: ExitCodeException exitCode=10:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 10
Failing this attempt. Failing the application.

Are there any other possible options? Any help would be greatly appreciated. Let me know if you need any other information.

Thank you.

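Best answer

The solution explained here worked in my case. hive-site.xml resides in two locations, which can be confusing. Use --files /usr/hdp/current/spark-client/conf/hive-site.xml instead of --files /etc/hive/conf/hive-site.xml. I did not have to add any jars for my configuration. I hope this helps anyone facing a similar problem. Thanks.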
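
A plausible explanation for the "5s" error, offered as an assumption rather than something the post confirms: the hive-site.xml under /etc/hive/conf may be written for a newer Hive release that stores some durations with a unit suffix (for example 5s), while the Hive client bundled with Spark 1.3.1 expects plain numbers. The Python sketch below lists such values in a hive-site.xml so the two copies can be compared. The function name suffixed_properties, the SAMPLE document, and the regular expression are illustrative and do not come from the original post.

```python
import re
import xml.etree.ElementTree as ET

# Matches values such as "5s", "1800s" or "600ms": digits followed by a time unit.
_SUFFIXED = re.compile(r"^\d+\s*(ns|us|ms|s|sec|m|min|h|d)$")


def suffixed_properties(xml_text):
    """Return {name: value} for properties whose value is a number plus a time unit."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name and _SUFFIXED.match(value):
            found[name] = value
    return found


# Illustrative sample only (assumption): a real hive-site.xml will differ.
SAMPLE = """<configuration>
  <property><name>hive.metastore.client.connect.retry.delay</name><value>5s</value></property>
  <property><name>hive.metastore.client.socket.timeout</name><value>1800s</value></property>
  <property><name>hive.metastore.uris</name><value>thrift://cluster-host:9083</value></property>
</configuration>"""

if __name__ == "__main__":
    for name, value in sorted(suffixed_properties(SAMPLE).items()):
        print(name, "=", value)
```

To check a real file, pass its contents, e.g. suffixed_properties(open("/etc/hive/conf/hive-site.xml").read()), and repeat for the copy under /usr/hdp/current/spark-client/conf. If only the first file reports suffixed values, that would be consistent with the fix described in the answer.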

Regarding "hadoop - Apache Spark deployment issue (cluster mode) with Hive", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36945467/
