apache-spark - How do I auto-start an Apache Spark cluster using Supervisord?

Tags: apache-spark supervisord

Starting an Apache Spark cluster is usually done through the shell scripts that ship with the distribution. The problem is that every time the cluster shuts down and comes back up, those scripts have to be executed again to bring the Spark cluster back.

Supervisord is well suited to managing processes, and it seems like a good candidate for starting the Spark processes automatically after a reboot.
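For context, supervisord launches anything declared in a [program:x] section of its configuration (typically /etc/supervisord.conf or a file under /etc/supervisor/conf.d/); with autostart and autorestart enabled, the process is started when supervisord comes up at boot and relaunched if it dies. A minimal sketch of such a section, with an illustrative program name and log path (not taken from the question):

[program:spark-master]
; the command must stay in the foreground (not daemonize) so supervisord can supervise it
command=/path/to/foreground-start-command
autostart=true
autorestart=true
; illustrative log location
stdout_logfile=/var/log/supervisor/spark-master.log
redirect_stderr=true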

However, after starting the master process with

command=/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/java -cp :/path/spark-1.3.0-bin-cdh4/sbin/../conf:/path/spark-1.3.0-bin-cdh4/lib/spark-assembly-1.3.0-hadoop2.0.0-mr1-cdh4.2.0.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-api-jdo-3.2.6.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-core-3.2.10.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-rdbms-3.2.9.jar:etc/hadoop/conf -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip master.mydomain.com --port 7077 --webui-port 18080

and the worker process with

command=/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/java -cp :/path/spark-1.3.0-bin-cdh4/sbin/../conf:/path/spark-1.3.0-bin-cdh4/lib/spark-assembly-1.3.0-hadoop2.0.0-mr1-cdh4.2.0.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-api-jdo-3.2.6.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-core-3.2.10.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-rdbms-3.2.9.jar:etc/hadoop/conf -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://master.mydomain.com:7077

and then submitting a Spark application, I end up with the following errors:

15/06/05 17:16:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 3
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 4
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 5
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 6
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 7
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 8
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 9
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: Master removed our application: FAILED
15/06/05 17:16:32 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: Master removed our application: FAILED

Does anyone know how to manage the Spark processes through supervisord?

I'm also open to alternative solutions.

Best Answer

The Spark master can be run in the foreground with

command=/path/spark-1.3.0-bin-cdh4/sbin/../bin/spark-class org.apache.spark.deploy.master.Master --ip master.mydomain.com --port 7077 --webui-port 18080

and the worker with

command=/path/spark-1.3.0-bin-cdh4/sbin/../bin/spark-class org.apache.spark.deploy.worker.Worker spark://master.mydomain.com:7077
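Putting the two together, a supervisord configuration for the cluster might look roughly like the sketch below. The command lines are the foreground spark-class invocations from the answer (paths and hostnames come from the question); the section names, autostart/autorestart settings and log paths are my own assumptions:

[program:spark-master]
command=/path/spark-1.3.0-bin-cdh4/sbin/../bin/spark-class org.apache.spark.deploy.master.Master --ip master.mydomain.com --port 7077 --webui-port 18080
autostart=true
autorestart=true
; illustrative log location
stdout_logfile=/var/log/supervisor/spark-master.log
redirect_stderr=true

[program:spark-worker]
command=/path/spark-1.3.0-bin-cdh4/sbin/../bin/spark-class org.apache.spark.deploy.worker.Worker spark://master.mydomain.com:7077
autostart=true
autorestart=true
stdout_logfile=/var/log/supervisor/spark-worker.log
redirect_stderr=true

The appeal of spark-class here is that it keeps the JVM in the foreground, which is what supervisord needs in order to track and restart the process, and it assembles the Spark classpath and JVM options itself, so the long hand-built java command from the question is not required.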

Regarding apache-spark - how to auto-start an Apache Spark cluster with Supervisord, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30672648/
