apache-spark - Spark streaming job fails after the driver stops

Tags: apache-spark apache-kafka yarn spark-streaming

I have a Spark streaming job that reads data from Kafka and performs some operations on it. I am running the job with Spark 1.4.1 on a YARN cluster that has two nodes, each with 16 GB of RAM and 16 cores.

I have passed these settings to the spark-submit job:

--master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 3
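(For reference, a rough resource tally under these flags, assuming Spark 1.4's default YARN memory overhead of max(384 MB, 10% of the requested container memory) and that YARN may use most of each node's 16 GB:

    3 executors x (2048 MB + 384 MB overhead) ≈ 7.1 GB
    1 driver/AM x (4096 MB + ~410 MB overhead) ≈ 4.4 GB
    total ≈ 11.5 GB out of 2 x 16 GB

so the request itself should fit comfortably on the cluster.)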



The job returns this error and terminates after running for a short while:
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 11,
(reason: Max number of executor failures reached)

.....

ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0:
Stopped by driver

Update:

I also found these logs:
INFO yarn.YarnAllocator: Received 3 containers from YARN, launching executors on 3 of them.....

INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down.

....

INFO yarn.YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them.

INFO yarn.ExecutorRunnable: Starting Executor Container.....

INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down...

INFO yarn.YarnAllocator: Completed container container_e10_1453801197604_0104_01_000006 (state: COMPLETE, exit status: 1)

INFO yarn.YarnAllocator: Container marked as failed: container_e10_1453801197604_0104_01_000006. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_e10_1453801197604_0104_01_000006
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
    at org.apache.hadoop.util.Shell.run(Shell.java:487)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1

What could be causing this? Any help would be appreciated.

Thanks

Best Answer

Could you show the Scala/Java code that reads from Kafka? I suspect you may not be creating your SparkConf correctly.

Try something like this:

SparkConf sparkConf = new SparkConf().setAppName("ApplicationName");
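(Expanding on that one line: in yarn-cluster mode the master is supplied by spark-submit, so the code should not call setMaster() itself. Below is a minimal sketch of a receiver-based Kafka reader for Spark 1.4.1; the class name, topic, ZooKeeper quorum, and consumer group are hypothetical placeholders, not taken from the original post.)

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaStreamingJob {
    public static void main(String[] args) throws Exception {
        // No setMaster() here: the master comes from spark-submit (--master yarn-cluster)
        SparkConf sparkConf = new SparkConf().setAppName("ApplicationName");
        JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(10));

        // One receiver thread for a hypothetical topic; host and group are placeholders
        Map<String, Integer> topics = new HashMap<String, Integer>();
        topics.put("my-topic", 1);
        JavaPairReceiverInputDStream<String, String> messages =
                KafkaUtils.createStream(jssc, "zkhost:2181", "my-consumer-group", topics);

        // Print the size of each batch as a trivial stand-in for the real processing
        messages.count().print();

        jssc.start();
        jssc.awaitTermination();
    }
}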

Also try running the application in yarn-client mode and share the output.
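(For example, a hypothetical invocation with placeholder class and jar names:

    spark-submit --master yarn-client --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 3 --class com.example.KafkaStreamingJob kafka-streaming-job.jar

In yarn-client mode the driver runs in the local client process, so driver-side errors show up directly in the console output rather than in the YARN application master logs.)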

Regarding apache-spark - Spark streaming job fails after the driver stops, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35124329/
