apache-spark - Spark + Mesos cluster mode: who uploads the jar?

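I am trying to run a Spark application in Mesos cluster mode. (I already use client mode, but I would like to try cluster mode as well.)

I have started spark-mesos-dispatcher on the Mesos master node.

When I submit the assembly located at the local path /tmp/assembly.jar with the following command,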

bin/spark-submit --master mesos://dispatcher:7077 --deploy-mode cluster --class com.example.Example /tmp/assembly.jar

it fails because the file /tmp/assembly.jar does not exist on the Mesos slave node.

I1129 10:47:43.839771  5884 fetcher.cpp:414] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/9d725348-931a-48fb-96f7-d29a4b09f3e8-S9\/deploy","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/tmp\/assembly.jar"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/9d725348-931a-48fb-96f7-d29a4b09f3e8-S9\/frameworks\/9d725348-931a-48fb-96f7-d29a4b09f3e8-0291\/executors\/driver-20151129104742-0008\/runs\/31bf5840-226e-4b87-ae76-d14bd2f17950","user":"user"}
I1129 10:47:43.840710  5884 fetcher.cpp:369] Fetching URI '/tmp/assembly.jar'
I1129 10:47:43.840721  5884 fetcher.cpp:243] Fetching directly into the sandbox directory
I1129 10:47:43.840731  5884 fetcher.cpp:180] Fetching URI '/tmp/assembly.jar'
I1129 10:47:43.840737  5884 fetcher.cpp:160] Copying resource with command:cp '/tmp/assembly.jar' '/var/lib/mesos/slaves/9d725348-931a-48fb-96f7-d29a4b09f3e8-S9/frameworks/9d725348-931a-48fb-96f7-d29a4b09f3e8-0291/executors/driver-20151129104742-0008/runs/31bf5840-226e-4b87-ae76-d14bd2f17950/assembly.jar'
cp: cannot stat `/tmp/assembly.jar': No such file or directory
Failed to fetch '/tmp/assembly.jar': Failed to copy with command 'cp '/tmp/assembly.jar' '/var/lib/mesos/slaves/9d725348-931a-48fb-96f7-d29a4b09f3e8-S9/frameworks/9d725348-931a-48fb-96f7-d29a4b09f3e8-0291/executors/driver-20151129104742-0008/runs/31bf5840-226e-4b87-ae76-d14bd2f17950/assembly.jar'', exit status: 256
Failed to synchronize with slave (it's probably exited)

In YARN cluster mode, Spark's YARN client implementation uploads the application jar to HDFS so that the driver and all executors have access to it, but I cannot find such code in RestSubmissionClient, which is used for Mesos and Standalone cluster mode.

Who uploads the jar in this case? Or do I need to manually place the application assembly somewhere reachable via an HTTP URI?

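Best answer

As far as I understand, you can use the SparkContext addJar() method to add the path of a JAR file that is local to the driver application; the JAR is then distributed to the executor nodes (in client mode).

Since you state that you want to use cluster mode, I suggest having a look at the Spark Jobserver project, which should make it easier to run Spark applications on Mesos than the built-in tooling does.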
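
The other option raised in the question, placing the assembly at a URI that the slaves can fetch, could look roughly like the sketch below. The Spark documentation for Mesos cluster mode notes that jars passed to spark-submit should be URIs reachable by the Mesos slaves, because the driver does not upload local jars automatically. The host repo.example.com and the jar path are hypothetical placeholders, and the RUN switch is only an illustration so that the command can be inspected before it is submitted.

```shell
#!/bin/sh
# Hypothetical location: replace with a URI that every Mesos slave can fetch.
JAR_URI="http://repo.example.com/jars/assembly.jar"

# Same submission as in the question, but with a remote URI instead of
# the local path /tmp/assembly.jar.
SUBMIT_CMD="bin/spark-submit --master mesos://dispatcher:7077 --deploy-mode cluster --class com.example.Example $JAR_URI"

# Print the command; it is only executed when RUN=1 is set.
echo "$SUBMIT_CMD"
if [ "${RUN:-0}" = "1" ]; then
  # Unquoted on purpose so the string splits into separate arguments.
  $SUBMIT_CMD
fi
```

Run as is, the script only prints the command; invoking it with RUN=1 in the environment submits the application to the dispatcher.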

Source: the corresponding question on Stack Overflow: https://stackoverflow.com/questions/33978672/
