docker - Zeppelin with Kubernetes: SPARK_HOME is not specified in interpreter-setting for non-local mode

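I have a Spark cluster (master + 2 workers) running in a Kubernetes cluster (Minikube).

I want to add Zeppelin to my k8s cluster and configure it to use my Spark cluster.

So I tried both the Zeppelin 0.8.1 image from apache/zeppelin and another image built on Zeppelin 0.9.0-SNAPSHOT (still in development).

I followed the official Zeppelin documentation (which requires at least Zeppelin 0.9.0, even though it has not been released yet ¯\_(ツ)_/¯).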

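What I did:

Pulled the Zeppelin docker image

Built the Spark docker image

Downloaded zeppelin-server.yaml from the documentation

Edited it so that it points to my local Spark image and Zeppelin image

Ran kubectl apply -f on the Spark and Zeppelin YAML files

Then I opened my Zeppelin notebook and tried to run a small Spark test to see whether it works, but I got the following error: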

java.lang.RuntimeException: SPARK_HOME is not specified in interpreter-setting for non-local mode, if you specify it in zeppelin-env.sh, please move that into interpreter setting 
    at org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncher.setupPropertiesForSparkR(SparkInterpreterLauncher.java:181) 
    at org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncher.buildEnvFromProperties(SparkInterpreterLauncher.java:63) 
    at org.apache.zeppelin.interpreter.launcher.StandardInterpreterLauncher.launch(StandardInterpreterLauncher.java:86) 
    at org.apache.zeppelin.interpreter.InterpreterSetting.createInterpreterProcess(InterpreterSetting.java:698) 
    at org.apache.zeppelin.interpreter.ManagedInterpreterGroup.getOrCreateInterpreterProcess(ManagedInterpreterGroup.java:63) 
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getOrCreateInterpreterProcess(RemoteInterpreter.java:110) 
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:163) 
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:131) 
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:290) 
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:402) 
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:75) 
    at org.apache.zeppelin.scheduler.Job.run(Job.java:172) 
    at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:121) 
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:187) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) 
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
    at java.lang.Thread.run(Thread.java:748)

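First, I see that the error comes from the function setupPropertiesForSparkR(), even though I don't use SparkR.

But the main thing I'm lost on is that, since I use the Zeppelin and Spark docker images, I don't know how to set SPARK_HOME or what value it should have.

Notes:

I use Spark 2.4.0

I also tried to build the Zeppelin image manually, but with the sources still under development the build failed.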

Best answer

You can set the environment variable with:

docker run --env SPARK_HOME=/path ...

You can also mount your Spark installation into the container as a volume and point SPARK_HOME at the mounted path:

docker run --env SPARK_HOME=/pathInCluster -v /pathYourSparkCluster:/pathInCluster ...
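
Neither command says what the path should actually be. As a rough check, a directory is a plausible SPARK_HOME when it contains an executable bin/spark-submit. The sketch below tests a candidate path and is meant to be run inside the Zeppelin container. The helper name is_spark_home and the fallback /opt/spark are assumptions made for illustration; they do not come from the question or from the Zeppelin documentation.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given directory looks like a
# Spark installation, i.e. it contains an executable bin/spark-submit.
is_spark_home() {
  [ -n "$1" ] && [ -x "$1/bin/spark-submit" ]
}

# Use SPARK_HOME if it is already set. The fallback /opt/spark is only
# an assumed example; replace it with the path mounted via -v.
candidate="${SPARK_HOME:-/opt/spark}"

if is_spark_home "$candidate"; then
  echo "SPARK_HOME candidate looks valid: $candidate"
else
  echo "no executable bin/spark-submit under: $candidate" >&2
fi
```

If the check fails for the path passed with --env, the volume is probably mounted at a different location than the one SPARK_HOME points to.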

Source: the original question on Stack Overflow, "Zeppelin with Kubernetes: SPARK_HOME is not specified in interpreter-setting for non-local mode": https://stackoverflow.com/questions/55647378/
