apache-spark - SparkPi on Kubernetes - Could not find or load main class?

Tags: apache-spark kubernetes spark-submit

I am trying to launch the standard SparkPi example on a Kubernetes cluster.
spark-submit creates the pod, which then fails with the error: "Error: Could not find or load main class org.apache.spark.examples.SparkPi".

The spark-submit command:

spark-submit \
--master k8s://https://k8s-cluster:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.namespace=ca-app \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=gcr.io/cloud-solutions-images/spark:v2.3.0-gcs \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=default \
https://github.com/JWebDev/spark/blob/master/spark-examples_2.11-2.3.1.jar

Kubernetes creates two containers in the pod. The spark-init container logs that the example jar was downloaded:
2018-07-22 15:13:35 INFO  SparkPodInitContainer:54 - Downloading remote jars: Some(https://github.com/JWebDev/spark/blob/master/spark-examples_2.11-2.3.1.jar,https://github.com/JWebDev/spark/blob/master/spark-examples_2.11-2.3.1.jar)
2018-07-22 15:13:35 INFO  SparkPodInitContainer:54 - Downloading remote files: None
2018-07-22 15:13:37 INFO  Utils:54 - Fetching https://github.com/JWebDev/spark/blob/master/spark-examples_2.11-2.3.1.jar to /var/spark-data/spark-jars/fetchFileTemp6219129583337519707.tmp
2018-07-22 15:13:37 INFO  Utils:54 - Fetching https://github.com/JWebDev/spark/blob/master/spark-examples_2.11-2.3.1.jar to /var/spark-data/spark-jars/fetchFileTemp8698641635325948552.tmp
2018-07-22 15:13:37 INFO  SparkPodInitContainer:54 - Finished downloading application dependencies.

The spark-kubernetes-driver container is what throws the error:
+ readarray -t SPARK_JAVA_OPTS
+ '[' -n /var/spark-data/spark-jars/spark-examples_2.11-2.3.1.jar:/var/spark-data/spark-jars/spark-examples_2.11-2.3.1.jar ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*:/var/spark-data/spark-jars/spark-examples_2.11-2.3.1.jar:/var/spark-data/spark-jars/spark-examples_2.11-2.3.1.jar'
+ '[' -n /var/spark-data/spark-files ']'
+ cp -R /var/spark-data/spark-files/. .
+ case "$SPARK_K8S_CMD" in
+ CMD=(${JAVA_HOME}/bin/java "${SPARK_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS $SPARK_DRIVER_ARGS)
+ exec /sbin/tini -s -- /usr/lib/jvm/java-1.8-openjdk/bin/java -Dspark.app.id=spark-e032bc91fc884e568b777f404bfbdeae -Dspark.kubernetes.container.image=gcr.io/cloud-solutions-images/spark:v2.3.0-gcs -Dspark.kubernetes.namespace=ca-app -Dspark.jars=https://github.com/JWebDev/spark/blob/master/spark-examples_2.11-2.3.1.jar,https://github.com/JWebDev/spark/blob/master/spark-examples_2.11-2.3.1.jar -Dspark.driver.host=spark-pi-11f2cd9133b33fc480a7b2f1d5c2fcc0-driver-svc.ca-app.svc -Dspark.master=k8s://https://k8s-cluster:6443 -Dspark.kubernetes.initContainer.configMapName=spark-pi-11f2cd9133b33fc480a7b2f1d5c2fcc0-init-config -Dspark.kubernetes.authenticate.driver.serviceAccountName=default -Dspark.driver.port=7078 -Dspark.kubernetes.driver.pod.name=spark-pi-11f2cd9133b33fc480a7b2f1d5c2fcc0-driver -Dspark.app.name=spark-pi -Dspark.kubernetes.executor.podNamePrefix=spark-pi-11f2cd9133b33fc480a7b2f1d5c2fcc0 -Dspark.driver.blockManager.port=7079 -Dspark.submit.deployMode=cluster -Dspark.executor.instances=5 -Dspark.kubernetes.initContainer.configMapKey=spark-init.properties -cp ':/opt/spark/jars/*:/var/spark-data/spark-jars/spark-examples_2.11-2.3.1.jar:/var/spark-data/spark-jars/spark-examples_2.11-2.3.1.jar' -Xms1g -Xmx1g -Dspark.driver.bindAddress=10.233.71.5 org.apache.spark.examples.SparkPi
Error: Could not find or load main class org.apache.spark.examples.SparkPi

What am I doing wrong? Thanks for any hints.

Best Answer

I suggest using https://github.com/JWebDev/spark/raw/master/spark-examples_2.11-2.3.1.jar, since /blob/ is the HTML view of the asset, while /raw/ 302-redirects to its actual storage URL. In other words, the init-container downloaded a GitHub HTML page, saved it under the jar's name, and the JVM found no classes inside it.
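To see the difference, here is a quick check with curl (a hypothetical shell session; the exact headers may vary):

# /blob/ serves the GitHub HTML page for the file, not the jar itself
curl -sI https://github.com/JWebDev/spark/blob/master/spark-examples_2.11-2.3.1.jar | grep -i '^content-type'
# content-type: text/html; charset=utf-8

# /raw/ answers with a 302 redirect to the real file on raw.githubusercontent.com
curl -sI https://github.com/JWebDev/spark/raw/master/spark-examples_2.11-2.3.1.jar | grep -iE '^(HTTP|location)'
# HTTP/2 302
# location: https://raw.githubusercontent.com/JWebDev/spark/master/spark-examples_2.11-2.3.1.jar

# following the redirect yields a real jar whose entries include the SparkPi class
curl -sL https://github.com/JWebDev/spark/raw/master/spark-examples_2.11-2.3.1.jar -o /tmp/spark-examples.jar
unzip -l /tmp/spark-examples.jar | grep SparkPi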
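For completeness, the same submission with only the jar URL switched to the /raw/ form (a sketch; every other flag is unchanged from the question):

spark-submit \
--master k8s://https://k8s-cluster:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.namespace=ca-app \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=gcr.io/cloud-solutions-images/spark:v2.3.0-gcs \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=default \
https://github.com/JWebDev/spark/raw/master/spark-examples_2.11-2.3.1.jar

With a real jar on the driver's classpath, the JVM should then be able to resolve org.apache.spark.examples.SparkPi.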

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51467082/
