I launch applications on a Kubernetes cluster with spark-submit, and I can only see the Spark UI by going to http://driver-pod:port.
How do I start the Spark History Server on the cluster?
How can I make all running Spark jobs register with the History Server?
Is this possible?
Best Answer
Yes, this is possible. In short, you need to ensure the following:

- All your applications write their event logs to a shared location (filesystem, s3, hdfs, etc.).
- The History Server deployment can read from that same location.

Out of the box, Spark only reads event logs from a filesystem path, so I will walk through that case in detail using the spark operator.

First, create a PVC with a volume type that supports the ReadWriteMany access mode, for example an NFS volume. The following snippet assumes you already have an NFS storage class (nfs-volume) configured:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spark-pvc
  namespace: spark-apps
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 5Gi
  storageClassName: nfs-volume
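The PVC above can then be applied and checked with kubectl. A sketch, assuming the manifest was saved as spark-pvc.yaml (a filename chosen here for illustration):

```shell
# Apply the PVC manifest (spark-pvc.yaml is an assumed filename)
kubectl apply -f spark-pvc.yaml

# The claim should reach the Bound phase once the NFS provisioner reacts
kubectl -n spark-apps get pvc spark-pvc
```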
Next, configure your applications to write their event logs to the mounted volume:

sparkConf:
  "spark.eventLog.enabled": "true"
  "spark.eventLog.dir": "file:/mnt"

A complete SparkApplication that mounts the PVC at /mnt looks like this:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-java-pi
  namespace: spark-apps
spec:
  type: Java
  mode: cluster
  image: gcr.io/spark-operator/spark:v2.4.4
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar"
  imagePullPolicy: Always
  sparkVersion: 2.4.4
  sparkConf:
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "file:/mnt"
  restartPolicy:
    type: Never
  volumes:
    - name: spark-data
      persistentVolumeClaim:
        claimName: spark-pvc
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 2.4.4
    serviceAccount: spark
    volumeMounts:
      - name: spark-data
        mountPath: /mnt
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 2.4.4
    volumeMounts:
      - name: spark-data
        mountPath: /mnt
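If you are not running the operator, the same event-log settings can be passed straight to spark-submit, along with Spark's own volume-mount conf keys. A sketch only: the master URL (k8s-apiserver.example.com:6443) is a placeholder, not a value from this answer:

```shell
# Sketch: master URL is a placeholder; the volume conf keys mount the same
# spark-pvc claim at /mnt in driver and executor pods
spark-submit \
  --master k8s://https://k8s-apiserver.example.com:6443 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=file:/mnt \
  --conf spark.kubernetes.container.image=gcr.io/spark-operator/spark:v2.4.4 \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-data.mount.path=/mnt \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-data.options.claimName=spark-pvc \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-data.mount.path=/mnt \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-data.options.claimName=spark-pvc \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar
```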
Finally, deploy the Spark History Server reading from the same PVC. (The original answer used apps/v1beta1, which has since been removed from Kubernetes; the Deployment below uses apps/v1, which also requires an explicit selector.)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-history-server
  namespace: spark-apps
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spark-history-server
  template:
    metadata:
      name: spark-history-server
      labels:
        app: spark-history-server
    spec:
      containers:
        - name: spark-history-server
          image: gcr.io/spark-operator/spark:v2.4.0
          resources:
            requests:
              memory: "512Mi"
              cpu: "100m"
          command:
            - /sbin/tini
            - -s
            - --
            - /opt/spark/bin/spark-class
            - -Dspark.history.fs.logDirectory=/data/
            - org.apache.spark.deploy.history.HistoryServer
          ports:
            - name: http
              protocol: TCP
              containerPort: 18080
          readinessProbe:
            timeoutSeconds: 4
            httpGet:
              path: /
              port: http
          livenessProbe:
            timeoutSeconds: 4
            httpGet:
              path: /
              port: http
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: spark-pvc
            readOnly: true
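Once the Deployment is up, you can reach the UI without any Ingress by port-forwarding. A sketch:

```shell
# Forward local port 18080 to the history server's HTTP port
kubectl -n spark-apps port-forward deployment/spark-history-server 18080:18080 &

# The History Server REST API lists every application whose event log
# landed on the shared PVC
curl http://localhost:18080/api/v1/applications
```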
Feel free to configure an Ingress or Service for accessing the UI. You can also use Google Cloud Storage, Azure Blob Storage, or AWS S3 as the event log location; that requires installing some additional jars, so I suggest taking a look at the Lightbend Spark history server image and charts.
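For the Service suggestion above, a minimal manifest selecting the Deployment's pod label might look like the following sketch (the Service itself is my own addition, not part of the answer):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spark-history-server
  namespace: spark-apps
spec:
  selector:
    app: spark-history-server
  ports:
    - name: http
      port: 18080
      targetPort: http
```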
Regarding apache-spark - Spark UI History server on Kubernetes?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51798927/