kubernetes - Argo Workflow + Spark operator + App logs not generated

Tags: kubernetes argo-workflows spark-operator

I am in the early stages of exploring Argo with the Spark operator to run Spark samples on a minikube setup on my EC2 instance.

Below are the resource details. I don't know why I can't see the Spark application logs.

WORKFLOW.YAML

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: spark-argo-groupby
spec:
  entrypoint: sparkling-operator
  templates:
  - name: spark-groupby
    resource:
      action: create
      manifest: |
        apiVersion: "sparkoperator.k8s.io/v1beta2"
        kind: SparkApplication
        metadata:
          generateName: spark-argo-groupby
        spec:
          type: Scala
          mode: cluster
          image: gcr.io/spark-operator/spark:v3.0.3
          imagePullPolicy: Always
          mainClass: org.apache.spark.examples.GroupByTest
          mainApplicationFile:  local:///opt/spark/spark-examples_2.12-3.1.1-hadoop-2.7.jar
          sparkVersion: "3.0.3"
          driver:
            cores: 1
            coreLimit: "1200m"
            memory: "512m"
            labels:
              version: 3.0.0
          executor:
            cores: 1
            instances: 1
            memory: "512m"
            labels:
              version: 3.0.0
  - name: sparkling-operator
    dag:
      tasks:
      - name: SparkGroupBY
        template: spark-groupby
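
(For context, a minimal sketch of how a workflow like this is submitted and watched, assuming the argo CLI and the argo namespace used throughout this post:)

$ argo submit workflow.yaml -n argo --watch   # workflow.yaml = the WORKFLOW.YAML manifest above
$ argo list -n argo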

ROLES

# Role for spark-on-k8s-operator to create resources on cluster
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-cluster-cr
  labels:
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-edit: "true"
rules:
  - apiGroups:
      - sparkoperator.k8s.io
    resources:
      - sparkapplications
    verbs:
      - '*'
---
# Allow the workflow's service account (default in the argo namespace) access for spark-on-k8s
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: argo-spark-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: spark-cluster-cr
subjects:
  - kind: ServiceAccount
    name: default
    namespace: argo
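
(A quick sanity check for this binding, assuming the default service account in the argo namespace as bound above, is kubectl auth can-i:)

$ kubectl auth can-i create sparkapplications.sparkoperator.k8s.io \
    --as=system:serviceaccount:argo:default -n argo
# expected to print "yes" once the ClusterRole and ClusterRoleBinding above are applied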

ARGO UI

[Screenshot: Workflow status]

[Screenshot: Workflow logs]

To dig deeper, I tried all the steps listed at https://dev.to/crenshaw_dev/how-to-debug-an-argo-workflow-31ng but could not get the application logs.

Basically, when I run these samples I expect the Spark application logs to be printed - in this case, the output of the following Scala example:

https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/GroupByTest.scala

Interestingly, when I list the pods I expect to see driver and executor pods, but I always see only one pod, and it has the logs shown in the screenshot above (raw logs below). Please help me understand why no application logs are generated and how I can get them.
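
(For reference, Spark on Kubernetes labels its pods spark-role=driver / spark-role=executor and the operator names the driver pod <app-name>-driver by default, so a sketch of how one would look for those pods - the application name here stands in for whatever the operator generates from generateName:)

$ kubectl get sparkapplications -n argo
$ kubectl get pods -n argo -l spark-role=driver
$ kubectl get pods -n argo -l spark-role=executor
$ kubectl logs -n argo <generated-app-name>-driver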

RAW LOGS
$ kubectl logs spark-pi-dag-739246604 -n argo

time="2021-12-10T13:28:09.560Z" level=info msg="Starting Workflow Executor" version="{v3.0.3 2021-05-11T21:14:20Z 02071057c082cf295ab8da68f1b2027ff8762b5a v3.0.3 clean go1.15.7 gc linux/amd64}"
time="2021-12-10T13:28:09.581Z" level=info msg="Creating a docker executor"
time="2021-12-10T13:28:09.581Z" level=info msg="Executor (version: v3.0.3, build_date: 2021-05-11T21:14:20Z) initialized (pod: argo/spark-pi-dag-739246604) with template:\n{\"name\":\"sparkpi\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"resource\":{\"action\":\"create\",\"manifest\":\"apiVersion: \\\"sparkoperator.k8s.io/v1beta2\\\"\\nkind: SparkApplication\\nmetadata:\\n  generateName: spark-pi-dag\\nspec:\\n  type: Scala\\n  mode: cluster\\n  image: gjeevanm/spark:v3.1.1\\n  imagePullPolicy: Always\\n  mainClass: org.apache.spark.examples.SparkPi\\n  mainApplicationFile: local:///opt/spark/spark-examples_2.12-3.1.1-hadoop-2.7.jar\\n  sparkVersion: 3.1.1\\n  driver:\\n    cores: 1\\n    coreLimit: \\\"1200m\\\"\\n    memory: \\\"512m\\\"\\n    labels:\\n      version: 3.0.0\\n  executor:\\n    cores: 1\\n    instances: 1\\n    memory: \\\"512m\\\"\\n    labels:\\n      version: 3.0.0\\n\"},\"archiveLocation\":{\"archiveLogs\":true,\"s3\":{\"endpoint\":\"minio:9000\",\"bucket\":\"my-bucket\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"my-minio-cred\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"my-minio-cred\",\"key\":\"secretkey\"},\"key\":\"spark-pi-dag/spark-pi-dag-739246604\"}}}"
time="2021-12-10T13:28:09.581Z" level=info msg="Loading manifest to /tmp/manifest.yaml"
time="2021-12-10T13:28:09.581Z" level=info msg="kubectl create -f /tmp/manifest.yaml -o json"
time="2021-12-10T13:28:10.348Z" level=info msg=argo/SparkApplication.sparkoperator.k8s.io/spark-pi-daghhl6s
time="2021-12-10T13:28:10.348Z" level=info msg="Starting SIGUSR2 signal monitor"
time="2021-12-10T13:28:10.348Z" level=info msg="No output parameters"

BEST ANSWER

As Michael mentioned in his answer, Argo Workflows has no knowledge of how other CRDs (such as the SparkApplication you are using) work, so it cannot pull the logs from the pods created by that particular CRD.

However, you can add the label workflows.argoproj.io/workflow: {{workflow.name}} to the pods generated by the SparkApplication so that Argo Workflows knows about them, and then use argo logs -c <container-name> to pull the logs from those pods.

You can find an example here: https://github.com/argoproj/argo-workflows/blob/master/examples/k8s-resource-log-selector.yaml. It uses a Kubeflow CRD, but in your case you need to add the labels to the executor and driver sections of the SparkApplication in your resource template.
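
Concretely, a sketch of what the driver and executor sections of the SparkApplication manifest inside your Workflow template could look like with that label added (Argo substitutes {{workflow.name}} when it renders the resource manifest, and the Spark operator copies driver.labels and executor.labels onto the pods it creates):

          driver:
            cores: 1
            coreLimit: "1200m"
            memory: "512m"
            labels:
              version: 3.0.0
              workflows.argoproj.io/workflow: "{{workflow.name}}"
          executor:
            cores: 1
            instances: 1
            memory: "512m"
            labels:
              version: 3.0.0
              workflows.argoproj.io/workflow: "{{workflow.name}}"

With the labels in place, something like argo logs spark-argo-groupby -n argo -c spark-kubernetes-driver should surface the driver output (spark-kubernetes-driver is the default name of the driver container in Spark on Kubernetes; adjust it if your image names the container differently).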

Regarding "kubernetes - Argo Workflow + Spark operator + App logs not generated", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/70306051/
