kubernetes - How do I implement automatic rollback in Kubernetes?

Tags: kubernetes rollback

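Suppose I have a Deployment. For some reason, it stops responding after some time. Is there a way to tell Kubernetes to roll back to the previous version automatically on failure?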

Best answer

You mentioned:

I've a deployment. For some reason it's not responding after sometime.

In this case, you can use liveness and readiness probes:

The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.

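The probes above may keep you from deploying a broken version, but liveness and readiness probes cannot roll your Deployment back to a previous version. There is a similar issue on GitHub, but I'm not sure whether there will be any progress on it in the near future.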
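For reference, probes are declared per container in the Pod template. Below is a minimal sketch that prints a Deployment manifest with both probes; the function name `probe_manifest`, the HTTP path, the port, and the timing values are assumptions chosen for illustration, not values from the question.

```shell
#!/bin/bash
# Print a Deployment manifest with liveness and readiness probes.
# The name, image, path and port arguments are hypothetical placeholders.
probe_manifest() {
    local name="$1" image="$2" path="${3:-/}" port="${4:-80}"
    cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
spec:
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      containers:
      - name: ${name}
        image: ${image}
        livenessProbe:        # kubelet restarts the container when this fails
          httpGet:
            path: ${path}
            port: ${port}
          periodSeconds: 10
          failureThreshold: 3
        readinessProbe:       # Pod is removed from Service backends when this fails
          httpGet:
            path: ${path}
            port: ${port}
          periodSeconds: 5
EOF
}

# Usage (requires cluster access):
#   probe_manifest app-1 nginx / 80 | kubectl apply -f -
```

With a readiness probe in place, a rolling update of a broken version stalls instead of replacing the healthy Pods, which is what makes a stuck rollout detectable.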

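If you really want to automate the rollback process, below I describe a solution that you may find helpful.

This solution requires running kubectl commands from within a Pod. In short, you can use a script to continuously monitor your Deployments, and when errors occur you can run kubectl rollout undo deployment DEPLOYMENT_NAME.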

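First, you need to decide how to find failed Deployments. As an example, I'll check for Deployments whose update takes longer than 10 seconds with the following command:
NOTE: You can use a different command depending on your needs.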

kubectl rollout status deployment ${deployment} --timeout=10s
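What makes this command usable in a script is its exit status: it returns non-zero when the rollout does not complete before the timeout. A minimal sketch of wrapping it as a check follows; the helper name `rollout_ok` is hypothetical.

```shell
#!/bin/bash
# Return 0 if the rollout completed within the timeout, non-zero otherwise.
# rollout_ok is a hypothetical helper name, not part of kubectl.
rollout_ok() {
    local deployment="$1" timeout="${2:-10s}"
    kubectl rollout status deployment "${deployment}" --timeout="${timeout}" >/dev/null 2>&1
}

# Usage (requires cluster access):
#   if rollout_ok app-1 10s; then echo "Ok"; else echo "not finished"; fi
```

Keep in mind that a short timeout such as 10s may also flag a rollout that is healthy but slow, so the value should probably match how long your images take to pull and start.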

To continuously monitor all Deployments in the default namespace, we can create a Bash script:

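#!/bin/bash
# Every 60 seconds, check each Deployment in the current namespace and
# roll back any whose rollout has not completed within 10 seconds.

while true; do
    sleep 60
    # List Deployment names, skipping the checker itself.
    deployments=$(kubectl get deployments --no-headers -o custom-columns=":metadata.name" | grep -v "deployment-checker")
    echo "====== $(date) ======"
    for deployment in ${deployments}; do
        # A non-zero exit status means the rollout did not finish in time.
        if ! kubectl rollout status deployment "${deployment}" --timeout=10s >/dev/null 2>&1; then
            echo "Error: ${deployment} - rolling back!"
            kubectl rollout undo deployment "${deployment}"
        else
            echo "Ok: ${deployment}"
        fi
    done
done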
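If you need to watch a namespace other than default, the same loop body can take the namespace as a parameter. This is only a sketch of one possible variation: the function name `check_namespace` and the `namespace/name` log format are my own assumptions, and it performs a single pass rather than looping forever.

```shell
#!/bin/bash
# Check every Deployment in one namespace once and roll back those whose
# rollout has not completed. check_namespace is a hypothetical helper name.
check_namespace() {
    local namespace="$1" timeout="${2:-10s}" deployment
    for deployment in $(kubectl get deployments -n "${namespace}" --no-headers \
            -o custom-columns=":metadata.name" | grep -v "deployment-checker"); do
        if kubectl rollout status deployment "${deployment}" -n "${namespace}" \
                --timeout="${timeout}" >/dev/null 2>&1; then
            echo "Ok: ${namespace}/${deployment}"
        else
            echo "Error: ${namespace}/${deployment} - rolling back!"
            kubectl rollout undo deployment "${deployment}" -n "${namespace}"
        fi
    done
}

# Usage (requires cluster access):
#   while true; do sleep 60; check_namespace default 10s; done
```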

We want to run this script from inside a Pod, so I converted it to a ConfigMap, which lets us mount the script as a volume (see: Using ConfigMaps as files from a Pod):

$ cat check-script-configmap.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: check-script
data:
  checkScript.sh: |
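    #!/bin/bash
    # Every 60 seconds, check each Deployment in the current namespace and
    # roll back any whose rollout has not completed within 10 seconds.

    while true; do
        sleep 60
        # List Deployment names, skipping the checker itself.
        deployments=$(kubectl get deployments --no-headers -o custom-columns=":metadata.name" | grep -v "deployment-checker")
        echo "====== $(date) ======"
        for deployment in ${deployments}; do
            # A non-zero exit status means the rollout did not finish in time.
            if ! kubectl rollout status deployment "${deployment}" --timeout=10s >/dev/null 2>&1; then
                echo "Error: ${deployment} - rolling back!"
                kubectl rollout undo deployment "${deployment}"
            else
                echo "Ok: ${deployment}"
            fi
        done
    done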

$ kubectl apply -f check-script-configmap.yml
configmap/check-script created
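As an alternative to pasting the script into YAML by hand, the ConfigMap manifest can be generated from the script file. This is a sketch assuming the script is saved locally as checkScript.sh; the helper name `configmap_from_script` is hypothetical.

```shell
#!/bin/bash
# Generate a ConfigMap manifest from a script file without creating it yet.
# The file's base name becomes the key in the ConfigMap's data section.
configmap_from_script() {
    local script_file="$1" name="${2:-check-script}"
    kubectl create configmap "${name}" --from-file="${script_file}" \
        --dry-run=client -o yaml
}

# Usage (requires kubectl):
#   configmap_from_script checkScript.sh | kubectl apply -f -
```

This keeps a single copy of the script on disk, so the ConfigMap cannot drift from the version you tested locally.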

I created a separate deployment-checker ServiceAccount and bound it to the edit ClusterRole; our Pod will run under this ServiceAccount:
NOTE: I created a Deployment instead of a single Pod.

$ cat all-in-one.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deployment-checker
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: deployment-checker-binding
subjects:
  - kind: ServiceAccount
    name: deployment-checker
    namespace: default
roleRef:
  kind: ClusterRole
  name: edit
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: deployment-checker
  name: deployment-checker
spec:
  selector:
    matchLabels:
      app: deployment-checker
  template:
    metadata:
      labels:
        app: deployment-checker
    spec:
      serviceAccountName: deployment-checker
      volumes:
        - name: check-script
          configMap:
            name: check-script
      containers:
      - image: bitnami/kubectl
        name: test
        command: ["bash", "/mnt/checkScript.sh"]
        volumeMounts:
        - name: check-script
          mountPath: /mnt

After applying the manifest above, the deployment-checker Deployment is created and starts monitoring Deployment resources in the default namespace:

$ kubectl apply -f all-in-one.yaml
serviceaccount/deployment-checker created
clusterrolebinding.rbac.authorization.k8s.io/deployment-checker-binding created
deployment.apps/deployment-checker created

$ kubectl get deploy,pod | grep "deployment-checker"
deployment.apps/deployment-checker   1/1     1            
pod/deployment-checker-69c8896676-pqg9h   1/1     Running   
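To confirm that the ServiceAccount can actually do what the script needs, you can impersonate it with kubectl auth can-i. The sketch below assumes that get, list and patch on Deployments are the relevant verbs, and that your own user is allowed to impersonate ServiceAccounts; the helper name `check_sa_permissions` is hypothetical.

```shell
#!/bin/bash
# Report whether a ServiceAccount may perform each verb on Deployments.
# The verb list is an assumption about what the checker script requires.
check_sa_permissions() {
    local namespace="$1" account="$2" verb
    for verb in get list patch; do
        if kubectl auth can-i "${verb}" deployments -n "${namespace}" \
                --as="system:serviceaccount:${namespace}:${account}" >/dev/null 2>&1; then
            echo "${verb}: allowed"
        else
            echo "${verb}: denied"
        fi
    done
}

# Usage (requires cluster access):
#   check_sa_permissions default deployment-checker
```

Note that a ClusterRoleBinding to edit grants these permissions in every namespace. If the checker only needs the default namespace, a namespaced RoleBinding may be a tighter fit.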

Finally, we can check how it works. I created three Deployments (app-1, app-2, app-3):

$ kubectl create deploy app-1 --image=nginx
deployment.apps/app-1 created

$ kubectl create deploy app-2 --image=nginx
deployment.apps/app-2 created

$ kubectl create deploy app-3 --image=nginx
deployment.apps/app-3 created

Then I changed the image of app-1 to an incorrect one (nnnginx):

$ kubectl set image deployment/app-1 nginx=nnnginx
deployment.apps/app-1 image updated

In the deployment-checker logs we can see that app-1 has been rolled back to the previous version:

$ kubectl logs -f  deployment-checker-69c8896676-pqg9h
...
====== Thu Oct  7 09:20:15 UTC 2021 ======
Ok: app-1
Ok: app-2
Ok: app-3
====== Thu Oct  7 09:21:16 UTC 2021 ======
Error: app-1 - rolling back!
deployment.apps/app-1 rolled back
Ok: app-2
Ok: app-3
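To double-check the result outside the checker's logs, you can read the image back from the Deployment. A minimal sketch follows; the helper name `current_image` is hypothetical, and it assumes the container of interest is the first one in the Pod template.

```shell
#!/bin/bash
# Print the image of the first container in a Deployment's Pod template.
current_image() {
    local deployment="$1"
    kubectl get deployment "${deployment}" \
        -o jsonpath='{.spec.template.spec.containers[0].image}'
}

# Usage (requires cluster access):
#   current_image app-1                         # image currently configured
#   kubectl rollout history deployment/app-1    # recorded revisions
```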

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/69468977/