kubernetes - Is a Kubernetes liveness probe failure a voluntary or involuntary disruption?

Tags: kubernetes, livenessprobe

I have an application deployed to Kubernetes that depends on an external application. Sometimes the connection between the two gets into an invalid state, which can only be fixed by restarting my application.
To restart automatically, I configured a liveness probe that verifies the connection (a sketch of this kind of configuration follows below).
This has been working well. However, I'm worried that if the external application goes down (so that the connection errors are not just due to an invalid pod state), all of my pods will be restarted at once and my application will become completely unavailable. I would like it to keep running, so that the functionality that does not depend on the bad service can continue working.
I'm wondering whether a pod disruption budget would prevent this scenario, since it limits the number of pods that are down due to a "voluntary" disruption. However, the K8s documentation doesn't say whether a liveness probe failure counts as a voluntary disruption. Does it?
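
For illustration, the setup looks roughly like this (the image name, port, and probe path are all placeholders, not the actual configuration):

```yaml
# Rough sketch of the setup described above (all names are placeholders).
# The liveness probe handler verifies the connection to the external
# application, so if that application goes down, the probe fails in every
# replica at the same time.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:1.0               # placeholder image
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /health/connection   # hypothetical endpoint that tests the external connection
              port: 8080
            periodSeconds: 10
            failureThreshold: 3
```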

Best Answer

Based on the documentation, I would say:

Voluntary and involuntary disruptions

Pods do not disappear until someone (a person or a controller) destroys them, or there is an unavoidable hardware or system software error.

We call these unavoidable cases involuntary disruptions to an application. Examples are:

  • a hardware failure of the physical machine backing the node
  • cluster administrator deletes VM (instance) by mistake
  • cloud provider or hypervisor failure makes VM disappear
  • a kernel panic
  • the node disappears from the cluster due to cluster network partition
  • eviction of a pod due to the node being out-of-resources.

Except for the out-of-resources condition, all these conditions should be familiar to most users; they are not specific to Kubernetes.

We call other cases voluntary disruptions. These include both actions initiated by the application owner and those initiated by a Cluster Administrator. Typical application owner actions include:

  • deleting the deployment or other controller that manages the pod
  • updating a deployment's pod template causing a restart
  • directly deleting a pod (e.g. by accident)

Cluster administrator actions include:

  • Draining a node for repair or upgrade.
  • Draining a node from a cluster to scale the cluster down (learn about Cluster Autoscaling).
  • Removing a pod from a node to permit something else to fit on that node.

-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Disruptions


So your example is something quite different; as far as I can tell, it is neither a voluntary nor an involuntary disruption.

Also take a look at other parts of the Kubernetes documentation:

Pod lifetime

Like individual application containers, Pods are considered to be relatively ephemeral (rather than durable) entities. Pods are created, assigned a unique ID (UID), and scheduled to nodes where they remain until termination (according to restart policy) or deletion. If a Node dies, the Pods scheduled to that node are scheduled for deletion after a timeout period.

Pods do not, by themselves, self-heal. If a Pod is scheduled to a node that then fails, the Pod is deleted; likewise, a Pod won't survive an eviction due to a lack of resources or Node maintenance. Kubernetes uses a higher-level abstraction, called a controller, that handles the work of managing the relatively disposable Pod instances.

-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Pod lifecycle: Pod lifetime


Container probes

The kubelet can optionally perform and react to three kinds of probes on running containers (focusing on a livenessProbe):

  • livenessProbe: Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.

-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Pod lifecycle: Container probes
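
A point worth noting about the mechanics: the kubelet only kills the container after the probe has failed failureThreshold times in a row, so the timing fields control how long a broken state is tolerated before a restart. A minimal sketch (all values are illustrative):

```yaml
# Illustrative probe timing: the container is restarted only after
# failureThreshold consecutive failures, i.e. after roughly
# failureThreshold * periodSeconds = 3 * 10 = 30 seconds of failing checks.
livenessProbe:
  httpGet:
    path: /healthz          # placeholder path
    port: 8080
  initialDelaySeconds: 15   # grace period before the first check
  periodSeconds: 10
  failureThreshold: 3
```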

When should you use a liveness probe?

If the process in your container is able to crash on its own whenever it encounters an issue or becomes unhealthy, you do not necessarily need a liveness probe; the kubelet will automatically perform the correct action in accordance with the Pod's restartPolicy.

If you'd like your container to be killed and restarted if a probe fails, then specify a liveness probe, and specify a restartPolicy of Always or OnFailure.

-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Pod lifecycle: When should you use a liveness probe


Based on this information, the best approach would be to build your custom liveness check so that it distinguishes between the internal process health check and the external dependency (liveness) health check. In the first case your container should be stopped/terminated; in the second case, the one involving the external dependency, it should not be.
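
In Kubernetes terms, that usually means wiring the two checks to different probes: the liveness probe covers only internal health (its failure restarts the container), while the readiness probe also covers the external dependency (its failure only takes the Pod out of Service endpoints, without restarting it). A sketch of the split, with hypothetical endpoint paths:

```yaml
# Sketch of the split (endpoint paths are hypothetical):
# - /healthz -> internal process health only; failure restarts the container.
# - /ready   -> internal health + external dependency; failure only removes
#               the Pod from Service endpoints, no restart.
containers:
  - name: my-app
    image: my-app:1.0
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
```

For the scenario in the question, the /healthz handler would fail only when the local side of the connection is in the invalid state that a restart actually fixes, and stay healthy when the external application itself is merely unreachable.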
To address this part of the question:

I'm wondering if a pod disruption budget would prevent this scenario.


In this particular scenario, a PDB will not help.
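
That's because a PDB only constrains voluntary disruptions, i.e. pod evictions that go through the Eviction API (for example during a kubectl drain). A failed liveness probe makes the kubelet restart the container in place; no eviction happens, so the budget is never consulted. For completeness, a sketch (label values are hypothetical):

```yaml
# A PodDisruptionBudget limits evictions issued through the Eviction API
# (e.g. during a node drain). Kubelet restarts caused by failing liveness
# probes bypass it entirely. Label values are hypothetical.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
```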

To give them more visibility, here are the additional resources on this topic from the comments that could be useful to other community members:
  • Blog.risingstack.com: Designing microservices architecture for failure
  • Loft.sh: Blog: Kubernetes readiness probes examples common pitfalls: External dependencies
  • Cloud.google.com: Architecture: Scalable and resilient apps: Resilience designing to withstand failures

Source: https://stackoverflow.com/questions/67276029/