kubernetes - Why does Redis in K8s keep restarting?

Tags: kubernetes redis

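The Redis pod keeps restarting. How can I find out the reason for this behavior?

I figured out that the resource quota should be raised, but I don't know what the best CPU/RAM ratio is. And why are there no crash events or logs?

Here is the pod: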

> kubectl get pods
    redis-master-5d9cfb54f8-8pbgq                     1/1     Running     33         3d16h

Here are the logs:

> kubectl logs --follow redis-master-5d9cfb54f8-8pbgq
[1] 08 Sep 07:02:12.152 # Server started, Redis version 2.8.19
[1] 08 Sep 07:02:12.153 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
[1] 08 Sep 07:02:12.153 * The server is now ready to accept connections on port 6379
[1] 08 Sep 07:03:13.085 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 07:03:13.085 * Background saving started by pid 8
[8] 08 Sep 07:03:13.101 * DB saved on disk
[8] 08 Sep 07:03:13.101 * RDB: 0 MB of memory used by copy-on-write
[1] 08 Sep 07:03:13.185 * Background saving terminated with success
[1] 08 Sep 07:04:14.018 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 07:04:14.018 * Background saving started by pid 9
...
[93] 08 Sep 08:38:30.160 * DB saved on disk
[93] 08 Sep 08:38:30.164 * RDB: 2 MB of memory used by copy-on-write
[1] 08 Sep 08:38:30.259 * Background saving terminated with success
[1] 08 Sep 08:39:31.072 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 08:39:31.074 * Background saving started by pid 94

Here are the previous logs of the same pod:

> kubectl logs --previous --follow redis-master-5d9cfb54f8-8pbgq
[1] 08 Sep 09:41:46.057 * Background saving terminated with success
[1] 08 Sep 09:42:47.073 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 09:42:47.076 * Background saving started by pid 140
[140] 08 Sep 09:43:14.398 * DB saved on disk
[140] 08 Sep 09:43:14.457 * RDB: 1 MB of memory used by copy-on-write
[1] 08 Sep 09:43:14.556 * Background saving terminated with success
[1] 08 Sep 09:44:15.073 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 09:44:15.077 * Background saving started by pid 141
[1 | signal handler] (1599558267) Received SIGTERM scheduling shutdown...
[1] 08 Sep 09:44:28.052 # User requested shutdown...
[1] 08 Sep 09:44:28.052 # There is a child saving an .rdb. Killing it!
[1] 08 Sep 09:44:28.052 * Saving the final RDB snapshot before exiting.
[1] 08 Sep 09:44:49.592 * DB saved on disk
[1] 08 Sep 09:44:49.592 # Redis is now ready to exit, bye bye...

Here is the pod description. As you can see, the limit is 100Mi, but I cannot see the threshold after which the pod gets restarted.

> kubectl describe pod redis-master-5d9cfb54f8-8pbgq
Name:           redis-master-5d9cfb54f8-8pbgq
Namespace:      cryptoman
Priority:       0
Node:           gke-my-cluster-default-pool-818613a8-smmc/10.172.0.28
Start Time:     Fri, 04 Sep 2020 18:52:17 +0300
Labels:         app=redis
                pod-template-hash=5d9cfb54f8
                role=master
                tier=backend
Annotations:    <none>
Status:         Running
IP:             10.36.2.124
IPs:            <none>
Controlled By:  ReplicaSet/redis-master-5d9cfb54f8
Containers:
  master:
    Container ID:   docker://3479276666a41df502f1f9eb9bb2ff9cfa592f08a33e656e44179042b6233c6f
    Image:          k8s.gcr.io/redis:e2e
    Image ID:       docker-pullable://k8s.gcr.io/redis@sha256:f066bcf26497fbc55b9bf0769cb13a35c0afa2aa42e737cc46b7fb04b23a2f25
    Port:           6379/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 09 Sep 2020 10:27:56 +0300
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    0
      Started:      Wed, 09 Sep 2020 07:34:18 +0300
      Finished:     Wed, 09 Sep 2020 10:27:55 +0300
    Ready:          True
    Restart Count:  42
    Limits:
      cpu:     100m
      memory:  250Mi
    Requests:
      cpu:        100m
      memory:     250Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5tds9 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-5tds9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-5tds9
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason          Age                   From                                                Message
  ----    ------          ----                  ----                                                -------
  Normal  SandboxChanged  52m (x42 over 4d13h)  kubelet, gke-my-cluster-default-pool-818613a8-smmc  Pod sandbox changed, it will be killed and re-created.
  Normal  Killing         52m (x42 over 4d13h)  kubelet, gke-my-cluster-default-pool-818613a8-smmc  Stopping container master
  Normal  Created         52m (x43 over 4d16h)  kubelet, gke-my-cluster-default-pool-818613a8-smmc  Created container master
  Normal  Started         52m (x43 over 4d16h)  kubelet, gke-my-cluster-default-pool-818613a8-smmc  Started container master
  Normal  Pulled          52m (x42 over 4d13h)  kubelet, gke-my-cluster-default-pool-818613a8-smmc  Container image "k8s.gcr.io/redis:e2e" already present on machine

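Best answer

This is the limit beyond which the container is restarted. CPU usage above the limit is merely throttled; memory usage above the limit gets the container OOM-killed.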

    Limits:
      cpu:     100m
      memory:  250Mi

Reason: OOMKilled
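
The kill reason can also be read straight from the container status instead of scanning the full `describe` output. A minimal sketch, assuming `kubectl` is configured for the cluster and the pod has a single container at index 0; the helper name `last_reason` is made up for illustration:

```shell
# Print why the first container of a pod was last terminated (e.g. OOMKilled).
# last_reason is an illustrative helper name, not part of kubectl.
last_reason() {
  # $1 = pod name, $2 = namespace
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
}

# Usage against a live cluster (not run here):
# last_reason redis-master-5d9cfb54f8-8pbgq cryptoman
```

For the pod in the question this should print the same `Reason` value that appears under `Last State` in the description.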

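  1. Remove the requests and limits.
  2. Run the pod and make sure it does not restart.
  3. If you already have Prometheus, run the VPA Recommender to check how many resources the pod needs. Otherwise use any monitoring stack (GKE Prometheus, prometheus-operator, DataDog, etc.) to check the actual resource consumption and adjust the limits accordingly.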
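
Without a full monitoring stack, a rough spot check for step 3 is to compare current memory usage with the limit. The sketch below assumes metrics-server is installed so that `kubectl top` works, and that memory is reported with an `Mi` suffix; the helper names `top_memory_mi` and `pct_of_limit` are hypothetical, and 250 is the limit shown in the pod description above.

```shell
# Extract the memory column (in Mi) from one line of
# `kubectl top pod --no-headers` output, e.g. "redis-master-... 3m 212Mi".
# Assumes the value carries an Mi suffix.
top_memory_mi() {
  awk '{ sub(/Mi$/, "", $3); print $3 }'
}

# Integer percentage of the limit in use: $1 = used Mi, $2 = limit Mi.
pct_of_limit() {
  echo $(( $1 * 100 / $2 ))
}

# Usage against a live cluster (not run here):
# used=$(kubectl top pod redis-master-5d9cfb54f8-8pbgq -n cryptoman --no-headers | top_memory_mi)
# pct_of_limit "$used" 250
```

A single reading only shows usage at one moment, so it is worth sampling repeatedly under normal load before choosing a new limit.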

A similar question on Stack Overflow: https://stackoverflow.com/questions/63829136/
