Redis Sentinel manual failover command times out

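I have one Redis master, one slave, and one Sentinel monitoring them. Everything seems to work, including failover when the master is killed. But when I issue the SENTINEL FAILOVER command, Sentinel gets stuck in the +failover-state-wait-promotion state for several minutes. It looks as though the slave never receives the promotion command. That makes no sense to me, because network communication from the Sentinel host to either Redis host appears to be fine. I am running all three processes in Docker containers, but I don't see how that would cause the problem. I can run redis-cli from the Sentinel host (that is, from inside the Docker container) and execute the slaveof command remotely. I can also monitor both Redis instances and see Sentinel's PING and INFO requests. I checked the master and slave logs and found nothing unusual. I also looked at this post, and there does not seem to be any reason for Sentinel to consider the Redis instances invalid.
I am fairly experienced with Sentinel but new to Docker, and I'm not sure how to go about pinpointing the problem. Any ideas?
Sentinel log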

[8] 01 Jul 01:36:57.317 # Sentinel runid is c337f6f0dfa1d41357338591cd0181c07cb026d0
[8] 01 Jul 01:38:13.135 # +monitor master redis-holt-overflow 10.19.8.2 6380 quorum 1
[8] 01 Jul 01:38:13.135 # +set master redis-holt-overflow 10.19.8.2 6380 down-after-milliseconds 3100
[8] 01 Jul 01:38:13.199 * +slave slave 10.19.8.3:6381 10.19.8.3 6381 @ redis-holt-overflow 10.19.8.2 6380
[8] 01 Jul 01:38:42.288 # Executing user requested FAILOVER of 'redis-holt-overflow'
[8] 01 Jul 01:38:42.288 # +new-epoch 1
[8] 01 Jul 01:38:42.288 # +try-failover master redis-holt-overflow 10.19.8.2 6380
[8] 01 Jul 01:38:42.352 # +vote-for-leader c337f6f0dfa1d41357338591cd0181c07cb026d0 1
[8] 01 Jul 01:38:42.352 # +elected-leader master redis-holt-overflow 10.19.8.2 6380
[8] 01 Jul 01:38:42.352 # +failover-state-select-slave master redis-holt-overflow 10.19.8.2 6380
[8] 01 Jul 01:38:42.404 # +selected-slave slave 10.19.8.3:6381 10.19.8.3 6381 @ redis-holt-overflow 10.19.8.2 6380
[8] 01 Jul 01:38:42.404 * +failover-state-send-slaveof-noone slave 10.19.8.3:6381 10.19.8.3 6381 @ redis-holt-overflow 10.19.8.2 6380
[8] 01 Jul 01:38:42.488 * +failover-state-wait-promotion slave 10.19.8.3:6381 10.19.8.3 6381 @ redis-holt-overflow 10.19.8.2 6380
[8] 01 Jul 01:41:42.565 # -failover-abort-slave-timeout master redis-holt-overflow 10.19.8.2 6380

Redis master log

[17] 01 Jul 01:13:58.251 # Server started, Redis version 2.8.21
[17] 01 Jul 01:13:58.252 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[17] 01 Jul 01:13:58.252 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
[17] 01 Jul 01:13:58.252 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
[17] 01 Jul 01:13:58.252 * DB loaded from disk: 0.000 seconds
[17] 01 Jul 01:13:58.252 * The server is now ready to accept connections on port 6380
[17] 01 Jul 01:34:45.796 * Slave 10.196.88.30:6381 asks for synchronization
[17] 01 Jul 01:34:45.796 * Full resync requested by slave 10.196.88.30:6381
[17] 01 Jul 01:34:45.796 * Starting BGSAVE for SYNC with target: disk
[17] 01 Jul 01:34:45.797 * Background saving started by pid 20
[20] 01 Jul 01:34:45.798 * DB saved on disk
[20] 01 Jul 01:34:45.799 * RDB: 0 MB of memory used by copy-on-write
[17] 01 Jul 01:34:45.808 * Background saving terminated with success
[17] 01 Jul 01:34:45.808 * Synchronization with slave 10.196.88.30:6381 succeeded
[17] 01 Jul 01:38:42.343 # Connection with slave 10.196.88.30:6381 lost.
[17] 01 Jul 01:38:43.275 * Slave 10.196.88.30:6381 asks for synchronization
[17] 01 Jul 01:38:43.275 * Full resync requested by slave 10.196.88.30:6381
[17] 01 Jul 01:38:43.275 * Starting BGSAVE for SYNC with target: disk
[17] 01 Jul 01:38:43.275 * Background saving started by pid 21
[21] 01 Jul 01:38:43.277 * DB saved on disk
[21] 01 Jul 01:38:43.277 * RDB: 0 MB of memory used by copy-on-write
[17] 01 Jul 01:38:43.368 * Background saving terminated with success
[17] 01 Jul 01:38:43.368 * Synchronization with slave 10.196.88.30:6381 succeeded

Redis slave log

[14] 01 Jul 01:15:51.435 # Server started, Redis version 2.8.21
[14] 01 Jul 01:15:51.435 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[14] 01 Jul 01:15:51.435 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
[14] 01 Jul 01:15:51.435 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
[14] 01 Jul 01:15:51.435 * DB loaded from disk: 0.000 seconds
[14] 01 Jul 01:15:51.435 * The server is now ready to accept connections on port 6381
[14] 01 Jul 01:34:45.088 * SLAVE OF 10.196.88.29:6380 enabled (user request)
[14] 01 Jul 01:34:45.947 * Connecting to MASTER 10.196.88.29:6380
[14] 01 Jul 01:34:45.947 * MASTER <-> SLAVE sync started
[14] 01 Jul 01:34:45.948 * Non blocking connect for SYNC fired the event.
[14] 01 Jul 01:34:45.948 * Master replied to PING, replication can continue...
[14] 01 Jul 01:34:45.948 * Partial resynchronization not possible (no cached master)
[14] 01 Jul 01:34:45.948 * Full resync from master: b912b647401917d52742c0eac3ae2f795f59f48f:1
[14] 01 Jul 01:34:45.960 * MASTER <-> SLAVE sync: receiving 18 bytes from master
[14] 01 Jul 01:34:45.960 * MASTER <-> SLAVE sync: Flushing old data
[14] 01 Jul 01:34:45.960 * MASTER <-> SLAVE sync: Loading DB in memory
[14] 01 Jul 01:34:45.960 * MASTER <-> SLAVE sync: Finished with success
[14] 01 Jul 01:38:42.495 # Connection with master lost.
[14] 01 Jul 01:38:42.495 * Caching the disconnected master state.
[14] 01 Jul 01:38:42.495 * Discarding previously cached master state.
[14] 01 Jul 01:38:42.495 * MASTER MODE enabled (user request)
[14] 01 Jul 01:38:42.495 # CONFIG REWRITE executed with success.
[14] 01 Jul 01:38:42.506 * SLAVE OF 10.196.88.29:6380 enabled (user request)
[14] 01 Jul 01:38:43.425 * Connecting to MASTER 10.196.88.29:6380
[14] 01 Jul 01:38:43.426 * MASTER <-> SLAVE sync started
[14] 01 Jul 01:38:43.426 * Non blocking connect for SYNC fired the event.
[14] 01 Jul 01:38:43.427 * Master replied to PING, replication can continue...
[14] 01 Jul 01:38:43.427 * Partial resynchronization not possible (no cached master)
[14] 01 Jul 01:38:43.427 * Full resync from master: b912b647401917d52742c0eac3ae2f795f59f48f:10930
[14] 01 Jul 01:38:43.520 * MASTER <-> SLAVE sync: receiving 18 bytes from master
[14] 01 Jul 01:38:43.520 * MASTER <-> SLAVE sync: Flushing old data
[14] 01 Jul 01:38:43.520 * MASTER <-> SLAVE sync: Loading DB in memory
[14] 01 Jul 01:38:43.520 * MASTER <-> SLAVE sync: Finished with success

Sentinel config

port 26379
pidfile "/var/run/redis-sentinel.pid"
logfile ""
daemonize no

# Generated by CONFIG REWRITE

dir "/data"
sentinel monitor redis-holt-overflow 10.19.8.2 6380 1
sentinel down-after-milliseconds redis-holt-overflow 3100
sentinel config-epoch redis-holt-overflow 0
sentinel leader-epoch redis-holt-overflow 1
sentinel known-slave redis-holt-overflow 10.19.8.3 6381
sentinel current-epoch 1

Redis and Sentinel info:

redis_version:2.8.21
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:551c16ab9d912477
redis_mode:standalone
os:Linux 3.10.0-123.8.1.el7.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.7.2
process_id:13
run_id:7e1a1b6c844a969424d16f3efa116707ea7a60bf
tcp_port:6380
uptime_in_seconds:1312
uptime_in_days:0
hz:10
lru_clock:9642428
config_file:/usr/local/etc/redis/redis.conf

Best answer

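It looks like you have run into a Docker networking problem. If you compare the logs, they show different IPs: Sentinel is configured with 10.19.8.2 and 10.19.8.3, while the master and slave logs show 10.196.88.29 and 10.196.88.30. This comes from the IP of the connection as detected during discovery. Are these on different Docker hosts?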
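The address mismatch can be checked mechanically. The following is a minimal Python sketch, not part of the original answer, that pulls IPv4 addresses out of a few log lines copied from the question and compares what Sentinel sees with what the Redis instances report. The variable and function names are illustrative.

```python
import re

# Lines copied from the logs in the question (timestamps removed).
SENTINEL_LOG = [
    "+monitor master redis-holt-overflow 10.19.8.2 6380 quorum 1",
    "+slave slave 10.19.8.3:6381 10.19.8.3 6381 @ redis-holt-overflow 10.19.8.2 6380",
]
MASTER_LOG = ["Slave 10.196.88.30:6381 asks for synchronization"]
SLAVE_LOG = ["SLAVE OF 10.196.88.29:6380 enabled (user request)"]

# Simple dotted-quad matcher; good enough for log scanning, not for validation.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def addresses(lines):
    """Collect every IPv4-looking token that appears in the given lines."""
    found = set()
    for line in lines:
        found.update(IPV4.findall(line))
    return found


sentinel_view = addresses(SENTINEL_LOG)
redis_view = addresses(MASTER_LOG) | addresses(SLAVE_LOG)

# Addresses that only one side knows about point at NAT or port remapping.
only_sentinel = sorted(sentinel_view - redis_view)
only_redis = sorted(redis_view - sentinel_view)

print("seen only by Sentinel:", only_sentinel)
print("seen only by Redis:   ", only_redis)
```

If both lists are non-empty, Sentinel and the Redis instances are addressing each other through different networks, which is the situation the answer describes.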

From the documentation:

Since Sentinels auto detect slaves using masters INFO output information, the detected slaves will not be reachable, and Sentinel will never be able to failover the master, since there are no good slaves from the point of view of the system, so there is currently no way to monitor with Sentinel a set of master and slave instances deployed with Docker, unless you instruct Docker to map the port 1:1.
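To make the discovery step in that quote concrete, here is a small Python sketch that parses the slave entries from the replication section of a master's INFO output and compares them with the slave address Sentinel has on record. The sample INFO text is illustrative, not captured from the question's setup; it assumes the `slaveN:ip=...,port=...` line layout and reuses the slave address from the master log. `KNOWN_SLAVE` comes from the `sentinel known-slave` line in the question's config.

```python
def parse_slaves(info_text):
    """Return (ip, port) pairs from `slaveN:ip=...,port=...` lines."""
    slaves = []
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Keep only keys of the form slave0, slave1, ...
        if not sep or not key.startswith("slave") or not key[5:].isdigit():
            continue
        fields = dict(
            item.split("=", 1) for item in value.split(",") if "=" in item
        )
        if "ip" in fields and "port" in fields:
            slaves.append((fields["ip"], int(fields["port"])))
    return slaves


# Illustrative sample only (assumption): what the master might report.
SAMPLE_INFO = """\
# Replication
role:master
connected_slaves:1
slave0:ip=10.196.88.30,port=6381,state=online,offset=1,lag=0
"""

# From `sentinel known-slave redis-holt-overflow 10.19.8.3 6381`.
KNOWN_SLAVE = ("10.19.8.3", 6381)

advertised = parse_slaves(SAMPLE_INFO)
mismatched = [slave for slave in advertised if slave != KNOWN_SLAVE]

print("advertised by master:", advertised)
print("not matching Sentinel's record:", mismatched)
```

Any entry in `mismatched` is an address that Sentinel would learn from the master but that differs from the one it was told to use, which is the kind of discrepancy the documentation warns about.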

For Sentinel, a Docker image is available at https://registry.hub.docker.com/u/joshula/redis-sentinel/. It shows how to set this up using announce-ip and bind.
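As a rough illustration of the directives mentioned here, the Python sketch below renders the corresponding sentinel.conf lines. `sentinel announce-ip`, `sentinel announce-port`, and `bind` are real configuration directives, but the helper name and the addresses are placeholders (192.0.2.10 is a documentation-only address), and the source does not confirm that these settings alone resolve the questioner's case, since they control the address Sentinel advertises for itself.

```python
def sentinel_announce_lines(announce_ip, announce_port, bind_ip=None):
    """Build sentinel.conf lines that make Sentinel advertise a fixed address."""
    lines = []
    if bind_ip is not None:
        # Interface the process listens on inside the container.
        lines.append(f"bind {bind_ip}")
    # Address and port to advertise instead of the auto-detected ones.
    lines.append(f"sentinel announce-ip {announce_ip}")
    lines.append(f"sentinel announce-port {announce_port}")
    return lines


# Placeholder values (assumptions), to be replaced with the host-visible address.
conf_lines = sentinel_announce_lines("192.0.2.10", 26379, bind_ip="0.0.0.0")
print("\n".join(conf_lines))
```

The printed lines would be appended to the Sentinel config shown in the question; the Redis instances themselves still need to be reachable at the addresses Sentinel discovers, for example by mapping ports 1:1 as the documentation quote suggests.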

For more details, see http://redis.io/topics/sentinel, in particular the Docker section, which describes how to set things up in Docker to handle this situation.

A similar question about the Redis Sentinel manual failover command timing out can be found on Stack Overflow: https://stackoverflow.com/questions/31151815/
