docker - 故障转移后,哨兵无法将初始主服务器提升回主模式

标签 docker redis docker-swarm redis-sentinel

我在哨兵模式下运行 redis,有 1 个主节点、2 个副本节点和 3 个哨兵节点。我正在 docker swarm 环境中运行所有节点。
所有节点都开始正常。开始时,我们有以下节点 IP

master      10.0.20.2 
replica-1   10.0.20.5
replica-2   10.0.20.10


接下来我停止主容器以关闭主节点,以便哨兵应该选择一个副本节点作为新的主节点。
一切顺利,replica-1 节点被选为新的主节点。

与此同时,docker swarm 为 master 启动新容器,它作为从属节点加入 redis 哨兵网络。

接下来,我将 replica-1 节点关闭以进行另一次故障转移。现在,当哨兵尝试将 master 节点从从节点升级到主节点时,实际问题就会发生。

下面是当 sentinel 尝试使其成为 master 时的 master 节点 redis 配置文件。我想知道当这个节点是新的主节点并且 IP 是同一个节点时,为什么用 replicaof 10.0.20.2 6379 更新文件。
主节点 redis.conf
root@0fd67f6ceb37:/data# tail -f /etc/redis/redis.conf
replica-announce-ip "redis-master"
#replica-announce-port 6379
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error no
rdbchecksum yes
# Generated by CONFIG REWRITE

replicaof 10.0.20.2 6379


这是错误的配置,所以它有时会失败,并且哨兵选择 replica-2 节点作为新的主节点
这是我在 master 节点记录时看到的错误(下面是详细的日志文件)Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master最后 replica-2 充当 masterreplica-1,master 充当两个从机。

主节点日志 (这是在主节点作为从节点加入并且哨兵尝试将其提升为主模式之后)
[docker@chopswarm1 redis-failover]$ d logs 0fd67f6ceb37
1:C 05 Nov 2019 06:43:49.360 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 05 Nov 2019 06:43:49.360 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 05 Nov 2019 06:43:49.360 # Configuration loaded
1:M 05 Nov 2019 06:43:49.361 * Running mode=standalone, port=6379.
1:M 05 Nov 2019 06:43:49.361 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 05 Nov 2019 06:43:49.361 # Server initialized
1:M 05 Nov 2019 06:43:49.361 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 05 Nov 2019 06:43:49.361 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 05 Nov 2019 06:43:49.361 * DB loaded from disk: 0.000 seconds
1:M 05 Nov 2019 06:43:49.361 * Ready to accept connections
1:S 05 Nov 2019 06:43:59.817 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 05 Nov 2019 06:43:59.817 * REPLICAOF 10.0.20.5:6379 enabled (user request from 'id=5 addr=10.0.20.7:60534 fd=10 name=sentinel-38a1e461-cmd age=10 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=148 qbuf-free=32620 obl=36 oll=0 omem=0 events=r cmd=exec')
1:S 05 Nov 2019 06:43:59.817 # CONFIG REWRITE executed with success.
1:S 05 Nov 2019 06:44:00.386 * Connecting to MASTER 10.0.20.5:6379
1:S 05 Nov 2019 06:44:00.387 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:00.387 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:00.387 * Master replied to PING, replication can continue...
1:S 05 Nov 2019 06:44:00.387 * Trying a partial resynchronization (request 0b1ed09c8d497744632c93cab960c4ca4ee9a11e:1).
1:S 05 Nov 2019 06:44:00.388 * Full resync from master: f3c311652d8860c93048eba075521df7033cab2f:38645
1:S 05 Nov 2019 06:44:00.388 * Discarding previously cached master state.
1:S 05 Nov 2019 06:44:00.486 * MASTER <-> REPLICA sync: receiving 178 bytes from master
1:S 05 Nov 2019 06:44:00.486 * MASTER <-> REPLICA sync: Flushing old data
1:S 05 Nov 2019 06:44:00.486 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 05 Nov 2019 06:44:00.486 * MASTER <-> REPLICA sync: Finished with success
1:S 05 Nov 2019 06:44:35.367 # Connection with master lost.
1:S 05 Nov 2019 06:44:35.367 * Caching the disconnected master state.
1:S 05 Nov 2019 06:44:35.464 * Connecting to MASTER 10.0.20.5:6379
1:S 05 Nov 2019 06:44:35.465 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:35.465 # Error condition on socket for SYNC: Connection refused
1:S 05 Nov 2019 06:44:36.466 * Connecting to MASTER 10.0.20.5:6379
1:S 05 Nov 2019 06:44:36.466 * MASTER <-> REPLICA sync started
1:M 05 Nov 2019 06:44:40.748 # Setting secondary replication ID to f3c311652d8860c93048eba075521df7033cab2f, valid up to offset: 46004. New replication ID is 77213f07383dd307e4b6d917b6a8789de42cad20
1:M 05 Nov 2019 06:44:40.748 * Discarding previously cached master state.
1:M 05 Nov 2019 06:44:40.748 * MASTER MODE enabled (user request from 'id=16 addr=10.0.20.7:60576 fd=17 name=sentinel-38a1e461-cmd age=31 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=140 qbuf-free=32628 obl=36 oll=0 omem=0 events=r cmd=exec')
1:M 05 Nov 2019 06:44:40.748 # CONFIG REWRITE executed with success.
1:M 05 Nov 2019 06:44:41.881 * Replica redis-replica-2:6379 asks for synchronization
1:M 05 Nov 2019 06:44:41.881 * Partial resynchronization request from redis-replica-2:6379 accepted. Sending 881 bytes of backlog starting from offset 46004.
1:S 05 Nov 2019 06:44:43.132 # Connection with replica redis-replica-2:6379 lost.
1:S 05 Nov 2019 06:44:43.132 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 05 Nov 2019 06:44:43.132 * REPLICAOF 10.0.20.2:6379 enabled (user request from 'id=24 addr=10.0.20.7:60636 fd=15 name=sentinel-38a1e461-cmd age=3 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=291 qbuf-free=32477 obl=36 oll=0 omem=0 events=r cmd=exec')
1:S 05 Nov 2019 06:44:43.133 # CONFIG REWRITE executed with success.
1:S 05 Nov 2019 06:44:43.484 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:44:43.484 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:43.484 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:43.484 * Master replied to PING, replication can continue...
1:S 05 Nov 2019 06:44:43.484 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:44:43.484 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 05 Nov 2019 06:44:44.489 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:44:44.489 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:44.489 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:44.489 * Master replied to PING, replication can continue...
1:S 05 Nov 2019 06:44:44.490 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:44:44.490 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 05 Nov 2019 06:44:45.489 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:44:45.490 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:45.490 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:45.490 * Master replied to PING, replication can continue...
1:S 05 Nov 2019 06:44:45.490 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:44:45.490 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 05 Nov 2019 06:44:46.493 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:44:46.493 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:46.493 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:46.493 * Master replied to PING, replication can continue...
1:S 05 Nov 2019 06:44:46.493 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:44:46.494 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 05 Nov 2019 06:44:47.493 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:44:47.494 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:47.494 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:47.494 * Master replied to PING, replication can continue...
1:S 05 Nov 2019 06:44:47.494 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).

<-- omitted few entries for the same errors as above for better readability -->

1:S 05 Nov 2019 06:45:21.575 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:45:21.575 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:45:21.575 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:45:21.575 * Master replied to PING, replication can continue...
1:S 05 Nov 2019 06:45:21.575 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:45:21.575 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 05 Nov 2019 06:45:22.456 * REPLICAOF 10.0.20.10:6379 enabled (user request from 'id=113 addr=10.0.20.7:60950 fd=12 name=sentinel-38a1e461-cmd age=5 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=150 qbuf-free=32618 obl=36 oll=0 omem=0 events=r cmd=exec')
1:S 05 Nov 2019 06:45:22.456 # CONFIG REWRITE executed with success.
1:S 05 Nov 2019 06:45:22.577 * Connecting to MASTER 10.0.20.10:6379
1:S 05 Nov 2019 06:45:22.577 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:45:22.577 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:45:22.577 * Master replied to PING, replication can continue...
1:S 05 Nov 2019 06:45:22.577 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:45:22.577 * Successful partial resynchronization with master.
1:S 05 Nov 2019 06:45:22.577 # Master replication ID changed to 3235720aad34423d6f82f9db4a953042c1f16d58
1:S 05 Nov 2019 06:45:22.577 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.


哨兵日志文件 (在故障转移开始时添加了额外的换行符)
root@3708cf05eca4:/data# cat sentinel.log
1:X 05 Nov 2019 06:40:49.116 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 05 Nov 2019 06:40:49.116 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 05 Nov 2019 06:40:49.116 # Configuration loaded
1:X 05 Nov 2019 06:40:49.117 * Running mode=sentinel, port=26379.
1:X 05 Nov 2019 06:40:49.117 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:X 05 Nov 2019 06:40:49.119 # Sentinel ID is 38a1e461910e17fb7be79e695040074df2dde2df
1:X 05 Nov 2019 06:40:49.119 # +monitor master eaas-redis-master 10.0.20.2 6379 quorum 2
1:X 05 Nov 2019 06:40:49.120 * +slave slave redis-replica-1:6379 10.0.20.5 6379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:40:51.183 * +sentinel sentinel 3b0831ce9f6aff70f9bf45f4211d66ebfd1c6a21 10.0.20.33 26379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:40:59.150 * +slave slave redis-replica-2:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:40:59.202 * +fix-slave-config slave redis-replica-1:6379 10.0.20.5 6379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:41:01.362 * +sentinel sentinel 464f3750404b419fccf513784f40baf7f6622cba 10.0.20.41 26379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:41:09.249 * +fix-slave-config slave redis-replica-2:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.2 6379



1:X 05 Nov 2019 06:43:48.513 # +sdown master eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:43:48.594 # +new-epoch 1
1:X 05 Nov 2019 06:43:48.595 # +vote-for-leader 464f3750404b419fccf513784f40baf7f6622cba 1
1:X 05 Nov 2019 06:43:48.613 # +odown master eaas-redis-master 10.0.20.2 6379 #quorum 2/2
1:X 05 Nov 2019 06:43:48.613 # Next failover delay: I will not start a failover before Tue Nov  5 06:43:59 2019
1:X 05 Nov 2019 06:43:49.732 # +config-update-from sentinel 464f3750404b419fccf513784f40baf7f6622cba 10.0.20.41 26379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:43:49.732 # +switch-master eaas-redis-master 10.0.20.2 6379 10.0.20.5 6379
1:X 05 Nov 2019 06:43:49.732 * +slave slave 10.0.20.10:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:43:49.732 * +slave slave 10.0.20.2:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:43:49.785 * +slave slave redis-replica-2:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:43:59.816 * +convert-to-slave slave 10.0.20.2:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:09.832 * +slave slave redis-master:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379



1:X 05 Nov 2019 06:44:40.453 # +sdown master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.524 # +odown master eaas-redis-master 10.0.20.5 6379 #quorum 2/2
1:X 05 Nov 2019 06:44:40.524 # +new-epoch 2
1:X 05 Nov 2019 06:44:40.524 # +try-failover master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.525 # +vote-for-leader 38a1e461910e17fb7be79e695040074df2dde2df 2
1:X 05 Nov 2019 06:44:40.525 # 3b0831ce9f6aff70f9bf45f4211d66ebfd1c6a21 voted for 3b0831ce9f6aff70f9bf45f4211d66ebfd1c6a21 2
1:X 05 Nov 2019 06:44:40.528 # 464f3750404b419fccf513784f40baf7f6622cba voted for 38a1e461910e17fb7be79e695040074df2dde2df 2
1:X 05 Nov 2019 06:44:40.580 # +elected-leader master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.580 # +failover-state-select-slave master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.681 # +selected-slave slave redis-master:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.681 * +failover-state-send-slaveof-noone slave redis-master:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.748 * +failover-state-wait-promotion slave redis-master:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:41.003 # +promoted-slave slave redis-master:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:41.003 # +failover-state-reconf-slaves master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:41.101 * +slave-reconf-sent slave 10.0.20.10:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:41.598 # -odown master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:42.050 * +slave-reconf-inprog slave 10.0.20.10:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:42.050 * +slave-reconf-done slave 10.0.20.10:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:42.107 * +slave-reconf-sent slave redis-replica-2:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:43.056 * +slave-reconf-inprog slave redis-replica-2:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:43.056 * +slave-reconf-done slave redis-replica-2:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:43.132 * +slave-reconf-sent slave 10.0.20.2:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:44.111 * +slave-reconf-inprog slave 10.0.20.2:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:46.056 # +failover-end-for-timeout master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:46.056 # +failover-end master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:46.056 * +slave-reconf-sent-be slave redis-master:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:46.056 * +slave-reconf-sent-be slave 10.0.20.2:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:46.056 # +switch-master eaas-redis-master 10.0.20.5 6379 10.0.20.2 6379
1:X 05 Nov 2019 06:44:46.057 * +slave slave 10.0.20.10:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:44:46.057 * +slave slave 10.0.20.5:6379 10.0.20.5 6379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:44:51.062 # +sdown slave 10.0.20.5:6379 10.0.20.5 6379 @ eaas-redis-master 10.0.20.2 6379


1:X 05 Nov 2019 06:45:11.226 # +sdown master eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:45:16.233 # +new-epoch 3
1:X 05 Nov 2019 06:45:16.234 # +vote-for-leader 464f3750404b419fccf513784f40baf7f6622cba 3
1:X 05 Nov 2019 06:45:16.535 # +odown master eaas-redis-master 10.0.20.2 6379 #quorum 3/2
1:X 05 Nov 2019 06:45:16.535 # Next failover delay: I will not start a failover before Tue Nov  5 06:45:26 2019
1:X 05 Nov 2019 06:45:17.285 # +config-update-from sentinel 464f3750404b419fccf513784f40baf7f6622cba 10.0.20.41 26379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:45:17.285 # +switch-master eaas-redis-master 10.0.20.2 6379 10.0.20.10 6379
1:X 05 Nov 2019 06:45:17.285 * +slave slave 10.0.20.5:6379 10.0.20.5 6379 @ eaas-redis-master 10.0.20.10 6379
1:X 05 Nov 2019 06:45:17.285 * +slave slave 10.0.20.2:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.10 6379
1:X 05 Nov 2019 06:45:22.456 * +fix-slave-config slave 10.0.20.2:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.10 6379
1:X 05 Nov 2019 06:45:26.347 * +slave slave redis-replica-1:6379 10.0.20.5 6379 @ eaas-redis-master 10.0.20.10 6379
1:X 05 Nov 2019 06:45:26.348 * +slave slave redis-master:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.10 6379
root@3708cf05eca4:/data#

所以我想知道为什么sentinel只为master节点用replicaof重写配置文件(它只发生在master节点,而不是副本节点提升到MASTER模式时)
我如何改进这种情况,以便 master 节点可以在故障转移期间再次以 MASTER 模式运行,如果哨兵在故障转移期间将其选中。

如果需要更多信息,请告诉我。

下面是我启动 docker swarm 堆栈时主节点和副本节点的 redis 配置文件。
redis.conf(主)
dir /data/

replica-announce-ip {{REDIS_MASTER}}

save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error no
rdbchecksum yes

redis.conf(副本)
replicaof {{REDIS_MASTER}} 6379
dir /data/

replica-announce-ip {{REDIS_REPLICA}}

save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error no
rdbchecksum yes

最佳答案

docker 服务与 docker 容器 IP 存在一些特定问题:https://github.com/moby/moby/issues/30963

因此,当您设置 REDIS_MASTER_HOST: redis-master (例如)环境时,它指向 docker 服务 IP,而不是实际的 redis-master 容器 IP - 这种行为破坏了哨兵逻辑。

我在 docker-compose 中使用了这个配置(注意 dnsrr endpoint_mode):

redis-master:
 image: bitnami/redis:5.0.9
 environment:
   REDIS_REPLICATION_MODE: master
   ALLOW_EMPTY_PASSWORD: 'yes'
   REDIS_AOF_ENABLED: 'no'
 deploy:
   endpoint_mode: dnsrr

此配置为 redis-master DNS 服务记录和容器本身提供了一个 IP。但在这种情况下,redis-master 容器重启后的 IP 地址会改变,所以在失败后我用命令重新创建哨兵配置:
# on redis-master
redis-cli SLAVEOF <new IP master> 6379
# on all sentinel nodes
redis-cli -p 26379 SENTINEL REMOVE mymaster
redis-cli -p 26379 SENTINEL monitor mymaster <new IP redis-master> 6379 2

在此操作后 redis-master 将正确更改角色。

另外一个选项:

我要求在 docker swarm 环境中为 redis 获取正确 IP 的功能:https://github.com/bitnami/bitnami-docker-redis/issues/174

我创建了 fork :https://github.com/tartemov/bitnami-docker-redis (dns_lookup 函数有一处变化)

在这种情况下,docker-compose 文件将如下所示:
services:
 redis-master:
  image: ndocker-registry/redis:5.0.9
  hostname: "redis-master"
   environment:
     REDIS_REPLICATION_MODE: master
     ALLOW_EMPTY_PASSWORD: 'yes'
     REDIS_AOF_ENABLED: 'no'
 redis-slave-1:
   image: docker-registry/redis:5.0.9
   environment:
     REDIS_REPLICATION_MODE: slave
     REDIS_MASTER_HOST: redis-master
     ALLOW_EMPTY_PASSWORD: 'yes'
     REDIS_AOF_ENABLED: 'no'
sentinel:
  image: bitnami/redis-sentinel:5.0.9-debian-10-r49
  environment:
    REDIS_MASTER_HOST: redis-master
    ALLOW_EMPTY_PASSWORD: 'yes'
    REDIS_AOF_ENABLED: 'no'
    REDIS_SENTINEL_DOWN_AFTER_MILLISECONDS: 5000
    REDIS_SENTINEL_FAILOVER_TIMEOUT: 60000
  deploy:
    mode: replicated
    replicas: 3

注意主机名属性 - 有了这个属性和新的 dns_lookup 函数 redis 可以注释 IP 地址

而且您不需要运行任何手动操作 - 服务地址不会改变,哨兵会为任何 redis 节点正确设置角色

关于docker - 故障转移后,哨兵无法将初始主服务器提升回主模式,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/58706332/

相关文章:

javascript - 最快的手指优先应用程序架构 - Redis?

redis - 如何在 Redis 3.2.6 Sentinel 中禁用保护模式?

docker - Minikube、Kubernetes、Docker Compose、Docker Swarm 等的区别

r - docker中如何将文件复制到每个用户的空间

Docker 仅在第一次 vagrant up 时构建和运行

java - 从 Docker 容器访问主机 Java

node.js - 为 localhost 生成 MinIO presignedUrls,而不是 docker 服务名称

json - Redis:何时使用哈希与 RedisJSON?

python - 如何在Docker群中使用ray

docker - 在 swarm 集群中的特定节点上运行 docker 容器