distributed-computing - How does a Raft follower rejoin after a network disconnect?

Tags: distributed-computing distributed-system raft

I have a question about Raft.

In the paper "In Search of an Understandable Consensus Algorithm (Extended Version)", it says:

To begin an election, a follower increments its current term and transitions to candidate state. (in section 5.2)

It also says:

The receiver should "Reply false if args.term < currentTerm" in both the AppendEntries RPC and the RequestVote RPC.

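So consider this scenario: a Raft cluster has 5 machines. Machine 0 is the leader, machines 1 through 4 are followers, and the current term is 1. Suddenly machine 1 loses its network connection. Its election timer expires, so it starts a leader election and sends RequestVote RPCs, which are bound to fail because the network is down. It then times out again and starts another election, and so on. Machine 1's term is incremented many times, perhaps up to 10. Once its term has reached 10, machine 1 reconnects to the network. The leader (machine 0) sends a heartbeat to machine 1, and machine 1 rejects it because machine 0's term is lower than its own. Does this mean machine 1 can no longer rejoin the cluster?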
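The term inflation described in this scenario can be sketched in a few lines. This is only an illustration under simplifying assumptions: the `Node` class, its fields, and the `on_election_timeout` method are hypothetical names, not taken from the paper or from any Raft library, and the vote count optimistically treats every reachable peer as a granted vote.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical, heavily simplified Raft node (no log, no RPC layer)."""
    node_id: int
    current_term: int = 1
    state: str = "follower"

    def on_election_timeout(self, reachable_peers: int, cluster_size: int) -> None:
        # Section 5.2: increment the current term and become a candidate.
        self.current_term += 1
        self.state = "candidate"
        # Own vote plus (optimistically) one vote per reachable peer.
        votes = 1 + reachable_peers
        if votes > cluster_size // 2:
            self.state = "leader"


# Machine 1 is partitioned: it can reach none of the other 4 machines.
machine1 = Node(node_id=1)
for _ in range(9):  # nine election timeouts while disconnected
    machine1.on_election_timeout(reachable_peers=0, cluster_size=5)

print(machine1.current_term, machine1.state)  # 10 candidate
```

Each timeout bumps the term by one and never produces a majority, so the isolated machine ends up at term 10 while the rest of the cluster is still at term 1.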

Best answer

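The important thing to remember here is that when a node receives a greater term, it always updates its local term. So, since machine 1 will reject the leader's request, the leader will ultimately learn about the higher term (10) from the rejection and step down, and then a new leader will be elected for a term greater than 10.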
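A minimal sketch of that exchange, assuming a stripped-down node with no log and no real RPC transport. The class and method names (`Node`, `observe_term`, `handle_append_entries`, `handle_append_entries_reply`) are hypothetical; only the two rules they encode come from the paper: reply false when the sender's term is lower, and adopt any higher term seen in a request or response while reverting to follower.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Node:
    """Hypothetical simplified Raft node; log consistency checks are omitted."""
    node_id: int
    current_term: int
    state: str  # "follower", "candidate" or "leader"

    def observe_term(self, term: int) -> None:
        # Rule for all servers: a higher term is adopted and forces follower state.
        if term > self.current_term:
            self.current_term = term
            self.state = "follower"

    def handle_append_entries(self, leader_term: int) -> Tuple[int, bool]:
        # Receiver side of a heartbeat: returns (current_term, success).
        self.observe_term(leader_term)
        if leader_term < self.current_term:
            return self.current_term, False  # stale leader: reply false
        self.state = "follower"  # a candidate yields to a legitimate leader
        return self.current_term, True

    def handle_append_entries_reply(self, reply_term: int) -> None:
        # Sender side: the reply carries the receiver's term.
        self.observe_term(reply_term)


leader = Node(node_id=0, current_term=1, state="leader")
machine1 = Node(node_id=1, current_term=10, state="candidate")

reply_term, success = machine1.handle_append_entries(leader.current_term)
leader.handle_append_entries_reply(reply_term)

print(success, leader.current_term, leader.state)  # False 10 follower
```

After the old leader steps down, the next election timeout on any node starts an election at term 11 or higher. Which node wins depends on the log up-to-date check in RequestVote, which this sketch leaves out.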

Obviously this is inefficient, which is why most real-world implementations use the so-called "pre-vote" protocol: a node checks that it can win an election before it transitions to the candidate role and increments its term.
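The pre-vote idea can be sketched roughly as follows. This is an illustration under assumptions, not the behavior of any particular implementation: `Peer`, `Node`, `handle_pre_vote`, and the `heard_from_leader_recently` flag are hypothetical names, and the log up-to-date check that a real pre-vote would also perform is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Peer:
    """Hypothetical view of another cluster member."""
    reachable: bool
    current_term: int
    heard_from_leader_recently: bool

    def handle_pre_vote(self, proposed_term: int) -> bool:
        # Grant only if no live leader is known and the proposed term is newer.
        # Assumption: the log up-to-date check is left out of this sketch.
        return (not self.heard_from_leader_recently) and proposed_term > self.current_term


@dataclass
class Node:
    current_term: int
    state: str = "follower"

    def on_election_timeout(self, peers: List[Peer]) -> None:
        proposed = self.current_term + 1  # asked about, but not adopted yet
        grants = 1  # the node counts itself
        for peer in peers:
            if peer.reachable and peer.handle_pre_vote(proposed):
                grants += 1
        cluster_size = len(peers) + 1
        if grants > cluster_size // 2:
            # Only now is the term incremented and a real election started.
            self.current_term = proposed
            self.state = "candidate"
        # Otherwise the node stays a follower and its term is unchanged.


# Partitioned machine 1: nobody is reachable, so its term never grows.
unreachable = [Peer(reachable=False, current_term=1, heard_from_leader_recently=True)
               for _ in range(4)]
machine1 = Node(current_term=1)
for _ in range(9):
    machine1.on_election_timeout(unreachable)
print(machine1.current_term, machine1.state)  # 1 follower

# Leader really is gone: peers are reachable and have not heard from a leader.
idle = [Peer(reachable=True, current_term=1, heard_from_leader_recently=False)
        for _ in range(4)]
machine2 = Node(current_term=1)
machine2.on_election_timeout(idle)
print(machine2.current_term, machine2.state)  # 2 candidate
```

With this check in place, the disconnected machine from the question would come back at term 1 and simply accept the existing leader's heartbeat, so the healthy leader is not forced to step down.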

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/47568168/
