文档章节

一致性算法探寻(扩展版)4

戴的天
 戴的天
发布于 2015/08/06 15:09
字数 1681
阅读 51
收藏 1

5.2 Leader election

Raft uses a heartbeat mechanism to trigger leader election. When servers start up, they begin as followers. A server remains in follower state as long as it receives valid RPCs from a leader or candidate. Leaders send periodic heartbeats (AppendEntries RPCs that carry no log entries) to all followers in order to maintain their authority. If a follower receives no communication over a period of time called the election timeout, then it assumes there is no viable leader and begins an election to choose a new leader.

To begin an election, a follower increments its current term and transitions to candidate state. It then votes for itself and issues RequestVote RPCs in parallel to each of the other servers in the cluster. A candidate continues in this state until one of three things happens: (a) it wins the election, (b) another server establishes itself as leader, or (c) a period of time goes by with no winner. These outcomes are discussed separately in the paragraphs below.

A candidate wins an election if it receives votes from a majority of the servers in the full cluster for the same term. Each server will vote for at most one candidate in a given term, on a first-come-first-served basis (note: Section 5.4 adds an additional restriction on votes). The majority rule ensures that at most one candidate can win the election for a particular term (the Election Safety Property in Figure 3). Once a candidate wins an election, it becomes leader. It then sends heartbeat messages to all of the other servers to establish its authority and prevent new elections.

While waiting for votes, a candidate may receive an AppendEntries RPC from another server claiming to be leader. If the leader's term (included in its RPC) is at least as large as the candidate's current term, then the candidate recognizes the leader as legitimate and returns to follower state. If the term in the RPC is smaller than the candidate's current term, then the candidate rejects the RPC and continues in candidate state.

The third possible outcome is that a candidate neither wins nor loses the election: if many followers become candidates at the same time, votes could be split so that no candidate obtains a majority. When this happens, each candidate will time out and start a new election by incrementing its term and initiating another round of RequestVote RPCs. However, without extra measures split votes could repeat indefinitely.

Raft uses randomized election timeouts to ensure that split votes are rare and that they are resolved quickly. To prevent split votes in the first place, election timeouts are chosen randomly from a fixed interval (e.g., 150–300ms).This spreads out the servers so that in most cases only a single server will time out; it wins the election and sends heartbeats before any other servers time out. The same mechanism is used to handle split votes. Each candidate restarts its randomized election timeout at the start of an election, and it waits for that timeout to elapse before starting the next election; this reduces the likelihood of another split vote in the new election. Section 9.3 shows that this approach elects a leader rapidly.

Elections are an example of how understandability guided our choice between design alternatives. Initially we planned to use a ranking system: each candidate was assigned a unique rank, which was used to select between competing candidates. If a candidate discovered another candidate with higher rank, it would return to follower state so that the higher ranking candidate could more easily win the next election. We found that this approach created subtle issues around availability (a lower-ranked server might need to time out and become a candidate again if a higher-ranked server fails, but if it does so too soon, it can reset progress towards electing a leader). We made adjustments to the algorithm several times, but after each adjustment new corner cases appeared. Eventually we concluded that the randomized retry approach is more obvious and understandable.

5.2 leader选举

Raft使用心跳机制来触发leader选举。当服务器启动时,他们将处于follower状态。服务器一直处于follower状态,直到收到leader或是candidate的有效RPC。leader定期发送心跳(没有日志条目的AppendEntries RPC)给follower以保持权限。如果一个follower一段时间内没有得到通信叫做选举超时,然后它假设没有有效的leader,发起一个新的选举来选出一个leader。

要开始选举时,follower自增自己的当前term,并将状态切换至candidate。然后它将投票给自己,并以并行的方式发送RequestVote RPC给集群中的其他服务器。candidate一直保持这种状态,直到以下三种情况之一发生:(a)它赢了选举,(b)另一个服务器已经成为了leader,(c)一段时间内都没有产生leader。下面将针对这些结果分开讨论。

一个candidate在相同的term里集群中大多数的服务器都投票给它,赢得选举。在一个给定的term里,每个服务器最多投出一票,按照先到先服务的原则(备注:5.4节增加了投票的额外限制)。大多数原则保证在特定的term,只有一个candidate可以赢得选举(图3中的 Election Safety Property)。一旦candidate选举成功,成为leader。然后它将发送心跳给其他服务器以保持权限和防止新的选举产生。

等待投票期间,candidate可能收到其他也要参选的服务器发来的AppendEntries RPC。如果leader的term(包括其RPC)就相当于candidate的term,那么candidate承认该leader的合法性并转为follower状态。如果RPC的term比candidate的term小,那么candidate将会拒绝RPC并继续保持candidate状态。

第三个可能的结果是一个candidate既没有赢也没有输:假如很多follower同时成为了candidate,选票就会被分割也就没有candidate能赢得大多数选票。发生这种情况时,每个candidate都会超时,增加他们的term并启动新一轮的RequestVote RPC来开始一个新的选举。然而没有其他举措,选票将会无限复制。

Raft用随机选举超时来保证选票分散是小概率事件而且能很快得到解决。为了防止开始就选票分散,选举超时在一个合理的时间间隔内(例如150-300ms)随机选取。这让服务器以便在大多数情况下只有一个服务器赢得选举;它赢得了选举,并在其他服务器超时前发送心跳。相同的处理机制也被用于投票分散的情况。每个candidate在选举开始的时候重启它的选举超时,它的超时将在新一轮选举开始前不停增加;这减少了新一轮选举投票分散的情况的可能性。9.3节展示了这种策略能快速选举一个leader。

选举是我们选择设计方案时如何易懂的一个例子。最初,我们打算使用一个排名系统:每个candidate分配一个唯一的用于选择参选candidate的排名。假如一个candidate发现另一个candidate排名比较高,就返回follower状态以便那个高排名的candidate可以容易的获得下一轮选举的胜利。我们发现这种策略就产生可用性的微妙问题了(假如一个高排位的服务器挂了的时候,低排位的服务器将会超时并重新成为candidate,不过如果这个发生的太快,就会重置leader选举的进度)。我们做了几次算法调整,不过每次调整就会发现新的极端状况出现。最后,我们得出结论,随机重试的方式是比较明显和易懂的。

© 著作权归作者所有

戴的天
粉丝 16
博文 63
码字总数 83564
作品 0
杭州
技术主管
私信 提问
一致性算法探寻(扩展版)10

8 Client interaction This section describes how clients interact with Raft, including how clients find the cluster leader and how Raft supports linearizable semantics [10]. Thes......

戴的天
2015/08/11
82
0
一致性算法探寻(扩展版)9

7 Log compaction Raft’s log grows during normal operation to incorporate more client requests, but in a practical system, it cannot grow without bound. As the log grows longer......

戴的天
2015/08/10
94
0
Raft算法官方论文中文翻译

寻找一种易于理解的一致性算法(扩展版) https://github.com/maemual/raft-zhcn/blob/master/raft-zhcn.md

满小茂
2016/04/14
374
0
一致性hash和solr千万级数据分布式搜索引擎中的应用

一致性hash和solr千万级数据分布式搜索引擎中的应用 互联网创业中大部分人都是草根创业,这个时候没有强劲的服务器,也没有钱去买很昂贵的海量数据库。在这样严峻的条件下,一批又一批的创业...

张升强
2013/10/16
1K
2
《算法之道》精华 经典算法部分

《算法之道》精华 经典算法部分 本书作者邹恒明,作者另有一本书《数据结构之弦》,以及《操作系统之哲学原理》都是很好的书 这本书可以算得上是深入浅出,文笔很好,作者添加了很多自己的思...

modernizr
2014/08/11
546
1

没有更多内容

加载失败,请刷新页面

加载更多

OSChina 周日乱弹 —— 别问,问就是没空

Osc乱弹歌单(2019)请戳(这里) 【今日歌曲】 @tom_tdhzz :#今日歌曲推荐# 分享容祖儿/彭羚的单曲《心淡》: 《心淡》- 容祖儿/彭羚 手机党少年们想听歌,请使劲儿戳(这里) @wqp0010 :周...

小小编辑
今天
243
5
golang微服务框架go-micro 入门笔记2.1 micro工具之micro api

micro api micro 功能非常强大,本文将详细阐述micro api 命令行的功能 重要的事情说3次 本文全部代码https://idea.techidea8.com/open/idea.shtml?id=6 本文全部代码https://idea.techidea8....

非正式解决方案
今天
5
0
Spring Context 你真的懂了吗

今天介绍一下大家常见的一个单词 context 应该怎么去理解,正确的理解它有助于我们学习 spring 以及计算机系统中的其他知识。 1. context 是什么 我们经常在编程中见到 context 这个单词,当...

Java知其所以然
昨天
5
0
Spring Boot + Mybatis-Plus 集成与使用(二)

前言: 本章节介绍MyBatis-Puls的CRUD使用。在开始之前,先简单讲解下上章节关于Spring Boot是如何自动配置MyBatis-Plus。 一、自动配置 当Spring Boot应用从主方法main()启动后,首先加载S...

伴学编程
昨天
8
0
用最通俗的方法讲spring [一] ──── AOP

@[TOC](用最通俗的方法讲spring [一] ──── AOP) 写这个系列的目的(可以跳过不看) 自己写这个系列的目的,是因为自己是个比较笨的人,我曾一度怀疑自己的智商不适合干编程这个行业.因为在我...

小贼贼子
昨天
8
0

没有更多内容

加载失败,请刷新页面

加载更多

返回顶部
顶部