The previous code was using the proto-generated `Size()` method to
track the size of an incoming proposal at the leader. This includes
the Index and Term, which were mutated after the call to `Size()`
when appending to the log. Additionally, it did not take into
account that an ignored configuration change drops the original
proposal and appends an empty entry in its place.
As a result, a fully committed Raft group could end up with a non-
zero tracked uncommitted Raft log counter that would eventually hit
the ceiling and indiscriminately drop all future proposals. It also
meant that a proposal exceeding the threshold on its own would get
refused, since the "first uncommitted proposal" special case (which
always lets that proposal in) would no longer apply.
Track only the size of the payload actually appended to the Raft log
instead.
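A minimal sketch of the corrected accounting, assuming the `raftpb`
entry type (import path varies by etcd version); only payload bytes
are counted, so the later mutation of Index and Term no longer
matters:

```go
import pb "go.etcd.io/etcd/raft/raftpb"

// payloadsSize sums only the payload bytes of the given entries.
// Unlike the proto-generated Size(), it ignores Index and Term,
// which the leader rewrites after accounting when appending.
func payloadsSize(ents []pb.Entry) uint64 {
	var s uint64
	for _, e := range ents {
		s += uint64(len(e.Data))
	}
	return s
}
```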
For context, see:
https://github.com/cockroachdb/cockroach/issues/31618#issuecomment-431374938
The suggested pattern for Raft proposals is that they be retried
periodically until they succeed. This turns out to be an issue
when a leader cannot commit entries because the leader will continue
to append re-proposed entries to its log without committing anything.
This can result in the uncommitted tail of a leader's log growing
without bound until it is able to commit entries.
This change adds a safeguard to protect against the case where a
leader's log grows without bound during loss-of-quorum scenarios.
It does so by introducing a new, optional `MaxUncommittedEntriesSize`
configuration option. This config limits the maximum aggregate size of
uncommitted entries that may be appended to a leader's log. Once this
limit is exceeded, proposals begin to return `ErrProposalDropped`
errors.
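A hedged usage sketch (values are illustrative; `ctx` and `data`
stand in for the caller's context and proposal payload):

```go
cfg := &raft.Config{
	ID:              0x01,
	ElectionTick:    10,
	HeartbeatTick:   1,
	Storage:         raft.NewMemoryStorage(),
	MaxSizePerMsg:   1 << 20, // 1 MB
	MaxInflightMsgs: 256,
	// cap the aggregate size of the leader's uncommitted log
	// tail; zero means no limit
	MaxUncommittedEntriesSize: 1 << 30,
}
n := raft.StartNode(cfg, []raft.Peer{{ID: 0x01}})

// once the limit is exceeded, proposals fail fast instead of
// growing the uncommitted log without bound
if err := n.Propose(ctx, data); err == raft.ErrProposalDropped {
	// back off and re-propose later
}
```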
See cockroachdb/cockroach#27772
We allow multiple in-flight append messages, but prior to this change
the only way we'd ever send them was if there was a steady stream of new
proposals. Catching up a follower that is far behind would be
unnecessarily slow (this is exacerbated by a quirk of CockroachDB's
use of raft which limits our ability to catch up via snapshot in some
cases).
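Roughly, the leader-side change (fragment modeled on `stepLeader`;
`pr` is the follower's Progress, heavily abridged):

```go
case pb.MsgAppResp:
	pr.RecentActive = true
	if pr.maybeUpdate(m.Index) {
		// the follower made progress: rather than waiting for the
		// next proposal, immediately send as many further appends
		// as the in-flight window (MaxInflightMsgs) allows
		for r.maybeSendAppend(m.From, false) {
		}
	}
```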
See cockroachdb/cockroach#27983
The MaxSizePerMsg setting is now used to limit the size of
Ready.CommittedEntries. This prevents out-of-memory errors if the raft
log has become very large and commits all at once.
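For the application this means committed entries may now arrive
across several Ready batches. A hedged sketch of the consuming loop
(`ticker`, `storage`, `send`, and `apply` stand in for the
application's own pieces; HardState and snapshot handling elided):

```go
for {
	select {
	case <-ticker.C:
		n.Tick()
	case rd := <-n.Ready():
		storage.Append(rd.Entries) // persist unstable entries first
		send(rd.Messages)          // then deliver outbound messages
		// committed entries arrive in batches bounded by
		// MaxSizePerMsg instead of one arbitrarily large slice
		for _, e := range rd.CommittedEntries {
			apply(e)
		}
		n.Advance()
	}
}
```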
"stepCandidate" should reuse candidate's own term, not term in Message,
because pre-vote is requested with future term.
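Roughly, the fixed branch in `stepCandidate` (fragment;
`myVoteRespType` is MsgVoteResp or MsgPreVoteResp depending on the
node's state):

```go
case myVoteRespType:
	gr := r.poll(m.From, m.Type, !m.Reject)
	switch r.quorum() {
	case gr: // won the election
		if r.state == StatePreCandidate {
			r.campaign(campaignElection)
		} else {
			r.becomeLeader()
			r.bcastAppend()
		}
	case len(r.votes) - gr: // lost the election
		// pb.MsgPreVoteResp carries the pre-candidate's *future*
		// term (m.Term > r.Term), so reuse r.Term, not m.Term
		r.becomeFollower(r.Term, None)
	}
```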
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
`raft.Step` already ensures that when `m.Term > r.Term`, the
candidate reverts to follower with its term reset to `m.Term`,
so it is always true that `m.Term == r.Term` in `stepCandidate`.
This just makes `r.becomeFollower` calls consistent.
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
This includes one theoretical logic change: A node that knows the
leader of the current term will no longer grant votes, even if it has
not yet voted in this term. It also adds a `m.Type == MsgPreVote`
guard on the `m.Term > r.Term` check, which was previously thought to
be incorrect (see #8517) but was actually just unclear.
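The resulting vote-granting condition, roughly (fragment modeled on
the top-level `Step`):

```go
canVote :=
	// we already voted for this node...
	r.Vote == m.From ||
		// ...or we haven't voted and don't know of a leader...
		(r.Vote == None && r.lead == None) ||
		// ...or this is a pre-vote for a future term
		(m.Type == pb.MsgPreVote && m.Term > r.Term)
if canVote && r.raftLog.isUpToDate(m.Index, m.LogTerm) {
	// grant the vote / pre-vote
}
```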
Closes #8517
Closes #8571
Scanning the uncommitted portion of the raft log to determine whether
there are any pending config changes can be expensive. In
cockroachdb/cockroach#18601, we've seen that a new leader can spend so
much time scanning its log post-election that it fails to send
its first heartbeats in time to prevent a second election from
starting immediately.
Instead of tracking whether a pending config change exists with a
boolean, this commit tracks the latest log index at which a pending
config change *could* exist. This is a less expensive solution to
the problem, and the impact of false positives should be minimal since
a newly-elected leader should be able to quickly commit the tail of
its log.
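A sketch of the resulting leader-side check (fragment modeled on
`stepLeader`'s MsgProp handling):

```go
for i, e := range m.Entries {
	if e.Type == pb.EntryConfChange {
		if r.pendingConfIndex > r.raftLog.applied {
			// a conf change may still be pending: ignore this
			// one by replacing it with an empty normal entry
			m.Entries[i] = pb.Entry{Type: pb.EntryNormal}
		} else {
			r.pendingConfIndex = r.raftLog.lastIndex() + uint64(i) + 1
		}
	}
}
```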
TestNodeWithSmallerTermCanCompleteElection tests the scenario where a
node that has been partitioned away (and fallen behind) rejoins the
cluster at about the same time the leader node gets partitioned away.
Previously the cluster would come to a standstill when run with PreVote
enabled.
When responding to Msg{Pre,}Vote messages we now include the term from
the message, not the local term. To see why, consider the case where a
single node was previously partitioned away and its local term is now
out of date. If we include the local term (recall that for pre-votes we
don't update the local term), the (pre-)campaigning node on the other
end will proceed to ignore the message (it ignores all out-of-date
messages).
The term in the original message and current local term are the same in
the case of regular votes, but different for pre-votes.
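Concretely (fragment; `voteRespMsgType` maps MsgVote to MsgVoteResp
and MsgPreVote to MsgPreVoteResp):

```go
// grant the (pre-)vote: echo the term from the request rather
// than r.Term; for pre-votes the two differ, and the campaigning
// node discards responses carrying an out-of-date term
r.send(pb.Message{To: m.From, Term: m.Term, Type: voteRespMsgType(m.Type)})
```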
NB: Had to change TestRecvMsgVote to include pb.Message.Term when
sending MsgVote messages. The new sanity checks on MsgVoteResp
(m.Term != 0) would panic with the old test as raft.Term would be equal
to 0 when responding with MsgVoteResp messages.
I found that enabling the CheckQuorum flag led to spurious leader
elections when new nodes joined. It looks like, in the window between a
new node joining the cluster and that node first communicating with the
leader, the quorum check could fail because the new node looks inactive.
To solve this, set the RecentActive flag when nodes are first added.
This gives a grace period for the node to communicate before it causes
the quorum check to fail.
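Roughly (fragment modeled on `addNode`):

```go
func (r *raft) addNode(id uint64) {
	// ...
	r.setProgress(id, 0, r.raftLog.lastIndex()+1)
	// treat a freshly added node as recently active, giving it a
	// grace period to contact the leader before CheckQuorum
	// counts it against quorum
	r.prs[id].RecentActive = true
}
```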
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
When an add-node configuration change is proposed twice with the same
node ID, the pending state is not reset: `addNode` returns early the
second time without clearing it, so the pending state stays true until
some other configuration change is applied. While it is set, no new
node can be added, because every new proposal is ignored while the
pending state is true.
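The fix, roughly, is to clear the flag before the duplicate check
rather than after it (sketch):

```go
func (r *raft) addNode(id uint64) {
	r.pendingConf = false // reset even if the node already exists
	if _, ok := r.prs[id]; ok {
		// duplicate addNode: the conf change itself is a no-op,
		// but returning without the reset above would leave the
		// pending flag stuck at true
		return
	}
	r.setProgress(id, 0, r.raftLog.lastIndex()+1)
}
```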
If MsgTimeoutNow arrived after a node was removed, the node could start
and win an election, then panic in `becomeLeader` (see
cockroachdb/cockroach#8535).
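The guard, roughly (fragment modeled on `stepFollower`):

```go
case pb.MsgTimeoutNow:
	if r.promotable() {
		// leadership transfer: campaign immediately
		r.campaign(campaignTransfer)
	} else {
		// a removed node must not start an election it could win
		r.logger.Infof("%x received MsgTimeoutNow but is not promotable", r.id)
	}
```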
Move all vote handling from the per-state step functions to the
top-level Step(). This wasn't necessary before because MsgVote would
cause us to become a follower, but MsgPreVote needs to be handled
without changing the node's current state.
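The resulting shape of `Step` (sketch, abridged):

```go
switch m.Type {
case pb.MsgHup:
	// start a (pre-)campaign
case pb.MsgVote, pb.MsgPreVote:
	// handled here for every state: a MsgVote turns the node into
	// a follower anyway, but a MsgPreVote must be answered without
	// touching the node's current state or term
default:
	// the remaining message types stay state-specific
	return r.step(r, m)
}
```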
rand.NewSource creates a 4872-byte object. With a small number of raft
groups in a process this isn't a problem. With 10k raft groups we'd use
46MB for these random sources. The only usage is in
raft.resetRandomizedElectionTimeout which isn't performance critical.
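The fix, roughly: a single process-wide source behind a mutex
(sketch):

```go
import (
	"math/rand"
	"sync"
	"time"
)

// one shared source instead of a ~4.9KB rand.Source per raft
// instance; a mutex is acceptable because the only caller,
// resetRandomizedElectionTimeout, is not performance critical
type lockedRand struct {
	mu   sync.Mutex
	rand *rand.Rand
}

func (r *lockedRand) Intn(n int) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.rand.Intn(n)
}

var globalRand = &lockedRand{
	rand: rand.New(rand.NewSource(time.Now().UnixNano())),
}
```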
Fixes #6347.
Previously, the checkQuorum flag required an election timeout to
expire before a node could cast its first vote. This change permits
the node to cast a vote at any time when the leader is not known,
including immediately after startup.
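In the spirit of the change, a sketch of the condition (this uses the
`inLease` formulation, which may postdate the commit itself):

```go
case m.Term > r.Term:
	if m.Type == pb.MsgVote || m.Type == pb.MsgPreVote {
		// only ignore the vote while inside the lease of a leader
		// we actually know about; a node with no known leader
		// (e.g. freshly started) may now vote immediately
		inLease := r.checkQuorum && r.lead != None &&
			r.electionElapsed < r.electionTimeout
		if inLease {
			return nil
		}
	}
```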