Mirroristas/etcd

mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Author	SHA1	Message	Date
Tobias Schottdorf	500af91653	raft: restore ability to bootstrap RawNode We are worried about breaking backwards compatibility for any application out there that may have relied on the old behavior. Their RawNode invocation would have been broken by the removal of the peers argument so it would not have changed silently; an associated comment tells callers how to fix it.	2019-07-19 10:02:02 +02:00
Tobias Schottdorf	c9491d7861	raft: clean up bootstrap This is the first (maybe not last) step in cleaning up the bootstrap code around StartNode. Initializing a Raft group for the first time is awkward, since a configuration has to be pulled from thin air. The way this is solved today is unclean: The app is supposed to pass peers to StartNode(), we add configuration changes for them to the log, immediately pretend that they are applied, but actually leave them unapplied (to give the app a chance to observe them, though if the app did decide to not apply them things would really go off the rails), and then return control to the app. The app will then process the initial Readys and as a result the configuration will be persisted to disk; restarts of the node then use RestartNode which doesn't take any peers. The code that did this lived awkwardly in two places fairly deep down the callstack, though it was really only necessary in StartNode(). This commit refactors things to make this more obvious: only StartNode does this dance now. In particular, RawNode does not support this at all any more; it expects the app to set up its Storage correctly. Future work may provide helpers to make this "preseeding" of the Storage more user-friendly. It isn't entirely straightforward to do so since the Storage interface doesn't provide the right accessors for this purpose. Briefly speaking, we want to make sure that a non-bootstrapped node can never catch up via the log so that we can implicitly use one of the "skipped" log entries to represent the configuration change into the bootstrap configuration. This is an invasive change that affects all consumers of raft, and it is of lower urgency since the code (post this commit) already encapsulates the complexity sufficiently.	2019-07-19 10:02:02 +02:00
Tobias Schottdorf	c62b7048b5	raft: use RawNode for node's event loop It has always bugged me that any new feature essentially needed to be tested twice due to the two ways in which apps can use raft (`node` and `RawNode`). Due to upcoming testing work for joint consensus, now is a good time to rectify this somewhat. This commit removes most logic from `(node).run` and uses `RawNode` internally. This simplifies the logic and also led (via debugging) to some insight on how the semantics of the approaches differ, which is now documented in the comments.	2019-07-19 09:59:59 +02:00
Tobias Schottdorf	6b0322549f	raft: replace StatusWithoutProgress with BasicStatus Now that a Config is also added to the full status, the old name did not convey the intention, which was to get a Status without an associated allocation.	2019-07-18 16:28:37 +02:00
Tobias Schottdorf	b171e1c78b	raft: centralize configuration change application Put all the logic related to applying a configuration change in one place in preparation for adding joint consensus. This inspired various TODOs. I had to rewrite TestSnapshotSucceedViaAppResp since it was relying on a snapshot applied to the leader, which is now prevented.	2019-07-03 21:26:42 +02:00
Tobias Schottdorf	f9c2d00fb3	raft: extract 'tracker' package Mechanically extract `progressTracker`, `Progress`, and `inflights` to their own package named `tracker`. Add lots of comments in the progress, and take the opportunity to rename and clarify various fields.	2019-06-21 22:15:00 +02:00
Gyuho Lee	34bd797e67	*: revert module import paths Signed-off-by: Gyuho Lee <leegyuho@amazon.com>	2019-05-28 15:39:35 -07:00
Tobias Schottdorf	a11563737c	raft: use progress tracker APIs in more places This doesn't completely eliminate access to prs.nodes, but that's not really necessary. This commit uses the existing APIs in a few more places where it's convenient, and also sprinkles some assertions.	2019-05-21 16:02:52 +02:00
Tobias Schottdorf	ea82b2b758	raft: move more methods onto the progress tracker Continues what was initiated in the last commit.	2019-05-21 16:02:52 +02:00
Tobias Schottdorf	dbac67e7a8	raft: extract progress tracking into own component The Progress maps contain both the active configuration and information about the replication status. By pulling it into its own component, this becomes easier to unit test and also clarifies the code, which will see changes as etcd-io/etcd#7625 is addressed. More functionality will move into `prs` in self-contained follow-up commits.	2019-05-21 16:02:52 +02:00
shivaramr	9150bf52d6	go modules: Fix module path version to include version number	2019-04-26 15:29:50 -07:00
Tobias Schottdorf	bd332b318e	raft: add (*RawNode).WithProgress Calls to Status can be frequent and currently incur three heap allocations, but often the caller has no intention to hold on to the returned status. Add StatusWithoutProgress and WithProgress to allow avoiding heap allocations altogether. StatusWithoutProgress does what's on the tin and additionally returns a value (instead of a pointer) to avoid the associated heap allocation. By not returning a Progress map, it avoids all other allocations that Status incurs. To still introspect the Progress map, add WithProgress, which uses a simple visitor pattern. Add benchmarks to verify that this is indeed allocation free. ``` BenchmarkStatusProgress/members=1/Status-8 5000000 353 ns/op 784 B/op 3 allocs/op BenchmarkStatusProgress/members=1/Status-example-8 5000000 372 ns/op 784 B/op 3 allocs/op BenchmarkStatusProgress/members=1/StatusWithoutProgress-8 100000000 17.6 ns/op 0 B/op 0 allocs/op BenchmarkStatusProgress/members=1/WithProgress-8 30000000 48.6 ns/op 0 B/op 0 allocs/op BenchmarkStatusProgress/members=1/WithProgress-example-8 30000000 42.9 ns/op 0 B/op 0 allocs/op BenchmarkStatusProgress/members=3/Status-8 5000000 395 ns/op 784 B/op 3 allocs/op BenchmarkStatusProgress/members=3/Status-example-8 3000000 449 ns/op 784 B/op 3 allocs/op BenchmarkStatusProgress/members=3/StatusWithoutProgress-8 100000000 18.7 ns/op 0 B/op 0 allocs/op BenchmarkStatusProgress/members=3/WithProgress-8 20000000 78.1 ns/op 0 B/op 0 allocs/op BenchmarkStatusProgress/members=3/WithProgress-example-8 20000000 70.7 ns/op 0 B/op 0 allocs/op BenchmarkStatusProgress/members=5/Status-8 3000000 470 ns/op 784 B/op 3 allocs/op BenchmarkStatusProgress/members=5/Status-example-8 3000000 544 ns/op 784 B/op 3 allocs/op BenchmarkStatusProgress/members=5/StatusWithoutProgress-8 100000000 19.7 ns/op 0 B/op 0 allocs/op BenchmarkStatusProgress/members=5/WithProgress-8 20000000 105 ns/op 0 B/op 0 allocs/op BenchmarkStatusProgress/members=5/WithProgress-example-8 20000000 94.0 ns/op 0 B/op 0 allocs/op BenchmarkStatusProgress/members=100/Status-8 100000 11903 ns/op 22663 B/op 12 allocs/op BenchmarkStatusProgress/members=100/Status-example-8 100000 13330 ns/op 22669 B/op 12 allocs/op BenchmarkStatusProgress/members=100/StatusWithoutProgress-8 50000000 20.9 ns/op 0 B/op 0 allocs/op BenchmarkStatusProgress/members=100/WithProgress-8 1000000 1731 ns/op 0 B/op 0 allocs/op BenchmarkStatusProgress/members=100/WithProgress-example-8 1000000 1571 ns/op 0 B/op 0 allocs/op ```	2018-12-06 19:02:48 +01:00
Nathan VanBenschoten	f89b06dc6d	raft: provide protection against unbounded Raft log growth The suggested pattern for Raft proposals is that they be retried periodically until they succeed. This turns out to be an issue when a leader cannot commit entries because the leader will continue to append re-proposed entries to its log without committing anything. This can result in the uncommitted tail of a leader's log growing without bound until it is able to commit entries. This change adds a safeguard to protect against this case where a leader's log can grow without bound during loss of quorum scenarios. It does so by introducing a new, optional `MaxUncommittedEntriesSize` configuration. This config limits the max aggregate size of uncommitted entries that may be appended to a leader's log. Once this limit is exceeded, proposals will begin to return ErrProposalDropped errors. See cockroachdb/cockroach#27772	2018-10-13 23:25:05 -04:00
Tobias Schottdorf	7a8ab37bfd	raft: fix correctness bug in CommittedEntries pagination In #9982, a mechanism to limit the size of `CommittedEntries` was introduced. The way this mechanism worked was that it would load applicable entries (passing the max size hint) and would emit a `HardState` whose commit index was truncated to match the limitation applied to the entries. Unfortunately, this was subtly incorrect when the user-provided `Entries` implementation didn't exactly match what Raft uses internally. Depending on whether a `Node` or a `RawNode` was used, this would either lead to regressing the HardState's commit index or outright forgetting to apply entries, respectively. Asking implementers to precisely match the Raft size limitation semantics was considered but looks like a bad idea as it puts correctness squarely in the hands of downstream users. Instead, this PR removes the truncation of `HardState` when limiting is active and tracks the applied index separately. This removes the old paradigm (that the previous code tried to work around) that the client will always apply all the way to the commit index, which isn't true when commit entries are paginated. See [1] for more on the discovery of this bug (CockroachDB's implementation of `Entries` returns one more entry than Raft's when the size limit hits). [1]: https://github.com/cockroachdb/cockroach/issues/28918#issuecomment-418174448	2018-09-04 14:52:23 +02:00
Gyuho Lee	bb60f8ab1d	raft: change import paths to "go.etcd.io/etcd" Signed-off-by: Gyuho Lee <leegyuho@amazon.com>	2018-08-28 17:47:52 -07:00
Vincent Lee	11fa4f0275	raft: raft learners should be returned after applyConfChange	2018-01-11 17:30:17 +08:00
Ben Darnell	8d8f3195e4	raft: Avoid scanning raft log in becomeLeader Scanning the uncommitted portion of the raft log to determine whether there are any pending config changes can be expensive. In cockroachdb/cockroach#18601, we've seen that a new leader can spend so much time scanning its log post-election that it fails to send its first heartbeats in time to prevent a second election from starting immediately. Instead of tracking whether a pending config change exists with a boolean, this commit tracks the latest log index at which a pending config change could exist. This is a less expensive solution to the problem, and the impact of false positives should be minimal since a newly-elected leader should be able to quickly commit the tail of its log.	2017-12-30 10:13:36 -05:00
siddontang	c6f2db2e92	raft: support learner	2017-11-11 10:38:21 +08:00
Peter Mattis	37fa6ac45c	raft: add RawNode.TickQuiesced TickQuiesced allows the caller to support "quiesced" Raft groups which do not perform periodic heartbeats and elections. This is useful in a system with thousands of Raft groups where these periodic operations can be overwhelming in an otherwise idle system. It might seem possible to avoid advancing the logical clock at all in such Raft groups, but doing so has an interaction with the CheckQuorum functionality. If a follower is not quiesced while the leader is, the follower can call an election that will fail because the leader's lease has not expired (electionElapsed < electionTimeout). The next time the leader sends a heartbeat to this follower, the follower will see that the heartbeat is from a previous term and respond with a MsgAppResp. This in turn will cause the leader to step down and become a follower even though there isn't a leader in the group. By allowing the leader's logical clock to advance via TickQuiesced, the leader won't reject the election and there will be a smooth transfer of leadership to the follower.	2016-09-15 21:05:18 -04:00
Dylan.Wen	eeca614cd3	raft: add read index for RawNode	2016-09-14 14:43:46 +08:00
Xiang Li	484f579905	raft: hide Campaign rules on applying all entries	2016-07-25 15:53:39 -07:00
Gyu-Ho Lee	fe884f8209	raft: update LICENSE header	2016-05-12 20:49:15 -07:00
es-chow	ac059eb8cb	raft: transfer leader feature	2016-04-08 16:56:32 +08:00
Ben Darnell	22925a1d2f	raft: Remove redundant `raft.Commit` field. Keeping this field in sync with `raft.raftLog.committed` was error-prone, so instead we synthesize the `HardState` on demand. Fixes #4278.	2016-01-26 15:18:55 -05:00
siddontang	54a45ba2f5	*: fix typo	2016-01-06 16:17:02 +08:00
es-chow	5bc56786dc	raft: add RawNode which is a thread-unsafe node without goroutine and remove MultiNode	2015-11-26 17:14:14 +08:00

26 Commits
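
The safeguard described in commit f89b06dc6d (a cap on the aggregate size of uncommitted entries, with proposals dropped once the cap is exceeded) can be illustrated with a toy model. This is a minimal sketch and not the etcd implementation: `toyLeader`, `propose`, `commitTo`, and `errProposalDropped` are hypothetical names invented for this example. The rule that a proposal is always admitted when nothing is uncommitted is an assumption made here so that a single oversized proposal cannot be rejected forever.

```go
package main

import (
	"errors"
	"fmt"
)

// errProposalDropped is a hypothetical stand-in for the ErrProposalDropped
// error mentioned in the commit message.
var errProposalDropped = errors.New("proposal dropped")

// toyLeader is a hypothetical, heavily simplified leader that only tracks
// its log and the total byte size of entries that are not yet committed.
type toyLeader struct {
	maxUncommittedSize uint64   // 0 means "no limit" in this sketch
	uncommittedSize    uint64   // aggregate size of uncommitted entries
	log                [][]byte // all appended entries
	committed          int      // number of entries in log that are committed
}

// propose appends data to the log unless doing so would push the
// uncommitted tail over the configured limit.
func (l *toyLeader) propose(data []byte) error {
	size := uint64(len(data))
	// Assumption of this sketch: always admit a proposal when nothing is
	// uncommitted, so one large entry cannot be blocked indefinitely.
	if l.maxUncommittedSize > 0 && l.uncommittedSize > 0 &&
		l.uncommittedSize+size > l.maxUncommittedSize {
		return errProposalDropped
	}
	l.uncommittedSize += size
	l.log = append(l.log, data)
	return nil
}

// commitTo marks the first n log entries as committed and releases their
// size from the uncommitted budget.
func (l *toyLeader) commitTo(n int) {
	if n > len(l.log) {
		n = len(l.log)
	}
	for ; l.committed < n; l.committed++ {
		l.uncommittedSize -= uint64(len(l.log[l.committed]))
	}
}

func main() {
	l := &toyLeader{maxUncommittedSize: 10}
	fmt.Println(l.propose([]byte("aaaa"))) // admitted, 4 bytes uncommitted
	fmt.Println(l.propose([]byte("bbbb"))) // admitted, 8 bytes uncommitted
	fmt.Println(l.propose([]byte("cccc"))) // dropped, would reach 12 bytes
	l.commitTo(1)                          // first entry commits, 4 bytes remain
	fmt.Println(l.propose([]byte("cccc"))) // admitted again
}
```

A caller that follows the retry pattern mentioned in the commit message would treat the dropped-proposal error as a signal to back off and retry later rather than as a permanent failure.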