Mirroristas/etcd

mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Author	SHA1	Message	Date
Wine93	165ba72593	raft/log_test: fixed wrong index	2019-08-26 12:37:07 -07:00
Wine93	9c850ccef0	raft: fixed some typos and simplify minor logic	2019-08-26 12:37:02 -07:00
nilsocket	18a077d3d3	raft : Write compact if statements	2019-08-23 00:36:44 -07:00
Tobias Schottdorf	982a8c9bc3	rafttest: print Ready before processing it It was confusing to see the effects of the Ready (i.e. log messages) printed before the Ready itself.	2019-08-16 08:10:17 -07:00
Tobias Schottdorf	b8e3e4e7cb	raft: fix a test file name	2019-08-16 08:10:07 -07:00
Tobias Schottdorf	4090edfb5b	raft: document problem with leader self-removal When a leader removes itself, it will retain its leadership but not accept new proposals, making the range effectively stuck until manual intervention triggers a campaign event. This commit documents the behavior. It does not correct it yet.	2019-08-16 08:09:56 -07:00
Tobias Schottdorf	078caccce5	raft: add a batch of interaction-driven conf change tests Verify the behavior in various v1 and v2 conf change operations. This also includes various fixups, notably it adds protection against transitioning in and out of new configs when this is not permissible. There are more threads to pull, but those are left for future commits.	2019-08-16 08:09:44 -07:00
Tobias Schottdorf	d177b7f6b4	raft: proactively probe newly added followers When the leader applied a new configuration that added voters, it would not immediately probe these voters, delaying when they would be caught up. I noticed this while writing an interaction-driven test, which has now been cleaned up and completed.	2019-08-16 08:09:33 -07:00
Tobias Schottdorf	2c1a1d8c32	rafttest: add _breakpoint directive It is a helper case to attach a debugger to when a problem needs to be investigated in a longer test file. In such a case, add the following stanza immediately before the interesting behavior starts: _breakpoint: ---- ok and set a breakpoint on the _breakpoint case.	2019-08-16 08:09:23 -07:00
Tobias Schottdorf	0fc108428e	raft: initialize new Progress at LastIndex, not LastIndex+1 Initializing at LastIndex+1 meant that new peers would not be probed immediately when they appeared in the leader's config, which delays their getting caught up.	2019-08-16 08:09:11 -07:00
Tobias Schottdorf	df489e7a2c	raft/rafttest: fix stabilize handler It was bailing out too early.	2019-08-16 08:08:28 -07:00
Tobias Schottdorf	ac6b604bb8	raft/rafttest: introduce datadriven testing It has often been tedious to test the interactions between multi-member Raft groups, especially when many steps were required to reach a certain scenario. Often, this boilerplate was as boring as it is hard to write and hard to maintain, making it attractive to resort to shortcuts whenever possible, which in turn tended to undercut how meaningful and maintainable the tests ended up being - that is, if the tests were even written, which sometimes they weren't. This change introduces a datadriven framework specifically for testing deterministically the interaction between multiple members of a raft group with the goal of reducing the friction for writing these tests to near zero. In the near term, this will be used to add thorough testing for joint consensus (which is already available today, but wildly undertested), but just converting an existing test into this framework has shown that the concise representation and built-in inspection of log messages highlights unexpected behavior much more readily than the previous unit tests did (the test in question is `snapshot_succeed_via_app_resp`; the reader is invited to compare the old and new version of it). The main building block is `InteractionEnv`, which holds on to the state of the whole system and exposes various relevant methods for manipulating it, including but not limited to adding nodes, delivering and dropping messages, and proposing configuration changes. All of this is extensible so that in the future I hope to use it to explore the phenomena discussed in https://github.com/etcd-io/etcd/issues/7625#issuecomment-488798263 which requires injecting appropriate "crash points" in the Ready handling loop. Discussions of the "what if X happened in state Y" can quickly be made concrete by "scripting up an interaction test". Additionally, this framework is intentionally not kept internal to the raft package. Though this is in its infancy, a goal is that it should be possible for a suite of interaction tests to allow applications to validate that their Storage implementation behaves accordingly, simply by running a raft-provided interaction suite against their Storage.	2019-08-12 08:10:29 -07:00
Tobias Schottdorf	dbe5198c45	raft: fix restoring joint configurations While writing interaction tests for joint configuration changes, I realized that this wasn't working yet - restoring had no notion of the joint configuration and was simply dropping it on the floor. This commit introduces a helper `confchange.Restore` which takes a `ConfState` and initializes a `Tracker` from it. This is then used both in `(*raft).restore` as well as in `newRaft`.	2019-08-09 11:18:40 -07:00
Tobias Schottdorf	39d0f4e53c	confchange: clean up unnecessary block	2019-08-09 11:18:30 -07:00
nilsocket	a8b4213ec0	raft : `newRaft()` does check for validity of `Config`	2019-08-09 11:18:06 -07:00
Tobias Schottdorf	a945379ce4	raft/tracker: visit Progress in stable order This is helpful for upcoming testing work which allows datadriven testing of the interaction of multiple nodes. This testing requires determinism to work correctly.	2019-08-09 08:39:52 -07:00
Tobias Schottdorf	7a50cd7074	raft/quorum: remove unused type	2019-08-09 08:39:44 -07:00
Tobias Schottdorf	9018b3dc4d	raft: let learners vote It turns out that learners must be allowed to cast votes. This seems counter-intuitive but is necessary in the situation in which a learner has been promoted (i.e. is now a voter) but has not learned about this yet. For example, consider a group in which id=1 is a learner and id=2 and id=3 are voters. A configuration change promoting 1 can be committed on the quorum `{2,3}` without the config change being appended to the learner's log. If the leader (say 2) fails, there are de facto two voters remaining. Only 3 can win an election (due to its log containing all committed entries), but to do so it will need 1 to vote. But 1 considers itself a learner and will continue to do so until 3 has stepped up as leader, replicates the conf change to 1, and 1 applies it. Ultimately, by receiving a request to vote, the learner realizes that the candidate believes it to be a voter, and that it should act accordingly. The candidate's config may be stale, too; but in that case it won't win the election, at least in the absence of the bug discussed in: https://github.com/etcd-io/etcd/issues/7625#issuecomment-488798263.	2019-08-08 09:10:21 -07:00
Gyuho Lee	4e43a082b2	raft: use mutex in "SetLogger" to avoid race conditions in tests Signed-off-by: Gyuho Lee <leegyuho@amazon.com>	2019-07-29 15:43:19 -07:00
Gyuho Lee	936c506e8d	Merge pull request #10945 from tbg/add-todo raft: leave TODO about leaving StateSnapshot	2019-07-29 13:51:38 -07:00
Tobias Schottdorf	3b02d4c5ff	raft: leave TODO about leaving StateSnapshot The condition is overly strict, which has popped up in CockroachDB recently.	2019-07-26 23:19:34 +02:00
Gyuho Lee	c7c9428f6b	raft: move "RawNode", clarify tick miss Signed-off-by: Gyuho Lee <leegyuho@amazon.com>	2019-07-24 23:35:36 -07:00
Tobias Schottdorf	721127da12	raft: require app to consume result from Ready() I changed `(*RawNode).Ready`'s behavior in #10892 in a problematic way. Previously, `Ready()` would create and immediately "accept" a Ready (i.e. commit the app to actually handling it). In #10892, Ready() became a pure read-only operation and the "accepting" was moved to `Advance(rd)`. As a result it was illegal to use the RawNode in certain ways while the Ready was being handled. Failure to do so would result in dropped messages (and perhaps worse). For example, with the following operations 1. `rd := rawNode.Ready()` 2. `rawNode.Step(someMsg)` 3. `rawNode.Advance(rd)` `someMsg` would be dropped, because `Advance()` would clear out the outgoing messages thinking that they had all been handled by the client. I mistakenly assumed that this restriction had existed prior, but this is incorrect. I noticed this while trying to pick up the above PR in CockroachDB, where it caused unit test failures, precisely due to the above example. This PR reestablishes the previous behavior (result of `Ready()` must be handled by the app) and adds a regression test. While I was there, I carried out a few small clarifying refactors.	2019-07-23 22:45:01 +02:00
Tobias Schottdorf	b9c051e7a7	raftpb: clean up naming in ConfChange	2019-07-23 10:40:03 +02:00
Tobias Schottdorf	b67303c6a2	raft: allow use of joint quorums This change introduces joint quorums by changing the Node and RawNode API to accept pb.ConfChangeV2 (on top of pb.ConfChange). pb.ConfChange continues to work as today: it allows carrying out a single configuration change. A pb.ConfChange proposal gets added to the Raft log as such and is thus also observed by the app during Ready handling, and fed back to ApplyConfChange. ConfChangeV2 allows joint configuration changes but will continue to carry out configuration changes in "one phase" (i.e. without ever entering a joint config) when this is possible.	2019-07-23 10:40:03 +02:00
Tobias Schottdorf	88f5561733	raft: use ConfChangeSingle internally	2019-07-23 10:39:48 +02:00
Tobias Schottdorf	10680744b9	raft: introduce protos for joint quorums	2019-07-23 10:39:48 +02:00
Tobias Schottdorf	caa48bcc3d	raft: remove TestNodeBoundedLogGrowthWithPartition It has a data race between the test's call to `reduceUncommittedSize` and a corresponding call during Ready handling in `(*node).run()`. The corresponding RawNode test still verifies the functionality, so instead of fixing the test we can remove it.	2019-07-19 12:35:14 +02:00
Tobias Schottdorf	500af91653	raft: restore ability to bootstrap RawNode We are worried about breaking backwards compatibility for any application out there that may have relied on the old behavior. Their RawNode invocation would have been broken by the removal of the peers argument so it would not have changed silently; an associated comment tells callers how to fix it.	2019-07-19 10:02:02 +02:00
Tobias Schottdorf	c9491d7861	raft: clean up bootstrap This is the first (maybe not last) step in cleaning up the bootstrap code around StartNode. Initializing a Raft group for the first time is awkward, since a configuration has to be pulled from thin air. The way this is solved today is unclean: The app is supposed to pass peers to StartNode(), we add configuration changes for them to the log, immediately pretend that they are applied, but actually leave them unapplied (to give the app a chance to observe them, though if the app did decide to not apply them things would really go off the rails), and then return control to the app. The app will then process the initial Readys and as a result the configuration will be persisted to disk; restarts of the node then use RestartNode which doesn't take any peers. The code that did this lived awkwardly in two places fairly deep down the callstack, though it was really only necessary in StartNode(). This commit refactors things to make this more obvious: only StartNode does this dance now. In particular, RawNode does not support this at all any more; it expects the app to set up its Storage correctly. Future work may provide helpers to make this "preseeding" of the Storage more user-friendly. It isn't entirely straightforward to do so since the Storage interface doesn't provide the right accessors for this purpose. Briefly speaking, we want to make sure that a non-bootstrapped node can never catch up via the log so that we can implicitly use one of the "skipped" log entries to represent the configuration change into the bootstrap configuration. This is an invasive change that affects all consumers of raft, and it is of lower urgency since the code (post this commit) already encapsulates the complexity sufficiently.	2019-07-19 10:02:02 +02:00
Tobias Schottdorf	c62b7048b5	raft: use RawNode for node's event loop It has always bugged me that any new feature essentially needed to be tested twice due to the two ways in which apps can use raft (`node` and `RawNode`). Due to upcoming testing work for joint consensus, now is a good time to rectify this somewhat. This commit removes most logic from `(node).run` and uses `RawNode` internally. This simplifies the logic and also led (via debugging) to some insight on how the semantics of the approaches differ, which is now documented in the comments.	2019-07-19 09:59:59 +02:00
Jingyi Hu	233be58056	Merge pull request #10839 from needkane/pr raft: update log info and annotation	2019-07-18 23:26:44 -07:00
Tobias Schottdorf	6b0322549f	raft: replace StatusWithoutProgress with BasicStatus Now that a Config is also added to the full status, the old name did not convey the intention, which was to get a Status without an associated allocation.	2019-07-18 16:28:37 +02:00
Tobias Schottdorf	7ce934cbec	raft: return active config in Status This is useful for debug purposes, and more so once we support joint quorums.	2019-07-17 14:29:45 +02:00
Tobias Schottdorf	26a1e60eab	raft: return non-nil Inflights in raft status Recent refactoring to the String() method of `Progress` hit an NPE because we return nil Inflights as part of the Raft status. Just fix this at the source and properly populate the Raft status instead of teaching String() to ignore nil. A real Progress always has a non-nil Inflights.	2019-07-17 12:53:28 +02:00
Tobias Schottdorf	aa158f36b9	raft: internally support joint consensus This commit introduces machinery to safely apply joint consensus configuration changes to Raft. The main contribution is the new package, `confchange`, which offers the primitives `Simple`, `EnterJoint`, and `LeaveJoint`. The first two take a list of configuration changes. `Simple` only declares success if these configuration changes (applied atomically) change the set of voters by at most one (i.e. it's fine to add or remove any number of learners, but change only one voter). `EnterJoint` makes the configuration joint and then applies the changes to it, in preparation of the caller returning later and transitioning out of the joint config into the final desired configuration via `LeaveJoint()`. This commit streamlines the conversion between voters and learners, which is now generally allowed whenever the above conditions are upheld (i.e. it's not possible to demote a voter and add a new voter in the context of a Simple configuration change, but it is possible via EnterJoint). Previously, we had the artificial restriction that a voter could not be demoted to a learner, but had to be removed first. Even though demoting a voter is generally less useful than promoting a learner (the latter is used to catch up future voters), demotions could see use in improved handling of temporary node unavailability, where it is desired to remove voting power from a down node, but to preserve its data should it return. An additional change that was made in this commit is to prevent the use of empty commit quorums, which was previously possible but for no good reason; this: Closes #10884. The work left to do in a future PR is to actually expose joint configurations to the applications using Raft. This will entail mostly API design and the addition of suitable testing, which to be carried out ergonomically is likely to motivate a larger refactor. Touches #7625.	2019-07-16 15:36:04 +02:00
Changkun Ou	856097181b	raft/rafttest: simulate async send in node test To cover the case in which a message arrives while a node is paused, this commit sends messages asynchronously using a goroutine and a random sleep. This change makes it possible for recvms to cache messages while node.pause is in effect.	2019-07-13 16:22:33 +02:00
Tobias Grieger	b2274efee0	Merge pull request #10864 from tbg/learner-snap raft: allow voter to become learner through snapshot	2019-07-11 15:48:09 +02:00
Tobias Schottdorf	95024fa3cc	raft: optimize string representation of Progress Make it less verbose by omitting the values for the steady state. Also rearrange the order so that information that is typically more relevant is printed first.	2019-07-09 11:22:37 +02:00
Tobias Schottdorf	6f009d211f	raft: allow voter to become learner through snapshot At the time of writing, we don't allow configuration changes to change voters to learners directly, but note that a snapshot may compress multiple changes to the configuration into one: the voter could have been removed, then readded as a learner and the snapshot reflects both changes. In that case, a voter receives a snapshot telling it that it is now a learner. In fact, the node has to accept that snapshot, or it is permanently cut off from the Raft log. I think this just wasn't realized in the original work, but this is just my guess since there generally is very little rationale on the various decisions made. I also generally haven't been able to figure out whether the decision to prevent voters from becoming learners without first having been removed was motivated by some particular concern, or if it just wasn't deemed necessary. I suspect it is the latter because demoting a voter seems perfectly safe. See https://github.com/etcd-io/etcd/pull/8751#issuecomment-342028091.	2019-07-08 09:32:24 +02:00
Tobias Schottdorf	6697adfff8	raft/tracker: pull Voters and Learners into Config struct This is helpful to quickly print the configuration log messages without having to specify Voters and Learners separately. It will also come in handy for joint quorums because it allows holding on to voters and learners as a unit, which is useful for unit testing.	2019-07-03 21:26:42 +02:00
Tobias Schottdorf	b171e1c78b	raft: centralize configuration change application Put all the logic related to applying a configuration change in one place in preparation for adding joint consensus. This inspired various TODOs. I had to rewrite TestSnapshotSucceedViaAppResp since it was relying on a snapshot applied to the leader, which is now prevented.	2019-07-03 21:26:42 +02:00
kane	4f7d83a249	raft: update log info and annotation	2019-07-02 23:43:56 -04:00
Xiang Li	d506962fec	Merge pull request #10848 from spzala/raftthesis10831 raftdoc: fix raft thesis link	2019-06-28 12:43:32 -07:00
Sahdev P. Zala	655ab0ac6a	raftdoc: fix raft thesis link The current link does not work and is no longer valid per Stanford support. Replace all current refs with the link that is used by https://raft.github.io/ Fixes https://github.com/etcd-io/etcd/issues/10831	2019-06-24 19:01:00 -04:00
Tobias Schottdorf	f9c2d00fb3	raft: extract 'tracker' package Mechanically extract `progressTracker`, `Progress`, and `inflights` to their own package named `tracker`. Add lots of comments in the progress, and take the opportunity to rename and clarify various fields.	2019-06-21 22:15:00 +02:00
Tobias Schottdorf	e262542d6d	quorum: fix vet failure This slipped in during a rename and I didn't see it in CI because of CI flakiness and a general intransparency about which failures are important.	2019-06-20 23:40:08 +02:00
Tobias Schottdorf	e039629907	raft: use half-populated joint quorum To ease a future transition into joint quorums, this commit replaces the previous "ad-hoc" majority-based quorum and vote computations with those introduced in the `raft/quorum` package. More specifically, the progressTracker now uses a quorum.JointConfig for which the "second" majority quorum is always empty; in this case the quorum behaves like the one quorum.MajorityConfig that is actually present. Or, more briefly, this change is a no-op, but it will take the busywork out of actually starting to make use of joint quorums in the future. On a side note, I suspect that this might've fixed a bug regarding the read index though I haven't been able to explicitly come up with a counter-example. The problem was that the acks collected for the read index weren't taking into account membership changes, so they'd run the danger of using acks from nodes since removed to claim that a quorum of acks had been received. There's a chance that there isn't a counter-example (the only guarantee extracted from the "quorum" is that there isn't another leader, but even if there's another leader all that matters is that that leader doesn't have a divergent history from the stale leader in the hypothetical counter-example), but either way there is morally a bug here that is now fixed because VoteCommitted doesn't care about votes from members that are not voters known to the currently active configuration.	2019-06-19 14:19:35 +02:00
Tobias Schottdorf	0384c587eb	raft: rename makeP{RS,rogressTracker}	2019-06-19 14:19:35 +02:00
Tobias Schottdorf	3def2364e4	raft: use membership sets in progress tracking Instead of having disjoint mappings of ID to Progress for voters and learners, use a map[id]struct{} for each and share a map of Progress among them. This is easier to handle when joint quorums are introduced, at which point a node may be a voting member of two quorums.	2019-06-19 14:19:35 +02:00
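
The "half-populated joint quorum" and "membership sets" commits above can be illustrated with a small sketch. This is not the code from `raft/quorum`: the type names `majority` and `joint` and the method `won` are hypothetical stand-ins chosen for this example. It assumes only what the commit messages state: voters are kept as an ID set, a joint config needs a majority in both halves, an empty half behaves as if it were absent, and votes from non-voters are ignored.

```go
package main

import "fmt"

// majority is a set of voter IDs (hypothetical name, mirroring the
// map[id]struct{} membership sets described in the commit message).
type majority map[uint64]struct{}

// won reports whether the granted votes form a majority of m.
// An empty set always wins, so a joint config whose second half is
// empty behaves exactly like its first half alone.
func (m majority) won(votes map[uint64]bool) bool {
	if len(m) == 0 {
		return true
	}
	granted := 0
	for id := range m {
		// Only IDs that are members of m are consulted, so votes from
		// nodes outside the configuration never count.
		if votes[id] {
			granted++
		}
	}
	return granted >= len(m)/2+1
}

// joint is a pair of majority configs; a vote must win in both.
type joint [2]majority

func (j joint) won(votes map[uint64]bool) bool {
	return j[0].won(votes) && j[1].won(votes)
}

func main() {
	// Node 4 grants its vote but is not a voter in the first config.
	votes := map[uint64]bool{1: true, 2: true, 4: true}

	half := joint{majority{1: {}, 2: {}, 3: {}}, majority{}}
	full := joint{majority{1: {}, 2: {}, 3: {}}, majority{3: {}, 4: {}, 5: {}}}

	fmt.Println(half.won(votes)) // true: 2 of 3 in the only populated half
	fmt.Println(full.won(votes)) // false: only 1 of 3 in the second half
}
```

With the second half left empty, the result depends only on the first majority, which matches the commit's description of the change as a no-op until joint configurations are actually used.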

954 Commits