Mirroristas/etcd

mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Author	SHA1	Message	Date
wpedrak	758ff0163c	raft: postpone MsgReadIndex until first commit in the term Fixes #12680	2021-03-23 12:28:42 +01:00
Piotr Tabor	87258efd90	Integration tests: Use zaptest.Logger based testing.TB Thanks to this the logs: - are automatically printed if the test fails. - are in pretty consistent format. - are annotated by 'member' information of the cluster emitting them. Side changes: - Set proper default for DefaultWarningApplyDuration (used to be '0') - Name the members based on their 'place' on the list (as opposed to 'random')	2021-03-09 18:19:51 +01:00
Tobias Grieger	73c50b869a	Merge pull request #12637 from BusyJay/check-outgoingvoters-when-restoring raft: check `VotersOutgoing` for snapshot	2021-02-16 09:43:08 +01:00
Tobias Grieger	c1e8d3a63f	Clarify documentation of probing - Add a large detailed comment about the use and necessity of both the follower and leader probing optimization - fix the log message in stepLeader that previously mixed up the log term for the rejection and the index of the append - improve the test via subtests - add some verbiage in findConflictByTerm around first index	2021-02-15 09:47:18 +01:00
qupeng	6828517965	raft: implement fast log rejection Signed-off-by: qupeng <qupeng@pingcap.com>	2021-02-10 15:48:32 -05:00
Jay Lee	f947c815d0	raft: check `VotersOutgoing` for snapshot Close #12631. Signed-off-by: Jay Lee <BusyJayLee@gmail.com>	2021-01-21 16:09:37 +08:00
Piotr Tabor	5472b3336b	Merge pull request #12525 from sakateka/remove_raft.peers raft tests: Remove Config.peers and Config.learners	2021-01-19 16:00:54 +01:00
Sergey Kacheev	ccfd00f687	raft: specify voters and learners via snapshot	2021-01-16 13:03:47 +07:00
Piotr Tabor	bf6f173d5e	Document Raft.send method. The change makes it explicit that sending messages does not happen immediately and is subject to proper persist & then send protocol on the application side. See: https://github.com/etcd-io/etcd/issues/12589#issuecomment-752867024 for more context.	2021-01-15 12:35:58 +01:00
Piotr Tabor	e62417297d	*: Rename of imports of raft (as it's now a module) % find -name '*.go' -o -name '*.md' -o -name '*.sh' | xargs sed -i --follow-symlinks 's|etcd/v3/raft|etcd/raft/v3|g'	2020-10-16 13:58:18 +02:00
Jay	26b89fd418	raft: don't campaign with pending snapshot (#12163) Signed-off-by: Jay Lee <BusyJayLee@gmail.com>	2020-07-26 00:04:46 -07:00
Jay	d0e4fe56a5	raft: check pending conf change before campaign (#12134) * raft: check conf change before campaign Signed-off-by: Jay Lee <BusyJayLee@gmail.com> * raft: extract hup function Signed-off-by: Jay Lee <BusyJayLee@gmail.com> * raft: check pending conf change for transferleader Signed-off-by: Jay Lee <BusyJayLee@gmail.com>	2020-07-22 17:04:48 -07:00
Jay	cc656718fa	raft: correct pendingConfIndex check for AutoLeave (#12137) Close #12136 Signed-off-by: Jay Lee <BusyJayLee@gmail.com>	2020-07-20 16:49:22 -07:00
Zhihong Yu	7cc2f8a411	raft: break out of nested loop when id is found (#11870) Signed-off-by: Ted Yu <yuzhihong@gmail.com>	2020-05-12 16:59:22 -07:00
Brandon Philips	96cce208c2	go.mod: use go.etcd.io/etcd/v3 versioning This change makes the etcd package compatible with the existing Go ecosystem for module versioning. Used this tool to update package imports: https://github.com/KSubedi/gomove	2020-04-28 00:57:35 +00:00
Fullstop000	7eae024ead	raft: only redirect msg produced by own node (#11466) Signed-off-by: Fullstop000 <fullstop1005@gmail.com>	2020-04-06 20:27:46 -07:00
qupeng	6f850a65a1	raft: cleanup read index code (#11528) Signed-off-by: qupeng <qupeng@pingcap.com>	2020-03-03 09:20:25 -08:00
Tobias Schottdorf	0544f33248	raft: clarify ApplyConfChange contract for rejected conf changes Apps typically maintain the raft configuration as part of the state machine. As a result, they want to be able to reject configuration change entries at apply time based on the state on which the entry is supposed to be applied. When this happens, the app should not call ApplyConfChange, but the comments did not make this clear. As a result, it was tempting to pass an empty pb.ConfChange or its V2 version instead of not calling ApplyConfChange. However, empty V1 and V2 protos aren't noops when the configuration is joint: an empty V1 change is treated internally as a single configuration change for NodeID zero and will cause a panic when applied in a joint state. An empty V2 proto is treated as a signal to leave a joint state, which means that the app's config and raft's would diverge. The comments updated in this commit now ask users to not call ApplyConfChange when they reject a conf change. Apps that never use joint consensus can keep their old behavior since the distinction only matters when in a joint state, but we don't want to encourage that.	2020-02-25 12:45:45 +01:00
Tobias Schottdorf	37c7e4d1d8	raft: fix auto-transitioning out of joint config The code doing so was undertested and buggy: it would launch multiple attempts to transition out when the conf change was not the last element in the log. This commit fixes the problem and adds a regression test. It also reworks the code to handle a formerly untested edge case, in which the auto-transition append is refused. This can't happen any more with the current version of the code because this proposal has size zero and is special cased in increaseUncommittedSize. Last but not least, the auto-leave proposal now also bumps pendingConfIndex, which was not done previously due to an oversight.	2020-02-25 12:35:51 +01:00
qupeng	eaa0612e02	raft: abort leader transferring if the target is demoted (#11417) Signed-off-by: qupeng <qupeng@pingcap.com>	2019-12-20 12:07:52 +08:00
Wine93	5f42161750	raft: fixed some typos and simplify minor logic	2019-08-25 04:46:29 +00:00
Tobias Schottdorf	306e75a96f	raft: add a batch of interaction-driven conf change tests Verify the behavior in various v1 and v2 conf change operations. This also includes various fixups, notably it adds protection against transitioning in and out of new configs when this is not permissible. There are more threads to pull, but those are left for future commits.	2019-08-16 09:38:44 +02:00
Tobias Schottdorf	4e19150676	raft: proactively probe newly added followers When the leader applied a new configuration that added voters, it would not immediately probe these voters, delaying when they would be caught up. I noticed this while writing an interaction-driven test, which has now been cleaned up and completed.	2019-08-14 20:53:34 +02:00
Tobias Grieger	029401ab81	Merge pull request #11005 from tbg/interactiontest raft/rafttest: introduce datadriven testing	2019-08-12 11:52:52 +02:00
Tobias Schottdorf	e8090e57a2	raft/rafttest: introduce datadriven testing It has often been tedious to test the interactions between multi-member Raft groups, especially when many steps were required to reach a certain scenario. Often, this boilerplate was as boring as it is hard to write and hard to maintain, making it attractive to resort to shortcuts whenever possible, which in turn tended to undercut how meaningful and maintainable the tests ended up being - that is, if the tests were even written, which sometimes they weren't. This change introduces a datadriven framework specifically for testing deterministically the interaction between multiple members of a raft group with the goal of reducing the friction for writing these tests to near zero. In the near term, this will be used to add thorough testing for joint consensus (which is already available today, but wildly undertested), but just converting an existing test into this framework has shown that the concise representation and built-in inspection of log messages highlights unexpected behavior much more readily than the previous unit tests did (the test in question is `snapshot_succeed_via_app_resp`; the reader is invited to compare the old and new version of it). The main building block is `InteractionEnv`, which holds on to the state of the whole system and exposes various relevant methods for manipulating it, including but not limited to adding nodes, delivering and dropping messages, and proposing configuration changes. All of this is extensible so that in the future I hope to use it to explore the phenomena discussed in https://github.com/etcd-io/etcd/issues/7625#issuecomment-488798263 which requires injecting appropriate "crash points" in the Ready handling loop. Discussions of the "what if X happened in state Y" can quickly be made concrete by "scripting up an interaction test". Additionally, this framework is intentionally not kept internal to the raft package. Though this is in its infancy, a goal is that it should be possible for a suite of interaction tests to allow applications to validate that their Storage implementation behaves accordingly, simply by running a raft-provided interaction suite against their Storage.	2019-08-12 11:13:51 +02:00
Gyuho Lee	6c87b21821	raft: fix typo Signed-off-by: Gyuho Lee <leegyuho@amazon.com>	2019-08-09 21:26:48 -07:00
Tobias Schottdorf	37ab5bdd21	raft: fix restoring joint configurations While writing interaction tests for joint configuration changes, I realized that this wasn't working yet - restoring had no notion of the joint configuration and was simply dropping it on the floor. This commit introduces a helper `confchange.Restore` which takes a `ConfState` and initializes a `Tracker` from it. This is then used both in `(*raft).restore` as well as in `newRaft`.	2019-08-09 19:28:43 +02:00
Tobias Schottdorf	c30c2e345b	raft: let learners vote It turns out that learners must be allowed to cast votes. This seems counter-intuitive but is necessary in the situation in which a learner has been promoted (i.e. is now a voter) but has not learned about this yet. For example, consider a group in which id=1 is a learner and id=2 and id=3 are voters. A configuration change promoting 1 can be committed on the quorum `{2,3}` without the config change being appended to the learner's log. If the leader (say 2) fails, there are de facto two voters remaining. Only 3 can win an election (due to its log containing all committed entries), but to do so it will need 1 to vote. But 1 considers itself a learner and will continue to do so until 3 has stepped up as leader, replicates the conf change to 1, and 1 applies it. Ultimately, by receiving a request to vote, the learner realizes that the candidate believes it to be a voter, and that it should act accordingly. The candidate's config may be stale, too; but in that case it won't win the election, at least in the absence of the bug discussed in: https://github.com/etcd-io/etcd/issues/7625#issuecomment-488798263.	2019-08-07 12:03:18 +02:00
Tobias Schottdorf	3b02d4c5ff	raft: leave TODO about leaving StateSnapshot The condition is overly strict, which has popped up in CockroachDB recently.	2019-07-26 23:19:34 +02:00
Tobias Schottdorf	b9c051e7a7	raftpb: clean up naming in ConfChange	2019-07-23 10:40:03 +02:00
Tobias Schottdorf	b67303c6a2	raft: allow use of joint quorums This change introduces joint quorums by changing the Node and RawNode API to accept pb.ConfChangeV2 (on top of pb.ConfChange). pb.ConfChange continues to work as today: it allows carrying out a single configuration change. A pb.ConfChange proposal gets added to the Raft log as such and is thus also observed by the app during Ready handling, and fed back to ApplyConfChange. ConfChangeV2 allows joint configuration changes but will continue to carry out configuration changes in "one phase" (i.e. without ever entering a joint config) when this is possible.	2019-07-23 10:40:03 +02:00
Tobias Schottdorf	88f5561733	raft: use ConfChangeSingle internally	2019-07-23 10:39:48 +02:00
Jingyi Hu	233be58056	Merge pull request #10839 from needkane/pr raft: update log info and annotation	2019-07-18 23:26:44 -07:00
Tobias Schottdorf	aa158f36b9	raft: internally support joint consensus This commit introduces machinery to safely apply joint consensus configuration changes to Raft. The main contribution is the new package, `confchange`, which offers the primitives `Simple`, `EnterJoint`, and `LeaveJoint`. The first two take a list of configuration changes. `Simple` only declares success if these configuration changes (applied atomically) change the set of voters by at most one (i.e. it's fine to add or remove any number of learners, but change only one voter). `EnterJoint` makes the configuration joint and then applies the changes to it, in preparation of the caller returning later and transitioning out of the joint config into the final desired configuration via `LeaveJoint()`. This commit streamlines the conversion between voters and learners, which is now generally allowed whenever the above conditions are upheld (i.e. it's not possible to demote a voter and add a new voter in the context of a Simple configuration change, but it is possible via EnterJoint). Previously, we had the artificial restriction that a voter could not be demoted to a learner, but had to be removed first. Even though demoting a voter is generally less useful than promoting a learner (the latter is used to catch up future voters), demotions could see use in improved handling of temporary node unavailability, where it is desired to remove voting power from a down node, but to preserve its data should it return. An additional change that was made in this commit is to prevent the use of empty commit quorums, which was previously possible but for no good reason; this: Closes #10884. The work left to do in a future PR is to actually expose joint configurations to the applications using Raft. This will entail mostly API design and the addition of suitable testing, which to be carried out ergonomically is likely to motivate a larger refactor. Touches #7625.	2019-07-16 15:36:04 +02:00
Tobias Schottdorf	6f009d211f	raft: allow voter to become learner through snapshot At the time of writing, we don't allow configuration changes to change voters to learners directly, but note that a snapshot may compress multiple changes to the configuration into one: the voter could have been removed, then readded as a learner and the snapshot reflects both changes. In that case, a voter receives a snapshot telling it that it is now a learner. In fact, the node has to accept that snapshot, or it is permanently cut off from the Raft log. I think this just wasn't realized in the original work, but this is just my guess since there generally is very little rationale on the various decisions made. I also generally haven't been able to figure out whether the decision to prevent voters from becoming learners without first having been removed was motivated by some particular concern, or if it just wasn't deemed necessary. I suspect it is the latter because demoting a voter seems perfectly safe. See https://github.com/etcd-io/etcd/pull/8751#issuecomment-342028091.	2019-07-08 09:32:24 +02:00
Tobias Schottdorf	6697adfff8	raft/tracker: pull Voters and Learners into Config struct This is helpful to quickly print the configuration log messages without having to specify Voters and Learners separately. It will also come in handy for joint quorums because it allows holding on to voters and learners as a unit, which is useful for unit testing.	2019-07-03 21:26:42 +02:00
Tobias Schottdorf	b171e1c78b	raft: centralize configuration change application Put all the logic related to applying a configuration change in one place in preparation for adding joint consensus. This inspired various TODOs. I had to rewrite TestSnapshotSucceedViaAppResp since it was relying on a snapshot applied to the leader, which is now prevented.	2019-07-03 21:26:42 +02:00
kane	4f7d83a249	raft: update log info and annotation	2019-07-02 23:43:56 -04:00
Tobias Schottdorf	f9c2d00fb3	raft: extract 'tracker' package Mechanically extract `progressTracker`, `Progress`, and `inflights` to their own package named `tracker`. Add lots of comments in the progress, and take the opportunity to rename and clarify various fields.	2019-06-21 22:15:00 +02:00
Tobias Schottdorf	e039629907	raft: use half-populated joint quorum To ease a future transition into joint quorums, this commit replaces the previous "ad-hoc" majority-based quorum and vote computations with that introduced in the `raft/quorum` package. More specifically, the progressTracker now uses a quorum.JointConfig for which the "second" majority quorum is always empty; in this case the quorum behaves like the one quorum.MajorityConfig that is actually present. Or, more briefly, this change is a no-op, but it will take the busywork out of actually starting to make use of joint quorums in the future. On a side note, I suspect that this might've fixed a bug regarding the read index though I haven't been able to explicitly come up with a counter-example. The problem was that the acks collected for the read index weren't taking into account membership changes, so they'd run the danger of using acks from nodes since removed to claim that a quorum of acks had been received. There's a chance that there isn't a counter-example (the only guarantee extracted from the "quorum" is that there isn't another leader, but even if there's another leader all that matters is that that leader doesn't have a divergent history from the stale leader in the hypothetical counter-example), but either way there is morally a bug here that is now fixed because VoteCommitted doesn't care about votes from members that are not voters known to the currently active configuration.	2019-06-19 14:19:35 +02:00
Tobias Schottdorf	0384c587eb	raft: rename makeP{RS,rogressTracker}	2019-06-19 14:19:35 +02:00
Tobias Schottdorf	c844526002	raft: prevent learners from becoming leader We were already taking some precautions against learners campaigning, but there was no safeguard against an explicit call to `Campaign()`. The newly added test also verifies that leadership transfers to learners are ignored.	2019-06-17 09:20:45 +02:00
Gyuho Lee	34bd797e67	*: revert module import paths Signed-off-by: Gyuho Lee <leegyuho@amazon.com>	2019-05-28 15:39:35 -07:00
Tobias Schottdorf	5dd45011d6	raft: rename prs to progressTracker	2019-05-21 16:03:36 +02:00
Tobias Schottdorf	02b0d80234	raft: remove quorum() dependency from readOnly This now delegates the quorum computation to r.prs, which will allow it to generalize in a straightforward way when etcd-io/etcd#7625 is addressed.	2019-05-21 16:03:36 +02:00
Tobias Schottdorf	57a1b39fcd	raft: avoid another call to quorum() This particular caller just wanted to know whether it was in a single-voter cluster configuration, which is now a question prs can answer.	2019-05-21 16:02:52 +02:00
Tobias Schottdorf	bc828e939a	raft: pull checkQuorumActive into prs It's looking at each voter's Progress and needs to know how quorums work, so this is the ideal new home for it.	2019-05-21 16:02:52 +02:00
Tobias Schottdorf	a6f222e62d	raft: establish an interface around vote counting This cleans up the mechanical refactor in the last commit and will help with etcd-io/etcd#7625 as well.	2019-05-21 16:02:52 +02:00
Tobias Schottdorf	26eaadb1d1	raft: move votes into prs This is purely mechanical. Cleanup deferred to the next commit.	2019-05-21 16:02:52 +02:00
Tobias Schottdorf	a11563737c	raft: use progress tracker APIs in more places This doesn't completely eliminate access to prs.nodes, but that's not really necessary. This commit uses the existing APIs in a few more places where it's convenient, and also sprinkles some assertions.	2019-05-21 16:02:52 +02:00


397 Commits