Mirroristas/etcd

mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Author	SHA1	Message	Date
Pavel Kalinnikov	4969aa81ae	raft: send empty appends when replication is paused When Inflights to a particular node is full, i.e. MaxInflightMsgs for the append messages flow is saturated, it is still necessary to continue sending MsgApp to ensure progress. Currently this is achieved by "forgetting" the first in-flight message in the window, which frees up quota for one new MsgApp. This new message is constructed in such a way that it potentially has multiple entries, or a large entry. The effect of this is that the in-flight limitations can be exceeded arbitrarily, for as long as the flow to this node continues being saturated. In particular, if a follower is stuck, the leader will keep sending entries to it. This commit makes the MsgApp empty when Inflights is saturated, and prevents the described leakage of Entries to slow followers. Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>	2022-11-08 22:21:39 +00:00
Pavel Kalinnikov	0a0f0ae719	raft/rafttest: add test for replication pausing This commit adds a data-driven test which simulates conditions under which the flow of Raft messages to a particular node is throttled while in StateReplicate. The test demonstrates that MsgApp messages with non-empty Entries may "leak" to a paused stream every time there is a successful heartbeat exchange. Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>	2022-11-08 22:21:38 +00:00
Nathan VanBenschoten	bd34388721	raft: broadcast MsgApp on auto-leave joint config proposal This commit ensures that the raft leader eagerly broadcasts a MsgApp to each follower when initiating an automatic transition out of a joint configuration. This had been missed previously, which could lead to delayed completion of an auto-transition. Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>	2022-09-29 12:33:20 -04:00
Tobias Grieger	f7dcb9ec2a	TestInteraction Reviewed the diff in detail. The changes here were benign, just the extra raft cycle. Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>	2022-09-20 08:59:37 +02:00
Tobias Grieger	169f4c3cc7	raft: don't emit unstable CommittedEntries See https://github.com/etcd-io/etcd/issues/14370. When run in a single-voter configuration, prior to this PR raft would emit `Ready`s that contained a proposed `Entry` simultaneously in `CommittedEntries` and `Entries`. To be correct, this requires users of the raft library to enforce an ordering between appending to the log and notifying the client about `CommittedEntries` also present in `Entries`. This was easy to miss. Walk back this behavior to arrive at a simpler contract: what's emitted in `CommittedEntries` is truly committed, i.e. present in stable storage on a quorum of voters. This in turn pessimizes the single-voter case: rather than fully handling an `Entry` in just one `Ready`, now two are required, and in particular one has to do extra work to save on allocations. We accept this as a good tradeoff, since raft primarily serves multi-voter configurations. Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>	2022-09-20 08:59:37 +02:00
Tobias Grieger	21be9fa337	raft: add single_node InteractionEnv test case Show-cases the current behavior and changes made in future commits for [^1]. The test demonstrates that a single-voter raft instance will emit an entry as committed while it still needs to be appended to the log. [^1]: https://github.com/etcd-io/etcd/issues/14370 Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>	2022-09-20 08:59:37 +02:00
shralex	ea3c86ef5b	raft: add test for leadership transfer in joint configuration	2021-10-25 14:10:27 -07:00
Nathan VanBenschoten	e51c697ec6	raft: disable XXX_NoUnkeyedLiteral, XXX_unrecognized, and XXX_sizecache fields in protos This commit removes the `XXX_NoUnkeyedLiteral`, `XXX_unrecognized`, and `XXX_sizecache` auto-generated fields from generated protobuf structs in the raft package. This was done for all of the same reasons CockroachDB removed the generation of these fields in https://github.com/cockroachdb/cockroach/pull/38404. They come with very limited advantages but moderate disadvantages. `XXX_NoUnkeyedLiteral` and `XXX_sizecache` were only enabled recently in cc7b4fa, and this appears to have been unintentional. Meanwhile, `XXX_unrecognized` has been around for longer and has arguably more reason to stay because it can assist with forwards compatibility. However, any real mixed-version upgrade story for this package is mostly untold at this point, and keeping this field seems just as likely to cause unexpected bugs (e.g. a field was propagated but not updated correctly when passed through an old version) as it seems to fix real issues, so it also doesn't warrant its cost. This reduces the in-memory representation size of all Raft protos. Notably, it reduces the memory size of an `Entry` proto from 80 bytes to 48 bytes and the memory size of a `Message` proto from 392 bytes to 264 bytes. Both of these structs are used frequently, and often in slices, where this wasted space really starts to add up. This was motivated by a regression in microbenchmarks in CockroachDB due to cc7b4fa, which was caught in https://github.com/cockroachdb/cockroach/issues/62212.	2021-03-20 03:24:18 -04:00
Tobias Grieger	c1e8d3a63f	Clarify documentation of probing - Add a large detailed comment about the use and necessity of both the follower and leader probing optimization - fix the log message in stepLeader that previously mixed up the log term for the rejection and the index of the append - improve the test via subtests - add some verbiage in findConflictByTerm around first index	2021-02-15 09:47:18 +01:00
qupeng	6828517965	raft: implement fast log rejection Signed-off-by: qupeng <qupeng@pingcap.com>	2021-02-10 15:48:32 -05:00
Nathan VanBenschoten	b757e1bc87	raft: create new probe_and_replicate.txt interactive test This commit creates a new probe_and_replicate.txt interactive test. The test creates a complete Raft log configuration and demonstrates how a leader probes and replicates to each of its followers. The log configuration constructed is identical to the one present in Figure 7 of the raft paper (https://raft.github.io/raft.pdf), which looks like:
```
     1  2  3  4  5  6  7  8  9 10 11 12
n1: [1][1][1][4][4][5][5][6][6][6]
n2: [1][1][1][4][4][5][5][6][6]
n3: [1][1][1][4]
n4: [1][1][1][4][4][5][5][6][6][6][6]
n5: [1][1][1][4][4][5][5][6][7][7][7][7]
n6: [1][1][1][4][4][4][4]
n7: [1][1][1][2][2][2][3][3][3][3][3]
```
Once in this state, we then elect node 1 as the leader and stabilize the entire raft group. This demonstrates how a newly elected leader probes for matching indexes, overwrites conflicting entries, and catches up all followers. This will be useful to demonstrate the impact of more efficient probing behavior.	2021-02-10 15:02:36 -05:00
Piotr Tabor	371ddf0b69	tests: Update diagnostic update in tests after change of proto version.	2020-10-14 18:46:38 +02:00
Tobias Schottdorf	0544f33248	raft: clarify ApplyConfChange contract for rejected conf changes Apps typically maintain the raft configuration as part of the state machine. As a result, they want to be able to reject configuration change entries at apply time based on the state on which the entry is supposed to be applied. When this happens, the app should not call ApplyConfChange, but the comments did not make this clear. As a result, it was tempting to pass an empty pb.ConfChange or its V2 version instead of not calling ApplyConfChange. However, empty V1 and V2 protos aren't no-ops when the configuration is joint: an empty V1 change is treated internally as a single configuration change for NodeID zero and will cause a panic when applied in a joint state. An empty V2 proto is treated as a signal to leave a joint state, which means that the app's config and raft's would diverge. The comments updated in this commit now ask users to not call ApplyConfChange when they reject a conf change. Apps that never use joint consensus can keep their old behavior since the distinction only matters when in a joint state, but we don't want to encourage that.	2020-02-25 12:45:45 +01:00
Tobias Schottdorf	37c7e4d1d8	raft: fix auto-transitioning out of joint config The code doing so was undertested and buggy: it would launch multiple attempts to transition out when the conf change was not the last element in the log. This commit fixes the problem and adds a regression test. It also reworks the code to handle a formerly untested edge case, in which the auto-transition append is refused. This can't happen any more with the current version of the code because this proposal has size zero and is special-cased in increaseUncommittedSize. Last but not least, the auto-leave proposal now also bumps pendingConfIndex, which was not done previously due to an oversight.	2020-02-25 12:35:51 +01:00
Tobias Schottdorf	2332705f10	raft: remove bogus tail end of membership change interaction test The test was supposed to end earlier, but some old copy pasta survived.	2020-02-25 12:35:51 +01:00
Tobias Schottdorf	47ae53d25d	rafttest: print Ready before processing it It was confusing to see the effects of the Ready (i.e. log messages) printed before the Ready itself.	2019-08-16 09:41:35 +02:00
Tobias Schottdorf	99f8046fd1	raft: fix a test file name	2019-08-16 09:38:44 +02:00
Tobias Schottdorf	8d1946d16a	raft: document problem with leader self-removal When a leader removes itself, it will retain its leadership but not accept new proposals, making the range effectively stuck until manual intervention triggers a campaign event. This commit documents the behavior. It does not correct it yet.	2019-08-16 09:38:44 +02:00
Tobias Schottdorf	306e75a96f	raft: add a batch of interaction-driven conf change tests Verify the behavior in various v1 and v2 conf change operations. This also includes various fixups; notably, it adds protection against transitioning in and out of new configs when this is not permissible. There are more threads to pull, but those are left for future commits.	2019-08-16 09:38:44 +02:00
Tobias Schottdorf	4e19150676	raft: proactively probe newly added followers When the leader applied a new configuration that added voters, it would not immediately probe these voters, delaying when they would be caught up. I noticed this while writing an interaction-driven test, which has now been cleaned up and completed.	2019-08-14 20:53:34 +02:00
Tobias Schottdorf	c2d9514370	raft/rafttest: fix stabilize handler It was bailing out too early.	2019-08-14 17:24:14 +02:00
Tobias Schottdorf	e8090e57a2	raft/rafttest: introduce datadriven testing It has often been tedious to test the interactions between multi-member Raft groups, especially when many steps were required to reach a certain scenario. Often, this boilerplate was as boring as it is hard to write and hard to maintain, making it attractive to resort to shortcuts whenever possible, which in turn tended to undercut how meaningful and maintainable the tests ended up being - that is, if the tests were even written, which sometimes they weren't. This change introduces a datadriven framework specifically for testing deterministically the interaction between multiple members of a raft group with the goal of reducing the friction for writing these tests to near zero. In the near term, this will be used to add thorough testing for joint consensus (which is already available today, but wildly undertested), but just converting an existing test into this framework has shown that the concise representation and built-in inspection of log messages highlights unexpected behavior much more readily than the previous unit tests did (the test in question is `snapshot_succeed_via_app_resp`; the reader is invited to compare the old and new version of it). The main building block is `InteractionEnv`, which holds on to the state of the whole system and exposes various relevant methods for manipulating it, including but not limited to adding nodes, delivering and dropping messages, and proposing configuration changes. All of this is extensible so that in the future I hope to use it to explore the phenomena discussed in https://github.com/etcd-io/etcd/issues/7625#issuecomment-488798263 which requires injecting appropriate "crash points" in the Ready handling loop. Discussions of the "what if X happened in state Y" can quickly be made concrete by "scripting up an interaction test". Additionally, this framework is intentionally not kept internal to the raft package. Though this is in its infancy, a goal is that it should be possible for a suite of interaction tests to allow applications to validate that their Storage implementation behaves accordingly, simply by running a raft-provided interaction suite against their Storage.	2019-08-12 11:13:51 +02:00

22 Commits