22 Commits

Author SHA1 Message Date
Pavel Kalinnikov
4969aa81ae raft: send empty appends when replication is paused
When Inflights to a particular node is full, i.e. MaxInflightMsgs for the
append messages flow is saturated, it is still necessary to continue sending
MsgApp to ensure progress. Currently this is achieved by "forgetting" the first
in-flight message in the window, which frees up quota for one new MsgApp.

This new message is constructed in such a way that it potentially has multiple
entries, or a large entry. The effect of this is that the in-flight limitations
can be exceeded arbitrarily, for as long as the flow to this node continues
being saturated. In particular, if a follower is stuck, the leader will keep
sending entries to it.

This commit makes the MsgApp empty when Inflights is saturated, and prevents
the described leakage of Entries to slow followers.

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
2022-11-08 22:21:39 +00:00
Pavel Kalinnikov
0a0f0ae719 raft/rafttest: add test for replication pausing
This commit adds a data-driven test which simulates conditions under which Raft
messages flow to a particular node is throttled while in StateReplicate. The
test demonstrates that MsgApp messages with non-empty Entries may "leak" to a
paused stream every time there is successful heartbeat exchange.

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
2022-11-08 22:21:38 +00:00
Nathan VanBenschoten
bd34388721 raft: broadcast MsgApp on auto-leave joint config proposal
This commit ensures that the raft leader eagerly broadcasts a MsgApp to
each follower when initiating an automatic transition out of a joint
configuration. This had been missed previously, which could lead to
delayed completion of an auto-transition.

Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
2022-09-29 12:33:20 -04:00
Tobias Grieger
f7dcb9ec2a TestInteraction
Reviewed the diff in detail.
The changes here were benign, just the extra raft cycle.

Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-09-20 08:59:37 +02:00
Tobias Grieger
169f4c3cc7 raft: don't emit unstable CommittedEntries
See https://github.com/etcd-io/etcd/issues/14370.

When run in a single-voter configuration, prior to this PR
raft would emit `HardState`s that would emit a proposed `Entry`
simultaneously in `CommittedEntries` and `Entries`.

To be correct, this requires users of the raft library to enforce an
ordering between appending to the log and notifying the client about
`CommittedEntries` also present in `Entries`. This was easy to miss.

Walk back this behavior to arrive at a simpler contract: what's
emitted in `CommittedEntries` is truly committed, i.e. present
in stable storage on a quorum of voters.

This in turn pessimizes the single-voter case: rather than fully
handling an `Entry` in just one `Ready`, now two are required,
and in particular one has to do extra work to save on allocations.

We accept this as a good tradeoff, since raft primarily serves
multi-voter configurations.

Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-09-20 08:59:37 +02:00
Tobias Grieger
21be9fa337 raft: add single_node InteractionEnv test case
Show-cases the current behavior and changes made in future commits for [^1].

The test demonstrates that a single-voter raft instance will emit an
entry as committed while it still needs to be appended to the log.

[^1]: https://github.com/etcd-io/etcd/issues/14370

Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-09-20 08:59:37 +02:00
shralex
ea3c86ef5b raft: add test for leadership transfer in joint configuration 2021-10-25 14:10:27 -07:00
Nathan VanBenschoten
e51c697ec6 raft: disable XXX_NoUnkeyedLiteral, XXX_unrecognized, and XXX_sizecache fields in protos
This commit removes the `XXX_NoUnkeyedLiteral`, `XXX_unrecognized`, and
`XXX_sizecache` auto-generated fields from generated protobuf structs in
the raft package. This was done for all of the same reasons CockroachDB
removed the generation of these fields in https://github.com/cockroachdb/cockroach/pull/38404.
They come with very limited advantages but moderate disadvantages.

`XXX_NoUnkeyedLiteral` and `XXX_sizecache` were only enabled recently in
cc7b4fa, and this appears to have been unintentional. Meanwhile,
`XXX_unrecognized` has been around for longer and has arguably more
reason to stay because it can assist with forwards compatibility.
However, any real mixed-version upgrade story for this package is mostly
untold at this point, and keeping this field seems just as likely to
cause unexpected bugs (e.g. a field was propagated but not updated
correctly when passed through an old version) as it seems to fix real
issues, so it also doesn't warrant its cost.

This reduces the in-memory representation size of all Raft protos.
Notably, it reduces the memory size of an `Entry` proto from *80 bytes*
to *48 bytes* and the memory size of a `Message` proto from *392 bytes*
to *264 bytes*. Both of these structs are used frequently, and often in
slices, where this wasted space really starts to add up.

This was motivated by a regression in microbenchmarks in CockroachDB due
to cc7b4fa, which was caught in https://github.com/cockroachdb/cockroach/issues/62212.
2021-03-20 03:24:18 -04:00
Tobias Grieger
c1e8d3a63f Clarify documentation of probing
- Add a large detailed comment about the use and necessity of
  both the follower and leader probing optimization
- fix the log message in stepLeader that previously mixed up the
  log term for the rejection and the index of the append
- improve the test via subtests
- add some verbiage in findConflictByTerm around first index
2021-02-15 09:47:18 +01:00
qupeng
6828517965 raft: implement fast log rejection
Signed-off-by: qupeng <qupeng@pingcap.com>
2021-02-10 15:48:32 -05:00
Nathan VanBenschoten
b757e1bc87 raft: create new probe_and_replicate.txt interactive test
This commit creates a new probe_and_replicate.txt interactive test. The
test creates a complete Raft log configuration and demonstrates how a
leader probes and replicates to each of its followers. The log
configuration constructed is identical to the one present in Figure 7 of
the raft paper (https://raft.github.io/raft.pdf), which looks like:

```
     1  2  3  4  5  6  7  8  9  10 11 12
n1: [1][1][1][4][4][5][5][6][6][6]
n2: [1][1][1][4][4][5][5][6][6]
n3: [1][1][1][4]
n4: [1][1][1][4][4][5][5][6][6][6][6]
n5: [1][1][1][4][4][5][5][6][7][7][7][7]
n6: [1][1][1][4][4][4][4]
n7: [1][1][1][2][2][2][3][3][3][3][3]
```

Once in this state, we then elect node 1 as the leader and stabilize the
entire raft group. This demonstrates how a newly elected leader probes
for matching indexes, overwrites conflicting entries, and catches up all
followers.

This will be useful to demonstrate the impact of more efficient probing
behavior.
2021-02-10 15:02:36 -05:00
Piotr Tabor
371ddf0b69 tests: Update diagnostic update in tests after change of proto version. 2020-10-14 18:46:38 +02:00
Tobias Schottdorf
0544f33248 raft: clarify ApplyConfChange contract for rejected conf changes
Apps typically maintain the raft configuration as part of the state
machine. As a result, they want to be able to reject configuration change
entries at apply time based on the state on which the entry is supposed
to be applied. When this happens, the app should not call
ApplyConfChange, but the comments did not make this clear.

As a result, it was tempting to pass an empty pb.ConfChange or it's V2
version instead of not calling ApplyConfChange.

However, an empty V1 or V2 proto aren't noops when the configuration is
joint: an empty V1 change is treated internally as a single
configuration change for NodeID zero and will cause a panic when applied
in a joint state. An empty V2 proto is treated as a signal to leave a
joint state, which means that the app's config and raft's would diverge.

The comments updated in this commit now ask users to not call
ApplyConfState when they reject a conf change. Apps that never use joint
consensus can keep their old behavior since the distinction only matters
when in a joint state, but we don't want to encourage that.
2020-02-25 12:45:45 +01:00
Tobias Schottdorf
37c7e4d1d8 raft: fix auto-transitioning out of joint config
The code doing so was undertested and buggy: it would launch multiple
attempts to transition out when the conf change was not the last element
in the log.

This commit fixes the problem and adds a regression test. It also
reworks the code to handle a former untested edge case, in which the
auto-transition append is refused. This can't happen any more with the
current version of the code because this proposal has size zero and is
special cased in increaseUncommittedSize. Last but not least, the
auto-leave proposal now also bumps pendingConfIndex, which was not done
previously due to an oversight.
2020-02-25 12:35:51 +01:00
Tobias Schottdorf
2332705f10 raft: remove bogus tail end of membership change interaction test
The test was supposed to end earlier, but some old copy pasta
survived.
2020-02-25 12:35:51 +01:00
Tobias Schottdorf
47ae53d25d rafttest: print Ready before processing it
It was confusing to see the effects of the Ready (i.e. log messages)
printed before the Ready itself.
2019-08-16 09:41:35 +02:00
Tobias Schottdorf
99f8046fd1 raft: fix a test file name 2019-08-16 09:38:44 +02:00
Tobias Schottdorf
8d1946d16a raft: document problem with leader self-removal
When a leader removes itself, it will retain its leadership but not
accept new proposals, making the range effectively stuck until manual
intervention triggers a campaign event.

This commit documents the behavior. It does not correct it yet.
2019-08-16 09:38:44 +02:00
Tobias Schottdorf
306e75a96f raft: add a batch of interaction-driven conf change tests
Verifiy the behavior in various v1 and v2 conf change operations.
This also includes various fixups, notably it adds protection
against transitioning in and out of new configs when this is not
permissible.

There are more threads to pull, but those are left for future commits.
2019-08-16 09:38:44 +02:00
Tobias Schottdorf
4e19150676 raft: proactively probe newly added followers
When the leader applied a new configuration that added voters, it would
not immediately probe these voters, delaying when they would be caught
up.

I noticed this while writing an interaction-driven test, which has now
been cleaned up and completed.
2019-08-14 20:53:34 +02:00
Tobias Schottdorf
c2d9514370 raft/rafttest: fix stabilize handler
It was bailing out too early.
2019-08-14 17:24:14 +02:00
Tobias Schottdorf
e8090e57a2 raft/rafttest: introduce datadriven testing
It has often been tedious to test the interactions between multi-member
Raft groups, especially when many steps were required to reach a certain
scenario. Often, this boilerplate was as boring as it is hard to write
and hard to maintain, making it attractive to resort to shortcuts
whenever possible, which in turn tended to undercut how meaningful and
maintainable the tests ended up being - that is, if the tests were even
written, which sometimes they weren't.

This change introduces a datadriven framework specifically for testing
deterministically the interaction between multiple members of a raft group
with the goal of reducing the friction for writing these tests to near
zero.

In the near term, this will be used to add thorough testing for joint
consensus (which is already available today, but wildly undertested),
but just converting an existing test into this framework has shown that
the concise representation and built-in inspection of log messages
highlights unexpected behavior much more readily than the previous unit
tests did (the test in question is `snapshot_succeed_via_app_resp`; the
reader is invited to compare the old and new version of it).

The main building block is `InteractionEnv`, which holds on to the state
of the whole system and exposes various relevant methods for
manipulating it, including but not limited to adding nodes, delivering
and dropping messages, and proposing configuration changes. All of this
is extensible so that in the future I hope to use it to explore the
phenomena discussed in

https://github.com/etcd-io/etcd/issues/7625#issuecomment-488798263

which requires injecting appropriate "crash points" in the Ready
handling loop. Discussions of the "what if X happened in state Y"
can quickly be made concrete by "scripting up an interaction test".

Additionally, this framework is intentionally not kept internal to the
raft package.. Though this is in its infancy, a goal is that it should
be possible for a suite of interaction tests to allow applications to
validate that their Storage implementation behaves accordingly, simply
by running a raft-provided interaction suite against their Storage.
2019-08-12 11:13:51 +02:00