While it appears that etcd is not vulnerable to CVE-2021-3121,
it is a good idea to update to the new generator so that new
vulnerable code isn't generated in any future APIs. Also, this
lays the issue to rest of whether there is any issue with
etcd and CVE-2021-3121.
This commit removes the `XXX_NoUnkeyedLiteral`, `XXX_unrecognized`, and
`XXX_sizecache` auto-generated fields from generated protobuf structs in
the raft package. This was done for all of the same reasons CockroachDB
removed the generation of these fields in https://github.com/cockroachdb/cockroach/pull/38404.
They come with very limited advantages but moderate disadvantages.
`XXX_NoUnkeyedLiteral` and `XXX_sizecache` were only enabled recently in
cc7b4fa, and this appears to have been unintentional. Meanwhile,
`XXX_unrecognized` has been around for longer and has arguably more
reason to stay because it can assist with forwards compatibility.
However, any real mixed-version upgrade story for this package is mostly
untold at this point, and keeping this field seems just as likely to
cause unexpected bugs (e.g. a field was propagated but not updated
correctly when passed through an old version) as it seems to fix real
issues, so it also doesn't warrant its cost.
This reduces the in-memory representation size of all Raft protos.
Notably, it reduces the memory size of an `Entry` proto from *80 bytes*
to *48 bytes* and the memory size of a `Message` proto from *392 bytes*
to *264 bytes*. Both of these structs are used frequently, and often in
slices, where this wasted space really starts to add up.
This was motivated by a regression in microbenchmarks in CockroachDB due
to cc7b4fa, which was caught in https://github.com/cockroachdb/cockroach/issues/62212.
The code doing so was undertested and buggy: it would launch multiple
attempts to transition out when the conf change was not the last element
in the log.
This commit fixes the problem and adds a regression test. It also
reworks the code to handle a former untested edge case, in which the
auto-transition append is refused. This can't happen any more with the
current version of the code because this proposal has size zero and is
special cased in increaseUncommittedSize. Last but not least, the
auto-leave proposal now also bumps pendingConfIndex, which was not done
previously due to an oversight.
It has often been tedious to test the interactions between multi-member
Raft groups, especially when many steps were required to reach a certain
scenario. Often, this boilerplate was as boring as it is hard to write
and hard to maintain, making it attractive to resort to shortcuts
whenever possible, which in turn tended to undercut how meaningful and
maintainable the tests ended up being - that is, if the tests were even
written, which sometimes they weren't.
This change introduces a datadriven framework specifically for testing
deterministically the interaction between multiple members of a raft group
with the goal of reducing the friction for writing these tests to near
zero.
In the near term, this will be used to add thorough testing for joint
consensus (which is already available today, but wildly undertested),
but just converting an existing test into this framework has shown that
the concise representation and built-in inspection of log messages
highlights unexpected behavior much more readily than the previous unit
tests did (the test in question is `snapshot_succeed_via_app_resp`; the
reader is invited to compare the old and new version of it).
The main building block is `InteractionEnv`, which holds on to the state
of the whole system and exposes various relevant methods for
manipulating it, including but not limited to adding nodes, delivering
and dropping messages, and proposing configuration changes. All of this
is extensible so that in the future I hope to use it to explore the
phenomena discussed in
https://github.com/etcd-io/etcd/issues/7625#issuecomment-488798263
which requires injecting appropriate "crash points" in the Ready
handling loop. Discussions of the "what if X happened in state Y"
can quickly be made concrete by "scripting up an interaction test".
Additionally, this framework is intentionally not kept internal to the
raft package.. Though this is in its infancy, a goal is that it should
be possible for a suite of interaction tests to allow applications to
validate that their Storage implementation behaves accordingly, simply
by running a raft-provided interaction suite against their Storage.
While writing interaction tests for joint configuration changes, I
realized that this wasn't working yet - restoring had no notion of
the joint configuration and was simply dropping it on the floor.
This commit introduces a helper `confchange.Restore` which takes
a `ConfState` and initializes a `Tracker` from it.
This is then used both in `(*raft).restore` as well as in `newRaft`.
This change introduces joint quorums by changing the Node and RawNode
API to accept pb.ConfChangeV2 (on top of pb.ConfChange).
pb.ConfChange continues to work as today: it allows carrying out a
single configuration change. A pb.ConfChange proposal gets added to
the Raft log as such and is thus also observed by the app during Ready
handling, and fed back to ApplyConfChange.
ConfChangeV2 allows joint configuration changes but will continue to
carry out configuration changes in "one phase" (i.e. without ever
entering a joint config) when this is possible.
The Entry struct has misaligned fields that are accessed atomically. The
misalignment is caused by the EntryType enum which the Protocol Buffers
spec forces to be a 32bit int.
Moving the order of the fields without renumbering them in the .proto file
seems to align the go structure without changing the wire format.
Using Go-style import paths in protos is not idiomatic. Normally, this
detail would be internal to etcd, but the path from which gogoproto
is imported affects downstream consumers (e.g. cockroachdb).
In cockroach, we want to avoid including `$GOPATH/src` in our protoc
include path for various reasons. This patch puts etcd on the same
convention, which allows this for cockroach.
More information: https://github.com/cockroachdb/cockroach/pull/2339#discussion_r38663417
This commit also regenerates all the protos, which seem to have
drifted a tiny bit.
raft relies on the link layer to report the status of the sent snapshot.
If the snapshot is still sending, the replication to that remote peer will
be paused. If the snapshot finish sending, the replication will begin
optimistically after electionTimeout. If the snapshot fails, raft will
try to resend it.