415 Commits

Author SHA1 Message Date
Benjamin Wang
f64bed6033
Merge pull request #14698 from ahrtr/raft_warn_20221107
raft: change the log from debug to warning when uncommitted size exceeds threshold
2022-11-07 19:57:33 +08:00
Benjamin Wang
3e07097d77
Merge pull request #14545 from nvanbenschoten/nvanbenschoten/simplifyAutoLeave
raft: simplify auto-leave joint config on entry application logic
2022-11-07 17:20:26 +08:00
Benjamin Wang
a671e3ebd1 raft: change the log from debug to warning when uncommitted size exceeds max threshold
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-11-07 17:17:48 +08:00
王霄霄
aac5feec94 raft: remove duplicate letter in comment.
Signed-off-by: Wang Xiaoxiao 1141195807@qq.com
Signed-off-by: 王霄霄 <1141195807@qq.com>
2022-10-22 19:13:40 +08:00
Nathan VanBenschoten
419ee8a9c6 raft: panic on self-addressed messages
These are nonsensical and a network implementation is not required
to handle them correctly, so panic instead of sending them out.

Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
2022-10-06 20:25:07 -04:00
Nathan VanBenschoten
c50e728518 raft: simplify auto-leave joint config on entry application logic
This commit simplifies the logic added in 37c7e4d to auto-leave joint
configurations. It does so by making the following adjustments to the
code:

- remove the `oldApplied <= r.pendingConfIndex` condition. This does
  not seem necessary. When a node first attempts to auto-leave a joint
  config, it will bump `r.pendingConfIndex` when proposing. In cases
  where `oldApplied >= r.pendingConfIndex`, the proposal must have
  already been applied. Reviewers should double check this.
- use raft.Step instead of custom proposal code. This code was already
  present in stepLeader, so there was no reason to duplicate it. This
  would have avoided bugs like the one we fixed in #14538.
- use `confChangeToMsg` to generate message, to centralize the creation
  of all `MsgProp{EntryConfChange}` messages.

Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
2022-10-03 02:11:56 -04:00
Nathan VanBenschoten
bd34388721 raft: broadcast MsgApp on auto-leave joint config proposal
This commit ensures that the raft leader eagerly broadcasts a MsgApp to
each follower when initiating an automatic transition out of a joint
configuration. This had been missed previously, which could lead to
delayed completion of an auto-transition.

Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
2022-09-29 12:33:20 -04:00
Benjamin Wang
31d9664cb5
Merge pull request #14413 from tbg/raft-single-voter
raft: don't emit unstable CommittedEntries
2022-09-22 08:43:37 +08:00
Tobias Grieger
9ad36eecab fixup! address comments
Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-09-20 09:01:42 +02:00
Tobias Grieger
3c3e30a30e Revert "raft: directly update leader in advance"
This reverts commit d73a986e4edb15ef9dbfc994f1cbf5e96694d877, which
was added only for benchmarking purposes.

Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-09-20 09:01:42 +02:00
Tobias Grieger
67c3522893 raft: directly update leader in advance
This makes the alternative option of implementing the leader's self-ack
of entry append the default.

Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-09-20 09:01:42 +02:00
Tobias Grieger
169f4c3cc7 raft: don't emit unstable CommittedEntries
See https://github.com/etcd-io/etcd/issues/14370.

When run in a single-voter configuration, prior to this PR
raft would emit `Ready`s that contained a proposed `Entry`
simultaneously in `CommittedEntries` and `Entries`.

To be correct, this requires users of the raft library to enforce an
ordering between appending to the log and notifying the client about
`CommittedEntries` also present in `Entries`. This was easy to miss.

Walk back this behavior to arrive at a simpler contract: what's
emitted in `CommittedEntries` is truly committed, i.e. present
in stable storage on a quorum of voters.

This in turn pessimizes the single-voter case: rather than fully
handling an `Entry` in just one `Ready`, now two are required,
and in particular one has to do extra work to save on allocations.

We accept this as a good tradeoff, since raft primarily serves
multi-voter configurations.

Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-09-20 08:59:37 +02:00
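
The contract change described in 169f4c3cc7 can be illustrated with a minimal sketch. The `Entry` and `Ready` types and the `overlap` helper below are simplified, hypothetical stand-ins written for this example; they are not the raft library's actual definitions. The sketch only shows the difference between an entry appearing in both `Entries` and `CommittedEntries` of one `Ready` (the old single-voter behavior) and the same entry being spread over two `Ready`s.

```go
package main

import "fmt"

// Entry and Ready are hypothetical, simplified stand-ins for illustration.
type Entry struct {
	Index uint64
	Data  string
}

type Ready struct {
	Entries          []Entry // entries that still have to be appended to stable storage
	CommittedEntries []Entry // entries that may be applied to the state machine
}

// overlap returns the indexes that appear in both Entries and
// CommittedEntries of a single Ready, i.e. entries that are handed out as
// committed while they are still unstable.
func overlap(rd Ready) []uint64 {
	unstable := make(map[uint64]bool, len(rd.Entries))
	for _, e := range rd.Entries {
		unstable[e.Index] = true
	}
	var out []uint64
	for _, e := range rd.CommittedEntries {
		if unstable[e.Index] {
			out = append(out, e.Index)
		}
	}
	return out
}

func main() {
	// Old single-voter behavior: one Ready carries entry 5 in both slices.
	old := Ready{
		Entries:          []Entry{{5, "x"}},
		CommittedEntries: []Entry{{5, "x"}},
	}
	// Behavior after the change: the entry is appended first and only shows
	// up as committed in a later Ready.
	first := Ready{Entries: []Entry{{5, "x"}}}
	second := Ready{CommittedEntries: []Entry{{5, "x"}}}

	fmt.Println(overlap(old), overlap(first), overlap(second)) // [5] [] []
}
```

With the simpler contract, an application no longer has to order "append to the log" before "notify the client" within a single `Ready`, at the cost of one extra `Ready` in the single-voter case.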
Tobias Grieger
3ad363d070 raft: always mark leader as RecentActive
RecentActive is now initialized to true in `becomeLeader`. Both
configuration changes and CheckQuorum make sure not to break this,
so we now know that the leader is always RecentActive.

Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-09-20 08:59:37 +02:00
demoManito
a9c3d56508 etcd: remove redundant type conversion
Signed-off-by: demoManito <1430482733@qq.com>
2022-09-20 11:26:02 +08:00
Abirdcfly
08a9d1da07
chore: remove duplicate word in comments
Signed-off-by: Abirdcfly <fp544037857@gmail.com>
2022-08-27 13:39:48 +08:00
howz97
f9c9bfa44c fix comment in raft.go 2022-04-02 14:27:33 +08:00
dengziming
a286f5bb99 MINOR: Fix typos(hearbeat -> heartbeat) 2021-08-07 11:41:13 +08:00
Lili Cosic
ddd390af01 raft/raft.go: Log unhandled errors 2021-06-02 11:41:26 +02:00
wpedrak
758ff0163c raft: postpone MsgReadIndex until first commit in the term
Fixes #12680
2021-03-23 12:28:42 +01:00
Piotr Tabor
87258efd90 Integration tests: Use zaptest.Logger based testing.TB
Thanks to this the logs:
  - are automatically printed if the test fails.
  - are in pretty consistent format.
  - are annotated by 'member' information of the cluster emitting them.

Side changes:
  - Set proper default for DefaultWarningApplyDuration (used to be '0')
  - Name the members based on their 'place' on the list (as opposed to
    'random')
2021-03-09 18:19:51 +01:00
Tobias Grieger
73c50b869a
Merge pull request #12637 from BusyJay/check-outgoingvoters-when-restoring
raft: check `VotersOutgoing` for snapshot
2021-02-16 09:43:08 +01:00
Tobias Grieger
c1e8d3a63f Clarify documentation of probing
- Add a large detailed comment about the use and necessity of
  both the follower and leader probing optimization
- fix the log message in stepLeader that previously mixed up the
  log term for the rejection and the index of the append
- improve the test via subtests
- add some verbiage in findConflictByTerm around first index
2021-02-15 09:47:18 +01:00
qupeng
6828517965 raft: implement fast log rejection
Signed-off-by: qupeng <qupeng@pingcap.com>
2021-02-10 15:48:32 -05:00
Jay Lee
f947c815d0
raft: check VotersOutgoing for snapshot
Close #12631.

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
2021-01-21 16:09:37 +08:00
Piotr Tabor
5472b3336b
Merge pull request #12525 from sakateka/remove_raft.peers
raft tests: Remove Config.peers and Config.learners
2021-01-19 16:00:54 +01:00
Sergey Kacheev
ccfd00f687
raft: specify voters and learners via snapshot 2021-01-16 13:03:47 +07:00
Piotr Tabor
bf6f173d5e Document Raft.send method.
The change makes it explicit that sending messages does not happen
immediately and is subject to proper persist & then send protocol
on the application side. See:

https://github.com/etcd-io/etcd/issues/12589#issuecomment-752867024

for more context.
2021-01-15 12:35:58 +01:00
Piotr Tabor
e62417297d *: Rename of imports of raft (as it's now a module)
% find -name '*.go' -o -name '*.md' -o -name '*.sh' | xargs sed -i --follow-symlinks 's|etcd/v3/raft|etcd/raft/v3|g'
2020-10-16 13:58:18 +02:00
Jay
26b89fd418
raft: don't campaign with pending snapshot (#12163)
Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
2020-07-26 00:04:46 -07:00
Jay
d0e4fe56a5
raft: check pending conf change before campaign (#12134)
* raft: check conf change before campaign

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>

* raft: extract hup function

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>

* raft: check pending conf change for transferleader

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
2020-07-22 17:04:48 -07:00
Jay
cc656718fa
raft: correct pendingConfIndex check for AutoLeave (#12137)
Close #12136

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
2020-07-20 16:49:22 -07:00
Zhihong Yu
7cc2f8a411
raft: break out of nested loop when id is found (#11870)
Signed-off-by: Ted Yu <yuzhihong@gmail.com>
2020-05-12 16:59:22 -07:00
Brandon Philips
96cce208c2 go.mod: use go.etcd.io/etcd/v3 versioning
This change makes the etcd package compatible with the existing Go
ecosystem for module versioning.

Used this tool to update package imports:
  https://github.com/KSubedi/gomove
2020-04-28 00:57:35 +00:00
Fullstop000
7eae024ead
raft: only redirect msg produced by own node (#11466)
Signed-off-by: Fullstop000 <fullstop1005@gmail.com>
2020-04-06 20:27:46 -07:00
qupeng
6f850a65a1
raft: cleanup read index code (#11528)
Signed-off-by: qupeng <qupeng@pingcap.com>
2020-03-03 09:20:25 -08:00
Tobias Schottdorf
0544f33248 raft: clarify ApplyConfChange contract for rejected conf changes
Apps typically maintain the raft configuration as part of the state
machine. As a result, they want to be able to reject configuration change
entries at apply time based on the state on which the entry is supposed
to be applied. When this happens, the app should not call
ApplyConfChange, but the comments did not make this clear.

As a result, it was tempting to pass an empty pb.ConfChange or its V2
version instead of not calling ApplyConfChange.

However, an empty V1 or V2 proto isn't a noop when the configuration is
joint: an empty V1 change is treated internally as a single
configuration change for NodeID zero and will cause a panic when applied
in a joint state. An empty V2 proto is treated as a signal to leave a
joint state, which means that the app's config and raft's would diverge.

The comments updated in this commit now ask users to not call
ApplyConfChange when they reject a conf change. Apps that never use joint
consensus can keep their old behavior since the distinction only matters
when in a joint state, but we don't want to encourage that.
2020-02-25 12:45:45 +01:00
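
The rule from 0544f33248 (skip the call entirely when rejecting a conf change, rather than passing an empty proto) might look roughly like the sketch below. Everything here is hypothetical and simplified for illustration: `ConfChange`, `fakeRaft`, `app`, and `applyConfChangeEntry` are not the library's types or API, and `fakeRaft.ApplyConfChange` merely counts calls so the behavior is observable.

```go
package main

import "fmt"

// ConfChange is a hypothetical, simplified stand-in for a conf change entry.
type ConfChange struct {
	NodeID uint64
	Remove bool
}

// fakeRaft is a hypothetical stand-in that only records how often
// ApplyConfChange was called.
type fakeRaft struct{ applyCalls int }

func (r *fakeRaft) ApplyConfChange(cc ConfChange) { r.applyCalls++ }

// app keeps the membership as part of its state machine.
type app struct {
	members map[uint64]bool
	raft    *fakeRaft
}

// applyConfChangeEntry validates the change against the current state.
// On rejection it returns false WITHOUT calling ApplyConfChange; it does
// not pass an empty ConfChange as a substitute.
func (a *app) applyConfChangeEntry(cc ConfChange) bool {
	if cc.Remove && !a.members[cc.NodeID] {
		return false // removing an unknown member: reject, skip the call
	}
	if !cc.Remove && a.members[cc.NodeID] {
		return false // adding an existing member: reject, skip the call
	}
	if cc.Remove {
		delete(a.members, cc.NodeID)
	} else {
		a.members[cc.NodeID] = true
	}
	a.raft.ApplyConfChange(cc)
	return true
}

func main() {
	a := &app{members: map[uint64]bool{1: true, 2: true}, raft: &fakeRaft{}}
	fmt.Println(a.applyConfChangeEntry(ConfChange{NodeID: 3}))               // true
	fmt.Println(a.applyConfChangeEntry(ConfChange{NodeID: 9, Remove: true})) // false
	fmt.Println(a.raft.applyCalls)                                           // 1
}
```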
Tobias Schottdorf
37c7e4d1d8 raft: fix auto-transitioning out of joint config
The code doing so was undertested and buggy: it would launch multiple
attempts to transition out when the conf change was not the last element
in the log.

This commit fixes the problem and adds a regression test. It also
reworks the code to handle a formerly untested edge case, in which the
auto-transition append is refused. This can't happen any more with the
current version of the code because this proposal has size zero and is
special cased in increaseUncommittedSize. Last but not least, the
auto-leave proposal now also bumps pendingConfIndex, which was not done
previously due to an oversight.
2020-02-25 12:35:51 +01:00
qupeng
eaa0612e02 raft: abort leader transferring if the target is demoted (#11417)
Signed-off-by: qupeng <qupeng@pingcap.com>
2019-12-20 12:07:52 +08:00
Wine93
5f42161750 raft: fix some typos and simplify minor logic 2019-08-25 04:46:29 +00:00
Tobias Schottdorf
306e75a96f raft: add a batch of interaction-driven conf change tests
Verify the behavior in various v1 and v2 conf change operations.
This also includes various fixups, notably it adds protection
against transitioning in and out of new configs when this is not
permissible.

There are more threads to pull, but those are left for future commits.
2019-08-16 09:38:44 +02:00
Tobias Schottdorf
4e19150676 raft: proactively probe newly added followers
When the leader applied a new configuration that added voters, it would
not immediately probe these voters, delaying when they would be caught
up.

I noticed this while writing an interaction-driven test, which has now
been cleaned up and completed.
2019-08-14 20:53:34 +02:00
Tobias Grieger
029401ab81
Merge pull request #11005 from tbg/interactiontest
raft/rafttest: introduce datadriven testing
2019-08-12 11:52:52 +02:00
Tobias Schottdorf
e8090e57a2 raft/rafttest: introduce datadriven testing
It has often been tedious to test the interactions between multi-member
Raft groups, especially when many steps were required to reach a certain
scenario. Often, this boilerplate was as boring as it is hard to write
and hard to maintain, making it attractive to resort to shortcuts
whenever possible, which in turn tended to undercut how meaningful and
maintainable the tests ended up being - that is, if the tests were even
written, which sometimes they weren't.

This change introduces a datadriven framework specifically for testing
deterministically the interaction between multiple members of a raft group
with the goal of reducing the friction for writing these tests to near
zero.

In the near term, this will be used to add thorough testing for joint
consensus (which is already available today, but wildly undertested),
but just converting an existing test into this framework has shown that
the concise representation and built-in inspection of log messages
highlights unexpected behavior much more readily than the previous unit
tests did (the test in question is `snapshot_succeed_via_app_resp`; the
reader is invited to compare the old and new version of it).

The main building block is `InteractionEnv`, which holds on to the state
of the whole system and exposes various relevant methods for
manipulating it, including but not limited to adding nodes, delivering
and dropping messages, and proposing configuration changes. All of this
is extensible so that in the future I hope to use it to explore the
phenomena discussed in

https://github.com/etcd-io/etcd/issues/7625#issuecomment-488798263

which requires injecting appropriate "crash points" in the Ready
handling loop. Discussions of the "what if X happened in state Y"
can quickly be made concrete by "scripting up an interaction test".

Additionally, this framework is intentionally not kept internal to the
raft package. Though this is in its infancy, a goal is that it should
be possible for a suite of interaction tests to allow applications to
validate that their Storage implementation behaves accordingly, simply
by running a raft-provided interaction suite against their Storage.
2019-08-12 11:13:51 +02:00
Gyuho Lee
6c87b21821 raft: fix typo
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-09 21:26:48 -07:00
Tobias Schottdorf
37ab5bdd21 raft: fix restoring joint configurations
While writing interaction tests for joint configuration changes, I
realized that this wasn't working yet - restoring had no notion of
the joint configuration and was simply dropping it on the floor.

This commit introduces a helper `confchange.Restore` which takes
a `ConfState` and initializes a `Tracker` from it.

This is then used both in `(*raft).restore` as well as in `newRaft`.
2019-08-09 19:28:43 +02:00
Tobias Schottdorf
c30c2e345b raft: let learners vote
It turns out that learners must be allowed to cast votes.

This seems counterintuitive but is necessary in the situation in which
a learner has been promoted (i.e. is now a voter) but has not learned
about this yet.

For example, consider a group in which id=1 is a learner and id=2 and
id=3 are voters. A configuration change promoting 1 can be committed on
the quorum `{2,3}` without the config change being appended to the
learner's log. If the leader (say 2) fails, there are de facto two
voters remaining. Only 3 can win an election (due to its log containing
all committed entries), but to do so it will need 1 to vote. But 1
considers itself a learner and will continue to do so until 3 has
stepped up as leader, replicates the conf change to 1, and 1 applies it.

Ultimately, by receiving a request to vote, the learner realizes that
the candidate believes it to be a voter, and that it should act
accordingly. The candidate's config may be stale, too; but in that case
it won't win the election, at least in the absence of the bug discussed
in:
https://github.com/etcd-io/etcd/issues/7625#issuecomment-488798263.
2019-08-07 12:03:18 +02:00
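
The vote arithmetic behind the example in c30c2e345b can be checked with a small sketch. The `wins` helper is hypothetical and written only for this illustration; it counts a strict majority over the voter set as the candidate (node 3) sees it after the promotion of node 1 has been committed.

```go
package main

import "fmt"

// wins reports whether the granted votes form a strict majority of voters.
// It is a hypothetical helper for this example, not part of the raft package.
func wins(voters []uint64, granted map[uint64]bool) bool {
	n := 0
	for _, id := range voters {
		if granted[id] {
			n++
		}
	}
	return n > len(voters)/2
}

func main() {
	// Configuration after the promotion of 1, as known to candidate 3.
	voters := []uint64{1, 2, 3}

	// Leader 2 has failed. If 1 refuses to vote because it still considers
	// itself a learner, 3 only has its own vote and cannot win.
	fmt.Println(wins(voters, map[uint64]bool{3: true})) // false

	// If 1 casts a vote despite believing it is a learner, 3 reaches quorum.
	fmt.Println(wins(voters, map[uint64]bool{1: true, 3: true})) // true
}
```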
Tobias Schottdorf
3b02d4c5ff raft: leave TODO about leaving StateSnapshot
The condition is overly strict, which has popped up in CockroachDB
recently.
2019-07-26 23:19:34 +02:00
Tobias Schottdorf
b9c051e7a7 raftpb: clean up naming in ConfChange 2019-07-23 10:40:03 +02:00
Tobias Schottdorf
b67303c6a2 raft: allow use of joint quorums
This change introduces joint quorums by changing the Node and RawNode
API to accept pb.ConfChangeV2 (on top of pb.ConfChange).

pb.ConfChange continues to work as today: it allows carrying out a
single configuration change. A pb.ConfChange proposal gets added to
the Raft log as such and is thus also observed by the app during Ready
handling, and fed back to ApplyConfChange.

ConfChangeV2 allows joint configuration changes but will continue to
carry out configuration changes in "one phase" (i.e. without ever
entering a joint config) when this is possible.
2019-07-23 10:40:03 +02:00
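
As background for b67303c6a2, the sketch below shows the general idea of a joint quorum: while a joint configuration is active, a decision needs a majority of both the incoming and the outgoing voter set. The `majority` and `jointQuorum` helpers are hypothetical names for this illustration and do not reproduce the library's implementation. Treating an empty voter set as trivially satisfied is an assumption made here so that a non-joint configuration (empty outgoing set) reduces to a plain majority.

```go
package main

import "fmt"

// majority reports whether acked contains a strict majority of voters.
// Assumption for this sketch: an empty voter set is trivially satisfied.
func majority(voters []uint64, acked map[uint64]bool) bool {
	if len(voters) == 0 {
		return true
	}
	n := 0
	for _, id := range voters {
		if acked[id] {
			n++
		}
	}
	return n > len(voters)/2
}

// jointQuorum requires a majority in both halves of a joint configuration.
func jointQuorum(incoming, outgoing []uint64, acked map[uint64]bool) bool {
	return majority(incoming, acked) && majority(outgoing, acked)
}

func main() {
	outgoing := []uint64{1, 2, 3} // voters before the change
	incoming := []uint64{1, 4, 5} // voters after the change

	// 1 and 2 form a majority of the outgoing set only.
	fmt.Println(jointQuorum(incoming, outgoing, map[uint64]bool{1: true, 2: true})) // false

	// Adding 4 yields a majority in both sets.
	fmt.Println(jointQuorum(incoming, outgoing, map[uint64]bool{1: true, 2: true, 4: true})) // true

	// With no outgoing set, the check reduces to a simple majority.
	fmt.Println(jointQuorum(incoming, nil, map[uint64]bool{4: true, 5: true})) // true
}
```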
Tobias Schottdorf
88f5561733 raft: use ConfChangeSingle internally 2019-07-23 10:39:48 +02:00