1142 Commits

Author SHA1 Message Date
jianfei.zhang
cadf9de3f8 feat: raft/log_test se testify packages in tests
Signed-off-by: jianfei.zhang <jianfei.zhang@daocloud.io>
2022-11-15 10:35:30 +08:00
Benjamin Wang
970ecfcddb
Merge pull request #14721 from nvanbenschoten/nvanbenschoten/noCommittedOnSnap
raft: don't apply entries when applying snapshot
2022-11-15 06:41:12 +08:00
Tobias Grieger
edd4d5122f
Merge pull request #14723 from nvanbenschoten/nvanbenschoten/localMsgCleanup 2022-11-14 23:34:39 +01:00
Tobias Grieger
e64d644989
Merge pull request #14624 from pavelkalinnikov/limit_inflight_bytes 2022-11-14 12:10:13 +01:00
Pavel Kalinnikov
68af01ca6e raft: add MaxInflightBytes to Config
This commit introduces the max inflight bytes setting at the Config level, and
tests that raft flow control honours it.

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
2022-11-13 23:05:16 +01:00
Pavel Kalinnikov
8c9c557d85 raft: factor out payloadsSize helper
Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
2022-11-13 23:05:16 +01:00
Pavel Kalinnikov
7bda0d7773 raft/tracker: add MaxInflightBytes to ProgressTracker
This commit plumbs the max total byte size of the Inflights type higher up the
stack to the ProgressTracker.

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
2022-11-13 23:05:16 +01:00
Pavel Kalinnikov
bfb7b16f4f raft/tracker: add byte size limit to Inflights type
The Inflights type has limits on the message size and the number of inflight
messages. However, a single large entry that exceeds the size limit can still
be sent. In combination with the max messages count limit, many large messages
can be sent in a row and overflow the receiver. In effect, the "max" values act
as "target" rather than hard limits.

This commit adds an additional soft limit on the total size of inflight
messages, which catches such situations and prevents the receiver overflow.

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
2022-11-13 23:05:16 +01:00
Tobias Grieger
de97f6aa3d raft: tidy up the unit tests some more
Use `t.Run` for each test case, and make some tests more idiomatic.

Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-11-13 22:34:47 +01:00
jianfei.zhang
1f4f70723f feat: use testify packages in tests
Signed-off-by: jianfei.zhang <jianfei.zhang@daocloud.io>
2022-11-13 23:38:57 +08:00
Nathan VanBenschoten
539a8410f4 raft: don't apply entries when applying snapshot
This commit removes the ability to apply log entries at the same time as
applying a snapshot. Doing so it possible, but it leads to complex code and
raises questions about what should be applied first. It also raises additional
complexity when we start allowing concurrent, asynchronous log appends and log
application. It's easiest to just disallow this.

Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
2022-11-11 13:58:09 -05:00
Nathan VanBenschoten
95c5fed3cf raft: remove IsEmptySnap check from raftLog.hasPendingSnapshot
unstable.snapshot is never an empty snapshot. This check made it look
like it could be.

Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
2022-11-11 13:57:40 -05:00
Benjamin Wang
bdd5347313
Merge pull request #14719 from nvanbenschoten/nvanbenschoten/nextCommittedEnts
raft: rename raftLog.nextEnts to raftLog.nextCommittedEnts
2022-11-12 02:51:30 +08:00
Nathan VanBenschoten
3711fde822 raft: rename raftLog.nextEnts to raftLog.nextCommittedEnts
Also rename hasNextEnts to hasNextCommittedEnts.
Also rename maxNextEntsSize to maxNextCommittedEntsSize.

Pure refactor.

Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
2022-11-11 13:21:20 -05:00
Tobias Grieger
22d930b3d5
Merge pull request #14722 from nvanbenschoten/nvanbenschoten/unusedReadyContainsUpdates 2022-11-11 16:25:46 +01:00
Nathan VanBenschoten
0ea6fa542a raft: clean up IsLocalMsg and IsResponseMsg logic
Use array indexing to clean up the code and make it constant time.

Also, add a test for IsResponseMsg.

Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
2022-11-10 03:31:31 +00:00
Nathan VanBenschoten
e0beef6830 raft: delete unused Ready.containsUpdates method
Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
2022-11-09 22:25:34 -05:00
Nathan VanBenschoten
c18d79df37 raft: clarify conditions in unstable.stableTo
No change in behavior, but clarify interaction with unstable snapshot.

Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
2022-11-09 22:12:49 -05:00
Nathan VanBenschoten
0f9d7a4f95 raft: make Message.Snapshot nullable, halve struct size
This commit makes the rarely used `raftpb.Message.Snapshot` field nullable.
In doing so, it reduces the memory size of a `raftpb.Message` message from
264 bytes to 128 bytes — a 52% reduction in size.

While this commit does not change the protobuf encoding, it does change
how that encoding is used. `(gogoproto.nullable) = false` instruct the
generated proto marshaling logic to always encode a value for the field,
even if that value is empty. `(gogoproto.nullable) = true` instructs the
generated proto marshaling logic to omit an encoded value for the field
if the field is nil.

This raises compatibility concerns in both directions. Messages encoded
by new binary versions without a `Snapshot` field will be decoded as an
empty field by old binary versions. In other words, old binary versions
can't tell the difference. However, messages encoded by old binary versions
with an empty Snapshot field will be decoded as a non-nil, empty field by
new binary versions. As a result, new binary versions need to be prepared
to handle such messages.

While Message.Snapshot is not intentionally part of the external interface
of this library, it was possible for users of the library to access it and
manipulate it. As such, this change may be considered a breaking change.

Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
2022-11-09 17:35:52 +00:00
Pavel Kalinnikov
765a2660bc raft/tracker: use testify packages in tests
This commit removes the verbose comparisons in tests, in favor of using
assert/require helpers from the testify packages.

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
2022-11-08 23:08:54 +00:00
Pavel Kalinnikov
1ea13494eb raft/tracker: rename and comment MsgApp paused field
Make the field name and comment clearer on the fact that it's used both in
StateProbe and StateReplicate. The old name ProbeSent was slightly confusing,
and also triggered thinking that it's used only in StateProbe.

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
2022-11-08 22:21:39 +00:00
Pavel Kalinnikov
467114ed87 raft/tracker: remove unused Inflights.FreeFirstOne
Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
2022-11-08 22:21:39 +00:00
Pavel Kalinnikov
4969aa81ae raft: send empty appends when replication is paused
When Inflights to a particular node is full, i.e. MaxInflightMsgs for the
append messages flow is saturated, it is still necessary to continue sending
MsgApp to ensure progress. Currently this is achieved by "forgetting" the first
in-flight message in the window, which frees up quota for one new MsgApp.

This new message is constructed in such a way that it potentially has multiple
entries, or a large entry. The effect of this is that the in-flight limitations
can be exceeded arbitrarily, for as long as the flow to this node continues
being saturated. In particular, if a follower is stuck, the leader will keep
sending entries to it.

This commit makes the MsgApp empty when Inflights is saturated, and prevents
the described leakage of Entries to slow followers.

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
2022-11-08 22:21:39 +00:00
Pavel Kalinnikov
3bc3d2071e raft: extract Progress update on MsgApp to a method
Previously, Progress update on MsgApp send was scattered across raft.go and
tracker/progress.go. This commit better encapsulates this logic in the Progress
type.

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
2022-11-08 22:21:38 +00:00
Pavel Kalinnikov
d5ac7b833f raft: cleanup maybeSendAppend method
- avoid large indented blocks, leave the main block unindented
- declare pb.Message inlined in the sending call

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
2022-11-08 22:21:38 +00:00
Pavel Kalinnikov
5619953f33 raft: elaborate checks in flow control tests
Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
2022-11-08 22:21:38 +00:00
Pavel Kalinnikov
0a0f0ae719 raft/rafttest: add test for replication pausing
This commit adds a data-driven test which simulates conditions under which Raft
messages flow to a particular node is throttled while in StateReplicate. The
test demonstrates that MsgApp messages with non-empty Entries may "leak" to a
paused stream every time there is successful heartbeat exchange.

Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
2022-11-08 22:21:38 +00:00
Marek Siarkowicz
2a1055c7f3 raft: Remove dependency on etcd api
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-11-08 13:56:46 +01:00
Benjamin Wang
fe7da79594 raft: remove the raft dependency on go.etcd.io/etcd/client/pkg/v3
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-11-08 09:20:19 +08:00
Benjamin Wang
f64bed6033
Merge pull request #14698 from ahrtr/raft_warn_20221107
raft: change the log from debug to warning when uncommitted size exceeds threshold
2022-11-07 19:57:33 +08:00
Benjamin Wang
3e07097d77
Merge pull request #14545 from nvanbenschoten/nvanbenschoten/simplifyAutoLeave
raft: simplify auto-leave joint config on entry application logic
2022-11-07 17:20:26 +08:00
Benjamin Wang
a671e3ebd1 raft: change the log from debug to warning when uncommitted size exceeds max threshold
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-11-07 17:17:48 +08:00
王霄霄
aac5feec94 raft: remove duplicate letter in comment.
Signed-off-by: Wang Xiaoxiao 1141195807@qq.com
Signed-off-by: 王霄霄 <1141195807@qq.com>
2022-10-22 19:13:40 +08:00
Nathan VanBenschoten
419ee8a9c6 raft: panic on self-addressed messages
These are nonsensical and a network implementation is not required
to handle them correctly, so panic instead of sending them out.

Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
2022-10-06 20:25:07 -04:00
Nathan VanBenschoten
c50e728518 raft: simplify auto-leave joint config on entry application logic
This commit simplifies the logic added in 37c7e4d to auto-leave joint
configurations. It does so by making the following adjustments to the
code:

- remove the `oldApplied <= r.pendingConfIndex` condition. This does
  not seem necessary. When a node first attempts to auto-leave a joint
  config, it will bump `r.pendingConfIndex` when proposing. In cases
  where `oldApplied >= r.pendingConfIndex`, the proposal must have
  already been applied. Reviewers should double check this.
- use raft.Step instead of custom proposal code. This code was already
  present in stepLeader, so there was no reason to duplicate it. This
  would have avoided bugs like the one we fixed in #14538.
- use `confChangeToMsg` to generate message, to centralize the creation
  of all `MsgProp{EntryConfChange}` messages.

Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
2022-10-03 02:11:56 -04:00
Benjamin Wang
a932fb58f2
Merge pull request #14539 from nvanbenschoten/nvanbenschoten/advanceHardState
raft: update prevHardSt on Ready accept, not advance
2022-09-30 16:55:45 +08:00
Nathan VanBenschoten
bd34388721 raft: broadcast MsgApp on auto-leave joint config proposal
This commit ensures that the raft leader eagerly broadcasts a MsgApp to
each follower when initiating an automatic transition out of a joint
configuration. This had been missed previously, which could lead to
delayed completion of an auto-transition.

Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
2022-09-29 12:33:20 -04:00
Nathan VanBenschoten
9a03632043 raft: update prevHardSt on Ready accept, not advance
This commit updates the `RawNode`'s `prevHardSt` to the new HardState in
`acceptReady` instead of on `Advance`. This aligns the handling of
`prevHardSt` with that of `prevSoftSt` (and other fields like `msgs`)
and simplifies the logic in `Advance`.

Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
2022-09-29 12:33:04 -04:00
Benjamin Wang
7f10dccbaf Bump go 1.19: update all the dependencies and go.sum files
1. run ./scripts/fix.sh;
2. cd tools/mod; gofmt -w . & go mod tidy;

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-09-22 08:47:46 +08:00
Benjamin Wang
cd0b1d0c66 Bump go 1.19: upgrade go version to 1.19 in all go.mod files
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-09-22 08:47:46 +08:00
Benjamin Wang
31d9664cb5
Merge pull request #14413 from tbg/raft-single-voter
raft: don't emit unstable CommittedEntries
2022-09-22 08:43:37 +08:00
Tobias Grieger
9ad36eecab fixup! address comments
Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-09-20 09:01:42 +02:00
Tobias Grieger
3c3e30a30e Revert "raft: directly update leader in advance"
This reverts commit d73a986e4edb15ef9dbfc994f1cbf5e96694d877, which
was added only for benchmarking purposes.

Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-09-20 09:01:42 +02:00
Tobias Grieger
67c3522893 raft: directly update leader in advance
This makes the alternative option of implementing the leader's self-ack
of entry append the default.

Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-09-20 09:01:42 +02:00
Tobias Grieger
894e5cb685 move ctx param to the front
to appease linter

Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-09-20 09:01:42 +02:00
Tobias Grieger
f62b9d5e19 remove TestNodeReadIndex
This is tested directly at the level of `RawNode` in
`TestRawNodeReadIndex`. `*node` is a thin wrapper around `RawNode` so
this is sufficient.

The reason to remove the test is that it now incurs data races
since it's not possible to adjust the `readStates` and `step`
fields while the node is running, and there is no primitive
to synchronize with its goroutine. This could all be fixed
but it's not worth it.

Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-09-20 09:01:40 +02:00
Tobias Grieger
f7dcb9ec2a TestInteraction
Reviewed the diff in detail.
The changes here were benign, just the extra raft cycle.

Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-09-20 08:59:37 +02:00
Tobias Grieger
f7b0a6ad33 TestRawNodeBoundedLogGrowthWithPartition
Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-09-20 08:59:37 +02:00
Tobias Grieger
02efe5135d TestRawNodeStart
Now also sees the extra Ready cycle.

Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-09-20 08:59:37 +02:00
Tobias Grieger
79bf3b0df4 TestRawNodeJointAutoLeave
This now needs an additional Ready cycle to apply the previous conf change,
so the finalizing conf change does too.

Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>
2022-09-20 08:59:37 +02:00