This commit makes the rarely used `raftpb.Message.Snapshot` field nullable.
In doing so, it reduces the memory size of a `raftpb.Message` message from
264 bytes to 128 bytes — a 52% reduction in size.
While this commit does not change the protobuf encoding, it does change
how that encoding is used. `(gogoproto.nullable) = false` instruct the
generated proto marshaling logic to always encode a value for the field,
even if that value is empty. `(gogoproto.nullable) = true` instructs the
generated proto marshaling logic to omit an encoded value for the field
if the field is nil.
This raises compatibility concerns in both directions. Messages encoded
by new binary versions without a `Snapshot` field will be decoded as an
empty field by old binary versions. In other words, old binary versions
can't tell the difference. However, messages encoded by old binary versions
with an empty Snapshot field will be decoded as a non-nil, empty field by
new binary versions. As a result, new binary versions need to be prepared
to handle such messages.
While Message.Snapshot is not intentionally part of the external interface
of this library, it was possible for users of the library to access it and
manipulate it. As such, this change may be considered a breaking change.
Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
This fix avoids the assumption of knowing the current version of the
binary. We can query the binary with the version flag to get the actual
version of the given binary we upgrade and downgrade to. The
respectively reported versions should match what is returned by the
version endpoint.
Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
This commit removes the verbose comparisons in tests, in favor of using
assert/require helpers from the testify packages.
Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
Make the field name and comment clearer on the fact that it's used both in
StateProbe and StateReplicate. The old name ProbeSent was slightly confusing,
and also triggered thinking that it's used only in StateProbe.
Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
When Inflights to a particular node is full, i.e. MaxInflightMsgs for the
append messages flow is saturated, it is still necessary to continue sending
MsgApp to ensure progress. Currently this is achieved by "forgetting" the first
in-flight message in the window, which frees up quota for one new MsgApp.
This new message is constructed in such a way that it potentially has multiple
entries, or a large entry. The effect of this is that the in-flight limitations
can be exceeded arbitrarily, for as long as the flow to this node continues
being saturated. In particular, if a follower is stuck, the leader will keep
sending entries to it.
This commit makes the MsgApp empty when Inflights is saturated, and prevents
the described leakage of Entries to slow followers.
Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
Previously, Progress update on MsgApp send was scattered across raft.go and
tracker/progress.go. This commit better encapsulates this logic in the Progress
type.
Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
- avoid large indented blocks, leave the main block unindented
- declare pb.Message inlined in the sending call
Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
This commit adds a data-driven test which simulates conditions under which Raft
messages flow to a particular node is throttled while in StateReplicate. The
test demonstrates that MsgApp messages with non-empty Entries may "leak" to a
paused stream every time there is successful heartbeat exchange.
Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
Afer moving `ClusterVersion` and related constants into e2e packages,
some e2e test cases are broken, so we need to update them to use the
correct definitions.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
ClusterContext is used by "e2e" or "integration" to extend the
ClusterConfig. The common test cases shouldn't care about what
data is encoded or included; instead "e2e" or "integration"
framework should decode or parse it separately.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The client may connect to a different endpint as the target to be
maintained. When auth is enabled, the target endpoint might haven't
finished applying the authentiation request, so it might reject the
corresponding maintenance request due to permission denied.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The existing client may connect to different endpoint from the
specific endpoint to be maintained. Maintenance operations do not
go through raft at all, so it might run into issue if the server
hasn't finished applying the authentication request.
Let's work with an example. Assuming the existing client connects to
ep1, while the user wants to maintain ep2. If we getToken again, it
sends an authentication request, which goes through raft. When the
specific endpoint receives the maintenance request, it might haven't
finished previous authentication request, but the new token is already
carried in the context, so it will reject the maintenance request
due to invalid token.
We already have retry logic in `unaryClientInterceptor` and
`streamClientInterceptor`. When the token expires, it can automatically
refresh the token, so it should be safe to remove the `getToken`
logic in `maintenance.dial`
Signed-off-by: Benjamin Wang <wachao@vmware.com>