The gRPC server supports using GracefulStop to drain all the in-flight
RPCs, including streaming RPCs.
When we start the gRPC server in non-cmux mode (non-TLS or
TLS+gRPC-only), we always invoke GracefulStop to drain requests.
In cmux mode (gRPC.ServeHTTP), the connection is maintained by the http
server, so the gRPC server is unable to send the GOAWAY control frame
to the client. By default it therefore force-closes all the connections
and doesn't drain requests.
gRPC v1.61.0 introduces the new experimental `WaitForHandlers` option,
which blocks gRPC.Stop() until all the RPCs finish. This patch uses
`WaitForHandlers` for cmux mode's graceful shutdown.
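A minimal sketch of the idea, assuming the experimental
grpc.WaitForHandlers ServerOption from gRPC v1.61.0 (not the exact etcd
wiring):

    package main

    import (
        "net"

        "google.golang.org/grpc"
    )

    func main() {
        // With WaitForHandlers enabled, Stop() blocks until all in-flight
        // RPC handlers return, which gives cmux mode a drain path even
        // though the gRPC server cannot send GOAWAY frames itself.
        srv := grpc.NewServer(grpc.WaitForHandlers(true))

        ln, err := net.Listen("tcp", "127.0.0.1:0")
        if err != nil {
            panic(err)
        }
        go srv.Serve(ln)

        // ... serve traffic ...

        // In cmux mode we call Stop() rather than GracefulStop(); with the
        // option above, Stop() now waits for outstanding handlers.
        srv.Stop()
    }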
This patch also introduces the `v3rpcBeforeSnapshot` failpoint, which is
used to verify cmux mode's graceful shutdown behaviour.
For the TestAuthGracefulDisable (tests/common) case, the timeout is
increased from 10s to 15s because we now attempt the graceful shutdown
after the connection is closed, and it takes more time than before.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
As the documentation of http.Response.Body says:
// The http Client and Transport guarantee that Body is always
// non-nil, even on responses without a body or responses with
// a zero-length body. It is the caller's responsibility to
// close Body.
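A minimal sketch of the resulting pattern (the URL and helper are
hypothetical):

    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    func fetch(url string) error {
        resp, err := http.Get(url)
        if err != nil {
            return err
        }
        // Body is non-nil even for responses without a body, and it is
        // the caller's responsibility to close it.
        defer resp.Body.Close()

        _, err = io.Copy(io.Discard, resp.Body)
        return err
    }

    func main() {
        fmt.Println(fetch("http://example.com/"))
    }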
Signed-off-by: Jes Cok <xigua67damn@gmail.com>
When implementing the fix for progress notifications
(https://github.com/etcd-io/etcd/pull/15237) we made an incorrect
assumption that unsynced watches will always get at least one event.
Unsynced watches include not only slow watchers, but also newly created
watches that requested the current or an older revision. If none of the
events match the watch filter, those newly created watches might become
synced without any event going through.
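An illustrative sketch of the scenario (the client `cli`, key, and
`olderRev` are assumed to exist in the caller's context):

    // This watcher starts unsynced because it has to catch up from
    // olderRev, but if every event in that range is a PUT, the filter
    // discards them all and the watcher becomes synced without a single
    // event going through.
    func watchWithoutEvents(ctx context.Context, cli *clientv3.Client, key string, olderRev int64) clientv3.WatchChan {
        return cli.Watch(ctx, key,
            clientv3.WithRev(olderRev),
            clientv3.WithFilterPut(),
        )
    }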
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Add two separate probes, one for liveness and one for readiness. The
liveness probe checks that the local individual node is up and running,
so that the node gets restarted if it is not, while the readiness probe
checks that the cluster is ready to serve traffic. This makes the etcd
health check fully compliant with the Kubernetes API.
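A rough sketch of the intended split; the `/livez` and `/readyz` paths
are assumptions here for illustration, not necessarily the exact
endpoints added by this change:

    // Hypothetical probe helper: both probes are plain HTTP GETs that
    // expect 200 OK, but they answer different questions.
    func probe(endpoint, path string) error {
        resp, err := http.Get(endpoint + path)
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            return fmt.Errorf("%s returned %s", path, resp.Status)
        }
        return nil
    }

    // Liveness: is this individual node up and running?
    //     probe("http://127.0.0.1:2379", "/livez")
    // Readiness: is the cluster ready to serve client traffic?
    //     probe("http://127.0.0.1:2379", "/readyz")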
Signed-off-by: Siyuan Zhang <sizhang@google.com>
From the Go specification [1]:
"1. For a nil slice, the number of iterations is 0."
`len` returns 0 if the slice or map is nil [2]. Therefore, checking
`len(v) > 0` around a loop is unnecessary.
[1]: https://go.dev/ref/spec#For_range
[2]: https://pkg.go.dev/builtin#len
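For example, assuming a slice `items` and a helper `process`, the
following two forms behave identically, so the guard can be dropped:

    // Unnecessary guard around the loop:
    if len(items) > 0 {
        for _, item := range items {
            process(item)
        }
    }

    // Equivalent: ranging over a nil or empty slice iterates zero times.
    for _, item := range items {
        process(item)
    }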
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
It's to deflake TestAuthMemberRemove.
When the client has multiple endpoints, the client might send a request
with a valid token to a follower member which hasn't received the
replicated log entry for that token yet. The member will reject the
request.
For instance, the maintenance.Status API will return "auth: invalid auth
token", but the client doesn't recognize the error and won't retry by
refreshing the auth token. maintenance.Status should convert the error
with togRPCError before returning so that the client can refresh the
token. This aligns with the existing APIs.
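A hedged sketch of the pattern (doStatus is a hypothetical stand-in for
the existing status logic, not the actual etcd code):

    func (ms *maintenanceServer) Status(ctx context.Context, ar *pb.StatusRequest) (*pb.StatusResponse, error) {
        resp, err := ms.doStatus(ctx, ar) // hypothetical stand-in
        if err != nil {
            // Convert to a gRPC status error so the client-side auth
            // retry logic can recognize ErrInvalidAuthToken and refresh
            // the token.
            return nil, togRPCError(err)
        }
        return resp, nil
    }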
Since the maintenance client always creates one connection to the
target member, that member will have the token after the auth refresh.
Maybe we can introduce a sync mechanism to wait until the member is
ready with the token, instead of refreshing.
Fixes: #15758
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Progress notifications requested using ProgressRequest were sent
directly using the ctrlStream, which means that they could race
against watch responses in the watchStream.
This would especially happen when the stream was not synced - e.g. if
you requested a progress notification on a freshly created unsynced
watcher, the notification would typically arrive indicating a revision
for which not all watch responses had been sent.
This changes the behaviour so that v3rpc always goes through the watch
stream, using a new RequestProgressAll function that closely matches
the behaviour of the v3rpc code - i.e.
1. Generate a message with WatchId -1, indicating the revision for
*all* watchers in the stream
2. Guarantee that a response is (eventually) sent
The latter might require us to defer the response until all watchers
are synced, which is likely how it should be anyway. Note that we do
*not* guarantee that the number of progress notifications matches the
number of requests, only that eventually at least one gets sent.
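A hedged sketch of the shape of such a broadcast notification (field
values only, with `currentRev` assumed; not the exact etcd code):

    // A progress notification addressed to the whole stream carries only
    // a header with the current revision and WatchId -1 ("all watchers"),
    // and it is sent on the same path as ordinary watch responses so it
    // cannot overtake them.
    resp := &pb.WatchResponse{
        Header:  &pb.ResponseHeader{Revision: currentRev},
        WatchId: -1,
    }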
Signed-off-by: Peter Wortmann <peter.wortmann@skao.int>