The client retries the connection without backoff when the server goes
away after the watch stream is established. This results in high CPU
usage in the client process. This change introduces backoff when the
stream fails and the server is unavailable.
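A minimal sketch of the backoff idea, assuming a hypothetical
openWatchStream helper and a stand-in Unavailable error rather than the
real gRPC status codes:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var errUnavailable = errors.New("rpc error: code = Unavailable")

// openWatchStream stands in for re-establishing the gRPC watch stream.
func openWatchStream(ctx context.Context) error {
	return errUnavailable // pretend the server is still gone
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	backoff := 10 * time.Millisecond
	for {
		if err := openWatchStream(ctx); err == nil {
			return // stream established
		} else if errors.Is(err, errUnavailable) {
			// Sleep before retrying instead of spinning in a hot loop.
			fmt.Println("stream unavailable, backing off for", backoff)
			select {
			case <-time.After(backoff):
			case <-ctx.Done():
				fmt.Println("giving up:", ctx.Err())
				return
			}
		}
	}
}
```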
Signed-off-by: Hisanobu Tomari <posco.grubb@gmail.com>
Currently, watch cancel requests are only sent to the server after a
message arrives on a watch that the client has cancelled. This means
that cancelled watches that receive no new messages are never cancelled
on the server; they persist for the lifetime of the client stream.
This is problematic for locking applications, where a watch may observe
a key that never changes again after cancellation, leading to many
watches accumulating on the server.
By cancelling proactively, in most cases we simply move the cancel
request to happen earlier, and additionally we solve the case where the
cancel request would never be sent.
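A minimal sketch of the proactive-cancel idea with an illustrative
cancelRequest type (not the clientv3 API): the cancel is sent as soon as
the watcher's context is done, not after the next event arrives:

```go
package main

import (
	"context"
	"fmt"
)

// cancelRequest stands in for the cancel message sent on the gRPC stream.
type cancelRequest struct{ watchID int64 }

func main() {
	reqc := make(chan cancelRequest, 1) // outgoing requests to the server
	ctx, cancel := context.WithCancel(context.Background())

	done := make(chan struct{})
	go func() {
		defer close(done)
		// React to cancellation even if no new events ever arrive on the
		// watch; previously the cancel was only sent after the next event.
		<-ctx.Done()
		reqc <- cancelRequest{watchID: 42}
	}()

	cancel() // the user cancels the watch
	<-done
	fmt.Println("cancel sent for watch", (<-reqc).watchID)
}
```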
Fixes #9416
Heavy inspiration drawn from the solutions proposed there.
Description: w.mu is locked at line 385 and unlocked at line 396. Of the
5 return statements in this function, 4 come after line 396, but one
return is at line 387, before the unlock.
Fix: Add w.mu.Unlock() before the return at line 387.
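A minimal sketch of the fix pattern with illustrative names: the
early-return path must release the mutex before returning:

```go
package main

import (
	"fmt"
	"sync"
)

type watcher struct {
	mu      sync.Mutex
	streams map[string]bool
}

func (w *watcher) closeStream(key string) bool {
	w.mu.Lock()
	if _, ok := w.streams[key]; !ok {
		w.mu.Unlock() // the unlock that was missing on the early-return path
		return false
	}
	delete(w.streams, key)
	w.mu.Unlock()
	return true
}

func main() {
	w := &watcher{streams: map[string]bool{"a": true}}
	fmt.Println(w.closeStream("missing")) // false, and the lock is released
	fmt.Println(w.closeStream("a"))       // true
}
```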
Closing a watch from the client cancels the watch gRPC stream and can
produce a context canceled error. However, since the client simply
wanted to close the watcher, the error can create the impression that
something went wrong rather than that the close succeeded. Ensure that
Close does not return an error.
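A minimal sketch of the idea with an illustrative watcher type: Close
cancels the stream's context, so a resulting context.Canceled is
expected and is not surfaced to the caller:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type watcher struct {
	cancel context.CancelFunc
	errc   chan error // error reported by the stream goroutine
}

func (w *watcher) Close() error {
	w.cancel()
	err := <-w.errc
	if errors.Is(err, context.Canceled) {
		// The stream was torn down on purpose; this is a successful close.
		return nil
	}
	return err
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	w := &watcher{cancel: cancel, errc: make(chan error, 1)}
	go func() {
		<-ctx.Done()
		w.errc <- ctx.Err() // the stream reports context.Canceled
	}()
	fmt.Println(w.Close()) // <nil>
}
```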
Fixed #10340
This change allows Watch users to retrieve the cancel reason when a
watch is canceled by the server. Additionally, WatchResponses with
closeErr set now have Canceled set to true, which is in line with the
documentation for the Canceled field.
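A sketch of how a Watch consumer might read the cancel reason; the
endpoint, key, and import path (which differs across etcd versions) are
placeholders:

```go
package main

import (
	"context"
	"fmt"
	"log"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	for resp := range cli.Watch(context.Background(), "example-key") {
		if resp.Canceled {
			// Err() surfaces the server's cancel reason, if one was given.
			fmt.Println("watch canceled:", resp.Err())
			return
		}
		for _, ev := range resp.Events {
			fmt.Printf("%s %q -> %q\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
}
```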
The "too slow" comment is rather vague. If the server closes
the watch for being too slow (it doesn't seem to any more), the
watch client should gracefully resume instead of forcing the
user to handle it.
Also removed the 'opts' comment since it wasn't being maintained.
It turns out the optimization of not setting the init rev for
current-revision watches breaks some ordering assumptions. Since Watch
only returns a channel once it gets a response, it should bind the
revision at the time of the first create response.
This was causing TestWatchReconnInit to fail.
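A minimal sketch of the binding idea with illustrative types (not
clientv3 internals): the revision is recorded from the first create
response rather than left unset:

```go
package main

import "fmt"

// watchResponse and substream are illustrative stand-ins.
type watchResponse struct {
	Created  bool
	Revision int64
}

type substream struct {
	initRev int64 // revision to resume from after a reconnect
}

func (s *substream) handle(resp watchResponse) {
	if resp.Created && s.initRev == 0 {
		// Bind the revision at the time of the first create response so a
		// later resume starts from a well-defined point.
		s.initRev = resp.Revision
	}
}

func main() {
	s := &substream{}
	s.handle(watchResponse{Created: true, Revision: 7})
	fmt.Println("resume from revision", s.initRev) // 7
}
```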
When a gRPC watch stream is torn down, it joins on its logical substream
goroutines by waiting for each to close a channel. This doesn't
guarantee the substream has fully exited, though, only that it is about
to exit; it can still be waiting to resume even after Watch.Close
finishes. Instead, call waitgroup.Done at the very end of the substream
defer.
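A minimal sketch of the join pattern: the owner waits on a
sync.WaitGroup and each substream goroutine calls Done as the last step
of its defer, so Close cannot return while a substream is still doing
work:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var wg sync.WaitGroup

	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer func() {
				// ...remaining cleanup runs first...
				time.Sleep(10 * time.Millisecond) // stand-in for late work
				wg.Done()                         // last step: the goroutine has truly finished
			}()
			fmt.Println("substream", id, "running")
		}(i)
	}

	wg.Wait() // returns only after every substream defer has completed
	fmt.Println("all substreams exited")
}
```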
Fixes #7573
If a substream is closing but outc is still open while reconnecting,
then outc would only be closed once the watch client reconnects or once
the watch client is closed. This was leading to deadlocks in the proxy
tests. Instead, close outc immediately if the context is canceled.
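A minimal sketch of the idea with illustrative types: while waiting to
be resumed, the substream also watches for context cancellation and
closes its output channel immediately:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// serveSubstream stands in for a substream waiting to be resumed after a
// reconnect; outc is its output channel to the user.
func serveSubstream(ctx context.Context, resumed <-chan struct{}, outc chan<- string) {
	defer close(outc)
	select {
	case <-resumed:
		outc <- "resumed"
	case <-ctx.Done():
		// Close outc right away instead of waiting for a reconnect that
		// may never happen, so readers are not left blocked.
		return
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	outc := make(chan string)
	go serveSubstream(ctx, make(chan struct{}), outc)

	time.AfterFunc(10*time.Millisecond, cancel)
	for msg := range outc {
		fmt.Println(msg)
	}
	fmt.Println("outc closed promptly after cancel")
}
```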
Fixes #7503
Was overcounting the number of expected closing messages; the resuming
list may have nil entries. Also the full client wasn't closing the watcher
client, only canceling its context, so client closes weren't joining with
the watcher shutdown.
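A minimal sketch of the counting fix with an illustrative substream
type: nil slots in the resuming list are skipped when computing how many
close messages to expect:

```go
package main

import "fmt"

// substream is an illustrative stand-in for a resuming watch substream.
type substream struct{ id int64 }

// expectedClosing counts how many close messages to wait for, skipping
// nil slots in the resuming list.
func expectedClosing(resuming []*substream) int {
	n := 0
	for _, s := range resuming {
		if s != nil {
			n++
		}
	}
	return n
}

func main() {
	resuming := []*substream{{id: 1}, nil, {id: 2}}
	fmt.Println(expectedClosing(resuming)) // 2, not 3
}
```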
Fixes #6605
The initial revision was being updated in the substream goroutine defer;
this was racing with the resume path fetching the initial revision when
the substream closes during resume. Instead, update the initial revision
whenever the substream processes a new watch response. Since the substream
cannot receive a watch response while it is resuming, the write to the
initial revision is ordered to always happen after the resume read.
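A minimal sketch of the ordering fix with illustrative types (not
clientv3 internals): the resume revision is updated while processing
each response instead of in the goroutine's defer:

```go
package main

import "fmt"

// watchResponse and substream are illustrative stand-ins.
type watchResponse struct{ Revision int64 }

type substream struct {
	initRev int64 // revision to resume from
}

// process records the resume point while handling a response. The
// substream cannot be resuming at the same time, so this write is always
// ordered before any later resume-path read of initRev.
func (s *substream) process(resp watchResponse) {
	s.initRev = resp.Revision + 1
}

func main() {
	s := &substream{}
	s.process(watchResponse{Revision: 10})
	fmt.Println("resume from revision", s.initRev) // 11
}
```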
Fixes #6586