Progress notifications requested using ProgressRequest were sent
directly using the ctrlStream, which means that they could race
against watch responses in the watchStream.
This would especially happen when the stream was not synced - e.g. if
you requested a progress notification on a freshly created unsynced
watcher, the notification would typically arrive indicating a revision
for which not all watch responses had been sent.
This changes the behaviour so that progress requests always go through
the watch stream, using a new RequestProgressAll function that closely
matches the behaviour of the v3rpc code - i.e.
1. Generate a message with WatchId -1, indicating the revision for
*all* watchers in the stream
2. Guarantee that a response is (eventually) sent
The latter might require us to defer the response until all watchers
are synced, which is likely how it should be anyway. Note that we do *not*
guarantee that the number of progress notifications matches the number
of requests, only that eventually at least one gets sent.
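A minimal sketch (not the actual etcd code) of the idea behind a RequestProgressAll-style call; the names watchStream, progressWatchID, sendProgress and syncedAll are illustrative assumptions. It shows the two guarantees above: the notification uses WatchId -1, and if some watchers are still unsynced the response is deferred rather than dropped, so one is eventually sent.
```go
package main

import "fmt"

const progressWatchID = -1 // WatchId -1 marks a progress message covering all watchers

type watchStream struct {
	rev            int64 // current revision of the store
	unsynced       int   // watchers that still have pending responses
	deferredNotify bool  // a progress response is still owed
}

// RequestProgressAll asks for a progress notification covering *all*
// watchers on the stream. If some watchers are unsynced, the response is
// deferred instead of dropped, so it is eventually sent.
func (ws *watchStream) RequestProgressAll() {
	if ws.unsynced > 0 {
		ws.deferredNotify = true
		return
	}
	ws.sendProgress()
}

// sendProgress emits a response with WatchId -1 at the current revision.
func (ws *watchStream) sendProgress() {
	fmt.Printf("progress: watchID=%d rev=%d\n", progressWatchID, ws.rev)
	ws.deferredNotify = false
}

// syncedAll would be called once all watchers have caught up; it flushes
// any deferred progress notification.
func (ws *watchStream) syncedAll() {
	ws.unsynced = 0
	if ws.deferredNotify {
		ws.sendProgress()
	}
}

func main() {
	ws := &watchStream{rev: 42, unsynced: 1}
	ws.RequestProgressAll() // deferred: one watcher is still unsynced
	ws.rev = 45
	ws.syncedAll() // the notification is sent now, at rev 45
}
```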
Signed-off-by: Benjamin Wang <wachao@vmware.com>
We shouldn't crash the whole gRPC server because of a not-implemented RPC.
Taking the entire server down in response to a remote request is an
anti-pattern and a security risk.
Refer to https://github.com/etcd-io/etcd/runs/7034342964?check_suite_focus=true#step:5:2284
```
=== RUN TestWatchRequestProgress/1-watcher
panic: not implemented
goroutine 83024 [running]:
go.etcd.io/etcd/proxy/grpcproxy.(*watchProxyStream).recvLoop(0xc009232f00, 0x4a73e1, 0xc00e2406e0)
/home/runner/work/etcd/etcd/proxy/grpcproxy/watch.go:265 +0xbf2
go.etcd.io/etcd/proxy/grpcproxy.(*watchProxy).Watch.func1(0xc0038a3bc0, 0xc009232f00)
/home/runner/work/etcd/etcd/proxy/grpcproxy/watch.go:125 +0x70
created by go.etcd.io/etcd/proxy/grpcproxy.(*watchProxy).Watch
/home/runner/work/etcd/etcd/proxy/grpcproxy/watch.go:123 +0x73b
FAIL go.etcd.io/etcd/clientv3/integration 222.813s
FAIL
```
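The gist of the fix can be sketched as follows; the request/handle types below are stand-ins rather than the proxy's real types, while status.Errorf and codes.Unimplemented are the standard gRPC APIs for rejecting a single call. An unknown request type now produces an Unimplemented error for that stream instead of a process-wide panic.
```go
package main

import (
	"fmt"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// request stands in for the union of watch request types the proxy
// understands; unknownRequest is one it does not.
type request interface{}
type createRequest struct{}
type unknownRequest struct{}

// handle processes a single request. The important change is the default
// branch: it returns an Unimplemented error instead of panicking, so a
// remote client cannot take the whole server down.
func handle(r request) error {
	switch r.(type) {
	case createRequest:
		return nil // ... handle create ...
	default:
		// Previously: panic("not implemented")
		return status.Errorf(codes.Unimplemented, "not implemented request type %T", r)
	}
}

func main() {
	fmt.Println(handle(createRequest{}))  // <nil>
	fmt.Println(handle(unknownRequest{})) // rpc error: code = Unimplemented ...
}
```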
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The gRPC proxy opens 2 additional watch channels. The metric is shared
between etcd-server & grpc_proxy, so all assertions on the number of open
watch channels need to take the additional 2 channels into consideration.
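A hedged sketch of what the adjusted assertions amount to; the helper and constant names below are hypothetical, not the actual test code.
```go
package main

import "fmt"

// proxyExtraWatchChannels: the gRPC proxy keeps 2 extra watch channels
// open, and the metric is shared with the etcd server.
const proxyExtraWatchChannels = 2

// expectedOpenWatchStreams shows how an assertion on the open-watch-channel
// metric accounts for the proxy's additional channels.
func expectedOpenWatchStreams(clientWatches int, viaProxy bool) int {
	if viaProxy {
		return clientWatches + proxyExtraWatchChannels
	}
	return clientWatches
}

func main() {
	fmt.Println(expectedOpenWatchStreams(3, true))  // 5 when going through the proxy
	fmt.Println(expectedOpenWatchStreams(3, false)) // 3 against the server directly
}
```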
Currently, watch cancel requests are only sent to the server after a
message comes through on a watch where the client has cancelled. This
means that cancelled watches that don't receive any new messages are
never cancelled; they persist for the lifetime of the client stream.
This is particularly problematic for locking applications, where a watch
may observe a key that might never change again after cancellation,
leading to many watches accumulating on the server.
By cancelling proactively, in most cases we simply move the cancel
request to happen earlier, and additionally we solve the case where the
cancel request would never be sent.
Fixes #9416
Heavy inspiration drawn from the solutions proposed there.
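A minimal sketch of the proactive-cancel idea, using hypothetical names (watcherStream, cancelRequest, serveSubstream, reqc) rather than the actual clientv3 code: the loop that serves a watcher also selects on its context, so a cancel request is queued for the server as soon as the watch is cancelled, instead of only after the next event arrives.
```go
package main

import (
	"context"
	"fmt"
)

type cancelRequest struct{ watchID int64 }

type watcherStream struct {
	id     int64
	ctx    context.Context
	reqc   chan cancelRequest // requests destined for the server
	events chan string        // events arriving from the server
}

// serveSubstream forwards events to the user and, crucially, also selects
// on ctx.Done() so a cancel request is sent proactively rather than only
// when the next event happens to arrive on a cancelled watch.
func (w *watcherStream) serveSubstream() {
	for {
		select {
		case ev := <-w.events:
			fmt.Println("event:", ev)
		case <-w.ctx.Done():
			// Proactive cancel: tell the server right away so the watch
			// does not linger for the lifetime of the client stream.
			w.reqc <- cancelRequest{watchID: w.id}
			return
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	w := &watcherStream{id: 7, ctx: ctx, reqc: make(chan cancelRequest, 1), events: make(chan string)}
	go w.serveSubstream()
	cancel() // user cancels the watch; no further events are needed
	fmt.Println("sent to server:", <-w.reqc)
}
```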
Update to Go 1.12.5 testing. Remove the deprecated unused and gosimple
packages, and mask staticcheck 1006. Also, fix unconvert errors related
to unnecessary type conversions and the following staticcheck errors
(a few illustrative examples follow the list):
- remove redundant return statements
- use for range instead of for select
- use time.Since instead of time.Now().Sub
- omit comparison to bool constant
- replace T.Fatal and T.Fatalf with T.Error and T.Errorf respectively in goroutines spawned by tests, because T.Fatal must be called from the goroutine running the test
- fix error strings that should not be capitalized
- use sort.Strings(...) instead of sort.Sort(sort.StringSlice(...))
- use the Canceled status code instead of grpc.ErrClientConnClosing, which is deprecated
- use status.Errorf instead of grpc.Errorf, which is deprecated
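A few of the mechanical fixes above, shown as small before/after examples (illustrative only, not the actual diff hunks):
```go
package main

import (
	"fmt"
	"sort"
	"time"
)

func main() {
	start := time.Now()

	// use time.Since instead of time.Now().Sub
	// before: elapsed := time.Now().Sub(start)
	elapsed := time.Since(start)

	// use sort.Strings(...) instead of sort.Sort(sort.StringSlice(...))
	names := []string{"b", "a", "c"}
	// before: sort.Sort(sort.StringSlice(names))
	sort.Strings(names)

	// omit comparison to bool constant
	ok := len(names) == 3
	// before: if ok == true {
	if ok {
		fmt.Println(elapsed, names)
	}
}
```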
Related #10528 #10438
The current error paths of TestV3WatchFromCurrentRevision don't clean up
the resources they use, including goroutines. Because Go tests run
continuously in a single process, the leaked goroutines bloat the error
logs, as in the case below:
https://jenkins-etcd-public.prod.coreos.systems/job/etcd-coverage/2143/
This commit makes the error paths clean up those resources.
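A generic sketch of the cleanup pattern, with a stand-in goroutine instead of etcd's real test helpers: the cancel function is deferred up front, so even an error path that returns early still stops the goroutines the test started.
```go
package example

import (
	"context"
	"testing"
	"time"
)

func TestWatchCleanup(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // runs on every return, including error paths

	done := make(chan struct{})
	go func() { // stand-in for the watch goroutine the test spawns
		defer close(done)
		<-ctx.Done()
	}()

	if time.Now().IsZero() { // stand-in for a failing assertion
		t.Errorf("unexpected condition")
		return // defer cancel() still stops the goroutine above
	}

	cancel()
	<-done // the goroutine is gone before the next test runs
}
```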
A common pattern was to defer cancel() but call clus.Terminate() explicitly
at the end of the test. This appears to lead to a deadlock that is only
released once the context times out, inflating test times.
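A hedged sketch of the fixed ordering, with a stand-in cluster type instead of the real test framework: deferring both calls means cancel() runs before Terminate() (deferred calls run last-in, first-out), so Terminate is not left waiting on streams that only exit when the context times out.
```go
package example

import (
	"context"
	"testing"
)

type clusterStandIn struct{ stop chan struct{} }

func (c *clusterStandIn) Terminate(t *testing.T) { close(c.stop) }

func TestNoDeadlock(t *testing.T) {
	clus := &clusterStandIn{stop: make(chan struct{})}
	// Problematic pattern: defer cancel() but call clus.Terminate(t) at the
	// end of the test body, while the context is still live.
	// Fixed pattern: defer both; cancel() runs first, then Terminate().
	defer clus.Terminate(t)

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	_ = ctx // ... test body using ctx ...
}
```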
Users can now filter events by type, and the API is extensible.
It might also make sense for the proxy to filter out events based on
more expensive or customized filters.
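A small usage sketch from the client side (the endpoint is a placeholder, and the import path varies by etcd release; older releases used github.com/coreos/etcd/clientv3): WithFilterPut drops PUT events so the watcher only sees DELETE events.
```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// WithFilterPut filters PUT events on the server side, so this channel
	// only ever delivers DELETE events for "foo".
	for resp := range cli.Watch(context.Background(), "foo", clientv3.WithFilterPut()) {
		for _, ev := range resp.Events {
			fmt.Printf("%s %q\n", ev.Type, ev.Kv.Key)
		}
	}
}
```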
progressReportIntervalMilliseconds (formerly
ProgressReportIntervalMilliseconds) is accessed by multiple goroutines
and is reported as a data race.
To avoid this report, this commit wraps the variable with accessor
functions that use atomic operations, so the race is no longer reported.
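A minimal sketch of that accessor pattern; the names mirror the description above, but the real code may differ in detail. All reads and writes go through sync/atomic, so the race detector no longer flags concurrent access.
```go
package v3rpc

import (
	"sync/atomic"
	"time"
)

// progressReportIntervalMilliseconds must only be touched through the
// functions below so concurrent access is race-free.
var progressReportIntervalMilliseconds = int32(10 * 60 * 1000) // 10 minutes

// GetProgressReportInterval reads the interval atomically.
func GetProgressReportInterval() time.Duration {
	return time.Duration(atomic.LoadInt32(&progressReportIntervalMilliseconds)) * time.Millisecond
}

// SetProgressReportInterval is used (e.g. by tests) to shorten the interval.
func SetProgressReportInterval(newTimeout time.Duration) {
	atomic.StoreInt32(&progressReportIntervalMilliseconds, int32(newTimeout/time.Millisecond))
}
```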
Fix https://github.com/coreos/etcd/issues/4730.
Previously we put keys asynchronously, which could race when the watch
triggered before the put received its response. When that happens, the
put might fail to get the response, since we shut down the server when
the watch triggers.
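A hedged sketch of the change (placeholder endpoint, not the actual test code): do the Put synchronously and check its response before anything reacts to the resulting watch event, instead of firing the Put from a goroutine and racing it against the server shutdown.
```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Before: go func() { cli.Put(...) }() - the watch could fire, and the
	// test shut the server down, before this response ever came back.
	// Now the response is in hand before any watch-triggered shutdown.
	resp, err := cli.Put(context.Background(), "foo", "bar")
	if err != nil {
		panic(err)
	}
	fmt.Println("put applied at revision:", resp.Header.Revision)
}
```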