Version constants such as V3_5 and V3_6 are currently defined in many
places across the repo. This commit moves all of them into a single
centralized place, so that every call site can reuse them.
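A minimal sketch of what the centralized definitions could look like (assuming a dedicated version package and the coreos/go-semver library that etcd already uses; the package path is illustrative):
```go
// Package version centralizes the version constants that were
// previously redefined throughout the repo.
package version

import "github.com/coreos/go-semver/semver"

var (
	V3_5 = semver.Version{Major: 3, Minor: 5}
	V3_6 = semver.Version{Major: 3, Minor: 6}
)
```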
Downstream users of etcd experience build issues when using dependencies
that require more recent (incompatible) versions of opentelemetry. This
commit upgrades the dependencies so that downstream users no longer
experience these issues.
When performing the downgrade operation, users can confirm whether each member
is ready to be downgraded by checking the 'storageVersion' field. If it equals the
'target version' in the downgrade command, the member is ready to be downgraded;
otherwise, the etcd member is still in the process of migrating the db file.
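For illustration, such a check could be scripted with the Go client; this sketch assumes the storage version is surfaced on the status response as `StorageVersion`, and the endpoint and target version are placeholder values:
```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Target version given to the downgrade command (placeholder value).
	const target = "3.5.0"

	resp, err := cli.Status(ctx, "127.0.0.1:2379")
	if err != nil {
		panic(err)
	}
	if resp.StorageVersion == target {
		fmt.Println("member is ready to be downgraded")
	} else {
		fmt.Println("member is still migrating the db file")
	}
}
```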
Upgrading go.opentelemetry.io/otel from v1.0.1.
Upgrading related dependencies
------------------------------
The following dependencies also had to be upgraded:
- go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.26.1
  (from v0.25.0). This gets rid of a transitive dependency on go.opentelemetry.io/otel@v1.0.1.
- google.golang.org/genproto@v0.0.0-20211118181313-81c1377c94b1
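For reference, the corresponding go.mod entries would look roughly like this (showing only the modules named above):
```
require (
	go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.26.1
	google.golang.org/genproto v0.0.0-20211118181313-81c1377c94b1
)
```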
- ErrGRPCNotCapable("etcdserver: not capable") -> codes.FailedPrecondition (this will not fix itself; it requires a new version of the server)
- ErrGPRCNotSupportedForLearner("etcdserver: rpc not supported for learner") -> codes.FailedPrecondition (as long as the member is a learner, the call will not work)
- ErrGRPCClusterVersionUnavailable("etcdserver: cluster version not found during downgrade") -> codes.FailedPrecondition (the backend does not contain the version (old etcd?), so retrying will not help)
https://github.com/etcd-io/etcd/runs/2599598633?check_suite_focus=true
```
{"level":"warn","ts":"2021-05-17T09:55:30.246Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000539880/#initially=[unix://localhost:m30]","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: rpc not supported for learner"}
{"level":"warn","ts":"2021-05-17T09:55:30.270Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying
of unary invoker
failed","target":"etcd-endpoints://0xc000539880/#initially=[unix://localhost:m30]","attempt":1,"error":"rpc
error: code = Unavailable desc = etcdserver: rpc not supported for
learner"}`
```
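After this change, callers can treat these failures as non-retryable preconditions instead of transient unavailability. A minimal sketch of such client-side handling, assuming only the standard google.golang.org/grpc status and codes packages (the helper function itself is hypothetical):
```go
package example

import (
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// shouldRetry reports whether a failed RPC is worth retrying.
func shouldRetry(err error) bool {
	switch status.Code(err) {
	case codes.FailedPrecondition:
		// The precondition (server version, learner state, backend
		// contents) must change first; blind retries will not help.
		return false
	case codes.Unavailable:
		// Transient condition; safe for a retry interceptor to retry.
		return true
	default:
		return false
	}
}
```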
While it appears that etcd is not vulnerable to CVE-2021-3121,
it is a good idea to update to the new generator so that
vulnerable code isn't generated in any future APIs. This also
lays to rest the question of whether etcd is affected by
CVE-2021-3121.
Before this patch, a client canceling the context for a watch causes the
server to generate an `rpctypes.ErrGRPCNoLeader` error, which leads to the
recording of a gRPC `Unavailable` metric in association with the client watch
cancellation.
The metric looks like this:
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="Watch",grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}
So the watch server misidentifies the error as a server error and
propagates the mistake to metrics, creating a false indicator that the leader
has been lost. This false signal then leads to false alerting.
The commit 9c103dd0dedfc723cd4f33b6a5e81343d8a6bae7 introduced an interceptor which wraps
watch streams requiring a leader, causing those streams to be actively canceled
when leader loss is detected.
However, the error handling code assumes that all stream context cancellations
come from the interceptor. This assumption breaks when the context is instead
canceled by a client stream cancellation.
The core challenge is the lack of information conveyed via `context.Context`,
which is shared by both the send and receive sides of the stream handling and is
subject to cancellation by all paths (including the gRPC library itself). If any
piece of the system cancels the shared context, there is no way for a context
consumer to understand who canceled the context or why.
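For example, with a plain `context.Context`, the observed error is the same generic one no matter who canceled:
```go
package main

import (
	"context"
	"fmt"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	// The caller could be the client, the interceptor, or the gRPC
	// library itself; the context records none of that.
	cancel()
	fmt.Println(ctx.Err()) // always "context canceled", whatever the cause
}
```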
To resolve this ambiguity in the stream interceptor code specifically, this patch
introduces a custom context struct that the interceptor uses to expose a custom
error through the context when it decides to actively cancel a stream. The
consuming side can now safely propagate a generic context cancellation as a
cancellation, while the server-generated leader error is preserved and
propagated normally without any special inference.
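One shape such a wrapper can take (a sketch of the pattern, not necessarily the exact code in the patch; the `cancellableContext` name and package are illustrative):
```go
package interceptor

import (
	"context"
	"sync"
)

// cancellableContext wraps a context so that a specific cancellation
// reason can be recorded and later observed through Err().
type cancellableContext struct {
	context.Context

	mu     sync.Mutex
	cancel context.CancelFunc
	reason error
}

// newCancellableContext returns the wrapped context together with a
// cancel function that lets the interceptor record an explicit reason,
// e.g. a "lost leader" error.
func newCancellableContext(parent context.Context) (*cancellableContext, func(error)) {
	ctx, cancel := context.WithCancel(parent)
	cc := &cancellableContext{Context: ctx, cancel: cancel}
	return cc, func(reason error) {
		cc.mu.Lock()
		cc.reason = reason
		cc.mu.Unlock()
		cc.cancel()
	}
}

// Err returns the recorded reason if one was set by the interceptor;
// otherwise it falls back to the generic context error.
func (cc *cancellableContext) Err() error {
	cc.mu.Lock()
	defer cc.mu.Unlock()
	if cc.reason != nil {
		return cc.reason
	}
	return cc.Context.Err()
}
```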
When a client cancels the stream, there remains a race in the error handling
code between the send and receive goroutines, whereby the underlying gRPC error
is lost when the send path returns and is handled first. That issue can be
addressed separately, since no matter which path wins, we can detect a generic
cancellation.
This is a replacement of https://github.com/etcd-io/etcd/pull/11375.
Fixes #10289, #9725, #9576, #9166