This attempts to fix a special case of the problem described in #12385,
where trying to do `clientv3.Watch` with an expired token would result
in `ErrGRPCPermissionDenied`, due to the failing authorization check in
`isWatchPermitted`. Furthermore, the client can't auto recover, since
`shouldRefreshToken` rightly returns false for the permission denied
error.
In this case, we would like to have a runbook to dynamically disable
auth, without causing any disruption. Doing so would immediately expire
all existing tokens, which would then cause the behavior described
above. This means existing watchers would still work for a period of
time after disabling auth, until they have to reconnect, e.g. due to a
rolling restart of server nodes.
This commit adds a client-side fix and a server-side fix, either of
which is sufficient to get the added test case to pass. Note that it is
an e2e test case instead of an integration one, as the reconnect only
happens if the server node is stopped via SIGINT or SIGTERM.
A generic fix for the problem described in #12385 would be better, as
that shall also fix this special case. However, the fix would likely be
a lot more involved, as some untangling of authn/authz is required.
To avoid inconsistant behavior during cluster upgrade we are feature
gating persistance behind cluster version. This should ensure that
all cluster members are upgraded to v3.6 before changing behavior.
To allow backporting this fix to v3.5 we are also introducing flag
--experimental-enable-lease-checkpoint-persist that will allow for
smooth upgrade in v3.5 clusters with this feature enabled.
Current checkpointing mechanism is buggy. New checkpoints for any lease
are scheduled only until the first leader change. Added fix for that
and a test that will check it.
Upgrading from v1.0.1.
Upgrading related dependencies
------------------------------
The following dependencies also had to be upgraded:
- go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.26.1
From v0.25.0. This gets rid of a transitive dependency on go.opentelemetry.io/otel@v1.0.1.
- google.golang.org/genproto@v0.0.0-20211118181313-81c1377c94b1
This is because etcd v3.5 will panic when it encounters
ClusterVersionSet entry with version >3.5.0. For downgrades to v3.5 to
work we need to make sure this entry is snapshotted.
Problem with old code was that during downgrade only members with
downgrade target version were allowed to join. This is unrealistic as
it doesn't handle any members to disconnect/rejoin.
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>