After moving `ClusterVersion` and related constants into the e2e packages,
some e2e test cases are broken, so we need to update them to use the
correct definitions.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
ClusterContext is used by the "e2e" or "integration" framework to extend
the ClusterConfig. The common test cases shouldn't care about what
data is encoded or included; instead, the "e2e" or "integration"
framework should decode or parse it separately.
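For illustration, a minimal sketch of the pattern, assuming hypothetical type and field names (not the actual framework code): the shared config carries an opaque value that only the owning framework knows how to decode.

```go
package framework

// ClusterConfig is shared by the common test cases.
type ClusterConfig struct {
	ClusterSize int

	// ClusterContext is opaque to the common layer; the "e2e" or
	// "integration" framework stores and decodes its own data here.
	ClusterContext interface{}
}

// e2eClusterContext is what the e2e framework might put in ClusterContext.
type e2eClusterContext struct {
	Version string
}

// e2eVersion shows the e2e side decoding its own context back out
// with a type assertion; the common test cases never do this.
func e2eVersion(cfg ClusterConfig) (string, bool) {
	ctx, ok := cfg.ClusterContext.(*e2eClusterContext)
	if !ok {
		return "", false
	}
	return ctx.Version, true
}
```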
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The client may connect to a different endpoint than the target endpoint
to be maintained. When auth is enabled, the target endpoint might not
have finished applying the authentication request yet, so it might
reject the corresponding maintenance request with permission denied.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The existing client may be connected to a different endpoint than the
specific endpoint to be maintained. Maintenance operations do not
go through raft at all, so they can run into trouble if the server
hasn't finished applying the authentication request.
Let's work through an example. Assume the existing client connects to
ep1, while the user wants to maintain ep2. If we call getToken again, it
sends an authentication request, which goes through raft. When the
specific endpoint receives the maintenance request, it might not have
finished applying the previous authentication request, yet the new token
is already carried in the context, so it will reject the maintenance
request due to an invalid token.
We already have retry logic in `unaryClientInterceptor` and
`streamClientInterceptor`. When the token expires, they can
automatically refresh it, so it should be safe to remove the `getToken`
logic from `maintenance.dial`.
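A hedged sketch of that retry behavior, with illustrative helper names rather than the exact etcd client internals: instead of authenticating eagerly at dial time, the interceptor refreshes the token and retries when the server rejects it as invalid.

```go
package clientv3sketch

import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/metadata"
	"google.golang.org/grpc/status"
)

// tokenRefresher stands in for the part of the client that can fetch a
// fresh auth token (hypothetical interface).
type tokenRefresher interface {
	refreshToken(ctx context.Context) (string, error)
}

// retryUnaryInterceptor retries a call once after refreshing the token,
// so a stale or not-yet-applied token does not fail the request for good.
func retryUnaryInterceptor(tr tokenRefresher) grpc.UnaryClientInterceptor {
	return func(ctx context.Context, method string, req, reply interface{},
		cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
		err := invoker(ctx, method, req, reply, cc, opts...)
		if status.Code(err) != codes.Unauthenticated {
			return err
		}
		// The old token was rejected; fetch a new one and retry once.
		token, rerr := tr.refreshToken(ctx)
		if rerr != nil {
			return err // surface the original failure
		}
		ctx = metadata.AppendToOutgoingContext(ctx, "token", token)
		return invoker(ctx, method, req, reply, cc, opts...)
	}
}
```

Because the refresh happens lazily on failure, there is no authentication request racing ahead of the maintenance request, which is exactly the situation the eager getToken created.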
Signed-off-by: Benjamin Wang <wachao@vmware.com>
All maintenance APIs require admin privilege when auth is enabled;
otherwise, the request will be rejected. If auth isn't enabled,
there is no such requirement.
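A usage sketch with go.etcd.io/etcd/client/v3 (endpoint and credentials are placeholders): when auth is enabled, a maintenance call such as Status must come from a user holding the root role.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		Username:    "root",   // admin user; non-admin users get permission denied
		Password:    "rootpw", // placeholder
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	// Status is a maintenance API, so it needs admin privilege here.
	resp, err := cli.Status(ctx, "127.0.0.1:2379")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("endpoint version:", resp.Version)
}
```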
Signed-off-by: Benjamin Wang <wachao@vmware.com>
In the TestDowngradeUpgradeCluster case, the brand-new cluster uses the
simple-config-changer, which means that entries have been committed
before leader election and these entries will be applied when etcdserver
starts to receive apply requests. The simple-config-changer marks the
`confState` dirty, and the storage backend precommit hook then updates
the `confState`.
For the new cluster, the storage version is nil at the beginning. It
becomes v3.5 once the `confState` record has been committed, and >v3.5
once the `storageVersion` record has been committed.
When the new cluster is ready, the leader sets the initial cluster
version to v3.6.x, which triggers `monitorStorageVersion` to update the
`storageVersion` to v3.6.x. If the `confState` record has been committed
before the cluster version update, we will get the `storageVersion`
record.
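A hedged sketch of that detection rule (the real logic lives in `UnsafeDetectSchemaVersion`; this illustrative function only mirrors the three cases described above):

```go
package schemasketch

import "errors"

// detectSchemaVersion mirrors the rule from the text: an explicit
// storageVersion record wins, a committed confState alone means v3.5,
// and neither means the version cannot be detected yet.
func detectSchemaVersion(hasStorageVersion, hasConfState bool, storedVersion string) (string, error) {
	switch {
	case hasStorageVersion:
		return storedVersion, nil // >v3.5: explicit storageVersion record
	case hasConfState:
		return "3.5.0", nil // confState committed, but no storageVersion record yet
	default:
		return "", errors.New("cannot detect storage schema version: missing confstate information")
	}
}
```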
If the storage backend doesn't commit in time, `monitorStorageVersion`
won't update the version, failing with `cannot detect storage schema
version: missing confstate information`.
If we then file the downgrade request before the next round of
`monitorStorageVersion` (which runs every 4 seconds), the cluster
version will be v3.5.0, which is equal to `UnsafeDetectSchemaVersion`'s
result, and we won't see `The server is ready to downgrade`.
It is easy to reproduce the issue if you use cpuset or taskset to limit
the process to two CPUs.
So we should wait for the new cluster's storage version to be ready
before filing the downgrade request, as in the sketch below.
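A hedged sketch of that wait (helper names are illustrative; readVersion stands in for however the test reads the reported storage version):

```go
package e2esketch

import (
	"testing"
	"time"
)

// waitForStorageVersion polls until the cluster reports the expected
// storage version, so the downgrade request is only filed once
// monitorStorageVersion has caught up.
func waitForStorageVersion(t *testing.T, readVersion func() (string, error), want string) {
	t.Helper()
	deadline := time.Now().Add(30 * time.Second)
	for time.Now().Before(deadline) {
		if got, err := readVersion(); err == nil && got == want {
			return
		}
		// monitorStorageVersion runs every 4 seconds, so poll a bit faster.
		time.Sleep(time.Second)
	}
	t.Fatalf("storage version did not reach %q in time", want)
}
```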
Fixes: #14540
Signed-off-by: Wei Fu <fuweid89@gmail.com>