Afer moving `ClusterVersion` and related constants into e2e packages,
some e2e test cases are broken, so we need to update them to use the
correct definitions.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
In the TestDowngradeUpgradeCluster case, the brand-new cluster is using
simple-config-changer, which means that entries has been committed
before leader election and these entries will be applied when etcdserver
starts to receive apply-requests. The simple-config-changer will mark
the `confState` dirty and the storage backend precommit hook will update
the `confState`.
For the new cluster, the storage version is nil at the beginning. And
it will be v3.5 if the `confState` record has been committed. And it
will be >v3.5 if the `storageVersion` record has been committed.
When the new cluster is ready, the leader will set init cluster version
with v3.6.x. And then it will trigger the `monitorStorageVersion` to
update the `storageVersion` to v3.6.x. If the `confState` record has
been updated before cluster version update, we will get storageVersion
record.
If the storage backend doesn't commit in time, the
`monitorStorageVersion` won't update the version because of `cannot
detect storage schema version: missing confstate information`.
And then we file the downgrade request before next round of
`monitorStorageVersion`(per 4 second), the cluster version will be
v3.5.0 which is equal to the `UnsafeDetectSchemaVersion`'s result.
And we won't see that `The server is ready to downgrade`.
It is easy to reproduce the issue if you use cpuset or taskset to limit
in two cpus.
So, we should wait for the new cluster's storage ready before downgrade
request.
Fixes: #14540
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Rebased this PR. There is no response from the original author,
so Benjamin (ahrtr@) continue to work on this PR.
Signed-off-by: Vitalii Levitskii <vitalii@uber.com>
Signed-off-by: Benjamin Wang <wachao@vmware.com>
submitConcurrentWatch use sleep 3s to wait for all the watch connections
ready. When the number of connections increases, like 1000, the 3s is not
enough and the test case becomes flaky.
In this commit, spawn curl process and check the ouput line with
`created":true}}` to make sure that the connection has been initialized
and ready to receive the events. It is reliable to test the following
range request.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Due to a duplicate call of clientConfigFromCmd, the move-leader command
would fail with "conflicting environment variable is shadowed by corresponding command-line flag".
Also in scenarios where no command-line flag was supplied.
Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
When we can't reach quorum, we were waiting forever and never sending
the systemd notify message. As a result, systemd would eventually time out
and restart the etcd process which likely would make the unhealthy cluster
in an even worse state
Improves #13785
Signed-off-by: Nicolai Moore <niconorsk@gmail.com>
Notes:
1. compactPhysical in ctlCtx and withQuota aren't used at all, they are dead code.
2. quotaBackendBytes in ctlCtx isn't used either. Instead, users (test cases) set the QuotaBackendBytes directly.
Signed-off-by: Benjamin Wang <wachao@vmware.com>