ClusterContext is used by "e2e" or "integration" to extend the
ClusterConfig. The common test cases shouldn't care about what
data is encoded or included; instead "e2e" or "integration"
framework should decode or parse it separately.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
In the TestDowngradeUpgradeCluster case, the brand-new cluster is using
simple-config-changer, which means that entries has been committed
before leader election and these entries will be applied when etcdserver
starts to receive apply-requests. The simple-config-changer will mark
the `confState` dirty and the storage backend precommit hook will update
the `confState`.
For the new cluster, the storage version is nil at the beginning. And
it will be v3.5 if the `confState` record has been committed. And it
will be >v3.5 if the `storageVersion` record has been committed.
When the new cluster is ready, the leader will set init cluster version
with v3.6.x. And then it will trigger the `monitorStorageVersion` to
update the `storageVersion` to v3.6.x. If the `confState` record has
been updated before cluster version update, we will get storageVersion
record.
If the storage backend doesn't commit in time, the
`monitorStorageVersion` won't update the version because of `cannot
detect storage schema version: missing confstate information`.
And then we file the downgrade request before next round of
`monitorStorageVersion`(per 4 second), the cluster version will be
v3.5.0 which is equal to the `UnsafeDetectSchemaVersion`'s result.
And we won't see that `The server is ready to downgrade`.
It is easy to reproduce the issue if you use cpuset or taskset to limit
in two cpus.
So, we should wait for the new cluster's storage ready before downgrade
request.
Fixes: #14540
Signed-off-by: Wei Fu <fuweid89@gmail.com>
It doesn't make sense to always pass a AuthConfig parameter for
test cases which do not enable auth at all. So refactoring the
Client interface method so that it accepts a `ClientOption`
variadic parameter.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Rebased this PR. There is no response from the original author,
so Benjamin (ahrtr@) continue to work on this PR.
Signed-off-by: Vitalii Levitskii <vitalii@uber.com>
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The alarm list is the only exception that doesn't move consistent_index
forward. The reproduction steps are as simple as,
```
etcd --snapshot-count=5 &
for i in {1..6}; do etcdctl alarm list; done
kill -9 <etcd_pid>
etcd
```
Signed-off-by: Benjamin Wang <wachao@vmware.com>
When we can't reach quorum, we were waiting forever and never sending
the systemd notify message. As a result, systemd would eventually time out
and restart the etcd process which likely would make the unhealthy cluster
in an even worse state
Improves #13785
Signed-off-by: Nicolai Moore <niconorsk@gmail.com>