The cluster version is initialized after the member becomes leader, and the
update is handled asynchronously. The version cannot be updated if the member
has been closed and the Go runtime picks the `s.stopping` case first.
```go
// e2a5df534c/server/etcdserver/server.go (L2170)
func (s *EtcdServer) monitorClusterVersions() {
...
for {
select {
case <-s.firstCommitInTerm.Receive():
case <-time.After(monitorVersionInterval):
case <-s.stopping:
return
}
...
}
}
```
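This is standard Go behavior: when several `select` cases are ready at the same
time, one is chosen pseudo-randomly, so a closed `s.stopping` channel can win
even when another case is ready too. A standalone demo (not etcd code):
```go
package main

import "fmt"

func main() {
	// Both channels are closed, so both cases are always ready;
	// select then picks one of them pseudo-randomly.
	stopping := make(chan struct{})
	close(stopping)
	other := make(chan struct{})
	close(other)

	picked := 0
	for i := 0; i < 1000; i++ {
		select {
		case <-other:
		case <-stopping:
			picked++
		}
	}
	fmt.Printf("stopping case won %d/1000 iterations\n", picked) // roughly half
}
```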
Alternatively, once `s.stopping` has been closed, [UpdateClusterVersion][1]
can no longer schedule its goroutine through GoAttach. In #15409 we can see
the warning `server has stopped; skipping GoAttach` emitted by GoAttach:
```plain
https://github.com/etcd-io/etcd/actions/runs/4340931587/jobs/7580103902
logger.go:130: 2023-03-06T07:36:44.253Z WARN default stopping grpc server due to error {"error": "accept tcp 127.0.0.1:2379: use of closed network connection"}
logger.go:130: 2023-03-06T07:36:44.253Z WARN default stopped grpc server due to error {"error": "accept tcp 127.0.0.1:2379: use of closed network connection"}
logger.go:130: 2023-03-06T07:36:44.253Z ERROR default setting up serving from embedded etcd failed. {"error": "accept tcp 127.0.0.1:2379: use of closed network connection"}
logger.go:130: 2023-03-06T07:36:44.253Z ERROR default setting up serving from embedded etcd failed. {"error": "http: Server closed"}
logger.go:130: 2023-03-06T07:36:44.253Z INFO default skipped leadership transfer for single voting member cluster {"local-member-id": "8e9e05c52164694d", "current-leader-member-id": "8e9e05c52164694d"}
logger.go:130: 2023-03-06T07:36:44.253Z WARN default server has stopped; skipping GoAttach
...
```
If the cluster version is never updated, the minimum storage version detected
will be v3.5, because [AuthStatus][2] was introduced in [v3.5][3], and the
version comparison will fail.
To fix this issue, we should wait for the cluster version to become ready
once the server is ready to serve requests.
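A minimal sketch of that wait, using a hypothetical helper that polls
`EtcdServer.ClusterVersion()` (which returns nil until the version has been
published):
```go
import (
	"context"
	"time"

	"go.etcd.io/etcd/server/v3/etcdserver"
)

// waitForClusterVersion is a hypothetical helper: it polls until the
// server has published a cluster version, or the context expires.
func waitForClusterVersion(ctx context.Context, s *etcdserver.EtcdServer) error {
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for s.ClusterVersion() == nil {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
	return nil
}
```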
[1]: <e2a5df534c/server/etcdserver/adapters.go (L45)>
[2]: <071e70cdc4>
[3]: <1b4e54c238>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
TestLeasingDeleteRangeContendTxn tries to exercise RangeDelete while the
target resources are being updated. When `txnLeasing` wants a server-side
transaction, it needs to ensure that every key's mod revision is less than
what it previously observed. If that compare fails, it reapplies the
server-side transaction until it succeeds. I believe the test case is meant
to verify how `txnLeasing` handles this race.
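The retry can be pictured with a plain clientv3 transaction (a simplified
sketch, not the actual txnLeasing code; the function and key names are
illustrative):
```go
import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// deleteIfUnchanged retries a server-side txn until its guard holds:
// the key's mod revision must not exceed the revision we last observed.
func deleteIfUnchanged(ctx context.Context, cli *clientv3.Client, key string, seenRev int64) error {
	for {
		resp, err := cli.Txn(ctx).
			If(clientv3.Compare(clientv3.ModRevision(key), "<", seenRev+1)).
			Then(clientv3.OpDelete(key)).
			Commit()
		if err != nil {
			return err
		}
		if resp.Succeeded {
			return nil
		}
		// Compare failed: the key changed; refresh the observed revision and retry.
		get, err := cli.Get(ctx, key)
		if err != nil {
			return err
		}
		if len(get.Kvs) > 0 {
			seenRev = get.Kvs[0].ModRevision
		}
	}
}
```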
Before the patch #15401, the resource-updating goroutine kept updating until
the RangeDelete finished. The test case is flaky because the two goroutines
share one `ctx`, and the grpc-go client won't wait for the response once that
`ctx` has been canceled.
For example,
| DelLease Goroutine | PutLease Goroutine | ETCD Server | Key/0 Status |
| -- | --- | -- | -- |
| deleted | | | version = 0 |
| | send update(key/0=123) req | received update(key/0=123) req | version = 0 |
| cancel | | | version = 0 |
| | exit because of cancel | | version = 0 |
| get key/0 by putkv | | | version = 0 |
| | | applied update(key/0=123) | version = 1 |
| get key/0 by raw-cli | | | version = 1 |
So `raw-cli` gets `[key/0=123]` while `putkv` gets `[]`. If `putkv`
applies two update requests to the etcd server and the last one is canceled
before being applied, the error looks like:
```
expected [key:"key/0" version:2 value:"123" ], got [key:"key/0" version:1 value:"123" ]
```
The resource-updating goroutine should not share its ctx with RangeDelete here.
I also revert the change on the current main branch, because there the
resource-updating goroutine only updates 8 times and might exit before
`RangeDelete` starts; in that case, `txnLeasing` never hits the race at all.
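A sketch of the fix, with names mirroring the test (`putkv` stands for the
leasing-wrapped KV; this is illustrative, not the exact patch): the updater
gets its own context, so canceling the RangeDelete ctx can no longer orphan
an in-flight Put.
```go
// Do not reuse the RangeDelete ctx for the updater goroutine.
updateCtx, updateCancel := context.WithCancel(context.Background())
defer updateCancel()

done := make(chan struct{})
go func() {
	defer close(done)
	for updateCtx.Err() == nil {
		// Each Put now either completes or fails on its own context; a
		// canceled RangeDelete ctx cannot leave a response in flight.
		if _, err := putkv.Put(updateCtx, "key/0", "123"); err != nil {
			return
		}
	}
}()
```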
Fixes: #15352
Signed-off-by: Wei Fu <fuweid89@gmail.com>
The huge (100k+) value was justified when storev2 was dumped completely with
every snapshot. With storev2 being decommissioned, we can checkpoint more
frequently for faster recovery.
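For reference, the threshold is exposed through embed.Config, so operators can
still override whatever default the patch settles on (a sketch; the value
below is illustrative, not the new default):
```go
import "go.etcd.io/etcd/server/v3/embed"

cfg := embed.NewConfig()
// Number of applied entries between snapshots; a lower value means
// more frequent checkpoints and faster recovery (illustrative value).
cfg.SnapshotCount = 10000
```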
Signed-off-by: James Blair <mail@jamesblair.net>