Wei Fu
32ee8b877a
etcdserver: drain leaky goroutines before test completed
...
If pending changes aren't committed before test completed, it might cause
data race when we don't drain all the background goroutines.
```bash
$ cd server
$ go test -race -v -run TestApplyRepeat ./etcdserver
...
panic: Log in goroutine after TestApplyRepeat has completed: 2024-02-03T17:06:13.262+0800 DEBUG bbolt Committing transaction 2
goroutine 81 [running]:
testing.(*common).logDepth(0xc000502820, {0xc0001b0460, 0x41}, 0x3)
/usr/local/go/src/testing/testing.go:1022 +0x6d4
testing.(*common).log(...)
/usr/local/go/src/testing/testing.go:1004
testing.(*common).Logf(0xc000502820, {0x1421ad7, 0x2}, {0xc000603520, 0x1, 0x1})
/usr/local/go/src/testing/testing.go:1055 +0xa5
go.uber.org/zap/zaptest.testingWriter.Write({{0x15f1f90?, 0xc000502820?}, 0xda?}, {0xc000119800, 0x42, 0x400})
/home/fuwei/go/pkg/mod/go.uber.org/zap@v1.26.0/zaptest/logger.go:130 +0x11e
go.uber.org/zap/zapcore.(*ioCore).Write(0xc0000b55c0, {0xff, {0xc1679e614f9fd7a4, 0x73a3657, 0x1cc2400}, {0x1422b2d, 0x5}, {0xc0001a0330, 0x18}, {0x0, ...}, ...}, ...)
/home/fuwei/go/pkg/mod/go.uber.org/zap@v1.26.0/zapcore/core.go:99 +0x193
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc000115930, {0x0, 0x0, 0x0})
/home/fuwei/go/pkg/mod/go.uber.org/zap@v1.26.0/zapcore/entry.go:253 +0x2f0
go.uber.org/zap.(*SugaredLogger).log(0xc0001960f8, 0xff, {0x1437885, 0x19}, {0xc0006034e0, 0x1, 0x1}, {0x0, 0x0, 0x0})
/home/fuwei/go/pkg/mod/go.uber.org/zap@v1.26.0/sugar.go:316 +0x130
go.uber.org/zap.(*SugaredLogger).Debugf(...)
/home/fuwei/go/pkg/mod/go.uber.org/zap@v1.26.0/sugar.go:171
go.etcd.io/bbolt.(*Tx).Commit(0xc0001aa9a0)
/home/fuwei/go/pkg/mod/go.etcd.io/bbolt@v1.4.0-alpha.0/tx.go:173 +0x206
go.etcd.io/etcd/server/v3/storage/backend.(*batchTx).commit(0xc00019b180, 0x0)
/home/fuwei/go/src/go.etcd.io/etcd/server/storage/backend/batch_tx.go:269 +0xdf
go.etcd.io/etcd/server/v3/storage/backend.(*batchTxBuffered).unsafeCommit(0xc00019b180, 0x0)
/home/fuwei/go/src/go.etcd.io/etcd/server/storage/backend/batch_tx.go:378 +0x425
go.etcd.io/etcd/server/v3/storage/backend.(*batchTxBuffered).commit(0xc00019b180, 0x80?)
/home/fuwei/go/src/go.etcd.io/etcd/server/storage/backend/batch_tx.go:355 +0x78
go.etcd.io/etcd/server/v3/storage/backend.(*batchTxBuffered).Commit(0xc00019b180)
/home/fuwei/go/src/go.etcd.io/etcd/server/storage/backend/batch_tx.go:342 +0x35
go.etcd.io/etcd/server/v3/storage/backend.(*backend).run(0xc000478180)
/home/fuwei/go/src/go.etcd.io/etcd/server/storage/backend/backend.go:426 +0x2c7
created by go.etcd.io/etcd/server/v3/storage/backend.newBackend in goroutine 80
/home/fuwei/go/src/go.etcd.io/etcd/server/storage/backend/backend.go:227 +0xbfd
FAIL go.etcd.io/etcd/server/v3/etcdserver 0.129s
FAIL
```
This patch also drains goroutines related to raftNode and watch store.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2024-02-03 18:58:17 +08:00
Benjamin Wang
fae9435b66
test: fix unit test Instability
...
When two members in a 5 member cluster are corrupted, and they
have different hashes, etcd will raise alarm for both members,
but the order isn't guaranteed. But if the two corrupted members
have the same hash, then the order is guaranteed. The leader
always raise alarm in the same order as the member list.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-11-29 06:51:50 +08:00
Benjamin Wang
8b98fee9ce
etcdserver: detect corrupted member based on quorum
...
When the leader detects data inconsistency by comparing hashes,
currently it assumes that the follower is the corrupted member.
It isn't correct, the leader might be the corrupted member as well.
We should depend on quorum to identify the corrupted member.
For example, for 3 member cluster, if 2 members have the same hash,
the the member with different hash is the corrupted one. For 5 member
cluster, if 3 members have the same same, the corrupted member is one
of the left two members; it's also possible that both the left members
are corrupted.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-11-26 19:35:38 +08:00