Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
Co-authored-by: chao <54131596+chaochn47@users.noreply.github.com>
* Use mod revision for optimistic concurrency (see the sketch after this list).
* Introduce range requests as a more general form of get.
* Add Kubernetes-specific traffic generation; for now it uses pull, but it is
expected to evolve to use watch.
* Introduce a Kubernetes-specific test scenario.
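A minimal sketch of the optimistic-concurrency pattern referenced above, assuming plain clientv3 usage with illustrative names (this is not the code added by the change):
```go
package main

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// putIfUnchanged writes newValue only if the key's mod revision still matches
// the revision observed when the key was last read, mirroring the
// Kubernetes-style compare-and-swap on resource version.
func putIfUnchanged(ctx context.Context, c *clientv3.Client, key, newValue string, observedModRev int64) (bool, error) {
	resp, err := c.Txn(ctx).
		If(clientv3.Compare(clientv3.ModRevision(key), "=", observedModRev)).
		Then(clientv3.OpPut(key, newValue)).
		Commit()
	if err != nil {
		return false, err
	}
	// Succeeded is false when another writer bumped the mod revision first.
	return resp.Succeeded, nil
}
```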
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Increase the request count to 1000 to increase the sample size and reduce variability, and increase the tolerance threshold from 10% to 15%.
Signed-off-by: James Blair <mail@jamesblair.net>
In a [scheduled test][1], the following error appears:
```
2023-04-19T11:16:15.8166316Z traffic.go:96: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout
```
According to [grpc-keepalive@v1.51.0][2], each frame received from the server
refreshes `lastRead`, so keepalive won't fire a `Ping` frame to the server. But
the client used by the [`tombstone` request][3] might hit the race. Since we use
5ms as the keepalive timeout, the client might not receive the result of the
`Ping` from the server in time. The keepalive logic then marks the connection as
timed out and closes it.
I couldn't reproduce it locally. If we add a sleep before updating `lastRead`,
it can sometimes be reproduced. I'm still investigating this part.
```diff
diff --git a/internal/transport/http2_client.go b/internal/transport/http2_client.go
index d518b07e..bee9c00a 100644
--- a/internal/transport/http2_client.go
+++ b/internal/transport/http2_client.go
@@ -1560,6 +1560,7 @@ func (t *http2Client) reader(errCh chan<- error) {
 		t.controlBuf.throttle()
 		frame, err := t.framer.fr.ReadFrame()
 		if t.keepaliveEnabled {
+			time.Sleep(2 * time.Millisecond)
 			atomic.StoreInt64(&t.lastRead, time.Now().UnixNano())
 		}
 		if err != nil {
```
`DialKeepAliveTime` is always >= [10s][4]. I think we should increase the
timeout to avoid flakes caused by an unstable environment.
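A minimal sketch of what a more forgiving keepalive configuration could look like on the client side (the concrete values are illustrative assumptions, not the ones this change settles on):
```go
package main

import (
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// newClient builds a client whose keepalive timeout is far looser than 5ms,
// so a slow CI host still has time to ACK the ping. Values are illustrative.
func newClient(endpoints []string) (*clientv3.Client, error) {
	return clientv3.New(clientv3.Config{
		Endpoints:            endpoints,
		DialKeepAliveTime:    10 * time.Second, // gRPC clamps anything smaller up to 10s anyway
		DialKeepAliveTimeout: 5 * time.Second,
	})
}
```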
And in a [scheduled test][5], the following error appears:
```
logger.go:130: 2023-04-22T10:45:52.646Z INFO Failed to trigger failpoint {"failpoint": "blackhole", "error": "context deadline exceeded"}
```
Before sending `Status` to the member, the client doesn't [pick][6] a
connection in time (100ms) and returns the error.
`waitTillSnapshot` is used to ensure that enough progress has been made to
trigger a snapshot transfer. We already have a 1-minute timeout for
injectFailpoints, so I think we can remove the 100ms timeout to reduce
unnecessary failures.
```
injectFailpoints (1min timeout)
  failpoint.Inject
    triggerBlackhole.Trigger
      blackhole
        waitTillSnapshot
```
> NOTE: I couldn't reproduce this one either. :(
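A rough sketch of the proposed direction, assuming a hypothetical shape for the helper (the real code lives in tests/robustness): rely on the outer 1-minute deadline from injectFailpoints instead of wrapping each `Status` call in its own 100ms context.
```go
package main

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// waitTillSnapshot is a hypothetical helper shape, illustrative only: it lets
// the caller's 1-minute deadline bound the wait rather than a per-call 100ms one.
func waitTillSnapshot(ctx context.Context, c *clientv3.Client, endpoint string, targetIndex uint64) error {
	for {
		// Previously each Status call was wrapped in its own 100ms context,
		// which failed whenever the balancer had not picked a connection in time.
		resp, err := c.Status(ctx, endpoint)
		if err == nil && resp.RaftIndex >= targetIndex {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err() // the outer injectFailpoints deadline still bounds the wait
		case <-time.After(100 * time.Millisecond):
		}
	}
}
```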
Reference:
[1]: <https://github.com/etcd-io/etcd/actions/runs/4741737098/jobs/8419176899>
[2]: <eeb9afa1f6/internal/transport/http2_client.go (L1647)>
[3]: <7450cd886d/tests/robustness/traffic.go (L94)>
[4]: <eeb9afa1f6/dialoptions.go (L445)>
[5]: <https://github.com/etcd-io/etcd/actions/runs/4772033408/jobs/8484334015>
[6]: <eeb9afa1f6/clientconn.go (L932)>
REF: #15763
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This is a follow-up to #15667.
This patch uses zaptest/observer as the base to provide functionality similar
to pkg/expect.Expect.
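A minimal sketch of the idea, assuming the standard zap observer API (names below are illustrative, not the helper introduced by the patch):
```go
package example

import (
	"strings"

	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
	"go.uber.org/zap/zaptest/observer"
)

// newObservedLogger returns a logger plus the observed sink recording every entry.
func newObservedLogger() (*zap.Logger, *observer.ObservedLogs) {
	core, logs := observer.New(zapcore.InfoLevel)
	return zap.New(core), logs
}

// expectLog reports whether any captured entry contains the given substring,
// playing a role similar to pkg/expect.Expect on process output.
func expectLog(logs *observer.ObservedLogs, substring string) bool {
	for _, entry := range logs.All() {
		if strings.Contains(entry.Message, substring) {
			return true
		}
	}
	return false
}
```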
The test environment:
```bash
11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
mkdir /sys/fs/cgroup/etcd-followup-15667
echo 0-2 | tee /sys/fs/cgroup/etcd-followup-15667/cpuset.cpus # three cores
```
Before change:
* memory.peak: ~ 681 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14.04
After change:
* memory.peak: ~ 671 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:13.07
Based on the test results, I think it's safe to enable this by default.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
TestV3WatchRestoreSnapshotUnsync sets up a three-member cluster.
After the leader is elected and before any client update requests are served,
each member has a log up to index 8: 3 x ConfChange +
3 x ClusterMemberAttrSet + 1 x ClusterVersionSet.
Based on the config (SnapshotCount: 10, CatchUpCount: 5), we need to send
enough update requests to trigger a snapshot at least twice; each snapshot
compacts the log down to snapshot-index minus CatchUpCount:
T1: L(snapshot-index: 11, compacted-index: 6) F_m0(index: 8)
T2: L(snapshot-index: 22, compacted-index: 17) F_m0(index: 8, out of date)
After member0 recovers from the network partition, it will reject the leader's
request and return a hint (index: 8, term: x). If this happens after the second
snapshot, the leader will find that index 8 is out of date and force a snapshot
transfer.
However, the client only sends 15 update requests and the leader doesn't finish
the second snapshot in time. Since the latest compacted index is still 6, the
leader can still replicate index 9 to member0 instead of sending a snapshot.
```bash
cd tests/integration
CLUSTER_DEBUG=true go test -v -count=1 -run TestV3WatchRestoreSnapshotUnsync ./
...
INFO m2.raft 3da8ba707f1a21a4 became leader at term 2 {"member": "m2"}
...
INFO m2 triggering snapshot {"member": "m2", "local-member-id": "3da8ba707f1a21a4", "local-member-applied-index": 22, "local-member-snapshot-index": 11, "local-member-snapshot-count": 10, "snapshot-forced": false}
...
cluster.go:1359: network partition between: 99626fe5001fde8b <-> 1c964119da6db036
cluster.go:1359: network partition between: 99626fe5001fde8b <-> 3da8ba707f1a21a4
cluster.go:416: WaitMembersForLeader
INFO m0.raft 99626fe5001fde8b became follower at term 2 {"member": "m0"}
INFO m0.raft raft.node: 99626fe5001fde8b elected leader 3da8ba707f1a21a4 at term 2 {"member": "m0"}
DEBUG m2.raft 3da8ba707f1a21a4 received MsgAppResp(rejected, hint: (index 8, term 2)) from 99626fe5001fde8b for index 23 {"member": "m2"}
DEBUG m2.raft 3da8ba707f1a21a4 decreased progress of 99626fe5001fde8b to [StateReplicate match=8 next=9 inflight=15] {"member": "m2"}
DEBUG m0 Applying entries {"member": "m0", "num-entries": 15}
DEBUG m0 Applying entry {"member": "m0", "index": 9, "term": 2, "type": "EntryNormal"}
....
INFO m2 saved snapshot {"member": "m2", "snapshot-index": 22}
INFO m2 compacted Raft logs {"member": "m2", "compact-index": 17}
```
To fix this issue, the patch uses the log monitor to watch for "compacted Raft
logs" messages and expects the two members to compact their logs twice.
Fixes: #15545
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This issue is fairly easy to reproduce by bombarding the server with requests
for progress notifications, which eventually leads to one being delivered ahead
of the payload message. This is then caught by the watch response validation
code previously added by Marek Siarkowicz.
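A rough sketch of such a reproduction, assuming plain clientv3 usage (not the robustness test code itself):
```go
package main

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// hammerProgress opens a watch and floods the server with manual progress
// requests while writes are in flight; with the race present, a progress
// notification can arrive carrying a revision newer than events still queued.
func hammerProgress(ctx context.Context, cli *clientv3.Client, key string) {
	wch := cli.Watch(ctx, key)
	go func() {
		for ctx.Err() == nil {
			_ = cli.RequestProgress(ctx) // bombard the server with progress requests
		}
	}()
	for resp := range wch {
		if resp.IsProgressNotify() {
			continue // validation elsewhere checks ordering against payload events
		}
		for range resp.Events {
			// consume payload events
		}
	}
}
```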
Signed-off-by: Peter Wortmann <peter.wortmann@skao.int>
This will fail basically every time, as the progress notification request
catches the watcher in an unsynchronized state.
Signed-off-by: Peter Wortmann <peter.wortmann@skao.int>