4 Commits

Marek Siarkowicz
40f71ef3c6 tests/robustness: Implement delete request for kubernetes scenario
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-05 13:40:46 +02:00
Marek Siarkowicz
9b5680c5f1 tests/robustness: Implement first step in validating the Kubernetes-etcd contract.
* Use mod revision for optimistic concurrency (see the sketch after this list).
* Introduce range requests as a more general form of get.
* Add Kubernetes-specific traffic generation, for now using pull, but
  expected to evolve to use watch.
* Introduce a Kubernetes-specific test scenario.
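
As a rough illustration of the mod-revision-based optimistic concurrency above, here is a minimal sketch using the etcd clientv3 API; the key, value, and revision are made up for the example, not taken from the test code:

```go
package main

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// compareAndSwap updates key only if its mod revision still matches
// expectedModRev, mirroring how Kubernetes guards object updates in etcd.
func compareAndSwap(ctx context.Context, c *clientv3.Client, key, value string, expectedModRev int64) (bool, error) {
	resp, err := c.Txn(ctx).
		If(clientv3.Compare(clientv3.ModRevision(key), "=", expectedModRev)).
		Then(clientv3.OpPut(key, value)).
		Commit()
	if err != nil {
		return false, err
	}
	// Succeeded is false when another writer changed the key first.
	return resp.Succeeded, nil
}

func main() {
	c, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
	if err != nil {
		panic(err)
	}
	defer c.Close()

	ok, err := compareAndSwap(context.Background(), c, "/registry/pods/default/foo", "v2", 42)
	fmt.Println(ok, err)
}
```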

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-04 13:26:54 +02:00
Wei Fu
09d053e035 tests/robustness: tune timeout policy
In a [scheduled test][1], the following error shows up:

```
2023-04-19T11:16:15.8166316Z     traffic.go:96: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout
```

According to [grpc-keepalive@v1.51.0][2], each frame from the server
refreshes `lastRead`, so the client won't fire a `Ping` frame to the
server. But the client used by the [`tombstone` request][3] might hit the
race. Since we use 5ms as the keepalive timeout, the client might not
receive the result of the `Ping` from the server in time. The keepalive
then marks it as timed out and closes the connection.

I couldn't reproduce it locally. If we add a sleep before updating
`lastRead`, it can be reproduced sometimes. Still investigating this
part.

```diff
diff --git a/internal/transport/http2_client.go b/internal/transport/http2_client.go
index d518b07e..bee9c00a 100644
--- a/internal/transport/http2_client.go
+++ b/internal/transport/http2_client.go
@@ -1560,6 +1560,7 @@ func (t *http2Client) reader(errCh chan<- error) {
                t.controlBuf.throttle()
                frame, err := t.framer.fr.ReadFrame()
                if t.keepaliveEnabled {
+                       time.Sleep(2 * time.Millisecond)
                        atomic.StoreInt64(&t.lastRead, time.Now().UnixNano())
                }
                if err != nil {
```

`DialKeepAliveTime` is always >= [10s][4]. I think we should increase
the keepalive timeout to avoid flakes caused by an unstable environment,
for example as sketched below.
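
A hedged sketch of what a looser client-side keepalive setting could look
like; the exact values are assumptions for illustration, not the ones
chosen by the test:

```go
package main

import (
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	c, err := clientv3.New(clientv3.Config{
		Endpoints: []string{"localhost:2379"},
		// Send a keepalive ping only after 10s of inactivity; gRPC enforces a
		// 10s client-side minimum anyway.
		DialKeepAliveTime: 10 * time.Second,
		// Give the server far more than 5ms to ACK the ping, so a slow CI
		// environment does not tear the connection down spuriously.
		DialKeepAliveTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer c.Close()
	fmt.Println(c.Endpoints())
}
```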

And in another [scheduled test][5], the following error shows up:

```
logger.go:130: 2023-04-22T10:45:52.646Z	INFO	Failed to trigger failpoint	{"failpoint": "blackhole", "error": "context deadline exceeded"}
```

Before sending `Status` to the member, the client fails to [pick][6] a
connection within the 100ms timeout and returns the error.

`waitTillSnapshot` is used to ensure that conditions are good enough to
trigger a snapshot transfer. Since `injectFailpoints` already has a
1-minute timeout, I think we can remove the 100ms timeout to reduce
unnecessary failures (see the sketch below).

```
injectFailpoints(1min timeout)
  failpoint.Inject
    triggerBlackhole.Trigger
      blackhole
        waitTillSnapshot
```

> NOTE: I couldn't reproduce it either. :(
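
A minimal sketch of dropping the nested 100ms deadline and letting the
`Status` call inherit the outer deadline; the endpoint and the 1-minute
value stand in for the real test setup:

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	c, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
	if err != nil {
		panic(err)
	}
	defer c.Close()

	// Outer deadline, standing in for the 1-minute injectFailpoints timeout.
	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()

	// Previously a nested 100ms deadline wrapped this call; without it, the
	// client may take as long as the outer deadline allows to pick a
	// connection and fetch the member status.
	resp, err := c.Status(ctx, "localhost:2379")
	fmt.Println(resp, err)
}
```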

Reference:

[1]: <https://github.com/etcd-io/etcd/actions/runs/4741737098/jobs/8419176899>
[2]: <eeb9afa1f6/internal/transport/http2_client.go (L1647)>
[3]: <7450cd886d/tests/robustness/traffic.go (L94)>
[4]: <eeb9afa1f6/dialoptions.go (L445)>
[5]: <https://github.com/etcd-io/etcd/actions/runs/4772033408/jobs/8484334015>
[6]: <eeb9afa1f6/clientconn.go (L932)>

REF: #15763

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-29 07:03:47 +08:00
Marek Siarkowicz
d475cf81a0 tests: Rename linearizability tests to robustness
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-02-26 14:36:18 +01:00