67 Commits

Author SHA1 Message Date
Marek Siarkowicz
6979318108 tests/robustness: Make Range a proper request type to allow setting Range.Revision != 0 for stale reads
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-06-14 13:55:45 +02:00
Marek Siarkowicz
974655e02c tests/robustness: Rename operations const to separate from RequestType
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-06-14 13:24:14 +02:00
Marek Siarkowicz
da49157b20
Merge pull request #16066 from serathius/robusness-validate
tests/robustness: Extract validation to separate package
2023-06-14 11:54:46 +02:00
Marek Siarkowicz
7bbc738ec4 tests/robustness: Extract validation to separate package
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-06-14 09:14:27 +02:00
Marek Siarkowicz
a268a67e45 tests/robustness: Move get to list of randomized operations 2023-06-13 21:09:05 +02:00
Marek Siarkowicz
a6ab774458
Merge pull request #16044 from serathius/robusness-empty
tests/robustness: Assume starting from empty etcd instead of throwing out first failed request
2023-06-12 10:18:34 +02:00
Marek Siarkowicz
b366cda70f
Merge pull request #16046 from serathius/robusness-test-diff
tests/robustness: Provide a response diff in model test to make debugging easier
2023-06-12 10:15:43 +02:00
Marek Siarkowicz
f410c6e6df tests/robustness: Provide a response diff in model test to make debugging easier
Signed-off-by: Marek Siarkowicz <serathius@users.noreply.github.com>
2023-06-09 22:42:17 +02:00
Marek Siarkowicz
53af854871 tests/robustness: Assume starting from empty etcd instead of throwing out first failed request
Signed-off-by: Marek Siarkowicz <serathius@users.noreply.github.com>
2023-06-09 22:38:16 +02:00
Marek Siarkowicz
f91f6d8414 tests/robustness: Put traffic type on second place before cluster size in test name
Signed-off-by: Marek Siarkowicz <serathius@users.noreply.github.com>
2023-06-09 22:30:53 +02:00
Marek Siarkowicz
16bf0f6641 tests/robustness: Use traffic.RecordingClient in watch
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-25 22:17:23 +02:00
Marek Siarkowicz
4872b679a5 tests/robustness: Expect revions to be unique for Kubernetes Traffic
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-23 15:51:10 +02:00
Marek Siarkowicz
f3c9db9c46
Merge pull request #15893 from serathius/robustness-validate-client-watch
tests/robustness: Validate all etcd watches opened to etcd
2023-05-16 10:51:55 +02:00
Marek Siarkowicz
6429f47631 tests/robustness: Validate all etcd watches opened to etcd
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-16 10:28:01 +02:00
Marek Siarkowicz
112aad1ea7 tests/robustness: Unify model test cases
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-16 10:13:08 +02:00
Marek Siarkowicz
6e53792568 tests/robustness: Implement Kubernetes optimistic concurrency operations
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-15 13:45:27 +02:00
Marek Siarkowicz
911c40a347 tests/robustness: Implement kubernetes list watch protocol
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-15 10:11:05 +02:00
Bogdan Kanivets
c338882d7a tests/robustness: use monotonic clock for watch events
see: https://github.com/etcd-io/etcd/pull/15323
For consistency watch events should also use only time-measurement operations.

fixes: https://github.com/etcd-io/etcd/issues/15328
Signed-off-by: Bogdan Kanivets <bkanivets@apple.com>
2023-05-14 12:58:13 -07:00
Marek Siarkowicz
2a0c989662
Merge pull request #15882 from serathius/robustness-txn-fields
tests/robustness: Improve naming of Txn fields
2023-05-12 13:34:02 +02:00
Marek Siarkowicz
831ce4c3cf tests/robustness: Improve naming of Txn fields
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-12 13:10:25 +02:00
Marek Siarkowicz
e9900f6fff tests/robustness: Separate stream id from client id and improve AppendableHistory doc
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-11 21:03:52 +02:00
Marek Siarkowicz
9a922091ed
Merge pull request #15873 from serathius/robustness-safeguards
tests/robustness: Add safeguards to client and history
2023-05-11 13:37:42 +02:00
Marek Siarkowicz
962e15038e tests/robustness: Add safeguards to client and history
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-11 13:12:09 +02:00
Marek Siarkowicz
165a76b506 tests/robustness: Fix pointer causing all cluster tests using kubernetes traffic
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-10 16:08:08 +02:00
Marek Siarkowicz
dd248518d1 tests/robustness: Move request progress field from traffic to watch config and pass testScenario to reduce number of arguments
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-10 11:43:02 +02:00
Marek Siarkowicz
ad20230e07 test/robustness: Create dedicated traffic package
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-09 10:50:13 +02:00
Marek Siarkowicz
f6161673af
Merge pull request #15851 from serathius/robustness-generic
tests/robustness: Make weighted pick random generic
2023-05-09 10:36:11 +02:00
Marek Siarkowicz
b14b468661 tests/robustness: Make weighted pick random generic
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-08 19:58:38 +02:00
Marek Siarkowicz
7c68be4cf3 tests/robustness: Implement Range limit and count
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-07 09:32:07 +02:00
Marek Siarkowicz
40f71ef3c6 tests/robustness: Implement delete request for kubernetes scenario
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-05 13:40:46 +02:00
Marek Siarkowicz
92366a5338 tests/robustness: Split model code into deterministic and non-deterministic
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
Co-authored-by: chao <54131596+chaochn47@users.noreply.github.com>
2023-05-05 12:25:10 +02:00
Marek Siarkowicz
cfe154209c tests/robustness: Separate describe model functions to dedicated file
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-04 14:03:18 +02:00
Marek Siarkowicz
9b5680c5f1 tests/robustness: Implement first step in validating the Kubernetes-etcd contract.
* Use mod revision for optimistic concurrency.
* Introduce range requests as more general then get
* Add kubernetes specific traffic generation, for now using pull, but
  expected to evolve to use watch.
* Introduce kubernetes specific test scenario

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-04 13:26:54 +02:00
Wei Fu
09d053e035 tests/robustness: tune timeout policy
In a [scheduled test][1], the error shows

```
2023-04-19T11:16:15.8166316Z     traffic.go:96: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout
```

According to [grpc-keepalive@v1.51.0][2], each frame from server will
fresh the `lastRead` and it won't file `Ping` frame to server. But the
client used by [`tombstone` request][3] might hit the race. Since we use
5ms as timeout, the client might not receive the result of `Ping` from
server in time. The keepalive will mark it timeout and close the
connection.

I didn't reproduce it in my local. If we add the sleep before update
`lastRead`, it can reproduce it sometimes. Still investigating this
part.

```diff
diff --git a/internal/transport/http2_client.go b/internal/transport/http2_client.go
index d518b07e..bee9c00a 100644
--- a/internal/transport/http2_client.go
+++ b/internal/transport/http2_client.go
@@ -1560,6 +1560,7 @@ func (t *http2Client) reader(errCh chan<- error) {
                t.controlBuf.throttle()
                frame, err := t.framer.fr.ReadFrame()
                if t.keepaliveEnabled {
+                       time.Sleep(2 * time.Millisecond)
                        atomic.StoreInt64(&t.lastRead, time.Now().UnixNano())
                }
                if err != nil {
```

`DialKeepAliveTime` is always >= [10s][4]. I think we should increase
the timeout to avoid flaky caused by unstable env.

And in a [scheduled test][5], the error shows

```
logger.go:130: 2023-04-22T10:45:52.646Z	INFO	Failed to trigger failpoint	{"failpoint": "blackhole", "error": "context deadline exceeded"}
```

Before sending `Status` to member, the client doesn't [pick][6] the
connection in time (100ms) and returns the error.

The `waitTillSnapshot` is used to ensure that it is good enough to
trigger snapshot transfer. And we have 1min timeout for
injectFailpoints, so I think we can remove the 100ms timeout to reduce
unnecessary stop.

```
injectFailpoints(1min timeout)
  failpoint.Inject
    triggerBlockhole.Trigger
      blackhole
        waitTillSnapshot
```

> NOTE: I didn't reproduce it either. :(

Reference:

[1]: <https://github.com/etcd-io/etcd/actions/runs/4741737098/jobs/8419176899>
[2]: <eeb9afa1f6/internal/transport/http2_client.go (L1647)>
[3]: <7450cd886d/tests/robustness/traffic.go (L94)>
[4]: <eeb9afa1f6/dialoptions.go (L445)>
[5]: <https://github.com/etcd-io/etcd/actions/runs/4772033408/jobs/8484334015>
[6]: <eeb9afa1f6/clientconn.go (L932)>

REF: #15763

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-29 07:03:47 +08:00
Benjamin Wang
c7d81acaf0 test: forcibly save data on pinicking
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-27 14:54:35 +08:00
Marek Siarkowicz
e48ef9ea6a tests/robustness: Disable blackholing traffic till snapshot for v3.4.X
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-12 14:39:57 +02:00
Marek Siarkowicz
d19752f16a tests/robustness: Unify failpoint lists by depending on availability checking
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-12 14:38:30 +02:00
Marek Siarkowicz
625d427eb5 tests/robustness: Separate triggering failpoint from injection
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-12 14:36:50 +02:00
Marek Siarkowicz
932415a8d5 tests/robustness: Verify cluster configuration in failpoint availability
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-12 14:36:50 +02:00
Chao Chen
941c4afb0c tests/framwork/e2e/cluster.go: revert back to sequential cluster stop to reduce e2e test run time
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-04-11 22:08:18 -07:00
Marek Siarkowicz
7153a8f2f4
Merge pull request #15646 from serathius/robustness-readme-watch-issue
tests/robustness: Document analysing watch issue
2023-04-07 23:45:42 +02:00
Marek Siarkowicz
540d012e5e tests/robustness: Ensure that etcdctl binary is provided
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-05 23:04:20 +02:00
Marek Siarkowicz
1e41d95ab2 tests/robustness: Document analysing watch issue
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-05 22:40:47 +02:00
Peter Wortmann
42a2643df9 tests/robustness: Reproduce issue #15220
This issue is somewhat easily reproduced simply by bombarding the
server with requests for progress notifications, which eventually
leads to one being delivered ahead of the payload message. This is
then caught by the watch response validation code previously added by
Marek Siarkowicz.

Signed-off-by: Peter Wortmann <peter.wortmann@skao.int>
2023-04-05 11:23:02 +01:00
Marek Siarkowicz
5bae6b1e44 tests/robustness: Detect trigger timeout and exit
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-04 15:23:58 +02:00
James Blair
1227754284 Cancel watch if cluster not healthy before or after injecting failpoints.
Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-04 13:58:17 +02:00
Marek Siarkowicz
6582e349db tests: Enfoce timeout on failpoints
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-04 12:25:07 +02:00
Marek Siarkowicz
523f235c82
Merge pull request #15603 from serathius/robustness-finish-with-success
tests: Ensure that operation history finishes with successful request
2023-04-04 12:03:36 +02:00
Marek Siarkowicz
6a5d326519 tests: Ensure that operation history finishes with successful request
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-04 09:40:17 +02:00
Marek Siarkowicz
138fae6246
Merge pull request #15632 from serathius/fix-comparing-etcd-version
tests: Fix comparing etcd version
2023-04-04 09:34:55 +02:00