1329 Commits

Author SHA1 Message Date
Chao Chen
c846b087db migrate e2e auth tests to common #8
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-05-12 22:51:47 -07:00
Marek Siarkowicz
2a0c989662
Merge pull request #15882 from serathius/robustness-txn-fields
tests/robustness: Improve naming of Txn fields
2023-05-12 13:34:02 +02:00
Marek Siarkowicz
831ce4c3cf tests/robustness: Improve naming of Txn fields
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-12 13:10:25 +02:00
Benjamin Wang
67ec1a4d30
Merge pull request #15862 from pchan/bump_dependency
dependency: bump dependabot dependencies
2023-05-12 06:59:47 +08:00
Marek Siarkowicz
e9900f6fff tests/robustness: Separate stream id from client id and improve AppendableHistory doc
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-11 21:03:52 +02:00
Marek Siarkowicz
9a922091ed
Merge pull request #15873 from serathius/robustness-safeguards
tests/robustness: Add safeguards to client and history
2023-05-11 13:37:42 +02:00
Marek Siarkowicz
962e15038e tests/robustness: Add safeguards to client and history
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-11 13:12:09 +02:00
Marek Siarkowicz
d77c618e6a
Merge pull request #15874 from serathius/robustness-fix-traffic-pointer
tests/robustness: Fix pointer causing all cluster tests using kuberne…
2023-05-11 11:16:21 +02:00
Benjamin Wang
05b663fbe8
Merge pull request #15828 from chaochn47/add_leadership_transfer_coverage
tests/e2e: add graceful shutdown test
2023-05-11 07:39:25 +08:00
Benjamin Wang
e3db9dc616
Merge pull request #15868 from jmhbnz/main
tests: Deflake TestEtcdGrpcResolverRoundRobin
2023-05-11 05:08:53 +08:00
Marek Siarkowicz
165a76b506 tests/robustness: Fix pointer causing all cluster tests using kubernetes traffic
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-10 16:08:08 +02:00
Marek Siarkowicz
dd248518d1 tests/robustness: Move request progress field from traffic to watch config and pass testScenario to reduce number of arguments
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-10 11:43:02 +02:00
James Blair
3f5ad36039
Deflake TestEtcdGrpcResolverRoundRobin.
Signed-off-by: James Blair <mail@jamesblair.net>
2023-05-10 21:03:01 +12:00
Chao Chen
f31d0eafb9 tests/e2e: add graceful shutdown test
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-05-09 17:08:53 -07:00
Benjamin Wang
b404d25d84
Merge pull request #15741 from AngstyDuck/set-default-value-for-AutoCompactionMode
server: default value for config file field auto-compaction-mode is n…
2023-05-10 05:44:16 +08:00
AngstyDuck
a7344da7d3 server: default value for config file field auto-compaction-mode is now 'periodic'; added additional checks if auto-compaction-mode is undefined
Signed-off-by: AngstyDuck <solsticedante@gmail.com>
2023-05-09 23:10:44 +08:00
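A minimal sketch of the defaulting described in this commit. It assumes a hypothetical `Config` struct (it is not etcd's `embed.Config`): an empty `AutoCompactionMode` falls back to `periodic`. The validation of other values is an illustrative assumption, not a description of the actual patch.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical config fragment; field and constant names are illustrative.
type Config struct {
	AutoCompactionMode      string // "periodic" or "revision"
	AutoCompactionRetention string // e.g. "1h" for periodic mode
}

const (
	modePeriodic = "periodic"
	modeRevision = "revision"
)

// applyDefaults fills in AutoCompactionMode when the config file omitted it,
// then rejects values other than the two known modes (assumed behavior).
func (c *Config) applyDefaults() error {
	if c.AutoCompactionMode == "" {
		c.AutoCompactionMode = modePeriodic
	}
	switch c.AutoCompactionMode {
	case modePeriodic, modeRevision:
		return nil
	default:
		return errors.New("unknown auto-compaction-mode: " + c.AutoCompactionMode)
	}
}

func main() {
	cfg := Config{AutoCompactionRetention: "1h"} // mode left undefined
	if err := cfg.applyDefaults(); err != nil {
		panic(err)
	}
	fmt.Println(cfg.AutoCompactionMode) // periodic
}
```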
Prasad Chandrasekaran
c863f1f8c0 dependency: bump dependabot dependencies
Signed-off-by: Prasad Chandrasekaran <prasadc@vmware.com>
2023-05-09 18:38:35 +05:30
Marek Siarkowicz
ad20230e07 test/robustness: Create dedicated traffic package
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-09 10:50:13 +02:00
Marek Siarkowicz
f6161673af
Merge pull request #15851 from serathius/robustness-generic
tests/robustness: Make weighted pick random generic
2023-05-09 10:36:11 +02:00
Marek Siarkowicz
b14b468661 tests/robustness: Make weighted pick random generic
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-08 19:58:38 +02:00
Marek Siarkowicz
7c68be4cf3 tests/robustness: Implement Range limit and count
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-07 09:32:07 +02:00
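Range `limit` and `count` can be modelled in a few lines. This is a hypothetical in-memory sketch that uses a key prefix instead of a `[key, range_end)` interval. It assumes etcd-like semantics where `count` reports all matching keys regardless of `limit`, and `more` signals that the result was truncated.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type kv struct{ Key, Value string }

type rangeResponse struct {
	KVs   []kv
	Count int  // total number of keys matching the range, ignoring limit
	More  bool // true if limit truncated the result
}

// rangeKeys returns keys with the given prefix in sorted order, truncated to
// limit (limit <= 0 means no limit).
func rangeKeys(store map[string]string, prefix string, limit int) rangeResponse {
	var keys []string
	for k := range store {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	resp := rangeResponse{Count: len(keys)}
	if limit > 0 && len(keys) > limit {
		keys = keys[:limit]
		resp.More = true
	}
	for _, k := range keys {
		resp.KVs = append(resp.KVs, kv{k, store[k]})
	}
	return resp
}

func main() {
	store := map[string]string{
		"/registry/pods/a": "1", "/registry/pods/b": "2", "/registry/pods/c": "3",
	}
	r := rangeKeys(store, "/registry/pods/", 2)
	fmt.Println(len(r.KVs), r.Count, r.More) // 2 3 true
}
```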
Marek Siarkowicz
40f71ef3c6 tests/robustness: Implement delete request for kubernetes scenario
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-05 13:40:46 +02:00
Marek Siarkowicz
92366a5338 tests/robustness: Split model code into deterministic and non-deterministic
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
Co-authored-by: chao <54131596+chaochn47@users.noreply.github.com>
2023-05-05 12:25:10 +02:00
Marek Siarkowicz
cfe154209c tests/robustness: Separate describe model functions to dedicated file
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-04 14:03:18 +02:00
Marek Siarkowicz
2c16812841
Merge pull request #15817 from serathius/robustness-k8s-1
tests/robustness: Implement first step in validating the Kubernetes-etcd contract
2023-05-04 13:52:25 +02:00
Marek Siarkowicz
9b5680c5f1 tests/robustness: Implement first step in validating the Kubernetes-etcd contract.
* Use mod revision for optimistic concurrency.
* Introduce range requests as a more general form of get.
* Add Kubernetes-specific traffic generation, for now using pull, but
  expected to evolve to use watch.
* Introduce a Kubernetes-specific test scenario.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-04 13:26:54 +02:00
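Optimistic concurrency on mod revision means a write only succeeds if the key has not changed since the writer read it. In clientv3 this is typically expressed as a `Txn` guarded by `clientv3.Compare(clientv3.ModRevision(key), "=", rev)`. The sketch below instead uses a hypothetical in-memory store (`store`, `putIf`) so it runs without a cluster.

```go
package main

import "fmt"

type entry struct {
	value       string
	modRevision int64
}

// store is a toy key-value store with a global revision counter.
type store struct {
	revision int64
	data     map[string]entry
}

func newStore() *store { return &store{data: map[string]entry{}} }

// modRevision returns the key's last modification revision, or 0 if absent.
func (s *store) modRevision(key string) int64 { return s.data[key].modRevision }

// putIf writes value only if the key's mod revision still equals expected
// (0 means "key must not exist"). It returns the current revision and
// whether the write happened.
func (s *store) putIf(key, value string, expected int64) (int64, bool) {
	if s.modRevision(key) != expected {
		return s.revision, false
	}
	s.revision++
	s.data[key] = entry{value: value, modRevision: s.revision}
	return s.revision, true
}

func main() {
	s := newStore()
	rev, _ := s.putIf("/registry/pods/a", "v1", 0)  // create
	_, ok1 := s.putIf("/registry/pods/a", "v2", rev) // update based on a fresh read
	_, ok2 := s.putIf("/registry/pods/a", "v3", rev) // stale writer loses
	fmt.Println(ok1, ok2)                            // true false
}
```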
Hitoshi Mitake
49b59cc8e5
Merge pull request #15656 from mitake/lease-timetolive-auth
protect LeaseTimeToLive with RBAC
2023-05-02 23:02:29 +09:00
Benjamin Wang
b089474b01
Merge pull request #15795 from jmhbnz/deflake-roundrobin-resolver-test
tests: Deflake TestEtcdGrpcResolverRoundRobin
2023-05-02 06:09:46 +08:00
James Blair
b9533ca98b
Deflake TestEtcdGrpcResolverRoundRobin.
Increase the number of requests to 1000 to enlarge the sample size and reduce variability, and raise the tolerance threshold from 10% to 15%.
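One way to read the tolerance change: each endpoint's request count must stay within 15% of an even split. The `withinTolerance` helper below is a hypothetical sketch under that assumption, not the test's actual code.

```go
package main

import (
	"fmt"
	"math"
)

// withinTolerance reports whether every endpoint received an even share of
// requests within the given relative tolerance (0.15 means 15%).
func withinTolerance(counts map[string]int, tolerance float64) bool {
	total := 0
	for _, c := range counts {
		total += c
	}
	if len(counts) == 0 || total == 0 {
		return false
	}
	expected := float64(total) / float64(len(counts))
	for _, c := range counts {
		if math.Abs(float64(c)-expected) > tolerance*expected {
			return false
		}
	}
	return true
}

func main() {
	// 1000 requests spread over 3 endpoints; expected share is ~333.3 each.
	counts := map[string]int{"ep1": 370, "ep2": 335, "ep3": 295}
	fmt.Println(withinTolerance(counts, 0.10), withinTolerance(counts, 0.15)) // false true
}
```

A larger sample helps because the relative spread of a binomial count shrinks roughly as 1/sqrt(n).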

Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-29 14:14:16 +12:00
Wei Fu
09d053e035 tests/robustness: tune timeout policy
In a [scheduled test][1], the following error appears:

```
2023-04-19T11:16:15.8166316Z     traffic.go:96: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout
```

According to [grpc-keepalive@v1.51.0][2], each frame received from the server
refreshes `lastRead`, so the client won't send a `Ping` frame to the server.
But the client used by the [`tombstone` request][3] might hit a race. Since we
use 5ms as the timeout, the client might not receive the server's response to
the `Ping` in time. Keepalive then marks the connection as timed out and
closes it.

I couldn't reproduce it locally. If we add a sleep before updating
`lastRead`, it can sometimes be reproduced. Still investigating this
part.

```diff
diff --git a/internal/transport/http2_client.go b/internal/transport/http2_client.go
index d518b07e..bee9c00a 100644
--- a/internal/transport/http2_client.go
+++ b/internal/transport/http2_client.go
@@ -1560,6 +1560,7 @@ func (t *http2Client) reader(errCh chan<- error) {
                t.controlBuf.throttle()
                frame, err := t.framer.fr.ReadFrame()
                if t.keepaliveEnabled {
+                       time.Sleep(2 * time.Millisecond)
                        atomic.StoreInt64(&t.lastRead, time.Now().UnixNano())
                }
                if err != nil {
```

`DialKeepAliveTime` is always >= [10s][4]. I think we should increase
the timeout to avoid flakiness caused by an unstable environment.

And in another [scheduled test][5], the following error appears:

```
logger.go:130: 2023-04-22T10:45:52.646Z	INFO	Failed to trigger failpoint	{"failpoint": "blackhole", "error": "context deadline exceeded"}
```

Before sending `Status` to the member, the client doesn't [pick][6] a
connection in time (100ms) and returns the error.

`waitTillSnapshot` is used to ensure that enough progress has been made
to trigger a snapshot transfer. Since `injectFailpoints` already has a
1min timeout, I think we can remove the 100ms timeout to avoid
unnecessary failures.

```
injectFailpoints(1min timeout)
  failpoint.Inject
    triggerBlackhole.Trigger
      blackhole
        waitTillSnapshot
```

> NOTE: I didn't reproduce it either. :(

Reference:

[1]: <https://github.com/etcd-io/etcd/actions/runs/4741737098/jobs/8419176899>
[2]: <eeb9afa1f6/internal/transport/http2_client.go (L1647)>
[3]: <7450cd886d/tests/robustness/traffic.go (L94)>
[4]: <eeb9afa1f6/dialoptions.go (L445)>
[5]: <https://github.com/etcd-io/etcd/actions/runs/4772033408/jobs/8484334015>
[6]: <eeb9afa1f6/clientconn.go (L932)>

REF: #15763

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-29 07:03:47 +08:00
Benjamin Wang
c7d81acaf0 test: forcibly save data on panicking
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-27 14:54:35 +08:00
Hitoshi Mitake
c9b368119e tests: e2e and integration test for timetolive
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
2023-04-26 20:35:20 +09:00
James Blair
18e3acae0e
Add new test for round robin resolver.
Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-25 18:44:24 +12:00
Benjamin Wang
4a8817bfb0
Merge pull request #15737 from jmhbnz/update-dependencies
Bump dependencies identified by dependabot
2023-04-21 06:35:08 +08:00
James Blair
04f3e9cb9a
dependency: bump golang.org/x/crypto from 0.7.0 to 0.8.0
Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-21 05:34:21 +12:00
James Blair
042e2e9a57
dependency: bump github.com/prometheus/client_golang from 1.14.0 to 1.15.0
Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-21 05:14:40 +12:00
Wei Fu
50aa00b203 tests: make log monitor as common helper
It's a follow-up to #15667.

This patch uses zaptest/observer as a base to provide functionality
similar to pkg/expect.Expect.

The test environment:

```bash
# CPU: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
mkdir /sys/fs/cgroup/etcd-followup-15667
echo 0-2 | tee /sys/fs/cgroup/etcd-followup-15667/cpuset.cpus # three cores
```

Before the change:

* memory.peak: ~ 681 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14.04

After the change:

* memory.peak: ~ 671 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:13.07

Based on the test results, I think it's safe to enable it by default.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-18 09:00:24 +08:00
Wei Fu
9f034fbaa8 chore: use tools/mod to lock the cfssl cmd version
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-13 12:06:31 +08:00
Wei Fu
8cd5969248 chore: use strict mode for tests/*/*.sh
REF: #15514

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-13 12:05:39 +08:00
Wei Fu
78d2ead804 chore: deprecate tests/manual folder
REF: #15514

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-13 12:05:39 +08:00
Marek Siarkowicz
e48ef9ea6a tests/robustness: Disable blackholing traffic till snapshot for v3.4.X
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-12 14:39:57 +02:00
Marek Siarkowicz
d19752f16a tests/robustness: Unify failpoint lists by depending on availability checking
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-12 14:38:30 +02:00
Marek Siarkowicz
625d427eb5 tests/robustness: Separate triggering failpoint from injection
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-12 14:36:50 +02:00
Marek Siarkowicz
932415a8d5 tests/robustness: Verify cluster configuration in failpoint availability
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-12 14:36:50 +02:00
Chao Chen
941c4afb0c tests/framework/e2e/cluster.go: revert back to sequential cluster stop to reduce e2e test run time
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-04-11 22:08:18 -07:00
Benjamin Wang
dddd4780c2 dependency: bump github.com/spf13/cobra from 1.6.1 to 1.7.0
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-11 08:51:26 +08:00
Benjamin Wang
eb9b15bf49 dependency: bump golang.org/x/net from 0.8.0 to 0.9.0
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-11 08:44:26 +08:00
Benjamin Wang
8a27dd4db4 dependency: bump github.com/jonboulle/clockwork from 0.3.0 to 0.4.0
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-11 08:36:44 +08:00
Benjamin Wang
1683231216
Merge pull request #15667 from fuweid/deflake-issue-15545-TestV3WatchRestoreSnapshotUnsync
tests: deflake TestV3WatchRestoreSnapshotUnsync
2023-04-11 06:00:42 +08:00
Wei Fu
536953ec6c tests: deflake TestV3WatchRestoreSnapshotUnsync
TestV3WatchRestoreSnapshotUnsync sets up a three-member cluster. After
the leader is elected and before any update requests from the client are
served, each member's log reaches index 8: 3 x ConfChange + 1 x empty
entry appended by the new leader + 3 x ClusterMemberAttrSet +
1 x ClusterVersionSet.

Based on the config (SnapshotCount: 10, CatchUpCount: 5), we need to
send enough update requests to trigger a snapshot at least twice.

T1: L(snapshot-index: 11, compacted-index:  6) F_m0(index: 8)
T2: L(snapshot-index: 22, compacted-index: 17) F_m0(index: 8, out of date)
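The T1/T2 indices can be reproduced with a short calculation. This sketch assumes a snapshot is taken once `applied - snapshotIndex > SnapshotCount` (consistent with the log below: applied index 22, snapshot index 11, snapshot count 10) and that the log is then compacted to `snapshotIndex - CatchUpCount`. It checks after every entry rather than per apply batch, which is a simplification; `snapshotPoints` is a hypothetical helper.

```go
package main

import "fmt"

// snapshotPoints replays applied indexes 1..lastApplied and returns the
// (snapshotIndex, compactIndex) pairs produced under the assumed rules.
func snapshotPoints(lastApplied, snapshotCount, catchUpCount uint64) [][2]uint64 {
	var points [][2]uint64
	var snapi uint64
	for applied := uint64(1); applied <= lastApplied; applied++ {
		if applied-snapi > snapshotCount {
			snapi = applied
			compact := uint64(0)
			if snapi > catchUpCount {
				compact = snapi - catchUpCount
			}
			points = append(points, [2]uint64{snapi, compact})
		}
	}
	return points
}

func main() {
	// 8 bootstrap entries + 15 client updates = index 23.
	fmt.Println(snapshotPoints(23, 10, 5)) // [[11 6] [22 17]]
}
```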

After member0 recovers from the network partition, it rejects the
leader's append request and returns a hint (index: 8, term: x). If this
happens after the second snapshot, the leader finds that index 8 has
already been compacted and is forced to transfer a snapshot.

However, the client only sends 15 update requests, and the leader
doesn't finish the second snapshot in time. Since the latest compacted
index is still 6, the leader can still replicate from index 9 to member0
instead of sending a snapshot.
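The leader's choice between appending and sending a snapshot reduces to a one-line check. This is a simplified model, assuming (as in etcd raft's `MemoryStorage`) that after compacting to index `c` the term of `c` is still known but entries up to `c` are gone, so appending from `next` requires `next > c`. `needsSnapshot` is a hypothetical name.

```go
package main

import "fmt"

// needsSnapshot reports whether a leader must send a snapshot instead of
// appending entries, given the follower's next index and the leader's
// compacted index.
func needsSnapshot(next, compactIndex uint64) bool {
	// Entries <= compactIndex are no longer available to send.
	return next <= compactIndex
}

func main() {
	// member0 has match=8, so next=9.
	fmt.Println(needsSnapshot(9, 6), needsSnapshot(9, 17)) // false true
}
```

With the leader compacted only to 6, index 9 is still in its log, which is why member0 catches up through normal appends in the failing run.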

```bash
cd tests/integration
CLUSTER_DEBUG=true go test -v -count=1 -run TestV3WatchRestoreSnapshotUnsync ./
...

INFO    m2.raft 3da8ba707f1a21a4 became leader at term 2        {"member": "m2"}
...
INFO    m2      triggering snapshot     {"member": "m2", "local-member-id": "3da8ba707f1a21a4", "local-member-applied-index": 22, "local-member-snapshot-index": 11, "local-member-snapshot-count": 10, "snapshot-forced": false}
...

cluster.go:1359: network partition between: 99626fe5001fde8b <-> 1c964119da6db036
cluster.go:1359: network partition between: 99626fe5001fde8b <-> 3da8ba707f1a21a4
cluster.go:416: WaitMembersForLeader

INFO    m0.raft 99626fe5001fde8b became follower at term 2      {"member": "m0"}
INFO    m0.raft raft.node: 99626fe5001fde8b elected leader 3da8ba707f1a21a4 at term 2   {"member": "m0"}
DEBUG   m2.raft 3da8ba707f1a21a4 received MsgAppResp(rejected, hint: (index 8, term 2)) from 99626fe5001fde8b for index 23      {"member": "m2"}
DEBUG   m2.raft 3da8ba707f1a21a4 decreased progress of 99626fe5001fde8b to [StateReplicate match=8 next=9 inflight=15]  {"member": "m2"}

DEBUG   m0      Applying entries        {"member": "m0", "num-entries": 15}
DEBUG   m0      Applying entry  {"member": "m0", "index": 9, "term": 2, "type": "EntryNormal"}

....

INFO    m2      saved snapshot  {"member": "m2", "snapshot-index": 22}
INFO    m2      compacted Raft logs     {"member": "m2", "compact-index": 17}
```

To fix this issue, the patch uses a log monitor to watch for "compacted
Raft log" messages and expects two members to compact their logs twice.
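The "two members compact twice" condition can be expressed as a count over log lines grouped by the `member` field. `compactionsPerMember`, `enoughCompactions`, and the sample lines are hypothetical and only mimic the log format shown above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var memberRE = regexp.MustCompile(`"member": "([^"]+)"`)

// compactionsPerMember counts lines containing msg, grouped by member.
func compactionsPerMember(lines []string, msg string) map[string]int {
	counts := map[string]int{}
	for _, l := range lines {
		if !strings.Contains(l, msg) {
			continue
		}
		if m := memberRE.FindStringSubmatch(l); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

// enoughCompactions reports whether at least `members` members logged the
// message at least `times` times.
func enoughCompactions(counts map[string]int, members, times int) bool {
	n := 0
	for _, c := range counts {
		if c >= times {
			n++
		}
	}
	return n >= members
}

func main() {
	lines := []string{
		`INFO m1 compacted Raft logs {"member": "m1", "compact-index": 6}`,
		`INFO m2 compacted Raft logs {"member": "m2", "compact-index": 6}`,
		`INFO m1 compacted Raft logs {"member": "m1", "compact-index": 17}`,
		`INFO m2 compacted Raft logs {"member": "m2", "compact-index": 17}`,
	}
	counts := compactionsPerMember(lines, "compacted Raft log")
	fmt.Println(counts["m1"], counts["m2"], enoughCompactions(counts, 2, 2)) // 2 2 true
}
```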

Fixes: #15545

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-10 22:27:58 +08:00