Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
Co-authored-by: chao <54131596+chaochn47@users.noreply.github.com>
* Use mod revision for optimistic concurrency (see the sketch after this list).
* Introduce range requests as a more general form of get.
* Add Kubernetes-specific traffic generation; for now it uses pull, but it is
expected to evolve to use watch.
* Introduce a Kubernetes-specific test scenario.
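A minimal sketch of the optimistic-concurrency pattern referenced above, assuming plain clientv3 usage with illustrative names (this is not the code added by the change):
```go
package main

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// putIfUnchanged writes newValue only if the key's mod revision still matches
// the revision observed when the key was last read, mirroring the
// Kubernetes-style compare-and-swap on resource version.
func putIfUnchanged(ctx context.Context, c *clientv3.Client, key, newValue string, observedModRev int64) (bool, error) {
	resp, err := c.Txn(ctx).
		If(clientv3.Compare(clientv3.ModRevision(key), "=", observedModRev)).
		Then(clientv3.OpPut(key, newValue)).
		Commit()
	if err != nil {
		return false, err
	}
	// Succeeded is false when another writer bumped the mod revision first.
	return resp.Succeeded, nil
}
```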
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Increase the request count to 1000 to increase the sample size and reduce variability, and increase the tolerance threshold from 10% to 15%.
Signed-off-by: James Blair <mail@jamesblair.net>
In a [scheduled test][1], the following error appears:
```
2023-04-19T11:16:15.8166316Z traffic.go:96: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout
```
According to [grpc-keepalive@v1.51.0][2], each frame received from the server
refreshes `lastRead`, so keepalive won't fire a `Ping` frame to the server. But
the client used by the [`tombstone` request][3] might hit the race. Since we use
5ms as the keepalive timeout, the client might not receive the result of the
`Ping` from the server in time. The keepalive logic then marks the connection as
timed out and closes it.
I couldn't reproduce it locally. If we add a sleep before updating `lastRead`,
it can sometimes be reproduced. I'm still investigating this part.
```diff
diff --git a/internal/transport/http2_client.go b/internal/transport/http2_client.go
index d518b07e..bee9c00a 100644
--- a/internal/transport/http2_client.go
+++ b/internal/transport/http2_client.go
@@ -1560,6 +1560,7 @@ func (t *http2Client) reader(errCh chan<- error) {
 		t.controlBuf.throttle()
 		frame, err := t.framer.fr.ReadFrame()
 		if t.keepaliveEnabled {
+			time.Sleep(2 * time.Millisecond)
 			atomic.StoreInt64(&t.lastRead, time.Now().UnixNano())
 		}
 		if err != nil {
```
`DialKeepAliveTime` is always >= [10s][4]. I think we should increase the
timeout to avoid flakes caused by an unstable environment.
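A minimal sketch of what a more forgiving keepalive configuration could look like on the client side (the concrete values are illustrative assumptions, not the ones this change settles on):
```go
package main

import (
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// newClient builds a client whose keepalive timeout is far looser than 5ms,
// so a slow CI host still has time to ACK the ping. Values are illustrative.
func newClient(endpoints []string) (*clientv3.Client, error) {
	return clientv3.New(clientv3.Config{
		Endpoints:            endpoints,
		DialKeepAliveTime:    10 * time.Second, // gRPC clamps anything smaller up to 10s anyway
		DialKeepAliveTimeout: 5 * time.Second,
	})
}
```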
And in a [scheduled test][5], the following error appears:
```
logger.go:130: 2023-04-22T10:45:52.646Z INFO Failed to trigger failpoint {"failpoint": "blackhole", "error": "context deadline exceeded"}
```
Before sending `Status` to the member, the client doesn't [pick][6] a
connection in time (100ms) and returns the error.
`waitTillSnapshot` is used to ensure that enough progress has been made to
trigger a snapshot transfer. We already have a 1-minute timeout for
injectFailpoints, so I think we can remove the 100ms timeout to reduce
unnecessary failures.
```
injectFailpoints (1min timeout)
  failpoint.Inject
    triggerBlackhole.Trigger
      blackhole
        waitTillSnapshot
```
> NOTE: I couldn't reproduce this one either. :(
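A rough sketch of the proposed direction, assuming a hypothetical shape for the helper (the real code lives in tests/robustness): rely on the outer 1-minute deadline from injectFailpoints instead of wrapping each `Status` call in its own 100ms context.
```go
package main

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// waitTillSnapshot is a hypothetical helper shape, illustrative only: it lets
// the caller's 1-minute deadline bound the wait rather than a per-call 100ms one.
func waitTillSnapshot(ctx context.Context, c *clientv3.Client, endpoint string, targetIndex uint64) error {
	for {
		// Previously each Status call was wrapped in its own 100ms context,
		// which failed whenever the balancer had not picked a connection in time.
		resp, err := c.Status(ctx, endpoint)
		if err == nil && resp.RaftIndex >= targetIndex {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err() // the outer injectFailpoints deadline still bounds the wait
		case <-time.After(100 * time.Millisecond):
		}
	}
}
```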
Reference:
[1]: <https://github.com/etcd-io/etcd/actions/runs/4741737098/jobs/8419176899>
[2]: <eeb9afa1f6/internal/transport/http2_client.go (L1647)>
[3]: <7450cd886d/tests/robustness/traffic.go (L94)>
[4]: <eeb9afa1f6/dialoptions.go (L445)>
[5]: <https://github.com/etcd-io/etcd/actions/runs/4772033408/jobs/8484334015>
[6]: <eeb9afa1f6/clientconn.go (L932)>
REF: #15763
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This is a follow-up to #15667.
This patch uses zaptest/observer as the base to provide functionality similar
to pkg/expect.Expect.
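A minimal sketch of the idea, assuming the standard zap observer API (names below are illustrative, not the helper introduced by the patch):
```go
package example

import (
	"strings"

	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
	"go.uber.org/zap/zaptest/observer"
)

// newObservedLogger returns a logger plus the observed sink recording every entry.
func newObservedLogger() (*zap.Logger, *observer.ObservedLogs) {
	core, logs := observer.New(zapcore.InfoLevel)
	return zap.New(core), logs
}

// expectLog reports whether any captured entry contains the given substring,
// playing a role similar to pkg/expect.Expect on process output.
func expectLog(logs *observer.ObservedLogs, substring string) bool {
	for _, entry := range logs.All() {
		if strings.Contains(entry.Message, substring) {
			return true
		}
	}
	return false
}
```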
The test environment:
```bash
11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
mkdir /sys/fs/cgroup/etcd-followup-15667
echo 0-2 | tee /sys/fs/cgroup/etcd-followup-15667/cpuset.cpus # three cores
```
Before change:
* memory.peak: ~ 681 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14.04
After change:
* memory.peak: ~ 671 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:13.07
Based on the test results, I think it's safe to enable this by default.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
TestV3WatchRestoreSnapshotUnsync sets up a three-member cluster.
After the leader is elected and before any client update requests are served,
each member has a log up to index 8: 3 x ConfChange +
3 x ClusterMemberAttrSet + 1 x ClusterVersionSet.
Based on the config (SnapshotCount: 10, CatchUpCount: 5), we need to send
enough update requests to trigger a snapshot at least twice; each snapshot
compacts the log down to snapshot-index minus CatchUpCount:
T1: L(snapshot-index: 11, compacted-index: 6) F_m0(index: 8)
T2: L(snapshot-index: 22, compacted-index: 17) F_m0(index: 8, out of date)
After member0 recovers from the network partition, it will reject the leader's
request and return a hint (index: 8, term: x). If this happens after the second
snapshot, the leader will find that index 8 is out of date and force a snapshot
transfer.
However, the client only sends 15 update requests and the leader doesn't finish
the second snapshot in time. Since the latest compacted index is still 6, the
leader can still replicate index 9 to member0 instead of sending a snapshot.
```bash
cd tests/integration
CLUSTER_DEBUG=true go test -v -count=1 -run TestV3WatchRestoreSnapshotUnsync ./
...
INFO m2.raft 3da8ba707f1a21a4 became leader at term 2 {"member": "m2"}
...
INFO m2 triggering snapshot {"member": "m2", "local-member-id": "3da8ba707f1a21a4", "local-member-applied-index": 22, "local-member-snapshot-index": 11, "local-member-snapshot-count": 10, "snapshot-forced": false}
...
cluster.go:1359: network partition between: 99626fe5001fde8b <-> 1c964119da6db036
cluster.go:1359: network partition between: 99626fe5001fde8b <-> 3da8ba707f1a21a4
cluster.go:416: WaitMembersForLeader
INFO m0.raft 99626fe5001fde8b became follower at term 2 {"member": "m0"}
INFO m0.raft raft.node: 99626fe5001fde8b elected leader 3da8ba707f1a21a4 at term 2 {"member": "m0"}
DEBUG m2.raft 3da8ba707f1a21a4 received MsgAppResp(rejected, hint: (index 8, term 2)) from 99626fe5001fde8b for index 23 {"member": "m2"}
DEBUG m2.raft 3da8ba707f1a21a4 decreased progress of 99626fe5001fde8b to [StateReplicate match=8 next=9 inflight=15] {"member": "m2"}
DEBUG m0 Applying entries {"member": "m0", "num-entries": 15}
DEBUG m0 Applying entry {"member": "m0", "index": 9, "term": 2, "type": "EntryNormal"}
....
INFO m2 saved snapshot {"member": "m2", "snapshot-index": 22}
INFO m2 compacted Raft logs {"member": "m2", "compact-index": 17}
```
To fix this issue, the patch uses the log monitor to watch for "compacted Raft
logs" messages and expects the two members to compact their logs twice.
Fixes: #15545
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This issue is fairly easy to reproduce by bombarding the server with requests
for progress notifications, which eventually leads to one being delivered ahead
of the payload message. This is then caught by the watch response validation
code previously added by Marek Siarkowicz.
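A rough sketch of such a reproduction, assuming plain clientv3 usage (not the robustness test code itself):
```go
package main

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// hammerProgress opens a watch and floods the server with manual progress
// requests while writes are in flight; with the race present, a progress
// notification can arrive carrying a revision newer than events still queued.
func hammerProgress(ctx context.Context, cli *clientv3.Client, key string) {
	wch := cli.Watch(ctx, key)
	go func() {
		for ctx.Err() == nil {
			_ = cli.RequestProgress(ctx) // bombard the server with progress requests
		}
	}()
	for resp := range wch {
		if resp.IsProgressNotify() {
			continue // validation elsewhere checks ordering against payload events
		}
		for range resp.Events {
			// consume payload events
		}
	}
}
```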
Signed-off-by: Peter Wortmann <peter.wortmann@skao.int>
This will fail basically every time, as the progress notification request
catches the watcher in an unsynchronized state.
Signed-off-by: Peter Wortmann <peter.wortmann@skao.int>