238 Commits

Author SHA1 Message Date
Benjamin Wang
2df32102ca
Merge pull request #15835 from yellowzf/grpcproxy_fix_memberlist_results_not_update_when_proxy_node_down
grpcproxy: fix memberlist results not update when proxy node down
2023-05-15 13:34:05 +08:00
yellowzf
ca221208d2 grpcproxy: fix memberlist results not update when proxy node down
If start grpc proxy with --resolver-prefix, memberlist will return all alive proxy nodes, when one grpc proxy node is down, it is expected to not return the down node, but it is still return

Signed-off-by: yellowzf <zzhf3311@163.com>
2023-05-15 10:59:02 +08:00
Benjamin Wang
52dfd4bbed
Merge pull request #15867 from chaochn47/auth_test_split_8
migrate e2e auth tests to common #8
2023-05-13 14:21:37 +08:00
Chao Chen
c846b087db migrate e2e auth tests to common #8
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-05-12 22:51:47 -07:00
Benjamin Wang
05b663fbe8
Merge pull request #15828 from chaochn47/add_leadership_transfer_coverage
tests/e2e: add graceful shutdown test
2023-05-11 07:39:25 +08:00
James Blair
3f5ad36039
Deflake TestEtcdGrpcResolverRoundRobin.
Signed-off-by: James Blair <mail@jamesblair.net>
2023-05-10 21:03:01 +12:00
Chao Chen
f31d0eafb9 tests/e2e: add graceful shutdown test
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-05-09 17:08:53 -07:00
Benjamin Wang
b404d25d84
Merge pull request #15741 from AngstyDuck/set-default-value-for-AutoCompactionMode
server: default value for config file field auto-compaction-mode is n…
2023-05-10 05:44:16 +08:00
AngstyDuck
a7344da7d3 server: default value for config file field auto-compaction-mode is now 'periodic'; added additional checks if auto-compaction-mode is undefined
Signed-off-by: AngstyDuck <solsticedante@gmail.com>
2023-05-09 23:10:44 +08:00
Hitoshi Mitake
49b59cc8e5
Merge pull request #15656 from mitake/lease-timetolive-auth
protect LeaseTimeToLive with RBAC
2023-05-02 23:02:29 +09:00
James Blair
b9533ca98b
Deflake TestEtcdGrpcResolverRoundRobin.
Increase request to 1000 to increase sample size/reduce variability and increase tolerance threshold from 10 to 15%.

Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-29 14:14:16 +12:00
Hitoshi Mitake
c9b368119e tests: e2e and integration test for timetolive
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
2023-04-26 20:35:20 +09:00
James Blair
18e3acae0e
Add new test for round robin resolver.
Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-25 18:44:24 +12:00
Wei Fu
50aa00b203 tests: make log monitor as common helper
It's followup of #15667.

This patch is to use zaptest/observer as base to provide a similar
function to pkg/expect.Expect.

The test env

```bash
11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
mkdir /sys/fs/cgroup/etcd-followup-15667
echo 0-2 | tee /sys/fs/cgroup/etcd-followup-15667/cpuset.cpus # three cores
```

Before change:

* memory.peak: ~ 681 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14.04

After change:

* memory.peak: ~ 671 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:13.07

Based on the test result, I think it's safe to be enabled by default.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-18 09:00:24 +08:00
Wei Fu
9f034fbaa8 chore: use tools/mod to lock the cfssl cmd version
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-13 12:06:31 +08:00
Wei Fu
8cd5969248 chore: use strict mode for tests/*/*.sh
REF: #15514

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-13 12:05:39 +08:00
Wei Fu
536953ec6c tests: deflake TestV3WatchRestoreSnapshotUnsync
The TestV3WatchRestoreSnapshotUnsync setups three members' cluster.
Before serving any update requests from client, after leader elected,
each member will have index 8 log: 3 x ConfChange +
3 x ClusterMemberAttrSet + 1 x ClusterVersionSet.

Based on the config (SnapshotCount: 10, CatchUpCount: 5), we need to
file update requests to trigger snapshot at least twice.

T1: L(snapshot-index: 11, compacted-index:  6) F_m0(index: 8)
T2: L(snapshot-index: 22, compacted-index: 17) F_m0(index: 8, out of date)

After member0 recovers from network partition, it will reject leader's
request and return hint (index:8, term:x). If it happens after
second snapshot, leader will find out the index:8 is out of date and
force to transfer snapshot.

However, the client only files 15 update requests and leader doesn't
finish the process of snapshot in time. Since the last of
compacted-index is 6, leader can still replicate index:9 to member0
instead of snapshot.

```bash
cd tests/integration
CLUSTER_DEBUG=true go test -v -count=1 -run TestV3WatchRestoreSnapshotUnsync ./
...

INFO    m2.raft 3da8ba707f1a21a4 became leader at term 2        {"member": "m2"}
...
INFO    m2      triggering snapshot     {"member": "m2", "local-member-id": "3da8ba707f1a21a4", "local-member-applied-index": 22, "local-member-snapshot-index": 11, "local-member-snapshot-count": 10, "snapshot-forced": false}
...

cluster.go:1359: network partition between: 99626fe5001fde8b <-> 1c964119da6db036
cluster.go:1359: network partition between: 99626fe5001fde8b <-> 3da8ba707f1a21a4
cluster.go:416: WaitMembersForLeader

INFO    m0.raft 99626fe5001fde8b became follower at term 2      {"member": "m0"}
INFO    m0.raft raft.node: 99626fe5001fde8b elected leader 3da8ba707f1a21a4 at term 2   {"member": "m0"}
DEBUG   m2.raft 3da8ba707f1a21a4 received MsgAppResp(rejected, hint: (index 8, term 2)) from 99626fe5001fde8b for index 23      {"member": "m2"}
DEBUG   m2.raft 3da8ba707f1a21a4 decreased progress of 99626fe5001fde8b to [StateReplicate match=8 next=9 inflight=15]  {"member": "m2"}

DEBUG   m0      Applying entries        {"member": "m0", "num-entries": 15}
DEBUG   m0      Applying entry  {"member": "m0", "index": 9, "term": 2, "type": "EntryNormal"}

....

INFO    m2      saved snapshot  {"member": "m2", "snapshot-index": 22}
INFO    m2      compacted Raft logs     {"member": "m2", "compact-index": 17}
```

To fix this issue, the patch uses log monitor to watch "compacted Raft
log" and expect that two members should compact log twice.

Fixes: #15545

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-10 22:27:58 +08:00
Peter Wortmann
af25936fb7 tests/integration: Demonstrate manual progress notification race
This will fail basically every time, as the progress notification
request catches the watcher in an asynchronised state.

Signed-off-by: Peter Wortmann <peter.wortmann@skao.int>
2023-04-05 11:19:07 +01:00
Marek Siarkowicz
0cbd56e8b6 tests: Cleanup endpoints
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-03 12:18:54 +02:00
Hitoshi Mitake
4da39e4b1e
Merge pull request #15294 from mitake/range-check
server/auth: disallow creating empty permission ranges
2023-04-03 09:03:50 +09:00
Chao Chen
f163af2bc8 deflake TestTracing
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-03-17 14:39:18 -07:00
Marek Siarkowicz
372042c374 refactor: Use proper variable names for urls
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-03-13 14:48:01 +01:00
Wei Fu
4a9ad47bcc tests/integration: deflake #15409
The cluster version will be initialized after the member becomes leader.
The update is handled asynchronously. It couldn't be updated if the member
has been closed and the go-runtime picks the `s.stopping` channel first.

```go
// e2a5df534c/server/etcdserver/server.go (L2170)

func (s *EtcdServer) monitorClusterVersions() {
	...
	for {
		select {
		case <-s.firstCommitInTerm.Receive():
		case <-time.After(monitorVersionInterval):
		case <-s.stopping:
			return
		}
		...
	}
}
```

Or after the `s.stopping` has been closed, the [UpdateClusterVersion][1] won't
file GoAttach successfully. For the #15409, we can see the warn log
`server has stopped; skipping GoAttach` from GoAttach:

```plain
https://github.com/etcd-io/etcd/actions/runs/4340931587/jobs/7580103902

    logger.go:130: 2023-03-06T07:36:44.253Z	WARN	default	stopping grpc server due to error	{"error": "accept tcp 127.0.0.1:2379: use of closed network connection"}
    logger.go:130: 2023-03-06T07:36:44.253Z	WARN	default	stopped grpc server due to error	{"error": "accept tcp 127.0.0.1:2379: use of closed network connection"}
    logger.go:130: 2023-03-06T07:36:44.253Z	ERROR	default	setting up serving from embedded etcd failed.	{"error": "accept tcp 127.0.0.1:2379: use of closed network connection"}
    logger.go:130: 2023-03-06T07:36:44.253Z	ERROR	default	setting up serving from embedded etcd failed.	{"error": "http: Server closed"}
    logger.go:130: 2023-03-06T07:36:44.253Z	INFO	default	skipped leadership transfer for single voting member cluster	{"local-member-id": "8e9e05c52164694d", "current-leader-member-id": "8e9e05c52164694d"}

    logger.go:130: 2023-03-06T07:36:44.253Z	WARN	default	server has stopped; skipping GoAttach

    ...
```

If the cluster version isn't updated, the minimum storage version will
be v3.5 because the [AuthStatus][2] is introduced in [v3.5][3].
The compare will fail.

To fix this issue, we should wait for cluster version to become ready
after server is ready to serve request.

[1]: <e2a5df534c/server/etcdserver/adapters.go (L45)>
[2]: <071e70cdc4>
[3]: <1b4e54c238>

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-03-11 14:50:37 +08:00
Wei Fu
3419230eea tests/integration: Update TestLeasingDeleteRangeContendTxn
The TestLeasingDeleteRangeContendTxn is trying to test for RangeDelete when
the target resources are being updated.  When the `txnLeasing` wants a
server-side transaction, it needs to ensure all the keys mod revision should
be leass than what it saw. If the compare fails, it will repeat to apply the
server-side transaction until it is sucessful. I believe the test-case is
trying to verify how the `txnLeasing` handles the race issue.

Before the patch #15401, the resource-updating goroutine keeps updating until
the RangeDelete finishes. The testcase is flaky because two goroutines are
sharing one `ctx` and grpc-go client won't wait for the response if `ctx`
has been canceled.

For example,

| DelLease Goroutine   | PutLease Goroutine         | ETCD Server                    | Key/0 Status |
| --                   | ---                        | --                             | --           |
| deleted              |                            |                                | version = 0  |
|                      | send update(key/0=123) req | received update(key/0=123) req | version = 0  |
| cancel               |                            |                                | version = 0  |
|                      | exit because of cancel     |                                | version = 0  |
| get key/0 by putkv   |                            |                                | version = 0  |
|                      |                            | applied update(key/0=123)      | version = 1  |
| get key/0 by raw-cli |                            |                                | version = 1  |

So `raw-cli` gets `[key/0=123]` while the `putkv` gets `[]`. If `putkv`
applies two update reqs to ETCD server and the last one is canceled
before apply, the error will be like:

```
expected [key:"key/0" version:2 value:"123" ], got [key:"key/0" version:1 value:"123" ]
```

The resource-updating goroutine should not share the ctx with RangeDelete here.
And I also revert current main branch because the resource-update goroutine
only updates 8 times and might exit before `RangeDelete`. In this case,
the `txnLeasing` is not handling the race issue.

Fixes: #15352

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-03-07 23:11:41 +08:00
Thomas Jungblut
63964ec781 Fixing flaky TestLeasingDeleteRangeContendTxn
Fixes etcd-io#15352.
Depending on the goroutine scheduling, the expected count of 8 might not
have been reached yet. This ensures the routine won't stop earlier than
that.

Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
2023-03-03 11:38:22 +01:00
Hitoshi Mitake
65eeb7ff17 server/auth: disallow creating empty permission ranges
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
2023-02-27 22:55:36 +09:00
Benjamin Wang
87e271701b test: enhance the test case TestV3WatchProgressOnMemberRestart
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-02-10 20:09:26 +08:00
Benjamin Wang
36fc3cae65 clientv3: correct the nextRev on receving progress notification response
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-02-10 09:09:19 +08:00
Tero Saarni
588b98d085 Add TLSv1.3 support.
Added optional TLS min/max protocol version and command line switches to set
versions for the etcd server.

If max version is not explicitly set by the user, let Go select the max
version which is currently TLSv1.3. Previously max version was set to TLSv1.2.

Signed-off-by: Tero Saarni <tero.saarni@est.tech>
2023-01-30 16:16:53 +02:00
Piotr Tabor
9abc895122 Goimports: Apply automated fixing to test files as well.
Signed-off-by: Piotr Tabor <ptab@google.com>
2022-12-29 13:04:45 +01:00
Piotr Tabor
6f899a7b40
Merge pull request #15052 from ptabor/20221228-goimports-fix
./scripts/fix.sh: Takes care of goimports across the whole project.
2022-12-29 11:31:22 +01:00
Piotr Tabor
9e1abbab6e Fix goimports in all existing files. Execution of ./scripts/fix.sh
Signed-off-by: Piotr Tabor <ptab@google.com>
2022-12-29 09:41:31 +01:00
Wei Fu
4d0b91947e chore: delete // +build buildtag by go fix
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2022-12-29 14:17:05 +08:00
Ramil Mirhasanov
f0153222f1 clientv3/naming/endpoints: fix endpoints prefix bug
fixes bug with multiple endpoints with same prefix

Signed-off-by: Ramil Mirhasanov <ramil600@yahoo.com>
2022-12-22 13:36:16 +03:00
Piotr Tabor
6fc0d96b42
Merge pull request #14993 from ramil600/add-log
clientv3/concurrency: add logger to session, add unit test
2022-12-19 10:30:38 +01:00
Ramil Mirhasanov
3c582fecb0 clientv3/concurrency: add logger to session, add unit test
Signed-off-by: Ramil Mirhasanov <ramil600@yahoo.com>
2022-12-16 11:11:35 +03:00
Wei Fu
e58c73cc18 maintenance: add test to verify content of Snapshot
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2022-12-16 15:53:39 +08:00
Benjamin Wang
3c51c42417 test: fix nil pointer panic in testMutexLock
Refer to: https://github.com/etcd-io/etcd/actions/runs/3671847902/jobs/6207463700

```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xedc388]

goroutine 5253 [running]:
go.etcd.io/etcd/client/v3/concurrency.(*Session).Client(...)
	/home/runner/work/etcd/etcd/client/v3/concurrency/session.go:76
go.etcd.io/etcd/client/v3/concurrency.(*Mutex).tryAcquire(0xc000133140, {0x18a8668, 0xc000050158})
	/home/runner/work/etcd/etcd/client/v3/concurrency/mutex.go:111 +0x88
go.etcd.io/etcd/client/v3/concurrency.(*Mutex).Lock(0xc000133140, {0x18a8668, 0xc000050158})
	/home/runner/work/etcd/etcd/client/v3/concurrency/mutex.go:74 +0x68
go.etcd.io/etcd/tests/v3/integration/clientv3/experimental/recipes_test.testMutexLock.func1()
	/home/runner/work/etcd/etcd/tests/integration/clientv3/experimental/recipes/v3_lock_test.go:65 +0x285
created by go.etcd.io/etcd/tests/v3/integration/clientv3/experimental/recipes_test.testMutexLock
	/home/runner/work/etcd/etcd/tests/integration/clientv3/experimental/recipes/v3_lock_test.go:59 +0xda
FAIL	go.etcd.io/etcd/tests/v3/integration/clientv3/experimental/recipes	7.070s
FAIL
```

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-12-12 10:18:45 +08:00
Benjamin Wang
7b19ee6396 test: add integration test to cover the multiple member corruption case
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-11-26 19:35:38 +08:00
Benjamin Wang
a3197102e9 test: rollback the change in PR pull/14824
The change did in https://github.com/etcd-io/etcd/pull/14824 fixed
the test instead of the product code. It isn't correct. After we
fixed the product code in this PR, we can revert the change in
that PR.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-11-26 19:35:38 +08:00
Bhargav Ravuri
dbfe42bbd2 comments: fix comments as per goword in go _test pkg files
Comments fixed as per goword in go _test package files that
shell function go_srcs_in_module lists as per changes on #14827

Helps in #14827

Signed-off-by: Bhargav Ravuri <bhargav.ravuri@infracloud.io>
2022-11-24 00:03:00 +05:30
Bhargav Ravuri
2feec4fe68 comments: fix comments as per goword in go test files
Comments fixed as per goword in go test files that shell
function go_srcs_in_module lists as per changes on #14827

Helps in #14827

Signed-off-by: Bhargav Ravuri <bhargav.ravuri@infracloud.io>
2022-11-23 23:05:42 +05:30
Wei Fu
0b30e83b1d tests/integration: deflake Corruption cases
If the corrupted member has been elected as leader, the memberID in alert
response won't be the corrupted one. It will be a smaller follower ID since
the raftCluster.Members always sorts by ID. We should check the leader
ID and decide to use which memberID.

Fixes: #14823

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2022-11-22 22:19:55 +08:00
Sasha Melentyev
c3b6cbdb73 all: goimports -w .
Signed-off-by: Sasha Melentyev <sasha@melentyev.io>
2022-11-17 19:07:04 +03:00
Sasha Melentyev
2c9c209eb6 all: Changing Printf and friends to Print if there is no formatting
Signed-off-by: Sasha Melentyev <sasha@melentyev.io>
2022-11-15 22:11:23 +03:00
Sasha Melentyev
855aa4f7a7 all: Use ReplaceAll instead of Replace with -1 pos
Signed-off-by: Sasha Melentyev <sasha@melentyev.io>
2022-11-15 00:06:09 +03:00
chenyahui
5b8c6b548f etcdclient: check mutex state in Unlock method of concurrency.Mutex
Check the values of myKey and myRev first in Unlock method to prevent calling Unlock without Lock. Because this may cause the value of pfx to be deleted by mistake.

Signed-off-by: chenyahui <cyhone@qq.com>
2022-11-08 22:24:52 +08:00
Benjamin Wang
62167d1f1f clientv3: fix the design & implementation of double barrier
Check the client count before creating the ephemeral key, do not
create the key if there are already too many clients. Check the
count after creating the key again, if the total kvs is bigger
than the expected count, then check the rev of the current key,
and take action accordingly based on its rev. If its rev is in
the first ${count}, then it's valid client, otherwise, it should
fail.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-10-20 16:24:20 +08:00
Marek Siarkowicz
07ca384753 tests: Move MustAbsPath function to testutils
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-10-17 13:37:14 +02:00
王霄霄
2751ec6479 integration: check Watch response error not nil to avoid runtime panic.
Fixes issue: #14259

Signed-off-by: 王霄霄 <1141195807@qq.com>
2022-10-16 11:41:11 +08:00