263 Commits

Author SHA1 Message Date
Benjamin Wang
d8c410ff82
Merge pull request #16791 from chaochn47/remove-deprecated-gRPC-API
remove deprecated gRPC API usage
2023-10-18 11:13:09 +01:00
Chao Chen
3c6d2e972d remove deprecated gRPC API usage
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-10-17 15:51:25 -07:00
Wei Fu
aea1cd0077 feat: enable unparam lint
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-10-17 21:24:13 +08:00
Wei Fu
8870cb3070 *: fix unconvert linter
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-09-25 19:40:45 +08:00
Wei Fu
07effc4d0a *: fix revive linter
Remove the old revive_pass in the bash scripts and migrate the revive.toml
into golangci linter_settings.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-09-24 14:21:11 +08:00
Wei Fu
aa97484166 *: enable goimports in verify-lint
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-09-21 21:14:09 +08:00
Wei Fu
9c3edfa0af *: fix staticcheck lint
Changed TraceKey/StartTimeKey/TokenFieldNameGRPCKey to struct{} to
follow the correct usage of context. Similar patch to #8901.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-09-21 11:24:26 +08:00
Wei Fu
df86cadd8b *: fix ineffassign lint
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-09-19 22:19:19 +08:00
Wei Fu
5e3910d96c *: fix govet-shadow lint
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-09-19 20:24:01 +08:00
chenyahui
c0aa3b613b Use any instead of interface{}
Signed-off-by: chenyahui <cyhone@qq.com>
2023-09-17 17:41:58 +08:00
Benjamin Wang
5dd5fe35d0 test: de-flake test case TestV3WatchProgressOnMemberRestart
The test case may block while sending a progress notification, so the
goroutine may never exit.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-09-05 15:04:02 +01:00
Jes Cok
52748f60f3 all: stop using math/rand.Seed
Fixes #16428.
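
One common replacement for the deprecated `rand.Seed` is a locally seeded generator. This is only a sketch of that general technique; whether this commit uses it is an assumption, and `newRand` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// newRand returns a locally seeded generator instead of reseeding the
// package-level source with rand.Seed.
func newRand(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

func main() {
	r := newRand(time.Now().UnixNano())
	fmt.Println(r.Intn(100)) // some value in [0, 100)
}
```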

Signed-off-by: Jes Cok <xigua67damn@gmail.com>
2023-08-20 16:34:44 +08:00
Benjamin Wang
eb04f3ad8d
Merge pull request #16338 from chaochn47/bump-up-grpc
Fix 15877 and bump up gRPC from v1.52.0 to v1.57.0
2023-08-02 08:36:02 +01:00
Benjamin Wang
f3a03247df
Merge pull request #16265 from kensou97/expose-session-context
clientv3: add Ctx() to return context of session
2023-08-01 17:09:01 +01:00
Benjamin Wang
8524903935
Merge pull request #16223 from kensou97/fix-barrier
clientv3: fix barrier.Wait() still block after barrier.Release()
2023-08-01 17:08:17 +01:00
Chao Chen
24c6fb4b4d Fix 15877 and bump up gRPC from v1.52.0 to v1.57.0
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-07-31 13:57:24 -07:00
Chao Chen
8aeed09f2c endpoints.Interpret returns Host:port as ServerName
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-07-28 16:19:54 -07:00
zhangwenkang
c3e5201972 clientv3: fix barrier.Wait() still block after barrier.Release() in some cases
Signed-off-by: Wenkang Zhang <314830391@qq.com>
2023-07-18 15:48:48 +08:00
Wenkang Zhang
03d8fff0d8 clientv3: add Ctx() to return context of session
Signed-off-by: Wenkang Zhang <314830391@qq.com>
2023-07-18 15:22:53 +08:00
Rajalakshmi Girish
ea72194935 Fix flaky integration/clientv3/naming TestEtcdGrpcResolverRoundRobin
Signed-off-by: Rajalakshmi Girish <rajalakshmi.girish1@ibm.com>
2023-07-17 23:53:02 -07:00
Benjamin Wang
93bfdba265
Merge pull request #16156 from kensou97/fix-barrier
clientv3: remove v3.WithFirstKey() in Barrier.Wait()
2023-07-06 08:29:21 +01:00
caojiamingalan
eff9517a90 etcdserver: add cluster id check for hashKVHandler
Signed-off-by: caojiamingalan <alan.c.19971111@gmail.com>
2023-07-05 14:09:40 -05:00
zhangwenkang
3d3e91c6e3 clientv3: remove v3.WithFirstKey() in Barrier.Wait()
Fix the unexpected blocking in Barrier.Wait(). For example,
NewBarrier(client, "a").Wait() blocks if key "a" does not exist but "a0" does, whereas it should return immediately.
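
The difference can be sketched without etcd, assuming that `WithFirstKey()` behaves like a prefix query returning the smallest matching key. The functions `firstKeyWithPrefix` and `exactKey` below are hypothetical stand-ins, not clientv3 APIs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// firstKeyWithPrefix mimics a prefix range query limited to the smallest key.
func firstKeyWithPrefix(keys []string, prefix string) (string, bool) {
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)
	for _, k := range sorted {
		if strings.HasPrefix(k, prefix) {
			return k, true
		}
	}
	return "", false
}

// exactKey mimics a lookup of one specific key.
func exactKey(keys []string, key string) bool {
	for _, k := range keys {
		if k == key {
			return true
		}
	}
	return false
}

func main() {
	keys := []string{"a0"} // "a" itself does not exist

	k, found := firstKeyWithPrefix(keys, "a")
	fmt.Println(k, found)            // the prefix lookup matches "a0"
	fmt.Println(exactKey(keys, "a")) // the exact lookup finds nothing
}
```

With the prefix lookup, a waiter on "a" would wrongly conclude that the barrier is still held.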

Signed-off-by: zhangwenkang <zwenkang@vmware.com>
2023-07-04 22:01:54 +08:00
Chao Chen
6cdc9ae4fe server/etcdserver/raft.go:
1. rename confChangeCh to raftAdvancedC
2. rename waitApply to confChanged
3. add comments and test assertion

Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-06-26 22:42:44 -07:00
Chao Chen
6d79b86219 Enable failpoint by default in integration tests
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-06-21 23:13:46 -07:00
Benjamin Wang
2df32102ca
Merge pull request #15835 from yellowzf/grpcproxy_fix_memberlist_results_not_update_when_proxy_node_down
grpcproxy: fix memberlist results not update when proxy node down
2023-05-15 13:34:05 +08:00
yellowzf
ca221208d2 grpcproxy: fix memberlist results not update when proxy node down
If the grpc proxy is started with --resolver-prefix, memberlist returns all alive proxy nodes. When one grpc proxy node goes down, memberlist is expected to stop returning that node, but it still returns it.

Signed-off-by: yellowzf <zzhf3311@163.com>
2023-05-15 10:59:02 +08:00
Benjamin Wang
52dfd4bbed
Merge pull request #15867 from chaochn47/auth_test_split_8
migrate e2e auth tests to common #8
2023-05-13 14:21:37 +08:00
Chao Chen
c846b087db migrate e2e auth tests to common #8
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-05-12 22:51:47 -07:00
Benjamin Wang
05b663fbe8
Merge pull request #15828 from chaochn47/add_leadership_transfer_coverage
tests/e2e: add graceful shutdown test
2023-05-11 07:39:25 +08:00
James Blair
3f5ad36039
Deflake TestEtcdGrpcResolverRoundRobin.
Signed-off-by: James Blair <mail@jamesblair.net>
2023-05-10 21:03:01 +12:00
Chao Chen
f31d0eafb9 tests/e2e: add graceful shutdown test
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-05-09 17:08:53 -07:00
Benjamin Wang
b404d25d84
Merge pull request #15741 from AngstyDuck/set-default-value-for-AutoCompactionMode
server: default value for config file field auto-compaction-mode is n…
2023-05-10 05:44:16 +08:00
AngstyDuck
a7344da7d3 server: default value for config file field auto-compaction-mode is now 'periodic'; added additional checks if auto-compaction-mode is undefined
Signed-off-by: AngstyDuck <solsticedante@gmail.com>
2023-05-09 23:10:44 +08:00
Hitoshi Mitake
49b59cc8e5
Merge pull request #15656 from mitake/lease-timetolive-auth
protect LeaseTimeToLive with RBAC
2023-05-02 23:02:29 +09:00
James Blair
b9533ca98b
Deflake TestEtcdGrpcResolverRoundRobin.
Increase the number of requests to 1000 to enlarge the sample size and reduce variability, and raise the tolerance threshold from 10% to 15%.
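
A rough sketch of such a tolerance check follows. How the real test measures the deviation is an assumption here, and `withinTolerance` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// withinTolerance reports whether every backend received a number of requests
// within tolerance (a fraction, e.g. 0.15) of an even split.
func withinTolerance(counts []int, tolerance float64) bool {
	if len(counts) == 0 {
		return true
	}
	total := 0
	for _, c := range counts {
		total += c
	}
	expected := float64(total) / float64(len(counts))
	for _, c := range counts {
		if math.Abs(float64(c)-expected) > tolerance*expected {
			return false
		}
	}
	return true
}

func main() {
	counts := []int{560, 440} // 1000 requests across two backends
	fmt.Println(withinTolerance(counts, 0.10)) // false: 60 off, limit is 50
	fmt.Println(withinTolerance(counts, 0.15)) // true: 60 off, limit is 75
}
```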

Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-29 14:14:16 +12:00
Hitoshi Mitake
c9b368119e tests: e2e and integration test for timetolive
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
2023-04-26 20:35:20 +09:00
James Blair
18e3acae0e
Add new test for round robin resolver.
Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-25 18:44:24 +12:00
Wei Fu
50aa00b203 tests: make log monitor as common helper
It's a follow-up to #15667.

This patch uses zaptest/observer as the base for a helper that works
similarly to pkg/expect.Expect.

The test environment:

```bash
# CPU: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
mkdir /sys/fs/cgroup/etcd-followup-15667
echo 0-2 | tee /sys/fs/cgroup/etcd-followup-15667/cpuset.cpus # three cores
```

Before change:

* memory.peak: ~ 681 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14.04

After change:

* memory.peak: ~ 671 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:13.07

Based on these test results, I think it is safe to enable it by default.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-18 09:00:24 +08:00
Wei Fu
9f034fbaa8 chore: use tools/mod to lock the cfssl cmd version
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-13 12:06:31 +08:00
Wei Fu
8cd5969248 chore: use strict mode for tests/*/*.sh
REF: #15514

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-13 12:05:39 +08:00
Wei Fu
536953ec6c tests: deflake TestV3WatchRestoreSnapshotUnsync
TestV3WatchRestoreSnapshotUnsync sets up a three-member cluster.
After the leader is elected and before any client update requests are
served, each member's log is at index 8: 3 x ConfChange +
3 x ClusterMemberAttrSet + 1 x ClusterVersionSet.

Based on the config (SnapshotCount: 10, CatchUpCount: 5), we need to
send enough update requests to trigger a snapshot at least twice.

T1: L(snapshot-index: 11, compacted-index:  6) F_m0(index: 8)
T2: L(snapshot-index: 22, compacted-index: 17) F_m0(index: 8, out of date)

After member0 recovers from the network partition, it will reject the
leader's request and return a hint (index:8, term:x). If this happens after
the second snapshot, the leader will find that index:8 is out of date and
be forced to transfer a snapshot.

However, the client only sends 15 update requests and the leader doesn't
finish the snapshot process in time. Since the last
compacted-index is 6, the leader can still replicate index:9 to member0
instead of sending a snapshot.
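
The arithmetic behind T1 and T2 can be sketched as a simplified model. The helpers `compactedIndex` and `needsSnapshot` are hypothetical and ignore everything else raft tracks.

```go
package main

import "fmt"

// compactedIndex returns the index up to which the leader's log is dropped
// after a snapshot at snapshotIndex, keeping catchUpCount entries around for
// slow followers.
func compactedIndex(snapshotIndex, catchUpCount uint64) uint64 {
	if snapshotIndex <= catchUpCount {
		return 0
	}
	return snapshotIndex - catchUpCount
}

// needsSnapshot reports whether a follower whose last matched index is
// matchIndex can no longer be caught up from the leader's remaining log.
func needsSnapshot(matchIndex, compacted uint64) bool {
	return matchIndex < compacted
}

func main() {
	const catchUp = 5 // CatchUpCount from the test config
	for _, snap := range []uint64{11, 22} {
		c := compactedIndex(snap, catchUp)
		fmt.Printf("snapshot=%d compacted=%d needsSnapshot=%v\n",
			snap, c, needsSnapshot(8, c))
	}
}
```

With the follower at index 8, only the second compaction (index 17) forces a snapshot transfer. After the first one (index 6) the leader still holds index 9 onward.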

```bash
cd tests/integration
CLUSTER_DEBUG=true go test -v -count=1 -run TestV3WatchRestoreSnapshotUnsync ./
...

INFO    m2.raft 3da8ba707f1a21a4 became leader at term 2        {"member": "m2"}
...
INFO    m2      triggering snapshot     {"member": "m2", "local-member-id": "3da8ba707f1a21a4", "local-member-applied-index": 22, "local-member-snapshot-index": 11, "local-member-snapshot-count": 10, "snapshot-forced": false}
...

cluster.go:1359: network partition between: 99626fe5001fde8b <-> 1c964119da6db036
cluster.go:1359: network partition between: 99626fe5001fde8b <-> 3da8ba707f1a21a4
cluster.go:416: WaitMembersForLeader

INFO    m0.raft 99626fe5001fde8b became follower at term 2      {"member": "m0"}
INFO    m0.raft raft.node: 99626fe5001fde8b elected leader 3da8ba707f1a21a4 at term 2   {"member": "m0"}
DEBUG   m2.raft 3da8ba707f1a21a4 received MsgAppResp(rejected, hint: (index 8, term 2)) from 99626fe5001fde8b for index 23      {"member": "m2"}
DEBUG   m2.raft 3da8ba707f1a21a4 decreased progress of 99626fe5001fde8b to [StateReplicate match=8 next=9 inflight=15]  {"member": "m2"}

DEBUG   m0      Applying entries        {"member": "m0", "num-entries": 15}
DEBUG   m0      Applying entry  {"member": "m0", "index": 9, "term": 2, "type": "EntryNormal"}

....

INFO    m2      saved snapshot  {"member": "m2", "snapshot-index": 22}
INFO    m2      compacted Raft logs     {"member": "m2", "compact-index": 17}
```

To fix this issue, the patch uses the log monitor to watch for "compacted Raft
logs" and expects two members to compact their logs twice.
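
The expectation can be sketched with plain Go. This is not the zaptest/observer-based helper from the patch; the `entry` type and `membersWithAtLeast` are hypothetical and only show the counting logic.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a simplified captured log record.
type entry struct {
	member  string
	message string
}

// membersWithAtLeast returns how many members logged message at least n times.
func membersWithAtLeast(entries []entry, message string, n int) int {
	perMember := map[string]int{}
	for _, e := range entries {
		if strings.Contains(e.message, message) {
			perMember[e.member]++
		}
	}
	count := 0
	for _, c := range perMember {
		if c >= n {
			count++
		}
	}
	return count
}

func main() {
	logs := []entry{
		{"m1", "compacted Raft logs"},
		{"m2", "compacted Raft logs"},
		{"m2", "saved snapshot"},
		{"m1", "compacted Raft logs"},
		{"m2", "compacted Raft logs"},
	}
	// The test would wait until this reaches 2 before healing the partition.
	fmt.Println(membersWithAtLeast(logs, "compacted Raft logs", 2))
}
```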

Fixes: #15545

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-10 22:27:58 +08:00
Peter Wortmann
af25936fb7 tests/integration: Demonstrate manual progress notification race
This will fail basically every time, as the progress notification
request catches the watcher in an unsynchronised state.

Signed-off-by: Peter Wortmann <peter.wortmann@skao.int>
2023-04-05 11:19:07 +01:00
Marek Siarkowicz
0cbd56e8b6 tests: Cleanup endpoints
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-03 12:18:54 +02:00
Hitoshi Mitake
4da39e4b1e
Merge pull request #15294 from mitake/range-check
server/auth: disallow creating empty permission ranges
2023-04-03 09:03:50 +09:00
Chao Chen
f163af2bc8 deflake TestTracing
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-03-17 14:39:18 -07:00
Marek Siarkowicz
372042c374 refactor: Use proper variable names for urls
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-03-13 14:48:01 +01:00
Wei Fu
4a9ad47bcc tests/integration: deflake #15409
The cluster version is initialized after the member becomes leader.
The update is handled asynchronously. It may never be applied if the member
has been closed and the Go runtime picks the `s.stopping` channel first.

```go
// e2a5df534c/server/etcdserver/server.go (L2170)

func (s *EtcdServer) monitorClusterVersions() {
	...
	for {
		select {
		case <-s.firstCommitInTerm.Receive():
		case <-time.After(monitorVersionInterval):
		case <-s.stopping:
			return
		}
		...
	}
}
```

Alternatively, once `s.stopping` has been closed, [UpdateClusterVersion][1] can't
start its GoAttach call successfully. For #15409, we can see the warning log
`server has stopped; skipping GoAttach` from GoAttach:

```plain
https://github.com/etcd-io/etcd/actions/runs/4340931587/jobs/7580103902

    logger.go:130: 2023-03-06T07:36:44.253Z	WARN	default	stopping grpc server due to error	{"error": "accept tcp 127.0.0.1:2379: use of closed network connection"}
    logger.go:130: 2023-03-06T07:36:44.253Z	WARN	default	stopped grpc server due to error	{"error": "accept tcp 127.0.0.1:2379: use of closed network connection"}
    logger.go:130: 2023-03-06T07:36:44.253Z	ERROR	default	setting up serving from embedded etcd failed.	{"error": "accept tcp 127.0.0.1:2379: use of closed network connection"}
    logger.go:130: 2023-03-06T07:36:44.253Z	ERROR	default	setting up serving from embedded etcd failed.	{"error": "http: Server closed"}
    logger.go:130: 2023-03-06T07:36:44.253Z	INFO	default	skipped leadership transfer for single voting member cluster	{"local-member-id": "8e9e05c52164694d", "current-leader-member-id": "8e9e05c52164694d"}

    logger.go:130: 2023-03-06T07:36:44.253Z	WARN	default	server has stopped; skipping GoAttach

    ...
```

If the cluster version isn't updated, the minimum storage version will
be v3.5 because [AuthStatus][2] was introduced in [v3.5][3].
The comparison will then fail.

To fix this issue, we should wait for the cluster version to become ready
after the server is ready to serve requests.
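
The underlying race comes from how Go's `select` behaves when more than one case is ready: it picks one pseudo-randomly. A small standalone sketch shows this. The channel names are illustrative and `countSelections` is a hypothetical helper, not etcd code.

```go
package main

import "fmt"

// countSelections runs a select n times with both channels ready and reports
// how often each case was chosen.
func countSelections(n int) (work, stop int) {
	workC := make(chan struct{}, 1)
	stopC := make(chan struct{})
	close(stopC) // a closed channel is always ready to receive

	for i := 0; i < n; i++ {
		workC <- struct{}{}
		select {
		case <-workC:
			work++
		case <-stopC:
			stop++
			<-workC // drain so the next iteration can send again
		}
	}
	return work, stop
}

func main() {
	work, stop := countSelections(1000)
	// Both counters are expected to be non-zero: neither case has priority.
	fmt.Println(work, stop)
}
```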

[1]: <e2a5df534c/server/etcdserver/adapters.go (L45)>
[2]: <071e70cdc4>
[3]: <1b4e54c238>

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-03-11 14:50:37 +08:00
Wei Fu
3419230eea tests/integration: Update TestLeasingDeleteRangeContendTxn
TestLeasingDeleteRangeContendTxn tests RangeDelete while
the target resources are being updated. When `txnLeasing` wants a
server-side transaction, it needs to ensure that the mod revision of every key
is less than what it saw. If the compare fails, it retries the
server-side transaction until it is successful. I believe the test case is
trying to verify how `txnLeasing` handles the race issue.

Before the patch #15401, the resource-updating goroutine keeps updating until
the RangeDelete finishes. The testcase is flaky because two goroutines are
sharing one `ctx` and grpc-go client won't wait for the response if `ctx`
has been canceled.

For example,

| DelLease Goroutine   | PutLease Goroutine         | ETCD Server                    | Key/0 Status |
| --                   | ---                        | --                             | --           |
| deleted              |                            |                                | version = 0  |
|                      | send update(key/0=123) req | received update(key/0=123) req | version = 0  |
| cancel               |                            |                                | version = 0  |
|                      | exit because of cancel     |                                | version = 0  |
| get key/0 by putkv   |                            |                                | version = 0  |
|                      |                            | applied update(key/0=123)      | version = 1  |
| get key/0 by raw-cli |                            |                                | version = 1  |

So `raw-cli` gets `[key/0=123]` while the `putkv` gets `[]`. If `putkv`
applies two update reqs to ETCD server and the last one is canceled
before apply, the error will be like:

```
expected [key:"key/0" version:2 value:"123" ], got [key:"key/0" version:1 value:"123" ]
```

The resource-updating goroutine should not share the ctx with RangeDelete here.
I also revert the change currently on the main branch, because the resource-updating
goroutine only updates 8 times and might exit before `RangeDelete`. In that case,
`txnLeasing` never has to handle the race issue.

Fixes: #15352

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-03-07 23:11:41 +08:00
Thomas Jungblut
63964ec781 Fixing flaky TestLeasingDeleteRangeContendTxn
Fixes etcd-io#15352.
Depending on the goroutine scheduling, the expected count of 8 might not
have been reached yet. This ensures the routine won't stop earlier than
that.

Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
2023-03-03 11:38:22 +01:00