17898 Commits

Author SHA1 Message Date
ahrtr
47038593e9 set the consistent_index directly when applyV3 isn't performed 2022-04-07 05:35:13 +08:00
ahrtr
7ac995cdde enhanced authBackend to support authReadTx 2022-04-07 05:35:13 +08:00
ahrtr
a4c5da844d added detailed comment to explain the difference between Lock and LockWithoutHook 2022-04-07 05:35:13 +08:00
ahrtr
bfd5170f66 add a txPostLockHook into the backend
Previously, SetConsistentIndex() was called during the apply workflow,
but outside the db transaction. If a commit happened between SetConsistentIndex
and the following apply workflow, and etcd crashed for whatever reason right
after the commit, then etcd committed an incomplete transaction to the db.
Eventually etcd would run into a data inconsistency issue.

In this commit, we move the SetConsistentIndex into a txPostLockHook, so
it will be executed inside the transaction lock.
2022-04-07 05:35:13 +08:00
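
To illustrate the idea behind bfd5170f66 above, here is a minimal sketch of a lock that runs a hook after it is acquired. The `backend` type, its fields, and the `apply` method are hypothetical names invented for this example and are not etcd's actual code; only the names `Lock`, `LockWithoutHook`, and the notion of a post-lock hook come from the commit messages. The sketch assumes a single goroutine applies entries.

```go
package main

import (
	"fmt"
	"sync"
)

// backend is a hypothetical stand-in for a storage backend.
type backend struct {
	mu           sync.Mutex
	pendingIndex uint64            // index of the entry being applied (in memory)
	storedIndex  uint64            // consistent index as written "in the db"
	data         map[string]string // pretend key-value store
	postLockHook func()            // runs right after the lock is taken
}

func newBackend() *backend {
	b := &backend{data: map[string]string{}}
	// The hook persists the index while the transaction lock is held.
	b.postLockHook = func() { b.storedIndex = b.pendingIndex }
	return b
}

// Lock takes the transaction lock and then runs the hook, so the index
// update and the caller's writes fall inside the same locked section.
func (b *backend) Lock() {
	b.mu.Lock()
	if b.postLockHook != nil {
		b.postLockHook()
	}
}

// LockWithoutHook takes the lock only, for callers outside the apply path.
func (b *backend) LockWithoutHook() { b.mu.Lock() }

func (b *backend) Unlock() { b.mu.Unlock() }

// apply simulates applying one entry (assumes a single applier goroutine).
func (b *backend) apply(index uint64, key, value string) {
	b.pendingIndex = index
	b.Lock()
	defer b.Unlock()
	b.data[key] = value
}

func main() {
	b := newBackend()
	b.apply(5, "foo", "bar")
	fmt.Println(b.storedIndex, b.data["foo"])
}
```

Because the hook runs only after the lock is held, a commit cannot slip in between the index update and the data write in this sketch.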
Manuel Rüger
f0f77fc14e go.mod: Bump prometheus/client_golang to v1.12.1
Signed-off-by: Manuel Rüger <manuel@rueg.eu>
2022-04-06 19:03:24 +02:00
Marek Siarkowicz
3ffa253516 tests: Add tests for snapshot compatibility and recovery between versions 2022-04-06 16:10:38 +02:00
Marek Siarkowicz
c4d055fe7b
Merge pull request #13819 from endocrimes/dani/auth_test.go
migrate e2e/users tests to common framework
2022-04-06 16:02:46 +02:00
Piotr Tabor
d24ef3ac20
Merge pull request #13893 from ls-2018/todo
fix unexpose todo
2022-04-06 14:31:26 +02:00
ls-2018
5b84b30fce fix unexpose todo
Signed-off-by: ls-2018 <acejilam@gmail.com>
2022-04-06 17:38:46 +08:00
Piotr Tabor
047e61df7a
Merge pull request #13880 from ahrtr/fix_dump_logs_panic
etcd-dump-logs will panic if there is no WAL entry after the snapshot
2022-04-06 09:25:17 +02:00
Marek Siarkowicz
ad03f2076a
Merge pull request #13886 from serathius/backend-logger
tests: Pass logger to backend
2022-04-05 16:35:07 +02:00
Marek Siarkowicz
ae57fe5d30
Merge pull request #13885 from serathius/verify
server: Add verification of whether lock was called within or outside of apply
2022-04-05 16:22:52 +02:00
Marek Siarkowicz
73fc864247 tests: Pass logger to backend 2022-04-05 15:53:38 +02:00
Marek Siarkowicz
1d3517020b server: Add verification of whether lock was called within or outside of apply 2022-04-05 15:34:45 +02:00
Marek Siarkowicz
8d8271f6d1
Merge pull request #13175 from karuppiah7890/issue-13167-measure-flakyness
scripts: add script to measure percentage of commits with failed status
2022-04-05 15:25:47 +02:00
Marek Siarkowicz
a08d479463
Merge pull request #13868 from endocrimes/dani/leasefix
tests/common/lease: Wait for correct lease list response
2022-04-04 17:51:57 +02:00
Danielle Lancashire
f71196d113 tests/common/lease: Wait for correct lease list response
We don't consistently reach the same etcd server during the lifetime of
a test and in some cases, this means that this test will flake if an
etcd server was slow to update its state and the test hits the outdated
server.

Here we switch to using an `Eventually` case which will wait up to a
second for the expected result before failing - with a 10ms gap between
invocations.

```
[tests(dani/leasefix)] $ gotestsum -- ./common -tags integration -count 100 -timeout 15m -run TestLeaseGrantAndList
✓  common (2m26.71s)

DONE 1600 tests in 147.258s
```
2022-04-04 15:43:17 +02:00
Piotr Tabor
6c974a3e31
Merge pull request #13867 from serathius/logs-test
tests: Use zaptest.NewLogger in tests
2022-04-04 14:47:04 +02:00
Piotr Tabor
5b84d3934e
Merge pull request #13876 from ptabor/20220403-integration-test-fixes
Integration tests flake fixes
2022-04-04 14:46:29 +02:00
Marek Siarkowicz
9dc8bbb7cf
Merge pull request #13875 from ahrtr/be_race
fix WARNING: DATA RACE issue when multiple goroutines access the backend
2022-04-04 13:31:19 +02:00
Marek Siarkowicz
804fddf921 tests: Use zaptest.NewLogger in tests 2022-04-04 13:03:15 +02:00
ahrtr
543c87cc38 etcd-dump-logs will panic if there is no WAL entry after the snapshot 2022-04-04 18:58:18 +08:00
Piotr Tabor
d4dcd3061d Fix flakes in TestV3LeaseCheckpoint/Checkpointing_disabled,_lease_TTL_is_reset
I think the strict (not-equal) relationship was too restrictive when expressed with 1s granularity.

```
        logger.go:130: 2022-04-03T22:15:15.242+0200	WARN	m1	leader failed to send out heartbeat on time; took too long, leader is overloaded likely from slow disk	{"member": "m1", "to": "cb785755eb80ac1", "heartbeat-interval": "10ms", "expected-duration": "20ms", "exceeded-duration": "24.666613ms"}
        logger.go:130: 2022-04-03T22:15:15.262+0200	INFO	m-1	published local member to cluster through raft	{"member": "m-1", "local-member-id": "e2dd9f523aa7be87", "local-member-attributes": "{Name:m-1 ClientURLs:[unix://127.0.0.1:2196386040]}", "cluster-id": "b4b8e7e41c23c8b5", "publish-timeout": "5.2s"}
        v3_lease_test.go:415: Expected lease ttl (4m58s) to be greather than (4m58s)
```
2022-04-03 23:13:01 +02:00
Piotr Tabor
90796720c1 Reduce integration test parallelism to 2 packages at once.
Especially with 'race' detection, running O(cpu) integration tests was causing CPU overloads and timeouts.
2022-04-03 14:48:36 +02:00
Piotr Tabor
ed1bc447c7 Flakes: Additional logging and timeouts to understand common flakes. 2022-04-03 14:48:36 +02:00
Piotr Tabor
68f2cb8c77 Fix ExampleAuth from integration/clientv3/examples (on OsX)
The code now ensures that each test runs in its own directory, as opposed to the shared os.tempdir.
```
$  (cd tests && env go test -timeout=15m --race go.etcd.io/etcd/tests/v3/integration/clientv3/examples -run ExampleAuth)
2022/04/03 10:24:59 Running tests (examples): ...
2022/04/03 10:24:59 the function can be called only in the test context. Was integration.BeforeTest() called ?
2022/04/03 10:24:59 2022-04-03T10:24:59.462+0200	INFO	m0	LISTEN GRPC	{"member": "m0", "grpcAddr": "localhost:m0", "m.Name": "m0"}
```
2022-04-03 14:16:45 +02:00
Piotr Tabor
d57f8dba62 Deflaking: Make WaitLeader (and WaitMembersForLeader) aggressively (30s) wait for leader being established.
Almost none of the tests were checking the returned value; they just assumed WaitLeader succeeded.

```
    maintenance_test.go:277: Waiting for leader...
    logger.go:130: 2022-04-03T08:01:09.914+0200	INFO	m0	cluster version differs from storage version.	{"member": "m0", "cluster-version": "3.6.0", "storage-version": "3.5.0"}
    logger.go:130: 2022-04-03T08:01:09.915+0200	WARN	m0	leader failed to send out heartbeat on time; took too long, leader is overloaded likely from slow disk	{"member": "m0", "to": "2acc3d3b521981", "heartbeat-interval": "10ms", "expected-duration": "20ms", "exceeded-duration": "103.756219ms"}
    logger.go:130: 2022-04-03T08:01:09.916+0200	INFO	m0	updated storage version	{"member": "m0", "new-storage-version": "3.6.0"}
    ...
    logger.go:130: 2022-04-03T08:01:09.926+0200	INFO	grpc	[[roundrobin] roundrobinPicker: Build called with info: {map[0xc002630ac0:{{unix:localhost:m0 localhost <nil> 0 <nil>}} 0xc002630af0:{{unix:localhost:m1 localhost <nil> 0 <nil>}} 0xc002630b20:{{unix:localhost:m2 localhost <nil> 0 <nil>}}]}]
    logger.go:130: 2022-04-03T08:01:09.926+0200	WARN	m0	apply request took too long	{"member": "m0", "took": "114.661766ms", "expected-duration": "100ms", "prefix": "", "request": "header:<ID:12658633312866157316 > cluster_version_set:<ver:\"3.6.0\" > ", "response": ""}
    logger.go:130: 2022-04-03T08:01:09.927+0200	INFO	m0	cluster version is updated	{"member": "m0", "cluster-version": "3.6"}
    logger.go:130: 2022-04-03T08:01:09.955+0200	INFO	m2.raft	9f96af25a04e2ec3 [logterm: 2, index: 8, vote: 9903a56eaf96afac] ignored MsgVote from 2acc3d3b521981 [logterm: 2, index: 8] at term 2: lease is not expired (remaining ticks: 10)	{"member": "m2"}
    logger.go:130: 2022-04-03T08:01:09.955+0200	INFO	m0.raft	9903a56eaf96afac [logterm: 2, index: 8, vote: 9903a56eaf96afac] ignored MsgVote from 2acc3d3b521981 [logterm: 2, index: 8] at term 2: lease is not expired (remaining ticks: 5)	{"member": "m0"}
    logger.go:130: 2022-04-03T08:01:09.955+0200	INFO	m0.raft	9903a56eaf96afac [term: 2] received a MsgAppResp message with higher term from 2acc3d3b521981 [term: 3]	{"member": "m0"}
    logger.go:130: 2022-04-03T08:01:09.955+0200	INFO	m0.raft	9903a56eaf96afac became follower at term 3	{"member": "m0"}
    logger.go:130: 2022-04-03T08:01:09.955+0200	INFO	m0.raft	raft.node: 9903a56eaf96afac lost leader 9903a56eaf96afac at term 3	{"member": "m0"}
    maintenance_test.go:279: Leader established.
```

Tmp
2022-04-03 12:23:09 +02:00
Piotr Tabor
2fab3f3ae5 Make naming of test-nodes consistent and positive: m0, m1, m2
The nodes used to be named m-1, m0, m1, which generated very confusing logs
in integration tests.
2022-04-03 09:16:55 +02:00
ahrtr
836bd6bc3a fix WARNING: DATA RACE issue when multiple goroutines access the backend concurrently 2022-04-03 06:13:09 +08:00
Sahdev Zala
3d3c4373e3
Merge pull request #13860 from mrueg/fix-make2
Makefile: Additional logic fix
2022-04-02 14:43:19 -04:00
Piotr Tabor
f85cd0296f
Merge pull request #13872 from ptabor/20220402-osx-unit-test-pass
Fix TestAuthTokenBundleNoOverwrite on OsX:
2022-04-02 20:03:38 +02:00
Piotr Tabor
3bb2d0c716
Merge pull request #13870 from howz97/main
fix comment in raft.go
2022-04-02 16:50:26 +02:00
Piotr Tabor
8cd8a1ea10 Flakes in integration/clientv3/examples/...
The tests sometimes flaked due to already existing socket files.
Now each execution works in a temporary directory.
2022-04-02 16:16:25 +02:00
Piotr Tabor
3b589fb3b2 Fix TestAuthTokenBundleNoOverwrite on OsX:
```
% (cd client/v3 && env go test -short -timeout=3m --race ./...)
--- FAIL: TestAuthTokenBundleNoOverwrite (0.00s)
    client_test.go:210: listen unix /var/folders/t1/3m8z9xz93t9c3vpt7zyzjm6w00374n/T/TestAuthTokenBundleNoOverwrite3197524989/001/etcd-auth-test:0: bind: invalid argument
FAIL
FAIL	go.etcd.io/etcd/client/v3	4.270s
```

The reason was that the path exceeded 108 chars (too long for a socket path).
As a mitigation, we first change the working directory to the tempDir, so that the socket path is 'local'.
2022-04-02 16:12:02 +02:00
howz97
f9c9bfa44c fix comment in raft.go 2022-04-02 14:27:33 +08:00
W. Trevor King
c59cae5aaa Makefile: Drop log tee calls
We've had these in one form or another since 23a302364c
(Makefile: initial commit, 2017-09-29), but in at least some cases the
underlying shell does not pipefail, a test failure gets swallowed, and
the make call exits zero despite failing the tests [1]:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_etcd/109/pull-ci-openshift-etcd-openshift-4.11-unit/1509260812278042624/artifacts/test/build-log.txt
  TEST_OPTS: PASSES='unit'
  log-file: test-MTY0ODY3MTA1MQo.log
  PASSES='unit' ./test.sh 2>&1 | tee test-MTY0ODY3MTA1MQo.log
  % env GO111MODULE=off go get github.com/myitcv/gobin
  Running with --race
  Starting at: Wed Mar 30 20:10:52 UTC 2022

  'unit' started at Wed Mar 30 20:10:52 UTC 2022
  % (cd api && env go test -short -timeout=3m --race ./...)
  stderr: authpb/auth.pb.go:12:2: open /go/pkg/mod/github.com/gogo/protobuf@v1.3.2/gogoproto: permission denied
  stderr: authpb/auth.pb.go:13:2: open /go/pkg/mod/github.com/golang/protobuf@v1.5.2/proto: permission denied
  stderr: etcdserverpb/rpc.pb.go:17:2: open /go/pkg/mod/google.golang.org/genproto@v0.0.0-20210602131652-f16073e35f0c/googleapis/api/annotations: permission denied
  stderr: etcdserverpb/rpc.pb.go:18:2: open /go/pkg/mod/google.golang.org/grpc@v1.38.0: permission denied
  stderr: etcdserverpb/rpc.pb.go:19:2: open /go/pkg/mod/google.golang.org/grpc@v1.38.0/codes: permission denied
  stderr: etcdserverpb/rpc.pb.go:20:2: open /go/pkg/mod/google.golang.org/grpc@v1.38.0/status: permission denied
  stderr: etcdserverpb/gw/rpc.pb.gw.go:17:2: open /go/pkg/mod/github.com/golang/protobuf@v1.5.2/descriptor: permission denied
  stderr: etcdserverpb/gw/rpc.pb.gw.go:19:2: open /go/pkg/mod/github.com/grpc-ecosystem/grpc-gateway@v1.16.0/runtime: permission denied
  stderr: etcdserverpb/gw/rpc.pb.gw.go:20:2: open /go/pkg/mod/github.com/grpc-ecosystem/grpc-gateway@v1.16.0/utilities: permission denied
  FAIL: (code:1):
    % (cd api && env go test -short -timeout=3m --race ./...)
  stderr: etcdserverpb/gw/rpc.pb.gw.go:23:2: open /go/pkg/mod/google.golang.org/grpc@v1.38.0/grpclog: permission denied
  stderr: version/version.go:23:2: open /go/pkg/mod/github.com/coreos/go-semver@v0.3.0/semver: permission denied
  FAIL: 'unit' failed at Wed Mar 30 20:10:52 UTC 2022
  ! egrep "(--- FAIL:|DATA RACE|panic: test timed out|appears to have leaked)" -B50 -A10 test-MTY0ODY3MTA1MQo.log

We can't drop the log aggregation, because the log files are used for
the panic/race grepping.  But I'm dropping the tee (so no more
synchronous updates, but we no longer have to worry about pipefail
handling).  And then if the test script fails, I'm dumping the log
file to stdout and exiting 1, so the overall run fails.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_etcd/109/pull-ci-openshift-etcd-openshift-4.11-unit/1509260812278042624
2022-04-01 11:06:50 -07:00
Marek Siarkowicz
b1610934e3
Merge pull request #13864 from serathius/logs
Fix inconsistent log format
2022-04-01 11:00:48 +02:00
Marek Siarkowicz
63346bfead server: Use default logging configuration instead of zap production one
This fixes a problem where the timestamp format in JSON logs changes.
2022-04-01 10:23:42 +02:00
Piotr Tabor
e4d34f21bc
Merge pull request #13856 from ahrtr/cleanup_unused_code
The file server/storage/mvcc/util.go isn't used at all, so removing it
2022-03-31 21:16:02 +02:00
Marek Siarkowicz
e5bf23037a tests: Keep logs in expect to allow their analysis 2022-03-31 21:02:36 +02:00
Manuel Rüger
29905029f6 Makefile: Additional logic fix
Signed-off-by: Manuel Rüger <manuel@rueg.eu>
2022-03-31 11:18:36 +02:00
Piotr Tabor
0d5c1dce49
Merge pull request #13857 from mrueg/fix-make
Makefile: Fix wrong target
2022-03-31 11:04:52 +02:00
Manuel Rüger
ec29b9ee36 Makefile: Fix wrong target
Signed-off-by: Manuel Rüger <manuel@rueg.eu>
2022-03-31 09:48:21 +02:00
ahrtr
9b3b383366 the file server/storage/mvcc/util.go isn't used at all, so removing it 2022-03-31 10:14:46 +08:00
Chris Ayoub
125f3c3f9a clientv3: filter learners members during autosync
This change is to ensure that all members returned during the client's
AutoSync are started and are not learners, which are not valid
etcd members to make requests to.
2022-03-29 13:38:21 -04:00
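
The filtering described in 125f3c3f9a above could look something like the sketch below. The `member` struct and `syncableEndpoints` function are hypothetical simplifications for illustration, not the client's real types, and treating an empty `Name` as "not started" is an assumption of this example.

```go
package main

import "fmt"

// member is a hypothetical, simplified view of a cluster member.
type member struct {
	Name       string // assumed to be empty until the member has started
	IsLearner  bool
	ClientURLs []string
}

// syncableEndpoints returns client URLs of members that are started and
// are not learners, skipping everything else.
func syncableEndpoints(members []member) []string {
	var eps []string
	for _, m := range members {
		if m.Name == "" || m.IsLearner {
			continue
		}
		eps = append(eps, m.ClientURLs...)
	}
	return eps
}

func main() {
	members := []member{
		{Name: "m0", ClientURLs: []string{"http://10.0.0.1:2379"}},
		{Name: "m1", IsLearner: true, ClientURLs: []string{"http://10.0.0.2:2379"}},
		{Name: "", ClientURLs: []string{"http://10.0.0.3:2379"}},
	}
	fmt.Println(syncableEndpoints(members))
}
```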
Marek Siarkowicz
0e83f62e0c
Merge pull request #13852 from serathius/recommend
changelog: Update and deduplicate production recommendations
2022-03-29 19:12:46 +02:00
Marek Siarkowicz
88a39d780f changelog: Update and deduplicate production recommendations 2022-03-29 19:09:01 +02:00
Marek Siarkowicz
27e222e2d7
Merge pull request #13802 from yankay/fix-the-api-dependency-in-pkg-and-update-cobra-to-1.4.0
Fix the etcd api dependency in pkg and update Cobra version to 1.4.0
2022-03-28 10:40:24 +02:00
Sahdev Zala
be2929568f
Merge pull request #13834 from ahrtr/tool_decode_meta
enhance etcd-dump-db to display keys in meta in a friendlier way
2022-03-26 13:38:06 -04:00
Sahdev Zala
dcc226491f
Merge pull request #13836 from kkkkun/set-etcdutl-default
test: set etcdutl to default
2022-03-25 20:13:21 -04:00