Mirroristas/etcd

mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Author	SHA1	Message	Date
Piotr Tabor	76543d06ce	Merge pull request #13898 from mrueg/update-tools tools/mod: Update tools	2022-04-08 09:16:58 +02:00
Donal Hunt	4c8ef011e0	*: drop use of humanize.Time() in favour of time.Duration.String() humanize.Time() drops precision resulting in some events reporting they took "now" time to complete. Using time.Duration.String() results in accurate duration being reported. Fixes #13905	2022-04-07 23:24:35 +01:00
Marek Siarkowicz	3dce38085d	Merge pull request #13903 from serathius/term server: Save consistency index and term to backend even when they decrease	2022-04-07 21:12:23 +02:00
Marek Siarkowicz	1ea53d527e	server: Save consistency index and term to backend even when they decrease The reason to store CI and term in the backend was to make the db a fully independent snapshot; it was never meant to interfere with apply logic. Skipping of CI was introduced for the v2->v3 migration, where we wanted to prevent it from decreasing when replaying the WAL, in https://github.com/etcd-io/etcd/pull/5391. By mistake it was added to the apply flow during a refactor in https://github.com/etcd-io/etcd/pull/12855#commitcomment-70713670. Consistency index and term should only be negotiated and used by raft to make decisions. Their values should only be driven by the raft state machine, and the backend should only be responsible for storing them.	2022-04-07 19:00:03 +02:00
Marek Siarkowicz	a5b9f72da6	Merge pull request #13807 from endocrimes/dani/before-test-fix tests/framework/integration: Fail BeforeTest nesting early	2022-04-07 15:37:06 +02:00
Danielle Lancashire	7cc00ec981	tests/framework/integration: Fail nesting early Currently there are a handful of tests within etcd that silently fail because LeakDetection will skip the test before it manages to hit this check. Here we move the check to the beginning of the process to highlight these cases earlier, and to avoid them accidentally presenting as leaks.	2022-04-07 13:10:15 +00:00
Manuel Rüger	dedb661d92	tools/mod: Update tools github.com/google/addlicense v0.0.0-20210428195630-6d92264d7170 -> v1.0.0 github.com/gordonklaus/ineffassign v0.0.0-20200809085317-e36bfde3bb78 -> v0.0.0-20210914165742-4cc7213b9bc8 github.com/grpc-ecosystem/grpc-gateway v1.14.6 -> v1.16.0 github.com/hexfusion/schwag v0.0.0-20170606222847-b7d0fc9aadaa -> v0.0.0-20211117114134-3ceb0191ccbf github.com/mgechev/revive v1.0.2 -> v1.2.0 github.com/mikefarah/yq/v3 v3.0.0-20201125113350-f42728eef735 -> v4.24.2 gotest.tools v2.2.0+incompatible -> v3.1.0 gotest.tools/gotestsum v0.3.5 -> v1.7.0 honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc -> v0.3.0 mvdan.cc/unparam v0.0.0-20200501210554-b37ab49443f7 -> v0.0.0-20220316160445-06cc5682983b Signed-off-by: Manuel Rüger <manuel@rueg.eu>	2022-04-07 14:27:51 +02:00
Piotr Tabor	c83b1ad9ba	Merge pull request #13854 from ahrtr/data_corruption Fix the data inconsistency issue by moving the SetConsistentIndex into the transaction lock	2022-04-07 14:20:19 +02:00
Marek Siarkowicz	706cde86d0	Merge pull request #13897 from mrueg/bump_client-golang go.mod: Bump prometheus/client_golang to v1.12.1	2022-04-07 09:36:36 +02:00
ahrtr	4033f5c2b9	move the consistentIdx and consistentTerm from Etcdserver to cindex package Removed the fields consistentIdx and consistentTerm from struct EtcdServer, and added applyingIndex and applyingTerm into struct consistentIndex in package cindex. We may remove the two fields completely if we decide to remove the OnPreCommitUnsafe, and it will depend on the performance test result.	2022-04-07 15:16:49 +08:00
Piotr Tabor	47bb48dfdc	Merge pull request #13869 from wking/test-fail-exit-code-with-log-redirect-upstream Makefile: Drop log tee calls	2022-04-07 09:12:19 +02:00
ahrtr	e155e50886	rename LockWithoutHook to LockOutsideApply and add LockInsideApply	2022-04-07 05:35:13 +08:00
ahrtr	47038593e9	set the consistent_index directly when applyV3 isn't performed	2022-04-07 05:35:13 +08:00
ahrtr	7ac995cdde	enhanced authBackend to support authReadTx	2022-04-07 05:35:13 +08:00
ahrtr	a4c5da844d	added detailed comment to explain the difference between Lock and LockWithoutHook	2022-04-07 05:35:13 +08:00
ahrtr	bfd5170f66	add a txPostLockHook into the backend Previously, SetConsistentIndex() was called during the apply workflow, but outside the db transaction. If a commit happens between SetConsistentIndex and the following apply workflow, and etcd crashes for whatever reason right after the commit, then etcd commits an incomplete transaction to the db. Eventually etcd runs into the data inconsistency issue. In this commit, we move the SetConsistentIndex into a txPostLockHook, so it will be executed inside the transaction lock.	2022-04-07 05:35:13 +08:00
Manuel Rüger	f0f77fc14e	go.mod: Bump prometheus/client_golang to v1.12.1 Signed-off-by: Manuel Rüger <manuel@rueg.eu>	2022-04-06 19:03:24 +02:00
Marek Siarkowicz	3ffa253516	tests: Add tests for snapshot compatibility and recovery between versions	2022-04-06 16:10:38 +02:00
Marek Siarkowicz	c4d055fe7b	Merge pull request #13819 from endocrimes/dani/auth_test.go migrate e2e/users tests to common framework	2022-04-06 16:02:46 +02:00
Piotr Tabor	d24ef3ac20	Merge pull request #13893 from ls-2018/todo fix unexpose todo	2022-04-06 14:31:26 +02:00
ls-2018	5b84b30fce	fix unexpose todo Signed-off-by: ls-2018 <acejilam@gmail.com>	2022-04-06 17:38:46 +08:00
Piotr Tabor	047e61df7a	Merge pull request #13880 from ahrtr/fix_dump_logs_panic etcd-dump-logs will panic if there is no WAL entry after the snapshot	2022-04-06 09:25:17 +02:00
Marek Siarkowicz	ad03f2076a	Merge pull request #13886 from serathius/backend-logger tests: Pass logger to backend	2022-04-05 16:35:07 +02:00
Marek Siarkowicz	ae57fe5d30	Merge pull request #13885 from serathius/verify server: Add verification of whether lock was called within or outside of apply	2022-04-05 16:22:52 +02:00
Marek Siarkowicz	73fc864247	tests: Pass logger to backend	2022-04-05 15:53:38 +02:00
Marek Siarkowicz	1d3517020b	server: Add verification of whether lock was called within or outside of apply	2022-04-05 15:34:45 +02:00
Marek Siarkowicz	8d8271f6d1	Merge pull request #13175 from karuppiah7890/issue-13167-measure-flakyness scripts: add script to measure percentage of commits with failed status	2022-04-05 15:25:47 +02:00
Marek Siarkowicz	a08d479463	Merge pull request #13868 from endocrimes/dani/leasefix tests/common/lease: Wait for correct lease list response	2022-04-04 17:51:57 +02:00
Danielle Lancashire	f71196d113	tests/common/lease: Wait for correct lease list response We don't consistently reach the same etcd server during the lifetime of a test, and in some cases this means that this test will flake if an etcd server was slow to update its state and the test hits the outdated server. Here we switch to using an `Eventually` case, which will wait up to a second for the expected result before failing, with a 10ms gap between invocations. ``` [tests(dani/leasefix)] $ gotestsum -- ./common -tags integration -count 100 -timeout 15m -run TestLeaseGrantAndList ✓ common (2m26.71s) DONE 1600 tests in 147.258s ```	2022-04-04 15:43:17 +02:00
Piotr Tabor	6c974a3e31	Merge pull request #13867 from serathius/logs-test tests: Use zaptest.NewLogger in tests	2022-04-04 14:47:04 +02:00
Piotr Tabor	5b84d3934e	Merge pull request #13876 from ptabor/20220403-integration-test-fixes Integration tests flake fixes	2022-04-04 14:46:29 +02:00
Marek Siarkowicz	9dc8bbb7cf	Merge pull request #13875 from ahrtr/be_race fix WARNING: DATA RACE issue when multiple goroutines access the backend	2022-04-04 13:31:19 +02:00
Marek Siarkowicz	804fddf921	tests: Use zaptest.NewLogger in tests	2022-04-04 13:03:15 +02:00
ahrtr	543c87cc38	etcd-dump-logs will panic if there is no WAL entry after the snapshot	2022-04-04 18:58:18 +08:00
Piotr Tabor	d4dcd3061d	Fix flakes in TestV3LeaseCheckpoint/Checkpointing_disabled,_lease_TTL_is_reset I think strong (not-equal) relationship was too restrictive when expressed with 1s granularity. ``` logger.go:130: 2022-04-03T22:15:15.242+0200 WARN m1 leader failed to send out heartbeat on time; took too long, leader is overloaded likely from slow disk {"member": "m1", "to": "cb785755eb80ac1", "heartbeat-interval": "10ms", "expected-duration": "20ms", "exceeded-duration": "24.666613ms"} logger.go:130: 2022-04-03T22:15:15.262+0200 INFO m-1 published local member to cluster through raft {"member": "m-1", "local-member-id": "e2dd9f523aa7be87", "local-member-attributes": "{Name:m-1 ClientURLs:[unix://127.0.0.1:2196386040]}", "cluster-id": "b4b8e7e41c23c8b5", "publish-timeout": "5.2s"} v3_lease_test.go:415: Expected lease ttl (4m58s) to be greather than (4m58s) ```	2022-04-03 23:13:01 +02:00
Piotr Tabor	90796720c1	Reduce integration test parallelism to 2 packages at once. Especially with 'race' detection, running O(cpu) integration tests was causing CPU overloads and timeouts.	2022-04-03 14:48:36 +02:00
Piotr Tabor	ed1bc447c7	Flakes: Additional logging and timeouts to understand common flakes.	2022-04-03 14:48:36 +02:00
Piotr Tabor	68f2cb8c77	Fix ExampleAuth from integration/clientv3/examples (on OsX) The code now ensures that each of the test is running in its own directory as opposed to shared os.tempdir. ``` $ (cd tests && env go test -timeout=15m --race go.etcd.io/etcd/tests/v3/integration/clientv3/examples -run ExampleAuth) 2022/04/03 10:24:59 Running tests (examples): ... 2022/04/03 10:24:59 the function can be called only in the test context. Was integration.BeforeTest() called ? 2022/04/03 10:24:59 2022-04-03T10:24:59.462+0200 INFO m0 LISTEN GRPC {"member": "m0", "grpcAddr": "localhost:m0", "m.Name": "m0"} ```	2022-04-03 14:16:45 +02:00
Piotr Tabor	d57f8dba62	Deflaking: Make WaitLeader (and WaitMembersForLeader) aggressively (30s) wait for leader being established. Nearly none of the tests was checking the value... just assuming WaitLeader success. ``` maintenance_test.go:277: Waiting for leader... logger.go:130: 2022-04-03T08:01:09.914+0200 INFO m0 cluster version differs from storage version. {"member": "m0", "cluster-version": "3.6.0", "storage-version": "3.5.0"} logger.go:130: 2022-04-03T08:01:09.915+0200 WARN m0 leader failed to send out heartbeat on time; took too long, leader is overloaded likely from slow disk {"member": "m0", "to": "2acc3d3b521981", "heartbeat-interval": "10ms", "expected-duration": "20ms", "exceeded-duration": "103.756219ms"} logger.go:130: 2022-04-03T08:01:09.916+0200 INFO m0 updated storage version {"member": "m0", "new-storage-version": "3.6.0"} ... logger.go:130: 2022-04-03T08:01:09.926+0200 INFO grpc [[roundrobin] roundrobinPicker: Build called with info: {map[0xc002630ac0:{{unix:localhost:m0 localhost <nil> 0 <nil>}} 0xc002630af0:{{unix:localhost:m1 localhost <nil> 0 <nil>}} 0xc002630b20:{{unix:localhost:m2 localhost <nil> 0 <nil>}}]}] logger.go:130: 2022-04-03T08:01:09.926+0200 WARN m0 apply request took too long {"member": "m0", "took": "114.661766ms", "expected-duration": "100ms", "prefix": "", "request": "header:<ID:12658633312866157316 > cluster_version_set:<ver:\"3.6.0\" > ", "response": ""} logger.go:130: 2022-04-03T08:01:09.927+0200 INFO m0 cluster version is updated {"member": "m0", "cluster-version": "3.6"} logger.go:130: 2022-04-03T08:01:09.955+0200 INFO m2.raft 9f96af25a04e2ec3 [logterm: 2, index: 8, vote: 9903a56eaf96afac] ignored MsgVote from 2acc3d3b521981 [logterm: 2, index: 8] at term 2: lease is not expired (remaining ticks: 10) {"member": "m2"} logger.go:130: 2022-04-03T08:01:09.955+0200 INFO m0.raft 9903a56eaf96afac [logterm: 2, index: 8, vote: 9903a56eaf96afac] ignored MsgVote from 2acc3d3b521981 [logterm: 2, index: 8] at term 2: lease is not expired (remaining ticks: 5) {"member": "m0"} logger.go:130: 2022-04-03T08:01:09.955+0200 INFO m0.raft 9903a56eaf96afac [term: 2] received a MsgAppResp message with higher term from 2acc3d3b521981 [term: 3] {"member": "m0"} logger.go:130: 2022-04-03T08:01:09.955+0200 INFO m0.raft 9903a56eaf96afac became follower at term 3 {"member": "m0"} logger.go:130: 2022-04-03T08:01:09.955+0200 INFO m0.raft raft.node: 9903a56eaf96afac lost leader 9903a56eaf96afac at term 3 {"member": "m0"} maintenance_test.go:279: Leader established. ``` Tmp	2022-04-03 12:23:09 +02:00
Piotr Tabor	2fab3f3ae5	Make naming of test-nodes consistent and positive: m0, m1, m2 The nodes used to be named: m-1, m0, m1, that was generating very confusing logs in integration tests.	2022-04-03 09:16:55 +02:00
ahrtr	836bd6bc3a	fix WARNING: DATA RACE issue when multiple goroutines access the backend concurrently	2022-04-03 06:13:09 +08:00
Sahdev Zala	3d3c4373e3	Merge pull request #13860 from mrueg/fix-make2 Makefile: Additional logic fix	2022-04-02 14:43:19 -04:00
Piotr Tabor	f85cd0296f	Merge pull request #13872 from ptabor/20220402-osx-unit-test-pass Fix TestauthTokenBundleOnOverwrite on OsX:	2022-04-02 20:03:38 +02:00
Piotr Tabor	3bb2d0c716	Merge pull request #13870 from howz97/main fix comment in raft.go	2022-04-02 16:50:26 +02:00
Piotr Tabor	8cd8a1ea10	Flakes in integration/clientv3/examples/... The tests sometimes flaked due to already existing socket-files. Now each execution works in a temporary directory.	2022-04-02 16:16:25 +02:00
Piotr Tabor	3b589fb3b2	Fix TestauthTokenBundleOnOverwrite on OsX: ``` % (cd client/v3 && env go test -short -timeout=3m --race ./...) --- FAIL: TestAuthTokenBundleNoOverwrite (0.00s) client_test.go:210: listen unix /var/folders/t1/3m8z9xz93t9c3vpt7zyzjm6w00374n/T/TestAuthTokenBundleNoOverwrite3197524989/001/etcd-auth-test:0: bind: invalid argument FAIL FAIL go.etcd.io/etcd/client/v3 4.270s ``` The reason was that the path exceeded 108 chars (that is too long for a socket path). In the mitigation we first change chroot (working directory) to the tempDir... so that the path is 'local'.	2022-04-02 16:12:02 +02:00
howz97	f9c9bfa44c	fix comment in raft.go	2022-04-02 14:27:33 +08:00
W. Trevor King	c59cae5aaa	Makefile: Drop log tee calls We've had these in one form or another since 23a302364c (Makefile: initial commit, 2017-09-29), but in at least some cases the underlying shell does not pipefail, a test failure gets swallowed, and the make call exits zero despite failing the tests [1]: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_etcd/109/pull-ci-openshift-etcd-openshift-4.11-unit/1509260812278042624/artifacts/test/build-log.txt TEST_OPTS: PASSES='unit' log-file: test-MTY0ODY3MTA1MQo.log PASSES='unit' ./test.sh 2>&1 | tee test-MTY0ODY3MTA1MQo.log % env GO111MODULE=off go get github.com/myitcv/gobin Running with --race Starting at: Wed Mar 30 20:10:52 UTC 2022 'unit' started at Wed Mar 30 20:10:52 UTC 2022 % (cd api && env go test -short -timeout=3m --race ./...) stderr: authpb/auth.pb.go:12:2: open /go/pkg/mod/github.com/gogo/protobuf@v1.3.2/gogoproto: permission denied stderr: authpb/auth.pb.go:13:2: open /go/pkg/mod/github.com/golang/protobuf@v1.5.2/proto: permission denied stderr: etcdserverpb/rpc.pb.go:17:2: open /go/pkg/mod/google.golang.org/genproto@v0.0.0-20210602131652-f16073e35f0c/googleapis/api/annotations: permission denied stderr: etcdserverpb/rpc.pb.go:18:2: open /go/pkg/mod/google.golang.org/grpc@v1.38.0: permission denied stderr: etcdserverpb/rpc.pb.go:19:2: open /go/pkg/mod/google.golang.org/grpc@v1.38.0/codes: permission denied stderr: etcdserverpb/rpc.pb.go:20:2: open /go/pkg/mod/google.golang.org/grpc@v1.38.0/status: permission denied stderr: etcdserverpb/gw/rpc.pb.gw.go:17:2: open /go/pkg/mod/github.com/golang/protobuf@v1.5.2/descriptor: permission denied stderr: etcdserverpb/gw/rpc.pb.gw.go:19:2: open /go/pkg/mod/github.com/grpc-ecosystem/grpc-gateway@v1.16.0/runtime: permission denied stderr: etcdserverpb/gw/rpc.pb.gw.go:20:2: open /go/pkg/mod/github.com/grpc-ecosystem/grpc-gateway@v1.16.0/utilities: permission denied FAIL: (code:1): % (cd api && env go test -short -timeout=3m --race ./...) stderr: etcdserverpb/gw/rpc.pb.gw.go:23:2: open /go/pkg/mod/google.golang.org/grpc@v1.38.0/grpclog: permission denied stderr: version/version.go:23:2: open /go/pkg/mod/github.com/coreos/go-semver@v0.3.0/semver: permission denied FAIL: 'unit' failed at Wed Mar 30 20:10:52 UTC 2022 ! egrep "(--- FAIL:|DATA RACE|panic: test timed out|appears to have leaked)" -B50 -A10 test-MTY0ODY3MTA1MQo.log We can't drop the log aggregation, because the log files are used for the panic/race grepping. But I'm dropping the tee (so no more synchronous updates, but we no longer have to worry about pipefail handling). And then if the test script fails, I'm dumping the log file to stdout and exiting 1, so the overall run fails. [1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_etcd/109/pull-ci-openshift-etcd-openshift-4.11-unit/1509260812278042624	2022-04-01 11:06:50 -07:00
Marek Siarkowicz	b1610934e3	Merge pull request #13864 from serathius/logs Fix inconsistent log format	2022-04-01 11:00:48 +02:00
Marek Siarkowicz	63346bfead	server: Use default logging configuration instead of zap production one This fixes problem where logs json changes format of timestamp.	2022-04-01 10:23:42 +02:00


17910 Commits