17873 Commits

Author SHA1 Message Date
Sahdev Zala
0c9a4e0f93
Merge pull request #13837 from chrisayoub/main
clientv3: filter learner members during autosync
2022-04-09 19:00:55 -04:00
Piotr Tabor
1d8c06ac50
Merge pull request #13916 from hexfusion/email
MAINTAINERS: update Sam's contact email
2022-04-09 20:01:01 +02:00
Sam Batschelet
cd8f8b9e0b MAINTAINERS: update Sam's contact email
Signed-off-by: Sam Batschelet <sbatschelet@gmail.com>
2022-04-09 13:53:21 -04:00
Marek Siarkowicz
09b299e906
Merge pull request #13914 from ahrtr/data_corruption_changelog
Update 3.5 and 3.6 changelog to cover the data inconsistency issue
2022-04-09 13:07:06 +02:00
Marek Siarkowicz
12bdd1c5e4
Merge pull request #13910 from serathius/crypto
*: update golang.org/x/crypto
2022-04-09 09:41:49 +02:00
Marek Siarkowicz
7d3ca1f516
Merge pull request #13906 from donalhunt/main
*: drop use of humanize.Time() in favour of zap.Duration and time.Duration
2022-04-08 23:41:50 +02:00
ahrtr
bd0c5a74a3 update 3.5 and 3.6 changelog to cover the data inconsistency issue 2022-04-09 05:34:24 +08:00
Donal Hunt
d659403955
Update server/etcdserver/api/v3rpc/maintenance.go
Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>
2022-04-08 18:45:13 +01:00
Donal Hunt
6e1afa9677
Update client/v3/snapshot/v3_snapshot.go
Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>
2022-04-08 18:44:50 +01:00
Marek Siarkowicz
1bb59adb1e *: update golang.org/x/crypto 2022-04-08 16:27:52 +02:00
Marek Siarkowicz
f026a37474
Merge pull request #13907 from ahrtr/kvstore_readtx
Use readTx in (*store).restore
2022-04-08 14:14:51 +02:00
Marek Siarkowicz
05e6527d26
Merge pull request #13756 from serathius/test-snapshot
tests: Add tests for snapshot compatibility and recovery between versions
2022-04-08 14:14:19 +02:00
ahrtr
a3650db574 use readTx in (*store).restore 2022-04-08 15:45:05 +08:00
Piotr Tabor
76543d06ce
Merge pull request #13898 from mrueg/update-tools
tools/mod: Update tools
2022-04-08 09:16:58 +02:00
Donal Hunt
4c8ef011e0 *: drop use of humanize.Time() in favour of time.Duration.String()
humanize.Time() drops precision resulting in some events reporting they took
"now" time to complete. Using time.Duration.String() results in accurate
duration being reported.

Fixes #13905
2022-04-07 23:24:35 +01:00
Marek Siarkowicz
3dce38085d
Merge pull request #13903 from serathius/term
server: Save consistency index and term to backend even when they decrease
2022-04-07 21:12:23 +02:00
Marek Siarkowicz
1ea53d527e server: Save consistency index and term to backend even when they decrease
The reason to store CI and term in the backend was to make the db a fully
independent snapshot; it was never meant to interfere with apply logic.
Skipping CI was introduced for v2->v3 migration, where we wanted to prevent
it from decreasing when replaying the WAL in
https://github.com/etcd-io/etcd/pull/5391. By mistake it was added to
apply flow during refactor in
https://github.com/etcd-io/etcd/pull/12855#commitcomment-70713670.

Consistency index and term should only be negotiated and used by raft to make
decisions. Their values should only be driven by the raft state machine, and
the backend should only be responsible for storing them.
2022-04-07 19:00:03 +02:00
Marek Siarkowicz
a5b9f72da6
Merge pull request #13807 from endocrimes/dani/before-test-fix
tests/framework/integration: Fail BeforeTest nesting early
2022-04-07 15:37:06 +02:00
Danielle Lancashire
7cc00ec981 tests/framework/integration: Fail nesting early
Currently there are a handful of tests within etcd that silently fail
because LeakDetection will skip the test before it manages to hit this
check.

Here we move the check to the beginning of the process to highlight
these cases earlier, and to avoid them accidentally presenting as leaks.
2022-04-07 13:10:15 +00:00
Manuel Rüger
dedb661d92 tools/mod: Update tools
github.com/google/addlicense v0.0.0-20210428195630-6d92264d7170 -> v1.0.0
github.com/gordonklaus/ineffassign v0.0.0-20200809085317-e36bfde3bb78 -> v0.0.0-20210914165742-4cc7213b9bc8
github.com/grpc-ecosystem/grpc-gateway v1.14.6 -> v1.16.0
github.com/hexfusion/schwag v0.0.0-20170606222847-b7d0fc9aadaa -> v0.0.0-20211117114134-3ceb0191ccbf
github.com/mgechev/revive v1.0.2 -> v1.2.0
github.com/mikefarah/yq/v3 v3.0.0-20201125113350-f42728eef735 -> v4.24.2
gotest.tools v2.2.0+incompatible -> v3.1.0
gotest.tools/gotestsum v0.3.5 -> v1.7.0
honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc -> v0.3.0
mvdan.cc/unparam v0.0.0-20200501210554-b37ab49443f7 -> v0.0.0-20220316160445-06cc5682983b

Signed-off-by: Manuel Rüger <manuel@rueg.eu>
2022-04-07 14:27:51 +02:00
Piotr Tabor
c83b1ad9ba
Merge pull request #13854 from ahrtr/data_corruption
Fix the data inconsistency issue by moving the SetConsistentIndex into the transaction lock
2022-04-07 14:20:19 +02:00
Marek Siarkowicz
706cde86d0
Merge pull request #13897 from mrueg/bump_client-golang
go.mod: Bump prometheus/client_golang to v1.12.1
2022-04-07 09:36:36 +02:00
ahrtr
4033f5c2b9 move the consistentIdx and consistentTerm from Etcdserver to cindex package
Removed the fields consistentIdx and consistentTerm from struct EtcdServer,
and added applyingIndex and applyingTerm into struct consistentIndex in
package cindex. We may remove the two fields completely if we decide to
remove the OnPreCommitUnsafe, and it will depend on the performance test
result.
2022-04-07 15:16:49 +08:00
Piotr Tabor
47bb48dfdc
Merge pull request #13869 from wking/test-fail-exit-code-with-log-redirect-upstream
Makefile: Drop log tee calls
2022-04-07 09:12:19 +02:00
ahrtr
e155e50886 rename LockWithoutHook to LockOutsideApply and add LockInsideApply 2022-04-07 05:35:13 +08:00
ahrtr
47038593e9 set the consistent_index directly when applyV3 isn't performed 2022-04-07 05:35:13 +08:00
ahrtr
7ac995cdde enhanced authBackend to support authReadTx 2022-04-07 05:35:13 +08:00
ahrtr
a4c5da844d added detailed comment to explain the difference between Lock and LockWithoutHook 2022-04-07 05:35:13 +08:00
ahrtr
bfd5170f66 add a txPostLockHook into the backend
Previously, SetConsistentIndex() was called during the apply workflow,
but outside the db transaction. If a commit happened between SetConsistentIndex
and the following apply workflow, and etcd crashed for whatever reason right
after the commit, then etcd committed an incomplete transaction to the db.
Eventually etcd ran into the data inconsistency issue.

In this commit, we move the SetConsistentIndex into a txPostLockHook, so
it will be executed inside the transaction lock.
2022-04-07 05:35:13 +08:00
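The idea behind the post-lock hook can be sketched with a toy transaction. Everything below is an illustrative assumption rather than etcd's backend code: `toyTx`, `runApply`, and the `consistent_index` key are hypothetical, and the method names only echo the ones mentioned in the commits above. The point shown is that a write made by a hook that runs while the lock is held lands in the same batch as the apply's own writes, so a commit cannot separate them.

```go
package main

import (
	"fmt"
	"sync"
)

// toyTx is a hypothetical stand-in for a backend batch transaction.
// It is not etcd's implementation; names are chosen for illustration.
type toyTx struct {
	mu        sync.Mutex
	pending   map[string]string // writes buffered since the last commit
	committed map[string]string // what a crash would leave behind
	postLock  func(tx *toyTx)   // hook run while mu is held
}

func newToyTx(hook func(tx *toyTx)) *toyTx {
	return &toyTx{
		pending:   map[string]string{},
		committed: map[string]string{},
		postLock:  hook,
	}
}

// LockInsideApply takes the lock and then runs the hook, so the hook's
// writes are buffered together with the writes of the apply that follows.
func (tx *toyTx) LockInsideApply() {
	tx.mu.Lock()
	if tx.postLock != nil {
		tx.postLock(tx)
	}
}

// LockOutsideApply takes the lock without running the hook.
func (tx *toyTx) LockOutsideApply() { tx.mu.Lock() }

func (tx *toyTx) Unlock() { tx.mu.Unlock() }

// Put buffers a write; the caller must hold the lock.
func (tx *toyTx) Put(key, value string) { tx.pending[key] = value }

// Commit flushes all buffered writes as one unit.
func (tx *toyTx) Commit() {
	tx.LockOutsideApply()
	defer tx.Unlock()
	for k, v := range tx.pending {
		tx.committed[k] = v
	}
	tx.pending = map[string]string{}
}

// runApply simulates applying one entry and returns the committed state.
func runApply(index uint64, key, value string) map[string]string {
	tx := newToyTx(func(tx *toyTx) {
		// The index is recorded under the same lock as the apply's writes.
		tx.Put("consistent_index", fmt.Sprint(index))
	})
	tx.LockInsideApply()
	tx.Put(key, value)
	tx.Unlock()
	tx.Commit()
	return tx.committed
}

func main() {
	fmt.Println(runApply(5, "foo", "bar"))
}
```

In this sketch a `Commit` from another goroutine has to wait for the lock, so it can only observe the index and the applied key together or neither of them.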
Manuel Rüger
f0f77fc14e go.mod: Bump prometheus/client_golang to v1.12.1
Signed-off-by: Manuel Rüger <manuel@rueg.eu>
2022-04-06 19:03:24 +02:00
Marek Siarkowicz
3ffa253516 tests: Add tests for snapshot compatibility and recovery between versions 2022-04-06 16:10:38 +02:00
Marek Siarkowicz
c4d055fe7b
Merge pull request #13819 from endocrimes/dani/auth_test.go
migrate e2e/users tests to common framework
2022-04-06 16:02:46 +02:00
Piotr Tabor
d24ef3ac20
Merge pull request #13893 from ls-2018/todo
fix unexpose todo
2022-04-06 14:31:26 +02:00
ls-2018
5b84b30fce fix unexpose todo
Signed-off-by: ls-2018 <acejilam@gmail.com>
2022-04-06 17:38:46 +08:00
Piotr Tabor
047e61df7a
Merge pull request #13880 from ahrtr/fix_dump_logs_panic
etcd-dump-logs will panic if there is no WAL entry after the snapshot
2022-04-06 09:25:17 +02:00
Marek Siarkowicz
ad03f2076a
Merge pull request #13886 from serathius/backend-logger
tests: Pass logger to backend
2022-04-05 16:35:07 +02:00
Marek Siarkowicz
ae57fe5d30
Merge pull request #13885 from serathius/verify
server: Add verification of whether lock was called within or outside of apply
2022-04-05 16:22:52 +02:00
Marek Siarkowicz
73fc864247 tests: Pass logger to backend 2022-04-05 15:53:38 +02:00
Marek Siarkowicz
1d3517020b server: Add verification of whether lock was called within or outside of apply 2022-04-05 15:34:45 +02:00
Marek Siarkowicz
8d8271f6d1
Merge pull request #13175 from karuppiah7890/issue-13167-measure-flakyness
scripts: add script to measure percentage of commits with failed status
2022-04-05 15:25:47 +02:00
Marek Siarkowicz
a08d479463
Merge pull request #13868 from endocrimes/dani/leasefix
tests/common/lease: Wait for correct lease list response
2022-04-04 17:51:57 +02:00
Danielle Lancashire
f71196d113 tests/common/lease: Wait for correct lease list response
We don't consistently reach the same etcd server during the lifetime of
a test and in some cases, this means that this test will flake if an
etcd server was slow to update its state and the test hits the outdated
server.

Here we switch to using an `Eventually` case which will wait up to a
second for the expected result before failing, with a 10ms gap between
invocations.

```
[tests(dani/leasefix)] $ gotestsum -- ./common -tags integration -count 100 -timeout 15m -run TestLeaseGrantAndList
✓  common (2m26.71s)

DONE 1600 tests in 147.258s
```
2022-04-04 15:43:17 +02:00
Piotr Tabor
6c974a3e31
Merge pull request #13867 from serathius/logs-test
tests: Use zaptest.NewLogger in tests
2022-04-04 14:47:04 +02:00
Piotr Tabor
5b84d3934e
Merge pull request #13876 from ptabor/20220403-integration-test-fixes
Integration tests flake fixes
2022-04-04 14:46:29 +02:00
Marek Siarkowicz
9dc8bbb7cf
Merge pull request #13875 from ahrtr/be_race
fix WARNING: DATA RACE issue when multiple goroutines access the backend
2022-04-04 13:31:19 +02:00
Marek Siarkowicz
804fddf921 tests: Use zaptest.NewLogger in tests 2022-04-04 13:03:15 +02:00
ahrtr
543c87cc38 etcd-dump-logs will panic if there is no WAL entry after the snapshot 2022-04-04 18:58:18 +08:00
Piotr Tabor
d4dcd3061d Fix flakes in TestV3LeaseCheckpoint/Checkpointing_disabled,_lease_TTL_is_reset
I think the strict (not-equal) relationship was too restrictive when expressed with 1s granularity.

```
        logger.go:130: 2022-04-03T22:15:15.242+0200	WARN	m1	leader failed to send out heartbeat on time; took too long, leader is overloaded likely from slow disk	{"member": "m1", "to": "cb785755eb80ac1", "heartbeat-interval": "10ms", "expected-duration": "20ms", "exceeded-duration": "24.666613ms"}
        logger.go:130: 2022-04-03T22:15:15.262+0200	INFO	m-1	published local member to cluster through raft	{"member": "m-1", "local-member-id": "e2dd9f523aa7be87", "local-member-attributes": "{Name:m-1 ClientURLs:[unix://127.0.0.1:2196386040]}", "cluster-id": "b4b8e7e41c23c8b5", "publish-timeout": "5.2s"}
        v3_lease_test.go:415: Expected lease ttl (4m58s) to be greather than (4m58s)
```
2022-04-03 23:13:01 +02:00
Piotr Tabor
90796720c1 Reduce integration test parallelism to 2 packages at once.
Especially with 'race' detection, running O(cpu) integration tests at once was causing CPU overload and timeouts.
2022-04-03 14:48:36 +02:00
Piotr Tabor
ed1bc447c7 Flakes: Additional logging and timeouts to understand common flakes. 2022-04-03 14:48:36 +02:00