371 Commits

Author SHA1 Message Date
Marek Siarkowicz
600ee13ac0 server: Cover V3 health with tests 2022-05-05 09:52:14 +02:00
Marek Siarkowicz
e9dec74ded server: Refactor health checks 2022-05-05 09:52:14 +02:00
Marek Siarkowicz
191aed645e server: Run health check tests in subtests 2022-05-05 09:52:14 +02:00
Marek Siarkowicz
e4e391792a server: Rename test case expect fields 2022-05-05 09:52:13 +02:00
Marek Siarkowicz
0fb194d6f2 server: Use named struct initialization in healthcheck test 2022-05-05 09:52:13 +02:00
Marek Siarkowicz
0096d2ecdb server: Remove unused NewClientHandler 2022-05-05 09:52:13 +02:00
ahrtr
fb2eeb9027 verify consistent_index in snapshot must be equal to the snapshot index
Usually the consistent_index should be greater than the index of the
latest snapshot with suffix .snap. But for the snapshot coming from the
leader, the consistent_index should be equal to the snapshot index.
2022-05-03 20:02:47 +08:00
Piotr Tabor
887f95d0d3
Merge pull request #13963 from ptabor/20220412-verify-assert
Add verification consistent index is (nearly) never decreasing
2022-04-25 10:13:44 +02:00
Piotr Tabor
d69e07dd3a Verification framework and check whether cindex is not decreasing. 2022-04-22 12:32:05 +02:00
ahrtr
6eef7ede40 Update consistent_index when applying fails
When clients have no permission to perform an operation, applying
may fail. We should also move consistent_index forward in this case,
otherwise the consistent_index may be smaller than the
snapshot index.
2022-04-20 21:44:48 +08:00
ahrtr
484d2f01f3 set backend to cindex before recovering the lessor in applySnapshot 2022-04-12 10:36:29 +08:00
ahrtr
1b3d6cb0c8 set a separate applyTimeout for the waitAppliedIndex 2022-04-10 14:44:55 +08:00
ahrtr
fe3a57976e support linearizable renew lease
When etcdserver receives a LeaseRenew request, it may still be
processing the LeaseGrantRequest for exactly the same
leaseID. Accordingly it may return TTL=0 to the client due to a
"leaseID not found" error. So the leader should wait for the appliedID
to be available before processing client requests.
2022-04-10 14:44:55 +08:00
Donal Hunt
d659403955
Update server/etcdserver/api/v3rpc/maintenance.go
Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>
2022-04-08 18:45:13 +01:00
Donal Hunt
4c8ef011e0 *: drop use of humanize.Time() in favour of time.Duration.String()
humanize.Time() drops precision resulting in some events reporting they took
"now" time to complete. Using time.Duration.String() results in accurate
duration being reported.

Fixes #13905
2022-04-07 23:24:35 +01:00
Marek Siarkowicz
1ea53d527e server: Save consistency index and term to backend even when they decrease
The reason to store CI and term in the backend was to make the db a fully
independent snapshot; it was never meant to interfere with apply logic.
Skipping of CI was introduced for v2->v3 migration, where we wanted to
prevent it from decreasing when replaying the WAL in
https://github.com/etcd-io/etcd/pull/5391. By mistake it was added to
apply flow during refactor in
https://github.com/etcd-io/etcd/pull/12855#commitcomment-70713670.

Consistency index and term should only be negotiated and used by raft to make
decisions. Their values should only be driven by the raft state machine, and
the backend should only be responsible for storing them.
2022-04-07 19:00:03 +02:00
ahrtr
4033f5c2b9 move the consistentIdx and consistentTerm from Etcdserver to cindex package
Removed the fields consistentIdx and consistentTerm from struct EtcdServer,
and added applyingIndex and applyingTerm into struct consistentIndex in
package cindex. We may remove the two fields completely if we decide to
remove the OnPreCommitUnsafe, and it will depend on the performance test
result.
2022-04-07 15:16:49 +08:00
ahrtr
e155e50886 rename LockWithoutHook to LockOutsideApply and add LockInsideApply 2022-04-07 05:35:13 +08:00
ahrtr
47038593e9 set the consistent_index directly when applyV3 isn't performed 2022-04-07 05:35:13 +08:00
ahrtr
7ac995cdde enhanced authBackend to support authReadTx 2022-04-07 05:35:13 +08:00
ahrtr
bfd5170f66 add a txPostLockHook into the backend
Previously SetConsistentIndex() was called during the apply workflow,
but outside the db transaction. If a commit happened between SetConsistentIndex
and the following apply workflow, and etcd crashed for whatever reason right
after the commit, then etcd committed an incomplete transaction to the db.
Eventually etcd would run into a data inconsistency issue.

In this commit, we move the SetConsistentIndex into a txPostLockHook, so
it will be executed inside the transaction lock.
2022-04-07 05:35:13 +08:00
Marek Siarkowicz
73fc864247 tests: Pass logger to backend 2022-04-05 15:53:38 +02:00
Piotr Tabor
6c974a3e31
Merge pull request #13867 from serathius/logs-test
tests: Use zaptest.NewLogger in tests
2022-04-04 14:47:04 +02:00
Marek Siarkowicz
804fddf921 tests: Use zaptest.NewLogger in tests 2022-04-04 13:03:15 +02:00
ahrtr
836bd6bc3a fix WARNING: DATA RACE issue when multiple goroutines access the backend concurrently 2022-04-03 06:13:09 +08:00
qsyqian
2ed87b9f2f skip compaction when the revision has not changed in periodic compaction mode 2022-03-24 20:59:25 +08:00
ahrtr
f978da4f4f move the newClientCfg into clientv3 package so as to be reused by both etcdctl and v3discovery 2022-03-24 06:18:25 +08:00
ahrtr
edce939f6e add one more field storageVersion into StatusResponse
When performing the downgrade operation, users can confirm whether each member
is ready to be downgraded using the field 'storageVersion'. If it's equal to the
'target version' in the downgrade command, then it's ready to be downgraded;
otherwise, the etcd member is still processing the db file.
2022-03-18 07:04:44 +08:00
ahrtr
1a3822f2c3 Rename ClientConfig to ConfigSpec
The ClientConfig is a fully declarative configuration, so it makes more
sense to rename it to ConfigSpec. It can also mitigate the confusion
between Config and ClientConfig.
2022-03-13 05:41:49 +08:00
ahrtr
3dcbbf62d9 Move clientconfig into clientv3 so that it can be reused by both etcdctl and v3 discovery 2022-03-12 06:38:41 +08:00
ahrtr
1ae5aa52de fix some typos related to downgrade 2022-03-09 16:07:18 +08:00
Marek Siarkowicz
d2e5a8cb5d
Merge pull request #13750 from kkkkun/add-timeout
add timeout for http client
2022-03-02 10:15:44 +01:00
Marek Siarkowicz
1406a9919c
Merge pull request #13700 from AdamKorcz/fuzz8
server/etcdserver: fix oss-fuzz issue
2022-03-01 10:48:29 +01:00
kkkkun
59f7764772 add timeout for http client 2022-03-01 11:11:09 +08:00
ahrtr
2f36e0c62b Change discovery url to endpoints
Currently the discovery url is just one endpoint. But actually it
should be the same as the etcdctl, which means that it should be
a list of endpoints. When one endpoint is down, the clientv3 can
fail over to the next endpoint automatically.
2022-02-24 09:11:41 +08:00
Marek Siarkowicz
6af760131e
Merge pull request #13687 from serathius/etcdctl
Add downgrade commands
2022-02-22 17:12:23 +01:00
Marek Siarkowicz
42faf9fe06 etcdctl: Use minor versions for downgrade 2022-02-22 16:30:08 +01:00
Marek Siarkowicz
b5e224db7d
Merge pull request #13635 from ahrtr/v3_discovery
support v3 discovery to bootstrap a new etcd cluster
2022-02-21 21:50:40 +01:00
Piotr Tabor
f80f477073
Merge pull request #13644 from Juneezee/refactor/t.TempDir
*: use `T.TempDir` to create temporary test directory
2022-02-21 19:52:37 +01:00
ahrtr
ebc86d12c0 support v3 discovery to bootstrap a new etcd cluster 2022-02-21 23:22:49 +08:00
Marek Siarkowicz
a0f26ff4ea server: Snapshot after cluster version downgrade 2022-02-21 15:48:00 +01:00
Piotr Tabor
6105a6f0e8
Merge pull request #13683 from serathius/publishV3
server: Switch to publishV3
2022-02-21 14:16:22 +01:00
ahrtr
8681888012 fix typo, renamed ErrGPRCNotSupportedForLearner to ErrGRPCNotSupportedForLearner 2022-02-21 14:46:58 +08:00
Marek Siarkowicz
a63fa17b76
Merge pull request #13645 from yangxuanjia/yxjetcd_fix_panic_when_restart_after_removeMember
fix panic when restart after removeMember
2022-02-20 12:28:14 +01:00
AdamKorcz
5649cf3f1a Log and return instead of panic 2022-02-16 10:31:08 +00:00
AdamKorcz
fad82c1b6f server/etcdserver: fix oss-fuzz issue N 2022-02-15 15:32:31 +00:00
Marek Siarkowicz
8c91d60a6f server: Switch to publishV3 2022-02-14 23:06:45 +01:00
Sahdev Zala
830f00d105
Merge pull request #13695 from AdamKorcz/fuzz1
server/etcdserver: fix oss-fuzz issue 42181
2022-02-14 15:36:41 -05:00
Marek Siarkowicz
3de5e221a8 tests: Fix cluster version and downgrade request timeout
Returning nil means that raft.Trigger was not called, causing the member to
wait indefinitely for a response to the raft request.
2022-02-14 14:19:06 +01:00
AdamKorcz
0df768d2b1 server/etcdserver: fix oss-fuzz issue 42181 2022-02-14 10:59:41 +00:00