122 Commits

Author SHA1 Message Date
Chao Chen
6cdc9ae4fe server/etcdserver/raft.go:
1. rename confChangeCh to raftAdvancedC
2. rename waitApply to confChanged
3. add comments and test assertion

Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-06-26 22:42:44 -07:00
Benjamin Wang
ad3b6ee4c6 etcdserver: wait for raft is notified on confChange before responding to client
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-06-26 13:40:51 -07:00
Geeta Gharpure
550aa152a7 Verify consistent index is latest at the time of snapshot
Signed-off-by: Geeta Gharpure <geetagh@amazon.com>
2023-06-19 16:00:04 +00:00
Chao Chen
f31d0eafb9 tests/e2e: add graceful shutdown test
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-05-09 17:08:53 -07:00
Chao Chen
caed563e08 fix flaking auth member remove test
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-04-03 17:41:08 -07:00
Wei Fu
22bdc91302 server/etcdserver: add log for terminating monitors
Adding log for terminating monitors is to make the debug easier.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-03-11 15:07:17 +08:00
James Blair
275e10bcf7
Return default snapshot count to 10,000.
The huge (100k+) value was justified when storev2 was being dumped completely with every snapshot.

With storev2 being decomissioned we can checkpoint more frequently for faster recovery.

Signed-off-by: James Blair <mail@jamesblair.net>
2023-03-06 20:21:03 +13:00
guozhao
de8d6b3792 etcdserver: use time.Ticker instead of time.After
Using time.After will create a new Timer in each cycle, In these cases
, it is better to use time.Ticker.

Signed-off-by: guozhao <guozhao@360.cn>
2023-01-17 16:58:13 +08:00
Benjamin Wang
8ed20e85d2 etcdserver: return membership.ErrIDNotFound when the memberID not found
When promoting a learner, we need to wait until the leader's applied ID
catches up to the commitId. Afterwards, check whether the learner ID
exist or not, and return `membership.ErrIDNotFound` directly in the API
if the member ID not found, to avoid the request being unnecessarily
delivered to raft.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-01-17 06:18:15 +08:00
Piotr Tabor
6f899a7b40
Merge pull request #15052 from ptabor/20221228-goimports-fix
./scripts/fix.sh: Takes care of goimports across the whole project.
2022-12-29 11:31:22 +01:00
Piotr Tabor
9e1abbab6e Fix goimports in all existing files. Execution of ./scripts/fix.sh
Signed-off-by: Piotr Tabor <ptab@google.com>
2022-12-29 09:41:31 +01:00
KiloG
101a2a61ea
etcdserver: fix typo in comment
etcdserver: fix typo in comment
2022-12-28 18:41:08 +08:00
Benjamin Wang
faff80a2b3 etcdserve: format the source code
gofmt -w ./server

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-12-02 13:00:59 +08:00
Benjamin Wang
e9aa275b36 etcdserver: update etcdserver to use the new raft module go.etcd.io/raft/v3
Just replaced all go.etcd.io/etcd/raft/v3 with go.etcd.io/raft/v3
under directory server.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-12-02 09:33:45 +08:00
Marek Siarkowicz
2b178fdd96 server: Handle cluster version equal downgrade version
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-10-17 12:05:57 +02:00
Benjamin Wang
cc840336f0 move consistent_index forward when executing alarmList operation
The alarm list is the only exception that doesn't move consistent_index
forward. The reproduction steps are as simple as,

```
etcd --snapshot-count=5 &
for i in {1..6}; do etcdctl  alarm list; done
kill -9 <etcd_pid>
etcd
```

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-09-05 10:05:55 +08:00
Marek Siarkowicz
d44bbff278 server: Make corrtuption check optional and period configurable
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-07-26 09:31:15 +02:00
Marek Siarkowicz
6697fca97d server: Implement compaction hash checking
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-07-26 09:31:14 +02:00
Marek Siarkowicz
c58ec9fe13 server: Refactor compaction checker
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-07-25 13:59:30 +02:00
Tsonglew
e5a80f5049
fix: typo gouroutine
fix: typo gouroutine
2022-06-16 16:35:06 +08:00
SimFG
d83925e357 schedule: Provide logs when the fifo job panic happens
To make the fifo scheduler better debuggability.

Signed-off-by: SimFG <1142838399@qq.com>
2022-06-15 20:58:17 +08:00
Marek Siarkowicz
7c35dadc25 server: Extract corruption detection to dedicated struct
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-06-13 18:19:24 +02:00
ahrtr
25deb436af fix the race condition between goroutine and channel on the same leases to be revoked 2022-05-25 16:44:41 +08:00
Piotr Tabor
85b18c9b3e Rename WrapApply to Apply. 2022-05-20 14:32:04 +02:00
Piotr Tabor
0da0cf4795 expose UberApplier as interface (not as implementation struct). 2022-05-20 14:32:04 +02:00
Piotr Tabor
5097b33ab9 Rename etcdserver/etcderrors package to etcdserver/errors. 2022-05-20 14:32:04 +02:00
Piotr Tabor
63b2f63cc1 Rename package alising "apply2" -> apply. 2022-05-20 14:32:04 +02:00
Piotr Tabor
47a771871b Move apply to its own package (no dependency on etcdserver). 2022-05-20 14:32:04 +02:00
Piotr Tabor
fc6a6c3c27 Move etcdserver/errors.go to sepatate package to avoid cyclic dependencies. 2022-05-20 14:32:04 +02:00
Piotr Tabor
b073129d03 Applier does not depend on EtcdServer any longer.
All the depencies are explicily passed to the UberApplier factory method.
2022-05-20 14:32:04 +02:00
Piotr Tabor
651de5a057 Rename EtcdServer.Id with EtcdServer.MemberId.
It was misleading and error prone vs. ClusterId.
2022-05-20 14:32:04 +02:00
Piotr Tabor
b7ad746bfe Encapsulating applier logic: UberApplier coordinates all appliers for server
This PR:
 - moves wrapping of appliers (due to Alarms) out of server.go into uber_applier.go
 - clearly devides the application logic into: chain of:
     a) 'WrapApply' (generic logic across all the methods)
     b) dispatcher (translation of Apply into specific method like 'Put')
     c) chain of 'wrappers' of the specific methods (like Put).
 - when we do recovery (restore from snapshot) we create new instance of appliers.

The purpose is to make sure we control all the depencies of the apply process, i.e.
we can supply e.g. special instance of 'backend' to the application logic.
2022-05-20 14:32:04 +02:00
Piotr Tabor
cdf9869d70 Encapsulation of applier logic: Move Txn related code out of applier.go.
The PR removes calls to applierV3base logic from server.go that is NOT part of 'application'.
The original idea was that read-only transaction and Range call shared logic with Apply,
so they can call appliers directly (but bypassing all 'corrupt', 'quota' and 'auth' wrappers).

This PR moves all the logic to a separate file (that later can become package on its own).
2022-05-20 14:32:04 +02:00
ahrtr
e7f8bf7c44 enhance the /version endpoint to add storageVersion 2022-05-06 20:29:42 +08:00
Marek Siarkowicz
f09da32f9d
Merge pull request #13655 from serathius/health
Cleanup healthcheck code after V2 removal
2022-05-06 12:08:36 +02:00
Marek Siarkowicz
600ee13ac0 server: Cover V3 health with tests 2022-05-05 09:52:14 +02:00
Marek Siarkowicz
0096d2ecdb server: Remove unused NewClientHandler 2022-05-05 09:52:13 +02:00
ahrtr
fb2eeb9027 verify consistent_index in snapshot must be equal to the snapshot index
Usually the consistent_index should be greater than the index of the
latest snapshot with suffix .snap. But for the snapshot coming from the
leader, the consistent_index should be equal to the snapshot index.
2022-05-03 20:02:47 +08:00
ahrtr
6eef7ede40 Update conssitent_index when applying fails
When clients have no permission to perform whatever operation, then
the applying may fail. We should also move consistent_index forward
in this case, otherwise the consitent_index may smaller than the
snapshot index.
2022-04-20 21:44:48 +08:00
ahrtr
484d2f01f3 set backend to cindex before recovering the lessor in applySnapshot 2022-04-12 10:36:29 +08:00
ahrtr
4033f5c2b9 move the consistentIdx and consistentTerm from Etcdserver to cindex package
Removed the fields consistentIdx and consistentTerm from struct EtcdServer,
and added applyingIndex and applyingTerm into struct consistentIndex in
package cindex. We may remove the two fields completely if we decide to
remove the OnPreCommitUnsafe, and it will depend on the performance test
result.
2022-04-07 15:16:49 +08:00
ahrtr
e155e50886 rename LockWithoutHook to LockOutsideApply and add LockInsideApply 2022-04-07 05:35:13 +08:00
ahrtr
47038593e9 set the consistent_index directly when applyV3 isn't performed 2022-04-07 05:35:13 +08:00
ahrtr
7ac995cdde enhanced authBackend to support authReadTx 2022-04-07 05:35:13 +08:00
ahrtr
bfd5170f66 add a txPostLockHook into the backend
Previously the SetConsistentIndex() is called during the apply workflow,
but it's outside the db transaction. If a commit happens between SetConsistentIndex
and the following apply workflow, and etcd crashes for whatever reason right
after the commit, then etcd commits an incomplete transaction to db.
Eventually etcd runs into the data inconsistency issue.

In this commit, we move the SetConsistentIndex into a txPostLockHook, so
it will be executed inside the transaction lock.
2022-04-07 05:35:13 +08:00
ahrtr
836bd6bc3a fix WARNING: DATA RACE issue when multiple goroutines access the backend concurrently 2022-04-03 06:13:09 +08:00
ahrtr
edce939f6e add one more field storageVersion into StatusResponse
When performing the downgrade operation, users can confirm whether each member
is ready to be downgraded using the field 'storageVersion'. If it's equal to the
'target version' in the downgrade command, then it's ready to be downgraded;
otherwise, the etcd member is still in progress of processing the db file.
2022-03-18 07:04:44 +08:00
Marek Siarkowicz
a0f26ff4ea server: Snapshot after cluster version downgrade 2022-02-21 15:48:00 +01:00
Marek Siarkowicz
8c91d60a6f server: Switch to publishV3 2022-02-14 23:06:45 +01:00
AdamKorcz
0df768d2b1 server/etcdserver: fix oss-fuzz issue 42181 2022-02-14 10:59:41 +00:00