1. rename confChangeCh to raftAdvancedC
2. rename waitApply to confChanged
3. add comments and test assertion
Signed-off-by: Chao Chen <chaochn@amazon.com>
The huge (100k+) value was justified when storev2 was being dumped completely with every snapshot.
With storev2 being decomissioned we can checkpoint more frequently for faster recovery.
Signed-off-by: James Blair <mail@jamesblair.net>
When promoting a learner, we need to wait until the leader's applied ID
catches up to the commitId. Afterwards, check whether the learner ID
exist or not, and return `membership.ErrIDNotFound` directly in the API
if the member ID not found, to avoid the request being unnecessarily
delivered to raft.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The alarm list is the only exception that doesn't move consistent_index
forward. The reproduction steps are as simple as,
```
etcd --snapshot-count=5 &
for i in {1..6}; do etcdctl alarm list; done
kill -9 <etcd_pid>
etcd
```
Signed-off-by: Benjamin Wang <wachao@vmware.com>
This PR:
- moves wrapping of appliers (due to Alarms) out of server.go into uber_applier.go
- clearly devides the application logic into: chain of:
a) 'WrapApply' (generic logic across all the methods)
b) dispatcher (translation of Apply into specific method like 'Put')
c) chain of 'wrappers' of the specific methods (like Put).
- when we do recovery (restore from snapshot) we create new instance of appliers.
The purpose is to make sure we control all the depencies of the apply process, i.e.
we can supply e.g. special instance of 'backend' to the application logic.
The PR removes calls to applierV3base logic from server.go that is NOT part of 'application'.
The original idea was that read-only transaction and Range call shared logic with Apply,
so they can call appliers directly (but bypassing all 'corrupt', 'quota' and 'auth' wrappers).
This PR moves all the logic to a separate file (that later can become package on its own).
Usually the consistent_index should be greater than the index of the
latest snapshot with suffix .snap. But for the snapshot coming from the
leader, the consistent_index should be equal to the snapshot index.
When clients have no permission to perform whatever operation, then
the applying may fail. We should also move consistent_index forward
in this case, otherwise the consitent_index may smaller than the
snapshot index.
Removed the fields consistentIdx and consistentTerm from struct EtcdServer,
and added applyingIndex and applyingTerm into struct consistentIndex in
package cindex. We may remove the two fields completely if we decide to
remove the OnPreCommitUnsafe, and it will depend on the performance test
result.
Previously the SetConsistentIndex() is called during the apply workflow,
but it's outside the db transaction. If a commit happens between SetConsistentIndex
and the following apply workflow, and etcd crashes for whatever reason right
after the commit, then etcd commits an incomplete transaction to db.
Eventually etcd runs into the data inconsistency issue.
In this commit, we move the SetConsistentIndex into a txPostLockHook, so
it will be executed inside the transaction lock.
When performing the downgrade operation, users can confirm whether each member
is ready to be downgraded using the field 'storageVersion'. If it's equal to the
'target version' in the downgrade command, then it's ready to be downgraded;
otherwise, the etcd member is still in progress of processing the db file.