Usually the consistent_index should be greater than the index of the
latest snapshot (the file with the .snap suffix). But for a snapshot
received from the leader, the consistent_index should be equal to the
snapshot index.
When a client has no permission to perform an operation, applying the
entry may fail. We should still move the consistent_index forward in
that case; otherwise the consistent_index may fall behind the snapshot
index.
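As a rough illustration of the intended flow (the types and functions
below are simplified stand-ins, not etcd's actual cindex API):

    package main

    import (
        "errors"
        "fmt"
    )

    // consistentIndex is a simplified stand-in for etcd's cindex bookkeeping.
    type consistentIndex struct {
        applyingIndex uint64
        applyingTerm  uint64
    }

    type entry struct {
        index, term uint64
        authorized  bool
    }

    var errPermissionDenied = errors.New("permission denied")

    // applyEntry records the entry's index/term before evaluating the request,
    // so even a request that fails the permission check still moves the
    // consistent index forward and it never falls behind the snapshot index.
    func applyEntry(ci *consistentIndex, e entry) error {
        ci.applyingIndex, ci.applyingTerm = e.index, e.term
        if !e.authorized {
            return errPermissionDenied // rejected, but the index still advanced
        }
        // ... perform the actual mutation here ...
        return nil
    }

    func main() {
        ci := &consistentIndex{}
        _ = applyEntry(ci, entry{index: 10, term: 2, authorized: false})
        fmt.Println(ci.applyingIndex) // 10
    }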
Removed the fields consistentIdx and consistentTerm from struct EtcdServer,
and added applyingIndex and applyingTerm to struct consistentIndex in
package cindex. We may remove the two fields completely if we decide to
remove OnPreCommitUnsafe; that decision will depend on the performance
test results.
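The shape of the change in package cindex looks roughly like the sketch
below; only the field names applyingIndex and applyingTerm come from this
commit, the rest is illustrative:

    package cindex

    // consistentIndex tracks what has been persisted and what is being applied.
    type consistentIndex struct {
        // consistentIndex/term mirror the values persisted in the backend.
        consistentIndex uint64
        term            uint64

        // applyingIndex/applyingTerm track the raft entry currently being
        // applied; they replace the consistentIdx/consistentTerm fields that
        // previously lived on EtcdServer.
        applyingIndex uint64
        applyingTerm  uint64
    }

    // SetConsistentApplyingIndex records the entry being applied (illustrative name).
    func (ci *consistentIndex) SetConsistentApplyingIndex(v, term uint64) {
        ci.applyingIndex, ci.applyingTerm = v, term
    }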
Previously, SetConsistentIndex() was called during the apply workflow,
but outside the db transaction. If a commit happened between
SetConsistentIndex and the rest of the apply workflow, and etcd crashed
for whatever reason right after that commit, etcd would have committed an
incomplete transaction to the db and eventually run into the data
inconsistency issue.
In this commit, we move SetConsistentIndex into a txPostLockHook, so it
is executed inside the transaction lock.
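A minimal sketch of the idea (the backend type and hook wiring below are
simplified, not etcd's actual mvcc/backend API):

    package main

    import (
        "fmt"
        "sync"
    )

    // batchTx is a simplified write transaction with a post-lock hook.
    type batchTx struct {
        mu           sync.Mutex
        postLockHook func()
    }

    // LockInsideApply acquires the transaction lock and then runs the hook, so
    // whatever the hook writes is part of the same transaction as the apply.
    func (t *batchTx) LockInsideApply() {
        t.mu.Lock()
        if t.postLockHook != nil {
            t.postLockHook()
        }
    }

    func (t *batchTx) Unlock() { t.mu.Unlock() }

    func main() {
        var consistentIndex uint64
        tx := &batchTx{}
        // The hook persists the applying index inside the open transaction, so a
        // periodic commit can no longer separate the data from the index update.
        tx.postLockHook = func() { consistentIndex = 42 }

        tx.LockInsideApply()
        // ... apply the request within the same locked transaction ...
        tx.Unlock()
        fmt.Println(consistentIndex) // 42
    }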
When performing the downgrade operation, users can confirm whether each member
is ready to be downgraded using the field 'storageVersion'. If it's equal to the
'target version' in the downgrade command, the member is ready to be downgraded;
otherwise, the etcd member is still processing the db file.
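For example, a client-side readiness check could be as simple as comparing
the two versions (the helper below is hypothetical; only the
'storageVersion' field name comes from this change):

    package main

    import "fmt"

    // readyToDowngrade reports whether a member has finished migrating its db
    // file, i.e. its reported storageVersion already equals the downgrade target.
    func readyToDowngrade(storageVersion, targetVersion string) bool {
        return storageVersion == targetVersion
    }

    func main() {
        fmt.Println(readyToDowngrade("3.6", "3.5")) // false: still processing the db file
        fmt.Println(readyToDowngrade("3.5", "3.5")) // true: ready to be downgraded
    }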
To avoid inconsistent behavior during a cluster upgrade, we are feature
gating lease checkpoint persistence behind the cluster version. This
ensures that all cluster members are upgraded to v3.6 before the behavior
changes.
To allow backporting this fix to v3.5, we are also introducing the flag
--experimental-enable-lease-checkpoint-persist, which allows a smooth
upgrade for v3.5 clusters with this feature enabled.
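The gating condition is roughly the following (function and parameter
names are illustrative; only the flag
--experimental-enable-lease-checkpoint-persist comes from this change):

    package main

    import (
        "fmt"

        "github.com/coreos/go-semver/semver"
    )

    var v3_6 = semver.Version{Major: 3, Minor: 6}

    // shouldPersistCheckpoints returns true when lease checkpoints may be
    // persisted: either the whole cluster has reached v3.6, or the operator
    // opted in on a v3.5 cluster via the experimental flag.
    func shouldPersistCheckpoints(clusterVersion *semver.Version, persistFlag bool) bool {
        if persistFlag {
            return true
        }
        return clusterVersion != nil && !clusterVersion.LessThan(v3_6)
    }

    func main() {
        fmt.Println(shouldPersistCheckpoints(&semver.Version{Major: 3, Minor: 5}, false)) // false
        fmt.Println(shouldPersistCheckpoints(&semver.Version{Major: 3, Minor: 6}, false)) // true
        fmt.Println(shouldPersistCheckpoints(&semver.Version{Major: 3, Minor: 5}, true))  // true
    }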
The storage version should follow the cluster version. During upgrades
this should be immediate, since the storage version can always be raised
because storage is backward compatible. During downgrades it will be
delayed and will require time for incompatible changes to be snapshotted.
As the storage version change can happen long after the cluster has been
running, we need to add a step during bootstrap that validates whether the
loaded data can be understood by the migrator.
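A rough sketch of such a bootstrap check, assuming the migrator advertises
the storage versions it understands (the names below are illustrative, not
etcd's actual schema package):

    package main

    import (
        "fmt"

        "github.com/coreos/go-semver/semver"
    )

    // supportedStorageVersions is the set of storage versions the migrator can
    // interpret; in practice this would be derived from the schema definitions.
    var supportedStorageVersions = map[string]bool{
        "3.5.0": true,
        "3.6.0": true,
    }

    // validateStorageVersion fails bootstrap early if the data on disk was
    // written with a storage version the migrator does not understand.
    func validateStorageVersion(detected *semver.Version) error {
        if detected == nil {
            // Older data files may not record a storage version; accept them.
            return nil
        }
        if !supportedStorageVersions[detected.String()] {
            return fmt.Errorf("unsupported storage version %s", detected)
        }
        return nil
    }

    func main() {
        fmt.Println(validateStorageVersion(&semver.Version{Major: 3, Minor: 6})) // <nil>
        fmt.Println(validateStorageVersion(&semver.Version{Major: 3, Minor: 7})) // unsupported storage version 3.7.0
    }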