Instead of unconditionally randomizing, extend leases on promotion
if too many leases expire within the same time span. If the server
has few leases or spread out expires, there will be no extension.
Squashed previous commits for https://github.com/coreos/etcd/pull/8149.
Author: Anthony Romano <anthony.romano@coreos.com>
This is a combination of 4 commits below:
lease: randomize expiry on initial refresh call
Randomize the very first expiry on lease recovery
to prevent recovered leases from expiring all at
the same time.
Address https://github.com/coreos/etcd/issues/8096.
integration: remove lease exist checking on randomized expiry
Lease with TTL 5 should be renewed with randomization,
thus it's still possible to exist after 3 seconds.
lessor: extend leases on promote if expires will be rate limited
Instead of unconditionally randomizing, extend leases on promotion
if too many leases expire within the same time span. If the server
has few leases or spread out expires, there will be no extension.
Revert "integration: remove lease exist checking on randomized expiry"
This reverts commit 95bc33f37f7c31a4cd06287d44879a60baaee40c. The new
lease extension algorithm should pass this test.
Revoke expects the BatchTx lock to be held when holding the TxnDeleter
because it updates the lease bucket. The tests don't hold the lock so
it may race with the backend commit loop.
Fixes#7662
Demote was racing on expiry when LeaseTimeToLive called Remaining. Replace
with intrinsics since the ordering isn't important, but torn writes are
bad.
suppose a lease granting request from a follower goes through and followed by a lease look up or renewal, the leader might not apply the lease grant request locally. So the leader might not find the lease from the lease look up or renewal request which will result lease not found error. To fix this issue, we force the leader to apply its pending commited index before looking up lease.
FIX#6978
Would retry a few times before returning a not primary error that
the client should never see. Instead, use proper timeouts and
then return a request timeout error on failure.
Fixes#6922
The new leader needs to refresh with an extened TTL to gracefully handle
the potential concurrent leader issue. Clients might still send keep alive
to old leader until the old leader itself gives up leadership at most after
an election timeout.