The old leader demotes lessor and all the leases' expire time will be
updated. Instead of returning incorrect remaining TTL, we should return
errors to force client retry.
Cherry-pick: d3bb6f688b4643155b4a9924cec726bdc76a1306
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Backport of PR #16822, commits f7e488dc9262685d6624755e0d3bb0a655863248,
67f17166bf2ba337dafb8e0ea8eea5f74a990767,
and f7ff898fd6c2d6dbb54278343073aa4fa5f46a03.
Co-authored-by: Benjamin Wang <benjamin.wang@broadcom.com>
Signed-off-by: Ivan Valdes <ivan@vald.es>
This contains a slight refactoring to expose enough information
to write meaningful tests for auth applier v3.
Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
To avoid inconsistant behavior during cluster upgrade we are feature
gating persistance behind cluster version. This should ensure that
all cluster members are upgraded to v3.6 before changing behavior.
To allow backporting this fix to v3.5 we are also introducing flag
--experimental-enable-lease-checkpoint-persist that will allow for
smooth upgrade in v3.5 clusters with this feature enabled.
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Current checkpointing mechanism is buggy. New checkpoints for any lease
are scheduled only until the first leader change. Added fix for that
and a test that will check it.
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
The first bug fix is to resolve the race condition between goroutine
and channel on the same leases to be revoked. It's a classic mistake
in using Golang channel + goroutine. Please refer to
https://go.dev/doc/effective_go#channels
The second bug fix is to resolve the issue that etcd lessor may
continue to schedule checkpoint after stepping down the leader role.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Because the leases were sorted inside UnsafeLeases() the lessor mutex
ended up being locked while the whole map was sorted. This pulls the
soring outside of the lock, per feedback on
https://github.com/coreos/etcd/pull/9699
Since heap is already sorted, we can just check first element
to see if anything is expiry, rather than popping and pushing
it back. If nothing is expiry, pop operation is unnecessary.
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
This adds a heap acting as a priority queue to keep track of lease
exiprations. Previously the whole lease map had to be iterated through
each time.
The queue allows us to check only those leases which might be expired.
When the expiration changes, we add an additional entry. If we check an
entry that isn't expired, it means that the lease got extended.
If we find a entry in the heap that doesn't have a corresponding entry in
the map, we know that the lease has already been expired or revoked.
The golang/time package tracks monotonic time in each time.Time
returned by time.Now function for Go 1.9.
Use time.Time to measure whether a lease is expired and remove
previous pkg/monotime. Use zero time.Time to mean forever. No
expiration when expiry.IsZero() is true.
Instead of unconditionally randomizing, extend leases on promotion
if too many leases expire within the same time span. If the server
has few leases or spread out expires, there will be no extension.
Randomize the very first expiry on lease recovery
to prevent recovered leases from expiring all at
the same time.
Address https://github.com/coreos/etcd/issues/8096.
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
Demote was racing on expiry when LeaseTimeToLive called Remaining. Replace
with intrinsics since the ordering isn't important, but torn writes are
bad.