election runner can deadlock in atomic release().
suppose the election runner has two clients, A and B.
if A is the leader and B is a follower, B obtains the lock
for release() and waits for A to close(nextc), which signals
that the next round is ready. However, A can only close(nextc) if it
obtains the lock for release(); hence the deadlock.
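A minimal sketch of the deadlock, assuming a hypothetical runner type with a shared mutex guarding release() and a nextc channel (names are illustrative, not the actual etcd-runner code):

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    type runner struct {
        mu    sync.Mutex    // shared lock guarding release()
        nextc chan struct{} // closed by the leader to signal the next round
    }

    // release mimics the old atomic release(): the whole body holds the lock.
    func (r *runner) release(isLeader bool) {
        r.mu.Lock()
        defer r.mu.Unlock()
        if isLeader {
            close(r.nextc) // A cannot reach this while B holds the lock
        } else {
            <-r.nextc // B blocks here while still holding the lock
        }
    }

    func main() {
        r := &runner{nextc: make(chan struct{})}
        go r.release(false) // follower B enters release() first and blocks on nextc
        time.Sleep(100 * time.Millisecond)
        go r.release(true) // leader A now needs the same lock: deadlock
        time.Sleep(time.Second)
        fmt.Println("neither release() returns; the follower holds the lock the leader needs")
    }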
this pr removes the atomicity of validate() and release() in global.go
and gives the responsibility of locking to each runner.
Fixes #7891
If a follower receives a snapshot from the leader and crashes
after the snapshot has been persisted to .wal and .snap but before
xxx.snap.db is renamed to db, restarting the follower results in
loading the old db together with the new .wal and .snap.
This causes an index mismatch between the snapshot metadata index
and the consistent index from the db.
This pr forces an ordering where saving/renaming the db must
happen after the snapshot is persisted to the wal and snap files;
this guarantees the wal and snap files are at least as new as the db.
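A rough sketch of the ordering, assuming hypothetical helpers (saveToWALAndSnap, dataDir) in place of the real WAL/snapshotter plumbing; only the order of the two steps matters:

    package snaporder

    import (
        "fmt"
        "os"
        "path/filepath"
    )

    // saveToWALAndSnap stands in for persisting the snapshot record to the WAL
    // and the .snap file; dataDir stands in for the member's snapshot directory.
    func saveToWALAndSnap(index uint64) error { return nil }
    func dataDir() string                     { return "member/snap" }

    // applySnapshotDB persists the WAL/.snap records first and only then renames
    // <index>.snap.db over the live db, so a crash in between leaves WAL/.snap
    // that are at least as new as the db.
    func applySnapshotDB(index uint64) error {
        if err := saveToWALAndSnap(index); err != nil {
            return err
        }
        tmp := filepath.Join(dataDir(), fmt.Sprintf("%016x.snap.db", index))
        return os.Rename(tmp, filepath.Join(dataDir(), "db"))
    }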
On server restart, the etcd server checks whether snap index > db consistent index.
If so, it attempts to load xxx.snap.db, where xxx = snap index,
if such a file exists, and panics otherwise.
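A hedged sketch of that startup check, with illustrative names rather than the actual etcdserver code:

    package snaporder

    import (
        "fmt"
        "log"
        "os"
        "path/filepath"
    )

    // recoverBackendPath returns the db file the server should load. If the
    // snapshot index is ahead of the db's consistent index, the old db is stale
    // and the matching <index>.snap.db must exist; otherwise the server panics
    // rather than serving inconsistent data.
    func recoverBackendPath(snapIndex, dbConsistentIndex uint64, snapDir string) string {
        if snapIndex <= dbConsistentIndex {
            return filepath.Join(snapDir, "db") // db already reflects the snapshot
        }
        dbFile := filepath.Join(snapDir, fmt.Sprintf("%016x.snap.db", snapIndex))
        if _, err := os.Stat(dbFile); err != nil {
            log.Panicf("snap index %d > db consistent index %d, but %s is missing: %v",
                snapIndex, dbConsistentIndex, dbFile, err)
        }
        return dbFile // caller renames/loads this as the new backend
    }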
Fixes #7628
Trying to decouple the v2 client from the SRV code. It can't move
into discovery/ since that creates a circular dependency, so
give up and move all the SRV code into a new package.
This test verifies that adding a node does not cause the leader to step
down until at least one full ElectionTick cycle elapses.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
The 500ms keepalive delay on the proxy side causes the client to sometimes
send a second keepalive, since it waits more than 500ms for the first response.
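A rough timing illustration, not the proxy or lease client code; it assumes, for illustration only, that the client retransmits when no response arrives within roughly the same 500ms window:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        responses := make(chan struct{}, 1)

        // Hypothetical proxy: forwards the keepalive response ~500ms after the request.
        go func() {
            time.Sleep(500 * time.Millisecond)
            responses <- struct{}{}
        }()

        // Hypothetical client: waits for the response, but not forever.
        select {
        case <-responses:
            fmt.Println("response arrived in time; one keepalive per interval")
        case <-time.After(500 * time.Millisecond):
            fmt.Println("no response within the window; client sends a second keepalive")
        }
    }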
Fixes #7658
Fix https://github.com/coreos/etcd/issues/7865.
It is also possible to have a mismatched key file
while renaming directories.
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
Connection pausing added another exit condition in the listener
path, causing the bridge to leak connections instead of closing
them when signalled to close. This also adds some extra Close
paranoia.
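A stripped-down sketch of the intended shape, assuming a hypothetical bridge type with a stop channel and a paused flag (not the actual test bridge): every path out of the accept loop, including the pause path, must close the connections it touched.

    package bridgesketch

    import (
        "net"
        "sync"
    )

    type bridge struct {
        l     net.Listener
        stopc chan struct{}

        mu     sync.Mutex
        paused bool
        conns  map[net.Conn]struct{}
    }

    func (b *bridge) serve() {
        defer b.closeConns() // Close paranoia: drop everything on any exit path
        for {
            conn, err := b.l.Accept()
            if err != nil {
                return
            }
            select {
            case <-b.stopc:
                conn.Close() // signalled to close: do not leak the new conn
                return
            default:
            }
            b.mu.Lock()
            if b.paused {
                b.mu.Unlock()
                conn.Close() // the pause path must not keep untracked conns alive
                continue
            }
            b.conns[conn] = struct{}{}
            b.mu.Unlock()
            // ... proxy conn to the backend ...
        }
    }

    func (b *bridge) closeConns() {
        b.mu.Lock()
        defer b.mu.Unlock()
        for c := range b.conns {
            c.Close()
        }
        b.conns = map[net.Conn]struct{}{}
    }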
Fixes #7823
If the balancer update notification loop starts with a downed
connection and endpoints are switched while the old connection is up,
the balancer can potentially wait forever for an up connection without
refreshing the connections to reflect the current endpoints.
Instead, fetch upc/downc together, only caring about a single transition,
either from down->up or up->down, for each iteration.
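A hedged sketch of the loop shape, with an illustrative balancer type in place of the real clientv3 fields; the point is that upc and downc come from the same locked snapshot and each iteration handles exactly one transition.

    package balancersketch

    import "sync"

    // sketchBalancer is illustrative, not the actual clientv3 balancer: upc is
    // closed when the pinned connection comes up, downc when it goes down, and
    // an endpoint switch replaces both channels under mu.
    type sketchBalancer struct {
        mu    sync.RWMutex
        upc   chan struct{}
        downc chan struct{}
        stopc chan struct{}
    }

    // updateNotifyLoop reads upc and downc from one locked snapshot, then waits
    // for a single down->up or up->down transition before re-snapshotting.
    // Reading the channels separately could pair an old downc with a new upc and
    // leave the loop blocked on a channel from a connection that no longer exists.
    func (b *sketchBalancer) updateNotifyLoop() {
        for {
            b.mu.RLock()
            upc, downc := b.upc, b.downc
            b.mu.RUnlock()

            select {
            case <-upc: // currently up: wait for this connection to go down
                select {
                case <-downc:
                case <-b.stopc:
                    return
                }
            case <-downc: // currently down: wait for a connection to come up
                select {
                case <-upc:
                case <-b.stopc:
                    return
                }
            case <-b.stopc:
                return
            }
        }
    }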
A simple way to reproduce the failures: add time.Sleep(time.Second) to the
beginning of the update notification loop.