If we promote the lessor before finish applying all
entries from the last term, we might incorrectly renew
the already revoked leases.
Here is an example:
- Term 1: revoke lease A accepted by raft
- Old leader failed, new election happened
- Term 2: promote
- Term 2: keep alive A succeed. A now has 10 seconds TTL
- Term 2: revoke lease A from Term 1 got committed and applied
- Term 2: the lease A with 10 seconds TTL is revoked
To solve this, the new leader MUST apply all entries from old term
before promote its lessor to start accept renew requests.
The new leader needs to refresh with an extened TTL to gracefully handle
the potential concurrent leader issue. Clients might still send keep alive
to old leader until the old leader itself gives up leadership at most after
an election timeout.
The raft loop would block on the applier's done channel after
persisting the raft messages; the latency could cause dropped network
messages. Instead, asynchronously notify the applier with a buffered
channel when the raft writes complete.
This fixes the problem that proposal cannot be applied.
When start the etcdserver.run loop, it expects to get the latest
existing snapshot. It should not attempt to request one because the loop
is the entity to create the snapshot.
snapshotStore is the store of snapshot, and it supports to get latest snapshot
and save incoming snapshot.
raftStorage supports to get latest snapshot when v3demo is open.
Before this PR, the timeout caused by leader election returns:
```
14:45:37 etcd2 | 2015-08-12 14:45:37.786349 E | etcdhttp: got unexpected
response error (etcdserver: request timed out)
```
After this PR:
```
15:52:54 etcd1 | 2015-08-12 15:52:54.389523 E | etcdhttp: etcdserver:
request timed out, possibly due to leader down
```
It uses heartbeat interval and election timeout to estimate the
commit timeout for internal requests.
This PR helps etcd survive under high roundtrip-time environment,
e.g., globally-deployed cluster.
Follow the simple rule in the atomic package:
"On both ARM and x86-32, it is the caller's responsibility to arrange
for 64-bit alignment of 64-bit words accessed atomically. The first word
in a global variable or in an allocated struct or slice can be relied
upon to be 64-bit aligned."
Tested on a system with /proc/cpuinfo reporting:
processor : 0
model name : ARMv7 Processor rev 1 (v7l)
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3
tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xc0d
CPU revision : 1
The behavior accelarates the happen of the first-time leader election,
so the cluster could elect its leader fast. Technically, it could
help to reduce `electionMs - heartbeatMs` wait time for the first leader election.
Main usage:
1. Quick start for the local cluster when setting a little longer
election timeout
2. Quick start for the global cluster, which sets election timeout to
its maximum 50s.
When it waits for apply to be done, it should stop the loop if it
receives stop signal.
This helps to print out panic information. Before this PR, if the panic
happens when server loop is applying entries, server loop will wait for
raft loop to stop forever.
After this PR, only cluster's interface Cluster is exposed, which makes
code much cleaner. And it avoids external packages to rely on cluster
struct in the future.
The PR extracts types.Cluster from etcdserver.Cluster. types.Cluster
is used for flag parsing and etcdserver config.
There is no need to expose etcdserver.Cluster public, which contains
lots of etcdserver internal details and methods. This is the first step
for it.