Do not restart the killed member immediately.
The member will advance its election timeout after restart
So it will have a better chance to become the leader again.
When raft broadcasts a Grant to all nodes, all nodes must
agree on the same lease ID. Otherwise, attaching a key to
a lease will fail since the lease ID is node-dependent.
rafthttp has different requirements for connections created by the
transport for different usage, and this is hard to achieve when giving
one http.RoundTripper. Pass into pkg the data needed to build transport
now, and let rafthttp build its own transports.
It uses heartbeat interval and election timeout to estimate the
expected request timeout.
This PR helps etcd survive under high roundtrip-time environment,
e.g., globally-deployed cluster.
The bug happens when restarted member wants to listen on its original
port, but finds out that it has been occupied by some client.
Use well-known port instead of ephemeral port, so client cannot occupy
the listen port anymore.
There exist the possiblity to update attributes of removed member in
reasonable workflow:
1. start member A
2. leader receives the proposal to remove member A
2. member A sends the proposal of update its attribute to the leader
3. leader commits the two proposals
So etcdserver should allow to update attributes of removed member.
After this PR, only cluster's interface Cluster is exposed, which makes
code much cleaner. And it avoids external packages to rely on cluster
struct in the future.
The PR extracts types.Cluster from etcdserver.Cluster. types.Cluster
is used for flag parsing and etcdserver config.
There is no need to expose etcdserver.Cluster public, which contains
lots of etcdserver internal details and methods. This is the first step
for it.
1. Persist the cluster version change through raft. When the member is restarted, it can recover
the previous known decided cluster version.
2. When there is a new leader, it is forced to do a version checking immediately. This helps to
update the first cluster version fast.
Dial timeout is set shorter because
1. etcd is supposed to work in good environment, and the new value is long
enough
2. shorter dial timeout makes dial fail faster, which is good for
performance