This commit fixes the issue of creating the member dir before validating
the configuration. An existing member dir indicates that the local etcd
process is a valid etcd member, so we should only create the member dir
after configuration validation, join validation, or discovery validation
has finished.
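For illustration, a minimal Go sketch of the intended ordering; the config type, validate, and startEtcd names are made up for this example, not the actual etcdserver code:
```go
package main

import (
	"errors"
	"os"
	"path/filepath"
)

type config struct {
	DataDir   string
	ClusterOK bool // stand-in for config/join/discovery validation results
}

func (c config) validate() error {
	if !c.ClusterOK {
		return errors.New("invalid cluster configuration")
	}
	return nil
}

// startEtcd creates the member dir only after every validation step passes,
// so an existing member dir reliably marks a valid local member.
func startEtcd(c config) error {
	if err := c.validate(); err != nil {
		return err
	}
	return os.MkdirAll(filepath.Join(c.DataDir, "member"), 0700)
}

func main() {
	_ = startEtcd(config{DataDir: "data", ClusterOK: true})
}
```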
The raft loop would block on the applier's done channel after
persisting the raft messages; the latency could cause dropped network
messages. Instead, asynchronously notify the applier with a buffered
channel when the raft writes complete.
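A rough Go sketch of the buffered-channel hand-off; the channel and variable names are illustrative, not the real etcdserver code:
```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	// A buffer of 1 lets the raft loop hand off the notification without
	// waiting for the applier to finish the previous batch.
	notifyC := make(chan uint64, 1)

	var wg sync.WaitGroup
	wg.Add(1)

	// Applier goroutine: applies entries after being told they are durable.
	go func() {
		defer wg.Done()
		for idx := range notifyC {
			fmt.Println("applying entries up to index", idx)
		}
	}()

	// Raft loop: persist to the WAL, then notify and immediately move on to
	// sending network messages instead of waiting for the apply to finish.
	for idx := uint64(1); idx <= 3; idx++ {
		// ... persist hard state and entries here ...
		notifyC <- idx
	}
	close(notifyC)
	wg.Wait()
}
```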
raft's applyc writes block on the server loop's database IO since
the next applyc read must wait on the db operation to finish.
Instead, stream applyc to a run queue outside the server loop.
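A rough sketch of the run-queue idea, assuming a simple buffered channel as the queue (all names are illustrative):
```go
package main

import (
	"fmt"
	"time"
)

func main() {
	applyc := make(chan int)     // raft loop -> server
	runq := make(chan int, 1024) // run queue decouples applyc from db IO

	// Drain applyc immediately so the raft side never waits on the database.
	go func() {
		for a := range applyc {
			runq <- a
		}
		close(runq)
	}()

	// Worker applies entries (the slow database IO) outside the server loop.
	done := make(chan struct{})
	go func() {
		defer close(done)
		for a := range runq {
			time.Sleep(10 * time.Millisecond) // stand-in for db IO
			fmt.Println("applied", a)
		}
	}()

	// The raft loop's sends return quickly even while db IO is in flight.
	for i := 0; i < 5; i++ {
		applyc <- i
	}
	close(applyc)
	<-done
}
```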
This moves the code that creates the listener and roundTripper for raft
communication into one place, and uses explicit functions to build them.
This prevents possible development errors in the future.
This pairs with remote timeout listeners.
etcd uses a timeout listener and times out accepted connections when there
is no activity, so idle connections may time out easily. Because the
timeout transport doesn't reuse connections, it never tries to use a
connection that has already timed out.
This fixes the problem that etcd fails to get the version of its peers.
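A sketch of what such a non-reusing timeout transport could look like; this is a simplification, not rafthttp's actual code, and newTimeoutTransport is a made-up helper:
```go
package main

import (
	"net"
	"net/http"
	"time"
)

// newTimeoutTransport disables keep-alives so each request dials a fresh
// connection, never reusing one the remote timeout listener may have closed.
func newTimeoutTransport(dialTimeout, rwTimeout time.Duration) *http.Transport {
	return &http.Transport{
		DisableKeepAlives: true,
		Dial: func(network, addr string) (net.Conn, error) {
			conn, err := net.DialTimeout(network, addr, dialTimeout)
			if err != nil {
				return nil, err
			}
			// A fixed deadline here is a simplification of per-operation
			// read/write timeouts.
			if err := conn.SetDeadline(time.Now().Add(rwTimeout)); err != nil {
				conn.Close()
				return nil, err
			}
			return conn, nil
		},
	}
}

func main() {
	client := &http.Client{Transport: newTimeoutTransport(time.Second, 5*time.Second)}
	_ = client // e.g. used for fetching peer versions
}
```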
When a slow follower receives the snapshot sent from the leader, it
should rename the snapshot file to the default KV file path and restore
the KV snapshot.
I have tested this manually and it works well.
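A simplified sketch of the rename-then-restore step; restoreFromSnapshot and openKV are hypothetical stand-ins for the real backend calls:
```go
package main

import (
	"log"
	"os"
)

// restoreFromSnapshot moves the received snapshot onto the default KV file
// path and then restores the KV store from it. openKV is a stand-in for the
// real backend restore call.
func restoreFromSnapshot(receivedSnapPath, defaultKVPath string) error {
	if err := os.Rename(receivedSnapPath, defaultKVPath); err != nil {
		return err
	}
	return openKV(defaultKVPath)
}

func openKV(path string) error {
	_, err := os.Stat(path)
	return err
}

func main() {
	// Demo: fake a received snapshot file, then "restore" it.
	if err := os.WriteFile("db.tmp", []byte("snapshot-bytes"), 0600); err != nil {
		log.Fatal(err)
	}
	if err := restoreFromSnapshot("db.tmp", "db"); err != nil {
		log.Fatal(err)
	}
}
```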
When the snapshot store requests a raft snapshot from the etcdserver apply
loop, it may block on the channel for some time, or wait for the KV store
to snapshot. This is unexpected because the raft state machine should never
be blocked. Even worse, the blocking may lead to a deadlock:
1. the raft state machine waits on getting a snapshot from raft memory storage
2. raft memory storage waits for the snapshot store to get the snapshot
3. the snapshot store requests a raft snapshot from the apply loop
4. the apply loop is applying entries, and waits for the raftNode loop to
finish sending messages
5. the raftNode loop waits for the peer loop in Transport to send out messages
6. the peer loop in Transport waits for the raft state machine to process a message
Fix it by changing getSnap to create the snapshot asynchronously.
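A rough sketch of the asynchronous getSnap idea, with a stand-in snapshotStore type; the fields and error value below are assumptions for illustration, not etcd's actual code:
```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errSnapshotTemporarilyUnavailable = errors.New("snapshot temporarily unavailable")

type snapshotStore struct {
	mu   sync.Mutex
	snap *string       // stand-in for a raftpb.Snapshot
	reqC chan struct{} // asks the apply loop to create a snapshot
}

// getSnap never blocks: it triggers snapshot creation and reports the
// snapshot as temporarily unavailable until the apply loop saves one.
func (s *snapshotStore) getSnap() (*string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.snap != nil {
		return s.snap, nil
	}
	select {
	case s.reqC <- struct{}{}:
	default: // a request is already pending
	}
	return nil, errSnapshotTemporarilyUnavailable
}

// saveSnap is called by the apply loop once the snapshot has been created.
func (s *snapshotStore) saveSnap(snap string) {
	s.mu.Lock()
	s.snap = &snap
	s.mu.Unlock()
}

func main() {
	ss := &snapshotStore{reqC: make(chan struct{}, 1)}
	if _, err := ss.getSnap(); err != nil {
		fmt.Println("first call:", err)
	}
	ss.saveSnap("snapshot-at-index-42")
	if snap, err := ss.getSnap(); err == nil {
		fmt.Println("second call:", *snap)
	}
}
```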
Use snapshotSender to send the v3 snapshot message. It puts the raft
snapshot message and the v3 snapshot into the request body, then sends it
to the target peer. When it receives http.StatusNoContent, it knows the
message has been received and processed successfully.
As the receiver, snapHandler saves the v3 snapshot, then processes the raft
snapshot message and responds with http.StatusNoContent.
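A stripped-down sketch of this sender/receiver handshake; the URL path and helper names are illustrative, not the actual rafthttp API:
```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// snapHandler reads the snapshot body, would save the v3 snapshot and hand
// the raft message to the state machine, and acknowledges with 204 No Content.
func snapHandler(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "failed to read snapshot", http.StatusBadRequest)
		return
	}
	_ = body // ... save v3 snapshot, then process the raft snapshot message ...
	w.WriteHeader(http.StatusNoContent)
}

// sendSnap posts the merged raft message + v3 snapshot and treats anything
// other than 204 as a failure.
func sendSnap(url string, merged []byte) error {
	resp, err := http.Post(url, "application/octet-stream", bytes.NewReader(merged))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	srv := httptest.NewServer(http.HandlerFunc(snapHandler))
	defer srv.Close()
	fmt.Println("send error:", sendSnap(srv.URL+"/raft/snapshot", []byte("raft-msg+v3-snapshot")))
}
```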
rafthttp has different connection requirements for the transports it uses
for different purposes, and this is hard to achieve when it is given a
single http.RoundTripper. Now we pass into pkg the data needed to build
transports, and let rafthttp build its own transports.
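A sketch of the idea, assuming a hypothetical transportInfo struct carrying the dial and TLS parameters that a package could use to build differently tuned transports:
```go
package main

import (
	"crypto/tls"
	"net"
	"net/http"
	"time"
)

// transportInfo carries the raw parameters a package needs to build its own
// transports instead of receiving a single prebuilt http.RoundTripper.
type transportInfo struct {
	DialTimeout time.Duration
	TLSConfig   *tls.Config
}

func (ti transportInfo) newTransport(disableKeepAlives bool) *http.Transport {
	return &http.Transport{
		TLSClientConfig:   ti.TLSConfig,
		DisableKeepAlives: disableKeepAlives,
		Dial: func(network, addr string) (net.Conn, error) {
			return net.DialTimeout(network, addr, ti.DialTimeout)
		},
	}
}

func main() {
	ti := transportInfo{DialTimeout: time.Second}
	streamTr := ti.newTransport(false)  // long-lived stream connections
	pipelineTr := ti.newTransport(true) // short one-shot message sends
	_, _ = streamTr, pipelineTr
}
```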
Before this PR, it always printed nil because the cluster info had not yet
been recovered when it was printed:
```
2015-10-02 14:00:24.353631 I | etcdserver: loaded cluster information
from store: <nil>
```
snapshotStore is the store for snapshots; it supports getting the latest
snapshot and saving incoming snapshots.
raftStorage supports getting the latest snapshot when v3demo is enabled.
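Building on the snapshotStore sketch above, a minimal illustration of how raftStorage could serve Snapshot() from the snapshot store when v3demo is on; the types and fields are stand-ins, not etcd's definitions:
```go
package main

import (
	"errors"
	"fmt"
)

type snapshot struct{ index uint64 } // stand-in for raftpb.Snapshot

// memoryStorage is a stand-in for raft's in-memory storage.
type memoryStorage struct{ snap snapshot }

// snapStore is a stand-in for the snapshotStore holding the latest v3 snapshot.
type snapStore struct{ latest *snapshot }

// raftStorage serves Snapshot() from the snapshot store when v3demo is
// enabled, and from raft's in-memory storage otherwise.
type raftStorage struct {
	*memoryStorage
	v3demo    bool
	snapStore *snapStore
}

func (rs *raftStorage) Snapshot() (snapshot, error) {
	if rs.v3demo {
		if rs.snapStore.latest == nil {
			return snapshot{}, errors.New("snapshot temporarily unavailable")
		}
		return *rs.snapStore.latest, nil
	}
	return rs.memoryStorage.snap, nil
}

func main() {
	rs := &raftStorage{
		memoryStorage: &memoryStorage{snap: snapshot{index: 3}},
		v3demo:        true,
		snapStore:     &snapStore{latest: &snapshot{index: 7}},
	}
	fmt.Println(rs.Snapshot())
}
```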
Before this PR, the log is
```
2015/09/1 13:18:31 etcdmain: client: etcd cluster is unavailable or
misconfigured
```
It is quite hard for people to understand what is actually happening.
Now we print out the exact reason for the failure and explain how to
handle it.
Like commit 6974fc63ed87, this commit makes etcdserver forbid removing a
started member if quorum cannot be preserved after the reconfiguration,
when the -strict-reconfig-check option is passed to etcd. Such a removal
can cause a deadlock if unstarted members have wrong peer URLs.
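A simplified sketch of such a strict reconfiguration check (not the actual etcdserver implementation): reject the removal if the remaining started members could not form a quorum.
```go
package main

import (
	"errors"
	"fmt"
)

type member struct {
	id      uint64
	started bool // has connected / has reachable peer URLs
}

// canRemoveMember rejects a removal that would leave fewer started members
// than the quorum of the shrunken cluster.
func canRemoveMember(members []member, removeID uint64) error {
	started, total := 0, 0
	for _, m := range members {
		if m.id == removeID {
			continue
		}
		total++
		if m.started {
			started++
		}
	}
	quorum := total/2 + 1
	if started < quorum {
		return errors.New("removal would break quorum; rejected by strict reconfig check")
	}
	return nil
}

func main() {
	cluster := []member{{1, true}, {2, true}, {3, false}}
	// Removing started member 2 leaves 1 started member out of 2, below quorum.
	fmt.Println(canRemoveMember(cluster, 2))
}
```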