The size of MsgSnap may be very big, e.g., 1G.
If its size is big and general streaming is used to send it, it may block
the following messages for several ten seconds, which interrupts the
heartbeat heavily.
Only use pipeline to send MsgSnap.
Do not aggressively compact raft log entries. After a snapshot,
etcd server can compact the raft log upto snapshot index. etcd server
compacts to an index smaller than snapshot to keep some entries in memory.
The leader can still read out the in memory entries to send to a slightly
slow follower. If all the entries are compacted, the leader will send the
whole snapshot or read entries from disk if possible.
Encoding store into json snapshot has quite high CPU cost. And it
will block for a while. This commit makes the encoding process non-
blocking by running it in another go-routine.
raft relies on the link layer to report the status of the sent snapshot.
If the snapshot is still sending, the replication to that remote peer will
be paused. If the snapshot finish sending, the replication will begin
optimistically after electionTimeout. If the snapshot fails, raft will
try to resend it.
New rafthttp uses /raft/stream/msgapp for MsgApp stream, but v2.0 rafthttp
cannot understand it. Use the old endpoint /raft/stream instead for backward
compatibility, and plan to move to new endpoint in the version after the
next one.
Dial timeout is set shorter because
1. etcd is supposed to work in good environment, and the new value is long
enough
2. shorter dial timeout makes dial fail faster, which is good for
performance
WAL should control the cut logic itself. We want to do falloc to
per allocate the space for a segmented wal file at the beginning
and cut it when it size reaches the limit.