We start to resolve host into tcp addrs since we generate
tcp based initial-cluster during srv discovery. However it
creates problems around tls and cluster verification. The
srv discovery only needs to use resolved the tcp addr to
find the local node. It does not have to resolve everything
and use the resolved addrs.
This fixes#2488 and #2226
If the process dies during wal.cut(), it might leave a corrupted wal
file. This commit solves the problem by creating a temp wal file first,
then atomically rename it to a wal file when we are sure it is vaild.
It exposes the metrics of file descriptor limit and file descriptor used.
Moreover, it prints out warning when more than 80% of fd limit has been used.
```
2015/04/08 01:26:19 etcdserver: 80% of the file descriptor limit is open
[open = 969, limit = 1024]
```
SaveSnap and Save are called in separate goroutines now. Allow at most
one WAL function being called at one time to protect internal fields and
guarantee execution order.
Or one possible bug is that the new cut file is started with snapshot
entry instead of crc entry.
The patch decreases the allocs when sending one AppEntry in msgappv2
stream from 30 to 9. This helps reduce CPU load when etcd is under
high write load.
msgappv2 stream is used to send all MsgApp, and replaces the
functionality of msgapp stream. Compared to v1, it has several
advantanges:
1. The output message is exactly the same with the input one, which
cannot be done in v1.
2. It uses one connection to stream persistently, which prevents message
reorder and saves the time to request stream.
3. It transmits 10 addiontional bytes in the procedure of committing one
proposal, which is trivia for idle time.
4. It transmits less bytes when committing mutliple proposals or keep
committing proposals.
etcd now compact raft storage asynchronously, and append entry to raft
storage may happen at the same time. Add the lock to fix the bug that
the entries saved in storage may be organized in a wrong way.
Stop raftNode goroutine before stopping server goroutine, so
server.Stop does stop all underlying stuffs elegantly now. This fixes
the problem that previous-round lock on WAL may not be released when
etcd is restarted.
Upgrade test listens on a fixed port, which may fail with 'bind address
already in use' if the port was just used to send tcp sockets.
The commit makes it listen on a random available port to avoid this.