As @jonboulle pointed out at
https://github.com/coreos/etcd/pull/4070/files#r48441847:
> math/rand is unrelated to crypto/rand; the latter reads from /dev/urandom and
> is relying on the kernel's PRNG. Just remove the seed entirely.
The current etcd client library chooses a default destination node from all
members of a cluster at random. However, write requests and consistent read
requests must be handled by the leader node due to the nature of the Raft
algorithm. If the chosen node is a follower, the request causes additional
network traffic because it is forwarded from the follower to the leader.
Mainly to reduce this forwarding traffic, this commit adds an endpoint
selection mechanism to the client library, configurable via
client.Config.SelectionMode.
Currently, two modes are provided:
- EndpointSelectionRandom: default, same as the existing behavior (pick
a node at random)
- EndpointSelectionPrioritizeLeader: prioritize the leader node, for the
purpose described above (see the sketch after this list)
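A minimal sketch of how the new mode might be configured, assuming the
client.Config.SelectionMode field and the two constants named above; the
endpoint URLs and key are placeholders:

```go
package main

import (
	"context"
	"log"

	"github.com/coreos/etcd/client"
)

func main() {
	cfg := client.Config{
		Endpoints: []string{
			"http://10.0.1.10:2379",
			"http://10.0.1.11:2379",
			"http://10.0.1.12:2379",
		},
		// Send requests to the leader first to avoid follower-to-leader forwarding.
		SelectionMode: client.EndpointSelectionPrioritizeLeader,
	}
	c, err := client.New(cfg)
	if err != nil {
		log.Fatal(err)
	}
	kapi := client.NewKeysAPI(c)
	// Writes go straight to the leader instead of being forwarded by a follower.
	if _, err := kapi.Set(context.Background(), "/foo", "bar", nil); err != nil {
		log.Fatal(err)
	}
}
```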
I evaluated the effectiveness of EndpointSelectionPrioritizeLeader with 4
AWS t1.micro instances (3 nodes for the etcd cluster and 1 node for the
etcd client). The client ran this simple benchmark
(https://github.com/mitake/etcd-things/tree/master/prioritize-leader-bench),
which just writes 10000 keys. With SelectionMode == EndpointSelectionRandom
(the default), the benchmark took 1 min 32.102 sec to finish. With
SelectionMode == EndpointSelectionPrioritizeLeader, it took 1 min 4.760 sec.
Reports depended on writing all results to a large buffered channel and
reading from it synchronously. Requests were buffered the same way, which
can consume significant memory for large request strings. Instead, have
reports stream in results as they are produced and print once the results
channel closes.
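A sketch of the streaming pattern under these assumptions (result and
printReport are illustrative names, not the tool's actual types): a single
consumer drains the results channel as entries arrive and only prints the
summary after the producer closes the channel.

```go
package main

import (
	"fmt"
	"time"
)

// result is a hypothetical per-request outcome.
type result struct {
	err      error
	duration time.Duration
}

// printReport consumes results as they arrive instead of buffering them all
// up front; it prints the summary once the producer closes the channel.
func printReport(results <-chan result) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		var count, errs int
		var total time.Duration
		for r := range results {
			if r.err != nil {
				errs++
				continue
			}
			count++
			total += r.duration
		}
		fmt.Printf("completed: %d, errors: %d, total: %v\n", count, errs, total)
	}()
	return done
}

func main() {
	results := make(chan result, 16) // small buffer, not one slot per request
	done := printReport(results)
	for i := 0; i < 100; i++ {
		results <- result{duration: time.Millisecond}
	}
	close(results) // signals printReport to print the summary
	<-done
}
```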
A database snapshot can be as large as 5GB. It is reasonable to log before
receiving it; otherwise the user might not know what is happening or why
etcd suddenly starts to use IO intensively.
The raft loop would block on the applier's done channel after
persisting the raft messages; the latency could cause dropped network
messages. Instead, asynchronously notify the applier with a buffered
channel when the raft writes complete.
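A self-contained sketch of the pattern, with illustrative names (toApply,
raftDone) rather than etcd's actual types: the raft loop hands entries to
the applier, persists them, then signals on a buffered channel so the send
never blocks the loop.

```go
package main

import "fmt"

// toApply pairs committed entries with a buffered notification channel.
type toApply struct {
	entries  []string
	raftDone chan struct{} // buffered: the raft loop's send never blocks
}

func raftLoop(applyc chan<- toApply) {
	for i := 0; i < 3; i++ {
		ap := toApply{
			entries:  []string{fmt.Sprintf("entry-%d", i)},
			raftDone: make(chan struct{}, 1),
		}
		applyc <- ap
		// ... persist raft messages and entries to stable storage here ...
		// Notify the applier asynchronously; because raftDone is buffered,
		// the raft loop does not wait for the applier before continuing.
		ap.raftDone <- struct{}{}
	}
	close(applyc)
}

func applier(applyc <-chan toApply) {
	for ap := range applyc {
		<-ap.raftDone // apply only after the raft writes are durable
		fmt.Println("applying", ap.entries)
	}
}

func main() {
	applyc := make(chan toApply)
	go raftLoop(applyc)
	applier(applyc)
}
```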
raft's applyc writes block on the server loop's database IO, since the
next applyc read must wait for the db operation to finish. Instead,
stream applyc into a run queue outside the server loop.
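A sketch of the run-queue idea, assuming a hand-rolled FIFO worker
(fifoQueue is a hypothetical name; etcd's real implementation differs):
the server loop only enqueues work, so the next applyc read is never
stalled by database IO.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fifoQueue is a minimal FIFO run queue: jobs are executed in order by a
// single worker goroutine, outside the loop that enqueued them.
type fifoQueue struct {
	jobs chan func()
	wg   sync.WaitGroup
}

func newFIFOQueue() *fifoQueue {
	q := &fifoQueue{jobs: make(chan func(), 128)}
	q.wg.Add(1)
	go func() {
		defer q.wg.Done()
		for j := range q.jobs {
			j() // database IO happens here, not in the server loop
		}
	}()
	return q
}

func (q *fifoQueue) schedule(j func()) { q.jobs <- j }

func (q *fifoQueue) stop() {
	close(q.jobs)
	q.wg.Wait()
}

func main() {
	applyc := make(chan string)
	go func() {
		for i := 0; i < 3; i++ {
			applyc <- fmt.Sprintf("apply-%d", i)
		}
		close(applyc)
	}()

	q := newFIFOQueue()
	// Server loop: hand each apply off to the run queue immediately so the
	// next applyc read does not wait for the previous apply's db operation.
	for ap := range applyc {
		ap := ap
		q.schedule(func() {
			time.Sleep(10 * time.Millisecond) // stand-in for db IO
			fmt.Println("applied", ap)
		})
	}
	q.stop() // drain the remaining jobs in order
}
```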
etcd might generate an incomplete proxy config file after a power failure.
This is because we use ioutil.WriteFile, and ioutil.WriteFile does not
call Sync before closing the file.
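A sketch of the fix's general idea (writeAndSyncFile is an illustrative
name, not necessarily the helper etcd uses): do what ioutil.WriteFile does,
but call f.Sync() before closing so the data reaches stable storage.

```go
package main

import (
	"io"
	"log"
	"os"
)

// writeAndSyncFile mirrors ioutil.WriteFile but flushes to disk with
// f.Sync() before closing, so a power failure right after the write no
// longer leaves a truncated or empty config file.
func writeAndSyncFile(filename string, data []byte, perm os.FileMode) error {
	f, err := os.OpenFile(filename, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, perm)
	if err != nil {
		return err
	}
	n, err := f.Write(data)
	if err == nil && n < len(data) {
		err = io.ErrShortWrite
	}
	if err == nil {
		err = f.Sync() // the step ioutil.WriteFile skips
	}
	if cerr := f.Close(); err == nil {
		err = cerr
	}
	return err
}

func main() {
	// Hypothetical proxy config path, used only for illustration.
	if err := writeAndSyncFile("proxy.json", []byte(`{"proxies":[]}`), 0600); err != nil {
		log.Fatal(err)
	}
}
```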