This commit fixes the issue of creating member dir before validating
the configuration. When member dir exists, it indicates the local etcd
process is a valid etcd member. So we should only create member dir
after we finish configuration validation, joining validation or
discovery validation.
Before this PR, the log is
```
2015/09/1 13:18:31 etcdmain: client: etcd cluster is unavailable or
misconfigured
```
It is quite hard for people to understand what happens.
Now we print out the exact reason for the failure, and explains the way
to handle it.
It specifies request timeout error possibly caused by connection lost,
and print out better log for user to understand.
It handles two cases:
1. the leader cannot connect to majority of cluster.
2. the connection between follower and leader is down for a while,
and it losts proposals.
log format:
```
20:04:19 etcd3 | 2015-08-25 20:04:19.368126 E | etcdhttp: etcdserver:
request timed out, possibly due to connection lost
20:04:19 etcd3 | 2015-08-25 20:04:19.368227 E | etcdhttp: etcdserver:
request timed out, possibly due to connection lost
```
Before this PR, the timeout caused by leader election returns:
```
14:45:37 etcd2 | 2015-08-12 14:45:37.786349 E | etcdhttp: got unexpected
response error (etcdserver: request timed out)
```
After this PR:
```
15:52:54 etcd1 | 2015-08-12 15:52:54.389523 E | etcdhttp: etcdserver:
request timed out, possibly due to leader down
```
It uses heartbeat interval and election timeout to estimate the
commit timeout for internal requests.
This PR helps etcd survive under high roundtrip-time environment,
e.g., globally-deployed cluster.
Follow the simple rule in the atomic package:
"On both ARM and x86-32, it is the caller's responsibility to arrange
for 64-bit alignment of 64-bit words accessed atomically. The first word
in a global variable or in an allocated struct or slice can be relied
upon to be 64-bit aligned."
Tested on a system with /proc/cpuinfo reporting:
processor : 0
model name : ARMv7 Processor rev 1 (v7l)
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3
tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xc0d
CPU revision : 1
After this PR, only cluster's interface Cluster is exposed, which makes
code much cleaner. And it avoids external packages to rely on cluster
struct in the future.
The PR extracts types.Cluster from etcdserver.Cluster. types.Cluster
is used for flag parsing and etcdserver config.
There is no need to expose etcdserver.Cluster public, which contains
lots of etcdserver internal details and methods. This is the first step
for it.
1. Persist the cluster version change through raft. When the member is restarted, it can recover
the previous known decided cluster version.
2. When there is a new leader, it is forced to do a version checking immediately. This helps to
update the first cluster version fast.
We store cluster related key in StoreAdminPrefix for some
historical reason. The previous API is called admin. But now,
the admin name is gone and `cluster` is a more clear and correct
name.
Cluster version is the min major.minor of all members in
the etcd cluster. Cluster version is set to the min version
that a etcd member is compatible with when first bootstrapp.
During a rolling upgrades, the cluster version will be updated
automatically.
For example:
```
Cluster [a:1, b:1 ,c:1] -> clusterVersion 1
update a -> 2, b -> 2
after a detection
Cluster [a:2, b:2 ,c:1] -> clusterVersion 1, since c is still 1
update c -> 2
after a detection
Cluster [a:2, b:2 ,c:2] -> clusterVersion 2
```
The API/raft component can utilize clusterVersion to determine if
it can accept a client request or a raft RPC.
We choose polling rather than pushing since we want to use the same
logic for cluster version detection and (TODO) cluster version checking.
Before a member actually joins a etcd cluster, it should check the version
of the cluster. Push does not work since the other members cannot push
version info to it before it actually joins. Moreover, we do not want our
raft RPC system (which is doing the heartbeat pushing) to coordinate cluster version.
The original process is stopping etcd only when pipeline message finds itself
has been removed. After this PR, stream dial has this functionality too.
It helps fast etcd stop, which doesn't need to wait for stream break to
fall back to pipeline, and wait for election timeout to send out message
to detect self removal.
Add remotes to rafthttp, who help newly joined members catch up the
progress of the cluster. It supports basic message sending to remote, and
has no stream connection for simplicity. remotes will not be used
after the latest peers have been added into rafthttp.
It is more reasonable to init the variable before passing it as an
argument.
It fixes a bug that etcdserver may panic on server stats when processing
a message from rafthttp streamReader before server stats is initialized
in server.Start().
It exposes the metrics of file descriptor limit and file descriptor used.
Moreover, it prints out warning when more than 80% of fd limit has been used.
```
2015/04/08 01:26:19 etcdserver: 80% of the file descriptor limit is open
[open = 969, limit = 1024]
```
Stop raftNode goroutine before stopping server goroutine, so
server.Stop does stop all underlying stuffs elegantly now. This fixes
the problem that previous-round lock on WAL may not be released when
etcd is restarted.