The PR extracts types.Cluster from etcdserver.Cluster. types.Cluster
is used for flag parsing and etcdserver config.
There is no need to expose etcdserver.Cluster publicly, since it contains
many etcdserver internal details and methods. This is the first step
toward hiding it.
1. Persist cluster version changes through raft. When a member restarts, it can recover
the previously decided cluster version.
2. When a new leader is elected, it is forced to do a version check immediately. This helps
set the first cluster version quickly.
We store cluster-related keys under StoreAdminPrefix for
historical reasons: the API used to be called admin. Now that
the admin name is gone, `cluster` is a clearer and more accurate
name.
Cluster version is the min major.minor of all members in
the etcd cluster. When the cluster is first bootstrapped, the cluster
version is set to the min version that an etcd member is compatible with.
During a rolling upgrade, the cluster version is updated
automatically.
For example:
```
Cluster [a:1, b:1, c:1] -> clusterVersion 1
update a -> 2, b -> 2
after a detection
Cluster [a:2, b:2, c:1] -> clusterVersion 1, since c is still 1
update c -> 2
after a detection
Cluster [a:2, b:2, c:2] -> clusterVersion 2
```
The API/raft component can utilize clusterVersion to determine if
it can accept a client request or a raft RPC.
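The "min major.minor of all members" rule can be sketched in Go as below. This is a minimal illustration, not the code in this PR: the `majorMinor` type and the `parseMajorMinor` and `decideClusterVersion` helpers are hypothetical names, and member versions are assumed to arrive as plain "major.minor[.patch]" strings.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorMinor is a hypothetical type holding only the major.minor part
// of a member version; the patch level is ignored on purpose.
type majorMinor struct {
	Major, Minor int
}

// less reports whether v is an older version than o.
func (v majorMinor) less(o majorMinor) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	return v.Minor < o.Minor
}

func (v majorMinor) String() string {
	return fmt.Sprintf("%d.%d", v.Major, v.Minor)
}

// parseMajorMinor parses "2.1" or "2.1.0" and drops anything after the minor.
func parseMajorMinor(s string) (majorMinor, error) {
	parts := strings.Split(s, ".")
	if len(parts) < 2 {
		return majorMinor{}, fmt.Errorf("invalid version %q", s)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return majorMinor{}, fmt.Errorf("invalid major in %q: %v", s, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return majorMinor{}, fmt.Errorf("invalid minor in %q: %v", s, err)
	}
	return majorMinor{Major: major, Minor: minor}, nil
}

// decideClusterVersion returns the lowest major.minor among all members.
// memberVersions maps a member name to its reported version string.
func decideClusterVersion(memberVersions map[string]string) (majorMinor, error) {
	if len(memberVersions) == 0 {
		return majorMinor{}, fmt.Errorf("no member versions known")
	}
	var lowest majorMinor
	first := true
	for name, s := range memberVersions {
		v, err := parseMajorMinor(s)
		if err != nil {
			return majorMinor{}, fmt.Errorf("member %s: %v", name, err)
		}
		if first || v.less(lowest) {
			lowest = v
			first = false
		}
	}
	return lowest, nil
}
```

With members a and b at 2.1.x and c still at 2.0.x, `decideClusterVersion` returns 2.0, matching the rolling-upgrade example above: the cluster version only moves once the last member is upgraded.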
We choose polling rather than pushing since we want to use the same
logic for cluster version detection and (TODO) cluster version checking.
Before a member actually joins an etcd cluster, it should check the version
of the cluster. Pushing does not work because the other members cannot push
version info to it before it joins. Moreover, we do not want our
raft RPC system (which does the heartbeat pushing) to coordinate cluster version.
Before this PR, users could set listen-client-urls without setting
advertise-client-urls, leaving advertise-client-urls at its default
localhost value. Client libraries that sync the cluster info would then
fetch the wrong advertise-client-urls and fail to connect to the cluster.
This PR prevents this case and provides better UX.
On the other hand, this change is safe because users always want to set
advertise-client-urls if listen-client-urls is set. The default localhost
advertise URL cannot be accessed from outside the host, so it should always be
set explicitly unless etcd is bootstrapped with no flags.
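The check described above can be sketched in Go roughly as follows. This is an assumption-laden illustration rather than the PR's actual code: the `clientURLConfig` struct, its fields, and the error variable are hypothetical names standing in for however the real config tracks whether each flag was set explicitly.

```go
package main

import "errors"

// clientURLConfig is a hypothetical stand-in for the part of the etcd
// config that records whether each client URL flag was set by the user.
type clientURLConfig struct {
	ListenClientURLsSet    bool // listen-client-urls was given explicitly
	AdvertiseClientURLsSet bool // advertise-client-urls was given explicitly
}

// errAdvertiseClientURLsUnset is a hypothetical error value for this sketch.
var errAdvertiseClientURLsUnset = errors.New(
	"advertise-client-urls is required when listen-client-urls is set explicitly")

// validate rejects the one combination that produces unreachable
// advertised URLs: a custom listen address paired with the localhost default.
func (c clientURLConfig) validate() error {
	if c.ListenClientURLsSet && !c.AdvertiseClientURLsSet {
		return errAdvertiseClientURLsUnset
	}
	return nil
}
```

Starting with no flags at all still passes validation, which keeps the zero-configuration localhost setup working.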
Previously, etcd stopped only when a pipeline message discovered that the member
had been removed. After this PR, stream dial detects this too.
This lets etcd stop faster: it no longer needs to wait for the stream to break and
fall back to pipeline, and then for the election timeout to send out a message
that detects its own removal.
The disaster recovery documentation was missing a step: resetting
the advertised peer URLs of the node recovered from backup. Without
that, any subsequent server joining the cluster would not be able to
talk to the first node.
When removing a member, etcdserver might return 410, which indicates that
the member has been removed. To the client, 410 is a valid response since
the client might retry internally.
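A client-side check along these lines might look like the sketch below. The `removeSucceeded` helper is a hypothetical name, and treating 204 No Content as the normal success code for a member removal is an assumption of this sketch, not something stated above.

```go
package main

import "net/http"

// removeSucceeded reports whether the HTTP status returned by a member
// removal request should be treated as success by the client.
// Assumption: 204 No Content is the normal success response.
func removeSucceeded(statusCode int) bool {
	switch statusCode {
	case http.StatusNoContent:
		// The removal was applied by this request.
		return true
	case http.StatusGone:
		// 410: the member is already gone, e.g. an earlier attempt
		// succeeded and this request is an internal retry.
		return true
	default:
		return false
	}
}
```

Accepting 410 makes the removal effectively idempotent from the caller's point of view: a retry after a lost response does not surface as an error.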