diff --git a/Documentation/op-guide/runtime-reconf-design.md b/Documentation/op-guide/runtime-reconf-design.md index 3632301c4..0b6d8d2e6 100644 --- a/Documentation/op-guide/runtime-reconf-design.md +++ b/Documentation/op-guide/runtime-reconf-design.md @@ -6,11 +6,11 @@ Read on to learn about the design of etcd's runtime reconfiguration commands and ## Two phase config changes keep the cluster safe -In etcd, every runtime reconfiguration has to go through [two phases][add-member] for safety reasons. For example, to add a member, first inform cluster of new configuration and then start the new member. +In etcd, every runtime reconfiguration has to go through [two phases][add-member] for safety reasons. For example, to add a member, first inform the cluster of the new configuration and then start the new member. Phase 1 - Inform cluster of new configuration -To add a member into etcd cluster, make an API call to request a new member to be added to the cluster. This is the only way to add a new member into an existing cluster. The API call returns when the cluster agrees on the configuration change. +To add a member into an etcd cluster, make an API call to request a new member to be added to the cluster. This is the only way to add a new member into an existing cluster. The API call returns when the cluster agrees on the configuration change. Phase 2 - Start new member @@ -28,19 +28,19 @@ If a cluster permanently loses a majority of its members, a new cluster will nee It is entirely possible to force removing the failed members from the existing cluster to recover. However, we decided not to support this method since it bypasses the normal consensus committing phase, which is unsafe. If the member to remove is not actually dead or force removed through different members in the same cluster, etcd will end up with a diverged cluster with same clusterID. This is very dangerous and hard to debug/fix afterwards. -With a correct deployment, the possibility of permanent majority lose is very low. But it is a severe enough problem that worth special care. We strongly suggest reading the [disaster recovery documentation][disaster-recovery] and preparing for permanent majority lose before putting etcd into production. +With a correct deployment, the possibility of permanent majority loss is very low. But it is a severe enough problem that is worth special care. We strongly suggest reading the [disaster recovery documentation][disaster-recovery] and preparing for permanent majority loss before putting etcd into production. ## Do not use public discovery service for runtime reconfiguration -The public discovery service should only be used for bootstrapping a cluster. To join member into an existing cluster, use runtime reconfiguration API. +The public discovery service should only be used for bootstrapping a cluster. To join member into an existing cluster, use the runtime reconfiguration API. -Discovery service is designed for bootstrapping an etcd cluster in the cloud environment, when the IP addresses of all the members are not known beforehand. After successfully bootstrapping a cluster, the IP addresses of all the members are known. Technically, the discovery service should no longer be needed. +The discovery service is designed for bootstrapping an etcd cluster in a cloud environment, when the IP addresses of all the members are not known beforehand. After successfully bootstrapping a cluster, the IP addresses of all the members are known. Technically, the discovery service should no longer be needed. It seems that using public discovery service is a convenient way to do runtime reconfiguration, after all discovery service already has all the cluster configuration information. However relying on public discovery service brings troubles: 1. it introduces external dependencies for the entire life-cycle of the cluster, not just bootstrap time. If there is a network issue between the cluster and public discovery service, the cluster will suffer from it. -2. public discovery service must reflect correct runtime configuration of the cluster during it life-cycle. It has to provide security mechanism to avoid bad actions, and it is hard. +2. public discovery service must reflect correct runtime configuration of the cluster during its life-cycle. It has to provide security mechanisms to avoid bad actions, and it is hard. 3. public discovery service has to keep tens of thousands of cluster configurations. Our public discovery service backend is not ready for that workload.