Merge pull request #1883 from xiang90/member_migration

doc: add doc for member migration
2024-09-27 06:25:44 +00:00 · 2014-12-08 14:09:09 -08:00 · 2014-12-08 14:09:09 -08:00 · 091cc237e3
commit 091cc237e3
parent 31b9f712ba 069578c29c
2 changed files with 19 additions and 2 deletions
--- a/Documentation/0.5/admin_guide.md
+++ b/Documentation/0.5/admin_guide.md
@@ -32,6 +32,19 @@ The data directory has two sub-directories in it:
 If you are spinning up multiple clusters for testing it is recommended that you specify a unique initial-cluster-token for the different clusters.
 This can protect you from cluster corruption in case of mis-configuration because two members started with different cluster tokens will refuse members from each other.

+### Member Migration
+
+When a machine is scheduled for maintenance or retirement, you might want to migrate an etcd member to another machine without losing its data or changing its member ID.
+
+The data directory contains all the data to recover a member to its point-in-time state. To migrate a member:
+
+* Stop the member process
+* Copy the data directory of the now-idle member to the new machine
+* Update the peer URLs for that member to point at the new machine, following the [member API][change peer url]
+* Start etcd on the new machine, using the same configuration and the copy of the data directory
+
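The peer URL update in the steps above can be sketched as follows. This is a minimal illustration, not the project's own tooling: it assumes the member API accepts a `PUT` to `/v2/members/<id>` with a JSON body listing `peerURLs`, as the linked section describes. The helper name `build_peer_url_update`, the addresses, and the member ID are hypothetical.

```python
import json
import urllib.request


def build_peer_url_update(client_url, member_id, new_peer_urls):
    """Build (but do not send) a request that updates a member's peer URLs.

    Assumes the members endpoint lives at /v2/members/<id> on the client URL.
    """
    body = json.dumps({"peerURLs": list(new_peer_urls)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{client_url.rstrip('/')}/v2/members/{member_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical values for illustration only.
request = build_peer_url_update(
    "http://10.0.1.10:2379",     # client URL of any healthy member
    "272e204152",                # ID of the member being migrated
    ["http://10.0.1.11:2380"],   # peer URL on the new machine
)
# urllib.request.urlopen(request) would send it to a running cluster.
```

Building the request separately from sending it makes it easy to inspect the URL and body before touching a live cluster.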
+[change peer url]: https://github.com/coreos/etcd/blob/master/Documentation/0.5/other_apis.md#change-the-peer-urls-of-a-member
+
 ### Disaster Recovery

 etcd is designed to be resilient to machine failures. An etcd cluster can automatically recover from any number of temporary failures (for example, machine reboots), and a cluster of N members can tolerate up to _(N-1)/2_ permanent failures (where a member can no longer access the cluster, due to hardware failure or disk corruption). However, in extreme circumstances, a cluster might permanently lose enough members such that quorum is irrevocably lost. For example, if a three-node cluster suffered two simultaneous and unrecoverable machine failures, it would normally be impossible for the cluster to restore quorum and continue functioning.
--- a/Documentation/0.5/runtime-configuration.md
+++ b/Documentation/0.5/runtime-configuration.md
@@ -6,12 +6,16 @@ etcd comes with support for incremental runtime reconfiguration, which allows us

 Let us walk through the four use cases for re-configuring a cluster: replacing a member, increasing or decreasing cluster size, and restarting a cluster from a majority failure.

-### Replace a Member
+### Replace a Non-recoverable Member

-The most common use case of cluster reconfiguration is to replace a member because of a permanent failure of the existing member: for example, hardware failure, loss of network address, or data directory corruption.
+The most common use case of cluster reconfiguration is to replace a member because of a permanent failure of the existing member: for example, hardware failure or data directory corruption.
 It is important to replace failed members as soon as the failure is detected.
 If etcd falls below a simple majority of members it can no longer accept writes: e.g. in a 3 member cluster the loss of two members will cause writes to fail and the cluster to stop operating.
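The simple-majority rule above can be checked with a little arithmetic. The sketch below is only an illustration of that rule; the function names `quorum` and `can_accept_writes` are hypothetical and not part of etcd.

```python
def quorum(cluster_size):
    """Smallest number of members that forms a simple majority."""
    return cluster_size // 2 + 1


def can_accept_writes(cluster_size, healthy_members):
    """True while the healthy members still form a majority."""
    return healthy_members >= quorum(cluster_size)


# A 3 member cluster needs 2 healthy members to keep accepting writes.
three_member_quorum = quorum(3)
after_one_loss = can_accept_writes(3, healthy_members=2)
after_two_losses = can_accept_writes(3, healthy_members=1)
```

With these definitions, losing one member of three still leaves a majority, while losing two does not, which matches the example in the text.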

+If you want to migrate a running member to another machine, please refer to the [member migration section][member migration].
+
+[member migration]: https://github.com/coreos/etcd/blob/master/Documentation/0.5/admin_guide.md#member-migration
+
 ### Increase Cluster Size

 To make your cluster more resilient to machine failure you can increase the size of the cluster.