From c6be7887e5d2d2b1191b28ae6149218d0f0a6949 Mon Sep 17 00:00:00 2001
From: Yicheng Qin
Date: Mon, 21 Apr 2014 10:17:31 -0700
Subject: [PATCH] docs(discovery): update cluster finding process

---
 Documentation/cluster-discovery.md      |  6 ++--
 Documentation/design/cluster-finding.md | 34 +++++++++++++++++++
 Documentation/design/discovery.md       | 45 -------------------------
 3 files changed, 37 insertions(+), 48 deletions(-)
 create mode 100644 Documentation/design/cluster-finding.md
 delete mode 100644 Documentation/design/discovery.md

diff --git a/Documentation/cluster-discovery.md b/Documentation/cluster-discovery.md
index a10eaf606..d2f1d059e 100644
--- a/Documentation/cluster-discovery.md
+++ b/Documentation/cluster-discovery.md
@@ -2,9 +2,9 @@
 
 ## Overview
 
-Starting an etcd cluster can be painful since each node needs to know of another node in the cluster to get started. If you are trying to bring up a cluster all at once, say using a cloud formation, you also need to coordinate who will be the initial cluster leader. The discovery protocol helps you by providing an automated way to discover other existing peers in a cluster.
+Starting an etcd cluster requires that each node knows of another node in the cluster. If you are trying to bring up a cluster all at once, say using a cloud formation, you also need to coordinate who will be the initial cluster leader. The discovery protocol helps you by providing an automated way to discover other existing peers in a cluster.
 
-Peer discovery for etcd is processed by `-discovery`, `-peers` and lastly log data in `-data-dir`. For more information see the [discovery design][discovery-design].
+For more information on how etcd can locate the cluster, see the [finding the cluster][cluster-finding] documentation.
 
 Please note - at least 3 nodes are required for [cluster availability][optimal-cluster-size].
 
@@ -52,4 +52,4 @@ The Discovery API submits the `-peer-addr` of each etcd instance to the configur
 
 The discovery API will automatically clean up the address of a stale peer that is no longer part of the cluster. The TTL for this process is a week, which should be long enough to handle any extremely long outage you may encounter. There is no harm in having stale peers in the list until they are cleaned up, since an etcd instance only needs to connect to one valid peer in the cluster to join.
 
-[discovery-design]: https://github.com/coreos/etcd/blob/master/Documentation/design/discovery.md
+[cluster-finding]: https://github.com/coreos/etcd/blob/master/Documentation/design/cluster-finding.md
diff --git a/Documentation/design/cluster-finding.md b/Documentation/design/cluster-finding.md
new file mode 100644
index 000000000..9cc3bafcc
--- /dev/null
+++ b/Documentation/design/cluster-finding.md
@@ -0,0 +1,34 @@
+## Cluster Finding Process
+
+Peer discovery uses the following sources in this order: log data in `-data-dir`, `-discovery` and `-peers`.
+
+If log data is provided, etcd will concatenate possible peers from three sources: the log data, the `-discovery` option, and `-peers`. Then it tries to join the cluster through them one by one. If all connection attempts fail (which indicates that the majority of the cluster is currently down), it will restart itself based on the log data, which helps the cluster to recover from a full outage.
+
+Without log data, the instance is assumed to be a brand new one. If possible targets are provided by `-discovery` and `-peers`, etcd will make a best-effort attempt to join them, and if none is reachable it will exit. Otherwise, if no `-discovery` or `-peers` option is provided, a new cluster will always be started.
+
+This ensures that users can always restart the node safely with the same command (without --force), and etcd will either reconnect to the old cluster if it is still running or recover its cluster from an outage.
+
+## Logical Workflow
+
+Start an etcd machine:
+
+```
+If log data is given:
+    Try to join via peers in previous cluster
+    Try to join via peers found in discover URL
+    Try to join via peers in peer list
+    Restart the previous cluster which is down
+    return
+
+If discover URL is given:
+    Fetch peers through discover URL
+    If Success:
+        Join via peers found
+        return
+
+If peer list is given:
+    Join as follower via peers in peer list
+    return
+
+Start as the leader of a new cluster
+```
diff --git a/Documentation/design/discovery.md b/Documentation/design/discovery.md
deleted file mode 100644
index 237b30ddb..000000000
--- a/Documentation/design/discovery.md
+++ /dev/null
@@ -1,45 +0,0 @@
-## Discovery Rule
-
-Peer discovery uses the following sources in this order: `-discovery`, `-peers`, log data in `-data-dir`.
-
-If none of these is set, it will start a new cluster by itself. If any of them is set, it will make
-best efforts to find cluster, and panic if none is reachable.
-
-If a discover URL is provided and the discovery process succeeds then it will find peers specified by the discover URL only.
-This is because we assume that it has been registered in discover URL and
-should not join other clusters.
-
-If a discover URL is provided but the discovery process fails then we will prevent the node from forming
-a new cluster. We assume the user doesn't want to start a brand new cluster without noticing discover URL.
-
-## Logical Workflow
-
-Start an etcd machine:
-
-```
-If discovery url is given:
-    Do discovery
-    If Success:
-        Join to the cluster discovered
-        return
-
-If peer list is given:
-    Try to join as follower via peer list
-    If Success: return
-
-If log data is given:
-    Try to join as follower via peers in previous cluster
-    If Success: return
-
-If log data is given:
-    Restart the previous cluster which is down
-    return
-
-If discovery url is given:
-    Panic
-
-If peer list is given:
-    Panic
-
-Start as the leader of a new cluster
-```
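
Review note: the new cluster finding process in `cluster-finding.md` can be read as a small decision function. The sketch below is illustrative only and is not etcd's implementation. The function `find_cluster`, its parameters, the `fetch_discovery` and `try_join` callbacks, and the returned action strings are all hypothetical names chosen for this example. It follows the prose description: with log data, peers from all three sources are tried before falling back to a restart from the log; without log data, the node joins, exits, or starts a new cluster.

```python
from typing import Callable, List, Optional


def find_cluster(
    log_peers: Optional[List[str]],
    discovery_url: Optional[str],
    peer_list: List[str],
    fetch_discovery: Callable[[str], List[str]],
    try_join: Callable[[List[str]], bool],
) -> str:
    """Return the action a starting node would take (hypothetical labels)."""

    def discovered() -> List[str]:
        # Peers from the discover URL, or nothing if it is unset or unreachable.
        if not discovery_url:
            return []
        try:
            return fetch_discovery(discovery_url)
        except OSError:
            return []

    if log_peers is not None:
        # Log data present: concatenate candidates from all three sources.
        candidates = list(log_peers) + discovered() + list(peer_list)
        if try_join(candidates):
            return "joined"
        # Every attempt failed, so assume a full outage and recover from the log.
        return "restarted-from-log"

    # No log data: the instance is assumed to be brand new.
    if not discovery_url and not peer_list:
        return "started-new-cluster"

    candidates = discovered() + list(peer_list)
    if candidates and try_join(candidates):
        return "joined"
    # Targets were configured but none was reachable.
    return "exit"
```

For example, calling `find_cluster(None, None, [], ...)` models a first boot with no options, while passing a non-`None` `log_peers` with a `try_join` that always fails models a restart after a full outage.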