fix(docs): update doc with standby info

Rob Szumski 2014-05-21 11:44:30 -07:00
parent 22c944d8ef
commit 001cceb1cd
2 changed files with 33 additions and 37 deletions

@@ -166,16 +166,4 @@ In the previous example we showed how to use SSL client certs for client-to-serv
Etcd can also do internal server-to-server communication using SSL client certs.
To do this just change the `-*-file` flags to `-peer-*-file`.
If you are using SSL for server-to-server communication, you must use it on all instances of etcd.
### What size cluster should I use?
Every command the client sends to the leader is broadcast to all of the followers.
The command is not committed until the majority of the cluster peers receive that command.
Because of this majority voting property, the ideal cluster is small, which keeps it fast, and is made up of an odd number of peers.
Odd numbers are good because if you have 8 peers the majority will be 5 and if you have 9 peers the majority will still be 5.
The result is that an 8 peer cluster can tolerate 3 peer failures and a 9 peer cluster can tolerate 4 peer failures.
And in the best case when all 9 peers are responding the cluster will perform at the speed of the fastest 5 machines.
If you are using SSL for server-to-server communication, you must use it on all instances of etcd.

@@ -1,30 +1,38 @@
# Optimal etcd Cluster Size
etcd's Raft consensus algorithm is most efficient in small clusters between 3 and 9 peers. Let's briefly explore how etcd works internally to understand why.
## Writing to etcd
Writes to an etcd peer are always redirected to the leader of the cluster and distributed to all of the peers immediately. A write is only considered successful when a majority of the peers acknowledge the write.
For example, in a 5 node cluster, a write operation is only as fast as the 3rd fastest machine. This is the main reason for keeping your etcd cluster at 9 nodes or fewer. In practice, you only need to worry about write performance in high latency environments such as a cluster spanning multiple data centers.
## Leader Election
The leader election process is similar to writing a key: a majority of the cluster must acknowledge the new leader before cluster operations can continue. The longer the nodes take to elect a new leader, the longer you have to wait before you can write to the cluster again. In low latency environments this process takes milliseconds.
## Odd Cluster Size
The other important cluster optimization is to always have an odd cluster size. Growing an even-sized cluster by one node to make it odd doesn't change the size of the majority and therefore doesn't increase the total latency of the majority as described above. But you do gain a higher tolerance for peer failure by adding the extra machine. You can see this in practice by comparing an even-sized and an odd-sized cluster:
| Cluster Size | Majority | Failure Tolerance |
|--------------|------------|-------------------|
| 8 machines | 5 machines | 3 machines |
| 9 machines | 5 machines | **4 machines** |
As you can see, adding another node to bring the cluster up to an odd size is always worth it. During a network partition that splits the cluster in two, an odd cluster size also rules out an even split, so as long as all nodes are running, one side holds a majority that can continue to operate and be the source of truth when the partition ends.
etcd's Raft consensus algorithm is most efficient in small clusters between 3 and 9 peers. For clusters larger than 9, etcd will select a subset of instances to participate in the algorithm in order to keep it efficient. The end of this document briefly explores how etcd works internally and why these choices have been made.
## Cluster Management
Currently, each CoreOS machine is an etcd peer — if you have 30 CoreOS machines, you have 30 etcd peers and end up with a cluster size that is way too large. If desired, you may manually stop some of these etcd instances to increase cluster performance.
You can manage the active cluster size through the [cluster config API](https://github.com/coreos/etcd/blob/master/Documentation/api.md#cluster-config). `activeSize` represents the number of etcd peers allowed to actively participate in the consensus algorithm.
Functionality is being developed to expose two different types of followers: active and benched followers. Active followers will influence operations within the cluster. Benched followers will not participate, but will transparently proxy etcd traffic to an active follower. This allows every CoreOS machine to expose etcd on port 4001 for ease of use. Benched followers will have the ability to transition into an active follower if needed.
If the total number of etcd instances exceeds this number, additional peers are started as [standbys](https://github.com/coreos/etcd/blob/master/Documentation/design/standbys.md), which can be promoted to active participation if one of the existing active instances has failed or been removed.
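The relationship between `activeSize`, active peers, and standbys can be illustrated with a toy model. This is only a sketch, not etcd's implementation: the `ToyCluster` class, its method names, and the "promote the longest-waiting standby" policy are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model of an etcd cluster with a cap on active peers (hypothetical)."""
    active_size: int
    active: list = field(default_factory=list)
    standbys: list = field(default_factory=list)

    def join(self, name):
        # New instances become active peers until active_size is reached;
        # after that they start as standbys.
        if len(self.active) < self.active_size:
            self.active.append(name)
        else:
            self.standbys.append(name)

    def remove(self, name):
        # Removing an active peer frees a slot, which is filled by promoting
        # a standby (assumed policy: the one that has waited longest).
        if name in self.active:
            self.active.remove(name)
            if self.standbys:
                self.active.append(self.standbys.pop(0))
        elif name in self.standbys:
            self.standbys.remove(name)


if __name__ == "__main__":
    cluster = ToyCluster(active_size=3)
    for i in range(5):
        cluster.join(f"machine{i}")
    print("active:", cluster.active, "standbys:", cluster.standbys)
    cluster.remove("machine1")
    print("active:", cluster.active, "standbys:", cluster.standbys)
```

In this model the number of active peers never exceeds `active_size`, no matter how many machines join.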
## Internals of etcd
### Writing to etcd
Writes to an etcd peer are always redirected to the leader of the cluster and distributed to all of the peers immediately. A write is only considered successful when a majority of the peers acknowledge the write.
For example, in a cluster with 5 peers, a write operation is only as fast as the 3rd fastest machine. This is the main reason for keeping the number of active peers at 9 or fewer. In practice, you only need to worry about write performance in high latency environments such as a cluster spanning multiple data centers.
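The "3rd fastest machine" rule can be sketched in a few lines. This is a simplified model, assuming one acknowledgement time per peer (the leader's own entry included) and ignoring everything else that affects real write latency; the function name `write_latency` and the sample numbers are made up for illustration.

```python
def write_latency(ack_latencies_ms):
    """Time until a majority of peers has acknowledged a write.

    ack_latencies_ms holds one acknowledgement time per peer, in milliseconds.
    """
    if not ack_latencies_ms:
        raise ValueError("at least one peer is required")
    needed = len(ack_latencies_ms) // 2 + 1  # size of the majority
    # The write succeeds once the `needed`-th fastest acknowledgement arrives.
    return sorted(ack_latencies_ms)[needed - 1]


if __name__ == "__main__":
    # Five peers in one data center: the two slow peers do not matter.
    print(write_latency([2, 3, 5, 40, 120]))
    # Five peers across data centers: the majority must include a remote peer.
    print(write_latency([1, 2, 80, 85, 90]))
```

The second call shows why latency matters mostly when the cluster spans data centers: the majority cannot be formed from nearby peers alone.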
### Leader Election
The leader election process is similar to writing a key: a majority of the active peers must acknowledge the new leader before cluster operations can continue. The longer the peers take to elect a new leader, the longer you have to wait before you can write to the cluster again. In low latency environments this process takes milliseconds.
### Odd Active Cluster Size
The other important cluster optimization is to always have an odd active cluster size (i.e. `activeSize`). Growing an even number of peers by one to make it odd doesn't change the size of the majority and therefore doesn't increase the total latency of the majority as described above. But you gain a higher tolerance for peer failure by adding the extra machine. You can see this in practice by comparing even-sized and odd-sized clusters:
| Active Peers | Majority | Failure Tolerance |
|--------------|------------|-------------------|
| 1 peer | 1 peer | None |
| 3 peers | 2 peers | 1 peer |
| 4 peers | 3 peers | 1 peer |
| 5 peers | 3 peers | **2 peers** |
| 6 peers | 4 peers | 2 peers |
| 7 peers | 4 peers | **3 peers** |
| 8 peers | 5 peers | 3 peers |
| 9 peers | 5 peers | **4 peers** |
As you can see, adding another peer to bring the number of active peers up to an odd size is always worth it. During a network partition that splits the cluster in two, an odd number of active peers also rules out an even split, so as long as all active peers are running, one side holds a majority that can continue to operate and be the source of truth when the partition ends.
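The majority and failure tolerance arithmetic, and the partition argument, can be checked with a short sketch. The function names are made up for illustration, and the partition helper assumes a clean two-way split with every peer running.

```python
def majority(active_peers):
    """Smallest number of peers that is more than half of the cluster."""
    return active_peers // 2 + 1


def failure_tolerance(active_peers):
    """Number of peers that can fail while a majority remains."""
    return active_peers - majority(active_peers)


def side_with_majority(side_a, side_b):
    """For a two-way partition, return 'a', 'b', or None if neither side has a majority."""
    needed = majority(side_a + side_b)
    if side_a >= needed:
        return "a"
    if side_b >= needed:
        return "b"
    return None


if __name__ == "__main__":
    for n in range(1, 10):
        print(f"{n} peers: majority {majority(n)}, tolerates {failure_tolerance(n)} failures")
    print(side_with_majority(4, 4))  # even cluster split evenly
    print(side_with_majority(5, 4))  # odd cluster, any two-way split
```

An 8 peer cluster split 4 and 4 leaves neither side able to commit writes, while a 9 peer cluster cannot be split that way.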