Merge pull request #9735 from gyuho/upgrade-guide

Documentation/upgrades: improve server checklists with zap logger
2024-09-27 06:25:44 +00:00 · 2018-05-17 13:15:35 -07:00 · 2018-05-17 13:15:35 -07:00 · 63b6f432dc
commit 63b6f432dc
parent 7f497b81c8 22c0f518f7
2 changed files with 367 additions and 118 deletions
--- a/Documentation/upgrades/upgrade_3_4.md
+++ b/Documentation/upgrades/upgrade_3_4.md
@ -6,6 +6,8 @@ In the general case, upgrading from etcd 3.3 to 3.4 can be a zero-downtime, roll

 Before [starting an upgrade](#upgrade-procedure), read through the rest of this guide to prepare.

+
+
 ### Upgrade checklists

 **NOTE:** When [migrating from v2 with no v3 data](https://github.com/coreos/etcd/issues/9480), etcd server v3.2+ panics when etcd restores from existing snapshots but no v3 `ETCD_DATA_DIR/member/snap/db` file. This happens when the server had migrated from v2 with no previous v3 data. This also prevents accidental v3 data loss (e.g. `db` file might have been moved). etcd requires that post v3 migration can only happen with v3 data. Do not upgrade to newer v3 versions until v3.0 server contains v3 data.
@ -140,6 +142,8 @@ curl -L http://localhost:2379/v3/kv/put \

 Requests to `/v3beta` endpoints will redirect to `/v3`, and `/v3beta` will be removed in 3.5 release.

+
+
 ### Server upgrade checklists

 #### Upgrade requirements
@ -152,7 +156,7 @@ Also, to ensure a smooth rolling upgrade, the running cluster must be healthy. C

 Before upgrading etcd, always test the services relying on etcd in a staging environment before deploying the upgrade to the production environment.

-Before beginning, [backup the etcd data](../op-guide/maintenance.md#snapshot-backup). Should something go wrong with the upgrade, it is possible to use this backup to [downgrade](#downgrade) back to existing etcd version. Please note that the `snapshot` command only backs up the v3 data. For v2 data, see [backing up v2 datastore](../v2/admin_guide.md#backing-up-the-datastore).
+Before beginning, [download the snapshot backup](../op-guide/maintenance.md#snapshot-backup). Should something go wrong with the upgrade, it is possible to use this backup to [downgrade](#downgrade) back to existing etcd version. Please note that the `snapshot` command only backs up the v3 data. For v2 data, see [backing up v2 datastore](../v2/admin_guide.md#backing-up-the-datastore).

 #### Mixed versions

@ -170,97 +174,215 @@ For a much larger total data size, 100MB or more , this one-time process might t

 If all members have been upgraded to v3.4, the cluster will be upgraded to v3.4, and downgrade from this completed state is **not possible**. If any single member is still v3.3, however, the cluster and its operations remains "v3.3", and it is possible from this mixed cluster state to return to using a v3.3 etcd binary on all members.

-Please [backup the data directory](../op-guide/maintenance.md#snapshot-backup) of all etcd members to make downgrading the cluster possible even after it has been completely upgraded.
+Please [download the snapshot backup](../op-guide/maintenance.md#snapshot-backup) to make downgrading the cluster possible even after it has been completely upgraded.

 ### Upgrade procedure

 This example shows how to upgrade a 3-member v3.3 ectd cluster running on a local machine.

-#### 1. Check upgrade requirements
+#### Step 1: check upgrade requirements

 Is the cluster healthy and running v3.3.x?

-```
-$ ETCDCTL_API=3 etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
-localhost:2379 is healthy: successfully committed proposal: took = 6.600684ms
-localhost:22379 is healthy: successfully committed proposal: took = 8.540064ms
-localhost:32379 is healthy: successfully committed proposal: took = 8.763432ms
+```bash
+etcdctl --endpoints=localhost:2379,localhost:22379,localhost:32379 endpoint health
+<<COMMENT
+localhost:2379 is healthy: successfully committed proposal: took = 2.118638ms
+localhost:22379 is healthy: successfully committed proposal: took = 3.631388ms
+localhost:32379 is healthy: successfully committed proposal: took = 2.157051ms
+COMMENT

-$ curl http://localhost:2379/version
-{"etcdserver":"3.3.0","etcdcluster":"3.3.0"}
+curl http://localhost:2379/version
+<<COMMENT
+{"etcdserver":"3.3.5","etcdcluster":"3.3.0"}
+COMMENT
+
+curl http://localhost:22379/version
+<<COMMENT
+{"etcdserver":"3.3.5","etcdcluster":"3.3.0"}
+COMMENT
+
+curl http://localhost:32379/version
+<<COMMENT
+{"etcdserver":"3.3.5","etcdcluster":"3.3.0"}
+COMMENT
 ```

-#### 2. Stop the existing etcd process
+#### Step 2: download snapshot backup from leader
+
+[Download the snapshot backup](../op-guide/maintenance.md#snapshot-backup) to provide a downgrade path should any problems occur.
+
+etcd leader is guaranteed to have the latest application data, thus fetch snapshot from leader:
+
+```bash
+curl -sL http://localhost:2379/metrics | grep etcd_server_is_leader
+<<COMMENT
+# HELP etcd_server_is_leader Whether or not this member is a leader. 1 if is, 0 otherwise.
+# TYPE etcd_server_is_leader gauge
+etcd_server_is_leader 1
+COMMENT
+
+curl -sL http://localhost:22379/metrics | grep etcd_server_is_leader
+<<COMMENT
+etcd_server_is_leader 0
+COMMENT
+
+curl -sL http://localhost:32379/metrics | grep etcd_server_is_leader
+<<COMMENT
+etcd_server_is_leader 0
+COMMENT
+
+etcdctl --endpoints=localhost:2379 snapshot save backup.db
+<<COMMENT
+{"level":"info","ts":1526585787.148433,"caller":"snapshot/v3_snapshot.go:109","msg":"created temporary db file","path":"backup.db.part"}
+{"level":"info","ts":1526585787.1485257,"caller":"snapshot/v3_snapshot.go:120","msg":"fetching snapshot","endpoint":"localhost:2379"}
+{"level":"info","ts":1526585787.1519694,"caller":"snapshot/v3_snapshot.go:133","msg":"fetched snapshot","endpoint":"localhost:2379","took":0.003502721}
+{"level":"info","ts":1526585787.1520295,"caller":"snapshot/v3_snapshot.go:142","msg":"saved","path":"backup.db"}
+Snapshot saved at backup.db
+COMMENT
+```
+
+#### Step 3: stop one existing etcd server

 When each etcd process is stopped, expected errors will be logged by other cluster members. This is normal since a cluster member connection has been (temporarily) broken:

-```
-14:13:31.491746 I | raft: c89feb932daef420 [term 3] received MsgTimeoutNow from 6d4f535bae3ab960 and starts an election to get leadership.
-14:13:31.491769 I | raft: c89feb932daef420 became candidate at term 4
-14:13:31.491788 I | raft: c89feb932daef420 received MsgVoteResp from c89feb932daef420 at term 4
-14:13:31.491797 I | raft: c89feb932daef420 [logterm: 3, index: 9] sent MsgVote request to 6d4f535bae3ab960 at term 4
-14:13:31.491805 I | raft: c89feb932daef420 [logterm: 3, index: 9] sent MsgVote request to 9eda174c7df8a033 at term 4
-14:13:31.491815 I | raft: raft.node: c89feb932daef420 lost leader 6d4f535bae3ab960 at term 4
-14:13:31.524084 I | raft: c89feb932daef420 received MsgVoteResp from 6d4f535bae3ab960 at term 4
-14:13:31.524108 I | raft: c89feb932daef420 [quorum:2] has received 2 MsgVoteResp votes and 0 vote rejections
-14:13:31.524123 I | raft: c89feb932daef420 became leader at term 4
-14:13:31.524136 I | raft: raft.node: c89feb932daef420 elected leader c89feb932daef420 at term 4
-14:13:31.592650 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream MsgApp v2 reader)
-14:13:31.592825 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream Message reader)
-14:13:31.693275 E | rafthttp: failed to dial 6d4f535bae3ab960 on stream Message (dial tcp [::1]:2380: getsockopt: connection refused)
-14:13:31.693289 I | rafthttp: peer 6d4f535bae3ab960 became inactive
-14:13:31.936678 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream Message writer)
+```bash
+10.237579 I | etcdserver: updating the cluster version from 3.0 to 3.3
+10.238315 N | etcdserver/membership: updated the cluster version from 3.0 to 3.3
+10.238451 I | etcdserver/api: enabled capabilities for version 3.3
+
+
+^C21.192174 N | pkg/osutil: received interrupt signal, shutting down...
+21.192459 I | etcdserver: 7339c4e5e833c029 starts leadership transfer from 7339c4e5e833c029 to 729934363faa4a24
+21.192569 I | raft: 7339c4e5e833c029 [term 8] starts to transfer leadership to 729934363faa4a24
+21.192619 I | raft: 7339c4e5e833c029 sends MsgTimeoutNow to 729934363faa4a24 immediately as 729934363faa4a24 already has up-to-date log
+WARNING: 2018/05/17 12:45:21 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp: operation was canceled"; Reconnecting to {localhost:2379 0  <nil>}
+WARNING: 2018/05/17 12:45:21 grpc: addrConn.transportMonitor exits due to: grpc: the connection is closing
+21.193589 I | raft: 7339c4e5e833c029 [term: 8] received a MsgVote message with higher term from 729934363faa4a24 [term: 9]
+21.193626 I | raft: 7339c4e5e833c029 became follower at term 9
+21.193651 I | raft: 7339c4e5e833c029 [logterm: 8, index: 9, vote: 0] cast MsgVote for 729934363faa4a24 [logterm: 8, index: 9] at term 9
+21.193675 I | raft: raft.node: 7339c4e5e833c029 lost leader 7339c4e5e833c029 at term 9
+21.194424 I | raft: raft.node: 7339c4e5e833c029 elected leader 729934363faa4a24 at term 9
+21.292898 I | etcdserver: 7339c4e5e833c029 finished leadership transfer from 7339c4e5e833c029 to 729934363faa4a24 (took 100.436391ms)
+21.292975 I | rafthttp: stopping peer 729934363faa4a24...
+21.293206 I | rafthttp: closed the TCP streaming connection with peer 729934363faa4a24 (stream MsgApp v2 writer)
+21.293225 I | rafthttp: stopped streaming with peer 729934363faa4a24 (writer)
+21.293437 I | rafthttp: closed the TCP streaming connection with peer 729934363faa4a24 (stream Message writer)
+21.293459 I | rafthttp: stopped streaming with peer 729934363faa4a24 (writer)
+21.293514 I | rafthttp: stopped HTTP pipelining with peer 729934363faa4a24
+21.293590 W | rafthttp: lost the TCP streaming connection with peer 729934363faa4a24 (stream MsgApp v2 reader)
+21.293610 I | rafthttp: stopped streaming with peer 729934363faa4a24 (stream MsgApp v2 reader)
+21.293680 W | rafthttp: lost the TCP streaming connection with peer 729934363faa4a24 (stream Message reader)
+21.293700 I | rafthttp: stopped streaming with peer 729934363faa4a24 (stream Message reader)
+21.293711 I | rafthttp: stopped peer 729934363faa4a24
+21.293720 I | rafthttp: stopping peer b548c2511513015...
+21.293987 I | rafthttp: closed the TCP streaming connection with peer b548c2511513015 (stream MsgApp v2 writer)
+21.294063 I | rafthttp: stopped streaming with peer b548c2511513015 (writer)
+21.294467 I | rafthttp: closed the TCP streaming connection with peer b548c2511513015 (stream Message writer)
+21.294561 I | rafthttp: stopped streaming with peer b548c2511513015 (writer)
+21.294742 I | rafthttp: stopped HTTP pipelining with peer b548c2511513015
+21.294867 W | rafthttp: lost the TCP streaming connection with peer b548c2511513015 (stream MsgApp v2 reader)
+21.294892 I | rafthttp: stopped streaming with peer b548c2511513015 (stream MsgApp v2 reader)
+21.294990 W | rafthttp: lost the TCP streaming connection with peer b548c2511513015 (stream Message reader)
+21.295004 E | rafthttp: failed to read b548c2511513015 on stream Message (context canceled)
+21.295013 I | rafthttp: peer b548c2511513015 became inactive
+21.295024 I | rafthttp: stopped streaming with peer b548c2511513015 (stream Message reader)
+21.295035 I | rafthttp: stopped peer b548c2511513015
 ```

-It's a good idea at this point to [backup the etcd data](../op-guide/maintenance.md#snapshot-backup) to provide a downgrade path should any problems occur:
+#### Step 4: restart the etcd server with same configuration

-```
-$ etcdctl snapshot save backup.db
+Restart the etcd server with same configuration but with the new etcd binary.
+
+```diff
+-etcd-old --name s1 \
+etcd-new --name s1 \
+  --data-dir /tmp/etcd/s1 \
+  --listen-client-urls http://localhost:2379 \
+  --advertise-client-urls http://localhost:2379 \
+  --listen-peer-urls http://localhost:2380 \
+  --initial-advertise-peer-urls http://localhost:2380 \
+  --initial-cluster s1=http://localhost:2380,s2=http://localhost:22380,s3=http://localhost:32380 \
+  --initial-cluster-token tkn \
+ --initial-cluster-state new \
+ --logger zap \
+ --log-outputs stderr
 ```

-#### 3. Drop-in etcd v3.4 binary and start the new etcd process
+The new v3.4 etcd will publish its information to the cluster. At this point, cluster still operates as v3.3 protocol, which is the lowest common version.

-The new v3.4 etcd will publish its information to the cluster:
+> `{"level":"info","ts":1526586617.1647713,"caller":"membership/cluster.go:485","msg":"set initial cluster version","cluster-id":"7dee9ba76d59ed53","local-member-id":"7339c4e5e833c029","cluster-version":"3.0"}`

-```
-14:14:25.363225 I | etcdserver: published {Name:s1 ClientURLs:[http://localhost:2379]} to cluster a9ededbffcb1b1f1
-```
+> `{"level":"info","ts":1526586617.1648536,"caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.0"}`
+
+> `{"level":"info","ts":1526586617.1649303,"caller":"membership/cluster.go:473","msg":"updated cluster version","cluster-id":"7dee9ba76d59ed53","local-member-id":"7339c4e5e833c029","from":"3.0","from":"3.3"}`
+
+> `{"level":"info","ts":1526586617.1649797,"caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.3"}`
+
+> `{"level":"info","ts":1526586617.2107732,"caller":"etcdserver/server.go:1770","msg":"published local member to cluster through raft","local-member-id":"7339c4e5e833c029","local-member-attributes":"{Name:s1 ClientURLs:[http://localhost:2379]}","request-path":"/0/members/7339c4e5e833c029/attributes","cluster-id":"7dee9ba76d59ed53","publish-timeout":7}`

 Verify that each member, and then the entire cluster, becomes healthy with the new v3.4 etcd binary:

-```
-$ ETCDCTL_API=3 /etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
-localhost:22379 is healthy: successfully committed proposal: took = 5.540129ms
-localhost:32379 is healthy: successfully committed proposal: took = 7.321771ms
-localhost:2379 is healthy: successfully committed proposal: took = 10.629901ms
+```bash
+etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
+<<COMMENT
+localhost:32379 is healthy: successfully committed proposal: took = 2.337471ms
+localhost:22379 is healthy: successfully committed proposal: took = 1.130717ms
+localhost:2379 is healthy: successfully committed proposal: took = 2.124843ms
+COMMENT
 ```

-Upgraded members will log warnings like the following until the entire cluster is upgraded. This is expected and will cease after all etcd cluster members are upgraded to v3.4:
+Un-upgraded members will log warnings like the following until the entire cluster is upgraded.
+
+This is expected and will cease after all etcd cluster members are upgraded to v3.4:

 ```
-14:15:17.071804 W | etcdserver: member c89feb932daef420 has a higher version 3.4.0
-14:15:21.073110 W | etcdserver: the local etcd version 3.3.0 is not up-to-date
-14:15:21.073142 W | etcdserver: member 6d4f535bae3ab960 has a higher version 3.4.0
-14:15:21.073157 W | etcdserver: the local etcd version 3.3.0 is not up-to-date
-14:15:21.073164 W | etcdserver: member c89feb932daef420 has a higher version 3.4.0
+:41.942121 W | etcdserver: member 7339c4e5e833c029 has a higher version 3.4.0
+:45.945154 W | etcdserver: the local etcd version 3.3.5 is not up-to-date
 ```

-#### 4. Repeat step 2 to step 3 for all other members
-
-#### 5. Finish
+#### Step 5: repeat *step 3* and *step 4* for rest of the members

 When all members are upgraded, the cluster will report upgrading to 3.4 successfully:

-```
-14:15:54.536901 N | etcdserver/membership: updated the cluster version from 3.3 to 3.4
-14:15:54.537035 I | etcdserver/api: enabled capabilities for version 3.4
+Member 1:
+
+> `{"level":"info","ts":1526586949.0920913,"caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.4"}`
+> `{"level":"info","ts":1526586949.0921566,"caller":"etcdserver/server.go:2272","msg":"cluster version is updated","cluster-version":"3.4"}`
+
+Member 2:
+
+> `{"level":"info","ts":1526586949.092117,"caller":"membership/cluster.go:473","msg":"updated cluster version","cluster-id":"7dee9ba76d59ed53","local-member-id":"729934363faa4a24","from":"3.3","from":"3.4"}`
+> `{"level":"info","ts":1526586949.0923078,"caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.4"}`
+
+Member 3:
+
+> `{"level":"info","ts":1526586949.0921423,"caller":"membership/cluster.go:473","msg":"updated cluster version","cluster-id":"7dee9ba76d59ed53","local-member-id":"b548c2511513015","from":"3.3","from":"3.4"}`
+> `{"level":"info","ts":1526586949.0922918,"caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.4"}`
+
+
+```bash
+endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
+<<COMMENT
+localhost:2379 is healthy: successfully committed proposal: took = 492.834µs
+localhost:22379 is healthy: successfully committed proposal: took = 1.015025ms
+localhost:32379 is healthy: successfully committed proposal: took = 1.853077ms
+COMMENT
+
+curl http://localhost:2379/version
+<<COMMENT
+{"etcdserver":"3.4.0","etcdcluster":"3.4.0"}
+COMMENT
+
+curl http://localhost:22379/version
+<<COMMENT
+{"etcdserver":"3.4.0","etcdcluster":"3.4.0"}
+COMMENT
+
+curl http://localhost:32379/version
+<<COMMENT
+{"etcdserver":"3.4.0","etcdcluster":"3.4.0"}
+COMMENT
 ```

-```
-$ ETCDCTL_API=3 /etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
-localhost:2379 is healthy: successfully committed proposal: took = 2.312897ms
-localhost:22379 is healthy: successfully committed proposal: took = 2.553476ms
-localhost:32379 is healthy: successfully committed proposal: took = 2.517902ms
-```
-
-[etcd-contact]: https://groups.google.com/forum/#!forum/etcd-dev
+[etcd-contact]: https://groups.google.com/forum/#!forum/etcd-dev
--- a/Documentation/upgrades/upgrade_3_5.md
+++ b/Documentation/upgrades/upgrade_3_5.md
@ -6,6 +6,8 @@ In the general case, upgrading from etcd 3.4 to 3.5 can be a zero-downtime, roll

 Before [starting an upgrade](#upgrade-procedure), read through the rest of this guide to prepare.

+
+
 ### Upgrade checklists

 **NOTE:** When [migrating from v2 with no v3 data](https://github.com/coreos/etcd/issues/9480), etcd server v3.2+ panics when etcd restores from existing snapshots but no v3 `ETCD_DATA_DIR/member/snap/db` file. This happens when the server had migrated from v2 with no previous v3 data. This also prevents accidental v3 data loss (e.g. `db` file might have been moved). etcd requires that post v3 migration can only happen with v3 data. Do not upgrade to newer v3 versions until v3.0 server contains v3 data.
@ -77,6 +79,8 @@ curl -L http://localhost:2379/v3/kv/put \

 `/v3beta` has been removed in 3.5 release.

+
+
 ### Server upgrade checklists

 #### Upgrade requirements
@ -89,7 +93,7 @@ Also, to ensure a smooth rolling upgrade, the running cluster must be healthy. C

 Before upgrading etcd, always test the services relying on etcd in a staging environment before deploying the upgrade to the production environment.

-Before beginning, [backup the etcd data](../op-guide/maintenance.md#snapshot-backup). Should something go wrong with the upgrade, it is possible to use this backup to [downgrade](#downgrade) back to existing etcd version. Please note that the `snapshot` command only backs up the v3 data. For v2 data, see [backing up v2 datastore](../v2/admin_guide.md#backing-up-the-datastore).
+Before beginning, [download the snapshot backup](../op-guide/maintenance.md#snapshot-backup). Should something go wrong with the upgrade, it is possible to use this backup to [downgrade](#downgrade) back to existing etcd version. Please note that the `snapshot` command only backs up the v3 data. For v2 data, see [backing up v2 datastore](../v2/admin_guide.md#backing-up-the-datastore).

 #### Mixed versions

@ -105,99 +109,222 @@ For a much larger total data size, 100MB or more , this one-time process might t

 #### Downgrade

-If all members have been upgraded to v3.5, the cluster will be upgraded to v3.5, and downgrade from this completed state is **not possible.** If any single member is still v3.4, however, the cluster and its operations remains "v3.4", and it is possible from this mixed cluster state to return to using a v3.4 etcd binary on all members.
+If all members have been upgraded to v3.5, the cluster will be upgraded to v3.5, and downgrade from this completed state is **not possible**. If any single member is still v3.4, however, the cluster and its operations remains "v3.4", and it is possible from this mixed cluster state to return to using a v3.4 etcd binary on all members.

-Please [backup the data directory](../op-guide/maintenance.md#snapshot-backup) of all etcd members to make downgrading the cluster possible even after it has been completely upgraded.
+Please [download the snapshot backup](../op-guide/maintenance.md#snapshot-backup) to make downgrading the cluster possible even after it has been completely upgraded.

 ### Upgrade procedure

 This example shows how to upgrade a 3-member v3.4 ectd cluster running on a local machine.

-#### 1. Check upgrade requirements
+#### Step 1: check upgrade requirements

 Is the cluster healthy and running v3.4.x?

-```
-$ ETCDCTL_API=3 etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
-localhost:2379 is healthy: successfully committed proposal: took = 6.600684ms
-localhost:22379 is healthy: successfully committed proposal: took = 8.540064ms
-localhost:32379 is healthy: successfully committed proposal: took = 8.763432ms
+```bash
+etcdctl --endpoints=localhost:2379,localhost:22379,localhost:32379 endpoint health
+<<COMMENT
+localhost:2379 is healthy: successfully committed proposal: took = 2.118638ms
+localhost:22379 is healthy: successfully committed proposal: took = 3.631388ms
+localhost:32379 is healthy: successfully committed proposal: took = 2.157051ms
+COMMENT

-$ curl http://localhost:2379/version
+curl http://localhost:2379/version
+<<COMMENT
 {"etcdserver":"3.4.0","etcdcluster":"3.4.0"}
+COMMENT
+
+curl http://localhost:22379/version
+<<COMMENT
+{"etcdserver":"3.4.0","etcdcluster":"3.4.0"}
+COMMENT
+
+curl http://localhost:32379/version
+<<COMMENT
+{"etcdserver":"3.4.0","etcdcluster":"3.4.0"}
+COMMENT
 ```

-#### 2. Stop the existing etcd process
+#### Step 2: download snapshot backup from leader
+
+[Download the snapshot backup](../op-guide/maintenance.md#snapshot-backup) to provide a downgrade path should any problems occur.
+
+etcd leader is guaranteed to have the latest application data, thus fetch snapshot from leader:
+
+```bash
+curl -sL http://localhost:2379/metrics | grep etcd_server_is_leader
+<<COMMENT
+# HELP etcd_server_is_leader Whether or not this member is a leader. 1 if is, 0 otherwise.
+# TYPE etcd_server_is_leader gauge
+etcd_server_is_leader 1
+COMMENT
+
+curl -sL http://localhost:22379/metrics | grep etcd_server_is_leader
+<<COMMENT
+etcd_server_is_leader 0
+COMMENT
+
+curl -sL http://localhost:32379/metrics | grep etcd_server_is_leader
+<<COMMENT
+etcd_server_is_leader 0
+COMMENT
+
+etcdctl --endpoints=localhost:2379 snapshot save backup.db
+<<COMMENT
+{"level":"info","ts":1526585787.148433,"caller":"snapshot/v3_snapshot.go:109","msg":"created temporary db file","path":"backup.db.part"}
+{"level":"info","ts":1526585787.1485257,"caller":"snapshot/v3_snapshot.go:120","msg":"fetching snapshot","endpoint":"localhost:2379"}
+{"level":"info","ts":1526585787.1519694,"caller":"snapshot/v3_snapshot.go:133","msg":"fetched snapshot","endpoint":"localhost:2379","took":0.003502721}
+{"level":"info","ts":1526585787.1520295,"caller":"snapshot/v3_snapshot.go:142","msg":"saved","path":"backup.db"}
+Snapshot saved at backup.db
+COMMENT
+```
+
+#### Step 3: stop one existing etcd server

 When each etcd process is stopped, expected errors will be logged by other cluster members. This is normal since a cluster member connection has been (temporarily) broken:

-```
-14:13:31.491746 I | raft: c89feb932daef420 [term 3] received MsgTimeoutNow from 6d4f535bae3ab960 and starts an election to get leadership.
-14:13:31.491769 I | raft: c89feb932daef420 became candidate at term 4
-14:13:31.491788 I | raft: c89feb932daef420 received MsgVoteResp from c89feb932daef420 at term 4
-14:13:31.491797 I | raft: c89feb932daef420 [logterm: 3, index: 9] sent MsgVote request to 6d4f535bae3ab960 at term 4
-14:13:31.491805 I | raft: c89feb932daef420 [logterm: 3, index: 9] sent MsgVote request to 9eda174c7df8a033 at term 4
-14:13:31.491815 I | raft: raft.node: c89feb932daef420 lost leader 6d4f535bae3ab960 at term 4
-14:13:31.524084 I | raft: c89feb932daef420 received MsgVoteResp from 6d4f535bae3ab960 at term 4
-14:13:31.524108 I | raft: c89feb932daef420 [quorum:2] has received 2 MsgVoteResp votes and 0 vote rejections
-14:13:31.524123 I | raft: c89feb932daef420 became leader at term 4
-14:13:31.524136 I | raft: raft.node: c89feb932daef420 elected leader c89feb932daef420 at term 4
-14:13:31.592650 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream MsgApp v2 reader)
-14:13:31.592825 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream Message reader)
-14:13:31.693275 E | rafthttp: failed to dial 6d4f535bae3ab960 on stream Message (dial tcp [::1]:2380: getsockopt: connection refused)
-14:13:31.693289 I | rafthttp: peer 6d4f535bae3ab960 became inactive
-14:13:31.936678 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream Message writer)
+```bash
+{"level":"info","ts":1526587281.2001143,"caller":"etcdserver/server.go:2249","msg":"updating cluster version","from":"3.0","to":"3.4"}
+{"level":"info","ts":1526587281.2010646,"caller":"membership/cluster.go:473","msg":"updated cluster version","cluster-id":"7dee9ba76d59ed53","local-member-id":"7339c4e5e833c029","from":"3.0","from":"3.4"}
+{"level":"info","ts":1526587281.2012327,"caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.4"}
+{"level":"info","ts":1526587281.2013083,"caller":"etcdserver/server.go:2272","msg":"cluster version is updated","cluster-version":"3.4"}
+
+
+
+^C{"level":"info","ts":1526587299.0717514,"caller":"osutil/interrupt_unix.go:63","msg":"received signal; shutting down","signal":"interrupt"}
+{"level":"info","ts":1526587299.0718873,"caller":"embed/etcd.go:285","msg":"closing etcd server","name":"s1","data-dir":"/tmp/etcd/s1","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://localhost:2379"]}
+{"level":"info","ts":1526587299.0722554,"caller":"etcdserver/server.go:1341","msg":"leadership transfer starting","local-member-id":"7339c4e5e833c029","current-leader-member-id":"7339c4e5e833c029","transferee-member-id":"729934363faa4a24"}
+{"level":"info","ts":1526587299.0723994,"caller":"raft/raft.go:1107","msg":"7339c4e5e833c029 [term 3] starts to transfer leadership to 729934363faa4a24"}
+{"level":"info","ts":1526587299.0724802,"caller":"raft/raft.go:1113","msg":"7339c4e5e833c029 sends MsgTimeoutNow to 729934363faa4a24 immediately as 729934363faa4a24 already has up-to-date log"}
+{"level":"info","ts":1526587299.0737045,"caller":"raft/raft.go:797","msg":"7339c4e5e833c029 [term: 3] received a MsgVote message with higher term from 729934363faa4a24 [term: 4]"}
+{"level":"info","ts":1526587299.0737681,"caller":"raft/raft.go:656","msg":"7339c4e5e833c029 became follower at term 4"}
+{"level":"info","ts":1526587299.073831,"caller":"raft/raft.go:882","msg":"7339c4e5e833c029 [logterm: 3, index: 9, vote: 0] cast MsgVote for 729934363faa4a24 [logterm: 3, index: 9] at term 4"}
+{"level":"info","ts":1526587299.0738947,"caller":"raft/node.go:312","msg":"raft.node: 7339c4e5e833c029 lost leader 7339c4e5e833c029 at term 4"}
+{"level":"info","ts":1526587299.0748374,"caller":"raft/node.go:306","msg":"raft.node: 7339c4e5e833c029 elected leader 729934363faa4a24 at term 4"}
+{"level":"info","ts":1526587299.1726425,"caller":"etcdserver/server.go:1362","msg":"leadership transfer finished","local-member-id":"7339c4e5e833c029","old-leader-member-id":"7339c4e5e833c029","new-leader-member-id":"729934363faa4a24","took":0.100389359}
+{"level":"info","ts":1526587299.1728148,"caller":"rafthttp/peer.go:333","msg":"stopping remote peer","remote-peer-id":"b548c2511513015"}
+{"level":"warn","ts":1526587299.1751974,"caller":"rafthttp/stream.go:291","msg":"closed TCP streaming connection with remote peer","stream-writer-type":"stream MsgApp v2","remote-peer-id":"b548c2511513015"}
+{"level":"warn","ts":1526587299.1752589,"caller":"rafthttp/stream.go:301","msg":"stopped TCP streaming connection with remote peer","stream-writer-type":"stream MsgApp v2","remote-peer-id":"b548c2511513015"}
+{"level":"warn","ts":1526587299.177348,"caller":"rafthttp/stream.go:291","msg":"closed TCP streaming connection with remote peer","stream-writer-type":"stream Message","remote-peer-id":"b548c2511513015"}
+{"level":"warn","ts":1526587299.1774004,"caller":"rafthttp/stream.go:301","msg":"stopped TCP streaming connection with remote peer","stream-writer-type":"stream Message","remote-peer-id":"b548c2511513015"}
+{"level":"info","ts":1526587299.177515,"caller":"rafthttp/pipeline.go:86","msg":"stopped HTTP pipelining with remote peer","local-member-id":"7339c4e5e833c029","remote-peer-id":"b548c2511513015"}
+{"level":"warn","ts":1526587299.1777067,"caller":"rafthttp/stream.go:436","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"stream MsgApp v2","local-member-id":"7339c4e5e833c029","remote-peer-id":"b548c2511513015","error":"read tcp 127.0.0.1:34636->127.0.0.1:32380: use of closed network connection"}
+{"level":"info","ts":1526587299.1778402,"caller":"rafthttp/stream.go:459","msg":"stopped stream reader with remote peer","stream-reader-type":"stream MsgApp v2","local-member-id":"7339c4e5e833c029","remote-peer-id":"b548c2511513015"}
+{"level":"warn","ts":1526587299.1780295,"caller":"rafthttp/stream.go:436","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"stream Message","local-member-id":"7339c4e5e833c029","remote-peer-id":"b548c2511513015","error":"read tcp 127.0.0.1:34634->127.0.0.1:32380: use of closed network connection"}
+{"level":"info","ts":1526587299.1780987,"caller":"rafthttp/stream.go:459","msg":"stopped stream reader with remote peer","stream-reader-type":"stream Message","local-member-id":"7339c4e5e833c029","remote-peer-id":"b548c2511513015"}
+{"level":"info","ts":1526587299.1781602,"caller":"rafthttp/peer.go:340","msg":"stopped remote peer","remote-peer-id":"b548c2511513015"}
+{"level":"info","ts":1526587299.1781986,"caller":"rafthttp/peer.go:333","msg":"stopping remote peer","remote-peer-id":"729934363faa4a24"}
+{"level":"warn","ts":1526587299.1802843,"caller":"rafthttp/stream.go:291","msg":"closed TCP streaming connection with remote peer","stream-writer-type":"stream MsgApp v2","remote-peer-id":"729934363faa4a24"}
+{"level":"warn","ts":1526587299.1803446,"caller":"rafthttp/stream.go:301","msg":"stopped TCP streaming connection with remote peer","stream-writer-type":"stream MsgApp v2","remote-peer-id":"729934363faa4a24"}
+{"level":"warn","ts":1526587299.1824749,"caller":"rafthttp/stream.go:291","msg":"closed TCP streaming connection with remote peer","stream-writer-type":"stream Message","remote-peer-id":"729934363faa4a24"}
+{"level":"warn","ts":1526587299.18255,"caller":"rafthttp/stream.go:301","msg":"stopped TCP streaming connection with remote peer","stream-writer-type":"stream Message","remote-peer-id":"729934363faa4a24"}
+{"level":"info","ts":1526587299.18261,"caller":"rafthttp/pipeline.go:86","msg":"stopped HTTP pipelining with remote peer","local-member-id":"7339c4e5e833c029","remote-peer-id":"729934363faa4a24"}
+{"level":"warn","ts":1526587299.1827736,"caller":"rafthttp/stream.go:436","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"stream MsgApp v2","local-member-id":"7339c4e5e833c029","remote-peer-id":"729934363faa4a24","error":"read tcp 127.0.0.1:51482->127.0.0.1:22380: use of closed network connection"}
+{"level":"info","ts":1526587299.182845,"caller":"rafthttp/stream.go:459","msg":"stopped stream reader with remote peer","stream-reader-type":"stream MsgApp v2","local-member-id":"7339c4e5e833c029","remote-peer-id":"729934363faa4a24"}
+{"level":"warn","ts":1526587299.1830168,"caller":"rafthttp/stream.go:436","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"stream Message","local-member-id":"7339c4e5e833c029","remote-peer-id":"729934363faa4a24","error":"context canceled"}
+{"level":"warn","ts":1526587299.1831107,"caller":"rafthttp/peer_status.go:65","msg":"peer became inactive","peer-id":"729934363faa4a24","error":"failed to read 729934363faa4a24 on stream Message (context canceled)"}
+{"level":"info","ts":1526587299.1831737,"caller":"rafthttp/stream.go:459","msg":"stopped stream reader with remote peer","stream-reader-type":"stream Message","local-member-id":"7339c4e5e833c029","remote-peer-id":"729934363faa4a24"}
+{"level":"info","ts":1526587299.1832306,"caller":"rafthttp/peer.go:340","msg":"stopped remote peer","remote-peer-id":"729934363faa4a24"}
+{"level":"warn","ts":1526587299.1837125,"caller":"rafthttp/http.go:424","msg":"failed to find remote peer in cluster","local-member-id":"7339c4e5e833c029","remote-peer-id-stream-handler":"7339c4e5e833c029","remote-peer-id-from":"b548c2511513015","cluster-id":"7dee9ba76d59ed53"}
+{"level":"warn","ts":1526587299.1840093,"caller":"rafthttp/http.go:424","msg":"failed to find remote peer in cluster","local-member-id":"7339c4e5e833c029","remote-peer-id-stream-handler":"7339c4e5e833c029","remote-peer-id-from":"b548c2511513015","cluster-id":"7dee9ba76d59ed53"}
+{"level":"warn","ts":1526587299.1842315,"caller":"rafthttp/http.go:424","msg":"failed to find remote peer in cluster","local-member-id":"7339c4e5e833c029","remote-peer-id-stream-handler":"7339c4e5e833c029","remote-peer-id-from":"729934363faa4a24","cluster-id":"7dee9ba76d59ed53"}
+{"level":"warn","ts":1526587299.1844475,"caller":"rafthttp/http.go:424","msg":"failed to find remote peer in cluster","local-member-id":"7339c4e5e833c029","remote-peer-id-stream-handler":"7339c4e5e833c029","remote-peer-id-from":"729934363faa4a24","cluster-id":"7dee9ba76d59ed53"}
+{"level":"info","ts":1526587299.2056687,"caller":"embed/etcd.go:473","msg":"stopping serving peer traffic","address":"127.0.0.1:2380"}
+{"level":"info","ts":1526587299.205819,"caller":"embed/etcd.go:480","msg":"stopped serving peer traffic","address":"127.0.0.1:2380"}
+{"level":"info","ts":1526587299.2058413,"caller":"embed/etcd.go:289","msg":"closed etcd server","name":"s1","data-dir":"/tmp/etcd/s1","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://localhost:2379"]}
 ```

-It's a good idea at this point to [backup the etcd data](../op-guide/maintenance.md#snapshot-backup) to provide a downgrade path should any problems occur:
+#### Step 4: restart the etcd server with same configuration

-```
-$ etcdctl snapshot save backup.db
+Restart the etcd server with same configuration but with the new etcd binary.
+
+```diff
+-etcd-old --name s1 \
+etcd-new --name s1 \
+  --data-dir /tmp/etcd/s1 \
+  --listen-client-urls http://localhost:2379 \
+  --advertise-client-urls http://localhost:2379 \
+  --listen-peer-urls http://localhost:2380 \
+  --initial-advertise-peer-urls http://localhost:2380 \
+  --initial-cluster s1=http://localhost:2380,s2=http://localhost:22380,s3=http://localhost:32380 \
+  --initial-cluster-token tkn \
+  --initial-cluster-state new
 ```

-#### 3. Drop-in etcd v3.5 binary and start the new etcd process
+The new v3.5 etcd will publish its information to the cluster. At this point, cluster still operates as v3.4 protocol, which is the lowest common version.

-The new v3.5 etcd will publish its information to the cluster:
+> `{"level":"info","ts":1526586617.1647713,"caller":"membership/cluster.go:485","msg":"set initial cluster version","cluster-id":"7dee9ba76d59ed53","local-member-id":"7339c4e5e833c029","cluster-version":"3.0"}`

-```
-14:14:25.363225 I | etcdserver: published {Name:s1 ClientURLs:[http://localhost:2379]} to cluster a9ededbffcb1b1f1
-```
+> `{"level":"info","ts":1526586617.1648536,"caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.0"}`
+
+> `{"level":"info","ts":1526586617.1649303,"caller":"membership/cluster.go:473","msg":"updated cluster version","cluster-id":"7dee9ba76d59ed53","local-member-id":"7339c4e5e833c029","from":"3.0","from":"3.4"}`
+
+> `{"level":"info","ts":1526586617.1649797,"caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.4"}`
+
+> `{"level":"info","ts":1526586617.2107732,"caller":"etcdserver/server.go:1770","msg":"published local member to cluster through raft","local-member-id":"7339c4e5e833c029","local-member-attributes":"{Name:s1 ClientURLs:[http://localhost:2379]}","request-path":"/0/members/7339c4e5e833c029/attributes","cluster-id":"7dee9ba76d59ed53","publish-timeout":7}`

 Verify that each member, and then the entire cluster, becomes healthy with the new v3.5 etcd binary:

-```
-$ ETCDCTL_API=3 /etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
-localhost:22379 is healthy: successfully committed proposal: took = 5.540129ms
-localhost:32379 is healthy: successfully committed proposal: took = 7.321771ms
-localhost:2379 is healthy: successfully committed proposal: took = 10.629901ms
+```bash
+etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
+<<COMMENT
+localhost:32379 is healthy: successfully committed proposal: took = 2.337471ms
+localhost:22379 is healthy: successfully committed proposal: took = 1.130717ms
+localhost:2379 is healthy: successfully committed proposal: took = 2.124843ms
+COMMENT
 ```

-Upgraded members will log warnings like the following until the entire cluster is upgraded. This is expected and will cease after all etcd cluster members are upgraded to v3.5:
+Un-upgraded members will log warnings like the following until the entire cluster is upgraded.
+
+This is expected and will cease after all etcd cluster members are upgraded to v3.5:

 ```
-14:15:17.071804 W | etcdserver: member c89feb932daef420 has a higher version 3.5.0
-14:15:21.073110 W | etcdserver: the local etcd version 3.4.0 is not up-to-date
-14:15:21.073142 W | etcdserver: member 6d4f535bae3ab960 has a higher version 3.5.0
-14:15:21.073157 W | etcdserver: the local etcd version 3.4.0 is not up-to-date
-14:15:21.073164 W | etcdserver: member c89feb932daef420 has a higher version 3.5.0
+:41.942121 W | etcdserver: member 7339c4e5e833c029 has a higher version 3.5.0
+:45.945154 W | etcdserver: the local etcd version 3.4.0 is not up-to-date
 ```

-#### 4. Repeat step 2 to step 3 for all other members
-
-#### 5. Finish
+#### Step 5: repeat *step 3* and *step 4* for rest of the members

 When all members are upgraded, the cluster will report upgrading to 3.5 successfully:

-```
-14:15:54.536901 N | etcdserver/membership: updated the cluster version from 3.4 to 3.5
-14:15:54.537035 I | etcdserver/api: enabled capabilities for version 3.5
-```
+Member 1:

-```
-$ ETCDCTL_API=3 /etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
-localhost:2379 is healthy: successfully committed proposal: took = 2.312897ms
-localhost:22379 is healthy: successfully committed proposal: took = 2.553476ms
-localhost:32379 is healthy: successfully committed proposal: took = 2.517902ms
+> `{"level":"info","ts":1526586949.0920913,"caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.5"}`
+> `{"level":"info","ts":1526586949.0921566,"caller":"etcdserver/server.go:2272","msg":"cluster version is updated","cluster-version":"3.5"}`
+
+Member 2:
+
+> `{"level":"info","ts":1526586949.092117,"caller":"membership/cluster.go:473","msg":"updated cluster version","cluster-id":"7dee9ba76d59ed53","local-member-id":"729934363faa4a24","from":"3.4","from":"3.5"}`
+> `{"level":"info","ts":1526586949.0923078,"caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.5"}`
+
+Member 3:
+
+> `{"level":"info","ts":1526586949.0921423,"caller":"membership/cluster.go:473","msg":"updated cluster version","cluster-id":"7dee9ba76d59ed53","local-member-id":"b548c2511513015","from":"3.4","from":"3.5"}`
+> `{"level":"info","ts":1526586949.0922918,"caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.5"}`
+
+
+```bash
+endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
+<<COMMENT
+localhost:2379 is healthy: successfully committed proposal: took = 492.834µs
+localhost:22379 is healthy: successfully committed proposal: took = 1.015025ms
+localhost:32379 is healthy: successfully committed proposal: took = 1.853077ms
+COMMENT
+
+curl http://localhost:2379/version
+<<COMMENT
+{"etcdserver":"3.5.0","etcdcluster":"3.5.0"}
+COMMENT
+
+curl http://localhost:22379/version
+<<COMMENT
+{"etcdserver":"3.5.0","etcdcluster":"3.5.0"}
+COMMENT
+
+curl http://localhost:32379/version
+<<COMMENT
+{"etcdserver":"3.5.0","etcdcluster":"3.5.0"}
+COMMENT
 ```

 [etcd-contact]: https://groups.google.com/forum/#!forum/etcd-dev