mirror of
https://github.com/etcd-io/etcd.git
synced 2024-09-27 06:25:44 +00:00
Documentation/upgrades: backport all upgrade guides
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
This commit is contained in:
parent
8eeab582d0
commit
35c6ea7a67
@ -8,9 +8,11 @@ Before [starting an upgrade](#upgrade-procedure), read through the rest of this
|
||||
|
||||
### Upgrade checklists
|
||||
|
||||
**NOTE:** When [migrating from v2 with no v3 data](https://github.com/coreos/etcd/issues/9480), etcd server v3.2+ panics when etcd restores from existing snapshots but no v3 `ETCD_DATA_DIR/member/snap/db` file. This happens when the server had migrated from v2 with no previous v3 data. This also prevents accidental v3 data loss (e.g. `db` file might have been moved). etcd requires that post v3 migration can only happen with v3 data. Do not upgrade to newer v3 versions until v3.0 server contains v3 data.
|
||||
|
||||
#### Upgrade requirements
|
||||
|
||||
To upgrade an existing etcd deployment to 3.0, the running cluster must be 2.3 or greater. If it's before 2.3, please upgrade to [2.3](https://github.com/coreos/etcd/releases/tag/v2.3.0) before upgrading to 3.0.
|
||||
To upgrade an existing etcd deployment to 3.0, the running cluster must be 2.3 or greater. If it's before 2.3, please upgrade to [2.3](https://github.com/coreos/etcd/releases/tag/v2.3.8) before upgrading to 3.0.
|
||||
|
||||
Also, to ensure a smooth rolling upgrade, the running cluster must be healthy. Check the health of the cluster by using the `etcdctl cluster-health` command before proceeding.
|
||||
|
||||
@ -52,7 +54,7 @@ member 8211f1d0f64f3269 is healthy: got healthy result from http://localhost:123
|
||||
cluster is healthy
|
||||
|
||||
$ curl http://localhost:2379/version
|
||||
{"etcdserver":"2.3.x","etcdcluster":"2.3.0"}
|
||||
{"etcdserver":"2.3.x","etcdcluster":"2.3.8"}
|
||||
```
|
||||
|
||||
#### 2. Stop the existing etcd process
|
||||
@ -116,4 +118,14 @@ $ ETCDCTL_API=3 etcdctl endpoint health
|
||||
127.0.0.1:22379 is healthy: successfully committed proposal: took = 18.513301ms
|
||||
```
|
||||
|
||||
## Further considerations
|
||||
|
||||
- etcdctl environment variables have been updated. If `ETCDCTL_API=2 etcdctl cluster-health` works properly but `ETCDCTL_API=3 etcdctl endpoints health` responds with `Error: grpc: timed out when dialing`, be sure to use the [new variable names](https://github.com/coreos/etcd/tree/master/etcdctl#etcdctl).
|
||||
|
||||
## Known Issues
|
||||
|
||||
- etcd < v3.1 does not work properly if built with Go > v1.7. See [Issue 6951](https://github.com/coreos/etcd/issues/6951) for additional information.
|
||||
- If an error such as `transport: http2Client.notifyError got notified that the client transport was broken unexpected EOF.` shows up in the etcd server logs, be sure etcd is a pre-built release or built with (etcd v3.1+ & go v1.7+) or (etcd <v3.1 & go v1.6.x).
|
||||
- Adding a v3 node to v2.3 cluster during upgrades is not supported and could trigger panics. See [Issue 7249](https://github.com/coreos/etcd/issues/7429) for additional information. Mixed versions of etcd members are only allowed during v3 migration. Finish upgrades before making any membership changes.
|
||||
|
||||
[etcd-contact]: https://groups.google.com/forum/#!forum/etcd-dev
|
||||
|
@ -8,9 +8,20 @@ Before [starting an upgrade](#upgrade-procedure), read through the rest of this
|
||||
|
||||
### Upgrade checklists
|
||||
|
||||
**NOTE:** When [migrating from v2 with no v3 data](https://github.com/coreos/etcd/issues/9480), etcd server v3.2+ panics when etcd restores from existing snapshots but no v3 `ETCD_DATA_DIR/member/snap/db` file. This happens when the server had migrated from v2 with no previous v3 data. This also prevents accidental v3 data loss (e.g. `db` file might have been moved). etcd requires that post v3 migration can only happen with v3 data. Do not upgrade to newer v3 versions until v3.0 server contains v3 data.
|
||||
|
||||
#### Monitoring
|
||||
|
||||
Following metrics from v3.0.x have been deprecated in favor of [go-grpc-prometheus](https://github.com/grpc-ecosystem/go-grpc-prometheus):
|
||||
|
||||
- `etcd_grpc_requests_total`
|
||||
- `etcd_grpc_requests_failed_total`
|
||||
- `etcd_grpc_active_streams`
|
||||
- `etcd_grpc_unary_requests_duration_seconds`
|
||||
|
||||
#### Upgrade requirements
|
||||
|
||||
To upgrade an existing etcd deployment to 3.1, the running cluster must be 3.0 or greater. If it's before 3.0, please upgrade to [3.0](https://github.com/coreos/etcd/releases/tag/v3.0.16) before upgrading to 3.1.
|
||||
To upgrade an existing etcd deployment to 3.1, the running cluster must be 3.0 or greater. If it's before 3.0, please [upgrade to 3.0](upgrade_3_0.md) before upgrading to 3.1.
|
||||
|
||||
Also, to ensure a smooth rolling upgrade, the running cluster must be healthy. Check the health of the cluster by using the `etcdctl endpoint health` command before proceeding.
|
||||
|
||||
|
338
Documentation/upgrades/upgrade_3_2.md
Normal file
338
Documentation/upgrades/upgrade_3_2.md
Normal file
@ -0,0 +1,338 @@
|
||||
## Upgrade etcd from 3.1 to 3.2
|
||||
|
||||
In the general case, upgrading from etcd 3.1 to 3.2 can be a zero-downtime, rolling upgrade:
|
||||
- one by one, stop the etcd v3.1 processes and replace them with etcd v3.2 processes
|
||||
- after running all v3.2 processes, new features in v3.2 are available to the cluster
|
||||
|
||||
Before [starting an upgrade](#upgrade-procedure), read through the rest of this guide to prepare.
|
||||
|
||||
### Upgrade checklists
|
||||
|
||||
**NOTE:** When [migrating from v2 with no v3 data](https://github.com/coreos/etcd/issues/9480), etcd server v3.2+ panics when etcd restores from existing snapshots but no v3 `ETCD_DATA_DIR/member/snap/db` file. This happens when the server had migrated from v2 with no previous v3 data. This also prevents accidental v3 data loss (e.g. `db` file might have been moved). etcd requires that post v3 migration can only happen with v3 data. Do not upgrade to newer v3 versions until v3.0 server contains v3 data.
|
||||
|
||||
Highlighted breaking changes in 3.2.
|
||||
|
||||
#### Change in default `snapshot-count` value
|
||||
|
||||
The default value of `--snapshot-count` has [changed from from 10,000 to 100,000](https://github.com/coreos/etcd/pull/7160). Higher snapshot count means it holds Raft entries in memory for longer before discarding old entries. It is a trade-off between less frequent snapshotting and [higher memory usage](https://github.com/kubernetes/kubernetes/issues/60589#issuecomment-371977156). Higher `--snapshot-count` will be manifested with higher memory usage, while retaining more Raft entries helps with the availabilities of slow followers: leader is still able to replicate its logs to followers, rather than forcing followers to rebuild its stores from leader snapshots.
|
||||
|
||||
#### Change in gRPC dependency (>=3.2.10)
|
||||
|
||||
3.2.10 or later now requires [grpc/grpc-go](https://github.com/grpc/grpc-go/releases) `v1.7.5` (<=3.2.9 requires `v1.2.1`).
|
||||
|
||||
##### Deprecate `grpclog.Logger`
|
||||
|
||||
`grpclog.Logger` has been deprecated in favor of [`grpclog.LoggerV2`](https://github.com/grpc/grpc-go/blob/master/grpclog/loggerv2.go). `clientv3.Logger` is now `grpclog.LoggerV2`.
|
||||
|
||||
Before
|
||||
|
||||
```go
|
||||
import "github.com/coreos/etcd/clientv3"
|
||||
clientv3.SetLogger(log.New(os.Stderr, "grpc: ", 0))
|
||||
```
|
||||
|
||||
After
|
||||
|
||||
```go
|
||||
import "github.com/coreos/etcd/clientv3"
|
||||
import "google.golang.org/grpc/grpclog"
|
||||
clientv3.SetLogger(grpclog.NewLoggerV2(os.Stderr, os.Stderr, os.Stderr))
|
||||
|
||||
// log.New above cannot be used (not implement grpclog.LoggerV2 interface)
|
||||
```
|
||||
|
||||
##### Deprecate `grpc.ErrClientConnTimeout`
|
||||
|
||||
Previously, `grpc.ErrClientConnTimeout` error is returned on client dial time-outs. 3.2 instead returns `context.DeadlineExceeded` (see [#8504](https://github.com/coreos/etcd/issues/8504)).
|
||||
|
||||
Before
|
||||
|
||||
```go
|
||||
// expect dial time-out on ipv4 blackhole
|
||||
_, err := clientv3.New(clientv3.Config{
|
||||
Endpoints: []string{"http://254.0.0.1:12345"},
|
||||
DialTimeout: 2 * time.Second
|
||||
})
|
||||
if err == grpc.ErrClientConnTimeout {
|
||||
// handle errors
|
||||
}
|
||||
```
|
||||
|
||||
After
|
||||
|
||||
```go
|
||||
_, err := clientv3.New(clientv3.Config{
|
||||
Endpoints: []string{"http://254.0.0.1:12345"},
|
||||
DialTimeout: 2 * time.Second
|
||||
})
|
||||
if err == context.DeadlineExceeded {
|
||||
// handle errors
|
||||
}
|
||||
```
|
||||
|
||||
#### Change in maximum request size limits (>=3.2.10)
|
||||
|
||||
3.2.10 and 3.2.11 allow custom request size limits in server side. >=3.2.12 allows custom request size limits for both server and **client side**. In previous versions(v3.2.10, v3.2.11), client response size was limited to only 4 MiB.
|
||||
|
||||
Server-side request limits can be configured with `--max-request-bytes` flag:
|
||||
|
||||
```bash
|
||||
# limits request size to 1.5 KiB
|
||||
etcd --max-request-bytes 1536
|
||||
|
||||
# client writes exceeding 1.5 KiB will be rejected
|
||||
etcdctl put foo [LARGE VALUE...]
|
||||
# etcdserver: request is too large
|
||||
```
|
||||
|
||||
Or configure `embed.Config.MaxRequestBytes` field:
|
||||
|
||||
```go
|
||||
import "github.com/coreos/etcd/embed"
|
||||
import "github.com/coreos/etcd/etcdserver/api/v3rpc/rpctypes"
|
||||
|
||||
// limit requests to 5 MiB
|
||||
cfg := embed.NewConfig()
|
||||
cfg.MaxRequestBytes = 5 * 1024 * 1024
|
||||
|
||||
// client writes exceeding 5 MiB will be rejected
|
||||
_, err := cli.Put(ctx, "foo", [LARGE VALUE...])
|
||||
err == rpctypes.ErrRequestTooLarge
|
||||
```
|
||||
|
||||
**If not specified, server-side limit defaults to 1.5 MiB**.
|
||||
|
||||
Client-side request limits must be configured based on server-side limits.
|
||||
|
||||
```bash
|
||||
# limits request size to 1 MiB
|
||||
etcd --max-request-bytes 1048576
|
||||
```
|
||||
|
||||
```go
|
||||
import "github.com/coreos/etcd/clientv3"
|
||||
|
||||
cli, _ := clientv3.New(clientv3.Config{
|
||||
Endpoints: []string{"127.0.0.1:2379"},
|
||||
MaxCallSendMsgSize: 2 * 1024 * 1024,
|
||||
MaxCallRecvMsgSize: 3 * 1024 * 1024,
|
||||
})
|
||||
|
||||
|
||||
// client writes exceeding "--max-request-bytes" will be rejected from etcd server
|
||||
_, err := cli.Put(ctx, "foo", strings.Repeat("a", 1*1024*1024+5))
|
||||
err == rpctypes.ErrRequestTooLarge
|
||||
|
||||
|
||||
// client writes exceeding "MaxCallSendMsgSize" will be rejected from client-side
|
||||
_, err = cli.Put(ctx, "foo", strings.Repeat("a", 5*1024*1024))
|
||||
err.Error() == "rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (5242890 vs. 2097152)"
|
||||
|
||||
|
||||
// some writes under limits
|
||||
for i := range []int{0,1,2,3,4} {
|
||||
_, err = cli.Put(ctx, fmt.Sprintf("foo%d", i), strings.Repeat("a", 1*1024*1024-500))
|
||||
if err != nil {
|
||||
panic(err)
|
||||
}
|
||||
}
|
||||
// client reads exceeding "MaxCallRecvMsgSize" will be rejected from client-side
|
||||
_, err = cli.Get(ctx, "foo", clientv3.WithPrefix())
|
||||
err.Error() == "rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5240509 vs. 3145728)"
|
||||
```
|
||||
|
||||
**If not specified, client-side send limit defaults to 2 MiB (1.5 MiB + gRPC overhead bytes) and receive limit to `math.MaxInt32`**. Please see [clientv3 godoc](https://godoc.org/github.com/coreos/etcd/clientv3#Config) for more detail.
|
||||
|
||||
#### Change in raw gRPC client wrappers
|
||||
|
||||
3.2.12 or later changes the function signatures of `clientv3` gRPC client wrapper. This change was needed to support [custom `grpc.CallOption` on message size limits](https://github.com/coreos/etcd/pull/9047).
|
||||
|
||||
Before and after
|
||||
|
||||
```diff
|
||||
-func NewKVFromKVClient(remote pb.KVClient) KV {
|
||||
+func NewKVFromKVClient(remote pb.KVClient, c *Client) KV {
|
||||
|
||||
-func NewClusterFromClusterClient(remote pb.ClusterClient) Cluster {
|
||||
+func NewClusterFromClusterClient(remote pb.ClusterClient, c *Client) Cluster {
|
||||
|
||||
-func NewLeaseFromLeaseClient(remote pb.LeaseClient, keepAliveTimeout time.Duration) Lease {
|
||||
+func NewLeaseFromLeaseClient(remote pb.LeaseClient, c *Client, keepAliveTimeout time.Duration) Lease {
|
||||
|
||||
-func NewMaintenanceFromMaintenanceClient(remote pb.MaintenanceClient) Maintenance {
|
||||
+func NewMaintenanceFromMaintenanceClient(remote pb.MaintenanceClient, c *Client) Maintenance {
|
||||
|
||||
-func NewWatchFromWatchClient(wc pb.WatchClient) Watcher {
|
||||
+func NewWatchFromWatchClient(wc pb.WatchClient, c *Client) Watcher {
|
||||
```
|
||||
|
||||
#### Change in `clientv3.Lease.TimeToLive` API
|
||||
|
||||
Previously, `clientv3.Lease.TimeToLive` API returned `lease.ErrLeaseNotFound` on non-existent lease ID. 3.2 instead returns TTL=-1 in its response and no error (see [#7305](https://github.com/coreos/etcd/pull/7305)).
|
||||
|
||||
Before
|
||||
|
||||
```go
|
||||
// when leaseID does not exist
|
||||
resp, err := TimeToLive(ctx, leaseID)
|
||||
resp == nil
|
||||
err == lease.ErrLeaseNotFound
|
||||
```
|
||||
|
||||
After
|
||||
|
||||
```go
|
||||
// when leaseID does not exist
|
||||
resp, err := TimeToLive(ctx, leaseID)
|
||||
resp.TTL == -1
|
||||
err == nil
|
||||
```
|
||||
|
||||
#### Change in `clientv3.NewFromConfigFile`
|
||||
|
||||
`clientv3.NewFromConfigFile` is moved to `yaml.NewConfig`.
|
||||
|
||||
Before
|
||||
|
||||
```go
|
||||
import "github.com/coreos/etcd/clientv3"
|
||||
clientv3.NewFromConfigFile
|
||||
```
|
||||
|
||||
After
|
||||
|
||||
```go
|
||||
import clientv3yaml "github.com/coreos/etcd/clientv3/yaml"
|
||||
clientv3yaml.NewConfig
|
||||
```
|
||||
|
||||
#### Change in `--listen-peer-urls` and `--listen-client-urls`
|
||||
|
||||
3.2 now rejects domains names for `--listen-peer-urls` and `--listen-client-urls` (3.1 only prints out warnings), since domain name is invalid for network interface binding. Make sure that those URLs are properly formated as `scheme://IP:port`.
|
||||
|
||||
See [issue #6336](https://github.com/coreos/etcd/issues/6336) for more contexts.
|
||||
|
||||
### Server upgrade checklists
|
||||
|
||||
#### Upgrade requirements
|
||||
|
||||
To upgrade an existing etcd deployment to 3.2, the running cluster must be 3.1 or greater. If it's before 3.1, please [upgrade to 3.1](upgrade_3_1.md) before upgrading to 3.2.
|
||||
|
||||
Also, to ensure a smooth rolling upgrade, the running cluster must be healthy. Check the health of the cluster by using the `etcdctl endpoint health` command before proceeding.
|
||||
|
||||
#### Preparation
|
||||
|
||||
Before upgrading etcd, always test the services relying on etcd in a staging environment before deploying the upgrade to the production environment.
|
||||
|
||||
Before beginning, [backup the etcd data](../op-guide/maintenance.md#snapshot-backup). Should something go wrong with the upgrade, it is possible to use this backup to [downgrade](#downgrade) back to existing etcd version. Please note that the `snapshot` command only backs up the v3 data. For v2 data, see [backing up v2 datastore](../v2/admin_guide.md#backing-up-the-datastore).
|
||||
|
||||
#### Mixed versions
|
||||
|
||||
While upgrading, an etcd cluster supports mixed versions of etcd members, and operates with the protocol of the lowest common version. The cluster is only considered upgraded once all of its members are upgraded to version 3.2. Internally, etcd members negotiate with each other to determine the overall cluster version, which controls the reported version and the supported features.
|
||||
|
||||
#### Limitations
|
||||
|
||||
Note: If the cluster only has v3 data and no v2 data, it is not subject to this limitation.
|
||||
|
||||
If the cluster is serving a v2 data set larger than 50MB, each newly upgraded member may take up to two minutes to catch up with the existing cluster. Check the size of a recent snapshot to estimate the total data size. In other words, it is safest to wait for 2 minutes between upgrading each member.
|
||||
|
||||
For a much larger total data size, 100MB or more , this one-time process might take even more time. Administrators of very large etcd clusters of this magnitude can feel free to contact the [etcd team][etcd-contact] before upgrading, and we'll be happy to provide advice on the procedure.
|
||||
|
||||
#### Downgrade
|
||||
|
||||
If all members have been upgraded to v3.2, the cluster will be upgraded to v3.2, and downgrade from this completed state is **not possible**. If any single member is still v3.1, however, the cluster and its operations remains "v3.1", and it is possible from this mixed cluster state to return to using a v3.1 etcd binary on all members.
|
||||
|
||||
Please [backup the data directory](../op-guide/maintenance.md#snapshot-backup) of all etcd members to make downgrading the cluster possible even after it has been completely upgraded.
|
||||
|
||||
### Upgrade procedure
|
||||
|
||||
This example shows how to upgrade a 3-member v3.1 ectd cluster running on a local machine.
|
||||
|
||||
#### 1. Check upgrade requirements
|
||||
|
||||
Is the cluster healthy and running v3.1.x?
|
||||
|
||||
```
|
||||
$ ETCDCTL_API=3 etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
|
||||
localhost:2379 is healthy: successfully committed proposal: took = 6.600684ms
|
||||
localhost:22379 is healthy: successfully committed proposal: took = 8.540064ms
|
||||
localhost:32379 is healthy: successfully committed proposal: took = 8.763432ms
|
||||
|
||||
$ curl http://localhost:2379/version
|
||||
{"etcdserver":"3.1.7","etcdcluster":"3.1.0"}
|
||||
```
|
||||
|
||||
#### 2. Stop the existing etcd process
|
||||
|
||||
When each etcd process is stopped, expected errors will be logged by other cluster members. This is normal since a cluster member connection has been (temporarily) broken:
|
||||
|
||||
```
|
||||
2017-04-27 14:13:31.491746 I | raft: c89feb932daef420 [term 3] received MsgTimeoutNow from 6d4f535bae3ab960 and starts an election to get leadership.
|
||||
2017-04-27 14:13:31.491769 I | raft: c89feb932daef420 became candidate at term 4
|
||||
2017-04-27 14:13:31.491788 I | raft: c89feb932daef420 received MsgVoteResp from c89feb932daef420 at term 4
|
||||
2017-04-27 14:13:31.491797 I | raft: c89feb932daef420 [logterm: 3, index: 9] sent MsgVote request to 6d4f535bae3ab960 at term 4
|
||||
2017-04-27 14:13:31.491805 I | raft: c89feb932daef420 [logterm: 3, index: 9] sent MsgVote request to 9eda174c7df8a033 at term 4
|
||||
2017-04-27 14:13:31.491815 I | raft: raft.node: c89feb932daef420 lost leader 6d4f535bae3ab960 at term 4
|
||||
2017-04-27 14:13:31.524084 I | raft: c89feb932daef420 received MsgVoteResp from 6d4f535bae3ab960 at term 4
|
||||
2017-04-27 14:13:31.524108 I | raft: c89feb932daef420 [quorum:2] has received 2 MsgVoteResp votes and 0 vote rejections
|
||||
2017-04-27 14:13:31.524123 I | raft: c89feb932daef420 became leader at term 4
|
||||
2017-04-27 14:13:31.524136 I | raft: raft.node: c89feb932daef420 elected leader c89feb932daef420 at term 4
|
||||
2017-04-27 14:13:31.592650 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream MsgApp v2 reader)
|
||||
2017-04-27 14:13:31.592825 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream Message reader)
|
||||
2017-04-27 14:13:31.693275 E | rafthttp: failed to dial 6d4f535bae3ab960 on stream Message (dial tcp [::1]:2380: getsockopt: connection refused)
|
||||
2017-04-27 14:13:31.693289 I | rafthttp: peer 6d4f535bae3ab960 became inactive
|
||||
2017-04-27 14:13:31.936678 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream Message writer)
|
||||
```
|
||||
|
||||
It's a good idea at this point to [backup the etcd data](../op-guide/maintenance.md#snapshot-backup) to provide a downgrade path should any problems occur:
|
||||
|
||||
```
|
||||
$ etcdctl snapshot save backup.db
|
||||
```
|
||||
|
||||
#### 3. Drop-in etcd v3.2 binary and start the new etcd process
|
||||
|
||||
The new v3.2 etcd will publish its information to the cluster:
|
||||
|
||||
```
|
||||
2017-04-27 14:14:25.363225 I | etcdserver: published {Name:s1 ClientURLs:[http://localhost:2379]} to cluster a9ededbffcb1b1f1
|
||||
```
|
||||
|
||||
Verify that each member, and then the entire cluster, becomes healthy with the new v3.2 etcd binary:
|
||||
|
||||
```
|
||||
$ ETCDCTL_API=3 /etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
|
||||
localhost:22379 is healthy: successfully committed proposal: took = 5.540129ms
|
||||
localhost:32379 is healthy: successfully committed proposal: took = 7.321771ms
|
||||
localhost:2379 is healthy: successfully committed proposal: took = 10.629901ms
|
||||
```
|
||||
|
||||
Upgraded members will log warnings like the following until the entire cluster is upgraded. This is expected and will cease after all etcd cluster members are upgraded to v3.2:
|
||||
|
||||
```
|
||||
2017-04-27 14:15:17.071804 W | etcdserver: member c89feb932daef420 has a higher version 3.2.0
|
||||
2017-04-27 14:15:21.073110 W | etcdserver: the local etcd version 3.1.7 is not up-to-date
|
||||
2017-04-27 14:15:21.073142 W | etcdserver: member 6d4f535bae3ab960 has a higher version 3.2.0
|
||||
2017-04-27 14:15:21.073157 W | etcdserver: the local etcd version 3.1.7 is not up-to-date
|
||||
2017-04-27 14:15:21.073164 W | etcdserver: member c89feb932daef420 has a higher version 3.2.0
|
||||
```
|
||||
|
||||
#### 4. Repeat step 2 to step 3 for all other members
|
||||
|
||||
#### 5. Finish
|
||||
|
||||
When all members are upgraded, the cluster will report upgrading to 3.2 successfully:
|
||||
|
||||
```
|
||||
2017-04-27 14:15:54.536901 N | etcdserver/membership: updated the cluster version from 3.1 to 3.2
|
||||
2017-04-27 14:15:54.537035 I | etcdserver/api: enabled capabilities for version 3.2
|
||||
```
|
||||
|
||||
```
|
||||
$ ETCDCTL_API=3 /etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
|
||||
localhost:2379 is healthy: successfully committed proposal: took = 2.312897ms
|
||||
localhost:22379 is healthy: successfully committed proposal: took = 2.553476ms
|
||||
localhost:32379 is healthy: successfully committed proposal: took = 2.517902ms
|
||||
```
|
||||
|
||||
[etcd-contact]: https://groups.google.com/forum/#!forum/etcd-dev
|
476
Documentation/upgrades/upgrade_3_3.md
Normal file
476
Documentation/upgrades/upgrade_3_3.md
Normal file
@ -0,0 +1,476 @@
|
||||
## Upgrade etcd from 3.2 to 3.3
|
||||
|
||||
In the general case, upgrading from etcd 3.2 to 3.3 can be a zero-downtime, rolling upgrade:
|
||||
- one by one, stop the etcd v3.2 processes and replace them with etcd v3.3 processes
|
||||
- after running all v3.3 processes, new features in v3.3 are available to the cluster
|
||||
|
||||
Before [starting an upgrade](#upgrade-procedure), read through the rest of this guide to prepare.
|
||||
|
||||
### Upgrade checklists
|
||||
|
||||
**NOTE:** When [migrating from v2 with no v3 data](https://github.com/coreos/etcd/issues/9480), etcd server v3.2+ panics when etcd restores from existing snapshots but no v3 `ETCD_DATA_DIR/member/snap/db` file. This happens when the server had migrated from v2 with no previous v3 data. This also prevents accidental v3 data loss (e.g. `db` file might have been moved). etcd requires that post v3 migration can only happen with v3 data. Do not upgrade to newer v3 versions until v3.0 server contains v3 data.
|
||||
|
||||
Highlighted breaking changes in 3.3.
|
||||
|
||||
#### Change in `etcdserver.EtcdServer` struct
|
||||
|
||||
`etcdserver.EtcdServer` has changed the type of its member field `*etcdserver.ServerConfig` to `etcdserver.ServerConfig`. And `etcdserver.NewServer` now takes `etcdserver.ServerConfig`, instead of `*etcdserver.ServerConfig`.
|
||||
|
||||
Before and after (e.g. [k8s.io/kubernetes/test/e2e_node/services/etcd.go](https://github.com/kubernetes/kubernetes/blob/release-1.8/test/e2e_node/services/etcd.go#L50-L55))
|
||||
|
||||
```diff
|
||||
import "github.com/coreos/etcd/etcdserver"
|
||||
|
||||
type EtcdServer struct {
|
||||
*etcdserver.EtcdServer
|
||||
- config *etcdserver.ServerConfig
|
||||
+ config etcdserver.ServerConfig
|
||||
}
|
||||
|
||||
func NewEtcd(dataDir string) *EtcdServer {
|
||||
- config := &etcdserver.ServerConfig{
|
||||
+ config := etcdserver.ServerConfig{
|
||||
DataDir: dataDir,
|
||||
...
|
||||
}
|
||||
return &EtcdServer{config: config}
|
||||
}
|
||||
|
||||
func (e *EtcdServer) Start() error {
|
||||
var err error
|
||||
e.EtcdServer, err = etcdserver.NewServer(e.config)
|
||||
...
|
||||
```
|
||||
|
||||
#### Change in `embed.EtcdServer` struct
|
||||
|
||||
Field `LogOutput` is added to `embed.Config`:
|
||||
|
||||
```diff
|
||||
package embed
|
||||
|
||||
type Config struct {
|
||||
Debug bool `json:"debug"`
|
||||
LogPkgLevels string `json:"log-package-levels"`
|
||||
+ LogOutput string `json:"log-output"`
|
||||
...
|
||||
```
|
||||
|
||||
Before gRPC server warnings were logged in etcdserver.
|
||||
|
||||
```
|
||||
WARNING: 2017/11/02 11:35:51 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp: operation was canceled"; Reconnecting to {localhost:2379 <nil>}
|
||||
WARNING: 2017/11/02 11:35:51 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp: operation was canceled"; Reconnecting to {localhost:2379 <nil>}
|
||||
```
|
||||
|
||||
From v3.3, gRPC server logs are disabled by default.
|
||||
|
||||
```go
|
||||
import "github.com/coreos/etcd/embed"
|
||||
|
||||
cfg := &embed.Config{Debug: false}
|
||||
cfg.SetupLogging()
|
||||
```
|
||||
|
||||
Set `embed.Config.Debug` field to `true` to enable gRPC server logs.
|
||||
|
||||
#### Change in `/health` endpoint response
|
||||
|
||||
Previously, `[endpoint]:[client-port]/health` returned manually marshaled JSON value. 3.3 now defines [`etcdhttp.Health`](https://godoc.org/github.com/coreos/etcd/etcdserver/api/etcdhttp#Health) struct.
|
||||
|
||||
Note that in v3.3.0-rc.0, v3.3.0-rc.1, and v3.3.0-rc.2, `etcdhttp.Health` has boolean type `"health"` and `"errors"` fields. For backward compatibilities, we reverted `"health"` field to `string` type and removed `"errors"` field. Further health information will be provided in separate APIs.
|
||||
|
||||
```bash
|
||||
$ curl http://localhost:2379/health
|
||||
{"health":"true"}
|
||||
```
|
||||
|
||||
#### Change in gRPC gateway HTTP endpoints (replaced `/v3alpha` with `/v3beta`)
|
||||
|
||||
Before
|
||||
|
||||
```bash
|
||||
curl -L http://localhost:2379/v3alpha/kv/put \
|
||||
-X POST -d '{"key": "Zm9v", "value": "YmFy"}'
|
||||
```
|
||||
|
||||
After
|
||||
|
||||
```bash
|
||||
curl -L http://localhost:2379/v3beta/kv/put \
|
||||
-X POST -d '{"key": "Zm9v", "value": "YmFy"}'
|
||||
```
|
||||
|
||||
Requests to `/v3alpha` endpoints will redirect to `/v3beta`, and `/v3alpha` will be removed in 3.4 release.
|
||||
|
||||
#### Change in maximum request size limits
|
||||
|
||||
3.3 now allows custom request size limits for both server and **client side**. In previous versions(v3.2.10, v3.2.11), client response size was limited to only 4 MiB.
|
||||
|
||||
Server-side request limits can be configured with `--max-request-bytes` flag:
|
||||
|
||||
```bash
|
||||
# limits request size to 1.5 KiB
|
||||
etcd --max-request-bytes 1536
|
||||
|
||||
# client writes exceeding 1.5 KiB will be rejected
|
||||
etcdctl put foo [LARGE VALUE...]
|
||||
# etcdserver: request is too large
|
||||
```
|
||||
|
||||
Or configure `embed.Config.MaxRequestBytes` field:
|
||||
|
||||
```go
|
||||
import "github.com/coreos/etcd/embed"
|
||||
import "github.com/coreos/etcd/etcdserver/api/v3rpc/rpctypes"
|
||||
|
||||
// limit requests to 5 MiB
|
||||
cfg := embed.NewConfig()
|
||||
cfg.MaxRequestBytes = 5 * 1024 * 1024
|
||||
|
||||
// client writes exceeding 5 MiB will be rejected
|
||||
_, err := cli.Put(ctx, "foo", [LARGE VALUE...])
|
||||
err == rpctypes.ErrRequestTooLarge
|
||||
```
|
||||
|
||||
**If not specified, server-side limit defaults to 1.5 MiB**.
|
||||
|
||||
Client-side request limits must be configured based on server-side limits.
|
||||
|
||||
```bash
|
||||
# limits request size to 1 MiB
|
||||
etcd --max-request-bytes 1048576
|
||||
```
|
||||
|
||||
```go
|
||||
import "github.com/coreos/etcd/clientv3"
|
||||
|
||||
cli, _ := clientv3.New(clientv3.Config{
|
||||
Endpoints: []string{"127.0.0.1:2379"},
|
||||
MaxCallSendMsgSize: 2 * 1024 * 1024,
|
||||
MaxCallRecvMsgSize: 3 * 1024 * 1024,
|
||||
})
|
||||
|
||||
|
||||
// client writes exceeding "--max-request-bytes" will be rejected from etcd server
|
||||
_, err := cli.Put(ctx, "foo", strings.Repeat("a", 1*1024*1024+5))
|
||||
err == rpctypes.ErrRequestTooLarge
|
||||
|
||||
|
||||
// client writes exceeding "MaxCallSendMsgSize" will be rejected from client-side
|
||||
_, err = cli.Put(ctx, "foo", strings.Repeat("a", 5*1024*1024))
|
||||
err.Error() == "rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (5242890 vs. 2097152)"
|
||||
|
||||
|
||||
// some writes under limits
|
||||
for i := range []int{0,1,2,3,4} {
|
||||
_, err = cli.Put(ctx, fmt.Sprintf("foo%d", i), strings.Repeat("a", 1*1024*1024-500))
|
||||
if err != nil {
|
||||
panic(err)
|
||||
}
|
||||
}
|
||||
// client reads exceeding "MaxCallRecvMsgSize" will be rejected from client-side
|
||||
_, err = cli.Get(ctx, "foo", clientv3.WithPrefix())
|
||||
err.Error() == "rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5240509 vs. 3145728)"
|
||||
```
|
||||
|
||||
**If not specified, client-side send limit defaults to 2 MiB (1.5 MiB + gRPC overhead bytes) and receive limit to `math.MaxInt32`**. Please see [clientv3 godoc](https://godoc.org/github.com/coreos/etcd/clientv3#Config) for more detail.
|
||||
|
||||
#### Change in raw gRPC client wrappers
|
||||
|
||||
3.3 changes the function signatures of `clientv3` gRPC client wrapper. This change was needed to support [custom `grpc.CallOption` on message size limits](https://github.com/coreos/etcd/pull/9047).
|
||||
|
||||
Before and after
|
||||
|
||||
```diff
|
||||
-func NewKVFromKVClient(remote pb.KVClient) KV {
|
||||
+func NewKVFromKVClient(remote pb.KVClient, c *Client) KV {
|
||||
|
||||
-func NewClusterFromClusterClient(remote pb.ClusterClient) Cluster {
|
||||
+func NewClusterFromClusterClient(remote pb.ClusterClient, c *Client) Cluster {
|
||||
|
||||
-func NewLeaseFromLeaseClient(remote pb.LeaseClient, keepAliveTimeout time.Duration) Lease {
|
||||
+func NewLeaseFromLeaseClient(remote pb.LeaseClient, c *Client, keepAliveTimeout time.Duration) Lease {
|
||||
|
||||
-func NewMaintenanceFromMaintenanceClient(remote pb.MaintenanceClient) Maintenance {
|
||||
+func NewMaintenanceFromMaintenanceClient(remote pb.MaintenanceClient, c *Client) Maintenance {
|
||||
|
||||
-func NewWatchFromWatchClient(wc pb.WatchClient) Watcher {
|
||||
+func NewWatchFromWatchClient(wc pb.WatchClient, c *Client) Watcher {
|
||||
```
|
||||
|
||||
#### Change in clientv3 `Snapshot` API error type
|
||||
|
||||
Previously, clientv3 `Snapshot` API returned raw [`grpc/*status.statusError`] type error. v3.3 now translates those errors to corresponding public error types, to be consistent with other APIs.
|
||||
|
||||
Before
|
||||
|
||||
```go
|
||||
import "context"
|
||||
|
||||
// reading snapshot with canceled context should error out
|
||||
ctx, cancel := context.WithCancel(context.Background())
|
||||
rc, _ := cli.Snapshot(ctx)
|
||||
cancel()
|
||||
_, err := io.Copy(f, rc)
|
||||
err.Error() == "rpc error: code = Canceled desc = context canceled"
|
||||
|
||||
// reading snapshot with deadline exceeded should error out
|
||||
ctx, cancel = context.WithTimeout(context.Background(), time.Second)
|
||||
defer cancel()
|
||||
rc, _ = cli.Snapshot(ctx)
|
||||
time.Sleep(2 * time.Second)
|
||||
_, err = io.Copy(f, rc)
|
||||
err.Error() == "rpc error: code = DeadlineExceeded desc = context deadline exceeded"
|
||||
```
|
||||
|
||||
After
|
||||
|
||||
```go
|
||||
import "context"
|
||||
|
||||
// reading snapshot with canceled context should error out
|
||||
ctx, cancel := context.WithCancel(context.Background())
|
||||
rc, _ := cli.Snapshot(ctx)
|
||||
cancel()
|
||||
_, err := io.Copy(f, rc)
|
||||
err == context.Canceled
|
||||
|
||||
// reading snapshot with deadline exceeded should error out
|
||||
ctx, cancel = context.WithTimeout(context.Background(), time.Second)
|
||||
defer cancel()
|
||||
rc, _ = cli.Snapshot(ctx)
|
||||
time.Sleep(2 * time.Second)
|
||||
_, err = io.Copy(f, rc)
|
||||
err == context.DeadlineExceeded
|
||||
```
|
||||
|
||||
#### Change in `etcdctl lease timetolive` command output
|
||||
|
||||
Previously, `lease timetolive LEASE_ID` command on expired lease prints `-1s` for remaining seconds. 3.3 now outputs clearer messages.
|
||||
|
||||
Before
|
||||
|
||||
|
||||
```bash
|
||||
lease 2d8257079fa1bc0c granted with TTL(0s), remaining(-1s)
|
||||
```
|
||||
|
||||
After
|
||||
|
||||
```bash
|
||||
lease 2d8257079fa1bc0c already expired
|
||||
```
|
||||
|
||||
#### Change in `golang.org/x/net/context` imports
|
||||
|
||||
`clientv3` has deprecated `golang.org/x/net/context`. If a project vendors `golang.org/x/net/context` in other code (e.g. etcd generated protocol buffer code) and imports `github.com/coreos/etcd/clientv3`, it requires Go 1.9+ to compile.
|
||||
|
||||
Before
|
||||
|
||||
```go
|
||||
import "golang.org/x/net/context"
|
||||
cli.Put(context.Background(), "f", "v")
|
||||
```
|
||||
|
||||
After
|
||||
|
||||
```go
|
||||
import "context"
|
||||
cli.Put(context.Background(), "f", "v")
|
||||
```
|
||||
|
||||
#### Change in gRPC dependency
|
||||
|
||||
3.3 now requires [grpc/grpc-go](https://github.com/grpc/grpc-go/releases) `v1.7.5`.
|
||||
|
||||
##### Deprecate `grpclog.Logger`
|
||||
|
||||
`grpclog.Logger` has been deprecated in favor of [`grpclog.LoggerV2`](https://github.com/grpc/grpc-go/blob/master/grpclog/loggerv2.go). `clientv3.Logger` is now `grpclog.LoggerV2`.
|
||||
|
||||
Before
|
||||
|
||||
```go
|
||||
import "github.com/coreos/etcd/clientv3"
|
||||
clientv3.SetLogger(log.New(os.Stderr, "grpc: ", 0))
|
||||
```
|
||||
|
||||
After
|
||||
|
||||
```go
|
||||
import "github.com/coreos/etcd/clientv3"
|
||||
import "google.golang.org/grpc/grpclog"
|
||||
clientv3.SetLogger(grpclog.NewLoggerV2(os.Stderr, os.Stderr, os.Stderr))
|
||||
|
||||
// log.New above cannot be used (not implement grpclog.LoggerV2 interface)
|
||||
```
|
||||
|
||||
##### Deprecate `grpc.ErrClientConnTimeout`
|
||||
|
||||
Previously, `grpc.ErrClientConnTimeout` error is returned on client dial time-outs. 3.3 instead returns `context.DeadlineExceeded` (see [#8504](https://github.com/coreos/etcd/issues/8504)).
|
||||
|
||||
Before
|
||||
|
||||
```go
|
||||
// expect dial time-out on ipv4 blackhole
|
||||
_, err := clientv3.New(clientv3.Config{
|
||||
Endpoints: []string{"http://254.0.0.1:12345"},
|
||||
DialTimeout: 2 * time.Second
|
||||
})
|
||||
if err == grpc.ErrClientConnTimeout {
|
||||
// handle errors
|
||||
}
|
||||
```
|
||||
|
||||
After
|
||||
|
||||
```go
|
||||
_, err := clientv3.New(clientv3.Config{
|
||||
Endpoints: []string{"http://254.0.0.1:12345"},
|
||||
DialTimeout: 2 * time.Second
|
||||
})
|
||||
if err == context.DeadlineExceeded {
|
||||
// handle errors
|
||||
}
|
||||
```
|
||||
|
||||
#### Change in official container registry
|
||||
|
||||
etcd now uses [`gcr.io/etcd-development/etcd`](https://gcr.io/etcd-development/etcd) as a primary container registry, and [`quay.io/coreos/etcd`](https://quay.io/coreos/etcd) as secondary.
|
||||
|
||||
Before
|
||||
|
||||
```bash
|
||||
docker pull quay.io/coreos/etcd:v3.2.5
|
||||
```
|
||||
|
||||
After
|
||||
|
||||
```bash
|
||||
docker pull gcr.io/etcd-development/etcd:v3.3.0
|
||||
```
|
||||
|
||||
### Server upgrade checklists
|
||||
|
||||
#### Upgrade requirements
|
||||
|
||||
To upgrade an existing etcd deployment to 3.3, the running cluster must be 3.2 or greater. If it's before 3.2, please [upgrade to 3.2](upgrade_3_2.md) before upgrading to 3.3.
|
||||
|
||||
Also, to ensure a smooth rolling upgrade, the running cluster must be healthy. Check the health of the cluster by using the `etcdctl endpoint health` command before proceeding.
|
||||
|
||||
#### Preparation
|
||||
|
||||
Before upgrading etcd, always test the services relying on etcd in a staging environment before deploying the upgrade to the production environment.
|
||||
|
||||
Before beginning, [backup the etcd data](../op-guide/maintenance.md#snapshot-backup). Should something go wrong with the upgrade, it is possible to use this backup to [downgrade](#downgrade) back to existing etcd version. Please note that the `snapshot` command only backs up the v3 data. For v2 data, see [backing up v2 datastore](../v2/admin_guide.md#backing-up-the-datastore).
|
||||
|
||||
#### Mixed versions
|
||||
|
||||
While upgrading, an etcd cluster supports mixed versions of etcd members, and operates with the protocol of the lowest common version. The cluster is only considered upgraded once all of its members are upgraded to version 3.3. Internally, etcd members negotiate with each other to determine the overall cluster version, which controls the reported version and the supported features.
|
||||
|
||||
#### Limitations
|
||||
|
||||
Note: If the cluster only has v3 data and no v2 data, it is not subject to this limitation.
|
||||
|
||||
If the cluster is serving a v2 data set larger than 50MB, each newly upgraded member may take up to two minutes to catch up with the existing cluster. Check the size of a recent snapshot to estimate the total data size. In other words, it is safest to wait for 2 minutes between upgrading each member.
|
||||
|
||||
For a much larger total data size, 100MB or more , this one-time process might take even more time. Administrators of very large etcd clusters of this magnitude can feel free to contact the [etcd team][etcd-contact] before upgrading, and we'll be happy to provide advice on the procedure.
|
||||
|
||||
#### Downgrade
|
||||
|
||||
If all members have been upgraded to v3.3, the cluster will be upgraded to v3.3, and downgrade from this completed state is **not possible**. If any single member is still v3.2, however, the cluster and its operations remains "v3.2", and it is possible from this mixed cluster state to return to using a v3.2 etcd binary on all members.
|
||||
|
||||
Please [backup the data directory](../op-guide/maintenance.md#snapshot-backup) of all etcd members to make downgrading the cluster possible even after it has been completely upgraded.
|
||||
|
||||
### Upgrade procedure
|
||||
|
||||
This example shows how to upgrade a 3-member v3.2 ectd cluster running on a local machine.
|
||||
|
||||
#### 1. Check upgrade requirements
|
||||
|
||||
Is the cluster healthy and running v3.2.x?
|
||||
|
||||
```
|
||||
$ ETCDCTL_API=3 etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
|
||||
localhost:2379 is healthy: successfully committed proposal: took = 6.600684ms
|
||||
localhost:22379 is healthy: successfully committed proposal: took = 8.540064ms
|
||||
localhost:32379 is healthy: successfully committed proposal: took = 8.763432ms
|
||||
|
||||
$ curl http://localhost:2379/version
|
||||
{"etcdserver":"3.2.7","etcdcluster":"3.2.0"}
|
||||
```
|
||||
|
||||
#### 2. Stop the existing etcd process
|
||||
|
||||
When each etcd process is stopped, expected errors will be logged by other cluster members. This is normal since a cluster member connection has been (temporarily) broken:
|
||||
|
||||
```
|
||||
14:13:31.491746 I | raft: c89feb932daef420 [term 3] received MsgTimeoutNow from 6d4f535bae3ab960 and starts an election to get leadership.
|
||||
14:13:31.491769 I | raft: c89feb932daef420 became candidate at term 4
|
||||
14:13:31.491788 I | raft: c89feb932daef420 received MsgVoteResp from c89feb932daef420 at term 4
|
||||
14:13:31.491797 I | raft: c89feb932daef420 [logterm: 3, index: 9] sent MsgVote request to 6d4f535bae3ab960 at term 4
|
||||
14:13:31.491805 I | raft: c89feb932daef420 [logterm: 3, index: 9] sent MsgVote request to 9eda174c7df8a033 at term 4
|
||||
14:13:31.491815 I | raft: raft.node: c89feb932daef420 lost leader 6d4f535bae3ab960 at term 4
|
||||
14:13:31.524084 I | raft: c89feb932daef420 received MsgVoteResp from 6d4f535bae3ab960 at term 4
|
||||
14:13:31.524108 I | raft: c89feb932daef420 [quorum:2] has received 2 MsgVoteResp votes and 0 vote rejections
|
||||
14:13:31.524123 I | raft: c89feb932daef420 became leader at term 4
|
||||
14:13:31.524136 I | raft: raft.node: c89feb932daef420 elected leader c89feb932daef420 at term 4
|
||||
14:13:31.592650 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream MsgApp v2 reader)
|
||||
14:13:31.592825 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream Message reader)
|
||||
14:13:31.693275 E | rafthttp: failed to dial 6d4f535bae3ab960 on stream Message (dial tcp [::1]:2380: getsockopt: connection refused)
|
||||
14:13:31.693289 I | rafthttp: peer 6d4f535bae3ab960 became inactive
|
||||
14:13:31.936678 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream Message writer)
|
||||
```
|
||||
|
||||
It's a good idea at this point to [backup the etcd data](../op-guide/maintenance.md#snapshot-backup) to provide a downgrade path should any problems occur:
|
||||
|
||||
```
|
||||
$ etcdctl snapshot save backup.db
|
||||
```
|
||||
|
||||
#### 3. Drop-in etcd v3.3 binary and start the new etcd process
|
||||
|
||||
The new v3.3 etcd will publish its information to the cluster:
|
||||
|
||||
```
|
||||
14:14:25.363225 I | etcdserver: published {Name:s1 ClientURLs:[http://localhost:2379]} to cluster a9ededbffcb1b1f1
|
||||
```
|
||||
|
||||
Verify that each member, and then the entire cluster, becomes healthy with the new v3.3 etcd binary:
|
||||
|
||||
```
|
||||
$ ETCDCTL_API=3 /etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
|
||||
localhost:22379 is healthy: successfully committed proposal: took = 5.540129ms
|
||||
localhost:32379 is healthy: successfully committed proposal: took = 7.321771ms
|
||||
localhost:2379 is healthy: successfully committed proposal: took = 10.629901ms
|
||||
```
|
||||
|
||||
Upgraded members will log warnings like the following until the entire cluster is upgraded. This is expected and will cease after all etcd cluster members are upgraded to v3.3:
|
||||
|
||||
```
|
||||
14:15:17.071804 W | etcdserver: member c89feb932daef420 has a higher version 3.3.0
|
||||
14:15:21.073110 W | etcdserver: the local etcd version 3.2.7 is not up-to-date
|
||||
14:15:21.073142 W | etcdserver: member 6d4f535bae3ab960 has a higher version 3.3.0
|
||||
14:15:21.073157 W | etcdserver: the local etcd version 3.2.7 is not up-to-date
|
||||
14:15:21.073164 W | etcdserver: member c89feb932daef420 has a higher version 3.3.0
|
||||
```
|
||||
|
||||
#### 4. Repeat step 2 to step 3 for all other members
|
||||
|
||||
#### 5. Finish
|
||||
|
||||
When all members are upgraded, the cluster will report upgrading to 3.3 successfully:
|
||||
|
||||
```
|
||||
14:15:54.536901 N | etcdserver/membership: updated the cluster version from 3.2 to 3.3
|
||||
14:15:54.537035 I | etcdserver/api: enabled capabilities for version 3.3
|
||||
```
|
||||
|
||||
```
|
||||
$ ETCDCTL_API=3 /etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
|
||||
localhost:2379 is healthy: successfully committed proposal: took = 2.312897ms
|
||||
localhost:22379 is healthy: successfully committed proposal: took = 2.553476ms
|
||||
localhost:32379 is healthy: successfully committed proposal: took = 2.517902ms
|
||||
```
|
||||
|
||||
[etcd-contact]: https://groups.google.com/forum/#!forum/etcd-dev
|
171
Documentation/upgrades/upgrade_3_4.md
Normal file
171
Documentation/upgrades/upgrade_3_4.md
Normal file
@ -0,0 +1,171 @@
|
||||
## Upgrade etcd from 3.3 to 3.4
|
||||
|
||||
In the general case, upgrading from etcd 3.3 to 3.4 can be a zero-downtime, rolling upgrade:
|
||||
- one by one, stop the etcd v3.3 processes and replace them with etcd v3.4 processes
|
||||
- after running all v3.4 processes, new features in v3.4 are available to the cluster
|
||||
|
||||
Before [starting an upgrade](#upgrade-procedure), read through the rest of this guide to prepare.
|
||||
|
||||
### Upgrade checklists
|
||||
|
||||
**NOTE:** When [migrating from v2 with no v3 data](https://github.com/coreos/etcd/issues/9480), etcd server v3.2+ panics when etcd restores from existing snapshots but no v3 `ETCD_DATA_DIR/member/snap/db` file. This happens when the server had migrated from v2 with no previous v3 data. This also prevents accidental v3 data loss (e.g. `db` file might have been moved). etcd requires that post v3 migration can only happen with v3 data. Do not upgrade to newer v3 versions until v3.0 server contains v3 data.
|
||||
|
||||
Highlighted breaking changes in 3.4.
|
||||
|
||||
#### Change in `etcd` flags
|
||||
|
||||
`--ca-file` and `--peer-ca-file` flags are deprecated; they have been deprecated since v2.1.
|
||||
|
||||
```diff
|
||||
-etcd --ca-file ca-client.crt
|
||||
+etcd --trusted-ca-file ca-client.crt
|
||||
```
|
||||
|
||||
```diff
|
||||
-etcd --peer-ca-file ca-peer.crt
|
||||
+etcd --peer-trusted-ca-file ca-peer.crt
|
||||
```
|
||||
|
||||
#### Change in ``pkg/transport`
|
||||
|
||||
Deprecated `pkg/transport.TLSInfo.CAFile` field.
|
||||
|
||||
```diff
|
||||
import "github.com/coreos/etcd/pkg/transport"
|
||||
|
||||
tlsInfo := transport.TLSInfo{
|
||||
CertFile: "/tmp/test-certs/test.pem",
|
||||
KeyFile: "/tmp/test-certs/test-key.pem",
|
||||
- CAFile: "/tmp/test-certs/trusted-ca.pem",
|
||||
+ TrustedCAFile: "/tmp/test-certs/trusted-ca.pem",
|
||||
}
|
||||
tlsConfig, err := tlsInfo.ClientConfig()
|
||||
if err != nil {
|
||||
panic(err)
|
||||
}
|
||||
```
|
||||
|
||||
### Server upgrade checklists
|
||||
|
||||
#### Upgrade requirements
|
||||
|
||||
To upgrade an existing etcd deployment to 3.4, the running cluster must be 3.3 or greater. If it's before 3.3, please [upgrade to 3.3](upgrade_3_3.md) before upgrading to 3.4.
|
||||
|
||||
Also, to ensure a smooth rolling upgrade, the running cluster must be healthy. Check the health of the cluster by using the `etcdctl endpoint health` command before proceeding.
|
||||
|
||||
#### Preparation
|
||||
|
||||
Before upgrading etcd, always test the services relying on etcd in a staging environment before deploying the upgrade to the production environment.
|
||||
|
||||
Before beginning, [backup the etcd data](../op-guide/maintenance.md#snapshot-backup). Should something go wrong with the upgrade, it is possible to use this backup to [downgrade](#downgrade) back to existing etcd version. Please note that the `snapshot` command only backs up the v3 data. For v2 data, see [backing up v2 datastore](../v2/admin_guide.md#backing-up-the-datastore).
|
||||
|
||||
#### Mixed versions
|
||||
|
||||
While upgrading, an etcd cluster supports mixed versions of etcd members, and operates with the protocol of the lowest common version. The cluster is only considered upgraded once all of its members are upgraded to version 3.4. Internally, etcd members negotiate with each other to determine the overall cluster version, which controls the reported version and the supported features.
|
||||
|
||||
#### Limitations
|
||||
|
||||
Note: If the cluster only has v3 data and no v2 data, it is not subject to this limitation.
|
||||
|
||||
If the cluster is serving a v2 data set larger than 50MB, each newly upgraded member may take up to two minutes to catch up with the existing cluster. Check the size of a recent snapshot to estimate the total data size. In other words, it is safest to wait for 2 minutes between upgrading each member.
|
||||
|
||||
For a much larger total data size, 100MB or more , this one-time process might take even more time. Administrators of very large etcd clusters of this magnitude can feel free to contact the [etcd team][etcd-contact] before upgrading, and we'll be happy to provide advice on the procedure.
|
||||
|
||||
#### Downgrade
|
||||
|
||||
If all members have been upgraded to v3.4, the cluster will be upgraded to v3.4, and downgrade from this completed state is **not possible**. If any single member is still v3.3, however, the cluster and its operations remains "v3.3", and it is possible from this mixed cluster state to return to using a v3.3 etcd binary on all members.
|
||||
|
||||
Please [backup the data directory](../op-guide/maintenance.md#snapshot-backup) of all etcd members to make downgrading the cluster possible even after it has been completely upgraded.
|
||||
|
||||
### Upgrade procedure
|
||||
|
||||
This example shows how to upgrade a 3-member v3.3 ectd cluster running on a local machine.
|
||||
|
||||
#### 1. Check upgrade requirements
|
||||
|
||||
Is the cluster healthy and running v3.3.x?
|
||||
|
||||
```
|
||||
$ ETCDCTL_API=3 etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
|
||||
localhost:2379 is healthy: successfully committed proposal: took = 6.600684ms
|
||||
localhost:22379 is healthy: successfully committed proposal: took = 8.540064ms
|
||||
localhost:32379 is healthy: successfully committed proposal: took = 8.763432ms
|
||||
|
||||
$ curl http://localhost:2379/version
|
||||
{"etcdserver":"3.3.0","etcdcluster":"3.3.0"}
|
||||
```
|
||||
|
||||
#### 2. Stop the existing etcd process
|
||||
|
||||
When each etcd process is stopped, expected errors will be logged by other cluster members. This is normal since a cluster member connection has been (temporarily) broken:
|
||||
|
||||
```
|
||||
14:13:31.491746 I | raft: c89feb932daef420 [term 3] received MsgTimeoutNow from 6d4f535bae3ab960 and starts an election to get leadership.
|
||||
14:13:31.491769 I | raft: c89feb932daef420 became candidate at term 4
|
||||
14:13:31.491788 I | raft: c89feb932daef420 received MsgVoteResp from c89feb932daef420 at term 4
|
||||
14:13:31.491797 I | raft: c89feb932daef420 [logterm: 3, index: 9] sent MsgVote request to 6d4f535bae3ab960 at term 4
|
||||
14:13:31.491805 I | raft: c89feb932daef420 [logterm: 3, index: 9] sent MsgVote request to 9eda174c7df8a033 at term 4
|
||||
14:13:31.491815 I | raft: raft.node: c89feb932daef420 lost leader 6d4f535bae3ab960 at term 4
|
||||
14:13:31.524084 I | raft: c89feb932daef420 received MsgVoteResp from 6d4f535bae3ab960 at term 4
|
||||
14:13:31.524108 I | raft: c89feb932daef420 [quorum:2] has received 2 MsgVoteResp votes and 0 vote rejections
|
||||
14:13:31.524123 I | raft: c89feb932daef420 became leader at term 4
|
||||
14:13:31.524136 I | raft: raft.node: c89feb932daef420 elected leader c89feb932daef420 at term 4
|
||||
14:13:31.592650 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream MsgApp v2 reader)
|
||||
14:13:31.592825 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream Message reader)
|
||||
14:13:31.693275 E | rafthttp: failed to dial 6d4f535bae3ab960 on stream Message (dial tcp [::1]:2380: getsockopt: connection refused)
|
||||
14:13:31.693289 I | rafthttp: peer 6d4f535bae3ab960 became inactive
|
||||
14:13:31.936678 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream Message writer)
|
||||
```
|
||||
|
||||
It's a good idea at this point to [backup the etcd data](../op-guide/maintenance.md#snapshot-backup) to provide a downgrade path should any problems occur:
|
||||
|
||||
```
|
||||
$ etcdctl snapshot save backup.db
|
||||
```
|
||||
|
||||
#### 3. Drop-in etcd v3.4 binary and start the new etcd process
|
||||
|
||||
The new v3.4 etcd will publish its information to the cluster:
|
||||
|
||||
```
|
||||
14:14:25.363225 I | etcdserver: published {Name:s1 ClientURLs:[http://localhost:2379]} to cluster a9ededbffcb1b1f1
|
||||
```
|
||||
|
||||
Verify that each member, and then the entire cluster, becomes healthy with the new v3.4 etcd binary:
|
||||
|
||||
```
|
||||
$ ETCDCTL_API=3 /etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
|
||||
localhost:22379 is healthy: successfully committed proposal: took = 5.540129ms
|
||||
localhost:32379 is healthy: successfully committed proposal: took = 7.321771ms
|
||||
localhost:2379 is healthy: successfully committed proposal: took = 10.629901ms
|
||||
```
|
||||
|
||||
Upgraded members will log warnings like the following until the entire cluster is upgraded. This is expected and will cease after all etcd cluster members are upgraded to v3.4:
|
||||
|
||||
```
|
||||
14:15:17.071804 W | etcdserver: member c89feb932daef420 has a higher version 3.4.0
|
||||
14:15:21.073110 W | etcdserver: the local etcd version 3.3.0 is not up-to-date
|
||||
14:15:21.073142 W | etcdserver: member 6d4f535bae3ab960 has a higher version 3.4.0
|
||||
14:15:21.073157 W | etcdserver: the local etcd version 3.3.0 is not up-to-date
|
||||
14:15:21.073164 W | etcdserver: member c89feb932daef420 has a higher version 3.4.0
|
||||
```
|
||||
|
||||
#### 4. Repeat step 2 to step 3 for all other members
|
||||
|
||||
#### 5. Finish
|
||||
|
||||
When all members are upgraded, the cluster will report upgrading to 3.4 successfully:
|
||||
|
||||
```
|
||||
14:15:54.536901 N | etcdserver/membership: updated the cluster version from 3.3 to 3.4
|
||||
14:15:54.537035 I | etcdserver/api: enabled capabilities for version 3.4
|
||||
```
|
||||
|
||||
```
|
||||
$ ETCDCTL_API=3 /etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
|
||||
localhost:2379 is healthy: successfully committed proposal: took = 2.312897ms
|
||||
localhost:22379 is healthy: successfully committed proposal: took = 2.553476ms
|
||||
localhost:32379 is healthy: successfully committed proposal: took = 2.517902ms
|
||||
```
|
||||
|
||||
[etcd-contact]: https://groups.google.com/forum/#!forum/etcd-dev
|
19
Documentation/upgrades/upgrading-etcd.md
Normal file
19
Documentation/upgrades/upgrading-etcd.md
Normal file
@ -0,0 +1,19 @@
|
||||
# Upgrading etcd clusters and applications
|
||||
|
||||
This section contains documents specific to upgrading etcd clusters and applications.
|
||||
|
||||
## Moving from etcd API v2 to API v3
|
||||
* [Migrate applications from using API v2 to API v3][migrate-apps]
|
||||
|
||||
## Upgrading an etcd v3.x cluster
|
||||
* [Upgrade etcd from 3.0 to 3.1][upgrade-3-1]
|
||||
* [Upgrade etcd from 3.1 to 3.2][upgrade-3-2]
|
||||
|
||||
## Upgrading from etcd v2.3
|
||||
* [Upgrade a v2.3 cluster to v3.0][upgrade-cluster]
|
||||
|
||||
|
||||
[migrate-apps]: ../op-guide/v2-migration.md
|
||||
[upgrade-cluster]: upgrade_3_0.md
|
||||
[upgrade-3-1]: upgrade_3_1.md
|
||||
[upgrade-3-2]: upgrade_3_2.md
|
Loading…
x
Reference in New Issue
Block a user