Merge pull request #13941 from serathius/recommendation-v3.5.3

Update production recommendation for v3.5.3
Marek Siarkowicz 2022-04-13 21:57:20 +02:00 committed by GitHub
commit ff1569f134
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 3 additions and 13 deletions

@ -5,7 +5,7 @@ Previous change logs can be found at [CHANGELOG-3.4](https://github.com/etcd-io/
<hr>
## v3.5.3 (TBD)
## v3.5.3 (2022-04-13)
### etcd server
- Fix [Provide a better liveness probe for when etcd runs as a Kubernetes pod](https://github.com/etcd-io/etcd/pull/13706)

@ -1,21 +1,11 @@
# Production recommendation
The minimum recommended etcd versions to run in **production** are 3.3.18+, 3.4.2+. Refer to the [versioning policy](https://etcd.io/docs/v3.5/op-guide/versioning/) for more details.
Etcd v3.5.[0-2] versions are no longer recommended for production due to data corruption issue.
The minimum recommended etcd versions to run in **production** are 3.3.18+, 3.4.2+, v3.5.3+. Refer to the [versioning policy](https://etcd.io/docs/v3.5/op-guide/versioning/) for more details.
### v3.5 data corruption issue
Running etcd v3.5.2, v3.5.1 and v3.5.0 under high load can cause a data corruption issue.
If etcd process is killed, occasionally some committed transactions are not reflected on all the members.
Recommendations if you are running v3.4.X:
* **Don't upgrade your etcd clusters to v3.5** until the problem is fixed in the upcoming v3.5.3 release.
* There are no breaking changes in the API, meaning **it's safe to let v3.5 clients (e.g. the latest Kubernetes releases) talk to v3.4 servers**.
Recommendations if you are running v3.5.0, v3.5.1, or v3.5.2:
* **Enable data corruption check** with `--experimental-initial-corrupt-check` flag. The flag is the only reliable automated way of detecting an inconsistency. This mode has seen significant usage in production and is going to be promoted as default in etcd v3.6.
* **Ensure etcd cluster is not memory pressured or sigkill interrupted**, which could lead to processes being disrupted in the middle of business logic and trigger the issue.
* **Etcd downgrades should be avoided**, as they are not officially supported, and clusters can be safely recovered as long as the data corruption check is enabled.
Recommendation is to upgrade to v3.5.3.
If you have encountered data corruption, please follow instructions on https://etcd.io/docs/v3.5/op-guide/data_corruption/.
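
The updated recommendation (3.3.18+, 3.4.2+, v3.5.3+) amounts to a per-series minimum patch level. A minimal sketch of that check follows; the names `MIN_RECOMMENDED` and `is_recommended` are hypothetical and not part of etcd, and the handling of release series that the recommendation does not list is an assumption.

```python
import re

# Minimum recommended patch release per minor series, taken from the
# recommendation text above: 3.3.18+, 3.4.2+, v3.5.3+.
MIN_RECOMMENDED = {(3, 3): 18, (3, 4): 2, (3, 5): 3}


def is_recommended(version: str) -> bool:
    """Return True if `version` meets the minimum for its minor series."""
    # Accept plain MAJOR.MINOR.PATCH strings with an optional leading "v".
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"not a plain MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    minimum = MIN_RECOMMENDED.get((major, minor))
    # Assumption: a series that the recommendation does not list is reported
    # as not recommended instead of being guessed at.
    return minimum is not None and patch >= minimum


if __name__ == "__main__":
    for candidate in ("3.4.2", "v3.5.2", "3.5.3"):
        print(candidate, is_recommended(candidate))
```

Keeping the minimums in one lookup table means a future change to the recommendation only touches `MIN_RECOMMENDED`.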