Mirror of https://github.com/etcd-io/etcd.git, synced 2024-09-27 06:25:44 +00:00
A cluster with three members could see three leader changes during a healthy rolling reboot, and we don't want to alert on that. Growing the threshold to 4 reduces false alarms for clusters with three or fewer members, and that's probably most clusters. It will also slightly increase the risk of false negatives, but if the cluster is struggling with high latency, it seems likely that it would quickly pass the new threshold too.

The hard-coded threshold means that we are still likely to get false positives during rolling reboots of clusters with four or more members. Ideally we'd scale this with the cluster size, or something, but I'm not sure how to do that. Three members is the minimum size for high availability, so reducing false positives for that case seems worth addressing even if we leave larger clusters largely unchanged.

Also manually catch etcd3_alert.rules up to speed, since it seems to have been passed over by 16fc8a2b4b (Documentation/op-guide: Re-generate alert rules and dashboard from mixin, 2020-04-07, #11768).
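The threshold arithmetic above can be sketched in a few lines of Go. This is an illustration only, not the actual alert rule: `shouldAlert` and `worstCaseRollingReboot` are hypothetical helpers, the `>=` comparison is an assumption about how the rule compares against the threshold, and the model assumes a healthy rolling reboot causes at most one leader change per member.

```go
package main

import "fmt"

// leaderChangeThreshold mirrors the hard-coded value of 4 described above.
const leaderChangeThreshold = 4

// shouldAlert is a hypothetical stand-in for the alert condition.
// Assumption: the alert fires once the observed number of leader changes
// in the evaluation window reaches the threshold.
func shouldAlert(leaderChanges int) bool {
	return leaderChanges >= leaderChangeThreshold
}

// worstCaseRollingReboot models the assumption that a healthy rolling
// reboot triggers at most one leader change per member.
func worstCaseRollingReboot(members int) int {
	return members
}

func main() {
	// Compare cluster sizes on either side of the threshold.
	for _, members := range []int{1, 3, 4, 5} {
		changes := worstCaseRollingReboot(members)
		fmt.Printf("members=%d worst-case leader changes=%d alert=%t\n",
			members, changes, shouldAlert(changes))
	}
}
```

Under these assumptions a three-member cluster stays below the threshold during a rolling reboot, while a cluster of four or more members can still reach it, which is the remaining false-positive case noted above.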
The etcd documentation
etcd is a distributed key-value store designed to reliably and quickly preserve and provide access to critical data. It enables reliable distributed coordination through distributed locking, leader elections, and write barriers. An etcd cluster is intended for high availability and permanent data storage and retrieval.
Please note that the files in this directory are source files for the built and rendered documentation that can be viewed at etcd.io/docs.