A cluster with three members could see three leader changes during a
healthy rolling reboot, and we don't want to alert on that. Growing
to 4 reduces false-alarms for clusters with three or fewer members,
and that's probably most clusters. It will also slightly increase the
risk of false-negatives, but if the cluster is struggling with high
latency, it seems likely that it would quickly pass the new threshold
too.
The hard-coded threshold means that we are still likely to get
false-positives during rolling reboots of clusters with four or more
members. Ideally we'd scale this with the cluster size, or something,
but I'm not sure how to do that. Three members is the minimum size
for high availability, so reducing false positives for that case seems
worth addressing even if we leave larger clusters largely unchanges.
Also manually catch etcd3_alert.rules up to speed, since it seems to
have been passed over by 16fc8a2b4b (Documentation/op-guide:
Re-generate alert rules and dashboard from mixin, 2020-04-07, #11768).
See the issue created here:
https://github.com/etcd-io/etcd/issues/10989#issuecomment-518726038
doc: fix broken links referring to etcd.redhatdocs.io
Adding links to internal Documentation within github.com.
Update runtime-configuration.md
Update runtime-configuration.md
Update CHANGELOG-3.3.md
Remove extra space
Keep the formatting similar to original
Etcd currently supports validating peers based on their TLS certificate's
CN field. The current best practice for creation and validation of TLS
certs is to use the Subject Alternative Name (SAN) fields instead, so that
a certificate might be issued with a unique CN and its logical
identities in the SANs.
This commit extends the peer validation logic to use Go's
`(*"crypto/x509".Certificate).ValidateHostname` function for name
validation, which allows SANs to be used for peer access control.
In addition, it allows name validation to be enabled on clients as well.
This is used when running Etcd behind an authenticating proxy, or as
an internal component in a larger system (like a Kubernetes master).
Be more explicit in document and command line usage message that if a
config file is provided, other command line flags and environment
variables will be ignored.
The documentation mentions fio as a tool to benchmark disks to assess
whether they are fast enough for etcd. But doing that is far from trivial,
because fio is very flexible and complex to use, and the user must make sure
that the workload fio generates mirrors the I/O workload of its etcd cluster
closely enough. This commit adds links to a blog post with an example of how
to do that.
This PR resolves an issue where the `/metrics` endpoints exposed by the proxy were not returning metrics of the etcd members servers but of the proxy itself.
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>