mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00


Dan Mace cd3df73944 Documentation: Further improve etcdMembersDown alert

The default window for the etcdMembersDown network failure rate function was
recently changed to 1 minute. While this helps detect an etcd recovery more
quickly, it depends on scrape intervals of <= 15s to collect sufficient data
points for the rate function. In practice, an interval of >= 30s is more
typical, which makes the rate function less accurate.

This patch increases the window to 2m, which is a compromise between the
original value of 3m and the 1m change introduced with 2aa5684, and should
accommodate more typical scrape intervals.
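
The trade-off between window and scrape interval can be illustrated with some rough arithmetic. This is only a sketch: `samples_in_window` is a hypothetical helper, and integer division only approximates how many samples actually land in a window. The one fact it relies on is that Prometheus `rate()` needs at least two samples in the window to return a value.

```shell
# Rough count of scrape samples that fall inside a rate() window.
# Hypothetical helper; both arguments are in seconds and the result
# is an approximation based on integer division.
samples_in_window() {
  echo $(( $1 / $2 ))
}

# Compare the 1m, 2m and 3m windows against 15s and 30s scrape intervals.
for window in 60 120 180; do
  for interval in 15 30; do
    echo "window=${window}s interval=${interval}s samples~$(samples_in_window "$window" "$interval")"
  done
done
```

With a 30s interval, a 1m window holds about 2 samples, so a single missed scrape can leave `rate()` with nothing to compute, while a 2m window holds about 4.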

To offset the window change and to further improve the chance that the alert
will only fire when etcd is truly dead, this patch changes the `for` clause from
3m to 10m. The rationale is as follows:

1. There can be significant variance in durations following a reboot before etcd
is scraped and detected as available.

2. A conservative trigger like 10m seems less likely to produce a false alarm in
the face of such variance.

3. In this alerting situation, if the outage is real, it seems unlikely that an
additional 7 minutes of delay before (for example) paging somebody will make a
significant impact on the overall response.

2020-07-31 09:26:46 -04:00

mixin.libsonnet    Documentation: Further improve etcdMembersDown alert    2020-07-31 09:26:46 -04:00
README.md    Documentation/etcd-mixin: Fix EtcdInsufficientMembers alerting    2018-10-15 19:23:43 +02:00
test.yaml    Documentation: Further improve etcdMembersDown alert    2020-07-31 09:26:46 -04:00

README.md

Prometheus Monitoring Mixin for etcd

NOTE: This project is in the alpha stage. Flags, configuration, behaviour and design may change significantly in future releases.

A set of customisable Prometheus alerts for etcd.

Instructions for use are the same as for the kubernetes-mixin.

Background

For more information about monitoring mixins, see this design doc.

Testing alerts

Make sure to have jsonnet and gojsontoyaml installed.

First compile the mixin to a YAML file, which promtool will read:

jsonnet -e '(import "mixin.libsonnet").prometheusAlerts' | gojsontoyaml > mixin.yaml

Then run the unit test:

promtool test rules test.yaml
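
The two steps above can be combined into one small script. This is a minimal sketch, not part of the mixin: the function names `missing_tools` and `run_mixin_tests` are hypothetical, and it assumes it is run from the directory that contains mixin.libsonnet and test.yaml.

```shell
# Print the name of every listed tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the prerequisites, compile the mixin to mixin.yaml, then run
# the unit tests. In plain POSIX sh the pipeline status is that of
# gojsontoyaml, so a jsonnet failure may only surface in promtool.
run_mixin_tests() {
  missing=$(missing_tools jsonnet gojsontoyaml promtool)
  if [ -n "$missing" ]; then
    echo "missing tools: $missing" >&2
    return 1
  fi
  jsonnet -e '(import "mixin.libsonnet").prometheusAlerts' | gojsontoyaml > mixin.yaml \
    && promtool test rules test.yaml
}
```

After sourcing the file, calling `run_mixin_tests` runs both steps and returns a non-zero status if a tool is missing or the tests fail.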