mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Dan Mace 2aa5684ada Documentation: Tweak etcdMembersDown to reduce false negatives

Before this change, during a reboot in which etcd recovers quickly (e.g. 1 min),
the etcdMembersDown alert tends to fire even when etcd is fully healthy because
the averaging function can take more than 3 minutes to average back down below
the 0.01 threshold.

This change tries to reduce the possibility of a false positive (the alert
firing while etcd is healthy) by considering a shorter (1 min) failure rate
window, which tends to average down below the threshold far more quickly
(within 1 min). The `for` clause of the alert should ensure that the alert
still fires if the poor conditions are sustained for an unreasonable overall
time (3 min).
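
The effect of the window length can be illustrated with a toy model. The sketch below is not the alert expression and does not reproduce Prometheus's exact `rate()` semantics. It assumes a hypothetical failure counter that grows by one per second during a 60 s outage and then stops, and compares a 3 min window (an assumed stand-in for the previous, longer window) with the 1 min window mentioned above. The names `counter`, `windowed_rate` and `seconds_until_below` are made up for this illustration.

```python
# Simplified model (an assumption, not Prometheus's exact rate() semantics):
# a failure counter grows by 1 per second during a 60 s outage, then stops.

OUTAGE_SECONDS = 60      # hypothetical outage length
FAILURES_PER_SECOND = 1  # hypothetical failure rate during the outage
THRESHOLD = 0.01         # threshold quoted in the commit message


def counter(t):
    """Cumulative failure count at time t (seconds since the outage began)."""
    return FAILURES_PER_SECOND * max(0, min(t, OUTAGE_SECONDS))


def windowed_rate(t, window):
    """Average failures per second over the last `window` seconds."""
    return (counter(t) - counter(t - window)) / window


def seconds_until_below(window, horizon=600):
    """Seconds after recovery until the windowed rate is at or below THRESHOLD."""
    for t in range(OUTAGE_SECONDS, horizon):
        if windowed_rate(t, window) <= THRESHOLD:
            return t - OUTAGE_SECONDS
    return None


if __name__ == "__main__":
    for window in (180, 60):
        print(window, seconds_until_below(window))
```

In this model the averaged rate stays above the threshold until the window has slid almost entirely past the outage, so the recovery delay is roughly one window length: about 3 minutes for the 180 s window and about 1 minute for the 60 s window.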

2020-07-13 08:58:21 -04:00

mixin.libsonnet

Documentation: Tweak etcdMembersDown to reduce false negatives

2020-07-13 08:58:21 -04:00

README.md

Documentation/etcd-mixin: Fix EtcdInsufficientMembers alerting

2018-10-15 19:23:43 +02:00

test.yaml

Documentation/etcd-mixin: Raise etcdHighNumberOfLeaderChanges threshold to 4

2020-06-25 15:38:15 -07:00

README.md

Prometheus Monitoring Mixin for etcd

NOTE: This project is in alpha stage. Flags, configuration, behaviour and design may change significantly in future releases.

A set of customisable Prometheus alerts for etcd.

Instructions for use are the same as for the kubernetes-mixin.

Background

For more information about monitoring mixins, see this design doc.

Testing alerts

Make sure jsonnet, gojsontoyaml and promtool are installed.

First compile the mixin to a YAML file, which promtool will read:

jsonnet -e '(import "mixin.libsonnet").prometheusAlerts' | gojsontoyaml > mixin.yaml

Then run the unit test:

promtool test rules test.yaml