etcd/Documentation/etcd-mixin
Dan Mace 2aa5684ada Documentation: Tweak etcdMembersDown to reduce false negatives
Before this change, during a reboot in which etcd recovers quickly (e.g. 1 min),
the etcdMembersDown alert tends to fire even when etcd is fully healthy because
the averaging function can take more than 3 minutes to average back down below
the 0.01 threshold.

This change tries to reduce the possibility of a false positive by considering a
shorter (1 min) failure rate window, which tends to average down below the
threshold far more quickly (within 1 min). The `for` clause of the alert should
ensure that the alert still fires if the poor conditions are sustained for an
unreasonable overall time (3 min).
2020-07-13 08:58:21 -04:00
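
The reasoning in the commit note above can be illustrated with a small simulation. This is only a sketch under stated assumptions: it models the failure rate as a plain sliding-window average (not Prometheus's actual rate() implementation), and the one-failure-per-second outage, the 15 s evaluation step and the function name longest_breach are hypothetical choices made for illustration. Only the 0.01 threshold, the 1 min and 3 min windows and the 3 min `for` duration come from the note.

```python
FOR_SECONDS = 180  # the alert's `for` clause (3 min), from the commit note


def longest_breach(window, outage=60, threshold=0.01, step=15, horizon=600):
    """Longest continuous time (seconds) the windowed rate stays above threshold.

    Assumption: one failed peer send per second during the outage, which
    starts at t=0 and lasts `outage` seconds. The rule is evaluated every
    `step` seconds (hypothetical value).
    """
    failures = range(outage)  # timestamps of individual failures
    longest = 0
    run_start = None
    for now in range(0, horizon + 1, step):
        # Failures that fall inside the look-back window (now - window, now].
        in_window = sum(1 for t in failures if now - window < t <= now)
        if in_window / window > threshold:
            if run_start is None:
                run_start = now
            longest = max(longest, now - run_start)
        else:
            run_start = None
    return longest


for window in (180, 60):
    breach = longest_breach(window)
    fires = breach >= FOR_SECONDS
    print(f"window={window}s breach={breach}s fires={fires}")
```

In this simplified model, a 1 min outage keeps the 3 min window above the threshold long enough to satisfy the `for` clause, while the 1 min window drops back below it first. Calling longest_breach(60, outage=300) shows that a sustained outage still exceeds the `for` duration with the shorter window.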

Prometheus Monitoring Mixin for etcd

NOTE: This project is in the alpha stage. Flags, configuration, behaviour and design may change significantly in future releases.

A set of customisable Prometheus alerts for etcd.

Instructions for use are the same as for the kubernetes-mixin.

Background

  • For more information about monitoring mixins, see this design doc.

Testing alerts

Make sure jsonnet, gojsontoyaml and promtool are installed.

First compile the mixin to a YAML file, which promtool will read:

jsonnet -e '(import "mixin.libsonnet").prometheusAlerts' | gojsontoyaml > mixin.yaml

Then run the unit test:

promtool test rules test.yaml
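
The two steps above can also be chained from a script. The following is a minimal sketch, assuming the three tools are on PATH and the script runs from the mixin directory; the helper names missing_tools and compile_and_test are hypothetical and not part of the mixin.

```python
import shutil
import subprocess

# Command-line tools used by the two steps above.
TOOLS = ["jsonnet", "gojsontoyaml", "promtool"]


def missing_tools(names=TOOLS):
    """Return the tools from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def compile_and_test(mixin_yaml="mixin.yaml", test_file="test.yaml"):
    """Compile the mixin to YAML, then run the promtool unit tests."""
    missing = missing_tools()
    if missing:
        raise RuntimeError("missing tools on PATH: " + ", ".join(missing))

    # jsonnet -e '(import "mixin.libsonnet").prometheusAlerts'
    rendered = subprocess.run(
        ["jsonnet", "-e", '(import "mixin.libsonnet").prometheusAlerts'],
        check=True,
        capture_output=True,
    ).stdout

    # ... | gojsontoyaml > mixin.yaml
    converted = subprocess.run(
        ["gojsontoyaml"],
        input=rendered,
        check=True,
        capture_output=True,
    ).stdout
    with open(mixin_yaml, "wb") as out:
        out.write(converted)

    # promtool test rules test.yaml
    subprocess.run(["promtool", "test", "rules", test_file], check=True)
```

Calling compile_and_test() raises an error if a tool is missing or any command exits with a non-zero status, so a failed unit test stops the script.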