Mirroristas/etcd

mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Author	SHA1	Message	Date
Frederic Branczyk	2c4877064e	Documentation/etcd-mixin: Use etcd_mvcc_db_total_size_in_bytes metric	2020-04-07 18:14:23 +02:00
Frederic Branczyk	68c5f6066f	Documentation/etcd-mixin: Set unique UID for Grafana dashboard	2020-04-07 18:13:41 +02:00
Clayton Coleman	322c38e169	Documentation/etcd-mixin: Fix etcdHighNumberOfLeaderChanges (#11448) The `etcdHighNumberOfLeaderChanges` alert had a copy and paste error when it was converted from docs to mixin in 10244 - we moved from "increase over 15m > 3" to "rate over 15m > 3" which is not the same (rate is measured per second, so it should have been "rate over 15m > (3 / 60 / 15)"). As part of fixing that, we need to capture when prometheus starts or when new etcd clusters are captured with a high leader change - i.e. if you start a new etcd cluster and at the moment prometheus first scrapes you are already at 5 leader changes, we should fire on that transition. This alert is also now more responsive, so if you get a quick burst of 3 leader changes we'll alert within 5m rather than 15m.	2019-12-13 16:00:11 -08:00
Clayton Coleman	465592a718	Documentation/etcd-mixin: Add an alert for down etcd members An etcd member being down is an important failure state - while normal admin operations may cause transient outages to rotate, when any member is down the cluster is operating in a degraded fashion. Add an alert that records when any members are down so that administrators know whether the next failure is fatal. The rule is more complicated than `up{...} == 0` because not all failure modes for etcd may have an `up{...}` entry for each member. For instance, a Kubernetes service in front of an etcd cluster might only have 2 endpoints recorded in `up` because the third pod is evicted by the kubelet - the cluster is degraded but `count(up{...})` would not return the full quorum size. Instead, use network peer send failures as a failure detector and attempt to return the max of down services or failing peers. We may undercount the number of total failures, but we will at least alert that a member is down.	2019-07-30 14:39:50 -04:00
paulfantom	886d30d223	Documentation: provide better user experience with autorefreshing grafana dashboard	2019-05-08 06:58:28 -04:00
Povilas Versockas	eb8e94c4ed	etcd-mixin: Improve etcdHighNumberOfLeaderChanges, etcdHighNumberOfFailedProposals message Currently the alert messages state that the issue was detected within the last 1 hour, although we check the last 15 minutes and wait 15 minutes for the alert to keep firing. This fix changes the message to say 30 minutes.	2019-02-04 09:28:23 +02:00
Dmitry Verkhoturov	0929080834	doc: exclude 404 errors because the kubelet generates false positives	2018-12-17 11:57:12 +03:00
Dmitry Verkhoturov	830d064903	doc: convert etcd to lower-case everywhere	2018-12-17 11:57:12 +03:00
Dmitry Verkhoturov	358cc1a8fa	doc: sync prometheus rules with prometheus-operator version (and remove non-etcd specific FdExhaustionClose) https://github.com/coreos/prometheus-operator/blob/master/helm/exporter-kube-etcd/templates/etcd3.rules.yaml sync etcd alert rules with libsonnet Signed-off-by: Dmitry Verkhoturov <paskal.07@gmail.com>	2018-12-17 11:57:12 +03:00
Christian Beneke	c75ba98f81	Documentation/etcd-mixin: Fix EtcdInsufficientMembers alerting Currently the EtcdInsufficientMembers alert fires when more than (X/2)-1 instances are unavailable. This fixes it to fire at the correct limit of (X-1)/2 unavailable instances, and $value now contains the number of available instances instead of unavailable ones. Added unit test for EtcdInsufficientMembers alert.	2018-10-15 19:23:43 +02:00
Joonyoung Park	bd74c10fdb	Documentation/etcd-mixin: fix typo in README.md Promethues -> Prometheus	2018-07-19 19:10:46 +09:00
Joshua Olson	3826107af6	Documentation: removing alerts that were specific to etcd v2	2018-07-18 12:31:46 -04:00
Frederic Branczyk	778bfe1c82	Documentation: Add Grafana dashboard to etcd monitoring mixin	2018-05-30 13:42:36 +02:00
Tom Wilkie	13d4e1509b	Documentation: add Prometheus monitoring-mixin Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-05-29 09:52:40 -07:00
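
Note on c75ba98f81: the two limits in that commit message can be compared against the Raft majority requirement with a small sketch. This assumes one reading of the message (the alert fires when the number of unavailable members is strictly greater than the stated limit); the function names are hypothetical and the actual PromQL expression is not shown in this log.

```python
def quorum(members: int) -> int:
    """Smallest number of available members that still forms a majority."""
    return members // 2 + 1


def old_rule_fires(members: int, unavailable: int) -> bool:
    # Assumed reading of the old rule: more than (X/2)-1 members unavailable.
    return unavailable > members / 2 - 1


def fixed_rule_fires(members: int, unavailable: int) -> bool:
    # Assumed reading of the fixed rule: more than (X-1)/2 members unavailable.
    return unavailable > (members - 1) / 2


def quorum_lost(members: int, unavailable: int) -> bool:
    """True when the remaining members can no longer form a majority."""
    return members - unavailable < quorum(members)


if __name__ == "__main__":
    for members in (3, 5):
        for unavailable in range(members + 1):
            print(
                f"X={members} unavailable={unavailable} "
                f"old={old_rule_fires(members, unavailable)} "
                f"fixed={fixed_rule_fires(members, unavailable)} "
                f"quorum_lost={quorum_lost(members, unavailable)}"
            )
```

Under this reading, the old limit fires for a 3-member cluster with a single member down even though two members still form a majority, while the fixed limit fires exactly when the majority is lost.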

14 Commits
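
Note on 322c38e169: the threshold arithmetic in that commit message can be checked with a minimal sketch. It relies only on the fact stated there, that a rate is measured per second, so a total increase over a window is divided by the window length in seconds. The helper names are hypothetical and not part of the mixin.

```python
WINDOW_SECONDS = 15 * 60  # the 15m range named in the commit message


def increase_to_rate(increase: float, window_seconds: int = WINDOW_SECONDS) -> float:
    """Per-second rate equivalent to a total increase over the window."""
    return increase / window_seconds


def rate_to_increase(rate_per_second: float, window_seconds: int = WINDOW_SECONDS) -> float:
    """Total increase over the window implied by a per-second rate."""
    return rate_per_second * window_seconds


if __name__ == "__main__":
    # Intended threshold: 3 leader changes in 15 minutes, as a per-second rate.
    print(f"rate threshold: {increase_to_rate(3):.6f} per second")
    # Mistaken threshold: "rate > 3" read back as leader changes per 15 minutes.
    print(f"rate > 3 means more than {rate_to_increase(3)} changes in 15m")
```

The intended threshold works out to 3 / 60 / 15, about 0.0033 per second, whereas the mistaken "rate over 15m > 3" would only fire after more than 2700 leader changes in 15 minutes.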