The `etcdHighNumberOfLeaderChanges` alert had a copy-and-paste error when it was converted from the docs to the mixin in 10244: the condition changed from "increase over 15m > 3" to "rate over 15m > 3", which is not equivalent. `rate` is measured per second, so the converted expression should have been "rate over 15m > (3 / 60 / 15)"; as written, it only fires after more than 2700 leader changes (3/s × 900s) in 15 minutes.
As part of fixing that, the alert also needs to fire when Prometheus starts, or when a new etcd cluster is first scraped with a leader-change count that is already high. For example, if you start a new etcd cluster and it has already recorded 5 leader changes by the time Prometheus first scrapes it, the alert should fire on that transition. The alert is now also more responsive: a quick burst of 3 leader changes alerts within 5m rather than 15m.
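The threshold arithmetic and the first-scrape problem can be checked with a small Python sketch. It models a counter as a list of (timestamp, value) samples and deliberately ignores the extrapolation and counter-reset handling that Prometheus's real `rate()` and `increase()` perform. The helper names, the 30s scrape interval, and the "implicit zero" seeding used for the first-scrape case are hypothetical illustrations of the intended behavior, not the mixin's actual PromQL.

```python
# Simplified model of a Prometheus counter as (timestamp_seconds, value) samples.
# Assumption: no extrapolation or counter-reset handling, unlike real PromQL.

WINDOW = 15 * 60      # the alert's 15m range, in seconds
SCRAPE_INTERVAL = 30  # hypothetical scrape interval, in seconds

# Buggy threshold: compares a per-second rate against a per-15m count.
BUGGY_RATE_THRESHOLD = 3
# Corrected threshold: "3 changes per 15m" converted to changes per second.
FIXED_RATE_THRESHOLD = 3 / 60 / 15  # 3 / 900, roughly 0.00333


def naive_increase(samples, window=WINDOW):
    """Growth between the first and last samples inside the trailing window."""
    end = samples[-1][0]
    in_window = [v for t, v in samples if t >= end - window]
    return in_window[-1] - in_window[0]


def naive_rate(samples, window=WINDOW):
    """Per-second rate, so naive_rate(...) * window == naive_increase(...)."""
    return naive_increase(samples, window) / window


def increase_with_implicit_zero(samples, window=WINDOW, interval=SCRAPE_INTERVAL):
    """Hypothetical fix: treat the counter as 0 one scrape before it first appears."""
    seeded = [(samples[0][0] - interval, 0)] + samples
    return naive_increase(seeded, window)


# Four leader changes inside one 15m window (two at t=300s, two more at t=600s).
burst = [
    (t, 0 if t < 300 else 2 if t < 600 else 4)
    for t in range(0, WINDOW + 1, SCRAPE_INTERVAL)
]
print(naive_increase(burst))                     # 4
print(naive_rate(burst) > BUGGY_RATE_THRESHOLD)  # False: needs > 2700 changes
print(naive_rate(burst) > FIXED_RATE_THRESHOLD)  # True

# A new cluster first scraped with the counter already at 5.
first_scrape = [(0, 5)]
print(naive_increase(first_scrape))               # 0: no history, nothing fires
print(increase_with_implicit_zero(first_scrape))  # 5
```

Because `increase = rate × window`, "rate > 3/900" and "increase > 3" select the same series. In this sketch the seeded zero sample only matters until the series has 15 minutes of history; after that it falls out of the window and a steady counter reports no increase.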
Documentation
etcd is a distributed key-value store designed to reliably and quickly preserve and provide access to critical data. It enables reliable distributed coordination through distributed locking, leader elections, and write barriers. An etcd cluster is intended for high availability and permanent data storage and retrieval.
Getting started
New etcd users and developers should get started by downloading and building etcd. After getting etcd, follow this quick demo to see the basics of creating and working with an etcd cluster.
Developing with etcd
The easiest way to get started using etcd as a distributed key-value store is to set up a local cluster.
- Setting up local clusters
- Interacting with etcd
- gRPC etcd core and etcd concurrency API references
- HTTP JSON API through the gRPC gateway
- gRPC naming and discovery
- Client and proxy namespacing
- Embedding etcd
- Experimental features and APIs
- System limits
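As a sketch of interacting with etcd over the HTTP JSON API exposed by the gRPC gateway, the Python below builds put and range requests with only the standard library. It assumes etcd v3.4 or later (which serves the `/v3/kv/put` and `/v3/kv/range` endpoints) listening on the default client port 2379 on localhost; the gateway expects keys and values as base64-encoded strings in the JSON body. The helper names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: a local etcd v3.4+ member serving clients on the default port.
ETCD_URL = "http://localhost:2379"


def b64(text):
    """The gateway expects keys and values as base64-encoded strings."""
    return base64.b64encode(text.encode()).decode()


def _post(path, payload, base_url):
    """Build (but do not send) a JSON POST request to the gateway."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def put_request(key, value, base_url=ETCD_URL):
    """POST /v3/kv/put storing key=value."""
    return _post("/v3/kv/put", {"key": b64(key), "value": b64(value)}, base_url)


def range_request(key, base_url=ETCD_URL):
    """POST /v3/kv/range reading a single key."""
    return _post("/v3/kv/range", {"key": b64(key)}, base_url)


def decode_range(body):
    """Turn a /v3/kv/range JSON response into {key: value}; empty if no match."""
    data = json.loads(body)
    return {
        base64.b64decode(kv["key"]).decode(): base64.b64decode(kv["value"]).decode()
        for kv in data.get("kvs", [])
    }


def demo():
    """Round-trip against a live etcd; raises URLError if nothing is listening."""
    with urllib.request.urlopen(put_request("foo", "bar")) as resp:
        resp.read()
    with urllib.request.urlopen(range_request("foo")) as resp:
        print(decode_range(resp.read()))
```

Keeping request construction separate from sending makes the encoding easy to check offline; `demo()` only works with a running member. A range response that matches nothing omits the `kvs` field, which is why `decode_range` falls back to an empty list.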
Operating etcd clusters
Administrators who need a fault-tolerant etcd cluster for either development or production should begin with a cluster on multiple machines.
Setting up etcd
System configuration
Platform guides
Security
Maintenance and troubleshooting
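A fault-tolerant multi-machine cluster keeps working only while a majority (quorum) of its members is up. An n-member cluster therefore survives n - (n // 2 + 1) failures. The Python sketch below computes that and assembles flags for one member of a statically bootstrapped cluster using etcd's `--initial-cluster` family of flags. The member names, IP addresses, and cluster token are hypothetical placeholders.

```python
# Quorum arithmetic for an etcd (Raft) cluster, plus a helper that assembles
# static-bootstrap flags. Names, IPs, and the token below are hypothetical.


def quorum(members):
    """Smallest majority of an n-member cluster."""
    return members // 2 + 1


def failure_tolerance(members):
    """How many members can fail while a majority remains."""
    return members - quorum(members)


def initial_cluster_flag(peers):
    """Format {name: peer_url} as the value of --initial-cluster."""
    return ",".join(f"{name}={url}" for name, url in peers.items())


def member_args(name, ip, peers, token="etcd-cluster-1"):
    """Flags for one member of a new, statically bootstrapped cluster."""
    return [
        "--name", name,
        "--initial-advertise-peer-urls", f"http://{ip}:2380",
        "--listen-peer-urls", f"http://{ip}:2380",
        "--listen-client-urls", f"http://{ip}:2379,http://127.0.0.1:2379",
        "--advertise-client-urls", f"http://{ip}:2379",
        "--initial-cluster-token", token,
        "--initial-cluster", initial_cluster_flag(peers),
        "--initial-cluster-state", "new",
    ]


for n in range(1, 8):
    print(n, quorum(n), failure_tolerance(n))

peers = {
    "infra0": "http://10.0.1.10:2380",
    "infra1": "http://10.0.1.11:2380",
    "infra2": "http://10.0.1.12:2380",
}
print(" ".join(["etcd"] + member_args("infra0", "10.0.1.10", peers)))
```

The printed table shows that a 4-member cluster tolerates no more failures than a 3-member one. This is why odd cluster sizes are the usual choice: adding an even-numbered member raises the quorum without raising fault tolerance.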
Learning
To learn more about the concepts and internals behind etcd, read the following pages: