Dan Mace 9571325fe8 etcdserver: fix incorrect metrics generated when clients cancel watches
Before this patch, a client which cancels the context for a watch results in the
server generating a `rpctypes.ErrGRPCNoLeader` error that leads the recording of
a gRPC `Unavailable` metric in association with the client watch cancellation.
The metric looks like this:

    grpc_server_handled_total{grpc_code="Unavailable",grpc_method="Watch",grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}

So, the watch server has misidentified the error as a server error and then
propagates the mistake to metrics, leading to a false indicator that the leader
has been lost. This false signal then leads to false alerting.

The commit 9c103dd0dedfc723cd4f33b6a5e81343d8a6bae7 introduced an interceptor which wraps
watch streams requiring a leader, causing those streams to be actively canceled
when leader loss is detected.

However, the error handling code assumes all stream context cancellations are
from the interceptor. This assumption is broken when the context was canceled
because of a client stream cancelation.

The core challenge is lack of information conveyed via `context.Context` which
is shared by both the send and receive sides of the stream handling and is
subject to cancellation by all paths (including the gRPC library itself). If any
piece of the system cancels the shared context, there's no way for a context
consumer to understand who cancelled the context or why.

To solve the ambiguity of the stream interceptor code specifically, this patch
introduces a custom context struct which the interceptor uses to expose a custom
error through the context when the interceptor decides to actively cancel a
stream. Now the consuming side can more safely assume a generic context
cancellation can be propagated as a cancellation, and the server generated
leader error is preserved and propagated normally without any special inference.

When a client cancels the stream, there remains a race in the error handling
code between the send and receive goroutines whereby the underlying gRPC error
is lost in the case where the send path returns and is handled first, but this
issue can be taken separately as no matter which paths wins, we can detect a
generic cancellation.

This is a replacement of https://github.com/etcd-io/etcd/pull/11375.

Fixes #10289, #9725, #9576, #9166
2020-11-18 17:02:09 -05:00
2020-10-26 13:02:32 +01:00
2014-12-18 14:59:06 -08:00
2020-11-02 19:37:25 -08:00
2020-10-26 13:02:32 +01:00
2016-05-12 20:56:50 -07:00
2020-10-26 13:02:32 +01:00
2020-10-26 13:02:32 +01:00
2020-10-30 10:10:30 +08:00
2020-10-26 13:02:32 +01:00
2020-10-26 13:02:32 +01:00
2019-09-30 15:24:14 -04:00
2020-04-23 10:47:02 -07:00
2020-10-11 20:40:34 +02:00
2020-11-03 07:59:42 -08:00

etcd

Go Report Card Coverage Build Status Travis Build Status Semaphore Docs Godoc Releases LICENSE

Note: The master branch may be in an unstable or even broken state during development. Please use releases instead of the master branch in order to get stable binaries.

etcd Logo

etcd is a distributed reliable key-value store for the most critical data of a distributed system, with a focus on being:

  • Simple: well-defined, user-facing API (gRPC)
  • Secure: automatic TLS with optional client cert authentication
  • Fast: benchmarked 10,000 writes/sec
  • Reliable: properly distributed using Raft

etcd is written in Go and uses the Raft consensus algorithm to manage a highly-available replicated log.

etcd is used in production by many companies, and the development team stands behind it in critical deployment scenarios, where etcd is frequently teamed with applications such as Kubernetes, locksmith, vulcand, Doorman, and many others. Reliability is further ensured by rigorous testing.

See etcdctl for a simple command line client.

Community meetings

etcd contributors and maintainers have monthly (every four weeks) meetings at 11:00 AM (USA Pacific) on Thursday.

An initial agenda will be posted to the shared Google docs a day before each meeting, and everyone is welcome to suggest additional topics or other agendas.

Time:

Join Hangouts Meet: meet.google.com/umg-nrxn-qvs

Join by phone: +1 405-792-0633 PIN: 299 906#

Getting started

Getting etcd

The easiest way to get etcd is to use one of the pre-built release binaries which are available for OSX, Linux, Windows, and Docker on the release page.

For more installation guides, please check out play.etcd.io and operating etcd.

For those wanting to try the very latest version, build the latest version of etcd from the master branch. This first needs Go installed (version 1.13+ is required). All development occurs on master, including new features and bug fixes. Bug fixes are first targeted at master and subsequently ported to release branches, as described in the branch management guide.

Running etcd

First start a single-member cluster of etcd.

If etcd is installed using the pre-built release binaries, run it from the installation location as below:

/tmp/etcd-download-test/etcd

The etcd command can be simply run as such if it is moved to the system path as below:

mv /tmp/etcd-download-test/etcd /usr/local/bin/
etcd

If etcd is built from the master branch, run it as below:

./bin/etcd

This will bring up etcd listening on port 2379 for client communication and on port 2380 for server-to-server communication.

Next, let's set a single key, and then retrieve it:

etcdctl put mykey "this is awesome"
etcdctl get mykey

etcd is now running and serving client requests. For more, please check out:

etcd TCP ports

The official etcd ports are 2379 for client requests, and 2380 for peer communication.

Running a local etcd cluster

First install goreman, which manages Procfile-based applications.

Our Procfile script will set up a local example cluster. Start it with:

goreman start

This will bring up 3 etcd members infra1, infra2 and infra3 and optionally etcd grpc-proxy, which runs locally and composes a cluster.

Every cluster member and proxy accepts key value reads and key value writes.

Follow the steps in Procfile.learner to add a learner node to the cluster. Start the learner node with:

goreman -f ./Procfile.learner start

Next steps

Now it's time to dig into the full etcd API and other guides.

Contact

Contributing

See CONTRIBUTING for details on submitting patches and the contribution workflow.

Reporting bugs

See reporting bugs for details about reporting any issues.

Reporting a security vulnerability

See security disclosure and release process for details on how to report a security vulnerability and how the etcd team manages it.

Issue and PR management

See issue triage guidelines for details on how issues are managed.

See PR management for guidelines on how pull requests are managed.

etcd Emeritus Maintainers

These emeritus maintainers dedicated a part of their career to etcd and reviewed code, triaged bugs, and pushed the project forward over a substantial period of time. Their contribution is greatly appreciated.

  • Fanmin Shi
  • Anthony Romano

License

etcd is under the Apache 2.0 license. See the LICENSE file for details.

Description
Distributed reliable key-value store for the most critical data of a distributed system
Readme
Languages
Go 96.5%
Shell 2%
Jsonnet 1.1%
Makefile 0.3%
Procfile 0.1%