mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00


Gyu-Ho Lee 91f6aee4f2 etcdserver: ensure waitForApply sync with applyAll

Problem is:

`Step1`: `etcdserver/raft.go`'s `Ready` process routine sends config-change entries via `r.applyc <- ap` (https://github.com/coreos/etcd/blob/master/etcdserver/raft.go#L193-L203)

`Step2`: `etcdserver/server.go`'s `*EtcdServer.run` routine receives this via `ap := <-s.r.apply()` (https://github.com/coreos/etcd/blob/master/etcdserver/server.go#L735-L738)

`StepA`: `Step1` proceeds without sync, right after sending `r.applyc <- ap`.

`StepB`: `Step2` proceeds without sync, right after `sched.Schedule(s.applyAll(&ep,&ap))`.

`StepC`: `etcdserver` tries to sync with `s.applyAll(&ep,&ap)` by calling `rh.waitForApply()`.

`rh.waitForApply()` waits for all pending jobs on the `pkg/schedule`
side to finish. However, the order of `StepA`, `StepB`, and `StepC` is
not guaranteed. `StepC` can happen first and proceed without waiting on
apply. Because nothing then synchronizes the apply layer with the
application of the config-change Raft entries, the restarting member can
come back as the leader of a single-node cluster. Confirmed with the
extra debugging lines below; only reproducible on a slow-CPU VM (~2 vCPU).

```
~:24.005397 I | etcdserver: starting server... [version: 3.2.0+git, cluster version: to_be_decided]
~:24.011136 I | etcdserver: [DEBUG] 29b2d24047a277df waitForApply before
~:24.011194 I | etcdserver: [DEBUG] 29b2d24047a277df starts wait for 0 pending jobs
~:24.011234 I | etcdserver: [DEBUG] 29b2d24047a277df finished wait for 0 pending jobs (current pending 0)
~:24.011268 I | etcdserver: [DEBUG] 29b2d24047a277df waitForApply after
~:24.011348 I | etcdserver: [DEBUG] [0] 29b2d24047a277df is scheduling conf change on 29b2d24047a277df
~:24.011396 I | etcdserver: [DEBUG] [1] 29b2d24047a277df is scheduling conf change on 5edf80e32a334cf0
~:24.011437 I | etcdserver: [DEBUG] [2] 29b2d24047a277df is scheduling conf change on e32e31e76c8d2678
~:24.011477 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on 29b2d24047a277df
~:24.011509 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on 5edf80e32a334cf0
~:24.011545 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on e32e31e76c8d2678
~:24.012500 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df before
~:24.013014 I | etcdserver/membership: added member 29b2d24047a277df [unix://127.0.0.1:2100515039] to cluster 9250d4ae34216949
~:24.013066 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df after
~:24.013113 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df after trigger
~:24.013158 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 5edf80e32a334cf0 before
~:24.013666 W | etcdserver: failed to send out heartbeat on time (exceeded the 10ms timeout for 11.964739ms)
~:24.013709 W | etcdserver: server is likely overloaded
~:24.013750 W | etcdserver: failed to send out heartbeat on time (exceeded the 10ms timeout for 12.057265ms)
~:24.013775 W | etcdserver: server is likely overloaded
~:24.013950 I | raft: 29b2d24047a277df is starting a new election at term 4
~:24.014012 I | raft: 29b2d24047a277df became candidate at term 5
~:24.014051 I | raft: 29b2d24047a277df received MsgVoteResp from 29b2d24047a277df at term 5
~:24.014107 I | raft: 29b2d24047a277df became leader at term 5
~:24.014146 I | raft: raft.node: 29b2d24047a277df elected leader 29b2d24047a277df at term 5
```

I am printing out the number of pending jobs before we call
`sched.WaitFinish(0)`, and there were no pending jobs, so it returned
immediately (before we schedule `applyAll`).

This is the root cause of:

- https://github.com/coreos/etcd/issues/7595
- https://github.com/coreos/etcd/issues/7739
- https://github.com/coreos/etcd/issues/7802

`sched.WaitFinish(0)` doesn't work when `len(f.pendings)==0` and
`f.finished==0`: it returns immediately. The config change is the first
job to apply, so `f.finished` is 0 in this case.

`f.finished` increases monotonically, so we need `WaitFinish(finished+1)`,
where `finished` is the value read before calling `Schedule`. This is safe
because `Schedule(applyAll)` is the only place that adds jobs to `sched`.
The scheduler then waits on the single `applyAll` job, because the number
of finished jobs was captured before `Schedule` was called.
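
The difference between the two waits can be illustrated with a minimal
sketch. `miniSched` below is a hypothetical, simplified stand-in written
for this illustration (it is not the `pkg/schedule` code and runs each job
in its own goroutine); only the wait condition mirrors the description
above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// miniSched is a hypothetical scheduler that tracks pending and finished
// job counts, which is all this illustration needs.
type miniSched struct {
	mu       sync.Mutex
	cond     *sync.Cond
	pending  int
	finished int
}

func newMiniSched() *miniSched {
	s := &miniSched{}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// Schedule runs job asynchronously and counts it as pending until it returns.
func (s *miniSched) Schedule(job func()) {
	s.mu.Lock()
	s.pending++
	s.mu.Unlock()
	go func() {
		job()
		s.mu.Lock()
		s.pending--
		s.finished++
		s.mu.Unlock()
		s.cond.Broadcast()
	}()
}

// Finished returns the number of jobs completed so far.
func (s *miniSched) Finished() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.finished
}

// WaitFinish blocks until at least n jobs have finished and none are pending.
func (s *miniSched) WaitFinish(n int) {
	s.mu.Lock()
	for s.finished < n || s.pending != 0 {
		s.cond.Wait()
	}
	s.mu.Unlock()
}

// buggyWait models StepC running before StepB: WaitFinish(0) is called on
// an empty scheduler, so it returns before the apply job is even scheduled.
// It reports whether the job had run when the wait returned.
func buggyWait() bool {
	s := newMiniSched()
	var applied int32
	s.WaitFinish(0) // nothing pending, nothing finished: returns at once
	sawApply := atomic.LoadInt32(&applied) == 1
	s.Schedule(func() { atomic.StoreInt32(&applied, 1) })
	s.WaitFinish(1) // let the job complete before returning
	return sawApply
}

// fixedWait snapshots Finished() before Schedule and waits for one more
// job, so the wait cannot return until the apply job has run, regardless
// of which goroutine gets there first.
func fixedWait() bool {
	s := newMiniSched()
	var applied int32
	want := s.Finished() + 1
	result := make(chan bool, 1)
	go func() {
		s.WaitFinish(want)
		result <- atomic.LoadInt32(&applied) == 1
	}()
	s.Schedule(func() { atomic.StoreInt32(&applied, 1) })
	return <-result
}

func main() {
	fmt.Println("WaitFinish(0) saw apply:", buggyWait())
	fmt.Println("WaitFinish(finished+1) saw apply:", fixedWait())
}
```

The snapshot has to be taken before `Schedule`; taking it afterwards
could observe the job as already finished and wait for a job that never
comes.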

Alternatively, just block until the `applyAll` routine triggers on the
config-change job. This patch takes that approach: it removes
`waitForApply` and signals `raftDone` to wait until `applyAll` finishes
applying entries.

Confirmed that it fixes the issue, as below:

```
~:43.198354 I | rafthttp: started streaming with peer 36cda5222aba364b (stream MsgApp v2 reader)
~:43.198740 I | etcdserver: [DEBUG] 3988bc20c2b2e40c waitForApply before
~:43.198836 I | etcdserver: [DEBUG] 3988bc20c2b2e40c starts wait for 0 pending jobs, 1 finished jobs
~:43.200696 I | integration: launched 3169361310155633349 ()
~:43.201784 I | etcdserver: [DEBUG] [0] 3988bc20c2b2e40c is scheduling conf change on 36cda5222aba364b
~:43.201884 I | etcdserver: [DEBUG] [1] 3988bc20c2b2e40c is scheduling conf change on 3988bc20c2b2e40c
~:43.201965 I | etcdserver: [DEBUG] [2] 3988bc20c2b2e40c is scheduling conf change on cf5d6cbc2a121727
~:43.202070 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on 36cda5222aba364b
~:43.202139 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on 3988bc20c2b2e40c
~:43.202204 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on cf5d6cbc2a121727
~:43.202444 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) before
~:43.204486 I | etcdserver/membership: added member 36cda5222aba364b [unix://127.0.0.1:2100913646] to cluster 425d73f1b7b01674
~:43.204588 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) after
~:43.204703 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) after trigger
~:43.204791 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) before
~:43.205689 I | etcdserver/membership: added member 3988bc20c2b2e40c [unix://127.0.0.1:2101113646] to cluster 425d73f1b7b01674
~:43.205783 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) after
~:43.205929 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) after trigger
~:43.206056 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) before
~:43.207353 I | etcdserver/membership: added member cf5d6cbc2a121727 [unix://127.0.0.1:2100713646] to cluster 425d73f1b7b01674
~:43.207516 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) after
~:43.207619 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) after trigger
~:43.207710 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on 36cda5222aba364b
~:43.207781 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on 3988bc20c2b2e40c
~:43.207843 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on cf5d6cbc2a121727
~:43.207951 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished wait for 0 pending jobs (current pending 0, finished 1)
~:43.208029 I | rafthttp: started HTTP pipelining with peer cf5d6cbc2a121727
~:43.210339 I | rafthttp: peer 3988bc20c2b2e40c became active
~:43.210435 I | rafthttp: established a TCP streaming connection with peer 3988bc20c2b2e40c (stream MsgApp v2 reader)
~:43.210861 I | rafthttp: started streaming with peer 3988bc20c2b2e40c (writer)
~:43.211732 I | etcdserver: [DEBUG] 3988bc20c2b2e40c waitForApply after
```

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>

2017-04-25 10:22:27 -07:00

.github

*: eschew you from documentation

2017-03-06 11:40:46 -08:00

alarm

*: update LICENSE header

2016-05-12 20:51:48 -07:00

auth

*: simply ignore ErrAuthNotEnabled in clientv3 if auth is not enabled

2017-04-19 11:27:14 +09:00

client

client/integration: use only digits in unix port

2017-03-21 17:10:59 -07:00

clientv3

clientv3/integration: use new interfaces in lease tests

2017-04-24 23:49:44 -07:00

cmd

vendor: remove testify

2017-04-22 20:29:58 -07:00

compactor

compactor: fix TestPeriodic

2017-03-30 15:00:49 -07:00

contrib

*: eschew you from documentation

2017-03-06 11:40:46 -08:00

discovery

discovery: remove dead token argument from SRVGetCluster

2017-04-12 16:49:44 -07:00

Documentation

release.md: Update for multi arch release

2017-04-21 10:04:41 -07:00

e2e

*: simply ignore ErrAuthNotEnabled in clientv3 if auth is not enabled

2017-04-19 11:27:14 +09:00

embed

*: put gateway stubs into their own packages

2017-04-19 13:09:06 -07:00

error

*: update LICENSE header

2016-05-12 20:51:48 -07:00

etcdctl

etcdcdtl: use new lease interface

2017-04-24 23:49:44 -07:00

etcdmain

etcdmain: trigger embed.Etcd.Close for OS interrupt

2017-04-17 14:07:16 -07:00

etcdserver

etcdserver: ensure waitForApply sync with applyAll

2017-04-25 10:22:27 -07:00

hack

*: eschew you from documentation

2017-03-06 11:40:46 -08:00

integration

integration: close proxy's lease client

2017-04-24 23:49:45 -07:00

lease

*: add swagger and grpc-gateway assets for v3lock and v3election

2017-04-10 15:21:07 -07:00

logos

logos: add SVG and PNG logos

2014-12-18 14:59:06 -08:00

mvcc

*: clear redundant return statement warnings (S1027)

2017-04-21 14:01:00 -07:00

pkg

testutil: add assert functions

2017-04-22 20:29:58 -07:00

proxy

grpcproxy: use new lease interface

2017-04-24 23:49:44 -07:00

raft

raft: Avoid holding unneeded memory in unstable log's entries array

2017-04-18 10:55:16 -04:00

rafthttp

Merge pull request #7687 from heyitsanthony/deny-tls-ipsan

2017-04-13 15:03:25 -07:00

scripts

scripts: remove testify hack in updatedep

2017-04-22 20:29:58 -07:00

snap

*: add swagger and grpc-gateway assets for v3lock and v3election

2017-04-10 15:21:07 -07:00

store

store: replace testify asserts with testutil asserts

2017-04-22 20:29:58 -07:00

tools

benchmark: use new lease interface

2017-04-24 23:49:45 -07:00

version

version: bump to v3.2.0+git

2017-01-13 12:58:15 -08:00

wal

*: add swagger and grpc-gateway assets for v3lock and v3election

2017-04-10 15:21:07 -07:00

.dockerignore

Add .dockerignore to avoid including .git in docker build context

2014-08-15 16:38:29 +08:00

.gitignore

.gitignore: Adding .idea to .gitignore

2017-01-24 22:14:20 -08:00

.godir

create .godir

2014-11-18 15:01:57 -08:00

.header

.header: update to 'etcd Authors'

2016-05-12 20:56:50 -07:00

.travis.yml

travis: bump up to Go 1.8.1

2017-04-17 20:08:27 -07:00

bill-of-materials.json

*: add bill of materials

2017-04-17 14:50:55 -07:00

bill-of-materials.override.json

*: add bill of materials

2017-04-17 14:50:55 -07:00

build

build: remove dir use -r flag

2016-12-13 16:08:50 +08:00

build.bat

vendor: only vendor on emitted binaries

2016-04-05 21:01:16 -07:00

build.ps1

build: add option to enable binaries stripping for windows

2016-10-20 00:52:57 +05:30

CONTRIBUTING.md

*: eschew you from documentation

2017-03-06 11:40:46 -08:00

cover

test, scripts: use /usr/bin/env to find bash

2015-08-08 20:52:53 -06:00

DCO

docs(readme/contrib): clean up README, merge changes from CONTRIBUTING.md and split out DCO

2014-04-04 10:58:34 -07:00

Dockerfile

*: separate Dockerfile for quay build trigger

2016-06-24 12:55:10 -07:00

Dockerfile-release

Dockerfile-release: add nsswitch.conf into image

2017-03-21 13:08:42 +02:00

Dockerfile-release.arm64

build-docker: Updates for multi-arch release

2017-04-21 10:04:41 -07:00

Dockerfile-release.ppc64le

build-docker: Updates for multi-arch release

2017-04-21 10:04:41 -07:00

etcd.conf.yml.sample

embed/etcd.go: make v2 endpoint optional. fixes #7100

2017-01-20 11:49:52 +05:30

glide.lock

vendor: remove testify

2017-04-22 20:29:58 -07:00

glide.yaml

vendor: remove testify

2017-04-22 20:29:58 -07:00

LICENSE

…

main_test.go

*: remove os.Kill from signal.Notify

2017-04-07 10:52:54 -07:00

main.go

*: update LICENSE header

2016-05-12 20:51:48 -07:00

MAINTAINERS

MAINTAINERS: add Fanmin

2017-02-23 14:38:14 -08:00

NEWS

NEWS: update v3.1.6

2017-04-18 10:09:53 -07:00

NOTICE

…

Procfile

Procfile: v3 as default

2016-05-23 11:59:23 -07:00

README.md

*: coreos/rkt -> rkt/rkt

2017-04-11 08:48:48 -07:00

ROADMAP.md

roadmap: update roadmap

2017-01-20 13:50:36 -08:00

test

test: ensure clientv3 has no grpc-gateway dependency

2017-04-19 13:09:23 -07:00

V2Procfile

Procfile: v3 as default

2016-05-23 11:59:23 -07:00

README.md

etcd

Note: The master branch may be in an unstable or even broken state during development. Please use releases instead of the master branch in order to get stable binaries.

the etcd v2 documentation has moved

etcd is a distributed, consistent key-value store for shared configuration and service discovery, with a focus on being:

Simple: well-defined, user-facing API (gRPC)
Secure: automatic TLS with optional client cert authentication
Fast: benchmarked 10,000 writes/sec
Reliable: properly distributed using Raft

etcd is written in Go and uses the Raft consensus algorithm to manage a highly-available replicated log.

etcd is used in production by many companies, and the development team stands behind it in critical deployment scenarios, where etcd is frequently teamed with applications such as Kubernetes, fleet, locksmith, vulcand, Doorman, and many others. Reliability is further ensured by rigorous testing.

See etcdctl for a simple command line client.

Getting started

Getting etcd

The easiest way to get etcd is to use one of the pre-built release binaries which are available for OSX, Linux, Windows, rkt, and Docker. Instructions for using these binaries are on the GitHub releases page.

For those wanting to try the very latest version, build the latest version of etcd from the master branch. This first needs Go installed (version 1.8+ is required). All development occurs on master, including new features and bug fixes. Bug fixes are first targeted at master and subsequently ported to release branches, as described in the branch management guide.

Running etcd

First start a single-member cluster of etcd:

./bin/etcd

This will bring up etcd listening on port 2379 for client communication and on port 2380 for server-to-server communication.

Next, let's set a single key, and then retrieve it:

ETCDCTL_API=3 etcdctl put mykey "this is awesome"
ETCDCTL_API=3 etcdctl get mykey

That's it! etcd is now running and serving client requests.

etcd TCP ports

The official etcd ports are 2379 for client requests, and 2380 for peer communication.

Running a local etcd cluster

First install goreman, which manages Procfile-based applications.

Our Procfile script will set up a local example cluster. Start it with:

goreman start

This will bring up three etcd members (infra1, infra2, and infra3) and an etcd proxy named proxy, all running locally and together composing a cluster.

Every cluster member and the proxy accept key-value reads and key-value writes.

Running etcd on Kubernetes

To run an etcd cluster on Kubernetes, try etcd operator.

Next steps

Now it's time to dig into the full etcd API and other guides.

Read the full documentation.
Explore the full gRPC API.
Set up a multi-machine cluster.
Learn the config format, env variables and flags.
Find language bindings and tools.
Use TLS to secure an etcd cluster.
Tune etcd.

Contact

Mailing list: etcd-dev
IRC: #etcd on freenode.org
Planning/Roadmap: milestones, roadmap
Bugs: issues

Contributing

See CONTRIBUTING for details on submitting patches and the contribution workflow.

Reporting bugs

See reporting bugs for details about reporting any issues.

License

etcd is under the Apache 2.0 license. See the LICENSE file for details.
