Mirroristas/etcd

mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Author	SHA1	Message	Date
Gyuho Lee	c6e3401255	etcdserver: make raft log configured by top level logger To make it consistent Signed-off-by: Gyuho Lee <leegyuho@amazon.com>	2019-07-29 15:43:19 -07:00
Tobias Schottdorf	b9c051e7a7	raftpb: clean up naming in ConfChange	2019-07-23 10:40:03 +02:00
Tobias Schottdorf	eb4d9b640a	etcdserver: fix createConfChangeEnts It created a sequence of conf changes that could intermittently cause an empty set of voters, which Raft asserts against as of #10889. This fixes TestCtlV2BackupSnapshot and TestCtlV2BackupV3Snapshot, see: https://github.com/etcd-io/etcd/issues/10700#issuecomment-512358126	2019-07-19 17:13:08 +02:00
Tobias Schottdorf	c9491d7861	raft: clean up bootstrap This is the first (maybe not last) step in cleaning up the bootstrap code around StartNode. Initializing a Raft group for the first time is awkward, since a configuration has to be pulled from thin air. The way this is solved today is unclean: The app is supposed to pass peers to StartNode(), we add configuration changes for them to the log, immediately pretend that they are applied, but actually leave them unapplied (to give the app a chance to observe them, though if the app did decide to not apply them things would really go off the rails), and then return control to the app. The app will then process the initial Readys and as a result the configuration will be persisted to disk; restarts of the node then use RestartNode which doesn't take any peers. The code that did this lived awkwardly in two places fairly deep down the callstack, though it was really only necessary in StartNode(). This commit refactors things to make this more obvious: only StartNode does this dance now. In particular, RawNode does not support this at all any more; it expects the app to set up its Storage correctly. Future work may provide helpers to make this "preseeding" of the Storage more user-friendly. It isn't entirely straightforward to do so since the Storage interface doesn't provide the right accessors for this purpose. Briefly speaking, we want to make sure that a non-bootstrapped node can never catch up via the log so that we can implicitly use one of the "skipped" log entries to represent the configuration change into the bootstrap configuration. This is an invasive change that affects all consumers of raft, and it is of lower urgency since the code (post this commit) already encapsulates the complexity sufficiently.	2019-07-19 10:02:02 +02:00
Tobias Schottdorf	f9c2d00fb3	raft: extract 'tracker' package Mechanically extract `progressTracker`, `Progress`, and `inflights` to their own package named `tracker`. Add lots of comments in the progress, and take the opportunity to rename and clarify various fields.	2019-06-21 22:15:00 +02:00
Gyuho Lee	34bd797e67	*: revert module import paths Signed-off-by: Gyuho Lee <leegyuho@amazon.com>	2019-05-28 15:39:35 -07:00
shivaramr	9150bf52d6	go modules: Fix module path version to include version number	2019-04-26 15:29:50 -07:00
Gyuho Lee	877f11bed8	etcdserver: improve heartbeat send failures logging Signed-off-by: Gyuho Lee <leegyuho@amazon.com>	2019-04-19 10:58:17 -07:00
James Shubin	368f70a37c	etcdserver: Use panic instead of fatal on no space left error When using the embed package to embed etcd, sometimes the storage prefix being used might be full. In this case, this code path triggers, causing an: `etcdserver: create wal error: no space left on device` error, which causes a fatal. A fatal differs from a panic in that it also calls os.Exit(1). In this situation, the calling program that embeds the etcd server will be abruptly killed, which prevents it from cleaning up safely, and giving a proper error message. Depending on what the calling program is, this can cause corruption and data loss. This patch switches the fatal to a panic. Ideally this would be a regular error which would get propagated upwards to the StartEtcd command, but in the meantime at least this can be caught with recover(). This fixes the most common fatal that I've experienced, but there are surely more that need looking into. If possible, the errors should be threaded down into the code path so that embedding etcd can be more robust. Fixes: https://github.com/etcd-io/etcd/issues/10588	2019-03-27 15:24:33 -04:00
Gyuho Lee	8d1a62e7ef	*: use default log configuration for server Signed-off-by: Gyuho Lee <leegyuho@amazon.com>	2019-02-21 10:57:26 -08:00
Gyuho Lee	1399bc69ce	etcdserver: update import paths to "go.etcd.io/etcd" Signed-off-by: Gyuho Lee <leegyuho@amazon.com>	2018-08-28 17:47:55 -07:00
Gyuho Lee	a1aade8c1b	etcdserver: rename to "heartbeat_send_failures_total" Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-05-23 13:11:08 -07:00
Gyuho Lee	896a5e4a2b	etcdserver: add "etcd_server_heartbeat_failures_total" {"level":"warn","ts":1527101858.4149103,"caller":"etcdserver/raft.go:370","msg":"failed to send out heartbeat; took too long, server is overloaded likely from slow disk","heartbeat-interval":0.1,"expected-duration":0.2,"exceeded-duration":0.025771662} {"level":"warn","ts":1527101858.4149644,"caller":"etcdserver/raft.go:370","msg":"failed to send out heartbeat; took too long, server is overloaded likely from slow disk","heartbeat-interval":0.1,"expected-duration":0.2,"exceeded-duration":0.034015766} Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-05-23 13:09:42 -07:00
Gyuho Lee	7940113906	*: move internal "etcdserver/api/rafthttp" Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-05-21 10:31:16 -07:00
Gyuho Lee	9149565cb3	*: move to "etcdserver/api/membership" Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-05-21 10:31:16 -07:00
Gyuho Lee	955fd99bc9	Merge pull request #9746 from gyuho/raft-logger etcdserver: set default Raft logger with zap.Logger	2018-05-18 16:32:48 -07:00
Gyuho Lee	58ae15bd29	etcdserver: set default Raft logger with zap.Logger Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-05-18 15:38:39 -07:00
Gyuho Lee	49d672ff9b	etcdserver: rename "SnapshotCount", add "SnapshotCatchUpEntries" Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-05-18 14:37:50 -07:00
Gyuho Lee	3ea7a5d0bd	etcdserver: add "LoggerCore" field for Raft logger Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-04-25 10:16:54 -07:00
Maciej Borsz	46bc966aa7	etcdserver: add is_leader prometheus metric that is 1 on the leader. Before this change, we had no way to find the leader using the /metrics endpoint. This commit adds a metric to do that.	2018-04-19 11:47:40 +02:00
Gyuho Lee	d0847f4f25	*: clean up/fix server structured logs Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-04-18 12:54:43 -07:00
Gyuho Lee	bdbed26f64	etcdserver: support structured logging Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-04-16 17:36:00 -07:00
Gyuho Lee	041b9069a2	*: configure server logger - Add/Document "logger" to support structured logging. - This makes functional tests run easier, since zap logger provides built-in log redirect to files. - "etcd --logger-option=zap" to enable structured logging. - Current "capnslog" will still be used as "default". - We may switch the default or deprecate "capnslog" in v3.5. - Either way, will clearly be documented. Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-04-16 17:36:00 -07:00
Gyuho Lee	4f754c1850	etcdserver: clean up with "RaftStatusGetter" Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-03-15 19:30:08 -04:00
Gyuho Lee	9680b8a157	etcdserver: adjust election ticks on restart Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-03-10 19:09:38 -08:00
Gyuho Lee	edec229e10	etcdserver: make "advanceTicks" method Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-03-10 18:50:50 -08:00
Gyuho Lee	78918848bd	etcdserver: support Raft Pre-Vote Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-03-06 09:55:55 -08:00
Gyuho Lee	69357adf33	etcdserver: enable "CheckQuorum" when starting with "ForceNewCluster" We enable "raft.Config.CheckQuorum" by default in other Raft initial starts. So should start with "ForceNewCluster". Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-02-23 00:26:42 -08:00
dvonthenen	25cdf4ed92	*: expose Raft Applied Index through to "etcdctl endpoint status" Fixed based on feedback Fixed spacing Fix gofmt	2018-01-22 07:37:21 -08:00
Gyu-Ho Lee	75110dd839	*: fix naked returns Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-11-10 18:46:15 -08:00
Anthony Romano	dcf52bbfac	etcdserver, embed, integration: don't use pointer for ServerConfig ServerConfig is owned by etcdserver and unshared, so don't pass or store by pointer. Also removes duplicated field 'snapCount'.	2017-06-15 13:02:13 -07:00
fanmin shi	8b7b7222dd	etcdserver: renaming db happens after snapshot persists to wal and snap files In the case that a follower receives a snapshot from the leader and crashes before renaming xxx.snap.db to db but after the snapshot has persisted to .wal and .snap, restarting the follower results in loading the old db, the new .wal, and the new .snap. This causes an index mismatch between the snap metadata index and the consistent index from db. This PR forces an ordering where saving/renaming db must happen after the snapshot is persisted to the wal and snap files. This guarantees that the wal and snap files are newer than db. On server restart, the etcd server checks if snap index > db consistent index. If yes, the etcd server attempts to load xxx.snap.db, where xxx=snap index, if there is any, and panics otherwise. FIXES #7628	2017-05-09 14:00:12 -07:00
Gyu-Ho Lee	327f09fcb4	etcdserver: do not block on raft stopping Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-25 13:35:43 -07:00
Gyu-Ho Lee	91f6aee4f2	etcdserver: ensure waitForApply sync with applyAll Problem is: `Step1`: `etcdserver/raft.go`'s `Ready` process routine sends config-change entries via `r.applyc <- ap` (https://github.com/coreos/etcd/blob/master/etcdserver/raft.go#L193-L203) `Step2`: `etcdserver/server.go`'s `*EtcdServer.run` routine receives this via `ap := <-s.r.apply()` (https://github.com/coreos/etcd/blob/master/etcdserver/server.go#L735-L738) `StepA`: `Step1` proceeds without sync, right after sending `r.applyc <- ap`. `StepB`: `Step2` proceeds without sync, right after `sched.Schedule(s.applyAll(&ep,&ap))`. `StepC`: `etcdserver` tries to sync with `s.applyAll(&ep,&ap)` by calling `rh.waitForApply()`. `rh.waitForApply()` waits for all pending jobs to finish in `pkg/schedule` side. However, the order of `StepA`,`StepB`,`StepC` is not guaranteed. It is possible that `StepC` happens first, and proceeds without waiting on apply. And the restarting member comes back as a leader in single-node cluster, when there is no synchronization between apply-layer and config-change Raft entry apply. Confirmed with more debugging lines below, only reproducible with slow CPU VM (~2 vCPU). ``` ~:24.005397 I | etcdserver: starting server... [version: 3.2.0+git, cluster version: to_be_decided] ~:24.011136 I | etcdserver: [DEBUG] 29b2d24047a277df waitForApply before ~:24.011194 I | etcdserver: [DEBUG] 29b2d24047a277df starts wait for 0 pending jobs ~:24.011234 I | etcdserver: [DEBUG] 29b2d24047a277df finished wait for 0 pending jobs (current pending 0) ~:24.011268 I | etcdserver: [DEBUG] 29b2d24047a277df waitForApply after ~:24.011348 I | etcdserver: [DEBUG] [0] 29b2d24047a277df is scheduling conf change on 29b2d24047a277df ~:24.011396 I | etcdserver: [DEBUG] [1] 29b2d24047a277df is scheduling conf change on 5edf80e32a334cf0 ~:24.011437 I | etcdserver: [DEBUG] [2] 29b2d24047a277df is scheduling conf change on e32e31e76c8d2678 ~:24.011477 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on 29b2d24047a277df ~:24.011509 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on 5edf80e32a334cf0 ~:24.011545 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on e32e31e76c8d2678 ~:24.012500 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df before ~:24.013014 I | etcdserver/membership: added member 29b2d24047a277df [unix://127.0.0.1:2100515039] to cluster 9250d4ae34216949 ~:24.013066 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df after ~:24.013113 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df after trigger ~:24.013158 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 5edf80e32a334cf0 before ~:24.013666 W | etcdserver: failed to send out heartbeat on time (exceeded the 10ms timeout for 11.964739ms) ~:24.013709 W | etcdserver: server is likely overloaded ~:24.013750 W | etcdserver: failed to send out heartbeat on time (exceeded the 10ms timeout for 12.057265ms) ~:24.013775 W | etcdserver: server is likely overloaded ~:24.013950 I | raft: 29b2d24047a277df is starting a new election at term 4 ~:24.014012 I | raft: 29b2d24047a277df became candidate at term 5 ~:24.014051 I | raft: 29b2d24047a277df received MsgVoteResp from 29b2d24047a277df at term 5 ~:24.014107 I | raft: 29b2d24047a277df became leader at term 5 ~:24.014146 I | raft: raft.node: 29b2d24047a277df elected leader 29b2d24047a277df at term 5 ``` I am printing out the number of pending jobs before we call `sched.WaitFinish(0)`, and there were no pending jobs, so it returned immediately (before we schedule `applyAll`). This is the root cause of: - https://github.com/coreos/etcd/issues/7595 - https://github.com/coreos/etcd/issues/7739 - https://github.com/coreos/etcd/issues/7802 `sched.WaitFinish(0)` doesn't work when `len(f.pendings)==0` and `f.finished==0`. Config-change is the first job to apply, so `f.finished` is 0 in this case. `f.finished` monotonically increases, so we need `WaitFinish(finished+1)`. And `finished` must be the one before calling `Schedule`. This is safe because `Schedule(applyAll)` is the only place adding jobs to `sched`. Then scheduler waits on the single job of `applyAll`, by getting the current number of finished jobs before sending `Schedule`. Or just make it be blocked until `applyAll` routine triggers on the config-change job. This patch just removes `waitForApply`, and signals `raftDone` to wait until `applyAll` finishes applying entries. Confirmed that it fixes the issue, as below: ``` ~:43.198354 I | rafthttp: started streaming with peer 36cda5222aba364b (stream MsgApp v2 reader) ~:43.198740 I | etcdserver: [DEBUG] 3988bc20c2b2e40c waitForApply before ~:43.198836 I | etcdserver: [DEBUG] 3988bc20c2b2e40c starts wait for 0 pending jobs, 1 finished jobs ~:43.200696 I | integration: launched 3169361310155633349 () ~:43.201784 I | etcdserver: [DEBUG] [0] 3988bc20c2b2e40c is scheduling conf change on 36cda5222aba364b ~:43.201884 I | etcdserver: [DEBUG] [1] 3988bc20c2b2e40c is scheduling conf change on 3988bc20c2b2e40c ~:43.201965 I | etcdserver: [DEBUG] [2] 3988bc20c2b2e40c is scheduling conf change on cf5d6cbc2a121727 ~:43.202070 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on 36cda5222aba364b ~:43.202139 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on 3988bc20c2b2e40c ~:43.202204 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on cf5d6cbc2a121727 ~:43.202444 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) before ~:43.204486 I | etcdserver/membership: added member 36cda5222aba364b [unix://127.0.0.1:2100913646] to cluster 425d73f1b7b01674 ~:43.204588 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) after ~:43.204703 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) after trigger ~:43.204791 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) before ~:43.205689 I | etcdserver/membership: added member 3988bc20c2b2e40c [unix://127.0.0.1:2101113646] to cluster 425d73f1b7b01674 ~:43.205783 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) after ~:43.205929 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) after trigger ~:43.206056 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) before ~:43.207353 I | etcdserver/membership: added member cf5d6cbc2a121727 [unix://127.0.0.1:2100713646] to cluster 425d73f1b7b01674 ~:43.207516 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) after ~:43.207619 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) after trigger ~:43.207710 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on 36cda5222aba364b ~:43.207781 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on 3988bc20c2b2e40c ~:43.207843 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on cf5d6cbc2a121727 ~:43.207951 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished wait for 0 pending jobs (current pending 0, finished 1) ~:43.208029 I | rafthttp: started HTTP pipelining with peer cf5d6cbc2a121727 ~:43.210339 I | rafthttp: peer 3988bc20c2b2e40c became active ~:43.210435 I | rafthttp: established a TCP streaming connection with peer 3988bc20c2b2e40c (stream MsgApp v2 reader) ~:43.210861 I | rafthttp: started streaming with peer 3988bc20c2b2e40c (writer) ~:43.211732 I | etcdserver: [DEBUG] 3988bc20c2b2e40c waitForApply after ``` Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-25 10:22:27 -07:00
Anthony Romano	714b48a4b4	etcdserver: initialize raftNode with constructor raftNode was being initialized in start(), which was causing hangs when trying to stop the etcd server since the stop channel would not be initialized in time for the stop call. Instead, setup non-configurable bits in a constructor. Fixes #7668	2017-04-18 09:33:59 -07:00
Gyu-Ho Lee	04354f32ab	etcdserver: wait apply on conf change Raft entry When apply-layer sees configuration change entry in raft.Ready.CommittedEntries, the server should not proceed until that entry is applied. Otherwise, follower's raft layer advances, possibly election-timeouts, and becomes the leader in single-node cluster, before add-node conf change of other nodes is applied. Fix https://github.com/coreos/etcd/issues/7595. Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-13 15:59:24 -07:00
Xiang	7f0733cf46	etcdserver: candidate should wait for applying all configuration changes	2017-03-14 17:20:20 -07:00
Gyu-Ho Lee	3d75395875	*: remove never-used vars, minor lint fix Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-03-06 14:59:12 -08:00
fanmin shi	2a1bae0c2a	etcdserver: consistent naming in raftReadyHandler	2016-12-29 11:27:16 -08:00
fanmin shi	2faf72f47c	etcdserver: rework update committed index logic	2016-12-27 10:11:40 -08:00
Gyu-Ho Lee	3fd1d951f8	etcdserver: time out when readStateC is blocking Otherwise, it will block forever when the server is overloaded. Fix https://github.com/coreos/etcd/issues/6891.	2016-12-05 15:34:46 -08:00
Gyu-Ho Lee	6ec03d3f7c	etcdserver: move 'EtcdServer.send' to raft.go Clear 'TODO'	2016-10-26 16:26:00 -07:00
Gyu-Ho Lee	e011ea25ca	etcdserver: separate EtcdServer from raftNode	2016-10-07 13:18:39 -07:00
Xiang Li	0f0c048e29	etcdserver: fix early lessor promotion issue If we promote the lessor before we finish applying all entries from the last term, we might incorrectly renew already revoked leases. Here is an example: - Term 1: revoke lease A accepted by raft - Old leader failed, new election happened - Term 2: promote - Term 2: keep alive A succeeds. A now has 10 seconds TTL - Term 2: revoke lease A from Term 1 got committed and applied - Term 2: the lease A with 10 seconds TTL is revoked To solve this, the new leader MUST apply all entries from the old term before promoting its lessor to start accepting renew requests.	2016-10-05 14:41:47 -07:00
Xiang Li	e3e3993022	etcdserver: support read index Use read index to achieve l-read.	2016-09-27 13:41:40 +08:00
Anthony Romano	de68818f03	etcdserver: add some failpoints	2016-06-21 14:43:20 -07:00
Xiang Li	9c78cda088	etcdserver: save state before save snapshot	2016-06-15 22:00:33 -07:00
Gyu-Ho Lee	32d766d749	etcdserver: preallocate slice	2016-06-15 13:03:10 -07:00
Gyu-Ho Lee	abb4cd5646	etcdserver: update LICENSE header	2016-05-12 20:49:40 -07:00
Xiang Li	9c103dd0de	*: cancel required leader streams when member lost its leader	2016-05-12 19:42:21 -07:00
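
Commit 368f70a37c in the list above replaces a fatal with a panic so that a program embedding etcd can catch the failure with recover() instead of being killed by os.Exit(1). The following is a minimal sketch of that pattern, not etcd code: `startEmbedded`, `safeStart`, and `errNoSpace` are hypothetical names invented for illustration, and the error text only mirrors the message quoted in the commit.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoSpace is a hypothetical stand-in for the "no space left on device"
// failure described in the commit message.
var errNoSpace = errors.New("create wal error: no space left on device")

// startEmbedded is a hypothetical stand-in for an embedded server start-up
// path that panics instead of calling a fatal logger.
func startEmbedded() {
	panic(errNoSpace)
}

// safeStart runs start and converts any panic into an ordinary error, which
// gives the caller a chance to clean up. A fatal that ends in os.Exit(1)
// could not be intercepted this way.
func safeStart(start func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("embedded server failed: %v", r)
		}
	}()
	start()
	return nil
}

func main() {
	if err := safeStart(startEmbedded); err != nil {
		// The embedding program is still alive and can release resources here.
		fmt.Println("cleaning up after:", err)
	}
}
```

As the commit message itself notes, propagating a regular error up to the caller would be preferable; recovering from a panic is only the interim option.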

94 Commits
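
Commit 91f6aee4f2 explains that `sched.WaitFinish(0)` returns immediately when nothing has been scheduled yet, and that waiting on `finished+1`, with `finished` read before `Schedule`, would close the gap. The sketch below illustrates that reasoning with a toy scheduler. It is not etcd's `pkg/schedule`: the `fifo` type, its methods, and `applyThenWait` are assumptions made for this example, and the commit itself ultimately removed `waitForApply` rather than adopting this approach.

```go
package main

import (
	"fmt"
	"sync"
)

// fifo is a toy scheduler (hypothetical, not etcd's pkg/schedule) that only
// tracks how many jobs have finished.
type fifo struct {
	mu       sync.Mutex
	cond     *sync.Cond
	finished int
}

func newFIFO() *fifo {
	f := &fifo{}
	f.cond = sync.NewCond(&f.mu)
	return f
}

// Schedule runs job asynchronously and increments finished when it is done.
func (f *fifo) Schedule(job func()) {
	go func() {
		job()
		f.mu.Lock()
		f.finished++
		f.mu.Unlock()
		f.cond.Broadcast()
	}()
}

// Finished returns the number of jobs completed so far.
func (f *fifo) Finished() int {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.finished
}

// WaitFinish blocks until at least n jobs have finished. With n == 0 it
// returns immediately, which is the behavior the commit message describes.
func (f *fifo) WaitFinish(n int) {
	f.mu.Lock()
	defer f.mu.Unlock()
	for f.finished < n {
		f.cond.Wait()
	}
}

// applyThenWait reads the finished count before scheduling, then waits for
// one more job, so the wait cannot return before the scheduled job has run.
func applyThenWait() bool {
	f := newFIFO()
	applied := false
	before := f.Finished()
	f.Schedule(func() { applied = true })
	f.WaitFinish(before + 1)
	return applied
}

func main() {
	f := newFIFO()
	f.WaitFinish(0) // returns at once: nothing is pending and finished == 0
	fmt.Println("applied before wait returned:", applyThenWait())
}
```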