172 Commits

Author SHA1 Message Date
Anthony Romano
20461ab11a *: fix many typos 2016-01-31 21:42:39 -08:00
Gyu-Ho Lee
71c2a9bb3c *: fix minor typos, comments 2016-01-30 18:15:56 -08:00
Ben Darnell
22925a1d2f raft: Remove redundant raft.Commit field.
Keeping this field in sync with `raft.raftLog.committed` was
error-prone, so instead we synthesize the `HardState` on demand.

Fixes #4278.
2016-01-26 15:18:55 -05:00
Ben Darnell
46bb2582fe raft: Call maybeCommit after removing a node.
removeNode reduces the required quorum size, so some pending entries may
be able to commit after it is applied.

Discovered in cockroachdb/cockroach#3642
2016-01-20 11:05:48 -04:00
Hitoshi Mitake
9b2da76796 raft: remove go vet compliants 2015-12-16 13:29:23 +09:00
Xiang Li
cc6d98bf89 etcdserver: only send snapshot when the member is active 2015-12-10 16:15:26 -08:00
Xiang Li
a8cc1570d0 raft: support quorum check when raft is leader
If quorum check fails, the leader will step down to follower.
2015-11-24 09:36:37 -08:00
Ben Darnell
fbeb58d265 raft: no-op instead of panic for Campaigning while leader
We need to be able to force an election (on one node) after creating a
new group (cockroachdb/cockroach#1384), but it is difficult to ensure
that our call to Campaign does not race with an election that may be
started by raft itself. A redundant call to Campaign should be a no-op
instead of a panic. (But the panic in becomeCandidate remains, because
we don't want to update the term or change the committed index in this
case)
2015-11-16 21:44:14 -05:00
Dmitry Smirnov
b2f4a5f587 *: fix spelling issues (codespell).
Signed-off-by: Dmitry Smirnov <onlyjob@member.fsf.org>
2015-09-11 10:22:29 +10:00
Xiang Li
6b23a8131f *: test gofmt with -s and fix reported issues 2015-08-21 18:52:16 -07:00
es-chow
cc362ccdad raft: set logger to raft so log context such as multinode groupID can be logged 2015-08-12 22:56:00 +08:00
Xiang Li
b4022899eb raft: fix panic in send app
sendApp accesses the storage several times. Perviously, we
assume that the storage will not be modified during the read
opeartions. The assumption is not true since the storage can
be compacted between the read operations. If a compaction
causes a read entries error, we should not painc. Instead, we
can simply retry the sendApp logic until succeed.
2015-06-15 14:23:33 -07:00
Ben Darnell
c9d507df11 raft: Use raft.Config in MultiNode. 2015-03-24 15:37:13 -04:00
Xiang Li
b3fb052ad4 raft: make peers a prviate field in raft.Config 2015-03-24 11:10:07 -07:00
Xiang Li
d9b5b56c82 raft: make raft configurable 2015-03-23 09:55:19 -07:00
Xiang Li
4a64373225 raft: add flow control for progress
Each progress has a inflighs sliding window. When the progress
is in replicate state, inflights will control the sending speed
of the leader.

The leader can have at most maxInflight number of inflight
messages for each replicate progress. Receving a appResp moves
forward the sliding window. Heartbeat response free one
slot if the window is full.
2015-03-20 20:04:33 -07:00
Xiang Li
7571b2cde2 raft: limit the size of msgApp
limit the max size of entries sent per message.
Lower the cost at probing state as we limit the size per message;
lower the penalty when aggressively decrease to a too low next.
2015-03-18 15:59:30 -07:00
Yicheng Qin
67194c0b22 raft: introduce progress states 2015-03-18 08:16:32 -07:00
Yicheng Qin
be0bf2a2bd raft: fall back to bad path when unreachable 2015-03-11 13:21:23 -07:00
Yicheng Qin
fbd5c81139 raft: remove shadowing of variables from test 2015-02-28 12:09:33 -08:00
Xiang Li
2af33fd494 raft: add reportUnreachable 2015-02-28 10:45:22 -08:00
Xiang Li
5ede18be74 raft: separate compact and createsnap in memory storage 2015-02-28 10:08:30 -08:00
Barak Michener
92dca0af0f *: remove shadowing of variables from etcd and add travis test
We've been bitten by this enough times that I wrote a tool so that
it never happens again.
2015-02-17 16:31:42 -05:00
Ben Darnell
33d2400063 raft: Send any waiting appends after receiving MsgAppResp.
This addresses a problem that comes up in the cockroach tests,
in which the order of messages may lead to deadlocks (due to
the fact that we don't have regular heartbeat timers in most
of our tests).
2015-01-27 17:43:29 -05:00
Jonathan Boulle
f1ed69e883 *: switch to line comments for copyright
Build tags are not compatible with block comments.
Also adds copyright header to a few places it was missing.
2015-01-26 09:53:30 -08:00
Ben Darnell
59214978a2 raft: Add applied index as an argument to newRaft and RestartNode. 2015-01-22 11:38:05 -05:00
Xiang Li
003b97a60f raft: public progress struct in raft 2015-01-20 10:26:22 -08:00
Ben Darnell
2e1c36cdd9 raft: introduce MsgHeartbeatResp.
Now that heartbeats are distinct from MsgApp{,Resp}, the retries
currently performed in stepLeader's MsgAppResp section are only
performed on an actual MsgAppResp (or a new MsgProp). This means
that it may take a long time to recover from a dropped MsgAppResp
in a quiet cluster.

This commit adds a dedicated heartbeat response message. This message
does not convey the follower's current log position because the
MsgHeartbeat does not include the leaders term and index. Upon receipt
of a heartbeat response, the leader may retry the latest MsgApp if it
believes the follower to be behind.
2015-01-14 17:34:10 -05:00
Xiang Li
35b907ac58 raft: add lastIndex as rejectHint
Add the lastindex of the raft log as reject hint, so the leader can
bypass the greater index probing and decrease the next index directly
to last + 1.
2015-01-01 19:04:07 -08:00
Xiang Li
896bac1f76 raft: flush the commit to fix a race in test 2014-12-18 17:10:37 -08:00
Xiang Li
88767d913d raft: leader waits for the reply of previous message when follower is not in good path.
It is reasonable for the leader to wait for the reply before sending out the next
msgApp or msgSnap for the follower in bad path. Or the leader will send out useless
messages if the previous message is rejected or the previous message is a snapshot.
Especially for the snapshot case, the leader will be 100% to send out duplicate message
including the snapshot, which is a huge waste.

This commit implement a timeout based wait mechanism. The timeout for normal msgApp is a
heartbeatTimeout and the timeout for snapshot is electionTimeout(snapshot is larger). We
can implement a piggyback mechanism(application notifies the msg lost) in the future
if necessary.
2014-12-18 15:01:50 -08:00
Xiang Li
044e35b814 raft: use newRaft 2014-12-15 11:25:35 -08:00
Yicheng Qin
3867c72c8a raft: support to do multiple proposals in one message 2014-12-10 20:00:59 -08:00
Xiang Li
197e6b1b20 Merge pull request #1858 from vlajos/typofixes-vlajos-20141204
typofixes - https://github.com/vlajos/misspell_fixer
2014-12-04 14:52:27 -08:00
Veres Lajos
3de2ab2c04 *: typofixes
https://github.com/vlajos/misspell_fixer
2014-12-04 22:51:19 +00:00
Xiang Li
149389cbfa raft: add msgHeartbeat type 2014-12-04 08:29:31 -08:00
Xiang Li
b3841afcc3 raft: do not restore snapshot if local raft has longer matching history
Raft should not restore the snapshot if it has longer matching history.
Or restoring snapshot might remove the matched entries.
2014-12-02 21:34:14 -08:00
Xiang Li
788d1e59a2 raft: use index in entry 2014-12-02 10:25:27 -08:00
Xiang Li
3c0fbe285c raft: stableTo checks term matching
stableTo should only mark the index stable if the term is matched. After raft sends out unstable
entries to application, raft makes progress without waiting for reply. When the appliaction
calls the stableTo to notify the entries up to "index" are stable, raft might have truncated
some entries before "index" due to leader lost. raft must verify the (index,term) of stableTo,
before marking the entries as stable.
2014-11-28 14:13:07 -08:00
Xiang Li
66252c7d62 raft: move all unstable stuff into one struct for future cleanup 2014-11-26 13:36:17 -08:00
Xiang Li
65ad1f6ffd raft: attach Index to Entry in all tests 2014-11-24 17:13:47 -08:00
Ben Darnell
9ddd8ee539 Rename Storage.HardState back to InitialState and include ConfState.
This fixes integration/migration_test.go (and highlights the fact that
we need some more raft-level testing of restoring from snapshots).
2014-11-21 17:22:20 -05:00
Ben Darnell
03c8881e35 Fix TestSlowNodeRestore 2014-11-21 16:40:41 -05:00
Ben Darnell
0d680d0e6b Merge remote-tracking branch 'coreos/master' into merge
* coreos/master:
  rafthttp: fix import
  raft: should not decrease match and next when handling out of order msgAppResp
  Fix migration to allow snapshots to have the right IDs
  add snapshotted integration test
  fix test import loop
  fix import loop, add set to types, and fix comments
  etcdserver: autodetect v0.4 WALs and upgrade them to v0.5 automatically
  wal: add a bench for write entry
  rafthttp: add streaming server and client
  dep: use vendored imports in codegangsta/cli
  dep: bump golang.org/x/net/context

Conflicts:
	etcdserver/server.go
	etcdserver/server_test.go
	migrate/snapshot.go
2014-11-21 15:40:11 -05:00
Xiang Li
063c5c77a0 raft: should not decrease match and next when handling out of order msgAppResp 2014-11-20 17:58:23 -08:00
Ben Darnell
b29240baf0 Merge remote-tracking branch 'coreos/master' into merge
* coreos/master:
  scripts: build-docker tag and use ENTRYPOINT
  scripts: build-release add etcd-migrate
  create .godir
  raft: optimistically increase the next if the follower is already matched
  raft: add handleHeartbeat handleHeartbeat commits to the commit index in the message. It never decreases the commit index of the raft state machine.
  rafthttp: send takes raft message instead of bytes
  *: add rafthttp pkg into test list
  raft: include commitIndex in heartbeat
  rafthttp: move server stats in raftHandler to etcdserver
  *: etcdhttp.raftHandler -> rafthttp.RaftHandler
  etcdserver: rename sender.go -> sendhub.go
  *: etcdserver.sender -> rafthttp.Sender

Conflicts:
	raft/log.go
	raft/raft_paper_test.go
2014-11-19 17:05:16 -05:00
Ben Darnell
355ee4f393 raft: Integrate snapshots into the raft.Storage interface.
Compaction is now treated as an implementation detail of Storage
implementations; Node.Compact() and related functionality have been
removed. Ready.Snapshot is now used only for incoming snapshots.

A return value has been added to ApplyConfChange to allow applications
to track the node information that must be stored in the snapshot.

raftpb.Snapshot has been split into Snapshot and SnapshotMetadata, to
allow the full snapshot data to be read from disk only when needed.

raft.Storage has new methods Snapshot, ApplySnapshot, HardState, and
SetHardState. The Snapshot and HardState parameters have been removed
from RestartNode() and will now be loaded from Storage instead.
The only remaining difference between StartNode and RestartNode is that
the former bootstraps an initial list of Peers.
2014-11-19 16:40:26 -05:00
Xiang Li
b50f331558 Merge pull request #1744 from xiang90/next
raft: optimistically increase the next if the follower is already matched
2014-11-19 13:21:11 -08:00
Xiang Li
4c1fd07311 raft: optimistically increase the next if the follower is already matched
This is useful since we want to pipeline the appendEntry requests. Without
enabling optimistic increasing, the second pipelining appendEntry request
will include the entries the first one has already sent out. We decrease
the next directly to match if the leader receives a rejection for a matched
follower. This happens if one pipelining request get lost and following ones
arrives at the follower.
2014-11-18 13:41:38 -08:00
Xiang Li
bd4cfa2a07 raft: add handleHeartbeat
handleHeartbeat commits to the commit index in the message. It never decreases the
commit index of the raft state machine.
2014-11-18 08:34:06 -08:00