628 Commits

Author SHA1 Message Date
Xiang Li
7fe608532a raft: do not reset vote if term is not changed
raft MUST keep the voting information for the same term. reset
should not clear the vote if the term has not changed.
2015-03-07 22:31:20 -08:00
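
The rule described in 7fe608532a can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the code from the commit: the `node` type, its fields, and the `none` constant are hypothetical names chosen for the example.

```go
package main

import "fmt"

// none marks "no vote cast" (hypothetical constant for this sketch).
const none uint64 = 0

// node holds only the two fields the rule is about (hypothetical type).
type node struct {
	term uint64
	vote uint64
}

// reset moves the node to the given term. The vote is cleared only when
// the term actually changes, so a vote cast in the current term survives.
func (n *node) reset(term uint64) {
	if n.term != term {
		n.term = term
		n.vote = none
	}
}

func main() {
	n := &node{term: 2, vote: 3}

	n.reset(2) // same term: the vote for node 3 is kept
	fmt.Println(n.term, n.vote) // 2 3

	n.reset(3) // new term: the vote is cleared
	fmt.Println(n.term, n.vote) // 3 0
}
```

Clearing the vote on every reset would let a node vote twice in one term, which is what the commit message rules out.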
Ben Darnell
725c411346 Add ReportUnreachable and ReportSnapshot to MultiNode.
Add ReportSnapshot requirement to doc.go.
2015-03-05 12:39:52 -05:00
Xiang Li
6b9b695167 Merge pull request #2435 from bdarnell/multinode
raft: Introduce MultiNode.
2015-03-04 21:27:20 -08:00
Ben Darnell
c824c867ec raft: more doc updates.
Including parallelism of persist and send, cancellation of
ConfChanges, and the risks of two-node clusters.
2015-03-04 15:48:35 -05:00
Ben Darnell
4e74d81bbb raft: Introduce MultiNode.
MultiNode is an alternative to raft.Node that is more efficient
when a node may participate in many consensus groups. It is currently
used in the CockroachDB project; this commit merges the
github.com/cockroachdb/etcd fork back into the mainline.
2015-03-04 15:30:21 -05:00
Ben Darnell
250970cc23 raft: Expand doc.go
Includes more details on the required caller behavior and the safety of
membership changes.

Closes #2397
2015-03-04 13:18:02 -05:00
Yicheng Qin
b4b9b9118a rafthttp: report MsgSnap status 2015-03-02 09:38:11 -08:00
Yicheng Qin
09f181f585 raft: log unreachable remote node 2015-03-01 16:47:49 -08:00
Yicheng Qin
fbd5c81139 raft: remove shadowing of variables from test 2015-02-28 12:09:33 -08:00
Xiang Li
9b4d52ee73 raft: do not resend snapshot if not necessary
raft relies on the link layer to report the status of the sent snapshot.
If the snapshot is still being sent, replication to that remote peer is
paused. If the snapshot finishes sending, replication resumes
optimistically after electionTimeout. If the snapshot fails, raft will
try to resend it.
2015-02-28 11:41:58 -08:00
Xiang Li
2185ac5ac8 raft: cleanup unreachable 2015-02-28 11:35:16 -08:00
Xiang Li
2af33fd494 raft: add reportUnreachable 2015-02-28 10:45:22 -08:00
Xiang Li
cbef6ab152 raft: clean up storage 2015-02-28 10:09:07 -08:00
Xiang Li
5ede18be74 raft: separate compact and createsnap in memory storage 2015-02-28 10:08:30 -08:00
Ben Darnell
b53dc0826e Only use the EntryFormatter for normal entries.
ConfChange entries also have a Data field but the application-supplied
formatter won't know what to do with them.
2015-02-20 13:51:14 -05:00
Barak Michener
92dca0af0f *: remove shadowing of variables from etcd and add travis test
We've been bitten by this enough times that I wrote a tool so that
it never happens again.
2015-02-17 16:31:42 -05:00
Xiang Li
fa66055f66 rafttest: drop isPaused 2015-02-09 18:52:34 -08:00
Xiang Li
085b608de9 rafttest: support node pause 2015-02-09 16:26:43 -08:00
Xiang Li
279b216f9a rafttest: wait for network sending 2015-02-09 15:52:16 -08:00
Xiang Li
65cd0051fe rafttest: add network delay 2015-02-06 15:01:07 -08:00
Xiang Li
d423946fa4 rafttest: add network drop 2015-02-06 10:50:55 -08:00
Xiang Li
83edf0d862 rafttest: separate network interface and network 2015-02-03 22:50:27 -08:00
Xiang Li
b147a6328d rafttest: add restart and related simple test 2015-02-03 10:08:52 -08:00
Xiang Li
d65af21b73 raft: add raft test suite 2015-02-01 14:53:22 -08:00
Xiang Li
bff2ccaa22 Merge pull request #2170 from xiang90/remove_log
raft: remove default verbose logging
2015-01-27 15:58:53 -08:00
Xiang Li
553379e82b raft: remove default verbose logging 2015-01-27 15:57:44 -08:00
Ben Darnell
33d2400063 raft: Send any waiting appends after receiving MsgAppResp.
This addresses a problem that comes up in the cockroach tests,
in which the order of messages may lead to deadlocks (due to
the fact that we don't have regular heartbeat timers in most
of our tests).
2015-01-27 17:43:29 -05:00
Xiang Li
276c9540b4 etcdserver: support raft.status 2015-01-26 16:39:33 -08:00
Jonathan Boulle
f1ed69e883 *: switch to line comments for copyright
Build tags are not compatible with block comments.
Also adds copyright header to a few places it was missing.
2015-01-26 09:53:30 -08:00
Ben Darnell
8c3a6508e9 raft: Add applied to the newRaft log message. 2015-01-22 12:04:40 -05:00
Ben Darnell
59214978a2 raft: Add applied index as an argument to newRaft and RestartNode. 2015-01-22 11:38:05 -05:00
Ben Darnell
cd9d5573d4 raft: make EntryFormatter less clever. 2015-01-21 19:27:26 -05:00
Ben Darnell
e73d442e32 raft: Add support for custom formatters in DescribeMessage/DescribeEntry 2015-01-21 14:12:58 -05:00
Xiang Li
003b97a60f raft: public progress struct in raft 2015-01-20 10:26:22 -08:00
Xiang Li
b34936b097 raft: add progress into status 2015-01-18 15:23:50 -08:00
Xiang Li
0eaaad0e48 raft: add Status interface
Status returns the current status of raft state machine.
2015-01-16 14:02:04 -08:00
Ben Darnell
2e1c36cdd9 raft: introduce MsgHeartbeatResp.
Now that heartbeats are distinct from MsgApp{,Resp}, the retries
currently performed in stepLeader's MsgAppResp section are only
performed on an actual MsgAppResp (or a new MsgProp). This means
that it may take a long time to recover from a dropped MsgAppResp
in a quiet cluster.

This commit adds a dedicated heartbeat response message. This message
does not convey the follower's current log position because the
MsgHeartbeat does not include the leader's term and index. Upon receipt
of a heartbeat response, the leader may retry the latest MsgApp if it
believes the follower to be behind.
2015-01-14 17:34:10 -05:00
Ben Darnell
9972e62d94 raft: Use <= instead of < for heartbeat ticks.
In code outside the raft package, we cannot call raft.bcastHeartbeat
directly. Instead, to control heartbeats we set heartbeatInterval to 1
and call Tick().
2015-01-14 15:27:32 -05:00
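
One way to read the comparison change in 9972e62d94 is sketched below in Go. It is an assumption-laden toy, not the raft package's code: the `ticker` type, its fields, and `countBeats` are hypothetical. It only shows why an inclusive comparison lets a caller with heartbeatInterval set to 1 trigger a heartbeat on every Tick().

```go
package main

import "fmt"

// ticker is a hypothetical stand-in for the leader's heartbeat clock.
type ticker struct {
	elapsed           int
	heartbeatInterval int
	inclusive         bool // true: interval <= elapsed, false: interval < elapsed
	beats             int  // number of heartbeats that would have been broadcast
}

// Tick advances the clock by one step and fires a heartbeat when due.
func (t *ticker) Tick() {
	t.elapsed++
	due := t.heartbeatInterval < t.elapsed
	if t.inclusive {
		due = t.heartbeatInterval <= t.elapsed
	}
	if due {
		t.elapsed = 0
		t.beats++
	}
}

// countBeats runs a ticker with heartbeatInterval 1 for the given number of ticks.
func countBeats(inclusive bool, ticks int) int {
	t := &ticker{heartbeatInterval: 1, inclusive: inclusive}
	for i := 0; i < ticks; i++ {
		t.Tick()
	}
	return t.beats
}

func main() {
	fmt.Println(countBeats(true, 3))  // 3: every tick sends a heartbeat
	fmt.Println(countBeats(false, 3)) // 1: only every second tick does
}
```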
Yicheng Qin
7a2fa39e52 Merge pull request #2012 from andybons/master
raft: add link to the paper raft_paper_test.go refers to
2015-01-06 00:27:47 -08:00
Xiang Li
2a83e350b1 Merge pull request #1992 from xiang90/rm_leader
*: support removing the leader from a 2 members cluster
2015-01-02 14:15:12 -08:00
Xiang Li
35b907ac58 raft: add lastIndex as rejectHint
Add the last index of the raft log as a reject hint, so the leader can
skip probing the higher indexes and decrease the next index directly
to last + 1.
2015-01-01 19:04:07 -08:00
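
The effect of the reject hint from 35b907ac58 can be sketched in Go as follows. This is an illustration under assumptions, not the commit's code: the `progress` type and the `decrTo` method are hypothetical names, and `hint` stands for the follower's last log index carried in the rejection.

```go
package main

import "fmt"

// progress tracks the next index the leader will send to one follower
// (hypothetical, reduced to the single field this sketch needs).
type progress struct {
	next uint64
}

// decrTo handles the rejection of an append that probed index `rejected`.
// hint is the follower's last log index. It returns false for a stale
// rejection that no longer matches the outstanding probe.
func (p *progress) decrTo(rejected, hint uint64) bool {
	if p.next-1 != rejected {
		return false // stale: the probe it answers is no longer outstanding
	}
	p.next = rejected // without a useful hint, step back by one
	if hint+1 < p.next {
		p.next = hint + 1 // jump directly to last + 1
	}
	if p.next < 1 {
		p.next = 1
	}
	return true
}

func main() {
	p := &progress{next: 10}
	p.decrTo(9, 4) // follower's log ends at index 4
	fmt.Println(p.next) // 5
}
```

Without the hint, the leader in this example would need five rejected probes (at 9, 8, 7, 6, 5) before reaching the same next index.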
Xiang Li
152676f43a *: support removing the leader from a 2 members cluster 2014-12-29 11:34:33 -08:00
Andrew Bonventre
4463f5c4b3 raft: add link to the paper raft_paper_tests.go refers to 2014-12-29 14:17:48 -05:00
Xiang Li
fc96a9e4a7 raft: remove unnecessary funcs in raft.go 2014-12-25 17:04:33 -08:00
Xiang Li
2dbdf87f86 raft: add doc for storage 2014-12-22 12:33:14 -08:00
Xiang Li
896bac1f76 raft: flush the commit to fix a race in test 2014-12-18 17:10:37 -08:00
Xiang Li
88767d913d raft: leader waits for the reply of previous message when follower is not in good path.
It is reasonable for the leader to wait for the reply before sending the next
msgApp or msgSnap to a follower in the bad path. Otherwise the leader will send useless
messages if the previous message is rejected or the previous message is a snapshot.
In the snapshot case in particular, the leader is certain to send duplicate messages
that include the snapshot, which is a huge waste.

This commit implements a timeout-based wait mechanism. The timeout for a normal msgApp is
heartbeatTimeout and the timeout for a snapshot is electionTimeout (a snapshot is larger). We
can implement a piggyback mechanism (the application reports the lost message) in the future
if necessary.
2014-12-18 15:01:50 -08:00
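
The timeout-based wait described in 88767d913d can be sketched in Go as follows. This is a simplified model under assumptions, not the commit's implementation: the `progress` type, its methods, `trySend`, and the timeout values are all hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// Timeouts in ticks (illustrative values, not taken from the commit).
const (
	heartbeatTimeout = 1
	electionTimeout  = 10
)

// progress models one follower that is not on the good path.
type progress struct {
	wait int // ticks left before the leader may send again
}

func (p *progress) shouldWait() bool { return p.wait > 0 }

// tick is called once per clock tick and counts the wait down.
func (p *progress) tick() {
	if p.wait > 0 {
		p.wait--
	}
}

// ack is called when the follower replies; the leader may send at once.
func (p *progress) ack() { p.wait = 0 }

// trySend reports whether a message is sent now. A snapshot gets the
// longer electionTimeout wait, a normal append gets heartbeatTimeout.
func trySend(p *progress, snapshot bool) bool {
	if p.shouldWait() {
		return false
	}
	if snapshot {
		p.wait = electionTimeout
	} else {
		p.wait = heartbeatTimeout
	}
	return true
}

func main() {
	p := &progress{}
	fmt.Println(trySend(p, false)) // true: first append goes out
	fmt.Println(trySend(p, false)) // false: still waiting for the reply
	p.tick()
	fmt.Println(trySend(p, true)) // true: wait expired, snapshot goes out
	p.tick()
	fmt.Println(trySend(p, true)) // false: snapshot wait is much longer
	p.ack()
	fmt.Println(trySend(p, false)) // true: reply arrived, sending resumes
}
```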
Xiang Li
044e35b814 raft: use newRaft 2014-12-15 11:25:35 -08:00
Xiang Li
c586d5012c raft: log term as %d 2014-12-14 10:06:45 -08:00
Xiang Li
2c2e032155 Merge pull request #1908 from bdarnell/error-fixes
raft: remove panic when we see a proposal with no leader.
2014-12-11 13:58:51 -08:00