13 Commits

Author SHA1 Message Date
Tobias Schottdorf
7a8ab37bfd raft: fix correctness bug in CommittedEntries pagination
In #9982, a mechanism to limit the size of `CommittedEntries` was
introduced. The way this mechanism worked was that it would load
applicable entries (passing the max size hint) and would emit a
`HardState` whose commit index was truncated to match the limitation
applied to the entries. Unfortunately, this was subtly incorrect
when the user-provided `Entries` implementation didn't exactly
match what Raft uses internally. Depending on whether a `Node` or
a `RawNode` was used, this would either lead to regressing the
HardState's commit index or outright forgetting to apply entries,
respectively.

Asking implementers to precisely match the Raft size limitation
semantics was considered but looks like a bad idea as it puts
correctness squarely in the hands of downstream users. Instead, this
PR removes the truncation of `HardState` when limiting is active
and tracks the applied index separately. This removes the old
paradigm (that the previous code tried to work around) that the
client will always apply all the way to the commit index, which
isn't true when commit entries are paginated.

See [1] for more on the discovery of this bug (CockroachDB's
implementation of `Entries` returns one more entry than Raft's when the
size limit hits).

[1]: https://github.com/cockroachdb/cockroach/issues/28918#issuecomment-418174448
2018-09-04 14:52:23 +02:00
Gyuho Lee
bb60f8ab1d raft: change import paths to "go.etcd.io/etcd"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-28 17:47:52 -07:00
Vincent Lee
11fa4f0275 raft: raft learners should be returned after applyConfChange 2018-01-11 17:30:17 +08:00
Ben Darnell
8d8f3195e4 raft: Avoid scanning raft log in becomeLeader
Scanning the uncommitted portion of the raft log to determine whether
there are any pending config changes can be expensive. In
cockroachdb/cockroach#18601, we've seen that a new leader can spend so
much time scanning its log post-election that it fails to send
its first heartbeats in time to prevent a second election from
starting immediately.

Instead of tracking whether a pending config change exists with a
boolean, this commit tracks the latest log index at which a pending
config change *could* exist. This is a less expensive solution to
the problem, and the impact of false positives should be minimal since
a newly-elected leader should be able to quickly commit the tail of
its log.
2017-12-30 10:13:36 -05:00
siddontang
c6f2db2e92 raft: support learner 2017-11-11 10:38:21 +08:00
Peter Mattis
37fa6ac45c raft: add RawNode.TickQuiesced
TickQuiesced allows the caller to support "quiesced" Raft groups which
do not perform periodic heartbeats and elections. This is useful in a
system with thousands of Raft groups where these periodic operations can
be overwhelming in an otherwise idle system.

It might seem possible to avoid advancing the logical clock at all in
such Raft groups, but doing so has an interaction with the CheckQuorum
functionality. If a follower is not quiesced while the leader is the
follower can call an election that will fail because the leader's lease
has not expired (electionElapsed < electionTimeout). The next time the
leader sends a heartbeat to this follower the follower will see that the
heartbeat is from a previous term and respond with a MsgAppResp. This in
turn will cause the leader to step down and become a follower even
though there isn't a leader in the group. By allowing the leader's
logical clock to advance via TickQuiesced, the leader won't reject the
election and there will be a smooth transfer of leadership to the
follower.
2016-09-15 21:05:18 -04:00
Dylan.Wen
eeca614cd3 raft: add read index for RawNode 2016-09-14 14:43:46 +08:00
Xiang Li
484f579905 raft: hide Campaign rules on applying all entries 2016-07-25 15:53:39 -07:00
Gyu-Ho Lee
fe884f8209 raft: update LICENSE header 2016-05-12 20:49:15 -07:00
es-chow
ac059eb8cb raft: transfer leader feature 2016-04-08 16:56:32 +08:00
Ben Darnell
22925a1d2f raft: Remove redundant raft.Commit field.
Keeping this field in sync with `raft.raftLog.committed` was
error-prone, so instead we synthesize the `HardState` on demand.

Fixes #4278.
2016-01-26 15:18:55 -05:00
siddontang
54a45ba2f5 *: fix typo 2016-01-06 16:17:02 +08:00
es-chow
5bc56786dc raft: add RawNode which is a thread-unsafe node without goroutine and remove MultiNode 2015-11-26 17:14:14 +08:00