advance() should use rs.req.Entries[0].Data as the context instead of
req.Context for deletion. Since req.Context is never set, there won't be
any context being deleted from pendingReadIndex; results mem leak.
FIXES#7571
TestNodeTick relies on a unreliable func `waitForSchedule` when running
with GOMAXPROCS > 1. This commit changes the test to make sure we stop
the node afte it drains the tick chan. The test should be reliable now.
Getting gosimple suggestion while running test script, so this PR is for fixing gosimple S1019 check.
raft/node_test.go:456:40: should use make([]raftpb.Entry, 1) instead (S1019)
raft/node_test.go:457:49: should use make([]raftpb.Entry, 1) instead (S1019)
raft/node_test.go:458:43: should use make([]raftpb.Message, 1) instead (S1019)
Refer https://github.com/dominikh/go-tools/blob/master/cmd/gosimple/README.md#checks for more information.
n.Tick() is async. It can be racy when running with n.Stop().
n.Status() is sync and has a feedback mechnism internally. So there wont be
any race between n.Status() and n.Stop() call.
The "logs converge" case in TestLeaderElectionPreVote was incorrectly
passing because some nodes were not actually using the preVoteConfig.
This test case was more complex than its siblings and it was not
verifying what it wanted to verify, so pull it out into a separate test
where everything can be tested more explicitly.
Fixes#6895
After add node conf proposed twice with the same node id, the pending state is not reset because
the addNode returned without setting the pending state at the second
time and the pending state will always be true unless other conf changed. During this we
can not add any new node because the propose will be ignored since the
pending state is true.
If the node is stopped, then Status can hang forever because there is no
event loop to answer. So, just return empty status to avoid deadlocks.
Fix#6855
Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>
If MsgTimeoutNow arrived after a node was removed, the node could start
and win an election, then panic in becomeLeader (see
cockroachdb/cockroach#8535)
Move all vote handling from the per-state step functions to the
top-level Step(). This wasn't necessary before because MsgVote would
cause us to become a follower, but MsgPreVote needs to be handled
without changing the node's current state.
Some tests were starting nodes with a non-empty log but a term of zero,
which cannot happen in the real world. This was affecting the final term
being tested in TestLeaderElection.
TickQuiesced allows the caller to support "quiesced" Raft groups which
do not perform periodic heartbeats and elections. This is useful in a
system with thousands of Raft groups where these periodic operations can
be overwhelming in an otherwise idle system.
It might seem possible to avoid advancing the logical clock at all in
such Raft groups, but doing so has an interaction with the CheckQuorum
functionality. If a follower is not quiesced while the leader is the
follower can call an election that will fail because the leader's lease
has not expired (electionElapsed < electionTimeout). The next time the
leader sends a heartbeat to this follower the follower will see that the
heartbeat is from a previous term and respond with a MsgAppResp. This in
turn will cause the leader to step down and become a follower even
though there isn't a leader in the group. By allowing the leader's
logical clock to advance via TickQuiesced, the leader won't reject the
election and there will be a smooth transfer of leadership to the
follower.