Scanning the uncommitted portion of the raft log to determine whether
there are any pending config changes can be expensive. In
cockroachdb/cockroach#18601, we've seen that a new leader can spend so
much time scanning its log post-election that it fails to send
its first heartbeats in time to prevent a second election from
starting immediately.
Instead of tracking whether a pending config change exists with a
boolean, this commit tracks the latest log index at which a pending
config change *could* exist. This is a less expensive solution to
the problem, and the impact of false positives should be minimal since
a newly-elected leader should be able to quickly commit the tail of
its log.
TestNodeTick relies on a unreliable func `waitForSchedule` when running
with GOMAXPROCS > 1. This commit changes the test to make sure we stop
the node afte it drains the tick chan. The test should be reliable now.
Getting gosimple suggestion while running test script, so this PR is for fixing gosimple S1019 check.
raft/node_test.go:456:40: should use make([]raftpb.Entry, 1) instead (S1019)
raft/node_test.go:457:49: should use make([]raftpb.Entry, 1) instead (S1019)
raft/node_test.go:458:43: should use make([]raftpb.Message, 1) instead (S1019)
Refer https://github.com/dominikh/go-tools/blob/master/cmd/gosimple/README.md#checks for more information.
n.Tick() is async. It can be racy when running with n.Stop().
n.Status() is sync and has a feedback mechnism internally. So there wont be
any race between n.Status() and n.Stop() call.
ForceGosched() performs bad when GOMAXPROCS>1. When GOMAXPROCS=1, it
could promise that other goroutines run long enough
because it always yield the processor to other goroutines. But it cannot
yield processor to goroutine running on other processors. So when
GOMAXPROCS>1, the yield may finish when goroutine on the other
processor just runs for little time.
Here is a test to confirm the case:
```
package main
import (
"fmt"
"runtime"
"testing"
)
func ForceGosched() {
// possibility enough to sched up to 10 go routines.
for i := 0; i < 10000; i++ {
runtime.Gosched()
}
}
var d int
func loop(c chan struct{}) {
for {
select {
case <-c:
for i := 0; i < 1000; i++ {
fmt.Sprintf("come to time %d", i)
}
d++
}
}
}
func TestLoop(t *testing.T) {
c := make(chan struct{}, 1)
go loop(c)
c <- struct{}{}
ForceGosched()
if d != 1 {
t.Fatal("d is not incremented")
}
}
```
`go test -v -race` runs well, but `GOMAXPROCS=2 go test -v -race` fails.
Change the functionality to waiting for schedule to happen.
raft node should set initial prev hard state to empty.
Or it will not send the first hard coded state to application
until the state changes again.
This commit fixs the issue. It introduce a small overhead, that
the same tate might send to application twice when restarting.
But this is fine.
raft relies on the link layer to report the status of the sent snapshot.
If the snapshot is still sending, the replication to that remote peer will
be paused. If the snapshot finish sending, the replication will begin
optimistically after electionTimeout. If the snapshot fails, raft will
try to resend it.
* coreos/master:
rafthttp: fix import
raft: should not decrease match and next when handling out of order msgAppResp
Fix migration to allow snapshots to have the right IDs
add snapshotted integration test
fix test import loop
fix import loop, add set to types, and fix comments
etcdserver: autodetect v0.4 WALs and upgrade them to v0.5 automatically
wal: add a bench for write entry
rafthttp: add streaming server and client
dep: use vendored imports in codegangsta/cli
dep: bump golang.org/x/net/context
Conflicts:
etcdserver/server.go
etcdserver/server_test.go
migrate/snapshot.go
Compaction is now treated as an implementation detail of Storage
implementations; Node.Compact() and related functionality have been
removed. Ready.Snapshot is now used only for incoming snapshots.
A return value has been added to ApplyConfChange to allow applications
to track the node information that must be stored in the snapshot.
raftpb.Snapshot has been split into Snapshot and SnapshotMetadata, to
allow the full snapshot data to be read from disk only when needed.
raft.Storage has new methods Snapshot, ApplySnapshot, HardState, and
SetHardState. The Snapshot and HardState parameters have been removed
from RestartNode() and will now be loaded from Storage instead.
The only remaining difference between StartNode and RestartNode is that
the former bootstraps an initial list of Peers.