TestNodeTick relies on an unreliable func `waitForSchedule` when running
with GOMAXPROCS > 1. This commit changes the test to make sure we stop
the node after it drains the tick chan. The test should be reliable now.
The test script produces gosimple suggestions, so this PR fixes the gosimple S1019 check:
raft/node_test.go:456:40: should use make([]raftpb.Entry, 1) instead (S1019)
raft/node_test.go:457:49: should use make([]raftpb.Entry, 1) instead (S1019)
raft/node_test.go:458:43: should use make([]raftpb.Message, 1) instead (S1019)
Refer to https://github.com/dominikh/go-tools/blob/master/cmd/gosimple/README.md#checks for more information.
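For reference, the shape of the change S1019 asks for looks like the following (a sketch, not the exact diff in node_test.go; the import path and variable names are illustrative): when the capacity argument to make equals the length, drop it.
```
package main

import "github.com/coreos/etcd/raft/raftpb"

// alloc shows the shape of the S1019 fix: make([]T, 1, 1) becomes
// make([]T, 1), since a capacity equal to the length is redundant.
func alloc() ([]raftpb.Entry, []raftpb.Message) {
	ents := make([]raftpb.Entry, 1)   // was: make([]raftpb.Entry, 1, 1)
	msgs := make([]raftpb.Message, 1) // was: make([]raftpb.Message, 1, 1)
	return ents, msgs
}
```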
n.Tick() is async, so it can race with n.Stop(). n.Status() is sync and
has an internal feedback mechanism, so there won't be any race between
the n.Status() and n.Stop() calls.
ForceGosched() performs badly when GOMAXPROCS > 1. When GOMAXPROCS = 1, it
can guarantee that other goroutines run long enough,
because it always yields the processor to them. But it cannot
yield the processor to goroutines running on other processors, so when
GOMAXPROCS > 1 the yield may finish while a goroutine on another
processor has only run for a short time.
Here is a test to confirm the case:
```
package main

import (
	"fmt"
	"runtime"
	"testing"
)

func ForceGosched() {
	// hopefully enough to schedule up to 10 goroutines
	for i := 0; i < 10000; i++ {
		runtime.Gosched()
	}
}

var d int

func loop(c chan struct{}) {
	for {
		select {
		case <-c:
			for i := 0; i < 1000; i++ {
				fmt.Sprintf("come to time %d", i)
			}
			d++
		}
	}
}

func TestLoop(t *testing.T) {
	c := make(chan struct{}, 1)
	go loop(c)
	c <- struct{}{}
	ForceGosched()
	if d != 1 {
		t.Fatal("d is not incremented")
	}
}
```
`go test -v -race` runs well, but `GOMAXPROCS=2 go test -v -race` fails.
Change the functionality to actually wait for the scheduling to happen.
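A sketch of what such a wait could look like (the name waitCondition and the timeout are illustrative, not the helper used in the raft tests): poll the condition until it holds or a deadline passes, instead of yielding a fixed number of times.
```
package main

import (
	"runtime"
	"time"
)

// waitCondition polls cond until it returns true or the timeout expires.
// Unlike a fixed number of runtime.Gosched calls, this works regardless of
// GOMAXPROCS, because it checks the condition itself rather than counting
// how many times the processor was yielded.
func waitCondition(cond func() bool, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if cond() {
			return true
		}
		runtime.Gosched()
		time.Sleep(time.Millisecond)
	}
	return cond()
}
```
The tests can then wait on the actual condition they care about rather than hoping a fixed number of yields is enough.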
The raft node should set the initial prev hard state to empty.
Otherwise it will not send the first hard state to the application
until the state changes again.
This commit fixes the issue. It introduces a small overhead: the
same state might be sent to the application twice when restarting.
But this is fine.
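A minimal sketch (local helpers, not the raft package's own code) of the check this concerns: the node only hands a HardState to the application when it differs from the one sent previously, so seeding the previous value with the restored state would suppress the first notification after a restart.
```
package main

import "github.com/coreos/etcd/raft/raftpb"

// isHardStateEqual mirrors the comparison the node loop performs; this is a
// local helper, not the raft package's unexported function.
func isHardStateEqual(a, b raftpb.HardState) bool {
	return a.Term == b.Term && a.Vote == b.Vote && a.Commit == b.Commit
}

// hardStateForReady returns the HardState to hand to the application, or
// false if nothing should be sent. With prev initialized to an empty
// HardState, the first call after restart always returns the current state.
func hardStateForReady(cur, prev raftpb.HardState) (raftpb.HardState, bool) {
	if isHardStateEqual(cur, prev) {
		return raftpb.HardState{}, false
	}
	return cur, true
}
```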
raft relies on the link layer to report the status of a sent snapshot.
While the snapshot is still being sent, replication to that remote peer is
paused. If the snapshot finishes sending, replication will resume
optimistically after an electionTimeout. If the snapshot fails, raft will
try to resend it.
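For illustration, in terms of the etcd/raft API as it later stabilized, the link layer reports the outcome back through Node.ReportSnapshot (a sketch; sendSnapshot and the send callback are hypothetical, and the exact reporting API at the time of this commit may have differed).
```
package main

import (
	"github.com/coreos/etcd/raft"
	"github.com/coreos/etcd/raft/raftpb"
)

// sendSnapshot is a hypothetical transport hook: it sends the snapshot
// message and tells raft whether the transfer succeeded, so replication to
// that peer can resume, or the snapshot can be resent on failure.
func sendSnapshot(n raft.Node, m raftpb.Message, send func(raftpb.Message) error) {
	if err := send(m); err != nil {
		n.ReportSnapshot(m.To, raft.SnapshotFailure)
		return
	}
	n.ReportSnapshot(m.To, raft.SnapshotFinish)
}
```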
* coreos/master:
rafthttp: fix import
raft: should not decrease match and next when handling out of order msgAppResp
Fix migration to allow snapshots to have the right IDs
add snapshotted integration test
fix test import loop
fix import loop, add set to types, and fix comments
etcdserver: autodetect v0.4 WALs and upgrade them to v0.5 automatically
wal: add a bench for write entry
rafthttp: add streaming server and client
dep: use vendored imports in codegangsta/cli
dep: bump golang.org/x/net/context
Conflicts:
etcdserver/server.go
etcdserver/server_test.go
migrate/snapshot.go
Compaction is now treated as an implementation detail of Storage
implementations; Node.Compact() and related functionality have been
removed. Ready.Snapshot is now used only for incoming snapshots.
A return value has been added to ApplyConfChange to allow applications
to track the node information that must be stored in the snapshot.
raftpb.Snapshot has been split into Snapshot and SnapshotMetadata, to
allow the full snapshot data to be read from disk only when needed.
raft.Storage has new methods Snapshot, ApplySnapshot, HardState, and
SetHardState. The Snapshot and HardState parameters have been removed
from RestartNode() and will now be loaded from Storage instead.
The only remaining difference between StartNode and RestartNode is that
the former bootstraps an initial list of Peers.
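A sketch of the resulting restart flow, written against the later etcd/raft API (raft.Config postdates this change, so the exact signatures here differ from the ones introduced at the time): whatever was persisted is loaded into Storage first, and RestartNode no longer takes the snapshot or hard state.
```
package main

import (
	"math"

	"github.com/coreos/etcd/raft"
	"github.com/coreos/etcd/raft/raftpb"
)

// restart rebuilds a node purely from persisted state: the snapshot and
// hard state are applied to Storage instead of being passed to RestartNode.
func restart(snap raftpb.Snapshot, st raftpb.HardState, ents []raftpb.Entry) raft.Node {
	storage := raft.NewMemoryStorage()
	storage.ApplySnapshot(snap) // snapshot read from disk
	storage.SetHardState(st)    // hard state recovered from the WAL
	storage.Append(ents)        // entries written after the snapshot
	return raft.RestartNode(&raft.Config{
		ID:              0x1,
		ElectionTick:    10,
		HeartbeatTick:   1,
		Storage:         storage,
		MaxSizePerMsg:   math.MaxUint64,
		MaxInflightMsgs: 256,
	})
}
```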
* coreos/master: (21 commits)
etcdserver: refactor ValidateClusterAndAssignIDs
integration: add integration test for remove member
integration: add test for member restart
version: bump to alpha.3
etcdserver: add buffer to the sender queue
*: gracefully stop etcdserver
Fix up migration tool, add snapshot migration
etcd4: migration from v0.4 -> v0.5
etcdserver: export Member.StoreKey
etcdserver: recover cluster when receiving newer snapshot
etcdserver: check and select committed entries to apply
etcdserver: recover from snapshot before applying requests
raft: not set applied when restored from snapshot
sender: support elegant stop
etcdserver: add StopNotify
etcdserver: fix TestDoProposalStopped test
etcdserver: minor cleanup
etcdserver: validate new node is not registered before in best effort
etcdserver: fix server.Stop()
*: print out configuration when necessary
...
Conflicts:
etcdserver/server.go
etcdserver/server_test.go
raft/log.go
The first entry in the log is a dummy which is used for matchTerm
but may not have an actual payload. This change permits Storage
implementations to treat this term value specially instead of
storing it as a dummy Entry.
Storage.FirstIndex() no longer includes the term-only entry.
This reverses a recent decision to create entry zero as initially
unstable; Storage implementations are now required to make
Term(0) == 0 and the first unstable entry is now index 1.
stableTo(0) is no longer allowed.
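As an illustration (a toy storage, not etcd's MemoryStorage), a Term method can satisfy this by keeping the dummy entry at position zero of its slice, so a fresh node reports Term(0) == 0 and anything below the dummy is reported as compacted rather than stored.
```
package main

import (
	"errors"

	"github.com/coreos/etcd/raft/raftpb"
)

var (
	errCompacted   = errors.New("requested index is unavailable due to compaction")
	errUnavailable = errors.New("requested entry at index is unavailable")
)

// memStorage is a toy in-memory storage; ents[0] is the dummy entry whose
// Index/Term mark where the log starts (0/0 on a fresh node).
type memStorage struct {
	ents []raftpb.Entry
}

// Term treats the dummy entry specially: on a fresh storage Term(0) == 0,
// and indexes below the dummy return a compaction error instead of a term.
func (s *memStorage) Term(i uint64) (uint64, error) {
	offset := s.ents[0].Index
	if i < offset {
		return 0, errCompacted
	}
	if i-offset >= uint64(len(s.ents)) {
		return 0, errUnavailable
	}
	return s.ents[i-offset].Term, nil
}
```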
* coreos/master:
etcdserver: add sender tests
raft: Only call stableTo when we have ready entries or a snapshot.
etcdserver: add ID() function to the Server interface.
sender: use RoundTripper instead of Client in sender