The commit > unstable might not true for follower. The leader only need
to ensure the entry is stored on the majority of nodes to commit an
entry. So the minority of the cluster might receive commit > unstable
append request. This is normal.
limit the max size of entries sent per message.
Lower the cost at probing state as we limit the size per message;
lower the penalty when aggressively decrease to a too low next.
It is necessary to make this check because of the following case:
1. memory storage contains ents from index 0 to 50, and unstable has
ents from index 50 to 60.
2. raft receives an incoming snapshot with index 100.
3. raft restores its unstable to 100, but has not applied snapshot on memory storage.
4. raft receives an out-dated MsgApp from index 60.
5. raft finds the term of index 60 to check the match.
6. raft asks memory storage about the term of index 60 after it failed to get
it from unstable.
7. memory storage panics because it knows nothing about index 60.
stableTo should only mark the index stable if the term is matched. After raft sends out unstable
entries to application, raft makes progress without waiting for reply. When the appliaction
calls the stableTo to notify the entries up to "index" are stable, raft might have truncated
some entries before "index" due to leader lost. raft must verify the (index,term) of stableTo,
before marking the entries as stable.
* coreos/master:
scripts: build-docker tag and use ENTRYPOINT
scripts: build-release add etcd-migrate
create .godir
raft: optimistically increase the next if the follower is already matched
raft: add handleHeartbeat handleHeartbeat commits to the commit index in the message. It never decreases the commit index of the raft state machine.
rafthttp: send takes raft message instead of bytes
*: add rafthttp pkg into test list
raft: include commitIndex in heartbeat
rafthttp: move server stats in raftHandler to etcdserver
*: etcdhttp.raftHandler -> rafthttp.RaftHandler
etcdserver: rename sender.go -> sendhub.go
*: etcdserver.sender -> rafthttp.Sender
Conflicts:
raft/log.go
raft/raft_paper_test.go
Compaction is now treated as an implementation detail of Storage
implementations; Node.Compact() and related functionality have been
removed. Ready.Snapshot is now used only for incoming snapshots.
A return value has been added to ApplyConfChange to allow applications
to track the node information that must be stored in the snapshot.
raftpb.Snapshot has been split into Snapshot and SnapshotMetadata, to
allow the full snapshot data to be read from disk only when needed.
raft.Storage has new methods Snapshot, ApplySnapshot, HardState, and
SetHardState. The Snapshot and HardState parameters have been removed
from RestartNode() and will now be loaded from Storage instead.
The only remaining difference between StartNode and RestartNode is that
the former bootstraps an initial list of Peers.
* coreos/master: (21 commits)
etcdserver: refactor ValidateClusterAndAssignIDs
integration: add integration test for remove member
integration: add test for member restart
version: bump to alpha.3
etcdserver: add buffer to the sender queue
*: gracefully stop etcdserver
Fix up migration tool, add snapshot migration
etcd4: migration from v0.4 -> v0.5
etcdserver: export Member.StoreKey
etcdserver: recover cluster when receiving newer snapshot
etcdserver: check and select committed entries to apply
etcdserver: recover from snapshot before applying requests
raft: not set applied when restored from snapshot
sender: support elegant stop
etcdserver: add StopNotify
etcdserver: fix TestDoProposalStopped test
etcdserver: minor cleanup
etcdserver: validate new node is not registered before in best effort
etcdserver: fix server.Stop()
*: print out configuration when necessary
...
Conflicts:
etcdserver/server.go
etcdserver/server_test.go
raft/log.go
The first entry in the log is a dummy which is used for matchTerm
but may not have an actual payload. This change permits Storage
implementations to treat this term value specially instead of
storing it as a dummy Entry.
Storage.FirstIndex() no longer includes the term-only entry.
This reverses a recent decision to create entry zero as initially
unstable; Storage implementations are now required to make
Term(0) == 0 and the first unstable entry is now index 1.
stableTo(0) is no longer allowed.