- Add a large detailed comment about the use and necessity of
both the follower and leader probing optimization
- fix the log message in stepLeader that previously mixed up the
log term for the rejection and the index of the append
- improve the test via subtests
- add some verbiage in findConflictByTerm around first index
This change makes the etcd package compatible with the existing Go
ecosystem for module versioning.
Used this tool to update package imports:
https://github.com/KSubedi/gomove
Appending to an empty slice twice could (and often did) result in
multiple allocations. This was wasteful. We can avoid this by performing
a single allocation with the correct size and copying into it.
Prior to this change, MaxSizePerMsg was used both to cap the total byte size of
entries in messages as well as the total byte size of entries passed through
CommittedEntries in the Ready struct. This change adds a new Config parameter
MaxCommittedSizePerReady which defaults to MaxSizePerMsg and contols the second
of above descibed settings.
The MaxSizePerMsg setting is now used to limit the size of
Ready.CommittedEntries. This prevents out-of-memory errors if the raft
log has become very large and commits all at once.
sendApp accesses the storage several times. Perviously, we
assume that the storage will not be modified during the read
opeartions. The assumption is not true since the storage can
be compacted between the read operations. If a compaction
causes a read entries error, we should not painc. Instead, we
can simply retry the sendApp logic until succeed.
The commit > unstable might not true for follower. The leader only need
to ensure the entry is stored on the majority of nodes to commit an
entry. So the minority of the cluster might receive commit > unstable
append request. This is normal.
limit the max size of entries sent per message.
Lower the cost at probing state as we limit the size per message;
lower the penalty when aggressively decrease to a too low next.
It is necessary to make this check because of the following case:
1. memory storage contains ents from index 0 to 50, and unstable has
ents from index 50 to 60.
2. raft receives an incoming snapshot with index 100.
3. raft restores its unstable to 100, but has not applied snapshot on memory storage.
4. raft receives an out-dated MsgApp from index 60.
5. raft finds the term of index 60 to check the match.
6. raft asks memory storage about the term of index 60 after it failed to get
it from unstable.
7. memory storage panics because it knows nothing about index 60.
stableTo should only mark the index stable if the term is matched. After raft sends out unstable
entries to application, raft makes progress without waiting for reply. When the appliaction
calls the stableTo to notify the entries up to "index" are stable, raft might have truncated
some entries before "index" due to leader lost. raft must verify the (index,term) of stableTo,
before marking the entries as stable.