We need to return io.ErrUnexpectedEOF in the error chain, so that
etcdserver can repair it automatically.
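A minimal sketch of the pattern, assuming a hypothetical decodeRecord helper
(not the actual WAL code): the truncation error wraps io.ErrUnexpectedEOF
with %w, so callers can detect it with errors.Is.
```
package main

import (
	"errors"
	"fmt"
	"io"
)

// decodeRecord is a hypothetical decoder used only for illustration; the
// point is that a truncated read wraps io.ErrUnexpectedEOF so it stays in
// the error chain.
func decodeRecord(data []byte, want int) error {
	if len(data) < want {
		return fmt.Errorf("record truncated (%d of %d bytes): %w",
			len(data), want, io.ErrUnexpectedEOF)
	}
	return nil
}

func main() {
	err := decodeRecord(make([]byte, 3), 8)
	// The repair path only needs errors.Is, not the concrete wrapper type.
	fmt.Println(errors.Is(err, io.ErrUnexpectedEOF)) // true
}
```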
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The pb-generated accessor method returns the field and does not panic when
the owner is nil. Also add a non-empty RangeResponse to the test case.
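A minimal sketch of the nil-safe getter shape that gogo/protobuf generates
(illustrative type, not the generated code itself):
```
package main

import "fmt"

// RangeResponse here is only illustrative; generated getters follow the
// same nil-receiver-safe pattern.
type RangeResponse struct {
	Count int64
}

// GetCount is safe to call even when the owner is nil.
func (m *RangeResponse) GetCount() int64 {
	if m != nil {
		return m.Count
	}
	return 0
}

func main() {
	var resp *RangeResponse      // nil owner
	fmt.Println(resp.GetCount()) // prints 0 instead of panicking
}
```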
Signed-off-by: Wei Fu <fuweid89@gmail.com>
TODO:
1. Update Documentation/contributor-guide/modules.svg;
2. Update bill-of-materials.json when raft and raftexample are removed in the future;
Signed-off-by: Benjamin Wang <wachao@vmware.com>
When two members in a 5-member cluster are corrupted and they
have different hashes, etcd raises an alarm for both members,
but the order isn't guaranteed. However, if the two corrupted members
have the same hash, then the order is guaranteed: the leader
always raises alarms in the same order as the member list.
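A minimal test sketch (hypothetical alarm type, using testify) of comparing
the raised alarms without relying on their order:
```
package corrupt_test

import (
	"testing"

	"github.com/stretchr/testify/assert"
)

// alarm is a stand-in for the real alarm member type.
type alarm struct {
	MemberID uint64
}

func TestTwoCorruptAlarmsAnyOrder(t *testing.T) {
	expected := []alarm{{MemberID: 1}, {MemberID: 2}}
	got := []alarm{{MemberID: 2}, {MemberID: 1}} // order not guaranteed

	// Compare as sets, since the two alarms may be raised in either
	// order when the corrupted members have different hashes.
	assert.ElementsMatch(t, expected, got)
}
```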
Signed-off-by: Benjamin Wang <wachao@vmware.com>
If quorum doesn't exist, we don't know which members' data are
corrupted. In that situation, we intentionally set the memberID
to 0, meaning the corruption affects the whole cluster.
This aligns with what we did for 3.4 and 3.5 in
https://github.com/etcd-io/etcd/issues/14849
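A minimal sketch of that convention, with illustrative names rather than
etcd's internal types:
```
package main

import "fmt"

// memberID 0 means the corruption cannot be attributed to a single member
// and therefore affects the whole cluster.
const clusterWideMemberID uint64 = 0

func alarmTarget(corruptedID uint64, haveQuorum bool) uint64 {
	if !haveQuorum {
		return clusterWideMemberID
	}
	return corruptedID
}

func main() {
	fmt.Println(alarmTarget(42, false)) // 0: no quorum, whole cluster affected
	fmt.Println(alarmTarget(42, true))  // 42: quorum identified the member
}
```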
Signed-off-by: Benjamin Wang <wachao@vmware.com>
When the leader detects data inconsistency by comparing hashes,
it currently assumes that the follower is the corrupted member.
That isn't correct; the leader might be the corrupted member as well.
We should depend on quorum to identify the corrupted member.
For example, in a 3-member cluster, if 2 members have the same hash,
then the member with a different hash is the corrupted one. In a 5-member
cluster, if 3 members have the same hash, the corrupted member is one
of the remaining two members; it's also possible that both of the
remaining members are corrupted.
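A minimal sketch of the quorum-based check described above (illustrative
signature, not etcd's corruption-detection code): group members by their
reported hash, and only flag members once one hash is backed by a quorum.
```
package main

import "fmt"

// corruptedMembers returns the members whose hash disagrees with the hash
// reported by a quorum. ok is false when no hash reaches quorum, i.e. we
// cannot tell which members are corrupted.
func corruptedMembers(hashes map[uint64]uint32, clusterSize int) (bad []uint64, ok bool) {
	quorum := clusterSize/2 + 1
	byHash := make(map[uint32][]uint64)
	for id, h := range hashes {
		byHash[h] = append(byHash[h], id)
	}
	for h, ids := range byHash {
		if len(ids) < quorum {
			continue
		}
		for id, got := range hashes {
			if got != h {
				bad = append(bad, id)
			}
		}
		return bad, true
	}
	return nil, false
}

func main() {
	// 5-member cluster: members 4 and 5 disagree with the quorum hash.
	hashes := map[uint64]uint32{1: 0xaa, 2: 0xaa, 3: 0xaa, 4: 0xbb, 5: 0xcc}
	fmt.Println(corruptedMembers(hashes, 5))
}
```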
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Comments fixed as per goword in the go test files that the shell
function go_srcs_in_module lists, as per the changes in #14827.
Helps in #14827
Signed-off-by: Bhargav Ravuri <bhargav.ravuri@infracloud.io>
This commit makes the rarely used `raftpb.Message.Snapshot` field nullable.
In doing so, it reduces the memory size of a `raftpb.Message` message from
264 bytes to 128 bytes — a 52% reduction in size.
While this commit does not change the protobuf encoding, it does change
how that encoding is used. `(gogoproto.nullable) = false` instructs the
generated proto marshaling logic to always encode a value for the field,
even if that value is empty. `(gogoproto.nullable) = true` instructs the
generated proto marshaling logic to omit an encoded value for the field
if the field is nil.
This raises compatibility concerns in both directions. Messages encoded
by new binary versions without a `Snapshot` field will be decoded as an
empty field by old binary versions. In other words, old binary versions
can't tell the difference. However, messages encoded by old binary versions
with an empty Snapshot field will be decoded as a non-nil, empty field by
new binary versions. As a result, new binary versions need to be prepared
to handle such messages.
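A hedged sketch of the decoding concern, with illustrative stand-ins rather
than the generated raftpb types: new code should treat a non-nil but empty
Snapshot the same as a nil one.
```
package main

import "fmt"

// Illustrative stand-ins for the generated types.
type Snapshot struct {
	Data []byte
}

type Message struct {
	Snapshot *Snapshot // nullable after this change
}

// hasSnapshot treats nil and empty the same way, because old binaries
// always encode the field and thus produce a non-nil, empty Snapshot.
func hasSnapshot(m Message) bool {
	return m.Snapshot != nil && len(m.Snapshot.Data) > 0
}

func main() {
	fromOldBinary := Message{Snapshot: &Snapshot{}} // non-nil but empty
	fromNewBinary := Message{Snapshot: nil}
	fmt.Println(hasSnapshot(fromOldBinary), hasSnapshot(fromNewBinary)) // false false
}
```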
While Message.Snapshot is not intentionally part of the external interface
of this library, it was possible for users of the library to access it and
manipulate it. As such, this change may be considered a breaking change.
Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
All maintenance APIs require the admin privilege when auth is enabled;
otherwise, the request is rejected. If auth isn't enabled,
there is no such requirement.
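A minimal client sketch of the requirement, with placeholder endpoint and
credentials: with auth enabled, a maintenance call such as Status only
succeeds for a user holding the admin role.
```
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Placeholder endpoint and credentials; the user must hold the admin
	// role when auth is enabled, otherwise the server rejects the call.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		Username:    "root",
		Password:    "rootpw",
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	status, err := cli.Status(ctx, "127.0.0.1:2379") // maintenance API call
	fmt.Println(status, err)
}
```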
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The alarm list is the only exception that doesn't move consistent_index
forward. The reproduction steps are as simple as:
```
etcd --snapshot-count=5 &
for i in {1..6}; do etcdctl alarm list; done
kill -9 <etcd_pid>
etcd
```
Signed-off-by: Benjamin Wang <wachao@vmware.com>
For a cluster with only one member, raft always sends identical
unstable entries and committed entries to etcdserver, and etcd
responds to the client once it finishes (actually only partially)
the applying workflow.
When the client receives the response, it doesn't mean etcd has already
successfully saved the data to both BoltDB and the WAL, because:
1. etcd commits the boltDB transaction periodically instead of on each request;
2. etcd saves WAL entries in parallel with applying the committed entries.
Accordingly, it may run into data loss when etcd crashes
immediately after responding to the client and before BoltDB and the WAL
have successfully saved the data to disk.
Note that this issue can only happen for clusters with only one member.
For clusters with multiple members, it isn't an issue, because etcd will
not commit & apply the data before it has been replicated to a majority of
members. When the client receives the response, the data must have been
applied, which in turn means it must have been committed.
Note: for clusters with multiple members, raft will never send identical
unstable entries and committed entries to etcdserver.
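A toy illustration of that ordering (not etcd code): the client is answered
while the durable write still runs in a separate goroutine, so a crash in
between loses a write the client already saw acknowledged.
```
package main

import (
	"fmt"
	"sync"
)

func apply(entry []byte)   { fmt.Println("applied:", string(entry)) }
func persist(entry []byte) { fmt.Println("persisted:", string(entry)) }

func main() {
	var wg sync.WaitGroup
	entry := []byte("put k v")

	// Stand-in for saving WAL entries in parallel with applying them.
	wg.Add(1)
	go func() {
		defer wg.Done()
		persist(entry)
	}()

	apply(entry)
	fmt.Println("responded to client") // acknowledgement happens here
	// A crash at this point, before persist() has reached disk, would
	// discard an already-acknowledged write.
	wg.Wait()
}
```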
Signed-off-by: Benjamin Wang <wachao@vmware.com>