We need to return io.ErrUnexpectedEOF in the error chain, so that
etcdserver can repair it automatically.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Previously etcdservers depends on raft/raftpb/raft.proto directly.
After moving raft to a separate repo, we need to add raft to the
tools/mod, and get raft included in the -I protc flags.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Comments fixed as per goword in go test files that shell
function go_srcs_in_module lists as per changes on #14827
Helps in #14827
Signed-off-by: Bhargav Ravuri <bhargav.ravuri@infracloud.io>
`unsafeCommit` is called by both `(*batchTxBuffered) commit` and
`(*backend) defrag`. When users perform the defragmentation
operation, etcd doesn't update the consistent index. If etcd
crashes(e.g. panicking) in the process for whatever reason, then
etcd replays the WAL entries starting from the latest snapshot,
accordingly it may re-apply entries which might have already been
applied, eventually the revision isn't consistent with other members.
Refer to discussion in https://github.com/etcd-io/etcd/pull/14685
Signed-off-by: Benjamin Wang <wachao@vmware.com>
When a key-value store corruption check happens immediately after a
compaction, the revision at which the key-value store hash is computed,
is the compacted revision itself.
In that case, the hash computation logic was incorrect because it
returned an ErrCompacted error; this error should instead be returned when
the revision at which the key-value store is hashed, is strictly lower
than the compacted revision.
Fixes#14325
Signed-off-by: Jeremy Leach <44558776+jbml@users.noreply.github.com>
A WAL object was closed by defer, however the WAL was rewritten afterwards,
so defer closed already closed WAL but not the new one. It caused a data
race between writing file and cleaning up a temporary test directory,
which led to a non-deterministic bug.
Fixes#14332
Signed-off-by: Vladimir Sokolov <vsvastey@gmail.com>
Problem: We pass grpc context down to applier in readonly serializable txn.
This context can be cancelled for example due to timeout.
This will trigger panic inside applyTxn
Solution: Only panic for transactions with write operations
fixes https://github.com/etcd-io/etcd/issues/14110
Signed-off-by: Bogdan Kanivets <bkanivets@apple.com>
The `ErrFileNotFound` was used for for three cases:
1. There is no any WAL files (probably due to no read permission);
2. There is no WAL files matching the snapshot index;
3. The WAL file seqs do not increase continuously.
It's not good for debug when users see the `ErrFileNotFound` error,
so in this PR, a different error is returned for each case above.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Currently the max size of each WAL entry is hard coded as 10MB. If users
set a value > 10MB for the flag --max-request-bytes, then etcd may run
into a situation that it successfully processes a big request, but fails
to decode it when replaying the WAL file on startup.
On the other hand, we can't just remove the limitation, because if a
WAL entry is somehow corrupted, and its recByte is a huge value, then
etcd may run out of memory. So the solution is to restrict the max size
of each WAL entry as a dynamic value, which is the remaining size of
the WAL file.
Signed-off-by: Benjamin Wang <wachao@vmware.com>