381 Commits

Author SHA1 Message Date
Benjamin Wang
bd9f1584d4 process the scenaro of the last WAL record being partially synced to disk
We need to return io.ErrUnexpectedEOF in the error chain, so that
etcdserver can repair it automatically.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-01-08 04:46:51 +08:00
Piotr Tabor
9abc895122 Goimports: Apply automated fixing to test files as well.
Signed-off-by: Piotr Tabor <ptab@google.com>
2022-12-29 13:04:45 +01:00
Piotr Tabor
6f899a7b40
Merge pull request #15052 from ptabor/20221228-goimports-fix
./scripts/fix.sh: Takes care of goimports across the whole project.
2022-12-29 11:31:22 +01:00
Piotr Tabor
9e1abbab6e Fix goimports in all existing files. Execution of ./scripts/fix.sh
Signed-off-by: Piotr Tabor <ptab@google.com>
2022-12-29 09:41:31 +01:00
KiloG
101a2a61ea
etcdserver: fix typo in comment
etcdserver: fix typo in comment
2022-12-28 18:41:08 +08:00
wafuwafu13
2ffa9e7c91 tests(etcdserver): refactor
Signed-off-by: wafuwafu13 <mariobaske@i.softbank.jp>
2022-12-16 10:09:04 +09:00
wafuwafu13
8dcfca0097 tests(etcdserver): add server_access_control_test.go
Signed-off-by: wafuwafu13 <mariobaske@i.softbank.jp>
2022-12-15 21:46:48 +09:00
Wei Fu
f59896c735 chore: use Getter in WarnOfExpensiveReadOnlyTxnRequest
The pb provides an accessor method to get field and it will not panic if
the owner is nil. And add non-empty RangeRespone into the test case.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2022-12-07 17:54:52 +08:00
Benjamin Wang
daad3a2154 etcdserver: fix nil pointer panic for readonly txn
FYI. https://github.com/etcd-io/etcd/issues/14891#issuecomment-1337191993

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-12-06 14:53:47 +08:00
Benjamin Wang
394956ca4e doc: cleanup etcd/raft in all documents
TODO:
1. Update Documentation/contributor-guide/modules.svg;
2. Update bill-of-materials.json when raft and raftexample are removed in future;

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-12-02 14:13:18 +08:00
Benjamin Wang
faff80a2b3 etcdserve: format the source code
gofmt -w ./server

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-12-02 13:00:59 +08:00
Benjamin Wang
e9aa275b36 etcdserver: update etcdserver to use the new raft module go.etcd.io/raft/v3
Just replaced all go.etcd.io/etcd/raft/v3 with go.etcd.io/raft/v3
under directory server.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-12-02 09:33:45 +08:00
Benjamin Wang
fae9435b66 test: fix unit test Instability
When two members in a 5 member cluster are corrupted, and they
have different hashes, etcd will raise alarm for both members,
but the order isn't guaranteed. But if the two corrupted members
have the same hash, then the order is guaranteed. The leader
always raise alarm in the same order as the member list.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-11-29 06:51:50 +08:00
Benjamin Wang
d545d603e9 test: update both unit test and e2e/integration test for CompactHashCheck
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-11-26 20:13:20 +08:00
Benjamin Wang
6049af072c etcdserver: intentionally set memberID as 0 when can't identify the corrupted member
If quorum doesn't exist, we don't know which members data are
corrupted. In such situation, we intentionally set the memberID
as 0, it means it affects the whole cluster.
It's align with what we did for 3.4 and 3.5 in
https://github.com/etcd-io/etcd/issues/14849

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-11-26 19:35:38 +08:00
Benjamin Wang
e95e82f0b9 etcdserver: added a summary for the CompactHashCheck method
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-11-26 19:35:38 +08:00
Benjamin Wang
85fc09d09b etcdserver: resolve review comments in PR 14828
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-11-26 19:35:38 +08:00
Benjamin Wang
8b98fee9ce etcdserver: detect corrupted member based on quorum
When the leader detects data inconsistency by comparing hashes,
currently it assumes that the follower is the corrupted member.
It isn't correct, the leader might be the corrupted member as well.

We should depend on quorum to identify the corrupted member.
For example, for 3 member cluster, if 2 members have the same hash,
the the member with different hash is the corrupted one. For 5 member
cluster, if 3 members have the same same, the corrupted member is one
of the left two members; it's also possible that both the left members
are corrupted.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-11-26 19:35:38 +08:00
Bhargav Ravuri
2feec4fe68 comments: fix comments as per goword in go test files
Comments fixed as per goword in go test files that shell
function go_srcs_in_module lists as per changes on #14827

Helps in #14827

Signed-off-by: Bhargav Ravuri <bhargav.ravuri@infracloud.io>
2022-11-23 23:05:42 +05:30
Andrew Sims
f656fa0f49 add missing copyright headers
Signed-off-by: Andrew Sims <andrew.cameron.sims@gmail.com>
2022-11-23 19:13:43 +11:00
Sasha Melentyev
c3b6cbdb73 all: goimports -w .
Signed-off-by: Sasha Melentyev <sasha@melentyev.io>
2022-11-17 19:07:04 +03:00
Sasha Melentyev
2c9c209eb6 all: Changing Printf and friends to Print if there is no formatting
Signed-off-by: Sasha Melentyev <sasha@melentyev.io>
2022-11-15 22:11:23 +03:00
Sasha Melentyev
006e747a44 all: Change time unit
Signed-off-by: Sasha Melentyev <sasha@melentyev.io>
2022-11-15 01:15:01 +03:00
Benjamin Wang
f77b8a735f etcdserver: populate HashRevision when responding to leader or client's HashKV request
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-11-14 08:33:44 +08:00
Nathan VanBenschoten
0f9d7a4f95 raft: make Message.Snapshot nullable, halve struct size
This commit makes the rarely used `raftpb.Message.Snapshot` field nullable.
In doing so, it reduces the memory size of a `raftpb.Message` message from
264 bytes to 128 bytes — a 52% reduction in size.

While this commit does not change the protobuf encoding, it does change
how that encoding is used. `(gogoproto.nullable) = false` instruct the
generated proto marshaling logic to always encode a value for the field,
even if that value is empty. `(gogoproto.nullable) = true` instructs the
generated proto marshaling logic to omit an encoded value for the field
if the field is nil.

This raises compatibility concerns in both directions. Messages encoded
by new binary versions without a `Snapshot` field will be decoded as an
empty field by old binary versions. In other words, old binary versions
can't tell the difference. However, messages encoded by old binary versions
with an empty Snapshot field will be decoded as a non-nil, empty field by
new binary versions. As a result, new binary versions need to be prepared
to handle such messages.

While Message.Snapshot is not intentionally part of the external interface
of this library, it was possible for users of the library to access it and
manipulate it. As such, this change may be considered a breaking change.

Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
2022-11-09 17:35:52 +00:00
Benjamin Wang
2ac149b96a etcdserver: fix log typo when checking version compatiblity
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-11-08 18:27:46 +08:00
Benjamin Wang
c967715d93 auth: protect all maintainence APIs when auth is enabled
All maintenance APIs require admin privilege when auth is enabled,
otherwise, the request will be rejected. If auth isn't enabled,
then no such requirement any more.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-11-03 04:39:42 +08:00
Marek Siarkowicz
f215cd89d2
Merge pull request #14612 from spacewander/azq
chore: commit the change generated by scripts/genproto.sh
2022-10-24 20:00:52 +02:00
spacewander
3a63a0d5e3 chore: commit the change generated by scripts/genproto.sh
TODO: ensure the generated code is up-to-date in the CI.
Signed-off-by: spacewander <spacewanderlzx@gmail.com>
2022-10-23 21:13:55 +08:00
Samuele Resca
b58f9c27e4 Refactoring code to remove duplicate code test.
Signed-off-by: Samuele Resca <sr7@ad.datcon.co.uk>
Signed-off-by: Samuele Resca <samuele.resca@gmail.com>
2022-10-23 13:46:10 +01:00
Samuele Resca
3d9c5c6166 Adding fuzz test on v3rpc interfaces.
Signed-off-by: Samuele Resca <sr7@ad.datcon.co.uk>
Signed-off-by: Samuele Resca <samuele.resca@gmail.com>
2022-10-23 13:46:10 +01:00
Marek Siarkowicz
2b178fdd96 server: Handle cluster version equal downgrade version
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-10-17 12:05:57 +02:00
Benjamin Wang
1ccdb3762d Test: fix all corruption detection related unit test cases
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-09-30 06:06:41 +08:00
Benjamin Wang
d116d02e04 etcdserver: update corrupt hash detection's logic
get peer's hash using the same revision as the value used by leader

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-09-30 06:03:50 +08:00
Benjamin Wang
1c20ed2cc5
Merge pull request #14521 from lovehhf/remove_pick_peer_url
membership: Remove PickPeerURL Method
2022-09-27 02:10:35 +08:00
Hongfei Huang
f6d808736c membership: Remove PickPeerURL Method
PickPeerURL only used by unit test

Signed-off-by: Hongfei Huang <853885165@qq.com>
2022-09-26 23:21:10 +08:00
Kafuu Chino
f1d4935e91 *: avoid closing a watch with ID 0 incorrectly
Signed-off-by: Kafuu Chino <KafuuChinoQ@gmail.com>

add test
2022-09-26 20:30:33 +08:00
Benjamin Wang
9097e61b40 etcdserve: revert the etcdserver side change for the data loss on one node cluster
Since the raft side change has been merged, so we need to revert the etcdserver
side change.
Refer to
  https://github.com/etcd-io/etcd/pull/14413
  https://github.com/etcd-io/etcd/pull/14400

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-09-22 15:19:20 +08:00
Benjamin Wang
7f10dccbaf Bump go 1.19: update all the dependencies and go.sum files
1. run ./scripts/fix.sh;
2. cd tools/mod; gofmt -w . & go mod tidy;

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-09-22 08:47:46 +08:00
Marek Siarkowicz
026794495f
Merge pull request #14494 from demoManito/remove/redundant-type-conversion
etcd: remove redundant type conversion
2022-09-21 11:34:19 +02:00
Benjamin Wang
2441a24cee
Merge pull request #14493 from demoManito/style/format-import-order
etcd: format import order
2022-09-21 06:03:31 +08:00
demoManito
f67ec10779 etcd: format import order
golang CodeReviewComments:
https://github.com/golang/go/wiki/CodeReviewComments#imports

Signed-off-by: demoManito <1430482733@qq.com>
2022-09-20 18:41:39 +08:00
demoManito
a9c3d56508 etcd: remove redundant type conversion
Signed-off-by: demoManito <1430482733@qq.com>
2022-09-20 11:26:02 +08:00
Benjamin Wang
159ed15afc
Merge pull request #14479 from demoManito/fix/declaring-empty-slice
etcd: modify declaring empty slices
2022-09-20 05:22:59 +08:00
Hitoshi Mitake
2dcfa83094 *: handle auth invalid token and old revision errors in watch
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
2022-09-17 21:51:36 +09:00
demoManito
72cf0cc04a etcd: modify declaring empty slices
declare an empty slice to var s []int replace  s :=[]int{}, https://github.com/golang/go/wiki/CodeReviewComments#declaring-empty-slices

Signed-off-by: demoManito <1430482733@qq.com>
2022-09-16 14:41:14 +08:00
Benjamin Wang
3dc5348d94
Merge pull request #14419 from ahrtr/alarm_list_ci
Move consistent_index forward when executing alarmList operation
2022-09-06 03:50:58 +08:00
Benjamin Wang
cc840336f0 move consistent_index forward when executing alarmList operation
The alarm list is the only exception that doesn't move consistent_index
forward. The reproduction steps are as simple as,

```
etcd --snapshot-count=5 &
for i in {1..6}; do etcdctl  alarm list; done
kill -9 <etcd_pid>
etcd
```

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-09-05 10:05:55 +08:00
Benjamin Wang
2a10049e47 fix the potential data loss for clusters with only one member
For a cluster with only one member, the raft always send identical
unstable entries and committed entries to etcdserver, and etcd
responds to the client once it finishes (actually partially) the
applying workflow.

When the client receives the response, it doesn't mean etcd has already
successfully saved the data, including BoltDB and WAL, because:
   1. etcd commits the boltDB transaction periodically instead of on each request;
   2. etcd saves WAL entries in parallel with applying the committed entries.
Accordingly, it may run into a situation of data loss when the etcd crashes
immediately after responding to the client and before the boltDB and WAL
successfully save the data to disk.
Note that this issue can only happen for clusters with only one member.

For clusters with multiple members, it isn't an issue, because etcd will
not commit & apply the data before it being replicated to majority members.
When the client receives the response, it means the data must have been applied.
It further means the data must have been committed.
Note: for clusters with multiple members, the raft will never send identical
unstable entries and committed entries to etcdserver.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-08-30 15:29:20 +08:00
spacewander
508ce517e0 update according to the review
Signed-off-by: spacewander <spacewanderlzx@gmail.com>
2022-08-17 09:25:37 +08:00