17202 Commits

Author SHA1 Message Date
Marek Siarkowicz
3f26995f99 server: Move unsafeHashByRev to new hash.go file
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-09-07 15:11:55 +02:00
Marek Siarkowicz
bc592c7b01 server: Extract unsafeHashByRev function
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-09-07 15:11:55 +02:00
Marek Siarkowicz
336fef4ce2 server: Test HashByRev values to make sure they don't change
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-09-07 15:11:55 +02:00
Marek Siarkowicz
78a6f387cb server: Cover corruptionMonitor with tests
Get 100% coverage on InitialCheck and PeriodicCheck functions to avoid
any mistakes.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-09-07 15:11:55 +02:00
Marek Siarkowicz
35cbdf3961 server: Extract corruption detection to dedicated struct
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-09-07 15:11:55 +02:00
Marek Siarkowicz
d32de2c410 server: Extract triggerCorruptAlarm to function
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-09-07 15:11:55 +02:00
Benjamin Wang
204c0319a1
Merge pull request #14429 from ahrtr/alarm_list_ci_3.5
[3.5] Move consistent_index forward when executing alarmList operation
2022-09-06 15:17:13 +08:00
Benjamin Wang
5c8aa08e2c move consistent_index forward when executing alarmList operation
Cherry pick https://github.com/etcd-io/etcd/pull/14419 to 3.5.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-09-06 12:48:06 +08:00
Benjamin Wang
747bf5ceff
Merge pull request #14424 from serathius/one_member_data_loss_raft_3_5
[release-3.5] fix the potential data loss for clusters with only one member
2022-09-06 03:28:24 +08:00
Benjamin Wang
7eb696dfcd fix the potential data loss for clusters with only one member
For a cluster with only one member, the raft always send identical
unstable entries and committed entries to etcdserver, and etcd
responds to the client once it finishes (actually partially) the
applying workflow.

When the client receives the response, it doesn't mean etcd has already
successfully saved the data, including BoltDB and WAL, because:
   1. etcd commits the boltDB transaction periodically instead of on each request;
   2. etcd saves WAL entries in parallel with applying the committed entries.
Accordingly, it may run into a situation of data loss when the etcd crashes
immediately after responding to the client and before the boltDB and WAL
successfully save the data to disk.
Note that this issue can only happen for clusters with only one member.

For clusters with multiple members, it isn't an issue, because etcd will
not commit & apply the data before it being replicated to majority members.
When the client receives the response, it means the data must have been applied.
It further means the data must have been committed.
Note: for clusters with multiple members, the raft will never send identical
unstable entries and committed entries to etcdserver.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-09-05 14:26:24 +02:00
Marek Siarkowicz
fbb14f91bf
Merge pull request #14397 from biosvs/backport-grpc-proxy-endpoints-autosync
Backport of pull/14354 to release-3.5
2022-09-01 10:14:22 +02:00
Vitalii Levitskii
67e4c59e01 Backport of pull/14354 to 3.5.5
Signed-off-by: Vitalii Levitskii <vitalii@uber.com>
2022-08-29 15:58:17 +03:00
Benjamin Wang
74aa38ec10
Merge pull request #14366 from ahrtr/keepalive_3.5_20220820
[3.5] Refactor the keepAliveListener and keepAliveConn
2022-08-24 10:14:26 +08:00
Benjamin Wang
9ea5b1ba22 Refactor the keepAliveListener and keepAliveConn
Only `net.TCPConn` supports `SetKeepAlive` and `SetKeepAlivePeriod`
by default, so if you want to warp multiple layers of net.Listener,
the `keepaliveListener` should be the one which is closest to the
original `net.Listener` implementation, namely `TCPListener`.

Also refer to https://github.com/etcd-io/etcd/pull/14356

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-08-20 15:03:15 +08:00
Benjamin Wang
6bab3677eb
Merge pull request #14361 from amdprophet/3.5-close-keepalive-stream
[3.5] clientv3: close streams after use in lessor keepAliveOnce method
2022-08-20 05:33:40 +08:00
Justin Kolberg
eab0b999a8
clientv3: close streams after use in lessor keepAliveOnce method
Streams are now closed after being used in the lessor `keepAliveOnce` method.
This prevents the "failed to receive lease keepalive request from gRPC stream"
message from being logged by the server after the context is cancelled by the
client.

Signed-off-by: Justin Kolberg <amd.prophet@gmail.com>
2022-08-18 09:54:12 -07:00
Benjamin Wang
9e95685d0a
Merge pull request #14312 from ahrtr/3.5_bump_otl
[3.5] etcdserver: bump OpenTelemetry to 1.0.1 and gRPC to 1.41.0
2022-08-09 04:03:21 +08:00
Benjamin Wang
8fdca41cd8 Change default sampling rate from 100% to 0%
Refer to https://github.com/etcd-io/etcd/pull/14318

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-08-07 07:19:30 +08:00
Benjamin Wang
8c5f110b59 Fix the failure in TestEndpointSwitchResolvesViolation
Refer to a0bdfc4fc9

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-08-07 07:17:27 +08:00
Benjamin Wang
2751c61f24 update all related dependencies
Upgrade grpc to 1.41.0;
Run ./script/fix.sh to fix all related issue.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-08-07 07:17:27 +08:00
Benjamin Wang
5a86ae2c33 move setupTracing into a separate file config_tracing.go
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-08-07 07:17:27 +08:00
Benjamin Wang
2d7e49002c etcdserver: bump OpenTelemetry to 1.0.1
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-08-07 07:16:08 +08:00
Benjamin Wang
6145831683
Merge pull request #14318 from damemi/3.5-tracing-sample
Change default sampling rate from 100% to 0%
2022-08-07 07:14:35 +08:00
Mike Dame
4c013c91e9
Change default sampling rate from 100% to 0%
This changes the default parent-based trace sampling rate from
100% to 0%. Due to the high QPS etcd can handle, having 100% trace
sampling leads to very high resource usage. Defaulting to 0% means
that only already-sampled traces will be sampled in etcd.

Fixes #14310

Signed-off-by: Mike Dame <mikedame@google.com>
2022-08-05 15:00:40 +00:00
Marek Siarkowicz
9d7e10863e
Merge pull request #14227 from mitake/perm-cache-lock-3.5
server/auth: protect rangePermCache with a RW lock
2022-07-20 10:36:00 +02:00
Hitoshi Mitake
e15c005fef server/auth: protect rangePermCache with a RW lock
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
2022-07-19 15:56:12 +09:00
Benjamin Wang
3237289fff
Merge pull request #14222 from Jille/backport-14203
[3.5] clientv3: Fix parsing of ETCD_CLIENT_DEBUG
2022-07-15 08:27:07 +08:00
Jille Timmermans
cbedaf90fe Improve error message for incorrect values of ETCD_CLIENT_DEBUG
Signed-off-by: Jille Timmermans <jille@quis.cx>
2022-07-14 09:43:54 +02:00
Benjamin Wang
fb71790611
Merge pull request #14219 from ahrtr/3.5_backport_maxstream
[3.5] Support configuring `MaxConcurrentStreams` for http2
2022-07-13 16:57:48 +08:00
Benjamin Wang
ff447b4a35 add e2e test cases to cover the maxConcurrentStreams
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-07-13 14:43:44 +08:00
Benjamin Wang
437f3778d0 Add flag --max-concurrent-streams to set the max concurrent stream each client can open at a time
Also refer to https://github.com/etcd-io/etcd/pull/14169#discussion_r917154243

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-07-13 14:18:15 +08:00
Benjamin Wang
40d1a43176 add the uint32Value data type
The golang buildin package `flag` doesn't support `uint32` data
type, so we need to support it via the `flag.Var`.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-07-13 13:57:22 +08:00
Benjamin Wang
57c1d92e20
Merge pull request #14187 from spzala/automated-cherry-pick-of-#14182-upstream-release-3.5
Automated cherry pick of #14182
2022-07-03 19:02:10 +08:00
Sahdev Zala
4df61af2df Client: fix check for WithPrefix op
Make sure that WithPrefix correctly set the flag, and add test.
Also, add test for WithFromKey.

fixes #14056

Signed-off-by: Sahdev Zala <spzala@us.ibm.com>
2022-07-02 23:33:26 -04:00
Marek Siarkowicz
c9f7473173
Merge pull request #14132 from ahrtr/auth_bundle
[3.5] client/v3: do not overwrite authTokenBundle on dial
2022-06-20 10:46:39 +02:00
Benjamin Wang
df632abd8a client/v3: do not overwrite authTokenBundle on dial
Cherry pick the PR https://github.com/etcd-io/etcd/pull/12992
to 3.5, so please refer to the original PR for more detailed info.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-06-18 06:59:55 +08:00
Benjamin Wang
4443e14dcd
Merge pull request #14127 from ahrtr/threshold_3.5
[3.5] Restrict the max size of each WAL entry to the remaining size of the WAL file
2022-06-17 15:03:35 +08:00
Benjamin Wang
621cd7b9e5 restrict the max size of each WAL entry to the remaining size of the file
Currently the max size of each WAL entry is hard coded as 10MB. If users
set a value > 10MB for the flag --max-request-bytes, then etcd may run
into a situation that it successfully processes a big request, but fails
to decode it when replaying the WAL file on startup.

On the other hand, we can't just remove the limitation, because if a
WAL entry is somehow corrupted, and its recByte is a huge value, then
etcd may run out of memory. So the solution is to restrict the max size
of each WAL entry as a dynamic value, which is the remaining size of
the WAL file.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-06-17 09:01:29 +08:00
Benjamin Wang
db0b67e8a0 Add FileReader and FileBufReader utilities
The FileReader interface is the wrapper of io.Reader. It provides
the fs.FileInfo as well. The FileBufReader struct is the wrapper of
bufio.Reader, it also provides fs.FileInfo.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-06-17 09:00:43 +08:00
Marek Siarkowicz
0be65da6cc
Merge pull request #14087 from ahrtr/lease_revoke_race
[3.5] Backport two lease related bug fixes to 3.5
2022-06-06 16:58:04 +02:00
Benjamin Wang
acb1ee993a Backport two lease related bug fixes to 3.5
The first bug fix is to resolve the race condition between goroutine
and channel on the same leases to be revoked. It's a classic mistake
in using Golang channel + goroutine. Please refer to
https://go.dev/doc/effective_go#channels

The second bug fix is to resolve the issue that etcd lessor may
continue to schedule checkpoint after stepping down the leader role.
2022-06-04 14:01:08 +08:00
Marek Siarkowicz
73876b176f
Merge pull request #14050 from serathius/avoid-clone-v3.5
[release-3.5] scripts: Avoid additional repo clone
2022-05-18 13:48:51 +02:00
Marek Siarkowicz
6aa934e546 scripts: Detect staged files before building release 2022-05-18 13:11:10 +02:00
Marek Siarkowicz
c05b9b13a8 scripts: Avoid additional repo clone
This PR removes additional clone when building artifacts.

When releasing v3.5.4 this clone was main cause of issues and
confusion about what release script is doing.

release.sh script already clones repo in /tmp/ directory, so clonning
before build is not needed. As precautions for bug in script leaving
/tmp/ clone in bad state  I moved "Verify the latest commit has the
version tag" and added "Verify the clean working tree" to be always run
before build.
2022-05-18 10:19:35 +02:00
Marek Siarkowicz
2e76dfb657
Merge pull request #14043 from serathius/test-release-3.5-v2
[release-3.5] Test release scripts
2022-05-16 14:03:01 +02:00
Marek Siarkowicz
c4b0a569ba Make DRY_RUN explicit 2022-05-16 13:10:05 +02:00
Marek Siarkowicz
c76a010b48 scripts: Add tests for release scripts 2022-05-16 13:09:46 +02:00
Piotr Tabor
b57881a164
Merge pull request #13205 from cfz/cherry-pick-#13172
[backport 3.5]: server/auth: enable tokenProvider if recoved store enables auth
2022-05-06 13:05:50 +02:00
cfz
cceb25d758
server/auth: enable tokenProvider if recoved store enables auth
we found a lease leak issue:
if a new member(by member add) is recovered by snapshot, and then
become leader, the lease will never expire afterwards. leader will
log the revoke failure caused by "invalid auth token", since the
token provider is not functional, and drops all generated token
from upper layer, which in this case, is the lease revoking
routine.
2022-05-06 12:24:28 +08:00
Piotr Tabor
8453b10e58
Merge pull request #13996 from cmurphy/update-crypto-3.5
Update golang.org/x/crypto to latest
2022-05-05 10:35:47 +02:00