1986 Commits

Author SHA1 Message Date
Marek Siarkowicz
9399dd1628 Fix progress notification for watch that doesn't get any events
When implementing the fix for progress notifications
(https://github.com/etcd-io/etcd/pull/15237) we made an incorrect
assumption that unsynced watches will always get at least one event.

Unsynced watches include not only slow watchers, but also newly created
watches that requested the current or an older revision. If none of the
events match the watch filter, those newly created watches might become
synced without any event going through.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-03-11 20:20:15 +01:00
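
The situation described in 9399dd1628 can be illustrated with a minimal sketch. This is not etcd's watch implementation: the `event`, `watcher`, and `syncWatcher` names are hypothetical and exist only for this example. It shows a newly created watcher that starts at an older revision and becomes synced even though its filter rejects every event, so any logic that waits for "at least one event" before treating the watcher as caught up would wait forever.

```go
package main

import "fmt"

// event is a hypothetical stand-in for a key change at a given revision.
type event struct {
	rev int64
	key string
}

// watcher is a hypothetical, simplified watcher: it wants events from
// startRev onward whose key passes the match filter.
type watcher struct {
	startRev int64
	match    func(key string) bool
	synced   bool
}

// syncWatcher replays history to an unsynced watcher, returns the events
// that passed the filter, and marks the watcher as synced. The watcher
// becomes synced regardless of how many events were delivered.
func syncWatcher(w *watcher, history []event) []event {
	var delivered []event
	for _, ev := range history {
		if ev.rev >= w.startRev && w.match(ev.key) {
			delivered = append(delivered, ev)
		}
	}
	w.synced = true
	return delivered
}

func main() {
	history := []event{{rev: 2, key: "a"}, {rev: 3, key: "b"}}
	// The filter matches nothing in the history.
	w := &watcher{startRev: 2, match: func(k string) bool { return k == "zzz" }}
	delivered := syncWatcher(w, history)
	fmt.Println(len(delivered), w.synced) // prints "0 true"
}
```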
Benjamin Wang
ff278c49c4
Merge pull request #17465 from ivanvc/release-3.4-backport-ignore-old-leader-leases-revoking-requests
[3.4] backport ignore old leader leases revoking requests
2024-02-22 09:50:25 +00:00
Ivan Valdes
bf04c67408
Backport ignore old leader's leases revoking request
Backport of PR #16822, commits f7e488dc9262685d6624755e0d3bb0a655863248,
67f17166bf2ba337dafb8e0ea8eea5f74a990767,
and f7ff898fd6c2d6dbb54278343073aa4fa5f46a03.

Co-authored-by: Benjamin Wang <benjamin.wang@broadcom.com>
Signed-off-by: Ivan Valdes <ivan@vald.es>
2024-02-20 11:31:29 -08:00
Siyuan Zhang
2caf0f0a12 Add schema verification when closing etcd.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2024-02-20 08:55:55 -08:00
Siyuan Zhang
83f97c1f46 Add handling of AuthStatusRequest.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2024-02-20 08:55:55 -08:00
Siyuan Zhang
c65c1ea559 Add function to migrate 3.5 data online.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2024-02-20 08:55:55 -08:00
Bogdan Kanivets
e7da7ebf7e add flag to allow downgrade from 3.5
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2024-02-20 08:55:55 -08:00
Wei Fu
51c99dd3fd etcdserver: drain leaky goroutines before test completed
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2024-02-06 12:58:35 +08:00
Marek Siarkowicz
73814a46f9 Don't flock snapshot files
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-01-08 15:11:44 +01:00
Benjamin Wang
f06c6e6189
Merge pull request #17144 from siyuanfoundation/livez-bp-3.4-e2e
[3.4] Backport e2e tests for livez/readyz.
2023-12-21 19:12:57 +00:00
Siyuan Zhang
c43530c402 [3.4] backport health check e2e tests.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2023-12-21 09:33:11 -08:00
Marek Siarkowicz
6723e3cc44 Check if be is nil to avoid panic when be is overridden with nil by recoverSnapshotBackend on line 471
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-12-20 17:01:01 +01:00
Siyuan Zhang
b6ab23900d etcdserver: add linearizable_read check to readyz.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2023-12-15 14:00:22 -08:00
Siyuan Zhang
c58ef8d10f etcdserver: add metric counters for livez/readyz health checks.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2023-12-15 08:39:51 -08:00
Siyuan Zhang
f4c229a41d etcdserver: add livez and ready http endpoints for etcd.
Add two separate probes, one for liveness and one for readiness. The liveness probe checks that the local individual node is up and running, or else the node is restarted, while the readiness probe checks that the cluster is ready to serve traffic. This makes the etcd health check fully Kubernetes API compliant.

Signed-off-by: Siyuan Zhang <sizhang@google.com>
2023-12-15 08:39:51 -08:00
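
The split between a liveness probe and a readiness probe in f4c229a41d can be sketched with Go's standard library alone. This is an illustration under stated assumptions, not etcd's handler code: `newHealthMux`, `probe`, and `statusOf` are hypothetical names, the two check functions are placeholders for whatever the server really inspects, and the response bodies are invented for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// probe wraps a boolean check (hypothetical) in an HTTP handler that
// answers 200 when the check passes and 503 otherwise.
func probe(check func() bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if check() {
			w.WriteHeader(http.StatusOK)
			fmt.Fprintln(w, "ok")
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
		fmt.Fprintln(w, "not ok")
	}
}

// newHealthMux registers the two probes separately: /livez reflects only
// the local node, /readyz reflects whether traffic can be served.
func newHealthMux(localUp, clusterReady func() bool) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/livez", probe(localUp))
	mux.HandleFunc("/readyz", probe(clusterReady))
	return mux
}

// statusOf issues an in-process GET against the handler and returns the
// HTTP status code, without opening a network socket.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	// A node that is running but whose cluster cannot serve traffic yet.
	mux := newHealthMux(
		func() bool { return true },
		func() bool { return false },
	)
	fmt.Println(statusOf(mux, "/livez"), statusOf(mux, "/readyz")) // prints "200 503"
}
```

Keeping the two checks apart lets an orchestrator restart a node only when the node itself is down, while merely withholding traffic when the cluster is not ready.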
Chao Chen
d4861d660b http health check bug fixes
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-12-15 08:39:51 -08:00
Benjamin Wang
963af731bd
Merge pull request #17120 from siyuanfoundation/livez-bp-3.4
[3.4] Backport healthcheck code cleanup
2023-12-15 09:52:50 +00:00
Marek Siarkowicz
4a8381a461 server: Split metrics and health code
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2023-12-14 10:25:43 -08:00
Siyuan Zhang
cc44646a2e server: Cover V3 health with tests
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2023-12-14 10:23:49 -08:00
Siyuan Zhang
f009772c84 server: Refactor health checks
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2023-12-14 10:23:36 -08:00
Ivan Valdes
838cd9aa00
server: disable redirects in peer communication
Disable following redirects from peer HTTP communication on the client's side.
Etcd server may run into SSRF (Server-side request forgery) when adding a new
member. If users provide a malicious peer URL, the existing etcd members may be
redirected to another unexpected internal URL when getting the new member's
version.

Signed-off-by: Ivan Valdes <ivan@vald.es>
2023-12-13 09:21:53 -08:00
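
One way to disable redirect following on the client side in Go is the standard library's `CheckRedirect` hook returning `http.ErrUseLastResponse`. The sketch below only illustrates that technique; it does not reproduce the change in 838cd9aa00. The helper names (`newNoRedirectClient`, `fetchStatus`, `redirectDemo`) and the two local test servers are assumptions made for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// newNoRedirectClient returns an HTTP client that never follows redirects.
// Returning http.ErrUseLastResponse makes the client hand back the 3xx
// response itself instead of requesting the Location target.
func newNoRedirectClient() *http.Client {
	return &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
}

// fetchStatus performs a GET and returns only the status code.
func fetchStatus(c *http.Client, url string) (int, error) {
	resp, err := c.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// redirectDemo starts a hypothetical "malicious peer" that redirects to an
// "internal" server, then reports the status the client saw and whether
// the internal server was ever contacted.
func redirectDemo() (int, bool) {
	var hits int32
	internal := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt32(&hits, 1)
	}))
	defer internal.Close()

	peer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, internal.URL, http.StatusFound)
	}))
	defer peer.Close()

	status, err := fetchStatus(newNoRedirectClient(), peer.URL+"/version")
	if err != nil {
		return 0, atomic.LoadInt32(&hits) > 0
	}
	return status, atomic.LoadInt32(&hits) > 0
}

func main() {
	status, internalHit := redirectDemo()
	fmt.Println(status, internalHit) // prints "302 false"
}
```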
Marek Siarkowicz
e74970d5a1 server: Run health check tests in subtests
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2023-12-11 17:07:09 -08:00
Marek Siarkowicz
34d2e743d2 server: Rename test case expect fields
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2023-12-11 17:04:08 -08:00
Marek Siarkowicz
ddf7a69fba server: Use named struct initialization in healthcheck test
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2023-12-11 17:03:48 -08:00
Benjamin Wang
75d2407fc0
Merge pull request #16990 from YaoC/backport-12890
[3.4] backport #12890 learner support snapshot RPC
2023-11-23 13:56:28 +00:00
Benjamin Wang
f3c0155f03
Merge pull request #16997 from chaochn47/release-3.4-upgrade-grpc-1.52.0
Release 3.4 upgrade grpc 1.52.0
2023-11-22 20:30:11 +00:00
Chao Chen
f549da33de backport https://github.com/etcd-io/etcd/pull/12709 and https://github.com/etcd-io/etcd/pull/12801 to resolve gogo unmarshal errors
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-11-22 11:14:50 -08:00
Benjamin Wang
c750e01e37 etcdserver: add cluster id check for hashKVHandler
backport https://github.com/etcd-io/etcd/pull/15924 to 3.4

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
2023-11-22 16:55:18 +00:00
tangcong
4ff558ee53 [3.4] backport #12890 learner support snapshot RPC
Signed-off-by: YaoC <chengyao09@hotmail.com>
2023-11-22 09:48:10 +00:00
CFC4N
1fc259d655 etcdserver: check authinfo if it is not InternalAuthenticateRequest.
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-10-25 12:51:30 -07:00
Piotr Tabor
6c0e4d97f1 Introduce grpc-1.30+ compatible client/v3/naming API.
This is not yet an implementation, just the API and tests, to be filled
in with an implementation in the next CLs,
tracked by: https://github.com/etcd-io/etcd/issues/12652

We propose here 3 packages:
 - clientv3/naming/endpoints ->
    That is abstraction layer over etcd that allows to write, read &
    watch Endpoints information. It's independent from GRPC API. It hides
    the storage details.

 - clientv3/naming/endpoints/internal ->
    That contains the gRPC-compatible Update class to preserve the
    internal JSON marshalling format.

 - clientv3/naming/resolver ->
   That implements the GRPC resolver API, such that etcd can be
   used for connection.Dial in grpc.

Please see the grpc_naming.md document changes & grpcproxy/cluster.go
new integration, to see how the new abstractions work.

Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-10-19 12:59:24 -07:00
Thomas Jungblut
afa0167538 Add first unit test for authApplierV3
This contains a slight refactoring to expose enough information
to write meaningful tests for auth applier v3.

Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
2023-06-16 10:08:47 +02:00
Thomas Jungblut
96d0831770 Early exit auth check on lease puts
Mitigates #15993 by not checking each key individually for permission
when auth is entirely disabled or an admin user is calling the method.

Backport of #16005

Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
2023-06-06 11:45:28 +02:00
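
The early exit described in 96d0831770 can be sketched as follows. The function name `checkLeasePuts` and its parameters are hypothetical and do not mirror etcd's auth applier; the point is only that the per-key loop is skipped when auth is disabled or the caller is an admin.

```go
package main

import "fmt"

// checkLeasePuts reports whether the caller may act on the keys attached
// to a lease, and how many per-key permission checks were performed.
// canWrite is a hypothetical per-key permission lookup.
func checkLeasePuts(authEnabled, isAdmin bool, keys []string, canWrite func(key string) bool) (allowed bool, checks int) {
	// Early exit: no per-key work when auth is off or the caller is admin.
	if !authEnabled || isAdmin {
		return true, 0
	}
	for _, k := range keys {
		checks++
		if !canWrite(k) {
			return false, checks
		}
	}
	return true, checks
}

func main() {
	keys := []string{"a", "b", "c"}
	deny := func(string) bool { return false }
	fmt.Println(checkLeasePuts(false, false, keys, deny)) // auth disabled: "true 0"
	fmt.Println(checkLeasePuts(true, true, keys, deny))   // admin user: "true 0"
	fmt.Println(checkLeasePuts(true, false, keys, deny))  // regular user: "false 1"
}
```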
Hitoshi Mitake
71e85e9ded etcdserver: protect lease timetilive with auth
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
2023-05-08 22:54:54 +09:00
Benjamin Wang
90e4d04c8e etcdserver: guarantee order of requested progress notification
Progress notifications requested using ProgressRequest were sent
directly using the ctrlStream, which means that they could race
against watch responses in the watchStream.

This would especially happen when the stream was not synced - e.g. if
you requested a progress notification on a freshly created unsynced
watcher, the notification would typically arrive indicating a revision
for which not all watch responses had been sent.

This changes the behaviour so that v3rpc always goes through the watch
stream, using a new RequestProgressAll function that closely matches
the behaviour of the v3rpc code - i.e.

1. Generate a message with WatchId -1, indicating the revision for
   *all* watchers in the stream

2. Guarantee that a response is (eventually) sent

The latter might require us to defer the response until all watchers
are synced, which is likely as it should be. Note that we do *not*
guarantee that the number of progress notifications matches the number
of requests, only that eventually at least one gets sent.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-11 12:47:09 +08:00
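
The two guarantees listed in 90e4d04c8e can be sketched with a small model. The `stream` type and its methods below are hypothetical and far simpler than etcd's watch stream; only the `-1` watch ID and the "at least one, eventually" behaviour are taken from the commit message.

```go
package main

import "fmt"

// progressWatchID marks a notification that covers all watchers in the
// stream, as described in the commit message.
const progressWatchID = -1

// response is a hypothetical, simplified watch response.
type response struct {
	watchID  int64
	revision int64
}

// stream is a hypothetical model of one watch stream.
type stream struct {
	unsynced        int   // watchers that are still catching up
	revision        int64 // current store revision
	progressPending bool  // a progress request is waiting for sync
	out             []response
}

// requestProgressAll sends a progress notification immediately when all
// watchers are synced. Otherwise it defers the notification and reports
// false. Repeated requests while deferred collapse into a single one.
func (s *stream) requestProgressAll() bool {
	if s.unsynced > 0 {
		s.progressPending = true
		return false
	}
	s.out = append(s.out, response{watchID: progressWatchID, revision: s.revision})
	return true
}

// watcherSynced records that one watcher caught up and flushes a deferred
// notification once no unsynced watchers remain.
func (s *stream) watcherSynced() {
	s.unsynced--
	if s.unsynced == 0 && s.progressPending {
		s.progressPending = false
		s.out = append(s.out, response{watchID: progressWatchID, revision: s.revision})
	}
}

func main() {
	s := &stream{unsynced: 2, revision: 7}
	s.requestProgressAll() // deferred
	s.requestProgressAll() // deferred, collapses with the first request
	s.watcherSynced()
	s.watcherSynced()
	fmt.Println(len(s.out), s.out[0]) // prints "1 {-1 7}"
}
```

Two requests produce a single notification here, which matches the note that the number of notifications is not guaranteed to equal the number of requests.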
Benjamin Wang
3618ab4b07 security: remove password after authenticating the user
fix https://nvd.nist.gov/vuln/detail/CVE-2021-28235

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-06 22:42:29 +09:00
J. David Lowe
cee78aca75 etcdserver: don't attempt to grant nil permission to a role
Prevent etcd from crashing when given a bad grant payload, e.g.:

$ curl -d '{"name": "foo"}' http://localhost:2379/v3/auth/role/add
{"header":{"cluster_id":"14841639068965178418", ...
$ curl -d '{"name": "foo"}' http://localhost:2379/v3/auth/role/grant
curl: (52) Empty reply from server

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
Signed-off-by: J. David Lowe <j.david.lowe@gmail.com>
2023-04-04 21:40:54 +09:00
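
The kind of guard that cee78aca75 describes can be sketched as a nil check before the permission is dereferenced. The types, the error value, and `grantPermission` below are hypothetical and stand in for etcd's real auth structures.

```go
package main

import (
	"errors"
	"fmt"
)

// permission and grantRequest are hypothetical, simplified request types.
type permission struct {
	key string
}

type grantRequest struct {
	name string
	perm *permission // nil when the payload omits the permission
}

// errPermissionNotGiven is a hypothetical error for a missing permission.
var errPermissionNotGiven = errors.New("permission not given")

// grantPermission rejects a request without a permission instead of
// dereferencing a nil pointer, then records the grant for the role.
func grantPermission(roles map[string][]permission, req grantRequest) error {
	if req.perm == nil {
		return errPermissionNotGiven
	}
	roles[req.name] = append(roles[req.name], *req.perm)
	return nil
}

func main() {
	roles := map[string][]permission{}
	// A payload like {"name": "foo"} carries no permission.
	fmt.Println(grantPermission(roles, grantRequest{name: "foo"}))
	fmt.Println(grantPermission(roles, grantRequest{name: "foo", perm: &permission{key: "k"}}), len(roles["foo"]))
}
```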
Hitoshi Mitake
01c0d8b309 etcdserver: keep server side change of 14548
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
2023-03-28 21:43:17 +09:00
Hitoshi Mitake
c8f890cde1 Revert "*: handle auth invalid token and old revision errors in watch"
This reverts commit 0c6e466024ea2030380b13e3e2248b0b8fb879ca.

Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
2023-03-21 22:13:17 +09:00
James Blair
a91bacf567
Formatted source code for go 1.19.6.
Signed-off-by: James Blair <mail@jamesblair.net>
2023-02-20 12:44:14 +13:00
Benjamin Wang
00b31512a1 etcdserver: return membership.ErrIDNotFound when the memberID not found
Backport https://github.com/etcd-io/etcd/pull/15095 to 3.4.

When promoting a learner, we need to wait until the leader's applied ID
catches up to the commitId. Afterwards, check whether the learner ID
exists or not, and return `membership.ErrIDNotFound` directly in the API
if the member ID is not found, to avoid the request being unnecessarily
delivered to raft.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-01-17 06:27:31 +08:00
Benjamin Wang
5413ce46dc bump go version to 1.17.3
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-12-19 18:34:04 +08:00
Benjamin Wang
acca4fa93e etcdserver: fix nil pointer panic for readonly txn
Backporting https://github.com/etcd-io/etcd/pull/14895

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-12-06 18:09:47 +08:00
Benjamin Wang
2f4f7328d0 etcdserver: intentionally set the memberID as 0 in corruption alarm
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-11-25 15:58:23 +08:00
Kafuu Chino
ed10ca13f4 *: avoid closing a watch with ID 0 incorrectly
Signed-off-by: Kafuu Chino <KafuuChinoQ@gmail.com>

add test

1

1

1
2022-10-10 19:54:58 +08:00
Hitoshi Mitake
0c6e466024 *: handle auth invalid token and old revision errors in watch
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
2022-10-04 22:49:06 +09:00
Benjamin Wang
29911e9a5b etcdserver: fix memberID equals to zero in corruption alarm
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-09-28 11:01:26 +08:00
Benjamin Wang
b2b7b9d535
Merge pull request #14423 from serathius/one_member_data_loss_raft_3_4
[release-3.4] fix the potential data loss for clusters with only one member
2022-09-06 03:29:45 +08:00
Benjamin Wang
119e4dda19 fix the potential data loss for clusters with only one member
For a cluster with only one member, raft always sends identical
unstable entries and committed entries to etcdserver, and etcd
responds to the client once it finishes (actually partially) the
applying workflow.

When the client receives the response, it doesn't mean etcd has already
successfully saved the data, including BoltDB and WAL, because:
   1. etcd commits the boltDB transaction periodically instead of on each request;
   2. etcd saves WAL entries in parallel with applying the committed entries.
Accordingly, it may run into a situation of data loss when etcd crashes
immediately after responding to the client and before the boltDB and WAL
successfully save the data to disk.
Note that this issue can only happen for clusters with only one member.

For clusters with multiple members, it isn't an issue, because etcd will
not commit & apply the data before it has been replicated to a majority of members.
When the client receives the response, it means the data must have been applied.
It further means the data must have been committed.
Note: for clusters with multiple members, the raft will never send identical
unstable entries and committed entries to etcdserver.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-09-05 14:15:47 +02:00
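
The condition described in 119e4dda19, where the same entries appear as both unstable and committed, can be detected with a small overlap check. The sketch below is one plausible way to express it and is not taken from the commit: `entry`, `ready`, and `mustSyncWALFirst` are hypothetical names, and it assumes term and index only ever increase.

```go
package main

import "fmt"

// entry is a hypothetical, simplified raft log entry.
type entry struct {
	term  uint64
	index uint64
}

// ready is a hypothetical batch handed from raft to the server.
type ready struct {
	entries          []entry // unstable entries, not yet saved to the WAL
	committedEntries []entry // entries ready to be applied
}

// mustSyncWALFirst reports whether any committed entry is also still
// unstable. In that case the WAL should be saved before the entries are
// applied and the client is answered.
func mustSyncWALFirst(rd ready) bool {
	if len(rd.committedEntries) == 0 || len(rd.entries) == 0 {
		return false
	}
	lastCommitted := rd.committedEntries[len(rd.committedEntries)-1]
	firstUnstable := rd.entries[0]
	return lastCommitted.term > firstUnstable.term ||
		(lastCommitted.term == firstUnstable.term && lastCommitted.index >= firstUnstable.index)
}

func main() {
	e := entry{term: 1, index: 5}
	// Single-member cluster: the same entry is unstable and committed.
	single := ready{entries: []entry{e}, committedEntries: []entry{e}}
	// Multi-member cluster: committed entries precede the unstable ones.
	multi := ready{
		entries:          []entry{{term: 1, index: 6}},
		committedEntries: []entry{{term: 1, index: 5}},
	}
	fmt.Println(mustSyncWALFirst(single), mustSyncWALFirst(multi)) // prints "true false"
}
```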
Vladimir Sokolov
38342e88da etcdserver: nil-logger issue fix for version 3.4
In v3.5 it is assumed that the logger is never nil, but it can still be
nil in v3.4. A PR targeting v3.5 was backported to 3.4, which is why a
nil-logger panic is possible in 3.4. This commit fixes the issue.

Fixes #14402

Signed-off-by: Vladimir Sokolov <vsvastey@gmail.com>
2022-09-03 04:34:03 +03:00