This is not yet an implementation, just the API and tests, to be filled
in with the implementation in follow-up CLs,
tracked by: https://github.com/etcd-io/etcd/issues/12652
We propose here 3 packages:
- clientv3/naming/endpoints ->
That is an abstraction layer over etcd that allows writing, reading &
watching Endpoints information. It is independent of the gRPC API and
hides the storage details.
- clientv3/naming/endpoints/internal ->
That contains the gRPC-compatible Update class to preserve the
internal JSON marshalling format.
- clientv3/naming/resolver ->
That implements the gRPC resolver API, such that etcd can be
used for connection dialing (grpc.Dial) in gRPC.
Please see the grpc_naming.md document changes & the new
grpcproxy/cluster.go integration to see how the new abstractions work.
Signed-off-by: Chao Chen <chaochn@amazon.com>
This contains a slight refactoring to expose enough information
to write meaningful tests for auth applier v3.
Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
Mitigates #15993 by not checking each key individually for permission
when auth is entirely disabled or the admin user is calling the method.
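A minimal sketch of the fast path, with illustrative names rather than the real
auth store methods:

package auth

import "errors"

var errPermissionDenied = errors.New("auth: permission denied")

// store is a stand-in for the real auth store; only the calls needed
// for the sketch are declared here.
type store interface {
    IsAuthEnabled() bool
    IsAdminUser(user string) bool
    IsRangePermitted(user string, key []byte) bool
}

// checkKeys sketches the mitigation: return early instead of checking
// every key when auth is off or the caller is the admin user.
func checkKeys(s store, user string, keys [][]byte) error {
    if !s.IsAuthEnabled() || s.IsAdminUser(user) {
        return nil
    }
    for _, k := range keys {
        if !s.IsRangePermitted(user, k) {
            return errPermissionDenied
        }
    }
    return nil
}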
Backport of #16005
Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
Progress notifications requested using ProgressRequest were sent
directly using the ctrlStream, which means that they could race
against watch responses in the watchStream.
This would especially happen when the stream was not synced - e.g. if
you requested a progress notification on a freshly created unsynced
watcher, the notification would typically arrive indicating a revision
for which not all watch responses had been sent.
This changes the behaviour so that v3rpc always goes through the watch
stream, using a new RequestProgressAll function that closely matches
the behaviour of the v3rpc code - i.e.
1. Generate a message with WatchId -1, indicating the revision for
*all* watchers in the stream
2. Guarantee that a response is (eventually) sent
The latter might require us to defer the response until all watchers
are synced, which is likely how it should be anyway. Note that we do *not*
guarantee that the number of progress notifications matches the number
of requests, only that eventually at least one gets sent.
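An illustrative sketch of the intended semantics (the real types live in the
mvcc/v3rpc packages and are more involved):

package sketch

// watchResponse stands in for a real watch response; progress
// notifications carry a revision but no events.
type watchResponse struct {
    WatchID  int64
    Revision int64
}

type watchStream struct {
    currentRev int64
    unsynced   int                // number of watchers still catching up
    ch         chan watchResponse // the single, ordered watch channel
}

// requestProgressAll reports whether a progress notification covering all
// watchers (WatchID -1) was sent through the watch stream. When some
// watchers are still unsynced it returns false, and the caller retries
// once the stream has caught up, so at least one notification is
// eventually delivered.
func (ws *watchStream) requestProgressAll() bool {
    if ws.unsynced > 0 {
        return false
    }
    ws.ch <- watchResponse{WatchID: -1, Revision: ws.currentRev}
    return true
}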
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Prevent etcd from crashing when given a bad grant payload, e.g.:
$ curl -d '{"name": "foo"}' http://localhost:2379/v3/auth/role/add
{"header":{"cluster_id":"14841639068965178418", ...
$ curl -d '{"name": "foo"}' http://localhost:2379/v3/auth/role/grant
curl: (52) Empty reply from server
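The guard amounts to a nil check before the permission is used; below is a
hedged sketch with made-up type and error names, not the actual auth store code:

package auth

import "errors"

// errPermissionNotGiven is illustrative; the real error value may differ.
var errPermissionNotGiven = errors.New("auth: permission not given")

type permission struct {
    Key, RangeEnd []byte
}

// roleGrantRequest mirrors the shape of the JSON payload: "perm" is a
// pointer, so it is nil when the client omits it (as in the curl above).
type roleGrantRequest struct {
    Name string
    Perm *permission
}

// grantPermission rejects a request without a permission instead of
// dereferencing the nil pointer and crashing the server.
func grantPermission(r *roleGrantRequest) error {
    if r.Perm == nil {
        return errPermissionNotGiven
    }
    // ... persist r.Perm for role r.Name ...
    return nil
}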
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
Signed-off-by: J. David Lowe <j.david.lowe@gmail.com>
Backport https://github.com/etcd-io/etcd/pull/15095 to 3.4.
When promoting a learner, we need to wait until the leader's applied index
catches up with the commit index. Afterwards, check whether the learner ID
exists or not, and return `membership.ErrIDNotFound` directly in the API
if the member ID is not found, to avoid the request being unnecessarily
delivered to raft.
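A rough sketch of the order of operations, using placeholder method names
rather than the real etcdserver and membership APIs:

package sketch

import (
    "context"
    "errors"
)

// errIDNotFound stands in for membership.ErrIDNotFound.
var errIDNotFound = errors.New("membership: ID not found")

// server declares only the operations the sketch needs.
type server interface {
    WaitAppliedIndex(ctx context.Context) error // block until appliedIndex >= commitIndex
    IsMemberExist(id uint64) bool
    ProposePromote(ctx context.Context, id uint64) error
}

// promoteLearner sketches the described flow.
func promoteLearner(ctx context.Context, s server, id uint64) error {
    // 1. Wait for the leader's applied index to catch up with the commit
    //    index so the membership view below is current.
    if err := s.WaitAppliedIndex(ctx); err != nil {
        return err
    }
    // 2. Fail fast in the API layer if the learner is gone, instead of
    //    sending a doomed proposal through raft.
    if !s.IsMemberExist(id) {
        return errIDNotFound
    }
    // 3. Only now hand the promotion to raft.
    return s.ProposePromote(ctx, id)
}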
Signed-off-by: Benjamin Wang <wachao@vmware.com>
For a cluster with only one member, raft always sends identical
unstable entries and committed entries to etcdserver, and etcd
responds to the client once it has (actually only partially) finished
the applying workflow.
When the client receives the response, it doesn't mean etcd has already
successfully persisted the data to both BoltDB and the WAL, because:
1. etcd commits the boltDB transaction periodically instead of on each request;
2. etcd saves WAL entries in parallel with applying the committed entries.
Accordingly, data loss may occur if etcd crashes immediately after
responding to the client and before boltDB and the WAL have successfully
saved the data to disk.
Note that this issue can only happen for clusters with only one member.
For clusters with multiple members, it isn't an issue, because etcd will
not commit & apply the data before it has been replicated to a majority of members.
When the client receives the response, it means the data must have been applied.
It further means the data must have been committed.
Note: for clusters with multiple members, the raft will never send identical
unstable entries and committed entries to etcdserver.
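The ordering described above can be made concrete with the following
illustrative sketch (the types and function names are stand-ins, not etcd code):

package sketch

type entry struct{ Index uint64 }

type ready struct {
    UnstableEntries  []entry // must be persisted to the WAL
    CommittedEntries []entry // may be applied to the backend
}

func saveToWAL(entries []entry)      {} // WAL append + fsync
func applyToBackend(entries []entry) {} // boltDB txn, committed periodically
func respondToClient()               {}

// handleReady shows the problematic ordering for a single-member cluster:
// the WAL save runs in parallel with applying, and the client response is
// sent as soon as applying finishes, so a crash right after respondToClient
// can lose data that neither the WAL nor boltDB has flushed yet.
func handleReady(rd ready) {
    done := make(chan struct{})
    go func() {
        saveToWAL(rd.UnstableEntries)
        close(done)
    }()

    applyToBackend(rd.CommittedEntries)
    respondToClient()

    <-done
}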
Signed-off-by: Benjamin Wang <wachao@vmware.com>
In v3.5 it is assumed that the logger is never nil, but in v3.4 it still
can be. A PR targeted at v3.5 was backported to 3.4, which is why it is
possible to get a panic on a nil logger in 3.4. This commit fixes the
issue.
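A minimal sketch of the kind of guard involved, not necessarily the exact 3.4
change:

package sketch

import "go.uber.org/zap"

// ensureLogger falls back to a no-op logger instead of letting callers
// panic on a nil *zap.Logger, which is still possible on the 3.4 branch.
func ensureLogger(lg *zap.Logger) *zap.Logger {
    if lg == nil {
        return zap.NewNop()
    }
    return lg
}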
Fixes #14402
Signed-off-by: Vladimir Sokolov <vsvastey@gmail.com>
To avoid inconsistent behavior during cluster upgrades we are
feature-gating persistence behind the cluster version. This should ensure
that all cluster members are upgraded to v3.6 before the behavior changes.
To allow backporting this fix to v3.5, we are also introducing the flag
--experimental-enable-lease-checkpoint-persist that will allow for a
smooth upgrade in v3.5 clusters with this feature enabled.
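A minimal sketch of the gating logic, assuming a hypothetical configuration
field backing the new flag:

package sketch

import "github.com/coreos/go-semver/semver"

// v3_6 is the cluster version at which checkpoint persistence becomes the
// default behaviour; the flag keeps v3.5 clusters upgradable.
var v3_6 = semver.Version{Major: 3, Minor: 6}

// shouldPersistCheckpoints sketches the gating described above; the real
// configuration field behind --experimental-enable-lease-checkpoint-persist
// may be named differently.
func shouldPersistCheckpoints(clusterVersion *semver.Version, checkpointPersistFlag bool) bool {
    if checkpointPersistFlag {
        return true
    }
    return clusterVersion != nil && !clusterVersion.LessThan(v3_6)
}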
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
This is a backport of https://github.com/etcd-io/etcd/pull/13435 and is
part of the work for 3.4.20
https://github.com/etcd-io/etcd/issues/14232.
The original change had a second commit that modifies a changelog file.
The 3.4 branch does not include any changelog file, so that part was not
cherry-picked.
Local Testing:
- `make build`
- `make test`
Both succeed.
Signed-off-by: Ramsés Morales <ramses@gmail.com>
To improve debuggability of `agreement among raft nodes before
linearized reading`, we added some tracing inside
`linearizableReadLoop`.
This will allow us to know the timing of `s.r.ReadIndex` vs
`s.applyWait.Wait(rs.Index)`.
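A rough stand-in for the added tracing; the real change uses etcd's tracing
plumbing and different step names, so everything below is illustrative:

package sketch

import (
    "time"

    "go.uber.org/zap"
)

// readIndexTiming records how long the ReadIndex confirmation took versus
// how long waiting for the applied index took, so slow linearizable reads
// can be attributed to one phase or the other.
func readIndexTiming(lg *zap.Logger, readIndex func() (uint64, error), waitApplied func(uint64) <-chan struct{}) error {
    start := time.Now()

    index, err := readIndex() // wraps s.r.ReadIndex in the real code
    if err != nil {
        return err
    }
    readIndexDone := time.Now()

    <-waitApplied(index) // wraps s.applyWait.Wait(rs.Index) in the real code
    applyWaitDone := time.Now()

    lg.Info("linearizable read trace",
        zap.Duration("read-index-took", readIndexDone.Sub(start)),
        zap.Duration("apply-wait-took", applyWaitDone.Sub(readIndexDone)))
    return nil
}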
Signed-off-by: Chao Chen <chaochn@amazon.com>
Cherry pick https://github.com/etcd-io/etcd/pull/13932 to 3.4.
When etcdserver receives a LeaseRenew request, it may still be in the
middle of processing the LeaseGrantRequest on exactly the same leaseID.
Accordingly it may return a TTL=0 to the client due to a
leaseID-not-found error. So the leader should wait for the applied index
to catch up before processing client requests.
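A hedged sketch of the described flow; the method names below are placeholders
rather than the real etcdserver APIs:

package sketch

import (
    "context"
    "errors"
)

// errLeaseNotFound stands in for the real lease-not-found error.
var errLeaseNotFound = errors.New("lease not found")

type leaseServer interface {
    WaitAppliedIndex(ctx context.Context) error // block until appliedIndex catches up
    LeaseTimeToLive(id int64) (ttl int64, err error)
}

// leaseRenew sketches the fix: before looking up the lease on the leader,
// wait until the applied index has caught up, so a LeaseGrant that was
// committed but not yet applied cannot make the renew report TTL=0.
func leaseRenew(ctx context.Context, s leaseServer, id int64) (int64, error) {
    if err := s.WaitAppliedIndex(ctx); err != nil {
        return -1, err
    }
    ttl, err := s.LeaseTimeToLive(id)
    if err != nil {
        return -1, errLeaseNotFound
    }
    return ttl, nil
}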
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The first bug fix resolves the race condition between a goroutine and a
channel operating on the same leases to be revoked. It's a classic
mistake when using Go channels with goroutines. Please refer to
https://go.dev/doc/effective_go#channels
The second bug fix resolves an issue where the etcd lessor may continue
to schedule checkpoints after stepping down from the leader role.
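The general shape of the classic mistake being referenced, and its fix, is
sketched below; this illustrates the pattern from effective_go, not the actual
lessor code:

package sketch

type lease struct{ ID int64 }

func revoke(l *lease) {}

// buggyRevoke: every goroutine captures the same loop variable l, so (with
// the loop-variable semantics of the Go versions these branches build with)
// the goroutines may all see the last lease in the slice.
func buggyRevoke(leases []*lease) {
    for _, l := range leases {
        go func() {
            revoke(l) // races with the loop advancing l
        }()
    }
}

// fixedRevoke: pass the lease as an argument so each goroutine gets its
// own copy, independent of the loop variable.
func fixedRevoke(leases []*lease) {
    for _, l := range leases {
        go func(l *lease) {
            revoke(l)
        }(l)
    }
}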
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Items resolved:
1. fix the vet error: possible misuse of reflect.SliceHeader;
2. fix the vet error: call to (*T).Fatal from a non-test goroutine (see the sketch after this list);
3. bump package golang.org/x/crypto, net and sys;
4. bump boltdb from 1.3.3 to 1.3.6;
5. remove the vendor directory;
6. remove go 1.12.17 and 1.15.15, add go 1.16.15 into pipeline;
7. bump go version to 1.16 in go.mod;
8. fix the issue: compile: version go1.16.15 does not match go tool version go1.17.11,
refer to https://github.com/actions/setup-go/issues/107;
9. fix data race on compactMainRev and watcherGauge;
10. fix test failure for TestLeasingTxnOwnerGet in cluster_proxy mode.
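For item 2 above, the usual shape of the fix is to report the failure with
Error and return rather than calling Fatal off the test goroutine; a minimal
illustration, not the actual etcd test code:

package sketch

import "testing"

func doWork() error { return nil }

// TestWorker: (*testing.T).Fatal must not be called from a goroutine other
// than the one running the test, because FailNow only stops the calling
// goroutine. Report with Error and return instead.
func TestWorker(t *testing.T) {
    done := make(chan struct{})
    go func() {
        defer close(done)
        if err := doWork(); err != nil {
            t.Error(err) // not t.Fatal(err): we are not on the test goroutine
            return
        }
    }()
    <-done
}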
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Fix a panic that happens when trying to get the IDs of etcd members in
force-new-cluster mode. The issue happens if the cluster previously had
etcd learner nodes added to it.
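A rough sketch of the missing case; the real fix lives in etcdserver's
ConfChange handling when rebuilding the member ID set for --force-new-cluster,
and the function below is illustrative only:

package sketch

import "go.etcd.io/etcd/raft/raftpb"

// getVotingIDs rebuilds the set of voting member IDs from conf changes.
// Previously the learner case was missing, so replaying an entry that
// added a learner fell through to the panic below.
func getVotingIDs(confChanges []raftpb.ConfChange) map[uint64]bool {
    ids := make(map[uint64]bool)
    for _, cc := range confChanges {
        switch cc.Type {
        case raftpb.ConfChangeAddNode:
            ids[cc.NodeID] = true
        case raftpb.ConfChangeAddLearnerNode:
            // learners are not part of the forced single-member cluster;
            // recognise and skip them instead of panicking
        case raftpb.ConfChangeRemoveNode:
            delete(ids, cc.NodeID)
        case raftpb.ConfChangeUpdateNode:
            // no membership change
        default:
            panic("unknown ConfChange type")
        }
    }
    return ids
}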
Fixes #12285