fix the unexpected blocking when using Barrier.Wait(), e.g.
NewBarrier(client, "a").Wait() will block if key "a" does not exist but "a0" does, whereas it should return immediately.
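A minimal sketch of the corrected lookup, simplified from the recipe (the helper name and the watch tail are illustrative): the Get must match the barrier key exactly; a prefix-style Get (e.g. WithFirstKey, which implies WithPrefix) would also match "a0" and keep Wait blocked.

```go
package recipe

import (
	"context"

	"go.etcd.io/etcd/api/v3/mvccpb"
	clientv3 "go.etcd.io/etcd/client/v3"
)

// waitOnBarrier is a simplified sketch of Barrier.Wait. The key point is the
// exact-key Get: a prefix range would match "a0" while waiting on "a" and
// block even though "a" itself does not exist.
func waitOnBarrier(ctx context.Context, client *clientv3.Client, key string) error {
	resp, err := client.Get(ctx, key) // exact match, not a prefix range
	if err != nil {
		return err
	}
	if len(resp.Kvs) == 0 {
		return nil // barrier key absent: return immediately
	}
	// Otherwise block until the key is deleted.
	wch := client.Watch(ctx, key, clientv3.WithRev(resp.Header.Revision+1))
	for wresp := range wch {
		for _, ev := range wresp.Events {
			if ev.Type == mvccpb.DELETE {
				return nil
			}
		}
	}
	return ctx.Err()
}
```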
Signed-off-by: zhangwenkang <zwenkang@vmware.com>
If the gRPC proxy is started with --resolver-prefix, memberlist is expected to return all alive proxy nodes; when one gRPC proxy node goes down, the down node should no longer be returned, but it still is.
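For context, a hedged sketch of the lease-based registration that makes this work (the helper name and TTL are assumptions, not the actual grpcproxy source): a dead proxy stops refreshing its lease, so its key under the resolver prefix expires and drops out of the member list.

```go
package proxyreg

import (
	"context"
	"fmt"
	"log"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// registerProxy stores the proxy's address under the resolver prefix on a
// short-TTL lease that is kept alive only while the proxy runs. When the
// proxy dies, the key expires and the endpoint leaves the member list.
func registerProxy(ctx context.Context, c *clientv3.Client, prefix, addr string) error {
	const ttl = 5 // seconds; assumed value
	grant, err := c.Grant(ctx, ttl)
	if err != nil {
		return err
	}
	key := fmt.Sprintf("%s/%s", prefix, addr)
	if _, err := c.Put(ctx, key, addr, clientv3.WithLease(grant.ID)); err != nil {
		return err
	}
	ka, err := c.KeepAlive(ctx, grant.ID)
	if err != nil {
		return err
	}
	go func() {
		for range ka {
			// drain keepalive responses while the proxy is alive
		}
		log.Printf("keepalive lost; %s expires within ~%ds", key, ttl)
	}()
	return nil
}
```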
Signed-off-by: yellowzf <zzhf3311@163.com>
Progress notifications requested using ProgressRequest were sent
directly using the ctrlStream, which means that they could race
against watch responses in the watchStream.
This would especially happen when the stream was not synced - e.g. if
you requested a progress notification on a freshly created unsynced
watcher, the notification would typically arrive indicating a revision
for which not all watch responses had been sent.
This changes the behaviour so that v3rpc always goes through the watch
stream, using a new RequestProgressAll function that closely matches
the behaviour of the v3rpc code - i.e.
1. Generate a message with WatchId -1, indicating the revision for
*all* watchers in the stream
2. Guarantee that a response is (eventually) sent
The latter might require us to defer the response until all watchers
are synced, which is likely how it should behave anyway. Note that we do *not*
guarantee that the number of progress notifications matches the number
of requests, only that eventually at least one gets sent.
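A hedged sketch of how a client exercises this via clientv3 (the endpoint and key are placeholders): the notification now travels on the same watch stream, so it can never indicate a revision ahead of the events already delivered there.

```go
package main

import (
	"context"
	"log"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Endpoint is a placeholder for illustration.
	client, err := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	ctx := context.Background()
	// Open a watcher, then request a progress notification.
	wch := client.Watch(ctx, "foo", clientv3.WithPrefix())
	if err := client.RequestProgress(ctx); err != nil {
		log.Fatal(err)
	}
	for resp := range wch {
		if resp.IsProgressNotify() {
			// With the fix, every event up to this revision has already
			// been delivered on this same stream.
			log.Printf("progress notify at revision %d", resp.Header.Revision)
			return
		}
		// Regular watch events are ordered before the notification.
	}
}
```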
Signed-off-by: Benjamin Wang <wachao@vmware.com>
In order to fix https://github.com/etcd-io/etcd/issues/12385,
PR https://github.com/etcd-io/etcd/pull/14322 introduced a change
in which the client side may retry based on the error message
returned from the server side.
This is not good, as it's too fragile and it also changed the
protocol between client and server. Please see the discussion
in https://github.com/kubernetes/kubernetes/pull/114403
Note: The issue https://github.com/etcd-io/etcd/issues/12385 only
happens when auth is enabled and the client side reuses the same
client to watch.
So we decided to roll back the change on 3.5, for two reasons:
1. K8s doesn't enable auth at all, so this has no impact on K8s.
2. It's very easy for a client application to work around the issue:
the client just needs to create a new client each time before watching,
as sketched below.
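A minimal sketch of that workaround (the endpoint is a placeholder):

```go
package watchfix

import (
	"context"
	"log"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// watchWithFreshClient works around the issue by creating a new client per
// watch instead of reusing one long-lived client.
func watchWithFreshClient(ctx context.Context, key string) error {
	c, err := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
	if err != nil {
		return err
	}
	defer c.Close()
	for resp := range c.Watch(ctx, key) {
		if err := resp.Err(); err != nil {
			return err
		}
		for _, ev := range resp.Events {
			log.Printf("%s %q -> %q", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
	return ctx.Err()
}
```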
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Check the client count before creating the ephemeral key, and do not
create the key if there are already too many clients. Check the
count again after creating the key: if the total number of kvs is
bigger than the expected count, then check the rev of the current key
and act based on it. If its rev is within the first ${count}, it is a
valid client; otherwise, it should fail.
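A hedged sketch of that admission check (the function, prefix, and count parameter are illustrative; the creation revision is the tie-breaker for clients that race past the first check):

```go
package admission

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// joinAsClient sketches the check described above: a cheap count pre-check,
// then a post-creation re-check ordered by creation revision.
func joinAsClient(ctx context.Context, c *clientv3.Client, prefix, id string,
	count int64, leaseID clientv3.LeaseID) error {
	// 1. Pre-check: do not even create the key when already full.
	pre, err := c.Get(ctx, prefix, clientv3.WithPrefix(), clientv3.WithCountOnly())
	if err != nil {
		return err
	}
	if pre.Count >= count {
		return fmt.Errorf("too many clients: %d >= %d", pre.Count, count)
	}
	// 2. Create the ephemeral key, tied to the client's lease.
	key := prefix + "/" + id
	if _, err := c.Put(ctx, key, "", clientv3.WithLease(leaseID)); err != nil {
		return err
	}
	// 3. Re-check: if we overshot, only the keys with the `count` smallest
	// create revisions are valid clients.
	all, err := c.Get(ctx, prefix, clientv3.WithPrefix(),
		clientv3.WithSort(clientv3.SortByCreateRevision, clientv3.SortAscend))
	if err != nil {
		return err
	}
	if all.Count > count {
		for i, kv := range all.Kvs {
			if string(kv.Key) == key && int64(i) >= count {
				_, _ = c.Delete(ctx, key) // lost the race; clean up
				return fmt.Errorf("rev %d not within the first %d clients",
					kv.CreateRevision, count)
			}
		}
	}
	return nil
}
```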
Signed-off-by: Benjamin Wang <wachao@vmware.com>
When a client has no permission to perform an operation, applying
the entry may fail. We should still move consistent_index forward
in this case; otherwise the consistent_index may end up smaller than
the snapshot index.
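A generic sketch of the invariant (types and names are illustrative, not etcd's actual internals):

```go
package sketch

// Illustrative types only; not etcd's actual internals.
type entry struct{ Index uint64 }

type consistentIndex struct{ index uint64 }

func (ci *consistentIndex) set(i uint64) { ci.index = i }

// applyEntry shows the invariant: the consistent index advances for every
// raft entry, even when applying the request fails (e.g. permission
// denied). Skipping the update on failure could leave consistent_index
// behind the snapshot index after a restart.
func applyEntry(ci *consistentIndex, e entry, apply func(entry) error) error {
	ci.set(e.Index) // unconditional: the entry is consumed either way
	return apply(e) // the request may still fail; state stays consistent
}
```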
When a user sets up a Mirror with a restricted user that doesn't have
access to the `foo` path, we will fail to get the most recent revision
due to permission issues.
With this change, when a prefix is provided, we get the initial
revision from the prefix rather than /foo. This allows restricted users
to set up sync.
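A hedged usage sketch with clientv3's mirror package (the prefix and role setup are assumed): a user granted read access only to "foo/" can now run a sync.

```go
package mirrorsync

import (
	"context"
	"log"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/mirror"
)

// syncPrefix mirrors everything under "foo/": first the base state, then a
// stream of updates. With the fix, the initial revision comes from the
// prefix itself, so no read access outside "foo/" is required.
func syncPrefix(ctx context.Context, c *clientv3.Client) {
	s := mirror.NewSyncer(c, "foo/", 0) // rev 0: start from current state
	rc, errc := s.SyncBase(ctx)
	for resp := range rc {
		for _, kv := range resp.Kvs {
			log.Printf("base: %q = %q", kv.Key, kv.Value)
		}
	}
	if err := <-errc; err != nil {
		log.Fatal(err)
	}
	for wresp := range s.SyncUpdates(ctx) {
		for _, ev := range wresp.Events {
			log.Printf("update: %s %q", ev.Type, ev.Kv.Key)
		}
	}
}
```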
When etcdserver receives a LeaseRenew request, it may still be in the
middle of processing the LeaseGrant request for exactly the same
leaseID. Accordingly it may return TTL=0 to the client due to the
leaseID not being found. So the leader should wait for the applied
index to catch up with the committed index before processing client requests.
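A sketch of that guard (illustrative signature; the real server tracks these indexes internally):

```go
package sketch

import (
	"context"
	"time"
)

// waitAppliedIndex waits until the local apply loop has caught up with the
// committed index, so a just-granted lease is visible before the leader
// serves a LeaseRenew on it.
func waitAppliedIndex(ctx context.Context, committed, applied func() uint64) error {
	for applied() < committed() {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(10 * time.Millisecond): // assumed poll interval
		}
	}
	return nil
}
```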
To avoid inconsistent behavior during cluster upgrades, we are feature
gating persistence behind the cluster version. This should ensure that
all cluster members are upgraded to v3.6 before the behavior changes.
To allow backporting this fix to v3.5, we are also introducing the
--experimental-enable-lease-checkpoint-persist flag, which will allow
for a smooth upgrade in v3.5 clusters with this feature enabled.
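For example, on a v3.5 cluster the opt-in might look like this (assuming checkpointing itself is enabled via the existing --experimental-enable-lease-checkpoint flag):

```
etcd --experimental-enable-lease-checkpoint \
     --experimental-enable-lease-checkpoint-persist
```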
The current checkpointing mechanism is buggy: new checkpoints for any
lease are scheduled only until the first leader change. This adds a
fix for that and a test that checks it.