There is a data race on the `stop` channel. After verifying the write timeout successfully,
the test case does not wait for the `blocker` to receive the close signal from the `stop` channel.
If the new `blocker`, which serves the read-timeout verifier, gets the dial result
immediately, the new `blocker` might fetch the message from the `stop` channel
before the old one does and then close the connection, which causes
`conn.Read` to return `EOF` when it reads data.
How to reproduce this on a Linux devbox?
Use `taskset` to pin the test process to a single CPU.
```bash
cd ./client/pkg/transport
go test -c -o /tmp/test --race=true ./
taskset -c 0 /tmp/test -test.run TestWriteReadTimeoutListener -test.v -test.cpu 4 -test.count=10000 -test.failfast
```
To fix this, we suggest using a separate `stop` channel for each blocker to
prevent the data race.
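A minimal sketch of the separate-channel pattern, using a hypothetical `blocker` helper rather than the actual test code: each blocker waits on its own channel, so closing one channel cannot steal the signal intended for the other.

```go
package main

import "net"

// blocker mimics the test helper: it accepts one connection and keeps it open
// until its own stop channel is closed. (Names are illustrative only.)
func blocker(ln net.Listener, stop <-chan struct{}) {
	conn, err := ln.Accept()
	if err != nil {
		return
	}
	<-stop // only this blocker's own signal can reach here
	conn.Close()
}

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	// One dedicated stop channel per blocker instead of a single shared one.
	writeStop := make(chan struct{})
	readStop := make(chan struct{})
	go blocker(ln, writeStop)
	go blocker(ln, readStop)

	// ... dial and verify the write timeout ...
	close(writeStop) // signals only the blocker waiting on writeStop

	// ... dial and verify the read timeout ...
	close(readStop) // signals only the blocker waiting on readStop
}
```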
Signed-off-by: Wei Fu <fuweid89@gmail.com>
When the leader detects data inconsistency by comparing hashes,
currently it assumes that the follower is the corrupted member.
That isn't correct; the leader might be the corrupted member as well.
We should depend on the quorum to identify the corrupted member.
For example, in a 3-member cluster, if 2 members have the same hash,
then the member with the different hash is the corrupted one. In a 5-member
cluster, if 3 members have the same hash, the corrupted member is one
of the remaining two members; it's also possible that both of the
remaining members are corrupted.
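A minimal sketch of the quorum rule, using a hypothetical helper rather than the actual corruption-check code: the hash reported by a quorum of members is treated as correct, and any member reporting a different hash is flagged.

```go
package main

import "fmt"

// corruptedMembers is illustrative only: it returns the IDs of members whose
// hash disagrees with the hash shared by a quorum, or nil if no quorum agrees.
func corruptedMembers(hashes map[uint64]uint32) []uint64 {
	counts := make(map[uint32]int)
	for _, h := range hashes {
		counts[h]++
	}
	quorum := len(hashes)/2 + 1
	var quorumHash uint32
	found := false
	for h, n := range counts {
		if n >= quorum {
			quorumHash, found = h, true
			break
		}
	}
	if !found {
		return nil // no quorum agreement; cannot tell who is corrupted
	}
	var corrupted []uint64
	for id, h := range hashes {
		if h != quorumHash {
			corrupted = append(corrupted, id)
		}
	}
	return corrupted
}

func main() {
	// 3-member cluster: members 1 and 2 agree, so member 3 is the corrupted one.
	fmt.Println(corruptedMembers(map[uint64]uint32{1: 0xaa, 2: 0xaa, 3: 0xbb}))
}
```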
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The correct parameter for MaxCallRecvMsgSize is `--max-recv-bytes` instead of `--max-request-bytes`, so I fixed the documentation and description.
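For example (the key and size below are only placeholders), the client-side receive limit is raised like this:

```bash
# Raise the client-side receive limit (MaxCallRecvMsgSize) for a large range read.
etcdctl --max-recv-bytes=4194304 get /registry --prefix
```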
Signed-off-by: cleverhu <shouping.hu@daocloud.io>
When users use the TLS CommonName based authentication, the
authTokenBundle is always nil. But it's possible for the clients
to get an `rpctypes.ErrAuthOldRevision` response when the clients
concurrently modify auth data (e.g., addUser, deleteUser, etc.).
In this case, there is no need to refresh the token; instead, the
clients just need to retry the operations (e.g., Put, Delete, etc.).
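A minimal sketch of the decision, using a hypothetical helper rather than the client's actual retry interceptor: when there is no token bundle to refresh, `rpctypes.ErrAuthOldRevision` is handled by retrying the call as-is.

```go
package main

import (
	"errors"
	"fmt"

	"go.etcd.io/etcd/api/v3/v3rpc/rpctypes"
)

// retryWithoutTokenRefresh is illustrative only: with CommonName-based TLS
// auth there is no auth token bundle, so an old-revision auth error cannot be
// fixed by refreshing a token and the operation is simply retried.
func retryWithoutTokenRefresh(err error, hasTokenBundle bool) bool {
	return !hasTokenBundle && errors.Is(err, rpctypes.ErrAuthOldRevision)
}

func main() {
	fmt.Println(retryWithoutTokenRefresh(rpctypes.ErrAuthOldRevision, false)) // true
}
```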
Signed-off-by: Benjamin Wang <wachao@vmware.com>
`grpc.WithInsecure` is deprecated: use `WithTransportCredentials` and `insecure.NewCredentials()` instead; it will be supported throughout 1.x.
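For example, a dial call is migrated like this (the target address is just a placeholder):

```go
package main

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Before: grpc.Dial("127.0.0.1:2379", grpc.WithInsecure())
	conn, err := grpc.Dial("127.0.0.1:2379",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()
}
```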
Signed-off-by: Ramil Mirhasanov <ramil600@yahoo.com>
Check the values of `myKey` and `myRev` first in the `Unlock` method to prevent calling `Unlock` without `Lock`, because doing so may cause keys under `pfx` to be deleted by mistake.
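A minimal sketch of the guard, using a simplified stand-in for `concurrency.Mutex` (the field and error names are illustrative, not the exact ones):

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// mutex is a simplified stand-in for clientv3/concurrency.Mutex.
type mutex struct {
	pfx   string
	myKey string
	myRev int64
}

var errLockReleased = errors.New("mutex: unlock without lock")

func (m *mutex) Unlock(ctx context.Context) error {
	// Guard: if Lock never succeeded, myKey/myRev still hold zero values, and
	// deleting based on them could remove keys under pfx by mistake.
	if m.myKey == "" || m.myRev <= 0 {
		return errLockReleased
	}
	// ... delete m.myKey as before ...
	return nil
}

func main() {
	var m mutex
	fmt.Println(m.Unlock(context.Background())) // mutex: unlock without lock
}
```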
Signed-off-by: chenyahui <cyhone@qq.com>
The existing client may connect to a different endpoint from the
specific endpoint to be maintained. Maintenance operations do not
go through raft at all, so they might run into issues if the server
hasn't finished applying the authentication request.
Let's work with an example. Assume the existing client connects to
ep1, while the user wants to maintain ep2. If we call getToken again, it
sends an authentication request, which goes through raft. When the
specific endpoint receives the maintenance request, it might not have
finished the previous authentication request, but the new token is already
carried in the context, so it will reject the maintenance request
due to an invalid token.
We already have retry logic in `unaryClientInterceptor` and
`streamClientInterceptor`. When the token expires, they can automatically
refresh the token, so it should be safe to remove the `getToken`
logic in `maintenance.dial`.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Check the client count before creating the ephemeral key, and do not
create the key if there are already too many clients. Check the
count again after creating the key; if the total number of kvs is bigger
than the expected count, then check the rev of the current key
and take action accordingly based on its rev. If its rev is within
the first ${count}, then it's a valid client; otherwise, it should
fail.
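A rough sketch of the check-create-recheck flow described above, assuming `clientv3` and hypothetical `prefix`, `id`, and `count` parameters (this is not the actual benchmark code):

```go
package clientreg

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// register creates an ephemeral key for one client and fails if the client
// does not end up within the first `count` keys by creation revision.
func register(ctx context.Context, cli *clientv3.Client, leaseID clientv3.LeaseID, prefix, id string, count int64) error {
	// 1. Check the client count before creating the ephemeral key.
	before, err := cli.Get(ctx, prefix, clientv3.WithPrefix(), clientv3.WithCountOnly())
	if err != nil {
		return err
	}
	if before.Count >= count {
		return fmt.Errorf("already %d clients, not creating key", before.Count)
	}

	// 2. Create the ephemeral key, attached to the client's lease.
	put, err := cli.Put(ctx, prefix+id, "", clientv3.WithLease(leaseID))
	if err != nil {
		return err
	}
	myRev := put.Header.Revision

	// 3. Check the count again; if there are more keys than expected, only the
	//    keys whose creation revision is within the first `count` are valid.
	after, err := cli.Get(ctx, prefix, clientv3.WithPrefix(),
		clientv3.WithSort(clientv3.SortByCreateRevision, clientv3.SortAscend))
	if err != nil {
		return err
	}
	for i, kv := range after.Kvs {
		if kv.CreateRevision == myRev {
			if int64(i) < count {
				return nil // within the first ${count}: valid client
			}
			return fmt.Errorf("client %s lost the race: rev %d is not in the first %d", id, myRev, count)
		}
	}
	return fmt.Errorf("key for client %s not found after creation", id)
}
```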
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The client retries the connection without backoff when the server is gone
after the watch stream is established. This results in high CPU usage
in the client process. This change introduces a backoff when the stream
fails and is unavailable.
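A minimal, generic sketch of the backoff idea (not the clientv3 watch code): instead of reconnecting in a tight loop, the retry delay grows up to a cap.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// openStream is a stand-in for establishing the watch stream; here it always
// fails so the backoff behaviour is visible.
func openStream() error { return errors.New("server unavailable") }

func main() {
	backoff := 10 * time.Millisecond
	const maxBackoff = time.Second

	for i := 0; i < 5; i++ {
		if err := openStream(); err != nil {
			fmt.Printf("stream failed (%v), retrying in %v\n", err, backoff)
			time.Sleep(backoff)
			// Exponential backoff with an upper bound, instead of retrying
			// immediately and burning CPU.
			backoff *= 2
			if backoff > maxBackoff {
				backoff = maxBackoff
			}
			continue
		}
		break
	}
}
```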
Signed-off-by: Hisanobu Tomari <posco.grubb@gmail.com>
Streams are now closed after being used in the lessor `keepAliveOnce` method.
This prevents the "failed to receive lease keepalive request from gRPC stream"
message from being logged by the server after the context is cancelled by the
client.
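A simplified sketch of the pattern (not the lessor's exact code), showing where the stream is now closed:

```go
package leasesketch

import (
	"context"

	pb "go.etcd.io/etcd/api/v3/etcdserverpb"
)

// keepAliveOnce performs a single keep-alive exchange and closes the send side
// of the stream when done, so the server does not log a receive failure after
// the client's context is cancelled.
func keepAliveOnce(ctx context.Context, remote pb.LeaseClient, id int64) (*pb.LeaseKeepAliveResponse, error) {
	cctx, cancel := context.WithCancel(ctx)
	defer cancel()

	stream, err := remote.LeaseKeepAlive(cctx)
	if err != nil {
		return nil, err
	}
	// Close the stream once we are done with it.
	defer func() { _ = stream.CloseSend() }()

	if err := stream.Send(&pb.LeaseKeepAliveRequest{ID: id}); err != nil {
		return nil, err
	}
	return stream.Recv()
}
```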
Signed-off-by: Justin Kolberg <amd.prophet@gmail.com>
Only `net.TCPConn` supports `SetKeepAlive` and `SetKeepAlivePeriod`
by default, so if you want to wrap multiple layers of `net.Listener`,
the `keepaliveListener` should be the one closest to the
original `net.Listener` implementation, namely `TCPListener`.
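A minimal sketch of the wrapping order with a hypothetical `keepaliveListener` (not the actual transport code): the keep-alive wrapper sits directly on the `*net.TCPListener`, so accepted connections are still `*net.TCPConn`, and TLS is layered on top.

```go
package main

import (
	"crypto/tls"
	"net"
	"time"
)

// keepaliveListener wraps the raw TCP listener so Accept still yields
// *net.TCPConn, the only net.Conn that supports SetKeepAlive{,Period}.
type keepaliveListener struct{ *net.TCPListener }

func (l keepaliveListener) Accept() (net.Conn, error) {
	c, err := l.TCPListener.AcceptTCP()
	if err != nil {
		return nil, err
	}
	// These calls work because c is still a *net.TCPConn.
	if err := c.SetKeepAlive(true); err != nil {
		return nil, err
	}
	if err := c.SetKeepAlivePeriod(30 * time.Second); err != nil {
		return nil, err
	}
	return c, nil
}

// listen layers TLS on top of the keep-alive wrapper, not the other way round.
func listen(addr string, cfg *tls.Config) (net.Listener, error) {
	tcpLn, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}
	kaLn := keepaliveListener{tcpLn.(*net.TCPListener)}
	return tls.NewListener(kaLn, cfg), nil
}

func main() {
	ln, err := listen("127.0.0.1:0", &tls.Config{})
	if err != nil {
		panic(err)
	}
	ln.Close()
}
```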
Signed-off-by: Benjamin Wang <wachao@vmware.com>