There is data race on `stop` channel. After verify write-timeout successfully,
the case won't wait for `blocker` to receive close signal from `stop` channel.
If the new `blocker`, which is to read-timeout verifier, get dial's result
immediately, the new `blocker` might fetch the message from `stop` channel
before old one and then close the connection, which causes that the
`conn.Read` returns `EOF` when it reads data.
How to reproduce this in linux devbox?
Use `taskset` to limit the test process in one-cpu.
```bash
cd ./client/pkg/transport
go test -c -o /tmp/test --race=true ./
taskset -c 0 /tmp/test -test.run TestWriteReadTimeoutListener -test.v -test.cpu 4 -test.count=10000 -test.failfast
```
To fix this, suggest to use seperate `stop` channel to prevent from data
race.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
When the leader detects data inconsistency by comparing hashes,
currently it assumes that the follower is the corrupted member.
It isn't correct, the leader might be the corrupted member as well.
We should depend on quorum to identify the corrupted member.
For example, for 3 member cluster, if 2 members have the same hash,
the the member with different hash is the corrupted one. For 5 member
cluster, if 3 members have the same same, the corrupted member is one
of the left two members; it's also possible that both the left members
are corrupted.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Only `net.TCPConn` supports `SetKeepAlive` and `SetKeepAlivePeriod`
by default, so if you want to warp multiple layers of net.Listener,
the `keepaliveListener` should be the one which is closest to the
original `net.Listener` implementation, namely `TCPListener`.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Currently the max size of each WAL entry is hard coded as 10MB. If users
set a value > 10MB for the flag --max-request-bytes, then etcd may run
into a situation that it successfully processes a big request, but fails
to decode it when replaying the WAL file on startup.
On the other hand, we can't just remove the limitation, because if a
WAL entry is somehow corrupted, and its recByte is a huge value, then
etcd may run out of memory. So the solution is to restrict the max size
of each WAL entry as a dynamic value, which is the remaining size of
the WAL file.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The FileReader interface is the wrapper of io.Reader. It provides
the fs.FileInfo as well. The FileBufReader struct is the wrapper of
bufio.Reader, it also provides fs.FileInfo.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Usually the consistent_index should be greater than the index of the
latest snapshot with suffix .snap. But for the snapshot coming from the
leader, the consistent_index should be equal to the snapshot index.
Downstream users of etcd experience build issues when using dependencies
which require more recent (incompatible) versions of opentelemetry. This
commit upgrades the dependencies so that downstream users stop
experiencing these issues.
The code now ensures that each of the test is running in its own directory as opposed to shared os.tempdir.
```
$ (cd tests && env go test -timeout=15m --race go.etcd.io/etcd/tests/v3/integration/clientv3/examples -run ExampleAuth)
2022/04/03 10:24:59 Running tests (examples): ...
2022/04/03 10:24:59 the function can be called only in the test context. Was integration.BeforeTest() called ?
2022/04/03 10:24:59 2022-04-03T10:24:59.462+0200 INFO m0 LISTEN GRPC {"member": "m0", "grpcAddr": "localhost:m0", "m.Name": "m0"}
```
```
% (cd client/v3 && env go test -short -timeout=3m --race ./...)
--- FAIL: TestAuthTokenBundleNoOverwrite (0.00s)
client_test.go:210: listen unix /var/folders/t1/3m8z9xz93t9c3vpt7zyzjm6w00374n/T/TestAuthTokenBundleNoOverwrite3197524989/001/etcd-auth-test:0: bind: invalid argument
FAIL
FAIL go.etcd.io/etcd/client/v3 4.270s
```
The reason was that the path exceeded 108 chars (that is too much for socket).
In the mitigation we first change chroot (working directory) to the tempDir... such the path is 'local'.