For a cluster with only one member, raft always sends identical
unstable entries and committed entries to etcdserver, and etcd
responds to the client once it finishes (actually only partially)
the apply workflow.
When the client receives the response, it doesn't mean etcd has already
successfully persisted the data to BoltDB and the WAL, because:
1. etcd commits the BoltDB transaction periodically instead of on each request;
2. etcd saves WAL entries in parallel with applying the committed entries.
Accordingly, data loss may occur if etcd crashes immediately after
responding to the client but before BoltDB and the WAL have successfully
persisted the data to disk.
Note that this issue can only happen for clusters with only one member.
For clusters with multiple members, it isn't an issue, because etcd will
not commit & apply the data before it has been replicated to a majority
of members. When the client receives the response, it means the data must
have been applied, which further means the data must have been committed.
Note: for clusters with multiple members, raft will never send identical
unstable entries and committed entries to etcdserver.
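A minimal sketch of the hazard described above, with purely illustrative
function names (not etcd's actual code):

```go
package main

import "sync"

// The WAL write runs in parallel with applying the committed entry, and the
// client is answered right after the apply, so a crash after respond() but
// before persistWAL() (and the periodic BoltDB commit) reaches disk loses
// data that was already acknowledged.
func handleProposal(persistWAL, applyEntry, respond func()) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		persistWAL() // runs concurrently with the apply below
	}()
	applyEntry()
	respond() // client sees success here; persistWAL may still be in flight
	wg.Wait()
}

func main() {}
```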
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Only `net.TCPConn` supports `SetKeepAlive` and `SetKeepAlivePeriod`
by default, so if you want to wrap multiple layers of net.Listener,
the `keepaliveListener` should be the one closest to the original
`net.Listener` implementation, namely `TCPListener`.
Also refer to https://github.com/etcd-io/etcd/pull/14356
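A minimal sketch of the intended layering (the wrapper type below is
illustrative, not etcd's actual `keepaliveListener`):

```go
package main

import (
	"net"
	"time"
)

// keepaliveListener must wrap the *net.TCPListener directly, because only
// the *net.TCPConn it accepts implements SetKeepAlive/SetKeepAlivePeriod.
// Any further layers (e.g. TLS) should wrap this listener, not sit between
// it and the TCPListener.
type keepaliveListener struct{ *net.TCPListener }

func (l keepaliveListener) Accept() (net.Conn, error) {
	tc, err := l.AcceptTCP()
	if err != nil {
		return nil, err
	}
	if err := tc.SetKeepAlive(true); err != nil {
		return nil, err
	}
	if err := tc.SetKeepAlivePeriod(30 * time.Second); err != nil {
		return nil, err
	}
	return tc, nil
}

func main() {}
```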
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Streams are now closed after being used in the lessor `keepAliveOnce` method.
This prevents the "failed to receive lease keepalive request from gRPC stream"
message from being logged by the server after the context is cancelled by the
client.
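A minimal sketch of the pattern (not the exact clientv3 code; the function
and parameter names are illustrative):

```go
package main

import (
	"context"

	pb "go.etcd.io/etcd/api/v3/etcdserverpb"
)

// keepAliveOnceSketch performs a one-shot lease keepalive and closes its
// stream when done, so the server does not log a stream-receive error after
// the client's context is cancelled.
func keepAliveOnceSketch(ctx context.Context, lc pb.LeaseClient, id int64) (*pb.LeaseKeepAliveResponse, error) {
	stream, err := lc.LeaseKeepAlive(ctx)
	if err != nil {
		return nil, err
	}
	defer func() { _ = stream.CloseSend() }()

	if err := stream.Send(&pb.LeaseKeepAliveRequest{ID: id}); err != nil {
		return nil, err
	}
	return stream.Recv()
}

func main() {}
```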
Signed-off-by: Justin Kolberg <amd.prophet@gmail.com>
This changes the default parent-based trace sampling rate from
100% to 0%. Due to the high QPS etcd can handle, having 100% trace
sampling leads to very high resource usage. Defaulting to 0% means
that only already-sampled traces will be sampled in etcd.
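For reference, a minimal sketch of the corresponding OpenTelemetry sampler
configuration (not etcd's exact wiring):

```go
package main

import (
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// A parent-based sampler with a 0% root sampling rate: spans are recorded
// only when the incoming parent span is already sampled, instead of
// sampling 100% of requests.
func newTracerProvider() *sdktrace.TracerProvider {
	sampler := sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0))
	return sdktrace.NewTracerProvider(sdktrace.WithSampler(sampler))
}

func main() {}
```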
Fixes #14310
Signed-off-by: Mike Dame <mikedame@google.com>
The Go built-in package `flag` doesn't support the `uint32` data
type, so we need to support it via `flag.Var`.
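A minimal sketch of the approach (the flag name below is just an example):

```go
package main

import (
	"flag"
	"strconv"
)

// uint32Value implements flag.Value so a uint32 flag can be registered via
// flag.Var, since the standard flag package has no Uint32 helper.
type uint32Value uint32

func (v *uint32Value) Set(s string) error {
	n, err := strconv.ParseUint(s, 10, 32)
	if err != nil {
		return err
	}
	*v = uint32Value(n)
	return nil
}

func (v *uint32Value) String() string {
	return strconv.FormatUint(uint64(*v), 10)
}

func main() {
	var val uint32Value
	flag.Var(&val, "example-uint32-flag", "example of a uint32 flag registered via flag.Var")
	flag.Parse()
}
```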
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Make sure that WithPrefix correctly sets the flag, and add a test.
Also, add a test for WithFromKey.
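For context, an illustrative use of the two options in clientv3 (the
endpoint and keys are placeholders):

```go
package main

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// WithPrefix: fetch every key that starts with "foo/".
	prefixResp, err := cli.Get(context.Background(), "foo/", clientv3.WithPrefix())
	if err != nil {
		panic(err)
	}
	fmt.Println(len(prefixResp.Kvs))

	// WithFromKey: fetch every key >= "foo" in byte order.
	fromKeyResp, err := cli.Get(context.Background(), "foo", clientv3.WithFromKey())
	if err != nil {
		panic(err)
	}
	fmt.Println(len(fromKeyResp.Kvs))
}
```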
Fixes #14056
Signed-off-by: Sahdev Zala <spzala@us.ibm.com>
Cherry-pick the PR https://github.com/etcd-io/etcd/pull/12992
to 3.5; please refer to the original PR for more detailed info.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Currently the max size of each WAL entry is hard-coded to 10 MB. If users
set a value > 10 MB for the flag --max-request-bytes, then etcd may run
into a situation where it successfully processes a big request, but fails
to decode it when replaying the WAL file on startup.
On the other hand, we can't just remove the limitation, because if a
WAL entry is somehow corrupted and its recByte is a huge value, then
etcd may run out of memory. So the solution is to restrict the max size
of each WAL entry to a dynamic value, namely the remaining size of
the WAL file.
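A minimal sketch of the dynamic check (names are illustrative, not the
actual WAL decoder code):

```go
package main

import "fmt"

// checkRecordSize rejects a record whose encoded length exceeds the bytes
// remaining in the WAL file being replayed, instead of comparing against a
// fixed 10 MB cap. A corrupted length field larger than the remaining file
// size is reported as an error rather than being allocated.
func checkRecordSize(recBytes, fileSize, offset int64) error {
	remaining := fileSize - offset
	if recBytes > remaining {
		return fmt.Errorf("record size %d exceeds remaining file size %d (likely corruption)", recBytes, remaining)
	}
	return nil
}

func main() {}
```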
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The FileReader interface is a wrapper of io.Reader that also provides
the fs.FileInfo. The FileBufReader struct is a wrapper of bufio.Reader
that likewise provides the fs.FileInfo.
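A minimal sketch of the shape of these wrappers (not the exact
implementation):

```go
package main

import (
	"bufio"
	"io"
	"io/fs"
	"os"
)

// FileReader is an io.Reader that can also report the file's FileInfo.
type FileReader interface {
	io.Reader
	FileInfo() (fs.FileInfo, error)
}

// fileReader is a minimal wrapper around *os.File satisfying FileReader.
type fileReader struct{ *os.File }

func NewFileReader(f *os.File) FileReader { return &fileReader{f} }

func (fr *fileReader) FileInfo() (fs.FileInfo, error) { return fr.Stat() }

// FileBufReader layers a bufio.Reader on top of a FileReader while still
// exposing the underlying FileInfo.
type FileBufReader struct {
	*bufio.Reader
	fi fs.FileInfo
}

func NewFileBufReader(fr FileReader) (*FileBufReader, error) {
	fi, err := fr.FileInfo()
	if err != nil {
		return nil, err
	}
	return &FileBufReader{Reader: bufio.NewReader(fr), fi: fi}, nil
}

func (fbr *FileBufReader) FileInfo() fs.FileInfo { return fbr.fi }

func main() {}
```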
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The first bug fix resolves the race condition between a goroutine and a
channel on the same leases to be revoked. It's a classic mistake when
using Go channels + goroutines. Please refer to
https://go.dev/doc/effective_go#channels
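A minimal sketch of this class of race and its fix (types and names are
illustrative, not the lessor's actual code):

```go
package main

type Lease struct{ ID int64 }

// Buggy pattern: the sender hands a slice to the receiver goroutine over a
// channel and then keeps reusing the same backing array, so both sides read
// and write the same leases concurrently.
func sendExpiredBuggy(expiredC chan<- []*Lease, expired []*Lease) {
	expiredC <- expired // receiver and sender now share the backing array
}

// Fix: give the receiver its own copy of the leases to be revoked.
func sendExpiredFixed(expiredC chan<- []*Lease, expired []*Lease) {
	cp := make([]*Lease, len(expired))
	copy(cp, expired)
	expiredC <- cp
}

func main() {}
```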
The second bug fix resolves the issue that the etcd lessor may
continue to schedule checkpoints after stepping down from the leader role.
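A hypothetical sketch of the idea behind the second fix (the type and field
names are stand-ins, not the real lessor): gate checkpoint scheduling on
still being the primary lessor.

```go
package main

// lessor here is an illustrative stand-in type, not the real one.
type lessor struct {
	primary bool // true only while this node is the leader
}

func (le *lessor) scheduleCheckpointIfNeeded() {
	if !le.primary {
		// Stepped down from the leader role: stop scheduling lease checkpoints.
		return
	}
	// ...schedule the next lease checkpoint here...
}

func main() {}
```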
This PR removes the additional clone when building artifacts.
When releasing v3.5.4, this clone was the main cause of issues and
confusion about what the release script is doing.
The release.sh script already clones the repo into a /tmp/ directory, so
cloning before the build is not needed. As a precaution against a bug in
the script leaving the /tmp/ clone in a bad state, I moved "Verify the
latest commit has the version tag" and added "Verify the clean working
tree" so that both always run before the build.