Unlike SnapshotWithVersion, client.Snapshot doesn't wait for the first
response, so the server could still be opening the db after we have closed
the connection or shut down the server. We can read a few bytes to ensure
the server has opened boltdb.
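As a rough sketch of that workaround (not the actual test code; the endpoint and the number of bytes read are assumptions):
```go
package main

import (
	"context"
	"io"
	"log"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Snapshot returns as soon as the stream is set up, so read a few bytes
	// to make sure the server side has actually opened boltdb before the
	// test closes the connection or shuts the server down.
	rc, err := cli.Snapshot(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	defer rc.Close()

	buf := make([]byte, 4)
	if _, err := io.ReadFull(rc, buf); err != nil {
		log.Fatalf("snapshot stream did not deliver any data: %v", err)
	}
	// The server has opened the db and started streaming at this point.
}
```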
Signed-off-by: Wei Fu <fuweid89@gmail.com>
1. ignore the old leader's lease-revoking requests
2. double-check the current member's leadership before performing a lease renew request (the guard is sketched below)
3. etcdserver: ensure the current member's leadership before performing a lease checkpoint request
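A minimal, hypothetical sketch of the leadership guard described in items 2 and 3 (the type and method names are illustrative stand-ins, not etcdserver's actual identifiers):
```go
package main

import (
	"errors"
	"fmt"
)

var errNotLeader = errors.New("not the current leader")

// member is an illustrative stand-in for an etcd server member.
type member struct {
	id     uint64
	leader func() uint64 // current leader ID as reported by raft
}

// renewLease double-checks leadership right before acting, so a member that
// has just lost leadership (an "old leader") drops the request instead of
// racing with the new leader over the same lease.
func (m *member) renewLease(leaseID int64) error {
	if m.leader() != m.id {
		return errNotLeader
	}
	// ... forward the renew/checkpoint request to the lessor ...
	fmt.Printf("renewed lease %d\n", leaseID)
	return nil
}

func main() {
	m := &member{id: 1, leader: func() uint64 { return 2 }}
	if err := m.renewLease(42); err != nil {
		fmt.Println("dropped:", err)
	}
}
```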
Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
Changed TraceKey/StartTimeKey/TokenFieldNameGRPCKey to struct{} types to
follow the correct usage of context keys. This is a similar patch to #8901.
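For illustration, the struct{} context-key pattern being adopted looks roughly like this (the key and helper names are made up, not etcd's actual identifiers):
```go
package main

import (
	"context"
	"fmt"
)

// traceKey is an unexported empty struct used only as a context key; unlike a
// string key, it cannot collide with keys defined in other packages.
type traceKey struct{}

func withTrace(ctx context.Context, t string) context.Context {
	return context.WithValue(ctx, traceKey{}, t)
}

func traceFrom(ctx context.Context) (string, bool) {
	t, ok := ctx.Value(traceKey{}).(string)
	return t, ok
}

func main() {
	ctx := withTrace(context.Background(), "request-123")
	if t, ok := traceFrom(ctx); ok {
		fmt.Println("trace:", t)
	}
}
```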
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Fix the unexpected blocking when using Barrier.Wait(): e.g.
NewBarrier(client, "a").Wait() blocks if key "a" does not exist but "a0" does, while it should return immediately.
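A hedged sketch of the reported scenario (the endpoint and value are assumptions):
```go
package main

import (
	"context"
	"log"

	clientv3 "go.etcd.io/etcd/client/v3"
	recipe "go.etcd.io/etcd/client/v3/experimental/recipes"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Key "a" is never created, but a key sharing the prefix ("a0") exists.
	if _, err := cli.Put(context.Background(), "a0", "v"); err != nil {
		log.Fatal(err)
	}

	// Per the report above, this used to block even though "a" does not
	// exist; with the fix it returns immediately.
	if err := recipe.NewBarrier(cli, "a").Wait(); err != nil {
		log.Fatal(err)
	}
}
```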
Signed-off-by: zhangwenkang <zwenkang@vmware.com>
1. rename confChangeCh to raftAdvancedC
2. rename waitApply to confChanged
3. add comments and a test assertion
Signed-off-by: Chao Chen <chaochn@amazon.com>
If the gRPC proxy is started with --resolver-prefix, memberlist returns all alive proxy nodes. When one gRPC proxy node goes down, it is expected to no longer be returned, but it is still returned.
Signed-off-by: yellowzf <zzhf3311@163.com>
Increase requests to 1000 to increase the sample size and reduce variability, and increase the tolerance threshold from 10% to 15%.
Signed-off-by: James Blair <mail@jamesblair.net>
It's a followup of #15667.
This patch uses zaptest/observer as the base to provide functionality
similar to pkg/expect.Expect.
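Roughly, the observer-based approach looks like this (a minimal sketch, not the patch itself):
```go
package main

import (
	"fmt"

	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
	"go.uber.org/zap/zaptest/observer"
)

func main() {
	// Capture log entries in memory instead of spawning a process and
	// matching its output, similar in spirit to pkg/expect.Expect.
	core, logs := observer.New(zapcore.InfoLevel)
	lg := zap.New(core)

	lg.Info("compacted Raft logs", zap.Uint64("compact-index", 17))

	// Filter captured entries the way a test would "expect" a log line.
	matched := logs.FilterMessage("compacted Raft logs").All()
	fmt.Println("matched entries:", len(matched))
}
```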
The test env:
```bash
11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
mkdir /sys/fs/cgroup/etcd-followup-15667
echo 0-2 | tee /sys/fs/cgroup/etcd-followup-15667/cpuset.cpus # three cores
```
Before change:
* memory.peak: ~ 681 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14.04
After change:
* memory.peak: ~ 671 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:13.07
Based on the test results, I think it's safe to enable this by default.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
TestV3WatchRestoreSnapshotUnsync sets up a three-member cluster.
After the leader is elected and before any update requests are served
from the client, each member has a log at index 8: 3 x ConfChange +
3 x ClusterMemberAttrSet + 1 x ClusterVersionSet.
Based on the config (SnapshotCount: 10, CatchUpCount: 5), we need to
issue enough update requests to trigger snapshotting at least twice.
T1: L(snapshot-index: 11, compacted-index: 6) F_m0(index: 8)
T2: L(snapshot-index: 22, compacted-index: 17) F_m0(index: 8, out of date)
After member0 recovers from the network partition, it rejects the leader's
append request and returns a hint (index: 8, term: x). If this happens after
the second snapshot, the leader finds that index 8 is out of date and is
forced to transfer a snapshot.
However, the client only issues 15 update requests and the leader doesn't
finish the second snapshot in time. Since the latest compacted index is
still 6, the leader can still replicate index 9 to member0 instead of
sending a snapshot.
```bash
cd tests/integration
CLUSTER_DEBUG=true go test -v -count=1 -run TestV3WatchRestoreSnapshotUnsync ./
...
INFO m2.raft 3da8ba707f1a21a4 became leader at term 2 {"member": "m2"}
...
INFO m2 triggering snapshot {"member": "m2", "local-member-id": "3da8ba707f1a21a4", "local-member-applied-index": 22, "local-member-snapshot-index": 11, "local-member-snapshot-count": 10, "snapshot-forced": false}
...
cluster.go:1359: network partition between: 99626fe5001fde8b <-> 1c964119da6db036
cluster.go:1359: network partition between: 99626fe5001fde8b <-> 3da8ba707f1a21a4
cluster.go:416: WaitMembersForLeader
INFO m0.raft 99626fe5001fde8b became follower at term 2 {"member": "m0"}
INFO m0.raft raft.node: 99626fe5001fde8b elected leader 3da8ba707f1a21a4 at term 2 {"member": "m0"}
DEBUG m2.raft 3da8ba707f1a21a4 received MsgAppResp(rejected, hint: (index 8, term 2)) from 99626fe5001fde8b for index 23 {"member": "m2"}
DEBUG m2.raft 3da8ba707f1a21a4 decreased progress of 99626fe5001fde8b to [StateReplicate match=8 next=9 inflight=15] {"member": "m2"}
DEBUG m0 Applying entries {"member": "m0", "num-entries": 15}
DEBUG m0 Applying entry {"member": "m0", "index": 9, "term": 2, "type": "EntryNormal"}
....
INFO m2 saved snapshot {"member": "m2", "snapshot-index": 22}
INFO m2 compacted Raft logs {"member": "m2", "compact-index": 17}
```
To fix this issue, the patch uses a log monitor to watch for "compacted Raft
log" and expects the two members to compact their logs twice.
Fixes: #15545
Signed-off-by: Wei Fu <fuweid89@gmail.com>