23 Commits

Author SHA1 Message Date
Gyuho Lee
cfe37de6c0 rafthttp: log snapshot download duration
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-05-20 11:37:01 -07:00
Gyuho Lee
a668adba78 rafthttp: improve snapshot send logging
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-05-18 11:39:24 -07:00
yoyinzyc
97e68cf4e7 rafthttp: add 3.4 stream type 2019-10-17 14:33:53 -07:00
Gyuho Lee
abdb7ca17b etcdserver/api: add "etcd_network_snapshot_send_inflights_total", "etcd_network_snapshot_receive_inflights_total"
Useful for deciding when to terminate the unhealthy follower.
If the follower is receiving a leader snapshot, operator may wait.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-08 14:01:45 -07:00
Nima Yahyazadeh
b1812a410f Raft HTTP: fix pause/resume race condition 2019-06-17 11:45:25 -04:00
Gyuho Lee
34bd797e67 *: revert module import paths
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-05-28 15:39:35 -07:00
shivaramr
9150bf52d6 go modules: Fix module path version to include version number 2019-04-26 15:29:50 -07:00
nolouch
decc0d5f43 api/rafthttp: fix the probing status print
Signed-off-by: nolouch <nolouch@gmail.com>
2019-04-23 19:51:34 +08:00
johncming
bd41f74168 etcdserver/api/rafthttp: fix the location of close http body. 2019-03-11 22:20:38 +08:00
zhoulin xie
a943ad0ee4 client/keys_bench_test.go: Fix some misspells
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-28 14:36:06 -05:00
caoming
4651f49a5c api/rafthttp: remove deprecated req.Cancel. 2019-01-07 10:12:47 +08:00
Gyuho Lee
884a8bd36b etcdserver/api/rafthttp: configure "streamProber" in tests
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:32:05 -07:00
Gyuho Lee
7b1ef37054 etcdserver/api/rafthttp: probe all Raft messages' RTT
This PR adds another probing routine to monitor the connection
for Raft message transports. Previously, we only monitored
snapshot transports.

In our production cluster, we found one TCP connection had >8-sec
latencies to a remote peer, but "etcd_network_peer_round_trip_time_seconds"
metrics shows <1-sec latency distribution, which means etcd server
was not sampling enough while such latency spikes happen
outside of snapshot pipeline connection.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:28:54 -07:00
Gyuho Lee
4a239070c8 etcdserver/api/rafthttp: display roundtripper name in warnings
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:14:42 -07:00
Gyuho Lee
47cff4dfe5 etcdserver/api/rafthttp: rename to "pipelineProber"
Preliminary work to add prober to "streamRt"

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:13:10 -07:00
Gyuho Lee
1399bc69ce etcdserver: update import paths to "go.etcd.io/etcd"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-28 17:47:55 -07:00
Gyuho Lee
156ff6461d etcdserver/api/rafthttp: clarify "became inactive" warning
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-17 17:45:53 -07:00
Gyuho Lee
6f4c509ad8 etcdserver/api/rafthttp: add v3 snapshot send/receive metrics
Distribution would be:
0.1 second or more
...
25.6 seconds or more
51.2 seconds or more

etcd_network_snapshot_send_success
etcd_network_snapshot_send_failures
etcd_network_snapshot_send_total_duration_seconds
etcd_network_snapshot_receive_success
etcd_network_snapshot_receive_failures
etcd_network_snapshot_receive_total_duration_seconds

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-15 12:56:50 -07:00
Gyuho Lee
8990126c17 rafthttp: add "RaftDropHeartbeat" failpoint
To simulate network partition locally.

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-06-15 13:10:58 -07:00
Gyuho Lee
3821f3364d etcdserver/api/rafthttp: add "etcd_network_active_peers/disconnected_peers_total"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-23 14:23:45 -07:00
Gyuho Lee
640f5e64a9 etcdserver/api/rafthttp: document round-trip metrics, clean up
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-23 14:03:28 -07:00
Gyuho Lee
5a9e48be30 etcdserver/api/rafthttp: increase bucket upperbound up-to 3-sec
From 0.8 sec to 3.2 sec for more detailed latency analysis

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-23 14:03:28 -07:00
Gyuho Lee
7940113906 *: move internal "etcdserver/api/rafthttp"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-21 10:31:16 -07:00