295 Commits

Author SHA1 Message Date
Gyuho Lee
1c8fab7365 etcdserver/api: add "etcd_network_snapshot_send_inflights_total", "etcd_network_snapshot_receive_inflights_total"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-08 15:12:08 -07:00
Nima Yahyazadeh
9f1d6ca1c9 Raft HTTP: fix pause/resume race condition
(cherry picked from commit b1812a410fbca6fb77bf95b496408c7b75d0a370)
2019-06-17 13:33:27 -04:00
Gyuho Lee
b45f5306dc rafthttp: probe all raft transports
This PR adds another probing routine to monitor the connection
for Raft message transports. Previously, we only monitored
snapshot transports.

In our production cluster, we found one TCP connection had >8-sec
latencies to a remote peer, but the "etcd_network_peer_round_trip_time_seconds"
metric showed a <1-sec latency distribution, which means the etcd server
was not sampling enough when such latency spikes happened
outside of the snapshot pipeline connection.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-09 18:18:27 -07:00
Gyuho Lee
d838e24f80 etcdserver/api/rafthttp: add v3 snapshot send/receive metrics
Distribution would be:
0.1 second or more
...
25.6 seconds or more
51.2 seconds or more

etcd_network_snapshot_send_success
etcd_network_snapshot_send_failures
etcd_network_snapshot_send_total_duration_seconds
etcd_network_snapshot_receive_success
etcd_network_snapshot_receive_failures
etcd_network_snapshot_receive_total_duration_seconds

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-03 11:12:42 -07:00
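
The bucket boundaries listed in the commit above (0.1 second, ..., 25.6 seconds, 51.2 seconds) are consistent with a doubling series of ten buckets starting at 0.1. The following is a minimal sketch under that assumption, not the code from the commit; the helper name `exponentialBuckets` and its parameters are hypothetical.

```go
package main

import "fmt"

// exponentialBuckets returns count upper bounds, beginning at start and
// multiplying by factor at each step. Hypothetical helper for illustration.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

func main() {
	// Assumed parameters: first bucket 0.1s, doubling, ten buckets in total.
	for _, b := range exponentialBuckets(0.1, 2, 10) {
		fmt.Printf("%.1f seconds or more\n", b)
	}
}
```

With these assumed parameters the last two boundaries are 25.6 and 51.2 seconds, matching the distribution described in the commit message.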
Gyuho Lee
c577335a64 rafthttp: clarify "became inactive" warning
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-29 14:34:15 -07:00
Gyuho Lee
863a56a998 rafthttp: add missing "peer_sent_failures_total" metrics call
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-03-14 12:44:38 -04:00
Gyuho Lee
6fe7316ec4 rafthttp: add "ActivePeers" to "Transport"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-03-10 20:05:35 -08:00
Gyuho Lee
bd9bd71a61 rafthttp: add 3.3.0 support
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2017-12-20 13:34:12 -08:00
Anthony Romano
846255b95e Merge pull request #8513 from shenlanse/bug-fix
rafthttp: add remote in pipeline and snapshot handler
2017-09-12 13:48:56 -07:00
blueblue
5f36875272 rafthttp: add remote in pipeline and snapshot handler when corresponding peer or remote do not exist
Fixes: #8506
2017-09-12 18:38:18 +08:00
Gyu-Ho Lee
0b2d8a6c96 *: fix minor typos
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-09-11 07:33:35 -07:00
Gyu-Ho Lee
f65aee0759 *: replace 'golang.org/x/net/context' with 'context'
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-09-07 13:39:42 -07:00
blueblue
2bb893b478 rafthttp: add remote in pipeline and snapshot handler when corresponding peer or remote do not exist
Fixes: #8506
2017-09-07 13:49:39 +08:00
blueblue
a0361ea3f9 rafthttp: add remote in pipeline and snapshot handler when corresponding peer or remote do not exist
Fixes: #8506
2017-09-07 10:14:54 +08:00
Anthony Romano
9543431aeb rafthttp: permit very large v2 snapshots
v2 snapshots were hitting the 512MB message decode limit, causing
sending snapshots to new members to fail for being too big.
2017-06-09 10:41:27 -07:00
Anthony Romano
887db5a3db *: fix go tool vet -all -shadow errors 2017-06-03 21:32:36 -07:00
Vitaly Isaev
4301f49988 rafthttp: configurable stream reader retry timeout
rafthttp.Transport.DialRetryTimeout field alters the frequency of dial attempts
+ minor changes after code review
2017-06-02 08:53:17 -07:00
Anthony Romano
1153e1e7d9 Merge pull request #7687 from heyitsanthony/deny-tls-ipsan
transport: deny incoming peer certs with wrong IP SAN
2017-04-13 15:03:25 -07:00
Gyu-Ho Lee
56b111df0c rafthttp: use 'transport.IsClosedConnError'
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-04-13 11:55:22 -07:00
Anthony Romano
cad1215b18 *: deny incoming peer certs with wrong IP SAN 2017-04-12 13:41:33 -07:00
Gyu-Ho Lee
8db8d01712 rafthttp: move test-only functions to '_test.go'
Not used in actual code base, only used in tests

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-04-10 16:07:31 -07:00
Gyu-Ho Lee
3d75395875 *: remove never-used vars, minor lint fix
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-03-06 14:59:12 -08:00
sharat
2656b594bb rafthttp: use http.Request.WithContext instead of Cancel 2017-02-02 02:30:36 +05:30
Gyu-Ho Lee
fa9a78450c rafthttp: add 3.2.0 stream type 2017-01-13 14:23:15 -08:00
Gyu-Ho Lee
d25f9feb19 rafthttp: bump up timeout in pipeline test
Fix https://github.com/coreos/etcd/issues/6283.

The timeout is too short. It could take more than 10ms
to send when the buffer gets full after 'pipelineBufSize' of
requests.
2016-12-30 09:46:16 -08:00
Gyu-Ho Lee
0626ee048e rafthttp: fix gofmt issues with go tip 2016-10-20 16:32:56 -07:00
Gyu-Ho Lee
8827619f5b rafthttp: add v3.x to supported streams 2016-09-16 20:49:00 +09:00
Xiang Li
fb760b4c53 Merge pull request #6403 from vimalk78/rafthttp-mertics-record-rw-failures
rafthttp/metrics.go:fixed TODO: record write/recv failures.
2016-09-15 02:46:20 -05:00
Vimal Kumar
64e1a327ee rafthttp/metrics.go:fixed TODO: record write/recv failures. 2016-09-15 11:32:08 +05:30
Xiang Li
0d35ba9b94 rafthttp: fix TestPipelineExceedMaximumServing
The timeout is too short. It might take more than 10ms to send
request over a blocking chan (buffer is full). Changing the timeout
to 1 second can fix this issue.
2016-09-13 19:06:11 +08:00
Anthony Romano
0250f0c984 rafthttp: log stream stopped message before closing channel
Was causing spurious goroutine leak failures in testing.
2016-09-09 12:47:06 -07:00
Anthony Romano
96ed856bca Merge pull request #6345 from topecongiro/patch-1
rafthttp: remove unnecessary sendc from peer
2016-09-06 11:32:16 -07:00
Nikita Vetoshkin
da26e230a0 rafthttp: fix misprint in readBytesLimit value
and make test pass in restricted test environments
2016-09-05 11:06:08 +05:00
Gyu-Ho Lee
5c8ba23767 rafthttp: check decode size before buffer alloc
Fix https://github.com/coreos/etcd/issues/5386.
2016-09-05 14:06:03 +09:00
topecongiro
ec9e77db96 rafthttp: remove unnecessary sendc from peer 2016-09-04 13:07:31 +09:00
Anthony Romano
784c4446d9 rafthttp: fix race in TestStreamWriterAttachOutgoingConn
Fixes #6230
2016-08-19 19:59:16 -07:00
Anthony Romano
da1e022890 rafthttp: remove WaitSchedule() from tests
Fixes #6187
2016-08-18 16:26:35 -07:00
Gyu-Ho Lee
bd450c1ba3 rafthttp: use reportCriticalError, fix typo 2016-08-15 10:40:58 -07:00
Anthony Romano
9eb6ea34bd Merge pull request #6175 from heyitsanthony/fix-conn-race
rafthttp: fix race between streamReader.stop() and connection closer
2016-08-15 09:27:24 -07:00
Anthony Romano
911c8442b7 rafthttp: fix race between streamReader.stop() and connection closer 2016-08-15 01:36:09 -07:00
Gyu-Ho Lee
0503676bde rafthttp: fix httputil.RequestCanceler 2016-08-14 14:36:51 -07:00
Gyu-Ho Lee
937ae658dd rafthttp: add Transport.Cut/MendPeer
From https://github.com/coreos/etcd/pull/6140.
2016-08-10 17:09:35 -07:00
Anthony Romano
59ac42ff38 Merge pull request #6073 from heyitsanthony/rafthttp-close-stream
rafthttp: close http socket when pipeline handler gets a raft error
2016-07-31 21:49:04 -07:00
Anthony Romano
911dcc9386 rafthttp: close http socket when pipeline handler gets a raft error
Otherwise the http stream remains open and keeps receiving raft messages.
This can lead to "raft: stopped" log spam on closing an embedded server.

Fixes #5981
2016-07-31 20:25:42 -07:00
Xiang Li
9311d7b77e rafthttp: log health checking error early 2016-07-31 19:58:22 -07:00
Anthony Romano
3a080143a7 rafthttp: make health check meaning clearer 2016-07-06 10:31:13 -07:00
Nikita Vetoshkin
fd5bc21522 rafthttp: use pointers to avoid extra copies upon message encoding 2016-06-29 21:17:18 +05:00
Gyu-Ho Lee
e221699fd8 rafthttp: fix from go vet, go lint 2016-06-22 12:04:15 -07:00
Xiang Li
6af0917812 *: add peer prefix for network metrics between peers 2016-06-17 11:59:49 -07:00
Anthony Romano
dc91da50b5 rafthttp: snapshot tests 2016-06-06 11:38:11 -07:00