Gyuho Lee
0f0919c19c
Merge pull request #10159 from gyuho/version-log
...
etcdserver: clear message in cluster version decision
2018-10-09 18:10:14 -07:00
Gyuho Lee
d2a0f17b82
Merge pull request #10155 from gyuho/metrics-messages
...
rafthttp: probe all raft transports
2018-10-09 11:18:31 -07:00
Gyuho Lee
59dd78dde8
etcdserver: clear message in cluster version decision
...
Only the leader can decide the cluster version.
Clarify in the log message that this local node is the leader.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-08 16:05:10 -07:00
Gyuho Lee
601d8b4677
etcdserver/api/etcdhttp: remove unused "HandleHealth" function
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:16:18 -07:00
Gyuho Lee
004e04a1d1
etcdserver/api/etcdhttp: add "etcd_server_health_success/failures"
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:15:12 -07:00
Gyuho Lee
884a8bd36b
etcdserver/api/rafthttp: configure "streamProber" in tests
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:32:05 -07:00
Gyuho Lee
7b1ef37054
etcdserver/api/rafthttp: probe all Raft messages' RTT
...
This PR adds another probing routine to monitor the connections
used for Raft message transports. Previously, we only monitored
snapshot transports.
In our production cluster, we found that one TCP connection had >8-sec
latencies to a remote peer, but the "etcd_network_peer_round_trip_time_seconds"
metric showed a <1-sec latency distribution. This means the etcd server
was not sampling enough: the latency spikes happened outside of the
snapshot pipeline connection, which was the only one being probed.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:28:54 -07:00
Gyuho Lee
4a239070c8
etcdserver/api/rafthttp: display roundtripper name in warnings
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:14:42 -07:00
Gyuho Lee
47cff4dfe5
etcdserver/api/rafthttp: rename to "pipelineProber"
...
Preliminary work to add prober to "streamRt"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:13:10 -07:00
nolouch
6ea54195a6
client/integration: try to fix tests
2018-09-18 01:44:57 +08:00
nolouch
c15fb607f6
server: broadcast leader changed
2018-09-17 14:15:04 +08:00
nolouch
f3f6427586
server: prevent blocking
2018-09-14 16:08:29 +08:00
nolouch
4de27039cb
server: drop read request if found leader changed
2018-09-14 15:58:35 +08:00
Gyuho Lee
8560221091
etcdserver: fix gofmt warnings with Go 1.11
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-28 21:45:12 -07:00
Gyuho Lee
0ef9ef3c74
*: rerun "gofmt"
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-28 18:25:39 -07:00
Gyuho Lee
1399bc69ce
etcdserver: update import paths to "go.etcd.io/etcd"
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-28 17:47:55 -07:00
Sam Batschelet
af85949b41
Merge pull request #10024 from gyuho/became-inactive
...
etcdserver/api/rafthttp: clarify "became inactive" warning
2018-08-24 22:12:16 -04:00
Sam Batschelet
24ee22ab48
Merge pull request #10026 from gyuho/read-index
...
etcdserver: clarify read index wait timeout warnings
2018-08-24 22:11:58 -04:00
Gyuho Lee
38711761a1
etcdserver: clarify read index wait timeout warnings
...
"read index" doesn't tell much about the root cause.
Most likely, the local follower node has a slow
network and is therefore timing out while waiting to
receive the read index response from the leader.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-17 17:59:41 -07:00
Gyuho Lee
156ff6461d
etcdserver/api/rafthttp: clarify "became inactive" warning
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-17 17:45:53 -07:00
Jingyi Hu
8d85259b56
etcdserver/api/v3rpc/interceptor: add log level checking
...
Check log level before generating and writing log info.
2018-08-17 16:12:05 -07:00
Gyuho Lee
6f4c509ad8
etcdserver/api/rafthttp: add v3 snapshot send/receive metrics
...
Distribution would be:
0.1 second or more
...
25.6 seconds or more
51.2 seconds or more
etcd_network_snapshot_send_success
etcd_network_snapshot_send_failures
etcd_network_snapshot_send_total_duration_seconds
etcd_network_snapshot_receive_success
etcd_network_snapshot_receive_failures
etcd_network_snapshot_receive_total_duration_seconds
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-15 12:56:50 -07:00
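The bucket boundaries listed in the commit above are consistent with a doubling series. As a sketch under that assumption (start 0.1, factor 2, 10 buckets are inferred from the listed endpoints, not confirmed by the commit), the series can be generated like this. The local `exponentialBuckets` helper is hypothetical; it mirrors the shape of the Prometheus Go client's `ExponentialBuckets(start, factor, count)` so the example needs only the standard library.

```go
package main

import "fmt"

// exponentialBuckets returns count boundaries, starting at start and
// multiplying by factor at each step.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

func main() {
	// Assumed parameters: 0.1 doubled nine times reaches 51.2.
	for _, b := range exponentialBuckets(0.1, 2, 10) {
		fmt.Printf("%.1f\n", b)
	}
}
```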
Gyuho Lee
c392cd20cf
etcdserver/api/snap: add v3 snapshot fsync metrics
...
etcd_snap_db_fsync_duration_seconds_count
etcd_snap_db_save_total_duration_seconds_bucket
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-15 12:56:44 -07:00
Gyuho Lee
eb6738053b
etcdserver: add "etcd_server_id" metric
...
```
etcd_server_id{server_id="8e9e05c52164694d"} 1
```
Useful for automating membership change operations:
there is no need to run the "endpoint status" or "member list"
command to get member IDs.
Also useful for "etcd_network" metrics with "To/From" labels.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-13 00:39:18 -07:00
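For the automation use case in the commit above, a script could read the member ID straight from the metrics text. The sketch below assumes the sample line has exactly the form shown in the commit message; the `serverID` helper and the comment line in the sample input are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// serverID scans Prometheus text-format output for the etcd_server_id sample
// and returns the value of its server_id label.
func serverID(metrics string) (string, bool) {
	const prefix = `etcd_server_id{server_id="`
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, prefix) {
			continue
		}
		rest := line[len(prefix):]
		end := strings.IndexByte(rest, '"')
		if end < 0 {
			continue // malformed line: no closing quote
		}
		return rest[:end], true
	}
	return "", false
}

func main() {
	// Illustrative input: one comment line plus the sample from the commit.
	sample := "# illustrative comment line\n" +
		`etcd_server_id{server_id="8e9e05c52164694d"} 1` + "\n"
	id, ok := serverID(sample)
	fmt.Println(id, ok) // prints "8e9e05c52164694d true"
}
```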
Jingyi Hu
368010d8a3
etcdserver: code clean up
...
Code clean up in interceptor.go
2018-08-10 16:29:42 -07:00
Jingyi Hu
30662940f4
vendor: add go-grpc-middleware
...
Rebased to master PR #9994. Fixed a Go format issue in
v3rpc/interceptor.go. Updated vendor to include go-grpc-middleware.
2018-08-10 13:15:35 -07:00
Jingyi Hu
dc01734c6b
etcdserver: add grpc interceptor to log info on incoming requests to
...
etcd server
To improve debuggability of etcd v3, added a gRPC interceptor to log
info on incoming requests to the etcd server. The log output includes
remote client info, request content (with the value field redacted), request
handling latency, response size, etc. Uses the zap logger if available,
otherwise uses capnslog.
Also cleaned up the chaining of gRPC interceptors on the server
side.
2018-08-10 11:01:07 -07:00
Iskander Sharipov
d0f800c930
etcdserver/api/v2discovery: simplify !(x == y) to x != y
...
Found using https://go-critic.github.io/overview#boolExprSimplify-ref
2018-07-28 23:47:17 +03:00
Joe Betz
750b87d622
Merge pull request #9924 from jpbetz/persist-lease-deadline
...
lease: Persist remainingTTL to prevent indefinite auto-renewal of long lived leases
2018-07-24 09:39:57 -07:00
Joe Betz
2edb954bce
lease: Checkpoint lease TTLs to prevent indefinite auto-renewal of long lived leases
2018-07-23 16:12:34 -07:00
Gyuho Lee
643d791a11
etcdserver: add "etcd_server_go_version" metric
...
Currently, one has to look at server logs manually
to see which Go version was used to build the etcd server.
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-23 09:15:22 -07:00
Gyuho Lee
51af6a062f
etcdserver: clean up code format
...
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-21 16:03:16 -07:00
Gyuho Lee
57ec2226cc
etcdserver: support zap.Logger in FD monitoring routine
...
Keep replacing capnslog
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-20 14:59:03 -07:00
Joe Betz
bbe2d777b1
lease: Add LessorConfig and wire zap logger into Lessor
2018-07-17 13:10:34 -07:00
Joe Betz
75ac18cd2d
lease: Add lease checkpoint protobuf types
2018-07-17 11:54:51 -07:00
Gyuho Lee
ddf45cb958
etcdserver: remove configuration print methods
...
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-09 11:26:23 -07:00
Gyuho Lee
9934034bb1
etcdserver: remove unnecessary if-statement
...
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-05 10:25:00 -07:00
Gyuho Lee
e714dd01b3
etcdserver: remove unnecessary type conversion
...
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-05 10:08:56 -07:00
Gyuho Lee
37000cc4b8
etcdserver: add "etcd_server_slow_read_indexes_total"
...
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-02 12:58:35 -07:00
Gyuho Lee
4733a1db5c
etcdserver: clarify read index warnings
...
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-02 12:53:35 -07:00
Joe Betz
4b51b6de49
*: Add progress notify request watch request
2018-06-27 16:46:13 -07:00
Gyuho Lee
339cda03b3
etcdserver/api/v3rpc: remove duplicate gRPC logger set
...
Fix
=== RUN TestEmbedEtcd
==================
WARNING: DATA RACE
Write at 0x000001df86d0 by goroutine 711:
github.com/coreos/etcd/embed.(*Config).setupLogging.func1()
/go/src/github.com/coreos/etcd/vendor/google.golang.org/grpc/grpclog/loggerv2.go:68 +0x16c
sync.(*Once).Do()
/usr/local/go/src/sync/once.go:44 +0xe1
github.com/coreos/etcd/embed.(*Config).setupLogging()
in gRPC proxy tests.
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-06-18 09:41:37 -07:00
Gyuho Lee
b241e383fd
Merge pull request #9858 from gyuho/lll
...
etcdserver: clean up election tick timeout log output
2018-06-15 13:40:44 -07:00
Gyuho Lee
52ffe9f79a
etcdserver: clean up election tick timeout log output
...
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-06-15 13:35:25 -07:00
Gyuho Lee
929d390520
etcdserver: log quota only once
...
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-06-15 13:13:44 -07:00
Gyuho Lee
8990126c17
rafthttp: add "RaftDropHeartbeat" failpoint
...
To simulate network partition locally.
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-06-15 13:10:58 -07:00
Gyuho Lee
a133e9fc8c
etcdserver: remove TODO from "warnOfExpensiveGenericRequest"
...
Metric is already added.
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-06-12 09:37:00 -07:00
Joe Betz
a6fad51603
etcdserver: Fix txn request 'took too long' warnings to use loggable request stringer
2018-06-11 16:58:48 -07:00
Gyuho Lee
849d3f2ac9
Merge pull request #9826 from jpbetz/response_sizes
...
etcdserver: Add response byte size and range response count to took too long warning
2018-06-11 11:13:24 -07:00
Joe Betz
b47e148d5d
etcdserver: Add response byte size and range response count to took too long warning
2018-06-11 10:02:30 -07:00