1659 Commits

Author SHA1 Message Date
Yuchen Zhou
cc5cc3ae40 etcdserver: change protobuf field type from int to int64 (#12000) 2020-08-13 15:55:41 -07:00
Gyuho Lee
5bc8f1650c etcdserver: add OS level FD metrics
Similar counts are exposed via Prometheus.
This adds the ones that are perceived by the etcd server.

e.g.

os_fd_limit 120000
os_fd_used 14
process_cpu_seconds_total 0.31
process_max_fds 120000
process_open_fds 17

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-12 18:40:03 -07:00
cfc4n
ba7ff1eea9 auth: Customize simpleTokenTTL settings.
see https://github.com/etcd-io/etcd/issues/11978 for more detail.
2020-06-25 20:17:49 +08:00
Gyuho Lee
2e601c4611
Merge pull request #12058 from spzala/automated-cherry-pick-of-#11818-upstream-release-3.3
Automated cherry pick of #11818
2020-06-24 20:41:21 -07:00
cfc4n
c4db372810 etcdserver: change FDUsage ticker from 5 seconds to 10 minutes
This ticker checks the file descriptor requirements and counts all fds in use. It logs a message when the number in use is >= limit/5*4, and does nothing else. With more than 10K fds, FDUsage() itself is slow, so the interval needs to be increased.
see https://github.com/etcd-io/etcd/issues/11969 for more detail.
2020-06-24 13:21:30 +08:00
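The threshold mentioned in the commit above (in use >= limit/5*4) can be illustrated with a minimal sketch. The function names below are hypothetical and are not taken from the etcd source; only the limit/5*4 expression comes from the commit message, and the limit of 120000 is borrowed from the example metrics in a later commit.

```go
package main

import "fmt"

// fdWarnThreshold returns the usage level at which a warning would be logged.
// It uses the limit/5*4 expression quoted in the commit message (about 80%).
// The function name is an assumption made for this sketch.
func fdWarnThreshold(limit uint64) uint64 {
	return limit / 5 * 4
}

// shouldWarn reports whether the number of fds in use has reached the threshold.
func shouldWarn(used, limit uint64) bool {
	return used >= fdWarnThreshold(limit)
}

func main() {
	const limit = 120000
	for _, used := range []uint64{14, 95999, 96000} {
		fmt.Printf("used=%d limit=%d warn=%v\n", used, limit, shouldWarn(used, limit))
	}
}
```

Because the check only logs, running it less often (every 10 minutes instead of every 5 seconds) trades warning latency for lower overhead when many fds are open.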
Hitoshi Mitake
585814082b etcdserver: don't let InternalAuthenticateRequest have password 2020-06-23 14:16:44 -04:00
Gyuho Lee
3bf09a5859
Merge pull request #11758 from jingyih/automated-cherry-pick-of-#11754-upstream-release-3.3
Automated cherry pick of #11754 on release-3.3
2020-06-21 23:21:55 -07:00
Gyuho Lee
924b8128c2 *: make sure snapshot save downloads SHA256 checksum
ref. https://github.com/etcd-io/etcd/pull/11896

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-05-18 02:27:01 -07:00
Gyuho Lee
9caec0d124 etcdserver,wal: fix inconsistencies in WAL and snapshot
ref. https://github.com/etcd-io/etcd/issues/10219

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-05-18 02:26:57 -07:00
Changxin Miao
8781e1d44c etcdserver: watch stream got closed once one request is not permitted (#11708) 2020-04-06 07:09:15 -07:00
tangcong
294e714489 *: fix cherry-pick conflict 2020-04-06 10:47:14 +08:00
tangcong
27dffc6d01 etcdserver: print warn log when failed to apply request 2020-04-06 09:20:45 +08:00
tangcong
140bf5321d *: fix auth revision corruption bug 2020-04-06 09:16:06 +08:00
Gyuho Lee
d9027cecf2 etcdserver/api/v3rpc: handle api version metadata, add metrics
ref.
https://github.com/etcd-io/etcd/pull/11687

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-03-18 17:18:31 -07:00
Gyuho Lee
30aaceb1c3 etcdserver/api/etcdhttp: log server-side /health checks
ref.
https://github.com/etcd-io/etcd/pull/11704

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-03-18 16:28:18 -07:00
Jingyi Hu
5a4821721e etcdserver: remove auth validation loop
Remove auth validation loop in v3_server.raftRequest(). Re-validation
when error ErrAuthOldRevision occurs should be handled on client side.
2019-11-20 16:45:48 -08:00
Maxim Vladimirskiy
95095f8406 etcdserver: Remove infinite loop in doSerialize
Once chk(ai) fails with auth.ErrAuthOldRevision, it will always fail,
regardless of how many times you retry. So the error is better returned
to fail the pending request and make the client re-authenticate.
2019-11-20 16:45:47 -08:00
Jingyi Hu
7c164a8948 etcdserver: wait purge file loop during shutdown
To prevent the purge file loop from accidentally acquiring the file lock
and removing the files during server shutdown.
2019-10-30 16:47:06 -07:00
Wenjia Zhang
e7888805e1 Add cluster version fix #11233, #11254, #11265 2019-10-16 13:27:07 -07:00
Gyuho Lee
5c19bd24f0 etcdserver/*: add "etcd_cluster_version" metric
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-10-15 18:05:33 -07:00
Jingyi Hu
81fc7c23c2 *: fix gofmt 2019-08-19 20:22:15 -07:00
Gyuho Lee
e5c2dff346 etcdserver: detect leader change on reads
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-14 09:32:10 -07:00
Gyuho Lee
5a678bb4e3 etcdserver/api/v3rpc: support watch fragmentation
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-14 01:22:29 -07:00
Gyuho Lee
d167714b36 *: regenerate proto
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-14 01:22:23 -07:00
Gyuho Lee
9f7294f1e0 etcdserver/etcdserverpb/rpc.proto: add watch progress/fragment
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-14 01:17:29 -07:00
Gyuho Lee
08124105ad *: use new adt.IntervalTree interface
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-09 11:15:49 -07:00
Gyuho Lee
4527f4c4b0 etcdserver: add "etcd_server_snapshot_apply_inflights_total"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-08 15:13:14 -07:00
Gyuho Lee
f179d4d6a3 etcdserver: improve heartbeat send failures logging
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-05-02 10:02:28 -07:00
Sam Batschelet
43386ac29b *: Change gRPC proxy to expose etcd server endpoint /metrics
This PR resolves an issue where the `/metrics` endpoints exposed by the proxy were returning metrics of the proxy itself rather than of the etcd member servers.

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2019-04-11 17:07:40 -04:00
James Shubin
7814718c73 etcdserver: Use panic instead of fatal on no space left error
When using the embed package to embed etcd, sometimes the storage prefix
being used might be full. In this case, this code path triggers, causing
an: `etcdserver: create wal error: no space left on device` error, which
causes a fatal. A fatal differs from a panic in that it also calls
os.Exit(1). In this situation, the calling program that embeds the etcd
server will be abruptly killed, which prevents it from cleaning up
safely, and giving a proper error message. Depending on what the calling
program is, this can cause corruption and data loss.

This patch switches the fatal to a panic. Ideally this would be a
regular error which would get propagated upwards to the StartEtcd
command, but in the meantime at least this can be caught with recover().

This fixes the most common fatal that I've experienced, but there are
surely more that need looking into. If possible, the errors should be
threaded down into the code path so that embedding etcd can be more
robust.

Fixes: https://github.com/etcd-io/etcd/issues/10588

This is a cherry-picked version of upstream: 368f70a37cf25b432f01921d3f05a3bc0357297a
2019-03-29 17:45:48 -04:00
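The difference described in the commit above, that a panic can be caught with recover() while a fatal exits the process, can be sketched as below. `startServer` and `startRecovering` are hypothetical names for this illustration, not functions from the embed package, and the error text is only modeled on the message quoted in the commit.

```go
package main

import (
	"errors"
	"fmt"
)

// startServer is a hypothetical stand-in for embedded startup code that
// panics when the storage location is full.
func startServer() {
	panic(errors.New("create wal error: no space left on device"))
}

// startRecovering calls startServer and converts a panic into an ordinary
// error, so the embedding program can clean up instead of being killed.
func startRecovering() (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("embedded server failed to start: %v", r)
		}
	}()
	startServer()
	return nil
}

func main() {
	if err := startRecovering(); err != nil {
		fmt.Println("cleaning up after:", err)
	}
}
```

Note that recover() only catches a panic raised in the same goroutine as the deferred function, so this pattern helps only when the panic happens on the caller's goroutine.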
Gyuho Lee
957700f444 etcdserver: add "etcd_server_read_indexes_failed_total"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-09 18:22:02 -07:00
Gyuho Lee
8491137b55 etcdserver: add "etcd_server_health_success/failures"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-09 17:54:30 -07:00
Jingyi Hu
9eee0b078e etcdserver: remove duplicated imports
Removed duplicated imports of package 'context' in server.go
2018-09-13 20:44:03 -07:00
Gyuho Lee
d1acb5a5c8 etcdserver: add "etcd_server_id"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-29 14:50:17 -07:00
Gyuho Lee
73c1100b04 etcdserver: clarify read index wait timeout warnings
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-29 14:38:59 -07:00
Gyuho Lee
0dc4632e28 Merge pull request #9861 from gyuho/race
etcdserver/api/v3rpc: remove duplicate gRPC logger set
2018-08-17 22:32:10 -04:00
Jingyi Hu
264bb51a9a etcdserver: code clean up
Code clean up in interceptor.go
2018-08-14 17:08:45 -07:00
Jingyi Hu
c6c0d03522 vendor: add go-grpc-middleware
Rebased to master PR #9994.  Fixed a Go format issue in
v3rpc/interceptor.go.  Updated vendor to include go-grpc-middleware.
2018-08-14 17:08:45 -07:00
Jingyi Hu
94f81368ae etcdserver: add grpc interceptor to log info on incoming requests to etcd server
To improve debuggability of etcd v3, added a gRPC interceptor to log
info on incoming requests to the etcd server. The log output includes
remote client info, request content (with value field redacted), request
handling latency, response size, etc. Uses zap logger if available,
otherwise uses capnslog.

Also did some clean up on the chaining of grpc interceptors on server
side.
2018-08-14 16:20:13 -07:00
Gyuho Lee
ea40e9f059 etcdserver: add "etcd_server_go_version" metric
Currently, one has to look at server logs manually,
to see what Go version was used to build etcd server.

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-23 16:39:24 -07:00
Wenjia
7f421efe48
remove "github.com/gogo/protobuf/plugin/stringer" 2018-07-19 17:15:32 -07:00
Gyuho Lee
d509620793 etcdserver: rename to "heartbeat_send_failures_total"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-19 16:58:14 -07:00
Gyuho Lee
e43224c3b6 etcdserver: add "etcd_server_slow_apply_total"
{"level":"warn","ts":1527101858.6985068,"caller":"etcdserver/util.go:115","msg":"apply request took too long","took":0.114101529,"expected-duration":0.1,"prefix":"","request":"header:<ID:1029181977902852337> put:<key:\"\\000\\000...

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-19 16:52:37 -07:00
Gyuho Lee
4c7bf51030 etcdserver: add "etcd_server_heartbeat_failures_total"
{"level":"warn","ts":1527101858.4149103,"caller":"etcdserver/raft.go:370","msg":"failed to send out heartbeat; took too long, server is overloaded likely from slow disk","heartbeat-interval":0.1,"expected-duration":0.2,"exceeded-duration":0.025771662}
{"level":"warn","ts":1527101858.4149644,"caller":"etcdserver/raft.go:370","msg":"failed to send out heartbeat; took too long, server is overloaded likely from slow disk","heartbeat-interval":0.1,"expected-duration":0.2,"exceeded-duration":0.034015766}

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-19 16:51:08 -07:00
Gyuho Lee
72c51d3e12 etcdserver: add "etcd_server_quota_backend_bytes"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-03 13:26:49 -07:00
Gyuho Lee
4481238224 etcdserver: add "etcd_server_slow_read_indexes_total"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-03 13:00:08 -07:00
Gyuho Lee
82e670766a etcdserver: clarify read index warnings
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-03 12:53:21 -07:00
Joe Betz
b7c19232bc etcdserver: Fix txn request 'took too long' warnings to use loggable request stringer 2018-06-12 09:33:33 -07:00
Joe Betz
07f833ae3e etcdserver: Add response byte size and range response count to took too long warning 2018-06-11 11:26:26 -07:00
Joe Betz
ef154094b3 etcdserver: Replace value contents with value_size in request took too long warning 2018-06-08 09:49:43 -07:00