1814 Commits

Author SHA1 Message Date
Jingyi Hu
7dc5451fae *: Change etcdserver API to support raft learner
- Added isLearner flag to MemberAddRequest in Cluster API.
- Added isLearner field to StatusResponse in Maintenance API.
- Added MemberPromote rpc to Cluster API.
2019-05-14 13:09:17 -07:00
Sam Batschelet
1411c585be etcdserver: fix typo in log message
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2019-05-10 09:54:00 -04:00
shivaramr
9150bf52d6 go modules: Fix module path version to include version number 2019-04-26 15:29:50 -07:00
Jingyi Hu
cca0d5c1be
Merge pull request #10672 from nolouch/fix-probing-log
api/rafthttp: fix the probing status log print
2019-04-24 23:05:23 -07:00
nolouch
decc0d5f43 api/rafthttp: fix the probing status print
Signed-off-by: nolouch <nolouch@gmail.com>
2019-04-23 19:51:34 +08:00
Gyuho Lee
877f11bed8 etcdserver: improve heartbeat send failures logging
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-04-19 10:58:17 -07:00
Sam Batschelet
9915d02022 *: Change gRPC proxy to expose etcd server endpoint /metrics
This PR resolves an issue where the `/metrics` endpoints exposed by the proxy were not returning metrics of the etcd members servers but of the proxy itself.

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2019-04-10 16:09:32 -04:00
James Shubin
368f70a37c etcdserver: Use panic instead of fatal on no space left error
When using the embed package to embed etcd, sometimes the storage prefix
being used might be full. In this case, this code path triggers, causing
an: `etcdserver: create wal error: no space left on device` error, which
causes a fatal. A fatal differs from a panic in that it also calls
os.Exit(1). In this situation, the calling program that embeds the etcd
server will be abruptly killed, which prevents it from cleaning up
safely, and giving a proper error message. Depending on what the calling
program is, this can cause corruption and data loss.

This patch switches the fatal to a panic. Ideally this would be a
regular error which would get propagated upwards to the StartEtcd
command, but in the meantime at least this can be caught with recover().

This fixes the most common fatal that I've experienced, but there are
surely more that need looking into. If possible, the errors should be
threaded down into the code path so that embedding etcd can be more
robust.

Fixes: https://github.com/etcd-io/etcd/issues/10588
2019-03-27 15:24:33 -04:00
johncming
bd41f74168 etcdserver/api/rafthttp: fix the location of close http body. 2019-03-11 22:20:38 +08:00
zhoulin xie
a943ad0ee4 client/keys_bench_test.go: Fix some misspells
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-28 14:36:06 -05:00
Gyuho Lee
8d1a62e7ef *: use default log configuration for server
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-02-21 10:57:26 -08:00
WizardCXY
e6c6d8492e *: add flag to let etcd use the new boltdb freelistType feature 2019-02-14 11:07:08 +08:00
Hitoshi Mitake
72dd4a18c5 *: add a new option --enable-grpc-gateway for enabling/disabling grpc gateway 2019-01-23 03:26:34 +09:00
Xiang Li
2063b358c8
Merge pull request #10218 from mailgun/maxim/develop
Remove infinite loop in doSerialize
2019-01-09 10:38:25 -08:00
johncming
e8f46ce341 etcdserver: add a test to verify not to send duplicated append responses 2019-01-09 10:37:43 +08:00
Sam Batschelet
577d7c0df2 e2e: update test to reflect (ST1005) update.
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2019-01-08 21:04:20 -05:00
Sam Batschelet
a82703b69e *: error strings should not end with punctuation or a newline (ST1005)
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2019-01-08 21:04:20 -05:00
Xiang Li
6511829d1f
Merge pull request #10374 from johncming/deprecated
api/rafthttp: remove deprecated req.Cancel.
2019-01-08 14:33:25 -08:00
Gyuho Lee
442c863413
Merge pull request #10377 from johncming/cancel-pos
api/v2auth: remove defer in loop.
2019-01-08 09:43:06 -08:00
Xiang Li
b04633fd8e
Merge pull request #10375 from johncming/redundant-parentheses
etcdserver: remove redundant parentheses.
2019-01-07 18:38:26 -08:00
caoming
e96dbfb973 api/v2auth: remove defer in loop. 2019-01-08 08:56:55 +08:00
caoming
5060560f92 api/v2store: use camel case instead of snake case. 2019-01-07 10:35:23 +08:00
caoming
802e2aaadd etcdserver: remove redundant parentheses. 2019-01-07 10:27:52 +08:00
caoming
4651f49a5c api/rafthttp: remove deprecated req.Cancel. 2019-01-07 10:12:47 +08:00
caoming
b2e0e760a0 etcdserver: add missing lg assignment. 2019-01-05 09:24:48 +08:00
lsytj0413
792aad932f refactor(*): fix golint warning 2018-12-24 11:43:10 +08:00
Xiang Li
3faed211e5 *: add flags to setup backend related config 2018-11-26 15:50:26 -08:00
Gyuho Lee
291768af0f etcdserver/*: add "etcd_cluster_version" metric
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-11-13 21:49:12 -08:00
Maxim Vladimirskiy
91e583cba6 etcdserver: Remove infinite loop in doSerialize
Once chk(ai) fails with auth.ErrAuthOldRevision it will always do,
regardless how many times you retry. So the error is better be returned
to fail the pending request and make the client re-authenticate.
2018-11-12 23:28:24 +03:00
Shin'ya Ueoka
aa4313a55a *: fix github links 2018-11-10 11:14:18 +09:00
Gyuho Lee
0f0919c19c
Merge pull request #10159 from gyuho/version-log
etcdserver: clear message in cluster version decision
2018-10-09 18:10:14 -07:00
Gyuho Lee
d2a0f17b82
Merge pull request #10155 from gyuho/metrics-messages
rafthttp: probe all raft transports
2018-10-09 11:18:31 -07:00
Gyuho Lee
59dd78dde8 etcdserver: clear message in cluster version decision
Only leader can decide cluster version.
Clarify the logging that this local node is the leader.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-08 16:05:10 -07:00
Gyuho Lee
601d8b4677 etcdserver/api/etcdhttp: remove unused "HandleHealth" function
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:16:18 -07:00
Gyuho Lee
004e04a1d1 etcdserver/api/etcdhttp: add "etcd_server_health_success/failures"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:15:12 -07:00
Gyuho Lee
884a8bd36b etcdserver/api/rafthttp: configure "streamProber" in tests
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:32:05 -07:00
Gyuho Lee
7b1ef37054 etcdserver/api/rafthttp: probe all Raft messages' RTT
This PR adds another probing routine to monitor the connection
for Raft message transports. Previously, we only monitored
snapshot transports.

In our production cluster, we found one TCP connection had >8-sec
latencies to a remote peer, but "etcd_network_peer_round_trip_time_seconds"
metrics shows <1-sec latency distribution, which means etcd server
was not sampling enough while such latency spikes happen
outside of snapshot pipeline connection.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:28:54 -07:00
Gyuho Lee
4a239070c8 etcdserver/api/rafthttp: display roundtripper name in warnings
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:14:42 -07:00
Gyuho Lee
47cff4dfe5 etcdserver/api/rafthttp: rename to "pipelineProber"
Preliminary work to add prober to "streamRt"

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:13:10 -07:00
nolouch
6ea54195a6 client/integration: try to fix tests 2018-09-18 01:44:57 +08:00
nolouch
c15fb607f6 server: broadcast leader changed 2018-09-17 14:15:04 +08:00
nolouch
f3f6427586 server: prevent blocking 2018-09-14 16:08:29 +08:00
nolouch
4de27039cb server: drop read request if found leader changed 2018-09-14 15:58:35 +08:00
Gyuho Lee
8560221091 etcdserver: fix gofmt warnings with Go 1.11
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-28 21:45:12 -07:00
Gyuho Lee
0ef9ef3c74 *: rerun "gofmt"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-28 18:25:39 -07:00
Gyuho Lee
1399bc69ce etcdserver: update import paths to "go.etcd.io/etcd"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-28 17:47:55 -07:00
Sam Batschelet
af85949b41
Merge pull request #10024 from gyuho/became-inactive
etcdserver/api/rafthttp: clarify "became inactive" warning
2018-08-24 22:12:16 -04:00
Sam Batschelet
24ee22ab48
Merge pull request #10026 from gyuho/read-index
etcdserver: clarify read index wait timeout warnings
2018-08-24 22:11:58 -04:00
Gyuho Lee
38711761a1 etcdserver: clarify read index wait timeout warnings
"read index" doesn't tell much about the root cause.
Most likely, the local follower node is having slow
network, thus timing out waiting to receive read
index response from leader.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-17 17:59:41 -07:00
Gyuho Lee
156ff6461d etcdserver/api/rafthttp: clarify "became inactive" warning
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-17 17:45:53 -07:00