821 Commits

Author SHA1 Message Date
Yicheng Qin
de669be6d6 Merge pull request #3683 from yichengq/raft-block
etcdserver: fix raft state machine may block
2015-10-20 09:44:34 -07:00
Yicheng Qin
ab5df57ecf etcdserver: fix raft state machine may block
When snapshot store requests raft snapshot from etcdserver apply loop,
it may block on the channel for some time, or wait some time for KV to
snapshot. This is unexpected because raft state machine should be unblocked.

Even worse, this block may lead to deadlock:
1. raft state machine waits on getting snapshot from raft memory storage
2. raft memory storage waits snapshot store to get snapshot
3. snapshot store requests raft snapshot from apply loop
4. apply loop is applying entries, and waits raftNode loop to finish
messages sending
5. raftNode loop waits peer loop in Transport to send out messages
6. peer loop in Transport waits for raft state machine to process message

Fix it by changing the logic of getSnap to be asynchronously creation.
2015-10-20 09:19:34 -07:00
Hitoshi Mitake
1b0c65c299 etcdserver: don't allow methods other than GET in /debug/vars
Currently, /debug/vars seems to allow all types of methods e.g. PUT,
POST, etc. However, this path is a readonly stuff so it should allow
GET only.
2015-10-20 17:19:42 +09:00
Xiang Li
32dd4d5de3 Merge pull request #3657 from xiang90/fix_remove
etcdserver: skip updating attr if the member does not exist
2015-10-19 13:35:57 -07:00
Yicheng Qin
1f21ccf166 rafthttp: support sending v3 snapshot message
Use snapshotSender to send v3 snapshot message. It puts raft snapshot
message and v3 snapshot into request body, then sends it to the target peer.
When it receives http.StatusNoContent, it knows the message has been
received and processed successfully.

As receiver, snapHandler saves v3 snapshot and then processes the raft snapshot
message, then respond with http.StatusNoContent.
2015-10-13 23:11:28 -07:00
Yicheng Qin
207c92b627 rafthttp: build transport inside pkg instead of passed-in
rafthttp has different requirements for connections created by the
transport for different usage, and this is hard to achieve when giving
one http.RoundTripper. Pass into pkg the data needed to build transport
now, and let rafthttp build its own transports.
2015-10-11 21:42:37 -07:00
Yicheng Qin
233e717e2f rafthttp: expose struct to set configuration
transport takes too many arguments and the new function is unable to
read. Change the way to set fields in transport struct directly.
2015-10-11 09:02:16 -07:00
Xiang Li
98e30ca7c2 etcdserver: skip updating attr if the member does not exist 2015-10-08 14:07:16 -07:00
Yicheng Qin
f74ff9b867 Merge pull request #3644 from mitake/test-race
etcdserver, test: don't access testing.T in time.AfterFunc()'s own go…
2015-10-07 08:34:58 -07:00
Hitoshi Mitake
68dd3ee621 etcdserver, test: don't access testing.T in time.AfterFunc()'s own goroutine
time.AfterFunc() creates its own goroutine and calls the callback
function in the goroutine. It can cause datarace like the problem
fixed in the commit de1a16e0f107c4e1ffcc7128f7f343baf9631e30 . This
commit also fixes the potential dataraces of tests in
etcdserver/server_test.go .
2015-10-06 11:37:08 +09:00
Yicheng Qin
8c94ae0ee3 etcdserver: get existing snapshot instead of requesting one
This fixes the problem that proposal cannot be applied.

When start the etcdserver.run loop, it expects to get the latest
existing snapshot. It should not attempt to request one because the loop
is the entity to create the snapshot.
2015-10-05 14:32:16 -07:00
Yicheng Qin
36f4303fc3 storage/etcdserver: update KV.Snapshot function
When using Snapshot function, it is expected:
1. know the size of snapshot before writing data
2. split snapshot-ready phase and write-data phase. so we could cut
snapshot first and write data later.

Update its interface to fit the requirement of etcdserver.
2015-10-03 10:15:23 -07:00
Yicheng Qin
8c0db94fef Merge pull request #3631 from yichengq/create-snapshot
etcdserver: support to create raft snapshot at apply loop
2015-10-03 10:03:27 -07:00
Yicheng Qin
18c568bc82 etcdserver: print out correct restored cluster info
Before this PR, it always prints nil because cluster info has not been
covered when print:

```
2015-10-02 14:00:24.353631 I | etcdserver: loaded cluster information
from store: <nil>
```
2015-10-02 16:11:32 -07:00
Yicheng Qin
bfe9502f4f etcdserver: support to create raft snapshot at apply loop
and snapStore could trigger it to create the latest raft snapshot.
2015-10-02 13:17:56 -07:00
Yicheng Qin
ccce61bda9 Merge pull request #3614 from yichengq/snapshot-store
etcdserver: add snapshotStore and raftStorage
2015-10-01 19:35:34 -07:00
Yicheng Qin
2276328720 etcdserver: add snapshotStore and raftStorage
snapshotStore is the store of snapshot, and it supports to get latest snapshot
and save incoming snapshot.

raftStorage supports to get latest snapshot when v3demo is open.
2015-10-01 19:00:59 -07:00
Xiang Li
715fdfb669 Merge pull request #3093 from mwitkow-io/feature/httpd_metrics
add `events` metrics in etcdhttp.
2015-10-01 12:10:58 -07:00
Michal Witkowski
1b2dc1c796 metrics: add events metrics in etcdhttp. 2015-10-01 08:11:42 +01:00
Yicheng Qin
a535cf2cad Merge pull request #3610 from yichengq/load-storage
etcdserver: restore v3 storage when restart
2015-09-29 11:58:38 -07:00
Yicheng Qin
49d262185d Merge pull request #3590 from yichengq/discovery-log
etcdmain: improve log when join discovery fails
2015-09-29 08:02:18 -07:00
Yicheng Qin
5d906a0acc etcdserver: restore v3 storage when restart
To load the previous data.
2015-09-29 00:14:27 -07:00
Yicheng Qin
939aa96a34 etcdmain: improve log when join discovery fails
Before this PR, the log is

```
2015/09/1 13:18:31 etcdmain: client: etcd cluster is unavailable or
misconfigured
```

It is quite hard for people to understand what happens.

Now we print out the exact reason for the failure, and explains the way
to handle it.
2015-09-28 23:23:50 -07:00
Xiang Li
6c05a01ec6 Merge pull request #3604 from gyuho/replace_netutil_BasicAuth
etcdhttp/auth: BasicAuth method in standard pkg
2015-09-28 15:55:46 -07:00
Gyu-Ho Lee
e16f81838b etcdhttp/auth: BasicAuth method in standard pkg
I created a new PR from https://github.com/coreos/etcd/pull/3598.
This is for `TODO: use the standard lib BasicAuth method when we move to
Go 1.4.` [1]. `BasicAuth` method got into Go standard package a year ago. [2]

---
1. https://github.com/coreos/etcd/blob/master/pkg/netutil/netutil.go#L126-L138
2. https://codereview.appspot.com/76540043/
2015-09-28 14:02:55 -07:00
Xiang Li
1226838381 etcdhttp: add Content-Type: application/json header to version handler 2015-09-25 15:14:13 -07:00
Gyu-Ho Lee
85f4475f62 httptypes/errors: HTTPError.WriteTo returns error
Squashing all commits into this one
(from https://github.com/coreos/etcd/pull/357).

Thanks,
2015-09-25 08:06:26 -07:00
Xiang Li
2540a3fb7e etcdsever: mismatch error uses the same format as the corresponding flags 2015-09-21 19:32:10 -07:00
Xiang Li
ea3dbfed60 Merge pull request #3408 from MSamman/extend-auth-api
etcdserver: extend auth api
2015-09-21 11:51:19 -07:00
Xiang Li
3b70bf87c3 etcdmain: better logging when user forget to set initial flags 2015-09-21 10:43:26 -07:00
Mohammad Samman
6ae1f6c6e4 etcdserver: extend auth api
allow recursive query on users and roles to get more detail

Fixes #3278
2015-09-21 00:51:18 -07:00
Yicheng Qin
cedad49dcf Merge pull request #3543 from mitake/reconfig-remove
etcdserver: forbid removing started member if quorum cannot be preserved in strict reconfig mode
2015-09-17 18:22:53 -07:00
Hitoshi Mitake
f8859a980d etcdserver: forbid removing started member if quorum cannot be preserved in strict reconfig mode
Like the commit 6974fc63ed87, this commit lets etcdserver forbid
removing started member if quorum cannot be preserved after
reconfiguration if the option -strict-reconfig-check is passed to
etcd. The removal can cause deadlock if unstarted members have wrong
peer URLs.
2015-09-18 10:09:57 +09:00
Xiang Li
ec4142576e Merge pull request #3534 from xiang90/grpc_err
etcdserver: better v3 api error handling
2015-09-16 12:32:28 -07:00
Jonathan Boulle
7848ac3979 *: add missing license headers 2015-09-15 14:09:01 -07:00
Xiang Li
94f4069a25 etcdserver: better v3 api error handling 2015-09-15 11:20:06 -07:00
Yicheng Qin
352cd768c6 etcdserver: fix shadow declaration 2015-09-14 23:25:16 -07:00
Yicheng Qin
05c74bd890 etcdserver: rename db file into a formal directory
and rename it to a formal name
2015-09-14 22:41:40 -07:00
Yicheng Qin
51f1ee055e Merge pull request #3526 from yichengq/snapshot
etcdserver: forbid to unset v3 demo once used
2015-09-14 21:36:39 -07:00
Yicheng Qin
1f0fb3d9aa etcdserver: forbid to unset v3 demo once used
After enabling v3 demo, it may change the underlying data organization
for v3 store. So we forbid to unset --experimental-v3demo once it has
been used.
2015-09-14 21:27:11 -07:00
Xiang Li
94f784826a *: support v3 compaction 2015-09-14 19:59:36 -07:00
Xiang Li
e0d8923f7b Merge pull request #3524 from xiang90/grpc_error
etcdserver: use gRPC error instead of error message in header
2015-09-14 16:38:44 -07:00
Xiang Li
7183387110 etcdserver: use gRPC error instead of error message in header 2015-09-14 16:11:13 -07:00
Gyu-Ho Lee
c2dcf7431e etcdserver, store: fix grammars in comments (a->an existing)
I found some grammatical errors in comments.

This pull request was submitted https://github.com/coreos/etcd/pull/3513.
I am resubmitting following the correct guidlines.
2015-09-14 13:41:13 -07:00
Xiang Li
c7b4c67436 Merge pull request #3514 from xiang90/v3_raft
support clustered v3 api
2015-09-14 09:35:02 -07:00
Xiang Li
4c81615cef etcdserver: initial support for cluster-wide v3 request 2015-09-13 08:32:01 -07:00
Xiang Li
600456f4ba etcdserverpb: update proto file for raftInternalRequest
We needs to assign each raftInternalRequest an ID for getting
the response after it goes through raft.

We also needs an empty response for error case.
2015-09-13 08:28:10 -07:00
Hitoshi Mitake
dad32646eb etcdserver: enhance test cases for isReadyToAddNewMember
- a case of a cluster with even number members
 - a case of an empty cluster
2015-09-13 12:30:10 +09:00
Jonathan Boulle
d9cf752060 etcdserver: add test for isReadyToAddNewMember
Also fixed check for special case of one-member cluster
2015-09-13 11:16:08 +09:00
Hitoshi Mitake
6974fc63ed etcdserver: avoid deadlock caused by adding members with wrong peer URLs
Current membership changing functionality of etcd seems to have a
problem which can cause deadlock.

How to produce:
 1. construct N node cluster
 2. add N new nodes with etcdctl member add, without starting the new members

What happens:
After finishing add N nodes, a total number of the cluster becomes 2 *
N and a quorum number of the cluster becomes N + 1. It means
membership change requires at least N + 1 nodes because Raft treats
membership information in its log like other ordinal log append
requests.

Assume the peer URLs of the added nodes are wrong because of miss
operation or bugs in wrapping program which launch etcd. In such a
case, both of adding and removing members are impossible because the
quorum isn't preserved. Of course ordinal requests cannot be
served. The cluster would seem to be deadlock.

Of course, the best practice of adding new nodes is adding one node
and let the node start one by one. However, the effect of this problem
is so serious. I think preventing the problem forcibly would be
valuable.

Solution:
This patch lets etcd forbid adding a new node if the operation changes
quorum and the number of changed quorum is larger than a number of
running nodes. If etcd is launched with a newly added option
-strict-reconfig-check, the checking logic is activated. If the option
isn't passed, default behavior of reconfig is kept.

Fixes https://github.com/coreos/etcd/issues/3477
2015-09-13 09:31:53 +09:00