Mirroristas/etcd

mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Author	SHA1	Message	Date
Yicheng Qin	4ccbcb91c8	rafthttp: add functions to create listener and roundTripper This moves the code to create listener and roundTripper for raft communication to the same place, and use explicit functions to build them. This prevents possible development errors in the future.	2015-11-04 11:12:46 -08:00
Yicheng Qin	32819f6b3f	etcdserver: use roundTripper to request peerURL It uses roundTripper instead of Transport because roundTripper is sufficient for its requirements.	2015-11-04 10:49:42 -08:00
Yicheng Qin	0eee88a3d9	etcdserver: use timeout transport as peer transport This pairs with remote timeout listeners. etcd uses timeout listener, and times out the accepted connections if there is no activity. So the idle connections may time out easily. Because timeout transport doesn't reuse connections, it prevents using timed-out connections. This fixes the problem that etcd fails to get version of peers.	2015-11-03 07:58:03 -08:00
Xiang Li	fe165de1d1	Merge pull request #3794 from yichengq/fix-proxy-term etcdmain: fix parsing discovery error	2015-11-02 17:33:47 -08:00
Yicheng Qin	9757dcd3a2	etcdmain: fix parsing discovery error The discovery error is wrapped into a struct now, and cannot be compared to predefined errors. Correct the comparison behavior to fix the problem.	2015-11-02 17:23:06 -08:00
Yicheng Qin	263b270708	etcdserver: commit v3 storage before releasing WAL This ensures that v3 storage could always find the following log entries when restart.	2015-10-26 21:06:08 -07:00
Yicheng Qin	15ed6d8268	etcdserver: save consistent index into v3 storage This helps to recover consistent index when restart in the future.	2015-10-24 09:27:24 -07:00
Yicheng Qin	cacc0d6432	etcdserver: restore KV snapshot when receiving snapshot When a slow follower receives the snapshot sent from the leader, it should rename the snapshot file to the default KV file path, and restore KV snapshot. Have tested it manually and it works pretty well.	2015-10-23 08:43:26 -07:00
Yicheng Qin	de669be6d6	Merge pull request #3683 from yichengq/raft-block etcdserver: fix raft state machine may block	2015-10-20 09:44:34 -07:00
Yicheng Qin	ab5df57ecf	etcdserver: fix raft state machine may block When snapshot store requests raft snapshot from etcdserver apply loop, it may block on the channel for some time, or wait some time for KV to snapshot. This is unexpected because raft state machine should be unblocked. Even worse, this block may lead to deadlock: 1. raft state machine waits on getting snapshot from raft memory storage 2. raft memory storage waits for snapshot store to get snapshot 3. snapshot store requests raft snapshot from apply loop 4. apply loop is applying entries, and waits for raftNode loop to finish sending messages 5. raftNode loop waits for peer loop in Transport to send out messages 6. peer loop in Transport waits for raft state machine to process message Fix it by changing getSnap to create the snapshot asynchronously.	2015-10-20 09:19:34 -07:00
Xiang Li	32dd4d5de3	Merge pull request #3657 from xiang90/fix_remove etcdserver: skip updating attr if the member does not exist	2015-10-19 13:35:57 -07:00
Xiang Li	d90a47656e	etcdserver: use Histogram for proposal_durations	2015-10-17 12:48:25 -07:00
Yicheng Qin	1f21ccf166	rafthttp: support sending v3 snapshot message Use snapshotSender to send v3 snapshot message. It puts raft snapshot message and v3 snapshot into request body, then sends it to the target peer. When it receives http.StatusNoContent, it knows the message has been received and processed successfully. As receiver, snapHandler saves v3 snapshot and then processes the raft snapshot message, then respond with http.StatusNoContent.	2015-10-13 23:11:28 -07:00
Yicheng Qin	207c92b627	rafthttp: build transport inside pkg instead of passed-in rafthttp has different requirements for connections created by the transport for different usage, and this is hard to achieve when giving one http.RoundTripper. Pass into pkg the data needed to build transport now, and let rafthttp build its own transports.	2015-10-11 21:42:37 -07:00
Yicheng Qin	233e717e2f	rafthttp: expose struct to set configuration transport takes too many arguments and the new function is unable to read. Change the way to set fields in transport struct directly.	2015-10-11 09:02:16 -07:00
Xiang Li	98e30ca7c2	etcdserver: skip updating attr if the member does not exist	2015-10-08 14:07:16 -07:00
Yicheng Qin	8c0db94fef	Merge pull request #3631 from yichengq/create-snapshot etcdserver: support to create raft snapshot at apply loop	2015-10-03 10:03:27 -07:00
Yicheng Qin	18c568bc82	etcdserver: print out correct restored cluster info Before this PR, it always prints nil because cluster info has not been covered when print: ``` 2015-10-02 14:00:24.353631 I \| etcdserver: loaded cluster information from store: <nil> ```	2015-10-02 16:11:32 -07:00
Yicheng Qin	bfe9502f4f	etcdserver: support to create raft snapshot at apply loop and snapStore could trigger it to create the latest raft snapshot.	2015-10-02 13:17:56 -07:00
Yicheng Qin	2276328720	etcdserver: add snapshotStore and raftStorage snapshotStore is the store of snapshot, and it supports to get latest snapshot and save incoming snapshot. raftStorage supports to get latest snapshot when v3demo is open.	2015-10-01 19:00:59 -07:00
Yicheng Qin	a535cf2cad	Merge pull request #3610 from yichengq/load-storage etcdserver: restore v3 storage when restart	2015-09-29 11:58:38 -07:00
Yicheng Qin	5d906a0acc	etcdserver: restore v3 storage when restart To load the previous data.	2015-09-29 00:14:27 -07:00
Yicheng Qin	939aa96a34	etcdmain: improve log when join discovery fails Before this PR, the log is ``` 2015/09/1 13:18:31 etcdmain: client: etcd cluster is unavailable or misconfigured ``` It is quite hard for people to understand what happens. Now we print out the exact reason for the failure, and explains the way to handle it.	2015-09-28 23:23:50 -07:00
Hitoshi Mitake	f8859a980d	etcdserver: forbid removing started member if quorum cannot be preserved in strict reconfig mode Like the commit 6974fc63ed87, this commit lets etcdserver forbid removing started member if quorum cannot be preserved after reconfiguration if the option -strict-reconfig-check is passed to etcd. The removal can cause deadlock if unstarted members have wrong peer URLs.	2015-09-18 10:09:57 +09:00
Yicheng Qin	352cd768c6	etcdserver: fix shadow declaration	2015-09-14 23:25:16 -07:00
Yicheng Qin	05c74bd890	etcdserver: rename db file into a formal directory and rename it to a formal name	2015-09-14 22:41:40 -07:00
Yicheng Qin	51f1ee055e	Merge pull request #3526 from yichengq/snapshot etcdserver: forbid to unset v3 demo once used	2015-09-14 21:36:39 -07:00
Yicheng Qin	1f0fb3d9aa	etcdserver: forbid to unset v3 demo once used After enabling v3 demo, it may change the underlying data organization for v3 store. So we forbid to unset --experimental-v3demo once it has been used.	2015-09-14 21:27:11 -07:00
Gyu-Ho Lee	c2dcf7431e	etcdserver, store: fix grammars in comments (a->an existing) I found some grammatical errors in comments. This pull request was submitted https://github.com/coreos/etcd/pull/3513. I am resubmitting following the correct guidelines.	2015-09-14 13:41:13 -07:00
Xiang Li	c7b4c67436	Merge pull request #3514 from xiang90/v3_raft support clustered v3 api	2015-09-14 09:35:02 -07:00
Xiang Li	4c81615cef	etcdserver: initial support for cluster-wide v3 request	2015-09-13 08:32:01 -07:00
Hitoshi Mitake	6974fc63ed	etcdserver: avoid deadlock caused by adding members with wrong peer URLs Current membership changing functionality of etcd seems to have a problem which can cause deadlock. How to produce: 1. construct N node cluster 2. add N new nodes with etcdctl member add, without starting the new members What happens: After finishing add N nodes, a total number of the cluster becomes 2 * N and a quorum number of the cluster becomes N + 1. It means membership change requires at least N + 1 nodes because Raft treats membership information in its log like other ordinary log append requests. Assume the peer URLs of the added nodes are wrong because of misoperation or bugs in the wrapping program which launches etcd. In such a case, both adding and removing members are impossible because the quorum isn't preserved. Of course ordinary requests cannot be served. The cluster would seem to be deadlocked. Of course, the best practice of adding new nodes is adding one node and let the node start one by one. However, the effect of this problem is so serious. I think preventing the problem forcibly would be valuable. Solution: This patch lets etcd forbid adding a new node if the operation changes quorum and the number of changed quorum is larger than a number of running nodes. If etcd is launched with a newly added option -strict-reconfig-check, the checking logic is activated. If the option isn't passed, default behavior of reconfig is kept. Fixes https://github.com/coreos/etcd/issues/3477	2015-09-13 09:31:53 +09:00
Xiang Li	d94e712d91	*: support wal dir	2015-09-01 09:54:27 -07:00
Yicheng Qin	8f6bf029f8	etcdserver: specify request timeout error due to connection lost It specifies request timeout error possibly caused by connection lost, and print out better log for user to understand. It handles two cases: 1. the leader cannot connect to majority of cluster. 2. the connection between follower and leader is down for a while, and it loses proposals. log format: ``` 20:04:19 etcd3 \| 2015-08-25 20:04:19.368126 E \| etcdhttp: etcdserver: request timed out, possibly due to connection lost 20:04:19 etcd3 \| 2015-08-25 20:04:19.368227 E \| etcdhttp: etcdserver: request timed out, possibly due to connection lost ```	2015-08-26 12:38:37 -07:00
Xiang Li	6b23a8131f	*: test gofmt with -s and fix reported issues	2015-08-21 18:52:16 -07:00
Yicheng Qin	2d5b95c49f	etcdserver: use ReqTimeout only We cannot refer RTT value from heartbeat interval, so CommitTimeout is invalid. Remove it and use ReqTimeout instead.	2015-08-17 14:54:25 -07:00
Xiang Li	f199a484af	*: only print out major.minor version for cluster version	2015-08-15 08:30:06 -07:00
Yicheng Qin	c229e6e655	etcdserver: improve error message when timeout due to leader fail	2015-08-13 15:46:21 -07:00
Yicheng Qin	0fdb77aea2	etcdserver: go back to marshal request in 2.1 way It fixes the problem that 2.1 cannot roll upgrade to 2.2 smoothly because 2.1 cannot understand the bytes marshalled at 2.2.	2015-08-13 13:41:52 -07:00
Yicheng Qin	27170e67b9	etcdserver: specify timeout caused by leader election Before this PR, the timeout caused by leader election returns: ``` 14:45:37 etcd2 \| 2015-08-12 14:45:37.786349 E \| etcdhttp: got unexpected response error (etcdserver: request timed out) ``` After this PR: ``` 15:52:54 etcd1 \| 2015-08-12 15:52:54.389523 E \| etcdhttp: etcdserver: request timed out, possibly due to leader down ```	2015-08-12 16:53:18 -07:00
Yicheng Qin	5a91937367	etcdserver: adjust commit timeout based on config It uses heartbeat interval and election timeout to estimate the commit timeout for internal requests. This PR helps etcd survive under high roundtrip-time environment, e.g., globally-deployed cluster.	2015-08-11 21:09:03 -07:00
Xiang Li	a718329ad3	Merge pull request #3248 from xiang90/v3 initial v3 demo	2015-08-10 13:59:03 -07:00
Brandon Philips	fb1951204c	etcdserver: move atomics to make etcd work on arm64 Follow the simple rule in the atomic package: "On both ARM and x86-32, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically. The first word in a global variable or in an allocated struct or slice can be relied upon to be 64-bit aligned." Tested on a system with /proc/cpuinfo reporting: processor : 0 model name : ARMv7 Processor rev 1 (v7l) Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm CPU implementer : 0x41 CPU architecture: 7 CPU variant : 0x0 CPU part : 0xc0d CPU revision : 1	2015-08-08 18:11:41 -07:00
Xiang Li	9ff7075ce8	etcdserver: use v3server interface	2015-08-08 10:39:04 -07:00
Xiang Li	f004b4dac7	*: etcdserver supports v3 demo	2015-08-08 05:58:29 -07:00
Xiang Li	58503817ec	etcdserver: internal request union	2015-08-05 07:47:10 -07:00
Yicheng Qin	6fc9dbfe56	Merge pull request #3114 from yichengq/clean-raft-init etcdserver: clean up start and stop logic of raft	2015-07-27 14:19:25 -07:00
Yicheng Qin	7696dd3280	etcdserver: clean up start and stop logic of raft kill TODO and make it more readable.	2015-07-27 13:24:26 -07:00
Yicheng Qin	b7892b20c1	etcdserver: rename defaultPublishRetryInterval -> defaultPublishTimeout This makes code more readable and reasonable.	2015-07-23 10:09:28 -07:00
Yicheng Qin	7f95780bfb	etcdserver: init raft internal var early Its `stopped`/`done` should be created always before being used in defer in server loop. It fixes the race detected when running TestSyncTrigger.	2015-06-29 15:34:15 -07:00


411 Commits
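
The quorum rule described in commit 6974fc63ed (forbid adding a member when the started members would fall below the quorum of the enlarged cluster) can be illustrated with a minimal Go sketch. This is based only on the commit message and is not etcd's actual code: the `member` type and the `quorum` and `canAddMember` functions are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// member is a hypothetical stand-in for a cluster member record.
type member struct {
	name    string
	started bool
}

// quorum returns the majority size for a cluster of n members.
func quorum(n int) int {
	return n/2 + 1
}

// canAddMember reports whether adding one more (not yet started) member
// keeps the number of started members at or above the quorum of the
// enlarged cluster. Assumption: this is the plain rule from the commit
// message; a real implementation may treat some cases specially
// (for example, this rule as written rejects growing a one-member cluster).
func canAddMember(members []member) bool {
	started := 0
	for _, m := range members {
		if m.started {
			started++
		}
	}
	return started >= quorum(len(members)+1)
}

func main() {
	// Start with a 3-node cluster and keep adding members without
	// starting them, as in the reproduction steps of the commit message.
	cluster := []member{{"a", true}, {"b", true}, {"c", true}}
	for i := 0; i < 3; i++ {
		ok := canAddMember(cluster)
		fmt.Printf("add #%d allowed: %v\n", i+1, ok)
		if !ok {
			break
		}
		cluster = append(cluster, member{name: fmt.Sprintf("new%d", i+1)})
	}
}
```

With three started members, the first two additions keep the quorum at 3 and are allowed; the third would raise the quorum to 4 with only 3 started members, so the sketch rejects it, which is the situation the commit message describes as leading to deadlock.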