Mirroristas/etcd

mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Author	SHA1	Message	Date
Yicheng Qin	51f1ee055e	Merge pull request #3526 from yichengq/snapshot etcdserver: forbid to unset v3 demo once used	2015-09-14 21:36:39 -07:00
Yicheng Qin	1f0fb3d9aa	etcdserver: forbid to unset v3 demo once used After enabling v3 demo, it may change the underlying data organization for v3 store. So we forbid to unset --experimental-v3demo once it has been used.	2015-09-14 21:27:11 -07:00
Gyu-Ho Lee	c2dcf7431e	etcdserver, store: fix grammars in comments (a->an existing) I found some grammatical errors in comments. This pull request was submitted https://github.com/coreos/etcd/pull/3513. I am resubmitting following the correct guidlines.	2015-09-14 13:41:13 -07:00
Xiang Li	c7b4c67436	Merge pull request #3514 from xiang90/v3_raft support clustered v3 api	2015-09-14 09:35:02 -07:00
Xiang Li	4c81615cef	etcdserver: initial support for cluster-wide v3 request	2015-09-13 08:32:01 -07:00
Hitoshi Mitake	6974fc63ed	etcdserver: avoid deadlock caused by adding members with wrong peer URLs Current membership changing functionality of etcd seems to have a problem which can cause deadlock. How to produce: 1. construct N node cluster 2. add N new nodes with etcdctl member add, without starting the new members What happens: After finishing add N nodes, a total number of the cluster becomes 2 * N and a quorum number of the cluster becomes N + 1. It means membership change requires at least N + 1 nodes because Raft treats membership information in its log like other ordinal log append requests. Assume the peer URLs of the added nodes are wrong because of miss operation or bugs in wrapping program which launch etcd. In such a case, both of adding and removing members are impossible because the quorum isn't preserved. Of course ordinal requests cannot be served. The cluster would seem to be deadlock. Of course, the best practice of adding new nodes is adding one node and let the node start one by one. However, the effect of this problem is so serious. I think preventing the problem forcibly would be valuable. Solution: This patch lets etcd forbid adding a new node if the operation changes quorum and the number of changed quorum is larger than a number of running nodes. If etcd is launched with a newly added option -strict-reconfig-check, the checking logic is activated. If the option isn't passed, default behavior of reconfig is kept. Fixes https://github.com/coreos/etcd/issues/3477	2015-09-13 09:31:53 +09:00
Xiang Li	d94e712d91	*: support wal dir	2015-09-01 09:54:27 -07:00
Yicheng Qin	8f6bf029f8	etcdserver: specify request timeout error due to connection lost It specifies request timeout error possibly caused by connection lost, and print out better log for user to understand. It handles two cases: 1. the leader cannot connect to majority of cluster. 2. the connection between follower and leader is down for a while, and it losts proposals. log format: ``` 20:04:19 etcd3 | 2015-08-25 20:04:19.368126 E | etcdhttp: etcdserver: request timed out, possibly due to connection lost 20:04:19 etcd3 | 2015-08-25 20:04:19.368227 E | etcdhttp: etcdserver: request timed out, possibly due to connection lost ```	2015-08-26 12:38:37 -07:00
Xiang Li	6b23a8131f	*: test gofmt with -s and fix reported issues	2015-08-21 18:52:16 -07:00
Yicheng Qin	2d5b95c49f	etcdserver: use ReqTimeout only We cannot refer RTT value from heartbeat interval, so CommitTimeout is invalid. Remove it and use ReqTimeout instead.	2015-08-17 14:54:25 -07:00
Xiang Li	f199a484af	*: only print out major.minor version for cluster version	2015-08-15 08:30:06 -07:00
Yicheng Qin	c229e6e655	etcdserver: improve error message when timeout due to leader fail	2015-08-13 15:46:21 -07:00
Yicheng Qin	0fdb77aea2	etcdserver: go back to marshal request in 2.1 way It fixes the problem that 2.1 cannot roll upgrade to 2.2 smoothly because 2.1 cannot understand the bytes marshalled at 2.2.	2015-08-13 13:41:52 -07:00
Yicheng Qin	27170e67b9	etcdserver: specify timeout caused by leader election Before this PR, the timeout caused by leader election returns: ``` 14:45:37 etcd2 | 2015-08-12 14:45:37.786349 E | etcdhttp: got unexpected response error (etcdserver: request timed out) ``` After this PR: ``` 15:52:54 etcd1 | 2015-08-12 15:52:54.389523 E | etcdhttp: etcdserver: request timed out, possibly due to leader down ```	2015-08-12 16:53:18 -07:00
Yicheng Qin	5a91937367	etcdserver: adjust commit timeout based on config It uses heartbeat interval and election timeout to estimate the commit timeout for internal requests. This PR helps etcd survive under high roundtrip-time environment, e.g., globally-deployed cluster.	2015-08-11 21:09:03 -07:00
Xiang Li	a718329ad3	Merge pull request #3248 from xiang90/v3 initial v3 demo	2015-08-10 13:59:03 -07:00
Brandon Philips	fb1951204c	etcdserver: move atomics to make etcd work on arm64 Follow the simple rule in the atomic package: "On both ARM and x86-32, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically. The first word in a global variable or in an allocated struct or slice can be relied upon to be 64-bit aligned." Tested on a system with /proc/cpuinfo reporting: processor : 0 model name : ARMv7 Processor rev 1 (v7l) Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm CPU implementer : 0x41 CPU architecture: 7 CPU variant : 0x0 CPU part : 0xc0d CPU revision : 1	2015-08-08 18:11:41 -07:00
Xiang Li	9ff7075ce8	etcdserver: use v3server interface	2015-08-08 10:39:04 -07:00
Xiang Li	f004b4dac7	*: etcdserver supports v3 demo	2015-08-08 05:58:29 -07:00
Xiang Li	58503817ec	etcdserver: internal request union	2015-08-05 07:47:10 -07:00
Yicheng Qin	6fc9dbfe56	Merge pull request #3114 from yichengq/clean-raft-init etcdserver: clean up start and stop logic of raft	2015-07-27 14:19:25 -07:00
Yicheng Qin	7696dd3280	etcdserver: clean up start and stop logic of raft kill TODO and make it more readable.	2015-07-27 13:24:26 -07:00
Yicheng Qin	b7892b20c1	etcdserver: rename defaultPublishRetryInterval -> defaultPublishTimeout This makes code more readable and reasonable.	2015-07-23 10:09:28 -07:00
Yicheng Qin	7f95780bfb	etcdserver: init raft internal var early Its `stopped`/`done` should be created always before being used in defer in server loop. It fixes the race detected when running TestSyncTrigger.	2015-06-29 15:34:15 -07:00
Antoine Grondin	270487d340	etcdserver: use Infof to print formatted argument	2015-06-14 20:22:21 +07:00
Xiang Li	e0f9796653	etcdserver: use leveled logging Leveled logging for etcdserver pkg.	2015-06-09 13:53:07 -07:00
Xiang Li	4a72d3a8bb	etcdserver: refactore member.go	2015-05-21 09:19:29 -07:00
Xiang Li	db7db689a6	etcdserver: check cluster version compability when joining	2015-05-19 10:19:41 -07:00
Xiang Li	9f8342dba4	etcdserver: do not get local version via HTTP	2015-05-13 17:19:32 -07:00
Yicheng Qin	a6a649f1c3	etcdserver: stop exposing Cluster struct After this PR, only cluster's interface Cluster is exposed, which makes code much cleaner. And it avoids external packages to rely on cluster struct in the future.	2015-05-13 10:01:25 -07:00
Yicheng Qin	032db5e396	*: extract types.Cluster from etcdserver.Cluster The PR extracts types.Cluster from etcdserver.Cluster. types.Cluster is used for flag parsing and etcdserver config. There is no need to expose etcdserver.Cluster public, which contains lots of etcdserver internal details and methods. This is the first step for it.	2015-05-12 14:53:11 -07:00
Xiang Li	e866314b94	etcdserver: support update cluster version through raft 1. Persist the cluster version change through raft. When the member is restarted, it can recover the previous known decided cluster version. 2. When there is a new leader, it is forced to do a version checking immediately. This helps to update the first cluster version fast.	2015-05-12 11:44:34 -07:00
Xiang Li	94ffd72c7e	etcdserver: rename StoreAdminPrefix to StoreClusterPrefix We store cluster related key in StoreAdminPrefix for some historical reason. The previous API is called admin. But now, the admin name is gone and `cluster` is a more clear and correct name.	2015-04-29 12:05:51 -07:00
Xiang Li	6699107f61	*: add cluster version and cluster version detection. Cluster version is the min major.minor of all members in the etcd cluster. Cluster version is set to the min version that a etcd member is compatible with when first bootstrapp. During a rolling upgrades, the cluster version will be updated automatically. For example: ``` Cluster [a:1, b:1 ,c:1] -> clusterVersion 1 update a -> 2, b -> 2 after a detection Cluster [a:2, b:2 ,c:1] -> clusterVersion 1, since c is still 1 update c -> 2 after a detection Cluster [a:2, b:2 ,c:2] -> clusterVersion 2 ``` The API/raft component can utilize clusterVersion to determine if it can accept a client request or a raft RPC. We choose polling rather than pushing since we want to use the same logic for cluster version detection and (TODO) cluster version checking. Before a member actually joins a etcd cluster, it should check the version of the cluster. Push does not work since the other members cannot push version info to it before it actually joins. Moreover, we do not want our raft RPC system (which is doing the heartbeat pushing) to coordinate cluster version.	2015-04-29 11:31:59 -07:00
Yicheng Qin	1c1cccd236	rafthttp: stop etcd if it is found removed when stream dial The original process is stopping etcd only when pipeline message finds itself has been removed. After this PR, stream dial has this functionality too. It helps fast etcd stop, which doesn't need to wait for stream break to fall back to pipeline, and wait for election timeout to send out message to detect self removal.	2015-04-27 15:10:00 -07:00
Yicheng Qin	ebecee34e0	Merge pull request #2701 from yichengq/rafthttp-anon rafthttp: add remotes	2015-04-24 13:04:37 -07:00
Yicheng Qin	9f19b5660f	rafthttp: add AddRemote Add remotes to rafthttp, who help newly joined members catch up the progress of the cluster. It supports basic message sending to remote, and has no stream connection for simplicity. remotes will not be used after the latest peers have been added into rafthttp.	2015-04-24 11:49:23 -07:00
xiaost	cab1e9a723	etcdserver: skip noop entry in apply	2015-04-24 12:15:51 +08:00
Yicheng Qin	1d96de459a	etcdserver: init server stats before passing it as argument It is more reasonable to init the variable before passing it as an argument. It fixes a bug that etcdserver may panic on server stats when processing a message from rafthttp streamReader before server stats is initialized in server.Start().	2015-04-22 08:28:08 -07:00
Yicheng Qin	1811701427	Revert "etcdserver: fix cluster fallback recovery" This reverts commit cff005777a40bcf3a5bea3e87387273afe054ce1. Conflicts: etcdserver/server.go	2015-04-19 11:34:33 -07:00
Yicheng Qin	88224f6f4e	Revert "etcdserver: not apply stale conf change in cluster and transport" This reverts commit 40197f06987aac9c3a539e9022ad1f1e573326e7.	2015-04-19 11:08:03 -07:00
Xiang Li	98f8dfbc9d	etcdserver: prevExist=true + condition is compareAndSwap PrevExist indicates the key should exist. Condition compares with an existing key. So PrevExist+condition = CompareAndSwap not Update.	2015-04-14 23:44:06 -07:00
xiaost	eab2c2224a	etcdserver: fix minor bug in EtcdServer.send it seems to nothing serious. after deleted peers, the log may output: "etcdserver: send message to unknown receiver %s"	2015-04-13 20:35:58 +08:00
Yicheng Qin	7a7e1f7a7c	etcdserver: metrics and monitor number of file descriptor It exposes the metrics of file descriptor limit and file descriptor used. Moreover, it prints out warning when more than 80% of fd limit has been used. ``` 2015/04/08 01:26:19 etcdserver: 80% of the file descriptor limit is open [open = 969, limit = 1024] ```	2015-04-08 11:17:48 -07:00
Yicheng Qin	9e5743c816	etcdserver: stop raft node goroutine before stop server Stop raftNode goroutine before stopping server goroutine, so server.Stop does stop all underlying stuffs elegantly now. This fixes the problem that previous-round lock on WAL may not be released when etcd is restarted.	2015-04-01 11:20:51 -07:00
Yicheng Qin	dd92a2b484	Merge pull request #2556 from yichengq/fix-apply-conf etcdserver: not apply stale conf change	2015-03-27 14:00:30 -07:00
Yicheng Qin	40197f0698	etcdserver: not apply stale conf change in cluster and transport	2015-03-27 12:53:34 -07:00
Yicheng Qin	5e0077cc0c	etcdserver: print out extra files in data dir instead of erroring	2015-03-24 18:56:22 -07:00
Yicheng Qin	abcd828114	etcdserver: add join-existing check	2015-03-23 22:31:20 -07:00
Xiang Li	d015610da5	etcdserver: separate apply and raft routine	2015-03-10 13:34:24 -07:00
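
The cluster-version rule described in commit 6699107f61 above (the cluster version is the minimum major.minor across all members) can be sketched in Go roughly as follows. This is an illustrative sketch only, not the code from that commit: the `version` type and the `clusterVersion` function are hypothetical names, and the polling that gathers member versions is left out.

```go
package main

import "fmt"

// version is a hypothetical major.minor pair; the patch level is ignored,
// matching the "min major.minor" wording of the commit message.
type version struct{ major, minor int }

// less reports whether v is older than w.
func (v version) less(w version) bool {
	if v.major != w.major {
		return v.major < w.major
	}
	return v.minor < w.minor
}

// clusterVersion returns the lowest version among the given members.
// ok is false when the member set is empty.
func clusterVersion(members map[string]version) (lowest version, ok bool) {
	for _, v := range members {
		if !ok || v.less(lowest) {
			lowest, ok = v, true
		}
	}
	return lowest, ok
}

func main() {
	// Mid-upgrade state from the commit message: a and b are upgraded, c is not.
	members := map[string]version{"a": {2, 0}, "b": {2, 0}, "c": {1, 0}}
	if v, ok := clusterVersion(members); ok {
		fmt.Printf("clusterVersion %d.%d\n", v.major, v.minor) // prints "clusterVersion 1.0"
	}
}
```

Taking the minimum means a single member that has not been upgraded keeps the cluster version at the old value, which is the behavior the rolling-upgrade example in the commit message walks through.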

385 Commits