355 Commits

Author SHA1 Message Date
Cole Gleason
f4f429d4e3 docs(cluster-size): remove outdated refrences to flag max-cluster-size 2014-06-16 09:41:37 -07:00
Brandon Philips
dc1f4adcd0 chore(server): bump to 0.4.3+git 2014-06-07 18:17:54 -07:00
Brandon Philips
9970141f76 chore(server): bump to 0.4.3 2014-06-07 18:17:05 -07:00
Brandon Philips
16c2bcf951 chore(server): go fmt
blame me for not running test first.
2014-06-07 18:03:22 -07:00
Brandon Philips
1c958f8fc3 fix(server): reduce the screaming heartbeat logs
Currently the only way we know that a peer isn't getting a heartbeat is
an edge triggered event from go raft on every missed heartbeat. This
means that we need to do some book keeping in order to do exponential
backoff.

The upside is that instead of screaming thousands of log lines before a
machine hits the default removal of 30 minutes it is only ~100.
2014-06-07 17:47:10 -07:00
Yicheng Qin
ed58193ebe chore(server): set DefaultRemoveDelay to 30mins
Its value was 5s before, which could remove the node insanely fast.
2014-06-06 16:57:35 -07:00
Yicheng Qin
fbcfe8e1c4 Merge pull request #807 from Shopify/raft-server-stats-struct-field-tag-fix
style(server): changed a LeaderInfo struct field from "startTime" to "StartTime"
2014-06-05 12:45:34 -07:00
Brandon Philips
a974bbfe4f chore(server): bump to 0.4.2+git 2014-06-02 15:26:06 -07:00
Brandon Philips
99dcc8c322 chore(server): bump back to 0.4.2 2014-06-02 15:25:03 -07:00
Brandon Philips
707174b56a chore(server): bump to 0.4.2+git 2014-06-02 14:19:52 -07:00
Brandon Philips
ce92cc3dc5 feat(CHANGELOG): bump to v0.4.2 2014-06-02 14:17:38 -07:00
Yicheng Qin
2387ef3f21 Merge pull request #819 from unihorn/97
fix(server): joinIndex is not set after recovery from full outage
2014-06-02 11:04:07 -07:00
Yicheng Qin
d5bfca9465 Merge pull request #814 from unihorn/91
fix(server/v2): set correct content-type for etcdError response
2014-06-02 10:38:36 -07:00
Yicheng Qin
d7768635fd fix(server): set joinIndex when recovered 2014-05-31 10:03:39 -07:00
Yicheng Qin
4bebb538eb fix(standby_server): able to join the cluster containing itself
Standby server will switch to peer server if it finds that
it has been contained in the cluster.
2014-05-30 14:03:49 -07:00
Yicheng Qin
db4c5e0eaa fix(server/v2): set correct content-type for etcdError response
"net/http".Error reset the content type, so we get rid of it and
write our own one.
2014-05-29 14:18:50 -07:00
marc.barry
673d90728e style(server): changed a LeaderInfo struct field from "startTime" to "StartTime"
Changed the LeaderInfo struct "start time" field from "startTime" to "StartTime" so that it is an exported identifier. This required adding the `json:"startTime"` structure field tag so that the encoding/json package correctly performs JSON encoding (i.e. the correct property name --> startTime).
2014-05-21 11:19:56 -04:00
Brandon Philips
22c944d8ef chore(server): bump 0.4.0+git 2014-05-20 20:55:57 -07:00
Brandon Philips
a2d16b52bb chore(server): bump to 0.4.1 2014-05-20 20:46:46 -07:00
Brandon Philips
62560f9959 fix(server): add user facing remove API
This was accidently removed as we refactored the standy stuff. Re-add this
user facing remove endpoint that matches the config endpoints.
2014-05-20 20:01:10 -07:00
Brandon Philips
cc37c58103 chore(server): bump to 0.4.0+git 2014-05-20 17:10:28 -07:00
Brandon Philips
07d1eb0edb chore(server): bump to 0.4.0 2014-05-20 17:09:22 -07:00
Xiang Li
1e7a7b11dd Merge pull request #799 from xiangli-cmu/deny_unknow_peer
hack(server): notify removed peers when they try to become candidates
2014-05-20 13:37:14 -07:00
Yicheng Qin
934c28d498 fix(peer_server): set store and registry when setting raft server
New raft server needs new store and registry.
2014-05-20 13:12:12 -07:00
Xiang Li
189fece683 hack(server): notify removed peers when they try to become candidates
A peer might be removed during a network partiton. When it comes back it
will not have received any of the log entries that would have notified
it of its removal and go onto propose a vote. This will disrupt the
cluster and the cluster should give the machine feedback that it is no
longer a member.

The term of a denied vote is MaxUint64. The notification of the removal
is a raft event. These two modification are quick heck.

In reaction to this notification the machine should shutdown. In this
case the shutdown just moves it towards becoming a standby server.
2014-05-20 10:17:32 -07:00
Brandon Philips
1084e51320 Merge pull request #786 from unihorn/91
feat(standby_server): write cluster info to disk
2014-05-18 10:08:52 -07:00
Yicheng Qin
84f71b6c87 chore(standby_server): remove error return
because standby server should be started in best efforts.
2014-05-16 18:07:49 -04:00
Yicheng Qin
71679bcf56 feat(standby_server): make atomic move for file
to avoid the risk of writing out a corrupted file.
2014-05-16 01:00:07 -04:00
Yicheng Qin
a824be4c14 feat(standby_server): save/load Running into disk 2014-05-16 00:10:15 -04:00
Yicheng Qin
35cc81e22f feat(standby_server): save/load syncInterval to disk 2014-05-15 23:57:58 -04:00
Yicheng Qin
716496ec42 chore(standby_server): still sleep for the first time 2014-05-15 23:18:59 -04:00
Yicheng Qin
b7d9fdbd39 feat(standby_server): write cluster info to disk
For better fault tolerance and availability.
2014-05-15 07:47:15 -04:00
Brandon Philips
7cf8a4a8d0 Merge pull request #779 from unihorn/89
feat: implement standby mode
2014-05-14 10:03:03 -07:00
Yicheng Qin
851026362a chore(standby_server): let syncInterval represent in second unit
This is done to keep consistency with other namings.
2014-05-14 10:13:05 -04:00
Yicheng Qin
f6591b95c7 chore(standby): minor changes based on comments 2014-05-13 22:19:52 -04:00
Yicheng Qin
403f709ebd chore(cluster_config): set default timeout to 5s
Or the leader death could let the standbys down for a rather long time.
2014-05-13 16:13:44 -04:00
Yicheng Qin
c0027bfc78 feat(cluster_config): change field from int to float64
This is modified for better flexibility, especially for testing.
2014-05-12 22:42:18 -04:00
Yicheng Qin
6a64141962 fix(TestV1Watch): ensure server has started 2014-05-09 15:42:18 -07:00
Yicheng Qin
5367c1c998 chore(standby): minor changes based on comments 2014-05-09 15:38:03 -07:00
Yicheng Qin
c6b1a738c3 feat(option): add cluster config option
It will be used when creating a brand-new cluster.
2014-05-09 15:22:11 -07:00
Yicheng Qin
6d4f018887 chore(cluster_config): rename SyncClusterInterval to SyncInterval
for better naming
2014-05-09 13:28:21 -07:00
Yicheng Qin
765cd5d8b3 refactor(find_cluster): make it simpler 2014-05-09 02:27:04 -07:00
Yicheng Qin
baadf63912 feat: implement standby mode
Change log:
1. PeerServer
- estimate initial mode from its log through removedInLog variable
- refactor FindCluster to return the estimation
- refactor Start to call FindCluster explicitly
- move raftServer start and cluster init from FindCluster to Start
- remove stopNotify from PeerServer because it is not used anymore
2. Etcd
- refactor Run logic to fit the specification
3. ClusterConfig
- rename promoteDelay to removeDelay for better naming
- add SyncClusterInterval field to ClusterConfig
- commit command to set default cluster config when cluster is created
- store cluster config info into key space for consistency
- reload cluster config when reboot
4. add StandbyServer
5. Error
- remove unused EcodePromoteError
2014-05-09 01:56:55 -07:00
Yicheng Qin
f1c13e2d9d Merge pull request #774 from unihorn/83
feat(join): check cluster conditions before join
2014-05-08 14:08:38 -07:00
Yicheng Qin
6c950eaf97 Merge pull request #772 from unihorn/81
feat(peer_server): stop service when removed
2014-05-08 14:02:09 -07:00
Yicheng Qin
5c7a963cf0 chore(peer_server): adjust code to make it more clear 2014-05-08 13:20:46 -07:00
Yicheng Qin
c92231c91a Merge branch 'master' of github.com:coreos/etcd
Conflicts:
	server/peer_server_handlers.go
2014-05-08 13:17:51 -07:00
Yicheng Qin
e960a0e03c chore(client): minor changes based on comments
The changes are made on error handling, comments and constant.
2014-05-08 13:15:10 -07:00
Yicheng Qin
015d228b04 Merge pull request #763 from unihorn/77
fix(raft_server_stats): set startTime when init
2014-05-08 12:28:44 -07:00
Yicheng Qin
b3e66ee980 fix(TestV2Watch): ensure server has started 2014-05-08 12:18:08 -07:00