252 Commits

Author SHA1 Message Date
Xiang Li
0ccd09532b etcd: add joinThroughFollower test 2014-09-03 09:19:52 -07:00
Xiang Li
a0c0638744 tests: remove unnecessary test 2014-09-03 09:19:51 -07:00
Xiang Li
60c8dbe0c9 etcd: rewrite kill_leader and kill_random test 2014-09-03 09:19:51 -07:00
Yicheng Qin
02ced2c2d7 v1: deprecate v1 support
Etcd moves to 0.5 without the support of v1.
2014-09-03 09:19:49 -07:00
Rob Strong
494d2c67aa fix(peer_server) set content type to application/json in admin 2014-06-21 13:13:10 -04:00
Yicheng Qin
25e69d9659 fix(multi_node_kill_all_and_recovery_test): ensure cluster is up 2014-06-02 14:43:51 -07:00
Yicheng Qin
e04a188358 fix(remove_node_test): remove unnecessary cluster configuration
The cluster configuration operation is originally to make sure
the instance won't be added back automatically between removal and
check for the number of existing peer-mode instances. But this
could make some node removed before the removal command.

Use longer sync interval instead to avoid this problem.
2014-06-02 13:30:19 -07:00
Yicheng Qin
7cb126967c fix(simple_snapshot_test): enlarge reasonable index range 2014-05-31 10:42:31 -07:00
Yicheng Qin
444e017c05 fix(remove_node_test): ensure cluster config is activated 2014-05-31 10:32:03 -07:00
Yicheng Qin
356675b70f fix(multi_node_kill_all_and_recovery_test): ensure cluster running 2014-05-31 10:15:03 -07:00
Yicheng Qin
37796ed84c tests: add TestMultiNodeKillAllAndRecorveryAndRemoveLeader
This one breaks because it doesn't set joinIndex correctly.
2014-05-31 10:01:45 -07:00
Yicheng Qin
ca29691543 tests(standby_test): comments 2014-05-30 18:36:23 -07:00
Yicheng Qin
4bebb538eb fix(standby_server): able to join the cluster containing itself
Standby server will switch to peer server if it finds that
it has been contained in the cluster.
2014-05-30 14:03:49 -07:00
Xiang Li
aaedf32c04 fix(test/remove_node_test.go) fix a deadlock in the test
The go-etcd client waits for the response from the paused node. And the test waits for the reponse to continue.
Actually we do not even need that small test, since we will check the machine status afterwards.
2014-05-20 14:34:59 -07:00
Yicheng Qin
9e5b12f591 tests(remove_node): add TestRemovePausedNode 2014-05-20 11:01:14 -07:00
Yicheng Qin
71679bcf56 feat(standby_server): make atomic move for file
to avoid the risk of writing out a corrupted file.
2014-05-16 01:00:07 -04:00
Yicheng Qin
b7d9fdbd39 feat(standby_server): write cluster info to disk
For better fault tolerance and availability.
2014-05-15 07:47:15 -04:00
Yicheng Qin
fc77b3e9e6 fix(simple_snapshot_test): enlarge reasonable index range 2014-05-13 22:28:28 -04:00
Yicheng Qin
403f709ebd chore(cluster_config): set default timeout to 5s
Or the leader death could let the standbys down for a rather long time.
2014-05-13 16:13:44 -04:00
Yicheng Qin
5367c1c998 chore(standby): minor changes based on comments 2014-05-09 15:38:03 -07:00
Yicheng Qin
6d4f018887 chore(cluster_config): rename SyncClusterInterval to SyncInterval
for better naming
2014-05-09 13:28:21 -07:00
Yicheng Qin
baadf63912 feat: implement standby mode
Change log:
1. PeerServer
- estimate initial mode from its log through removedInLog variable
- refactor FindCluster to return the estimation
- refactor Start to call FindCluster explicitly
- move raftServer start and cluster init from FindCluster to Start
- remove stopNotify from PeerServer because it is not used anymore
2. Etcd
- refactor Run logic to fit the specification
3. ClusterConfig
- rename promoteDelay to removeDelay for better naming
- add SyncClusterInterval field to ClusterConfig
- commit command to set default cluster config when cluster is created
- store cluster config info into key space for consistency
- reload cluster config when reboot
4. add StandbyServer
5. Error
- remove unused EcodePromoteError
2014-05-09 01:56:55 -07:00
Yicheng Qin
af33d61774 Merge pull request #775 from unihorn/84
refactor(tests/server_utils): use etcd instance
2014-05-08 11:53:46 -07:00
Yicheng Qin
04f09d2fd0 feat(peer_server): add State field to machineMessage
State field indicates the state of each machine.
For now, its value could be follower or leader.
2014-05-08 10:25:39 -07:00
Yicheng Qin
7dce4c8fbb refactor(tests/server_utils): use etcd instance
Remove duplicated etcd start code.
2014-05-07 11:49:03 -07:00
Xiang Li
b56aa62bcc Merge pull request #773 from unihorn/82
tests(snapshot): expand reasonable range for index
2014-05-07 13:02:41 -04:00
Yicheng Qin
c4cd86e094 tests(snapshot): expand reasonable range for index
snapshot file was createed with name '0_503.ss' and '0_1010.ss' when testing.
2014-05-07 09:41:36 -07:00
Yicheng Qin
17e299995c refactor(peer_server): remove standby mode in peer server 2014-05-07 09:10:09 -07:00
Yicheng Qin
ece25833aa Merge pull request #738 from unihorn/68
feat(peer_server): forbid rejoining with different name
2014-04-18 11:49:36 -07:00
Yicheng Qin
000e3ba651 chore(rejoin_test): rewrite some printout 2014-04-18 10:48:14 -07:00
Yicheng Qin
b742af56ec Merge pull request #723 from unihorn/63
tests: add TestJoinThroughFollower
2014-04-17 19:44:46 -07:00
Yicheng Qin
b17703a9e4 chore(tests/join): adjust output 2014-04-17 19:28:58 -07:00
Yicheng Qin
0c95e1eabb feat(peer_server): forbid rejoining with different name
Or it will confuse the cluster, especially the heartbeat between nodes.
2014-04-17 15:46:33 -07:00
Yicheng Qin
732fb7c160 tests(rejoin): add TestReplaceWithDifferentPeerAddress
The functionality has not been implemented yet.
2014-04-17 10:17:26 -07:00
Yicheng Qin
273c293645 fix(server): rejoin cluster with different ip 2014-04-17 10:16:30 -07:00
Yicheng Qin
67600603c5 chore: rename proxy mode to standby mode
It makes the name more reasonable.
2014-04-17 08:04:42 -07:00
Yicheng Qin
adf4acf947 chore: gofmt go files 2014-04-15 09:42:25 -07:00
Yicheng Qin
d88b52c5f3 fix(tests/v1_migration): correct HTTP response
The bug is introduced in 03839ca8 due to the mistake.
2014-04-15 09:25:14 -07:00
Yicheng Qin
8bcfb2ecaf Merge pull request #707 from unihorn/62
fix(peer_server): recover from outage with discovery
2014-04-14 13:58:43 -07:00
Yicheng Qin
03839ca806 fix(peer_server): recover from outage with discovery
This patch also contains the refactor of find cluster process.
It is changed based on @xiangli-cmu 's commits in 627 issue.
2014-04-14 13:56:47 -07:00
Yicheng Qin
de9c318436 tests: add TestJoinThroughFollower 2014-04-14 13:41:45 -07:00
Xiang Li
bc70cdc242 tests(snapshot_test) loose the timing assumption for snapshot test
Test run slowly on drone after open race option.
2014-04-11 19:49:57 -04:00
Xiang Li
fc84da29e8 fix(internal_version_test.go) protect the checkedVersion by a lock 2014-04-10 23:35:55 -04:00
Xiang Li
2817baf3f8 fix(discovery_test.go) protect the garbageHandler by a lock 2014-04-10 23:28:40 -04:00
Yicheng Qin
28f19dec60 feat(server): make header-only requests work 2014-04-07 13:51:33 -07:00
Yicheng Qin
1c40f327be test(snapshot): test restart with snapshot 2014-04-04 16:45:02 -07:00
Yicheng Qin
0a4b6570e1 chore(tests): start TLS cluster slowly to evade problem 2014-04-04 10:57:11 -07:00
Xiang Li
5b4a473f14 Merge pull request #667 from unihorn/53
chore(tests): test TestTLSMultiNodeKillAllAndRecovery now
2014-03-27 22:40:09 -04:00
Yicheng Qin
4e747d24dd chore(tests): test TestTLSMultiNodeKillAllAndRecovery now
It is fixed.
2014-03-27 17:06:18 -07:00
Tomás Senart
b6053d6a86 Making code formatting consistent.
$ gofmt -s -w  && goimports -w
2014-03-27 14:19:08 +01:00