Mirroristas/etcd

mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Author	SHA1	Message	Date
Gyu-Ho Lee	c1e3172e3a	etcdserver/api/v3rpc: add default grpc health service	2017-06-20 10:48:06 -07:00
Anthony Romano	8d7c29c732	etcdserver, etcdserverpb: Txn.Compare range_end support	2017-06-16 12:13:27 -07:00
Anthony Romano	1acc8090e3	Merge pull request #8110 from heyitsanthony/fix-test-sync-timeout etcdserver: use RecorderStream for TestSyncTimeout to avoid missing action	2017-06-15 20:49:10 -07:00
Anthony Romano	e962b0c849	Merge pull request #7909 from heyitsanthony/unptr-cfg etcdserver, embed, integration: don't use pointer for ServerConfig	2017-06-15 20:47:30 -07:00
Gyu-Ho Lee	5e059fd8dc	*: use metadata Incoming/OutgoingContext Fix https://github.com/coreos/etcd/issues/7888. Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-06-15 16:41:23 -07:00
Anthony Romano	aa0e6b26c0	etcdserver: use RecorderStream for TestSyncTimeout to avoid missing action	2017-06-15 13:43:53 -07:00
Anthony Romano	dcf52bbfac	etcdserver, embed, integration: don't use pointer for ServerConfig ServerConfig is owned by etcdserver and unshared, so don't pass or store by pointer. Also removes duplicated field 'snapCount'.	2017-06-15 13:02:13 -07:00
Anthony Romano	4445996a38	Merge pull request #8084 from heyitsanthony/update-protobuf vendor: update github.com/{gogo,golang}/protobuf	2017-06-12 19:09:49 -07:00
Anthony Romano	4ebeba0e18	*: regen protofiles with latest protobuf tools	2017-06-12 15:14:43 -07:00
Anthony Romano	7ff5b05004	etcdserver: better warning when initial-cluster doesn't match advertise urls The old error was not clear about what URLs needed to be added, sometimes truncating the list. To make it clearer, print out the missing entries for --initial-cluster and print the full list of initial advertise peers. Fixes #8079 and #7927	2017-06-12 14:14:16 -07:00
Anthony Romano	d173b09a1b	etcdserver: use same ReadView for read-only txns A read-only txn isn't serialized by raft, but it uses a fresh read txn for every mvcc access prior to executing its request ops. If a write txn modifies the keys matching the read txn's comparisons, the read txn may return inconsistent results. To fix, use the same read-only mvcc txn for the duration of the etcd txn. Probably gets a modest txn speedup as well since there are fewer read txn allocations.	2017-06-09 09:20:38 -07:00
Anthony Romano	2caae60004	Merge pull request #8062 from heyitsanthony/revert-v2machines v2http: put back /v2/machines and mark as non-deprecated	2017-06-08 12:01:58 -07:00
Gyu-Ho Lee	45fd8279f0	etcdserver: add leaseExpired debugging metrics Fix https://github.com/coreos/etcd/issues/8050. Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-06-08 10:36:25 -07:00
Anthony Romano	c2dadbd9f8	v2http: put back /v2/machines and mark as non-deprecated This reverts commit 2bb33181b6c8fbe8109fc668a19ce4ab46c605ec. python-etcd seems to depend on /v2/machines and the maintainer vanished. Plus, it is prefixed with /v2/ so it probably can't be deprecated anyway.	2017-06-08 09:39:11 -07:00
Hitoshi Mitake	fa4903c83c	Merge pull request #8031 from mitake/lease-revoke-auth protecting lease revoking with auth	2017-06-08 13:34:14 +09:00
Hitoshi Mitake	0c655902f2	auth, etcdserver: protect revoking lease with auth Currently clients can revoke any lease without permission. This commit lets etcdserver protect revoking with write permission. This commit adds a mechanism for generating internal token. It is used for indicating that LeaseRevoke was issued internally so it should be able to delete any attached keys.	2017-06-07 17:46:14 -07:00
Anthony Romano	fb086ef13f	v3rpc: dedup resp.Header == nil checks	2017-06-07 09:25:42 -07:00
Anthony Romano	8542f2e673	v3rpc: use map for translating errors to grpc errors Switch statement had poor coverage, use a map instead	2017-06-06 16:55:44 -07:00
Anthony Romano	887db5a3db	*: fix go tool vet -all -shadow errors	2017-06-03 21:32:36 -07:00
Anthony Romano	0c923bdf11	Merge pull request #8010 from heyitsanthony/json-txn e2e: test txn over grpc json	2017-06-01 10:01:41 -07:00
Anthony Romano	d8210da505	v3rpc: treat nil txn request op as error Fixes #7889	2017-05-31 12:39:52 -07:00
Anthony Romano	a20e667c5b	Merge pull request #7967 from heyitsanthony/purge-snapdb etcdserver: purge old snap.db files	2017-05-30 16:15:11 -07:00
fanmin shi	68a72c6b6e	v3rpc: change grpc max recv size as needed.	2017-05-25 11:01:51 -07:00
fanmin shi	9e7740011b	etcdserver: add --max-request-bytes flag	2017-05-25 11:01:38 -07:00
fanmin shi	b003734be6	Merge pull request #7976 from fanminshi/make_maxOpsPerTxn_configurable etcdserver: add --max-txn-ops flag	2017-05-25 10:34:17 -07:00
fanmin shi	e9f464debc	integration: creation of cluster now takes maxTxnOps	2017-05-24 14:48:44 -07:00
fanmin shi	ae7ddfb483	etcdserver: add --max-txn-ops flag --max-txn-ops allows users to define the maximum number of operations for each txn request. It defaults to 128. Fixes #7826	2017-05-24 10:32:32 -07:00
Anthony Romano	c1c9a2c96c	etcdserver: close mvcc.KV on init error path Scheduled compaction will panic if KV is not stopped before closing the backend.	2017-05-23 10:41:37 -07:00
Anthony Romano	ab16fa1f07	etcdserver: purge old snap.db files Lots of garbage db files in #7957. Should purge.	2017-05-22 15:44:21 -07:00
Hitoshi Mitake	4cd5e7ebb2	Merge pull request #7809 from mitake/auth-watch protect watch with auth	2017-05-20 13:23:30 +09:00
Hitoshi Mitake	939912c425	clientv3, etcdserver: support auth in Watch()	2017-05-20 11:34:45 +09:00
Anthony Romano	33c375dc44	*: fill out blank package godocs Mostly one-liner short descriptions, but also includes some typo fixes and some examples.	2017-05-18 09:41:13 -07:00
Xiang	32c252f003	etcdserver: more logging on snapshot close path	2017-05-17 14:48:52 -07:00
Anthony Romano	f6cd4d4f5b	snap, etcdserver: tighten up snapshot path handling Computing the snapshot file path is error prone; snapshot recovery was constructing file paths missing a path separator so the snapshot would never be loaded. Instead, refactor the backend path handling to use helper functions where possible.	2017-05-11 13:46:59 -07:00
fanmin shi	47f5b7c3ad	Merge pull request #7876 from fanminshi/fix_7628 etcdserver: renaming db happens after snapshot persists to wal and snap files	2017-05-09 16:15:41 -07:00
fanmin shi	dfdaf082c5	etcdserver: add a test to ensure renaming db happens after persisting wal and snap files	2017-05-09 14:00:22 -07:00
fanmin shi	8b7b7222dd	etcdserver: renaming db happens after snapshot persists to wal and snap files If a follower receives a snapshot from the leader and crashes before renaming xxx.snap.db to db but after the snapshot has persisted to .wal and .snap, restarting the follower results in loading the old db, the new .wal, and the new .snap. This causes an index mismatch between the snap metadata index and the consistent index from db. This PR forces an ordering where saving/renaming db must happen after the snapshot is persisted to the wal and snap files. This guarantees that the wal and snap files are newer than db. On server restart, the etcd server checks if snap index > db consistent index. If yes, the etcd server attempts to load xxx.snap.db, where xxx = snap index, if there is any, and panics otherwise. FIXES #7628	2017-05-09 14:00:12 -07:00
Iwasaki Yudai	010ffc0692	v3rpc: remove duplicated error case for lease.ErrLeaseNotFound	2017-05-08 20:09:41 -07:00
fanmin shi	e33b10a666	etcdserver: add a test to ensure config change also update ConsistIndex	2017-05-02 16:51:40 -07:00
fanmin shi	5533c3058a	etcdserver: apply() sets consistIndex for any entry type Previously, apply() didn't set consistIndex for the EntryConfChange type. This caused a misalignment between consistIndex and the applied index, because an EntryConfChange entry sets the applied index but not consistIndex. Suppose that addMember() is called and the leader reflects that change. 1. The applied index and consistIndex are now misaligned. 2. A new follower node joins. 3. The leader sends the snapshot to the follower, where the applied index is the snapshot metadata index. 4. The follower node saves the snapshot and database (which includes consistIndex) from the leader. 5. Restarting the follower loads the snapshot and database. 6. The follower checks the snapshot metadata index (same as the applied index) against the database consistIndex, finds that they don't match, and then panics. FIXES #7834	2017-05-02 14:57:36 -07:00
Anthony Romano	3ce31acda4	v3client: wrap watch ctxs with blank ctx Printing the values in ctx.String() will data race if the value is mutable and doesn't implement String(), which seems to be common. Instead, just return a fixed string instead of computing it; v3client watches don't need as much flexibility for creating separate strings, so separate ctx strings probably aren't necessary at this point. Fixes #7811	2017-04-25 15:03:06 -07:00
Gyu-Ho Lee	327f09fcb4	etcdserver: do not block on raft stopping Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-25 13:35:43 -07:00
Gyu-Ho Lee	91f6aee4f2	etcdserver: ensure waitForApply sync with applyAll

Problem is:

`Step1`: `etcdserver/raft.go`'s `Ready` process routine sends config-change entries via `r.applyc <- ap` (https://github.com/coreos/etcd/blob/master/etcdserver/raft.go#L193-L203)

`Step2`: `etcdserver/server.go`'s `*EtcdServer.run` routine receives this via `ap := <-s.r.apply()` (https://github.com/coreos/etcd/blob/master/etcdserver/server.go#L735-L738)

`StepA`: `Step1` proceeds without sync, right after sending `r.applyc <- ap`.

`StepB`: `Step2` proceeds without sync, right after `sched.Schedule(s.applyAll(&ep,&ap))`.

`StepC`: `etcdserver` tries to sync with `s.applyAll(&ep,&ap)` by calling `rh.waitForApply()`. `rh.waitForApply()` waits for all pending jobs to finish in `pkg/schedule` side.

However, the order of `StepA`,`StepB`,`StepC` is not guaranteed. It is possible that `StepC` happens first, and proceeds without waiting on apply. And the restarting member comes back as a leader in single-node cluster, when there is no synchronization between apply-layer and config-change Raft entry apply.

Confirmed with more debugging lines below, only reproducible with slow CPU VM (~2 vCPU).

```
~:24.005397 I | etcdserver: starting server... [version: 3.2.0+git, cluster version: to_be_decided]
~:24.011136 I | etcdserver: [DEBUG] 29b2d24047a277df waitForApply before
~:24.011194 I | etcdserver: [DEBUG] 29b2d24047a277df starts wait for 0 pending jobs
~:24.011234 I | etcdserver: [DEBUG] 29b2d24047a277df finished wait for 0 pending jobs (current pending 0)
~:24.011268 I | etcdserver: [DEBUG] 29b2d24047a277df waitForApply after
~:24.011348 I | etcdserver: [DEBUG] [0] 29b2d24047a277df is scheduling conf change on 29b2d24047a277df
~:24.011396 I | etcdserver: [DEBUG] [1] 29b2d24047a277df is scheduling conf change on 5edf80e32a334cf0
~:24.011437 I | etcdserver: [DEBUG] [2] 29b2d24047a277df is scheduling conf change on e32e31e76c8d2678
~:24.011477 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on 29b2d24047a277df
~:24.011509 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on 5edf80e32a334cf0
~:24.011545 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on e32e31e76c8d2678
~:24.012500 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df before
~:24.013014 I | etcdserver/membership: added member 29b2d24047a277df [unix://127.0.0.1:2100515039] to cluster 9250d4ae34216949
~:24.013066 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df after
~:24.013113 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df after trigger
~:24.013158 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 5edf80e32a334cf0 before
~:24.013666 W | etcdserver: failed to send out heartbeat on time (exceeded the 10ms timeout for 11.964739ms)
~:24.013709 W | etcdserver: server is likely overloaded
~:24.013750 W | etcdserver: failed to send out heartbeat on time (exceeded the 10ms timeout for 12.057265ms)
~:24.013775 W | etcdserver: server is likely overloaded
~:24.013950 I | raft: 29b2d24047a277df is starting a new election at term 4
~:24.014012 I | raft: 29b2d24047a277df became candidate at term 5
~:24.014051 I | raft: 29b2d24047a277df received MsgVoteResp from 29b2d24047a277df at term 5
~:24.014107 I | raft: 29b2d24047a277df became leader at term 5
~:24.014146 I | raft: raft.node: 29b2d24047a277df elected leader 29b2d24047a277df at term 5
```

I am printing out the number of pending jobs before we call `sched.WaitFinish(0)`, and there was no pending jobs, so it returned immediately (before we schedule `applyAll`).

This is the root cause to:

- https://github.com/coreos/etcd/issues/7595
- https://github.com/coreos/etcd/issues/7739
- https://github.com/coreos/etcd/issues/7802

`sched.WaitFinish(0)` doesn't work when `len(f.pendings)==0` and `f.finished==0`. Config-change is the first job to apply, so `f.finished` is 0 in this case. `f.finished` monotonically increases, so we need `WaitFinish(finished+1)`. And `finished` must be the one before calling `Schedule`. This is safe because `Schedule(applyAll)` is the only place adding jobs to `sched`. Then scheduler waits on the single job of `applyAll`, by getting the current number of finished jobs before sending `Schedule`.

Or just make it be blocked until `applyAll` routine triggers on the config-change job. This patch just removes `waitForApply`, and signal `raftDone` to wait until `applyAll` finishes applying entries.

Confirmed that it fixes the issue, as below:

```
~:43.198354 I | rafthttp: started streaming with peer 36cda5222aba364b (stream MsgApp v2 reader)
~:43.198740 I | etcdserver: [DEBUG] 3988bc20c2b2e40c waitForApply before
~:43.198836 I | etcdserver: [DEBUG] 3988bc20c2b2e40c starts wait for 0 pending jobs, 1 finished jobs
~:43.200696 I | integration: launched 3169361310155633349 ()
~:43.201784 I | etcdserver: [DEBUG] [0] 3988bc20c2b2e40c is scheduling conf change on 36cda5222aba364b
~:43.201884 I | etcdserver: [DEBUG] [1] 3988bc20c2b2e40c is scheduling conf change on 3988bc20c2b2e40c
~:43.201965 I | etcdserver: [DEBUG] [2] 3988bc20c2b2e40c is scheduling conf change on cf5d6cbc2a121727
~:43.202070 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on 36cda5222aba364b
~:43.202139 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on 3988bc20c2b2e40c
~:43.202204 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on cf5d6cbc2a121727
~:43.202444 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) before
~:43.204486 I | etcdserver/membership: added member 36cda5222aba364b [unix://127.0.0.1:2100913646] to cluster 425d73f1b7b01674
~:43.204588 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) after
~:43.204703 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) after trigger
~:43.204791 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) before
~:43.205689 I | etcdserver/membership: added member 3988bc20c2b2e40c [unix://127.0.0.1:2101113646] to cluster 425d73f1b7b01674
~:43.205783 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) after
~:43.205929 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) after trigger
~:43.206056 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) before
~:43.207353 I | etcdserver/membership: added member cf5d6cbc2a121727 [unix://127.0.0.1:2100713646] to cluster 425d73f1b7b01674
~:43.207516 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) after
~:43.207619 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) after trigger
~:43.207710 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on 36cda5222aba364b
~:43.207781 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on 3988bc20c2b2e40c
~:43.207843 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on cf5d6cbc2a121727
~:43.207951 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished wait for 0 pending jobs (current pending 0, finished 1)
~:43.208029 I | rafthttp: started HTTP pipelining with peer cf5d6cbc2a121727
~:43.210339 I | rafthttp: peer 3988bc20c2b2e40c became active
~:43.210435 I | rafthttp: established a TCP streaming connection with peer 3988bc20c2b2e40c (stream MsgApp v2 reader)
~:43.210861 I | rafthttp: started streaming with peer 3988bc20c2b2e40c (writer)
~:43.211732 I | etcdserver: [DEBUG] 3988bc20c2b2e40c waitForApply after
```

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-25 10:22:27 -07:00
Anthony Romano	2bb33181b6	v2http: remove deprecated /v2/machines path	2017-04-22 03:11:21 -07:00
Anthony Romano	393e4335b7	*: put gateway stubs into their own packages Fixes #7773	2017-04-19 13:09:06 -07:00
Anthony Romano	d24a763a12	Merge pull request #7771 from heyitsanthony/remove-2.0-version etcdserver: remove 2.0 StatusNotFound version check	2017-04-19 00:57:19 -07:00
Hitoshi Mitake	d3456b5ecd	Merge pull request #7759 from mitake/fix-7724 *: simply ignore ErrAuthNotEnabled in clientv3 if auth is not enabled	2017-04-19 16:07:18 +09:00
Anthony Romano	3d8e2e1171	etcdserver: remove 2.0 StatusNotFound version check	2017-04-18 20:22:56 -07:00
Hitoshi Mitake	e1306bff8f	*: simply ignore ErrAuthNotEnabled in clientv3 if auth is not enabled Fix https://github.com/coreos/etcd/issues/7724	2017-04-19 11:27:14 +09:00
Anthony Romano	714b48a4b4	etcdserver: initialize raftNode with constructor raftNode was being initialized in start(), which was causing hangs when trying to stop the etcd server since the stop channel would not be initialized in time for the stop call. Instead, setup non-configurable bits in a constructor. Fixes #7668	2017-04-18 09:33:59 -07:00
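
The race described in commit 91f6aee4f2 (`WaitFinish(0)` returning before the apply job is even scheduled) can be reproduced in miniature. The sketch below is an illustration only: `fifo`, `newFIFO`, and `appliedWhenWaitReturns` are hypothetical names written for this example and are not etcd's `pkg/schedule` code. It shows the `WaitFinish(finished+1)` alternative that the commit message discusses, not the change the patch actually made (the patch removed `waitForApply` and signals `raftDone` instead).

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fifo is a toy FIFO job scheduler written for this illustration only.
// It is NOT etcd's pkg/schedule; it only mimics a finished-job counter.
type fifo struct {
	mu       sync.Mutex
	cond     *sync.Cond
	pending  []func()
	finished int
}

func newFIFO() *fifo {
	f := &fifo{}
	f.cond = sync.NewCond(&f.mu)
	go f.run() // the worker runs until the process exits (no Stop in this sketch)
	return f
}

// Schedule queues a job for the worker goroutine.
func (f *fifo) Schedule(job func()) {
	f.mu.Lock()
	f.pending = append(f.pending, job)
	f.mu.Unlock()
	f.cond.Broadcast()
}

// Finished returns how many jobs have completed so far.
func (f *fifo) Finished() int {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.finished
}

// WaitFinish blocks until at least n jobs have finished and none are queued.
func (f *fifo) WaitFinish(n int) {
	f.mu.Lock()
	defer f.mu.Unlock()
	for f.finished < n || len(f.pending) != 0 {
		f.cond.Wait()
	}
}

// run executes queued jobs one at a time, in order.
func (f *fifo) run() {
	for {
		f.mu.Lock()
		for len(f.pending) == 0 {
			f.cond.Wait()
		}
		job := f.pending[0]
		f.pending = f.pending[1:]
		f.mu.Unlock()

		job()

		f.mu.Lock()
		f.finished++
		f.mu.Unlock()
		f.cond.Broadcast()
	}
}

// appliedWhenWaitReturns reports whether the scheduled job had already run
// by the time the waiter returned from WaitFinish.
func appliedWhenWaitReturns(useFinishedPlusOne bool) bool {
	f := newFIFO()
	var applied int32
	before := f.Finished() // captured before Schedule is called
	release := make(chan struct{})
	go func() {
		<-release // models the apply job being scheduled after the waiter starts
		f.Schedule(func() { atomic.StoreInt32(&applied, 1) })
	}()
	if useFinishedPlusOne {
		close(release)
		f.WaitFinish(before + 1) // blocks until the job has actually finished
		return atomic.LoadInt32(&applied) == 1
	}
	f.WaitFinish(0) // nothing queued, nothing finished: returns at once
	seen := atomic.LoadInt32(&applied) == 1
	close(release) // let the job run so the helper goroutine is not left blocked
	return seen
}

func main() {
	fmt.Println("WaitFinish(0) saw the job applied:", appliedWhenWaitReturns(false))
	fmt.Println("WaitFinish(finished+1) saw the job applied:", appliedWhenWaitReturns(true))
}
```

In this toy setup the first call reports `false` and the second reports `true`: waiting for a count captured before `Schedule`, plus one, cannot return until the job has run, whereas waiting for zero finished jobs on an idle scheduler returns immediately.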

1606 Commits