Mirroristas/etcd

mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Author	SHA1	Message	Date
fanmin shi	e33b10a666	etcdserver: add a test to ensure config change also updates ConsistIndex	2017-05-02 16:51:40 -07:00
fanmin shi	5533c3058a	etcdserver: apply() sets consistIndex for any entry type. Previously, apply() didn't set consistIndex for the EntryConfChange type. This caused a misalignment between consistIndex and the applied index: an EntryConfChange entry advanced the applied index but not consistIndex. Suppose that addMember() is called and the leader reflects that change. 1. The applied index and consistIndex are now misaligned. 2. A new follower node joins. 3. The leader sends the snapshot to the follower, where the applied index is the snapshot metadata index. 4. The follower saves the snapshot and the database (which includes consistIndex) from the leader. 5. On restart, the follower loads the snapshot and the database. 6. The follower checks the snapshot metadata index (same as the applied index) against the database consistIndex, finds that they don't match, and panics. FIXES #7834	2017-05-02 14:57:36 -07:00
Gyu-Ho Lee	fdf445b5a0	Merge pull request #7848 from gyuho/close-grpcc embed: fix blocking Close before gRPC server start	2017-05-01 18:44:20 -07:00
Anthony Romano	f065d8e258	Merge pull request #7845 from heyitsanthony/single-node-docker Documentation: add documentation for single node docker etcd	2017-05-01 16:42:19 -07:00
Gyu-Ho Lee	b0e9d24fb6	embed: fix blocking Close before gRPC server start If 'StartEtcd' returns before starting gRPC server (e.g. mismatch snapshot, misconfiguration), receiving from grpcServerC blocks forever. This patch just closes the channel to not block on grpcServerC, and proceeds to next stop operations in Close. This was masking the issues in https://github.com/coreos/etcd/issues/7834 Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-05-01 16:41:13 -07:00
Anthony Romano	b1720b779c	Merge pull request #7846 from heyitsanthony/build-aci-annotate scripts: annotate with acbuild with supports-systemd-notify	2017-05-01 16:04:03 -07:00
Anthony Romano	6c1ce697a6	scripts: annotate with acbuild with supports-systemd-notify Fixes #7840	2017-05-01 12:59:08 -07:00
Anthony Romano	3f1f5e5215	Merge pull request #7844 from heyitsanthony/v2-docker-tag Documentation/v2: pin docker guide to use latest 2.3.x	2017-05-01 12:54:03 -07:00
Anthony Romano	b8f08d400d	Documentation: add documentation for single node docker etcd Fixes #7843	2017-05-01 12:36:16 -07:00
Anthony Romano	066f9bf7e3	Documentation/v2: pin docker guide to use latest 2.3.x	2017-05-01 11:46:39 -07:00
Gyu-Ho Lee	f0ca65a95d	version: bump up to 3.2.0-rc.0+git	2017-04-28 11:06:53 -07:00
Gyu-Ho Lee	7e6d876385	version: bump up to 3.2.0-rc.0 v3.2.0-rc.0	2017-04-28 10:09:39 -07:00
Gyu-Ho Lee	7239249155	Merge pull request #7837 from gyuho/tls-errors integration: match more TLS errors for wrong certs	2017-04-28 10:08:34 -07:00
Gyu-Ho Lee	cfeab9324e	integration: match more TLS errors for wrong certs Fix https://github.com/coreos/etcd/issues/7835. Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-28 10:03:29 -07:00
Gyu-Ho Lee	77fd369b1c	Merge pull request #7832 from gyuho/doc-for-3.2 Documentation: add upgrade to 3.2 doc	2017-04-27 21:27:26 -07:00
Gyu-Ho Lee	cbd7ef4ee6	Documentation: add upgrade to 3.2 doc Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-27 14:39:42 -07:00
Gyu-Ho Lee	747993de08	Merge pull request #7829 from gyuho/certs pkg/transport: reload TLS certificates for every client requests	2017-04-27 14:36:53 -07:00
Gyu-Ho Lee	96d6f05391	Merge pull request #7831 from gyuho/cc pkg/wait: add comment and make List private	2017-04-27 13:45:25 -07:00
Gyu-Ho Lee	22943e7e06	integration: test TLS reload Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-27 13:32:09 -07:00
Xiang Li	d818ef2c76	pkg/wait: add comment and make List private	2017-04-27 13:25:02 -07:00
Tony Grosinger	4e21f87e3d	pkg/transport: reload TLS certificates for every client requests This changes the baseConfig used when creating tls Configs to utilize the GetCertificate and GetClientCertificate functions to always reload the certificates from disk whenever they are needed. Always reloading the certificates allows changing the certificates via an external process without interrupting etcd. Fixes #7576 Cherry-picked by Gyu-Ho Lee <gyuhox@gmail.com> Original commit can be found at https://github.com/coreos/etcd/pull/7784	2017-04-27 11:22:03 -07:00
Anthony Romano	c309d745a6	Merge pull request #7819 from heyitsanthony/fix-elect-compact concurrency: use current revisions for election	2017-04-27 11:01:44 -07:00
Anthony Romano	2a3229c00a	Merge pull request #7808 from heyitsanthony/auto-bom CI BOM checking	2017-04-27 09:24:59 -07:00
Anthony Romano	3e7bd47cd5	travis: add bill-of-materials checking Fixes #7780	2017-04-26 16:29:48 -07:00
Anthony Romano	2059c8e9e7	vendor: revendor speakeasy to include unix license file updates BOM	2017-04-26 16:29:48 -07:00
Anthony Romano	b77de97136	test: bill of materials check pass	2017-04-26 16:29:47 -07:00
Gyu-Ho Lee	633a0a847b	Merge pull request #7824 from gyuho/certs *: test expired certs in client	2017-04-26 13:31:17 -07:00
Gyu-Ho Lee	f674a1b583	clientv3/integration: test client dial with expired certs Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-26 12:32:46 -07:00
Gyu-Ho Lee	7cb860a31b	integration/fixtures: add expired certs Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-26 12:22:54 -07:00
Anthony Romano	d2e69b339f	Merge pull request #7816 from heyitsanthony/v3client-blankctx v3client: wrap watch ctxs with blank ctx	2017-04-25 21:53:14 -07:00
Gyu-Ho Lee	41e77c9db6	Merge pull request #7818 from gyuho/doc Documentation: require Go 1.8+ for build	2017-04-25 21:46:07 -07:00
Anthony Romano	50f29bd661	concurrency: use current revisions for election Watching from the leader's ModRevision could cause live-locking on observe retry loops when the ModRevision is less than the compacted revision. Instead, start watching the leader from at least the store revision of the linearized read used to detect the current leader. Fixes #7815	2017-04-25 20:15:50 -07:00
Anthony Romano	6486be673b	integration: test Observe can read leaders set prior to compaction	2017-04-25 20:03:49 -07:00
Gyu-Ho Lee	4959663f90	Documentation: require Go 1.8+ for build Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-25 17:04:54 -07:00
fanmin shi	c49a87bd04	Merge pull request #7672 from fanminshi/integrate_runner_to_tester etcd-tester: integrate etcd runner into etcd tester	2017-04-25 15:22:29 -07:00
fanmin shi	60b9adc267	Merge pull request #7812 from fanminshi/refactor_runner etcd-runner: fix runner and minor refactoring.	2017-04-25 15:21:57 -07:00
Anthony Romano	3ce31acda4	v3client: wrap watch ctxs with blank ctx Printing the values in ctx.String() will data race if the value is mutable and doesn't implement String(), which seems to be common. Instead, just return a fixed string instead of computing it; v3client watches don't need as much flexibility for creating separate strings, so separate ctx strings probably aren't necessary at this point. Fixes #7811	2017-04-25 15:03:06 -07:00
Gyu-Ho Lee	96aaeee4f5	Merge pull request #7814 from gyuho/aaa etcdserver: do not block on raft stopping	2017-04-25 15:00:06 -07:00
fanmin shi	a9e04061b1	etcd-runner: integrate etcd runner into etcd tester. The etcd tester runs the etcd runner as a separate binary. It sends SIGSTOP to the runner when the tester wants to stop stressing, SIGCONT when the tester wants to start stressing, and SIGINT when the tester needs to clean up. FIXES #7026	2017-04-25 14:53:23 -07:00
fanmin shi	77fbe10dfc	etcd-runner: add --prefix flag, allows inf round, and minor vars refactoring in watch runner.	2017-04-25 14:18:42 -07:00
fanmin shi	debc69e1f2	etcd-runner: pass in lock name as a command arg for lock_racer.	2017-04-25 14:18:42 -07:00
fanmin shi	72fb756af3	etcd-runner: add lease ttl as a flag and fatal when err in lease-runner.	2017-04-25 14:18:42 -07:00
fanmin shi	d57ad8ec8d	etcd-runner: add barrier, observe !ok handling, and election name arg to election-runner.	2017-04-25 14:17:59 -07:00
fanmin shi	fa85445ef8	etcd-runner: add rate limiting in doRounds()	2017-04-25 14:00:52 -07:00
Gyu-Ho Lee	327f09fcb4	etcdserver: do not block on raft stopping Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-25 13:35:43 -07:00
Gyu-Ho Lee	2af1605db3	Merge pull request #7810 from gyuho/sync-with-apply etcdserver: ensure waitForApply sync with applyAll	2017-04-25 13:21:30 -07:00
Gyu-Ho Lee	91f6aee4f2	etcdserver: ensure waitForApply sync with applyAll
Problem is:
`Step1`: `etcdserver/raft.go`'s `Ready` process routine sends config-change entries via `r.applyc <- ap` (https://github.com/coreos/etcd/blob/master/etcdserver/raft.go#L193-L203)
`Step2`: `etcdserver/server.go`'s `*EtcdServer.run` routine receives this via `ap := <-s.r.apply()` (https://github.com/coreos/etcd/blob/master/etcdserver/server.go#L735-L738)
`StepA`: `Step1` proceeds without sync, right after sending `r.applyc <- ap`.
`StepB`: `Step2` proceeds without sync, right after `sched.Schedule(s.applyAll(&ep,&ap))`.
`StepC`: `etcdserver` tries to sync with `s.applyAll(&ep,&ap)` by calling `rh.waitForApply()`. `rh.waitForApply()` waits for all pending jobs to finish in `pkg/schedule` side.
However, the order of `StepA`,`StepB`,`StepC` is not guaranteed. It is possible that `StepC` happens first, and proceeds without waiting on apply. And the restarting member comes back as a leader in single-node cluster, when there is no synchronization between apply-layer and config-change Raft entry apply.
Confirmed with more debugging lines below, only reproducible with slow CPU VM (~2 vCPU).
```
~:24.005397 I | etcdserver: starting server... [version: 3.2.0+git, cluster version: to_be_decided]
~:24.011136 I | etcdserver: [DEBUG] 29b2d24047a277df waitForApply before
~:24.011194 I | etcdserver: [DEBUG] 29b2d24047a277df starts wait for 0 pending jobs
~:24.011234 I | etcdserver: [DEBUG] 29b2d24047a277df finished wait for 0 pending jobs (current pending 0)
~:24.011268 I | etcdserver: [DEBUG] 29b2d24047a277df waitForApply after
~:24.011348 I | etcdserver: [DEBUG] [0] 29b2d24047a277df is scheduling conf change on 29b2d24047a277df
~:24.011396 I | etcdserver: [DEBUG] [1] 29b2d24047a277df is scheduling conf change on 5edf80e32a334cf0
~:24.011437 I | etcdserver: [DEBUG] [2] 29b2d24047a277df is scheduling conf change on e32e31e76c8d2678
~:24.011477 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on 29b2d24047a277df
~:24.011509 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on 5edf80e32a334cf0
~:24.011545 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on e32e31e76c8d2678
~:24.012500 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df before
~:24.013014 I | etcdserver/membership: added member 29b2d24047a277df [unix://127.0.0.1:2100515039] to cluster 9250d4ae34216949
~:24.013066 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df after
~:24.013113 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df after trigger
~:24.013158 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 5edf80e32a334cf0 before
~:24.013666 W | etcdserver: failed to send out heartbeat on time (exceeded the 10ms timeout for 11.964739ms)
~:24.013709 W | etcdserver: server is likely overloaded
~:24.013750 W | etcdserver: failed to send out heartbeat on time (exceeded the 10ms timeout for 12.057265ms)
~:24.013775 W | etcdserver: server is likely overloaded
~:24.013950 I | raft: 29b2d24047a277df is starting a new election at term 4
~:24.014012 I | raft: 29b2d24047a277df became candidate at term 5
~:24.014051 I | raft: 29b2d24047a277df received MsgVoteResp from 29b2d24047a277df at term 5
~:24.014107 I | raft: 29b2d24047a277df became leader at term 5
~:24.014146 I | raft: raft.node: 29b2d24047a277df elected leader 29b2d24047a277df at term 5
```
I am printing out the number of pending jobs before we call `sched.WaitFinish(0)`, and there was no pending jobs, so it returned immediately (before we schedule `applyAll`).
This is the root cause to:
- https://github.com/coreos/etcd/issues/7595
- https://github.com/coreos/etcd/issues/7739
- https://github.com/coreos/etcd/issues/7802
`sched.WaitFinish(0)` doesn't work when `len(f.pendings)==0` and `f.finished==0`. Config-change is the first job to apply, so `f.finished` is 0 in this case. `f.finished` monotonically increases, so we need `WaitFinish(finished+1)`. And `finished` must be the one before calling `Schedule`. This is safe because `Schedule(applyAll)` is the only place adding jobs to `sched`. Then scheduler waits on the single job of `applyAll`, by getting the current number of finished jobs before sending `Schedule`. Or just make it be blocked until `applyAll` routine triggers on the config-change job.
This patch just removes `waitForApply`, and signal `raftDone` to wait until `applyAll` finishes applying entries.
Confirmed that it fixes the issue, as below:
```
~:43.198354 I | rafthttp: started streaming with peer 36cda5222aba364b (stream MsgApp v2 reader)
~:43.198740 I | etcdserver: [DEBUG] 3988bc20c2b2e40c waitForApply before
~:43.198836 I | etcdserver: [DEBUG] 3988bc20c2b2e40c starts wait for 0 pending jobs, 1 finished jobs
~:43.200696 I | integration: launched 3169361310155633349 ()
~:43.201784 I | etcdserver: [DEBUG] [0] 3988bc20c2b2e40c is scheduling conf change on 36cda5222aba364b
~:43.201884 I | etcdserver: [DEBUG] [1] 3988bc20c2b2e40c is scheduling conf change on 3988bc20c2b2e40c
~:43.201965 I | etcdserver: [DEBUG] [2] 3988bc20c2b2e40c is scheduling conf change on cf5d6cbc2a121727
~:43.202070 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on 36cda5222aba364b
~:43.202139 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on 3988bc20c2b2e40c
~:43.202204 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on cf5d6cbc2a121727
~:43.202444 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) before
~:43.204486 I | etcdserver/membership: added member 36cda5222aba364b [unix://127.0.0.1:2100913646] to cluster 425d73f1b7b01674
~:43.204588 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) after
~:43.204703 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) after trigger
~:43.204791 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) before
~:43.205689 I | etcdserver/membership: added member 3988bc20c2b2e40c [unix://127.0.0.1:2101113646] to cluster 425d73f1b7b01674
~:43.205783 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) after
~:43.205929 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) after trigger
~:43.206056 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) before
~:43.207353 I | etcdserver/membership: added member cf5d6cbc2a121727 [unix://127.0.0.1:2100713646] to cluster 425d73f1b7b01674
~:43.207516 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) after
~:43.207619 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) after trigger
~:43.207710 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on 36cda5222aba364b
~:43.207781 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on 3988bc20c2b2e40c
~:43.207843 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on cf5d6cbc2a121727
~:43.207951 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished wait for 0 pending jobs (current pending 0, finished 1)
~:43.208029 I | rafthttp: started HTTP pipelining with peer cf5d6cbc2a121727
~:43.210339 I | rafthttp: peer 3988bc20c2b2e40c became active
~:43.210435 I | rafthttp: established a TCP streaming connection with peer 3988bc20c2b2e40c (stream MsgApp v2 reader)
~:43.210861 I | rafthttp: started streaming with peer 3988bc20c2b2e40c (writer)
~:43.211732 I | etcdserver: [DEBUG] 3988bc20c2b2e40c waitForApply after
```
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-25 10:22:27 -07:00
fanmin shi	b94b8b5707	etcd-runner: move root cmd into command package. This allows easier sharing of global variables among subcommands.	2017-04-25 10:19:20 -07:00
Anthony Romano	fbbc4a4979	Merge pull request #7732 from heyitsanthony/lease-err-ka clientv3: don't halt lease client if there is a lease error	2017-04-25 07:06:31 -07:00
Anthony Romano	2fd6df922a	integration: close proxy's lease client	2017-04-24 23:49:45 -07:00
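
The race described in commit 91f6aee4f2 above can be illustrated with a minimal, single-goroutine sketch. The `sched` type and its methods below are hypothetical stand-ins, not the real `pkg/schedule` API; they model only the two counters the commit message mentions (pending jobs and a monotonically increasing finished count), under the assumption that a wait returns once nothing is pending and at least `n` jobs have finished.

```go
package main

import "fmt"

// sched is a hypothetical model of a FIFO job scheduler (not the real
// pkg/schedule type). It tracks only the counters discussed in the commit.
type sched struct {
	pending  int // jobs scheduled but not yet run
	finished int // jobs completed so far; only ever increases
}

// Schedule registers one job.
func (s *sched) Schedule() { s.pending++ }

// RunAll completes every pending job.
func (s *sched) RunAll() {
	s.finished += s.pending
	s.pending = 0
}

// WaitWouldReturn reports whether a WaitFinish(n)-style call would return
// right away: nothing is pending and at least n jobs have finished.
func (s *sched) WaitWouldReturn(n int) bool {
	return s.pending == 0 && s.finished >= n
}

func main() {
	s := &sched{}
	before := s.finished // read before the apply job is scheduled

	// The waiter runs first, before the apply job has been scheduled.
	fmt.Println(s.WaitWouldReturn(0))          // true: returns without waiting for the apply
	fmt.Println(s.WaitWouldReturn(before + 1)) // false: keeps waiting for the first job

	s.Schedule()
	s.RunAll()
	fmt.Println(s.WaitWouldReturn(before + 1)) // true: the apply job has finished
}
```

Waiting for `before+1` is the alternative the commit message discusses; the patch itself takes the other route it names, removing `waitForApply` and signaling `raftDone` instead.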

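Commit 4e21f87e3d above describes reloading TLS certificates from disk whenever they are needed, via the `GetCertificate` and `GetClientCertificate` hooks of `tls.Config`. The following is a rough sketch of that idea using only the standard library; the helper name `reloadingConfig` and the file paths are assumptions for illustration and do not reflect etcd's actual `pkg/transport` code.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// reloadingConfig is a hypothetical helper: it returns a tls.Config whose
// callbacks re-read the key pair from disk each time a certificate is
// requested, so files replaced by an external process are picked up
// without a restart.
func reloadingConfig(certFile, keyFile string) *tls.Config {
	load := func() (*tls.Certificate, error) {
		cert, err := tls.LoadX509KeyPair(certFile, keyFile)
		if err != nil {
			return nil, err
		}
		return &cert, nil
	}
	return &tls.Config{
		// Called on the server side for each incoming handshake.
		GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
			return load()
		},
		// Called on the client side when the server requests a certificate.
		GetClientCertificate: func(*tls.CertificateRequestInfo) (*tls.Certificate, error) {
			return load()
		},
	}
}

func main() {
	// Placeholder paths; with no such files present, the callback
	// returns the load error instead of a certificate.
	cfg := reloadingConfig("server.crt", "server.key")
	_, err := cfg.GetCertificate(nil)
	fmt.Println("load error:", err)
}
```

Reading from disk on every handshake trades some I/O for simplicity; a cached variant keyed on file modification time would be another option, though the commit message describes always reloading.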

11293 Commits