Mirroristas/etcd

mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Author	SHA1	Message	Date
Jingyi Hu	39680c381e	Merge pull request #10908 from tbg/log etcdserver: fix createConfChangeEnts	2019-07-19 17:45:27 -07:00
Xiang Li	9a69aa17c8	Merge pull request #10614 from jmillikin-stripe/cert-allowed-san-flags etcdmain, pkg: Support peer and client TLS auth based on SAN fields.	2019-07-19 12:02:28 -07:00
Tobias Schottdorf	eb4d9b640a	etcdserver: fix createConfChangeEnts It created a sequence of conf changes that could intermittently cause an empty set of voters, which Raft asserts against as of #10889. This fixes TestCtlV2BackupSnapshot and TestCtlV2BackupV3Snapshot, see: https://github.com/etcd-io/etcd/issues/10700#issuecomment-512358126	2019-07-19 17:13:08 +02:00
Tobias Grieger	3c5e2f51e4	Merge pull request #10892 from tbg/rawnode-everywhere-attempt3 raft: use RawNode for node's event loop; clean up bootstrap	2019-07-19 14:30:08 +02:00
Tobias Schottdorf	caa48bcc3d	raft: remove TestNodeBoundedLogGrowthWithPartition It has a data race between the test's call to `reduceUncommittedSize` and a corresponding call during Ready handling in `(*node).run()`. The corresponding RawNode test still verifies the functionality, so instead of fixing the test we can remove it.	2019-07-19 12:35:14 +02:00
Tobias Schottdorf	500af91653	raft: restore ability to bootstrap RawNode We are worried about breaking backwards compatibility for any application out there that may have relied on the old behavior. Their RawNode invocation would have been broken by the removal of the peers argument so it would not have changed silently; an associated comment tells callers how to fix it.	2019-07-19 10:02:02 +02:00
Tobias Schottdorf	c9491d7861	raft: clean up bootstrap This is the first (maybe not last) step in cleaning up the bootstrap code around StartNode. Initializing a Raft group for the first time is awkward, since a configuration has to be pulled from thin air. The way this is solved today is unclean: The app is supposed to pass peers to StartNode(), we add configuration changes for them to the log, immediately pretend that they are applied, but actually leave them unapplied (to give the app a chance to observe them, though if the app did decide to not apply them things would really go off the rails), and then return control to the app. The app will then process the initial Readys and as a result the configuration will be persisted to disk; restarts of the node then use RestartNode which doesn't take any peers. The code that did this lived awkwardly in two places fairly deep down the callstack, though it was really only necessary in StartNode(). This commit refactors things to make this more obvious: only StartNode does this dance now. In particular, RawNode does not support this at all any more; it expects the app to set up its Storage correctly. Future work may provide helpers to make this "preseeding" of the Storage more user-friendly. It isn't entirely straightforward to do so since the Storage interface doesn't provide the right accessors for this purpose. Briefly speaking, we want to make sure that a non-bootstrapped node can never catch up via the log so that we can implicitly use one of the "skipped" log entries to represent the configuration change into the bootstrap configuration. This is an invasive change that affects all consumers of raft, and it is of lower urgency since the code (post this commit) already encapsulates the complexity sufficiently.	2019-07-19 10:02:02 +02:00
Tobias Schottdorf	c62b7048b5	raft: use RawNode for node's event loop It has always bugged me that any new feature essentially needed to be tested twice due to the two ways in which apps can use raft (`node` and `RawNode`). Due to upcoming testing work for joint consensus, now is a good time to rectify this somewhat. This commit removes most logic from `(node).run` and uses `RawNode` internally. This simplifies the logic and also led (via debugging) to some insight on how the semantics of the approaches differ, which is now documented in the comments.	2019-07-19 09:59:59 +02:00
Jingyi Hu	233be58056	Merge pull request #10839 from needkane/pr raft: update log info and annotation	2019-07-18 23:26:44 -07:00
Tobias Grieger	62f4fb3c5e	Merge pull request #10903 from tbg/inflights raft: return non-nil Inflights in raft status	2019-07-18 17:50:28 +02:00
Tobias Schottdorf	6b0322549f	raft: replace StatusWithoutProgress with BasicStatus Now that a Config is also added to the full status, the old name did not convey the intention, which was to get a Status without an associated allocation.	2019-07-18 16:28:37 +02:00
Xiang Li	f498392ca7	Merge pull request #10898 from tbg/dep scripts: fail explicitly in updatedep.sh when gopath.proto exists	2019-07-17 16:47:51 -07:00
Xiang Li	7d2e57216a	Merge pull request #10900 from yuzeming/master integration: add WaitGroup to TestV3WatchCurrentPutOverlap	2019-07-17 16:46:25 -07:00
yzm	3737979532	move wg.Wait() after loop	2019-07-17 16:31:48 -07:00
Tobias Schottdorf	7ce934cbec	raft: return active config in Status This is useful for debug purposes, and more so once we support joint quorums.	2019-07-17 14:29:45 +02:00
Tobias Schottdorf	26a1e60eab	raft: return non-nil Inflights in raft status Recent refactoring to the String() method of `Progress` hit an NPE because we return nil Inflights as part of the Raft status. Just fix this at the source and properly populate the Raft status instead of teaching String() to ignore nil. A real Progress always has a non-nil Inflights.	2019-07-17 12:53:28 +02:00
yzm	d87bd2c87c	integration: add WaitGroup to prevent calling t.Fatalf after the TestV3WatchCurrentPutOverlap function returns, which could cause a panic. Fixes #10886	2019-07-16 10:25:35 -07:00
Tobias Grieger	9fba06ba3b	Merge pull request #10889 from tbg/joint-conf-change-logic raft: internally support joint consensus	2019-07-16 16:02:16 +02:00
Tobias Schottdorf	aa158f36b9	raft: internally support joint consensus This commit introduces machinery to safely apply joint consensus configuration changes to Raft. The main contribution is the new package, `confchange`, which offers the primitives `Simple`, `EnterJoint`, and `LeaveJoint`. The first two take a list of configuration changes. `Simple` only declares success if these configuration changes (applied atomically) change the set of voters by at most one (i.e. it's fine to add or remove any number of learners, but change only one voter). `EnterJoint` makes the configuration joint and then applies the changes to it, in preparation of the caller returning later and transitioning out of the joint config into the final desired configuration via `LeaveJoint()`. This commit streamlines the conversion between voters and learners, which is now generally allowed whenever the above conditions are upheld (i.e. it's not possible to demote a voter and add a new voter in the context of a Simple configuration change, but it is possible via EnterJoint). Previously, we had the artificial restriction that a voter could not be demoted to a learner, but had to be removed first. Even though demoting a voter is generally less useful than promoting a learner (the latter is used to catch up future voters), demotions could see use in improved handling of temporary node unavailability, where it is desired to remove voting power from a down node, but to preserve its data should it return. An additional change that was made in this commit is to prevent the use of empty commit quorums, which was previously possible but for no good reason; this: Closes #10884. The work left to do in a future PR is to actually expose joint configurations to the applications using Raft. This will entail mostly API design and the addition of suitable testing, which to be carried out ergonomically is likely to motivate a larger refactor. Touches #7625.	2019-07-16 15:36:04 +02:00
Tobias Schottdorf	14625b847c	scripts: have genproto.sh clean up after itself We don't want it to leave gopath.proto around for reasons detailed in the previous commit (messing up vgo).	2019-07-16 14:01:04 +02:00
Tobias Schottdorf	f63984bb33	scripts: fail explicitly in updatedep.sh when gopath.proto exists I had been dealing with these intermittent failures for a while and finally figured out why. The real solution is making genproto.sh less ugly but that won't happen for a while.	2019-07-16 13:54:09 +02:00
Xiang Li	5a734e79f5	Merge pull request #10891 from changkun/raft raft/rafttest: simulate async send in node test	2019-07-15 11:49:06 -07:00
Changkun Ou	856097181b	raft/rafttest: simulate async send in node test To cover the case where messages are received while a node is paused, this commit sends messages asynchronously using a goroutine and a random sleep. This makes it possible for recvms to cache messages while node.pause is in effect.	2019-07-13 16:22:33 +02:00
Gyuho Lee	e56e8471ec	Merge pull request #10888 from tbg/test-ski test: allow failures in linux-amd64-integration-4-cpu	2019-07-11 09:24:06 -07:00
Tobias Schottdorf	b7327b1cd8	test: allow failures in linux-amd64-integration-4-cpu This run should certainly pass, but it's consistently the one that fails with a regularity that essentially blocks the CI pipeline. Someone needs to take a look at #10700, but in the meantime, the show must go on.	2019-07-11 16:40:54 +02:00
Tobias Grieger	b2274efee0	Merge pull request #10864 from tbg/learner-snap raft: allow voter to become learner through snapshot	2019-07-11 15:48:09 +02:00
John Millikin	95f3138b5f	tests: Use more deterministic error message in TestEtcdPeerNameAuth	2019-07-10 14:24:20 +09:00
John Millikin	c6686734b1	tests: Use 'localhost' to match SAN of `integration/fixtures/server.crt`	2019-07-10 13:33:14 +09:00
John Millikin	91472797ff	pkg: Remove stray printfs	2019-07-10 13:33:14 +09:00
John Millikin	5824421f8b	etcdmain, pkg: Rename new flags to 'hostname'	2019-07-10 09:30:02 +09:00
John Millikin	9a53601a18	etcdmain, pkg: Support peer and client TLS auth based on SAN fields. Etcd currently supports validating peers based on their TLS certificate's CN field. The current best practice for creation and validation of TLS certs is to use the Subject Alternative Name (SAN) fields instead, so that a certificate might be issued with a unique CN and its logical identities in the SANs. This commit extends the peer validation logic to use Go's `(*"crypto/x509".Certificate).VerifyHostname` function for name validation, which allows SANs to be used for peer access control. In addition, it allows name validation to be enabled on clients as well. This is used when running Etcd behind an authenticating proxy, or as an internal component in a larger system (like a Kubernetes master).	2019-07-10 09:30:02 +09:00
Tobias Grieger	eb7dd97135	Merge pull request #10882 from tbg/pr-string raft: optimize string representation of Progress	2019-07-09 16:27:35 +02:00
Tobias Schottdorf	95024fa3cc	raft: optimize string representation of Progress Make it less verbose by omitting the values for the steady state. Also rearrange the order so that information that is typically more relevant is printed first.	2019-07-09 11:22:37 +02:00
Xiang Li	0af16979f8	Merge pull request #10879 from lzhfromustc/master etcdserver: modify a read operation to avoid potential race	2019-07-08 14:48:43 -07:00
lzhfromustc	d35f6647bc	Use newbe instead of s.be to avoid potential race `s.cluster.SetBackend(s.be)` is not in critical section. Using `newbe` instead of `s.be` can avoid potential data race.	2019-07-08 14:24:52 -07:00
Tobias Schottdorf	6f009d211f	raft: allow voter to become learner through snapshot At the time of writing, we don't allow configuration changes to change voters to learners directly, but note that a snapshot may compress multiple changes to the configuration into one: the voter could have been removed, then readded as a learner and the snapshot reflects both changes. In that case, a voter receives a snapshot telling it that it is now a learner. In fact, the node has to accept that snapshot, or it is permanently cut off from the Raft log. I think this just wasn't realized in the original work, but this is just my guess since there generally is very little rationale on the various decisions made. I also generally haven't been able to figure out whether the decision to prevent voters from becoming learners without first having been removed was motivated by some particular concern, or if it just wasn't deemed necessary. I suspect it is the latter because demoting a voter seems perfectly safe. See https://github.com/etcd-io/etcd/pull/8751#issuecomment-342028091.	2019-07-08 09:32:24 +02:00
Tobias Grieger	48f5bb6d28	Merge pull request #10865 from tbg/multi-conf-change raft: centralize configuration change application	2019-07-03 21:57:57 +02:00
Tobias Schottdorf	6697adfff8	raft/tracker: pull Voters and Learners into Config struct This is helpful to quickly print the configuration log messages without having to specify Voters and Learners separately. It will also come in handy for joint quorums because it allows holding on to voters and learners as a unit, which is useful for unit testing.	2019-07-03 21:26:42 +02:00
Tobias Schottdorf	b171e1c78b	raft: centralize configuration change application Put all the logic related to applying a configuration change in one place in preparation for adding joint consensus. This inspired various TODOs. I had to rewrite TestSnapshotSucceedViaAppResp since it was relying on a snapshot applied to the leader, which is now prevented.	2019-07-03 21:26:42 +02:00
kane	4f7d83a249	raft: update log info and annotation	2019-07-02 23:43:56 -04:00
Xiang Li	1f40b6642f	Merge pull request #10850 from Koprvhdix/role-remove-document-fix Documentation: change `etcdctl role remove` to `etcdctl role delete`	2019-07-01 15:34:55 -07:00
Xiang Li	d506962fec	Merge pull request #10848 from spzala/raftthesis10831 raftdoc: fix raft thesis link	2019-06-28 12:43:32 -07:00
Xiang Li	ecba4492f2	Merge pull request #10866 from lzhfromustc/master clientv3: Fixed a missing block bug	2019-06-28 11:54:50 -07:00
lzhfromustc	8194aa3f03	Fixed a missing block bug Description: w.mu is locked at line 385 and unlocked at line 396. Among 5 return statements in this function, 4 are below line 396 but there is 1 return at line 387. Fix: Add w.mu.Unlock() before that return at line 387.	2019-06-28 11:27:13 -07:00
Clockworkai	c34de2aef4	Documentation: change `etcdctl role remove` to `etcdctl role delete` This is a documentation error. Running `etcdctl role --help` shows that the subcommand is `delete`, not `remove`. Fixes #10849	2019-06-26 09:03:08 +08:00
Sahdev P. Zala	655ab0ac6a	raftdoc: fix raft thesis link The current link does not work and is no longer valid, per Stanford support. Replace all current refs with the link used by https://raft.github.io/. Fixes https://github.com/etcd-io/etcd/issues/10831	2019-06-24 19:01:00 -04:00
Tobias Grieger	948e276ca7	Merge pull request #10807 from tbg/extract-prs raft: extract 'tracker' package	2019-06-21 22:50:06 +02:00
Tobias Schottdorf	f9c2d00fb3	raft: extract 'tracker' package Mechanically extract `progressTracker`, `Progress`, and `inflights` to their own package named `tracker`. Add lots of comments in the progress, and take the opportunity to rename and clarify various fields.	2019-06-21 22:15:00 +02:00
Xiang Li	6953ccc135	Merge pull request #10837 from tbg/ci-20m test: s/20m/30m/g	2019-06-21 08:54:26 +08:00
Tobias Schottdorf	362dfb4d08	test: s/20m/30m/g Every other test build times out due to the 20 minute test timeout. It doesn't seem like tests are actually hanging, it's more that 20 minutes just isn't enough to run the tests any more.	2019-06-20 23:44:25 +02:00
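
Note on aa158f36b9: the message says `Simple` succeeds only if the changes, applied atomically, alter the set of voters by at most one. The Go sketch below illustrates that rule in isolation. It is not the code from the `confchange` package: the types `change` and `changeKind` and the functions `apply`, `symmetricDiff`, and `isSimple` are hypothetical names invented for this illustration, and learners are deliberately not tracked. It also omits the separate check against an empty voter set that the commit mentions.

```go
package main

import "fmt"

// changeKind and change are hypothetical, simplified stand-ins for a
// single configuration change; they are not the raft package's types.
type changeKind int

const (
	addVoter changeKind = iota
	addLearner
	removeNode
)

type change struct {
	kind changeKind
	id   uint64
}

// apply returns the voter set that results from applying all changes at once.
// Adding a learner removes the ID from the voters (a demotion if it was one).
func apply(voters map[uint64]bool, ccs []change) map[uint64]bool {
	next := make(map[uint64]bool, len(voters))
	for id := range voters {
		next[id] = true
	}
	for _, cc := range ccs {
		switch cc.kind {
		case addVoter:
			next[cc.id] = true
		case addLearner, removeNode:
			delete(next, cc.id)
		}
	}
	return next
}

// symmetricDiff counts the IDs that appear in exactly one of the two sets.
func symmetricDiff(a, b map[uint64]bool) int {
	n := 0
	for id := range a {
		if !b[id] {
			n++
		}
	}
	for id := range b {
		if !a[id] {
			n++
		}
	}
	return n
}

// isSimple reports whether the changes alter the voter set by at most one member.
func isSimple(voters map[uint64]bool, ccs []change) bool {
	return symmetricDiff(voters, apply(voters, ccs)) <= 1
}

func main() {
	voters := map[uint64]bool{1: true, 2: true, 3: true}
	// Adding one voter changes the voter set by one.
	fmt.Println(isSimple(voters, []change{{addVoter, 4}}))
	// Demoting voter 3 and adding voter 4 changes it by two.
	fmt.Println(isSimple(voters, []change{{addLearner, 3}, {addVoter, 4}}))
	// Adding any number of new learners leaves the voters untouched.
	fmt.Println(isSimple(voters, []change{{addLearner, 5}, {addLearner, 6}}))
}
```

Under these assumptions the second case is the one the commit message says needs `EnterJoint`: a demotion combined with a new voter touches two voters at once.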


15272 Commits