15446 Commits

Author SHA1 Message Date
ethan
b5ee1de076
cleanup: correct summary message in put.go 2019-08-12 21:13:58 +08:00
Tobias Grieger
029401ab81
Merge pull request #11005 from tbg/interactiontest
raft/rafttest: introduce datadriven testing
2019-08-12 11:52:52 +02:00
Tobias Schottdorf
e8090e57a2 raft/rafttest: introduce datadriven testing
It has often been tedious to test the interactions between multi-member
Raft groups, especially when many steps were required to reach a certain
scenario. Often, this boilerplate was as boring as it was hard to write
and hard to maintain, making it attractive to resort to shortcuts
whenever possible, which in turn tended to undercut how meaningful and
maintainable the tests ended up being - that is, if the tests were even
written, which sometimes they weren't.

This change introduces a datadriven framework specifically for testing
deterministically the interaction between multiple members of a raft group
with the goal of reducing the friction for writing these tests to near
zero.

In the near term, this will be used to add thorough testing for joint
consensus (which is already available today, but wildly undertested),
but just converting an existing test into this framework has shown that
the concise representation and built-in inspection of log messages
highlights unexpected behavior much more readily than the previous unit
tests did (the test in question is `snapshot_succeed_via_app_resp`; the
reader is invited to compare the old and new version of it).

The main building block is `InteractionEnv`, which holds on to the state
of the whole system and exposes various relevant methods for
manipulating it, including but not limited to adding nodes, delivering
and dropping messages, and proposing configuration changes. All of this
is extensible so that in the future I hope to use it to explore the
phenomena discussed in

https://github.com/etcd-io/etcd/issues/7625#issuecomment-488798263

which requires injecting appropriate "crash points" in the Ready
handling loop. Discussions of the "what if X happened in state Y"
can quickly be made concrete by "scripting up an interaction test".

Additionally, this framework is intentionally not kept internal to the
raft package. Though this is in its infancy, a goal is that it should
be possible for a suite of interaction tests to allow applications to
validate that their Storage implementation behaves accordingly, simply
by running a raft-provided interaction suite against their Storage.
2019-08-12 11:13:51 +02:00
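The datadriven style described in e8090e57a2 can be illustrated with a small, self-contained sketch. It is not the `datadriven` library or the real `InteractionEnv`: the parser, the `env` type, and the `add-node` and `status` commands are hypothetical stand-ins, chosen only to show the idea of a script of commands, each followed by `----` and its expected output, replayed against one piece of shared state.

```go
package main

import (
	"fmt"
	"strings"
)

// testCase is one "command / expected output" pair from a script.
type testCase struct {
	cmd      string
	expected string
}

// parse splits a script into test cases. Each case is a command line,
// a "----" separator line, and the expected output; cases are separated
// by a blank line. This is a simplified, hypothetical format.
func parse(input string) []testCase {
	var cases []testCase
	for _, block := range strings.Split(strings.TrimSpace(input), "\n\n") {
		parts := strings.SplitN(block, "\n----\n", 2)
		if len(parts) != 2 {
			continue // skip malformed blocks in this sketch
		}
		cases = append(cases, testCase{
			cmd:      strings.TrimSpace(parts[0]),
			expected: strings.TrimSpace(parts[1]),
		})
	}
	return cases
}

// env is a toy stand-in for an interaction environment: it holds the
// state of the whole system, which here is just a list of node IDs.
type env struct {
	nodes []int
}

// handle runs one command against the environment and returns its output.
func (e *env) handle(cmd string) string {
	switch cmd {
	case "add-node":
		id := len(e.nodes) + 1
		e.nodes = append(e.nodes, id)
		return fmt.Sprintf("added node %d", id)
	case "status":
		return fmt.Sprintf("%d node(s)", len(e.nodes))
	default:
		return "unknown command: " + cmd
	}
}

// run replays the script against a fresh environment and returns one
// message per mismatch between actual and expected output.
func run(input string) []string {
	e := &env{}
	var failures []string
	for _, tc := range parse(input) {
		if got := e.handle(tc.cmd); got != tc.expected {
			failures = append(failures,
				fmt.Sprintf("%s: got %q, want %q", tc.cmd, got, tc.expected))
		}
	}
	return failures
}

const script = `add-node
----
added node 1

add-node
----
added node 2

status
----
2 node(s)`

func main() {
	if failures := run(script); len(failures) > 0 {
		fmt.Println(strings.Join(failures, "\n"))
		return
	}
	fmt.Println("ok")
}
```

Because the expected output sits next to each command, a reader can follow a multi-step scenario in the script itself, which is the property the commit message credits with surfacing unexpected behavior.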
Tobias Grieger
56ad881b1b
Merge pull request #11015 from gyuho/typo
raft: fix typo
2019-08-10 17:26:29 +02:00
Jingyi Hu
8e2225b4f1
Merge pull request #11016 from ethan-daocloud/patch-1
etcd-dump-logs: correct logging message word
2019-08-10 04:56:50 -07:00
ethan
867b31e01a
etcd-dump-logs: correct logging message word
Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>
2019-08-10 17:01:57 +08:00
Gyuho Lee
6c87b21821 raft: fix typo
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-09 21:26:48 -07:00
Tobias Schottdorf
f57c16c271 vendor: bump datadriven
Picks up some fixes for papercuts.
2019-08-10 00:02:59 +02:00
Gyuho Lee
4a4629fd9f
Merge pull request #10957 from Hanaasagi/fix-metric-name-typo
test: fix metric name typo
2019-08-09 13:23:26 -07:00
Gyuho Lee
5e90267d1b CHANGELOG: update 3.3 + 3.4 with raft changes
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-09 11:32:10 -07:00
Tobias Grieger
4cec8dddc6
Merge pull request #11003 from tbg/interaction/restore
raft: fix restoring joint configurations
2019-08-09 20:13:04 +02:00
Xiang Li
84c69cca76
Merge pull request #10970 from nilsocket/minorFix1
raft: remove unnecessary if check
2019-08-09 11:03:01 -07:00
Tobias Schottdorf
37ab5bdd21 raft: fix restoring joint configurations
While writing interaction tests for joint configuration changes, I
realized that this wasn't working yet - restoring had no notion of
the joint configuration and was simply dropping it on the floor.

This commit introduces a helper `confchange.Restore` which takes
a `ConfState` and initializes a `Tracker` from it.

This is then used both in `(*raft).restore` as well as in `newRaft`.
2019-08-09 19:28:43 +02:00
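As a rough illustration of the problem fixed in 37ab5bdd21, the sketch below restores a configuration from a persisted state without dropping the outgoing voter set. The `confState` and `config` types and the `restore` function are hypothetical simplifications. They are not the real `ConfState`, `Tracker`, or `confchange.Restore`, whose fields and logic may differ.

```go
package main

import "fmt"

// confState is a simplified, hypothetical stand-in for a persisted
// configuration state.
type confState struct {
	Voters         []uint64 // incoming (or only) voter set
	VotersOutgoing []uint64 // non-empty only in a joint configuration
	Learners       []uint64
}

// config is a hypothetical stand-in for the in-memory view that a
// tracker would hold.
type config struct {
	Incoming map[uint64]struct{}
	Outgoing map[uint64]struct{}
	Learners map[uint64]struct{}
}

// toSet converts a slice of IDs into a set.
func toSet(ids []uint64) map[uint64]struct{} {
	s := make(map[uint64]struct{}, len(ids))
	for _, id := range ids {
		s[id] = struct{}{}
	}
	return s
}

// restore builds the in-memory configuration from the persisted state.
// The point of the sketch: the outgoing voters are carried over too, so
// a joint configuration survives a restore.
func restore(cs confState) config {
	return config{
		Incoming: toSet(cs.Voters),
		Outgoing: toSet(cs.VotersOutgoing),
		Learners: toSet(cs.Learners),
	}
}

// joint reports whether the configuration is a joint one.
func (c config) joint() bool {
	return len(c.Outgoing) > 0
}

func main() {
	cs := confState{
		Voters:         []uint64{1, 2, 3},
		VotersOutgoing: []uint64{1, 2, 4},
		Learners:       []uint64{5},
	}
	cfg := restore(cs)
	fmt.Println("joint:", cfg.joint())
	fmt.Println("incoming:", len(cfg.Incoming), "outgoing:", len(cfg.Outgoing), "learners:", len(cfg.Learners))
}
```

Routing both the snapshot-restore path and the initial-construction path through one helper, as the commit describes, means there is only one place that has to understand joint configurations.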
Tobias Schottdorf
a5f785a232 confchange: clean up unnecessary block 2019-08-09 19:28:43 +02:00
Tobias Grieger
7948f39790
Merge pull request #11004 from tbg/interaction/unused-type
raft/tracker: visit Progress in stable order
2019-08-09 12:32:04 +02:00
Gyuho Lee
5ce1856cce
Merge pull request #11010 from etcd-io/wenjiaswe-patch-1
functional: Update functional test README.md
2019-08-08 20:57:26 -07:00
Wenjia
ab9e3d9829
functional: Update functional test README.md 2019-08-08 18:40:15 -07:00
Gyuho Lee
0b2b25e1c1 CHANGELOG: update metrics
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-08 15:08:10 -07:00
Gyuho Lee
046c705f97
Merge pull request #11009 from gyuho/snapshot
*: add inflight snapshot metrics
2019-08-08 13:56:14 -07:00
Gyuho Lee
06b82c200f etcdserver: add "etcd_server_snapshot_apply_inflights_total"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-08 13:33:52 -07:00
Gyuho Lee
a4badc33a3 integration: test snapshot inflights metrics
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-08 13:33:52 -07:00
Gyuho Lee
46bddacacb etcdserver/api: add "etcd_network_snapshot_send_inflights_total", "etcd_network_snapshot_receive_inflights_total"
Useful for deciding when to terminate an unhealthy follower.
If the follower is receiving a leader snapshot, the operator may wait.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-08 13:33:48 -07:00
Gyuho Lee
43ce2eefaa
Merge pull request #10995 from yuzeming/patch-3
agent: fix data race and deadlock
2019-08-08 12:22:20 -07:00
Zeming YU
c762a3d7f7 agent: fix a data race and deadlock
Add a 1-size buffer to `errc` to avoid deadlocking the child goroutine.
Add a local variable to avoid a data race on `err`
when `case <-stream.Context().Done():` is taken.
2019-08-08 11:05:30 -07:00
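The two fixes named in c762a3d7f7 follow a common Go pattern, sketched below under assumptions: `recvOnce`, `serve`, and `demo` are hypothetical names, and the code is not taken from the agent. The channel has capacity 1 so the child goroutine can always deliver its result and exit, and the child keeps its error in a local variable so that nothing is shared with the parent except through the channel.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// recvOnce stands in for a blocking receive on a stream (hypothetical).
func recvOnce() error {
	time.Sleep(10 * time.Millisecond)
	return errors.New("stream closed")
}

// serve waits for either the receive to finish or the context to end.
func serve(ctx context.Context) error {
	// Capacity 1 lets the child goroutine send its result and exit even
	// if serve has already returned through the ctx.Done() branch. With
	// an unbuffered channel that send would block forever.
	errc := make(chan error, 1)
	go func() {
		// err is local to this goroutine, so the parent never reads or
		// writes the same variable concurrently.
		err := recvOnce()
		errc <- err
	}()

	select {
	case err := <-errc:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demo runs serve once with an already-cancelled context and once to
// completion, returning both results.
func demo() (cancelled, completed error) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // force the ctx.Done() branch
	cancelled = serve(ctx)
	completed = serve(context.Background())
	return cancelled, completed
}

func main() {
	cancelled, completed := demo()
	fmt.Println("cancelled run:", cancelled)
	fmt.Println("completed run:", completed)
}
```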
Sahdev Zala
e745649cce
Merge pull request #10960 from spzala/readmesec
README: update security reference
2019-08-08 13:31:33 -04:00
Gyuho Lee
c4b8ec5369 CHANGELOG: update links, raft updates
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-08 09:20:16 -07:00
Gyuho Lee
bfcd590f05
Merge pull request #11000 from retroflexer/doc-fix-broken-links
doc: Fix broken links referring to readthedocs.io
2019-08-08 09:12:39 -07:00
Gyuho Lee
1c65af7acf
Merge pull request #11006 from gyuho/functional
functional: fix flaky tests
2019-08-08 09:08:23 -07:00
Gyuho Lee
72e00cea3a functional/agent: copy file, instead of renaming
To retain failure logs in CI testing.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-08 08:12:05 -07:00
Gyuho Lee
d1c7be24b0 functional/rpcpb: make client log less verbose
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-08 08:06:48 -07:00
Gyuho Lee
0926a434b7 functional.yaml: try lower snapshot count for flaky tests, error threshold
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-08 08:06:48 -07:00
Tobias Grieger
a41bd303ec
Merge pull request #10998 from tbg/learners-vote
raft: let learners vote
2019-08-08 11:26:17 +02:00
Tobias Schottdorf
1b3e0821a7 raft/tracker: visit Progress in stable order
This is helpful for upcoming testing work which allows datadriven
testing of the interaction of multiple nodes. This testing requires
determinism to work correctly.
2019-08-08 09:37:33 +02:00
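Go randomizes map iteration order, so visiting a map of peers directly produces a different sequence on each run. A minimal sketch of the stable-order idea from 1b3e0821a7 follows; the `progress` type and the `visit` and `visitedIDs` helpers are hypothetical and simplified, not the tracker's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// progress is a hypothetical, pared-down stand-in for per-peer state.
type progress struct {
	Match uint64
}

// visit calls f for every peer in ascending ID order. The IDs are
// collected and sorted first because ranging over the map directly
// would yield an unspecified order.
func visit(m map[uint64]*progress, f func(id uint64, p *progress)) {
	ids := make([]uint64, 0, len(m))
	for id := range m {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	for _, id := range ids {
		f(id, m[id])
	}
}

// visitedIDs returns the IDs in the order visit reaches them.
func visitedIDs(m map[uint64]*progress) []uint64 {
	var out []uint64
	visit(m, func(id uint64, _ *progress) {
		out = append(out, id)
	})
	return out
}

func main() {
	peers := map[uint64]*progress{
		3: {Match: 7},
		1: {Match: 9},
		2: {Match: 8},
	}
	visit(peers, func(id uint64, p *progress) {
		fmt.Printf("peer %d match=%d\n", id, p.Match)
	})
}
```

Sorting on every visit costs an allocation and an O(n log n) sort, which is a reasonable trade for reproducible test output when the peer set is small.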
Jingyi Hu
03bd10076f
Merge pull request #10955 from lzhfromustc/master
Avoid potential double lock of tsafeSet
2019-08-07 15:50:05 -07:00
lzhfromustc
0e7173b447
pkg/types: Avoid potential double lock of tsafeSet.
(tsafeSet).Sub and (tsafeSet).Equals can cause a double-lock bug if ts and other point to the same variable.

gofmt the code and add some comments
2019-08-07 15:08:00 -07:00
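The hazard described in 0e7173b447 arises because taking the same `sync.RWMutex` read lock twice in one goroutine can deadlock if a writer starts waiting between the two acquisitions. One way to avoid it, sketched here with a hypothetical `safeSet` type rather than the real `tsafeSet`, is to detect that both operands are the same object before touching the second lock.

```go
package main

import (
	"fmt"
	"sync"
)

// safeSet is a hypothetical, minimal thread-safe string set.
type safeSet struct {
	mu sync.RWMutex
	m  map[string]struct{}
}

// newSafeSet builds a set from the given values.
func newSafeSet(vals ...string) *safeSet {
	s := &safeSet{m: make(map[string]struct{}, len(vals))}
	for _, v := range vals {
		s.m[v] = struct{}{}
	}
	return s
}

// Equals reports whether both sets hold the same elements.
func (s *safeSet) Equals(other *safeSet) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()

	// If other is the very same set, return before locking again:
	// a second RLock on s.mu could block behind a pending writer,
	// which in turn waits for the read lock already held here.
	if s == other {
		return true
	}

	other.mu.RLock()
	defer other.mu.RUnlock()

	if len(s.m) != len(other.m) {
		return false
	}
	for k := range s.m {
		if _, ok := other.m[k]; !ok {
			return false
		}
	}
	return true
}

func main() {
	a := newSafeSet("x", "y")
	fmt.Println(a.Equals(a))
	fmt.Println(a.Equals(newSafeSet("y", "x")))
	fmt.Println(a.Equals(newSafeSet("x")))
}
```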
Gyuho Lee
158354755a test: output etcd server logs when functional tests fail
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-07 10:16:15 -07:00
Tobias Schottdorf
9553994cd7 raft/quorum: remove unused type 2019-08-07 18:53:01 +02:00
retroflexer
742f928c6a Broken link in runtime-configuration.md
See the issue created here:
https://github.com/etcd-io/etcd/issues/10989#issuecomment-518726038

doc: fix broken links referring to etcd.readthedocs.io

Adding links to internal Documentation within github.com.

Update runtime-configuration.md

Update runtime-configuration.md

Update CHANGELOG-3.3.md

Remove extra space

Keep the formatting similar to original
2019-08-07 10:50:21 -04:00
Tobias Schottdorf
c30c2e345b raft: let learners vote
It turns out that learners must be allowed to cast votes.

This seems counterintuitive but is necessary in the situation in which
a learner has been promoted (i.e. is now a voter) but has not learned
about this yet.

For example, consider a group in which id=1 is a learner and id=2 and
id=3 are voters. A configuration change promoting 1 can be committed on
the quorum `{2,3}` without the config change being appended to the
learner's log. If the leader (say 2) fails, there are de facto two
voters remaining. Only 3 can win an election (due to its log containing
all committed entries), but to do so it will need 1 to vote. But 1
considers itself a learner and will continue to do so until 3 has
stepped up as leader and replicated the conf change to 1, and 1 has applied it.

Ultimately, by receiving a request to vote, the learner realizes that
the candidate believes it to be a voter, and that it should act
accordingly. The candidate's config may be stale, too; but in that case
it won't win the election, at least in the absence of the bug discussed
in:
https://github.com/etcd-io/etcd/issues/7625#issuecomment-488798263.
2019-08-07 12:03:18 +02:00
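The arithmetic behind the example in c30c2e345b can be checked with a short sketch. The `wins` helper is hypothetical and only counts votes against a simple majority; it does not model terms, logs, or the rest of the election protocol.

```go
package main

import "fmt"

// wins reports whether the granted votes form a strict majority of the
// given voter set.
func wins(voters []uint64, granted map[uint64]bool) bool {
	n := 0
	for _, id := range voters {
		if granted[id] {
			n++
		}
	}
	return n >= len(voters)/2+1
}

func main() {
	// The committed configuration after promoting 1 has voters {1, 2, 3}.
	// The old leader 2 has failed, so only 1 and 3 can respond.
	voters := []uint64{1, 2, 3}

	// Candidate 3 votes for itself. If 1 still believes it is a learner
	// and refuses to vote, 3 holds one vote out of three.
	refused := wins(voters, map[uint64]bool{3: true})

	// If 1 grants its vote despite believing it is a learner, 3 holds
	// two votes out of three.
	granted := wins(voters, map[uint64]bool{1: true, 3: true})

	fmt.Println("learner refuses, candidate wins:", refused)
	fmt.Println("learner votes, candidate wins:", granted)
}
```

With three voters the majority is two, so without the vote from 1 no leader can be elected and the conf change that would tell 1 it is a voter can never be replicated to it.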
Jingyi Hu
0d85aa1b41
Merge pull request #10993 from yuzeming/patch-1
integration: fix a data race about `err`
2019-08-06 15:58:21 -07:00
Gyuho Lee
88f4b83ba9 mvcc: fix typo in test
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-06 15:09:05 -07:00
Gyuho Lee
a996a8a912
Merge pull request #10990 from gyuho/grpc
vendor: update gRPC to latest
2019-08-06 15:08:11 -07:00
Gyuho Lee
877aa2497e
Merge pull request #10994 from yuzeming/patch-2
v3rpc: fix a typo `err`
2019-08-06 15:06:27 -07:00
Zeming YU
181419256d integration: fix a data race about err
don't share `err` between goroutines
2019-08-06 14:58:15 -07:00
Zeming YU
3edb569ad3
v3rpc: fix a typo err
don't read return value in child goroutine which causes data race.
2019-08-06 14:04:58 -07:00
Gyuho Lee
017b6c424e stream: Prevent panic when newAttemptLocked fails to get a transport for the new attempt
Testing https://github.com/grpc/grpc-go/pull/2958

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-06 13:08:47 -07:00
Gyuho Lee
f5f400b14a vendor: update gRPC to latest
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-06 10:20:02 -07:00
Gyuho Lee
44a00a33ef
Merge pull request #10987 from wenjiaswe/functional-test-fix
functional: update go.etcd.io/etcd link and go image registry for func…
2019-08-05 22:45:12 -07:00
Wenjia Zhang
f7397d0628 functional: update go.etcd.io/etcd link and go image registry for functional test 2019-08-05 22:19:45 -07:00
Gyuho Lee
bcaaeebc82
Merge pull request #10985 from etcd-io/wenjiaswe-update-functional-readme
functional test: Update functional README.md
2019-08-05 21:45:09 -07:00