Mirroristas/etcd

mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Author	SHA1	Message	Date
Pavel Kalinnikov	68af01ca6e	raft: add MaxInflightBytes to Config This commit introduces the max inflight bytes setting at the Config level, and tests that raft flow control honours it. Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>	2022-11-13 23:05:16 +01:00
Pavel Kalinnikov	8c9c557d85	raft: factor out payloadsSize helper Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>	2022-11-13 23:05:16 +01:00
Pavel Kalinnikov	7bda0d7773	raft/tracker: add MaxInflightBytes to ProgressTracker This commit plumbs the max total byte size of the Inflights type higher up the stack to the ProgressTracker. Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>	2022-11-13 23:05:16 +01:00
Pavel Kalinnikov	bfb7b16f4f	raft/tracker: add byte size limit to Inflights type The Inflights type has limits on the message size and the number of inflight messages. However, a single large entry that exceeds the size limit can still be sent. In combination with the max messages count limit, many large messages can be sent in a row and overflow the receiver. In effect, the "max" values act as "target" rather than hard limits. This commit adds an additional soft limit on the total size of inflight messages, which catches such situations and prevents the receiver overflow. Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>	2022-11-13 23:05:16 +01:00
Nathan VanBenschoten	0f9d7a4f95	raft: make Message.Snapshot nullable, halve struct size This commit makes the rarely used `raftpb.Message.Snapshot` field nullable. In doing so, it reduces the memory size of a `raftpb.Message` message from 264 bytes to 128 bytes, a 52% reduction in size. While this commit does not change the protobuf encoding, it does change how that encoding is used. `(gogoproto.nullable) = false` instructs the generated proto marshaling logic to always encode a value for the field, even if that value is empty. `(gogoproto.nullable) = true` instructs the generated proto marshaling logic to omit an encoded value for the field if the field is nil. This raises compatibility concerns in both directions. Messages encoded by new binary versions without a `Snapshot` field will be decoded as an empty field by old binary versions. In other words, old binary versions can't tell the difference. However, messages encoded by old binary versions with an empty Snapshot field will be decoded as a non-nil, empty field by new binary versions. As a result, new binary versions need to be prepared to handle such messages. While Message.Snapshot is not intentionally part of the external interface of this library, it was possible for users of the library to access it and manipulate it. As such, this change may be considered a breaking change. Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>	2022-11-09 17:35:52 +00:00
Pavel Kalinnikov	1ea13494eb	raft/tracker: rename and comment MsgApp paused field Make the field name and comment clearer on the fact that it's used both in StateProbe and StateReplicate. The old name ProbeSent was slightly confusing, and also triggered thinking that it's used only in StateProbe. Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>	2022-11-08 22:21:39 +00:00
Pavel Kalinnikov	4969aa81ae	raft: send empty appends when replication is paused When Inflights to a particular node is full, i.e. MaxInflightMsgs for the append messages flow is saturated, it is still necessary to continue sending MsgApp to ensure progress. Currently this is achieved by "forgetting" the first in-flight message in the window, which frees up quota for one new MsgApp. This new message is constructed in such a way that it potentially has multiple entries, or a large entry. The effect of this is that the in-flight limitations can be exceeded arbitrarily, for as long as the flow to this node continues being saturated. In particular, if a follower is stuck, the leader will keep sending entries to it. This commit makes the MsgApp empty when Inflights is saturated, and prevents the described leakage of Entries to slow followers. Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>	2022-11-08 22:21:39 +00:00
Pavel Kalinnikov	3bc3d2071e	raft: extract Progress update on MsgApp to a method Previously, Progress update on MsgApp send was scattered across raft.go and tracker/progress.go. This commit better encapsulates this logic in the Progress type. Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>	2022-11-08 22:21:38 +00:00
Pavel Kalinnikov	d5ac7b833f	raft: cleanup maybeSendAppend method - avoid large indented blocks, leave the main block unindented - declare pb.Message inlined in the sending call Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>	2022-11-08 22:21:38 +00:00
Benjamin Wang	f64bed6033	Merge pull request #14698 from ahrtr/raft_warn_20221107 raft: change the log from debug to warning when uncommitted size exceeds threshold	2022-11-07 19:57:33 +08:00
Benjamin Wang	3e07097d77	Merge pull request #14545 from nvanbenschoten/nvanbenschoten/simplifyAutoLeave raft: simplify auto-leave joint config on entry application logic	2022-11-07 17:20:26 +08:00
Benjamin Wang	a671e3ebd1	raft: change the log from debug to warning when uncommitted size exceeds max threshold Signed-off-by: Benjamin Wang <wachao@vmware.com>	2022-11-07 17:17:48 +08:00
王霄霄	aac5feec94	raft: remove duplicate letter in comment. Signed-off-by: Wang Xiaoxiao 1141195807@qq.com Signed-off-by: 王霄霄 <1141195807@qq.com>	2022-10-22 19:13:40 +08:00
Nathan VanBenschoten	419ee8a9c6	raft: panic on self-addressed messages These are nonsensical and a network implementation is not required to handle them correctly, so panic instead of sending them out. Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>	2022-10-06 20:25:07 -04:00
Nathan VanBenschoten	c50e728518	raft: simplify auto-leave joint config on entry application logic This commit simplifies the logic added in 37c7e4d to auto-leave joint configurations. It does so by making the following adjustments to the code: - remove the `oldApplied <= r.pendingConfIndex` condition. This does not seem necessary. When a node first attempts to auto-leave a joint config, it will bump `r.pendingConfIndex` when proposing. In cases where `oldApplied >= r.pendingConfIndex`, the proposal must have already been applied. Reviewers should double check this. - use raft.Step instead of custom proposal code. This code was already present in stepLeader, so there was no reason to duplicate it. This would have avoided bugs like the one we fixed in #14538. - use `confChangeToMsg` to generate message, to centralize the creation of all `MsgProp{EntryConfChange}` messages. Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>	2022-10-03 02:11:56 -04:00
Nathan VanBenschoten	bd34388721	raft: broadcast MsgApp on auto-leave joint config proposal This commit ensures that the raft leader eagerly broadcasts a MsgApp to each follower when initiating an automatic transition out of a joint configuration. This had been missed previously, which could lead to delayed completion of an auto-transition. Signed-off-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>	2022-09-29 12:33:20 -04:00
Benjamin Wang	31d9664cb5	Merge pull request #14413 from tbg/raft-single-voter raft: don't emit unstable CommittedEntries	2022-09-22 08:43:37 +08:00
Tobias Grieger	9ad36eecab	fixup! address comments Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>	2022-09-20 09:01:42 +02:00
Tobias Grieger	3c3e30a30e	Revert "raft: directly update leader in advance" This reverts commit d73a986e4edb15ef9dbfc994f1cbf5e96694d877, which was added only for benchmarking purposes. Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>	2022-09-20 09:01:42 +02:00
Tobias Grieger	67c3522893	raft: directly update leader in advance This makes the alternative option of implementing the leader's self-ack of entry append the default. Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>	2022-09-20 09:01:42 +02:00
Tobias Grieger	169f4c3cc7	raft: don't emit unstable CommittedEntries See https://github.com/etcd-io/etcd/issues/14370. When run in a single-voter configuration, prior to this PR raft would emit `HardState`s that would emit a proposed `Entry` simultaneously in `CommittedEntries` and `Entries`. To be correct, this requires users of the raft library to enforce an ordering between appending to the log and notifying the client about `CommittedEntries` also present in `Entries`. This was easy to miss. Walk back this behavior to arrive at a simpler contract: what's emitted in `CommittedEntries` is truly committed, i.e. present in stable storage on a quorum of voters. This in turn pessimizes the single-voter case: rather than fully handling an `Entry` in just one `Ready`, now two are required, and in particular one has to do extra work to save on allocations. We accept this as a good tradeoff, since raft primarily serves multi-voter configurations. Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>	2022-09-20 08:59:37 +02:00
Tobias Grieger	3ad363d070	raft: always mark leader as RecentActive RecentActive is now initialized to true in `becomeLeader`. Both configuration changes and CheckQuorum make sure not to break this, so we now know that the leader is always RecentActive. Signed-off-by: Tobias Grieger <tobias.b.grieger@gmail.com>	2022-09-20 08:59:37 +02:00
demoManito	a9c3d56508	etcd: remove redundant type conversion Signed-off-by: demoManito <1430482733@qq.com>	2022-09-20 11:26:02 +08:00
Abirdcfly	08a9d1da07	chore: remove duplicate word in comments Signed-off-by: Abirdcfly <fp544037857@gmail.com>	2022-08-27 13:39:48 +08:00
howz97	f9c9bfa44c	fix comment in raft.go	2022-04-02 14:27:33 +08:00
dengziming	a286f5bb99	MINOR: Fix typos(hearbeat -> heartbeat)	2021-08-07 11:41:13 +08:00
Lili Cosic	ddd390af01	raft/raft.go: Log unhandled errors	2021-06-02 11:41:26 +02:00
wpedrak	758ff0163c	raft: postpone MsgReadIndex until first commit in the term Fixes #12680	2021-03-23 12:28:42 +01:00
Piotr Tabor	87258efd90	Integration tests: Use zaptest.Logger based testing.TB Thanks to this the logs: - are automatically printed if the test fails. - are in pretty consistent format. - are annotated by 'member' information of the cluster emitting them. Side changes: - Set proper default for DefaultWarningApplyDuration (used to be '0') - Name the members based on their 'place' on the list (as opposed to 'random')	2021-03-09 18:19:51 +01:00
Tobias Grieger	73c50b869a	Merge pull request #12637 from BusyJay/check-outgoingvoters-when-restoring raft: check `VotersOutgoing` for snapshot	2021-02-16 09:43:08 +01:00
Tobias Grieger	c1e8d3a63f	Clarify documentation of probing - Add a large detailed comment about the use and necessity of both the follower and leader probing optimization - fix the log message in stepLeader that previously mixed up the log term for the rejection and the index of the append - improve the test via subtests - add some verbiage in findConflictByTerm around first index	2021-02-15 09:47:18 +01:00
qupeng	6828517965	raft: implement fast log rejection Signed-off-by: qupeng <qupeng@pingcap.com>	2021-02-10 15:48:32 -05:00
Jay Lee	f947c815d0	raft: check `VotersOutgoing` for snapshot Close #12631. Signed-off-by: Jay Lee <BusyJayLee@gmail.com>	2021-01-21 16:09:37 +08:00
Piotr Tabor	5472b3336b	Merge pull request #12525 from sakateka/remove_raft.peers raft tests: Remove Config.peers and Config.learners	2021-01-19 16:00:54 +01:00
Sergey Kacheev	ccfd00f687	raft: specify voters and learners via snapshot	2021-01-16 13:03:47 +07:00
Piotr Tabor	bf6f173d5e	Document Raft.send method. The change makes it explicit that sending messages does not happen immediately and is subject to proper persist & then send protocol on the application side. See: https://github.com/etcd-io/etcd/issues/12589#issuecomment-752867024 for more context.	2021-01-15 12:35:58 +01:00
Piotr Tabor	e62417297d	*: Rename of imports of raft (as it's now a module) % find -name '*.go' -o -name '*.md' -o -name '*.sh' | xargs sed -i --follow-symlinks 's|etcd/v3/raft|etcd/raft/v3|g'	2020-10-16 13:58:18 +02:00
Jay	26b89fd418	raft: don't campaign with pending snapshot (#12163) Signed-off-by: Jay Lee <BusyJayLee@gmail.com>	2020-07-26 00:04:46 -07:00
Jay	d0e4fe56a5	raft: check pending conf change before campaign (#12134) * raft: check conf change before campaign Signed-off-by: Jay Lee <BusyJayLee@gmail.com> * raft: extract hup function Signed-off-by: Jay Lee <BusyJayLee@gmail.com> * raft: check pending conf change for transferleader Signed-off-by: Jay Lee <BusyJayLee@gmail.com>	2020-07-22 17:04:48 -07:00
Jay	cc656718fa	raft: correct pendingConfIndex check for AutoLeave (#12137) Close #12136 Signed-off-by: Jay Lee <BusyJayLee@gmail.com>	2020-07-20 16:49:22 -07:00
Zhihong Yu	7cc2f8a411	raft: break out of nested loop when id is found (#11870) Signed-off-by: Ted Yu <yuzhihong@gmail.com>	2020-05-12 16:59:22 -07:00
Brandon Philips	96cce208c2	go.mod: use go.etcd.io/etcd/v3 versioning This change makes the etcd package compatible with the existing Go ecosystem for module versioning. Used this tool to update package imports: https://github.com/KSubedi/gomove	2020-04-28 00:57:35 +00:00
Fullstop000	7eae024ead	raft: only redirect msg produced by own node (#11466) Signed-off-by: Fullstop000 <fullstop1005@gmail.com>	2020-04-06 20:27:46 -07:00
qupeng	6f850a65a1	raft: cleanup read index code (#11528) Signed-off-by: qupeng <qupeng@pingcap.com>	2020-03-03 09:20:25 -08:00
Tobias Schottdorf	0544f33248	raft: clarify ApplyConfChange contract for rejected conf changes Apps typically maintain the raft configuration as part of the state machine. As a result, they want to be able to reject configuration change entries at apply time based on the state on which the entry is supposed to be applied. When this happens, the app should not call ApplyConfChange, but the comments did not make this clear. As a result, it was tempting to pass an empty pb.ConfChange or its V2 version instead of not calling ApplyConfChange. However, empty V1 and V2 protos aren't noops when the configuration is joint: an empty V1 change is treated internally as a single configuration change for NodeID zero and will cause a panic when applied in a joint state. An empty V2 proto is treated as a signal to leave a joint state, which means that the app's config and raft's would diverge. The comments updated in this commit now ask users to not call ApplyConfChange when they reject a conf change. Apps that never use joint consensus can keep their old behavior since the distinction only matters when in a joint state, but we don't want to encourage that.	2020-02-25 12:45:45 +01:00
Tobias Schottdorf	37c7e4d1d8	raft: fix auto-transitioning out of joint config The code doing so was undertested and buggy: it would launch multiple attempts to transition out when the conf change was not the last element in the log. This commit fixes the problem and adds a regression test. It also reworks the code to handle a former untested edge case, in which the auto-transition append is refused. This can't happen any more with the current version of the code because this proposal has size zero and is special cased in increaseUncommittedSize. Last but not least, the auto-leave proposal now also bumps pendingConfIndex, which was not done previously due to an oversight.	2020-02-25 12:35:51 +01:00
qupeng	eaa0612e02	raft: abort leader transferring if the target is demoted (#11417) Signed-off-by: qupeng <qupeng@pingcap.com>	2019-12-20 12:07:52 +08:00
Wine93	5f42161750	raft: fixed some typos and simplify minor logic	2019-08-25 04:46:29 +00:00
Tobias Schottdorf	306e75a96f	raft: add a batch of interaction-driven conf change tests Verify the behavior in various v1 and v2 conf change operations. This also includes various fixups, notably it adds protection against transitioning in and out of new configs when this is not permissible. There are more threads to pull, but those are left for future commits.	2019-08-16 09:38:44 +02:00
Tobias Schottdorf	4e19150676	raft: proactively probe newly added followers When the leader applied a new configuration that added voters, it would not immediately probe these voters, delaying when they would be caught up. I noticed this while writing an interaction-driven test, which has now been cleaned up and completed.	2019-08-14 20:53:34 +02:00

424 Commits