Mirroristas/etcd

mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Author	SHA1	Message	Date
Fu Wei	d63ca43092	Merge 4db8df677c618b462145fce7cb926c072a0ce932 into c86c93ca2951338115159dcdd20711603044e1f1	2024-09-25 21:36:55 -07:00
redwrasse	d4df7a902e	Replaces a number of error equality checks with errors.Is Signed-off-by: redwrasse <mail@redwrasse.io>	2024-09-03 16:02:24 -07:00
Benjamin Wang	b8b0cf83d1	Skip leadership check if the etcd instance is actively processing heartbeats Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>	2024-08-09 17:02:02 +01:00
Clement	d820cd2b56	etcdserver: change the snapshot + compact into sync operation Signed-off-by: Clement <gh.2lgqz@aleeas.com>	2024-07-05 01:27:30 +08:00
Baek	60e3f45469	Adds all feature_gate from component-base. We'll likely use most of the feature_gate package from component-base. Also this commit moves the pkg from server/internal/pkg to pkg/. Signed-off-by: Baek <seungtackbaek@google.com>	2024-06-15 05:34:58 +00:00
Baek	69ebaaebca	featuregate: adds EtcdServer.FeatureEnabled interface. The interface can be used throughout the etcd server binary to check if the feature is enabled or not. Note that this commit also copies necessary FeatureGate interface from k8s component-base. Signed-off-by: Baek <seungtackbaek@google.com>	2024-06-15 05:34:58 +00:00
Max Neverov	c64c996c03	Revert quorum calculation: `(active - 1) < 1+((len(m)-1)/2)` calculates quorum after a member is deleted. Signed-off-by: Max Neverov <neverov.max@gmail.com>	2024-04-17 07:55:24 +02:00
Max Neverov	3b16aae947	Fix remove member failed. Signed-off-by: Max Neverov <neverov.max@gmail.com>	2024-04-17 07:55:24 +02:00
Ivan Valdes	14523bdc21	etcdserver: rename MemberId() to MemberID() to address var-naming Signed-off-by: Ivan Valdes <ivan@vald.es>	2024-03-18 17:18:29 -07:00
Ivan Valdes	c613b78e6c	etcdserver: address golangci var-naming issues Signed-off-by: Ivan Valdes <ivan@vald.es>	2024-03-18 17:17:07 -07:00
Siyuan Zhang	3565a822de	Add VerifyTxConsistency to backend. Signed-off-by: Siyuan Zhang <sizhang@google.com> Update server/storage/backend/verify.go Co-authored-by: Benjamin Wang <benjamin.wang@broadcom.com> Update server/storage/backend/verify.go Co-authored-by: Benjamin Wang <benjamin.wang@broadcom.com>	2024-02-22 11:31:16 -08:00
Ishan Tyagi	16a5e1da71	Added an error log when the learner is not in sync with the etcd leader. Signed-off-by: ishan16696 <ishan.tyagi@sap.com>	2024-01-30 15:42:11 +05:30
YaoC	f7ab7adf29	server: fix learner metric incorrect issue Signed-off-by: YaoC <chengyao09@hotmail.com>	2024-01-12 09:36:33 +00:00
Marek Siarkowicz	a2eb17c809	Merge pull request #17199 from serathius/dont-flock Don't flock snapshot files	2024-01-08 15:03:29 +01:00
Marek Siarkowicz	3471ef133d	Add an e2e test and robustness failpoint around recovering from snapshot backend Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2024-01-04 15:25:24 +01:00
Marek Siarkowicz	7f8346b3f2	Don't flock snapshot files Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2024-01-04 14:53:44 +01:00
Marek Siarkowicz	1e8d66ef95	Add beforeOpenSnapshotBackend failpoint Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2023-12-20 15:36:54 +01:00
Benjamin Wang	67f17166bf	Safeguard lease operations by double checking the leadership 1. ignore old leader's lease revoking requests 2. double check current member's leadership before performing lease renew request 3. etcdserver: ensure current member's leadership before performing lease checkpoint request Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>	2023-12-15 17:53:36 +00:00
Benjamin Wang	36b2523669	added some log messages for better diagnosis Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>	2023-12-13 18:43:22 +00:00
Neil Shen	fb769c4306	server: ignore raft messages if member id mismatch Ignore Raft messages when the `To` field mismatches the local member ID. In cases where incorrect Raft messages are dispatched, potentially due to a malfunctioning switch, this proactive check prevents panics, such as "tocommit is out of range". Signed-off-by: Neil Shen <overvenus@gmail.com>	2023-12-07 11:57:45 +08:00
Marek Siarkowicz	bc697bc26e	Revert "Switch to validating v3 when v2 and v3 are synchronized" This reverts commit 4fe46f92030e4381e6f9bf95adbb22a08282d297. Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2023-12-03 18:12:09 +01:00
Marek Siarkowicz	03d551243b	Merge pull request #17015 from serathius/extract-membership-applier Extract membership applier	2023-11-27 19:59:21 +01:00
Marek Siarkowicz	4fe46f9203	Switch to validating v3 when v2 and v3 are synchronized Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2023-11-24 17:46:33 +01:00
Marek Siarkowicz	2ad21558ac	Remove shouldApplyV3 from the v3 applier Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2023-11-24 16:13:25 +01:00
Marek Siarkowicz	d22c00ccee	Extract membership applier Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2023-11-24 15:57:15 +01:00
Marek Siarkowicz	7fdb33065d	Move duplicated shouldApplyV3 logic up into apply method Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2023-11-24 10:21:14 +01:00
Marek Siarkowicz	093666f450	Cleanup v2 applier Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2023-11-23 15:41:13 +01:00
Marek Siarkowicz	c72ff1e69c	Remove syncing the v2 store TTLs Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2023-11-23 14:55:01 +01:00
Marek Siarkowicz	dd7a4d28a8	Remove code used to make v2 proposals Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2023-11-19 22:39:33 +01:00
Marek Siarkowicz	b4fd31f254	Remove code for setting cluster version via V2 API Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2023-11-19 15:28:52 +01:00
Chao Chen	1324f03254	add existing http health check handler e2e test Signed-off-by: Chao Chen <chaochn@amazon.com>	2023-10-18 12:42:23 -07:00
Benjamin Wang	628b45c099	test: add a test case to verify consistent memberlist on bootstrap Signed-off-by: Benjamin Wang <wachao@vmware.com>	2023-09-28 20:04:47 +01:00
Wei Fu	aa97484166	*: enable goimports in verify-lint Signed-off-by: Wei Fu <fuweid89@gmail.com>	2023-09-21 21:14:09 +08:00
chenyahui	c0aa3b613b	Use any instead of interface{} Signed-off-by: chenyahui <cyhone@qq.com>	2023-09-17 17:41:58 +08:00
Geeta Gharpure	8729417cee	Preserve the order of steps done for snapshot Signed-off-by: Geeta Gharpure <geetagh@amazon.com>	2023-08-22 19:12:37 +00:00
Geeta Gharpure	59332dc194	Update to generate v2 snapshot from v3 state Signed-off-by: Geeta Gharpure <geetagh@amazon.com>	2023-08-21 19:18:11 +00:00
Jes Cok	52748f60f3	all: stop using math/rand.Seed Fixes #16428. Signed-off-by: Jes Cok <xigua67damn@gmail.com>	2023-08-20 16:34:44 +08:00
Wei Fu	4db8df677c	feature: add new compactor based revision count
What would you like to be added?
Add a new compactor based on revision count instead of a fixed time interval. To make this possible, the mvcc store needs to export a `CompactNotify` function that notifies the compactor once the configured number of write transactions have occurred since the previous compaction. The new compactor can then pick up the revision change and delete out-of-date data promptly instead of waiting for a fixed time interval, so the underlying bbolt db can reuse the free pages as soon as possible.
Why is this needed?
In a Kubernetes cluster running, for instance, Argo Workflows, there are batches of requests to create pods, followed by many PATCH requests for pod status, especially when a pod has more than 3 containers. If such a burst grows the db size in a short time, it can easily exceed the max quota size. The cluster admin then has to step in and defrag, which may cause long downtime. So we hope etcd can delete out-of-date data as soon as possible and slow down the growth of the total db size.
Currently, both the revision and periodic compactors are based on time. A fixed time interval does not cope well with unexpected bursts of update requests. The new compactor based on revision count can make the admin's life easier. For instance, say the average object size is 50 KiB and the new compactor compacts every 10,000 revisions. etcd then compacts after roughly 500 MiB of new data comes in, no matter how long it takes to accumulate 10,000 new revisions, so it can handle bursts of update requests well.
There are some test results:
* Fixed value size: 10 KiB, Update Rate: 100/s, Total key space: 3,000
```
benchmark put --rate=100 --total=300000 --compact-interval=0 \
  --key-space-size=3000 --key-size=256 --val-size=10240
```
| Compactor | DB Total Size | DB InUse Size |
| -- | -- | -- |
| Revision(5min,retention:10000) | 570 MiB | 208 MiB |
| Periodic(1m) | 232 MiB | 165 MiB |
| Periodic(30s) | 151 MiB | 127 MiB |
| NewRevision(retention:10000) | 195 MiB | 187 MiB |
* Random value size: [9 KiB, 11 KiB], Update Rate: 150/s, Total key space: 3,000
```
benchmark put --rate=150 --total=300000 --compact-interval=0 \
  --key-space-size=3000 --key-size=256 --val-size=10240 \
  --delta-val-size=1024
```
| Compactor | DB Total Size | DB InUse Size |
| -- | -- | -- |
| Revision(5min,retention:10000) | 718 MiB | 554 MiB |
| Periodic(1m) | 297 MiB | 246 MiB |
| Periodic(30s) | 185 MiB | 146 MiB |
| NewRevision(retention:10000) | 186 MiB | 178 MiB |
* Random value size: [6 KiB, 14 KiB], Update Rate: 200/s, Total key space: 3,000
```
benchmark put --rate=200 --total=300000 --compact-interval=0 \
  --key-space-size=3000 --key-size=256 --val-size=10240 \
  --delta-val-size=4096
```
| Compactor | DB Total Size | DB InUse Size |
| -- | -- | -- |
| Revision(5min,retention:10000) | 874 MiB | 221 MiB |
| Periodic(1m) | 357 MiB | 260 MiB |
| Periodic(30s) | 215 MiB | 151 MiB |
| NewRevision(retention:10000) | 182 MiB | 176 MiB |
For burst requests, we need to use a short periodic interval. Otherwise, the total size will be large. I think the new compactor can handle it well.
Additional Change: Currently, the quota system only checks the DB total size. However, there could be a lot of free pages that can be reused for upcoming requests. Based on this proposal, I also want to extend the current quota system with the DB's InUse size. If the InUse size is less than the max quota size, we should allow requests to update.
Since bbolt might still be resized if no contiguous free pages are available, we should set up a hard limit for the overflow, such as 1 GiB.
```diff
// Quota represents an arbitrary quota against arbitrary requests. Each request
@@ -130,7 +134,17 @@ func (b *BackendQuota) Available(v interface{}) bool {
 		return true
 	}
 	// TODO: maybe optimize Backend.Size()
-	return b.be.Size()+int64(cost) < b.maxBackendBytes
+
+	// Since the compact comes with allocatable pages, we should check the
+	// SizeInUse first. If there is no continuous pages for key/value and
+	// the boltdb continues to resize, it should not increase more than 1
+	// GiB. It's hard limitation.
+	//
+	// TODO: It should be enabled by flag.
+	if b.be.Size()+int64(cost)-b.maxBackendBytes >= maxAllowedOverflowBytes(b.maxBackendBytes) {
+		return false
+	}
+	return b.be.SizeInUse()+int64(cost) < b.maxBackendBytes
 }
```
It also becomes more likely that the NOSPACE alarm can be avoided if compaction frees up many more pages, which can reduce downtime. Signed-off-by: Wei Fu <fuweid89@gmail.com>	2023-08-16 23:35:08 +08:00
Chao Chen	6cdc9ae4fe	server/etcdserver/raft.go: 1. rename confChangeCh to raftAdvancedC 2. rename waitApply to confChanged 3. add comments and test assertion Signed-off-by: Chao Chen <chaochn@amazon.com>	2023-06-26 22:42:44 -07:00
Benjamin Wang	ad3b6ee4c6	etcdserver: wait until raft is notified of confChange before responding to client Signed-off-by: Benjamin Wang <wachao@vmware.com>	2023-06-26 13:40:51 -07:00
Geeta Gharpure	550aa152a7	Verify consistent index is latest at the time of snapshot Signed-off-by: Geeta Gharpure <geetagh@amazon.com>	2023-06-19 16:00:04 +00:00
Chao Chen	f31d0eafb9	tests/e2e: add graceful shutdown test Signed-off-by: Chao Chen <chaochn@amazon.com>	2023-05-09 17:08:53 -07:00
Chao Chen	caed563e08	fix flaking auth member remove test Signed-off-by: Chao Chen <chaochn@amazon.com>	2023-04-03 17:41:08 -07:00
Wei Fu	22bdc91302	server/etcdserver: add log for terminating monitors Adding log for terminating monitors is to make the debug easier. Signed-off-by: Wei Fu <fuweid89@gmail.com>	2023-03-11 15:07:17 +08:00
James Blair	275e10bcf7	Return default snapshot count to 10,000. The huge (100k+) value was justified when storev2 was being dumped completely with every snapshot. With storev2 being decommissioned we can checkpoint more frequently for faster recovery. Signed-off-by: James Blair <mail@jamesblair.net>	2023-03-06 20:21:03 +13:00
guozhao	de8d6b3792	etcdserver: use time.Ticker instead of time.After Using time.After creates a new Timer in each cycle. In these cases, it is better to use time.Ticker. Signed-off-by: guozhao <guozhao@360.cn>	2023-01-17 16:58:13 +08:00
Benjamin Wang	8ed20e85d2	etcdserver: return membership.ErrIDNotFound when the memberID not found When promoting a learner, we need to wait until the leader's applied ID catches up to the commitId. Afterwards, check whether the learner ID exists, and return `membership.ErrIDNotFound` directly in the API if the member ID is not found, to avoid the request being unnecessarily delivered to raft. Signed-off-by: Benjamin Wang <wachao@vmware.com>	2023-01-17 06:18:15 +08:00
Piotr Tabor	6f899a7b40	Merge pull request #15052 from ptabor/20221228-goimports-fix ./scripts/fix.sh: Takes care of goimports across the whole project.	2022-12-29 11:31:22 +01:00
Piotr Tabor	9e1abbab6e	Fix goimports in all existing files. Execution of ./scripts/fix.sh Signed-off-by: Piotr Tabor <ptab@google.com>	2022-12-29 09:41:31 +01:00
KiloG	101a2a61ea	etcdserver: fix typo in comment	2022-12-28 18:41:08 +08:00
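
The quota check proposed in commit 4db8df677c above can be tried out as a standalone Go program. This is a minimal sketch under stated assumptions, not the etcd implementation: `backendStats` and `available` are hypothetical stand-ins for the backend and for `BackendQuota.Available`, and `maxAllowedOverflowBytes` is assumed to return a fixed 1 GiB, because the commit message names that helper without defining it.

```go
package main

import "fmt"

const gib = int64(1) << 30

// backendStats is a hypothetical stand-in for the two sizes the
// commit message reads from the backend.
type backendStats struct {
	size      int64 // total bytes of the db file, free pages included
	sizeInUse int64 // bytes actually in use
}

// maxAllowedOverflowBytes is an assumption: the commit message only
// names this helper, so a fixed 1 GiB cap is used here.
func maxAllowedOverflowBytes(maxBackendBytes int64) int64 {
	return gib
}

// available mirrors the logic of the diff: reject the request if the
// file would overflow the quota by the hard limit or more, otherwise
// compare the in-use size (not the file size) against the quota.
func available(be backendStats, cost, maxBackendBytes int64) bool {
	if be.size+cost-maxBackendBytes >= maxAllowedOverflowBytes(maxBackendBytes) {
		return false
	}
	return be.sizeInUse+cost < maxBackendBytes
}

func main() {
	quota := 2 * gib
	cost := int64(1024)

	// File has grown past the quota, but most pages are free: allowed.
	fmt.Println(available(backendStats{size: quota + gib/2, sizeInUse: gib}, cost, quota))
	// File overflow reaches the 1 GiB hard limit: rejected.
	fmt.Println(available(backendStats{size: 3 * gib, sizeInUse: gib}, cost, quota))
	// In-use size itself is at the quota: rejected.
	fmt.Println(available(backendStats{size: quota, sizeInUse: quota}, cost, quota))
}
```

Under the old check, which compared only the file size with the quota, the first case would have been rejected as well, since the file is already larger than the quota.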

160 Commits