Mirroristas/etcd

mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Author	SHA1	Message	Date
Wei Fu	4db8df677c	feature: add new compactor based revision count

What would you like to be added?

Add a new compactor based on revision count instead of a fixed time interval.

To make this happen, the mvcc store needs to export a `CompactNotify` function that notifies the compactor when the configured number of write transactions have occurred since the previous compaction. The new compactor can then pick up the revision change and delete out-of-date data promptly, instead of waiting for a fixed interval. The underlying bbolt db can reuse the free pages as soon as possible.

Why is this needed?

In a Kubernetes cluster running, for instance, Argo Workflows, there will be batch requests to create pods, followed by a lot of PATCH requests for pod status, especially when a pod has more than 3 containers. If the burst of requests increases the db size in a short time, it can easily exceed the max quota size. The cluster admin then has to get involved to defrag, which may cause long downtime. So we hope that ETCD can delete out-of-date data as soon as possible and slow down the growth of the total db size.

Currently, both the revision and periodic compactors are based on time. A fixed time interval does not cope well with unexpected bursts of update requests. The new compactor based on revision count can make the admin's life easier. For instance, let's say that the average object size is 50 KiB and the new compactor compacts every 10,000 revisions. ETCD then compacts after roughly 500 MiB of new data has come in, no matter how long it takes to accumulate 10,000 new revisions. It can handle bursts of update requests well.

There are some test results:

* Fixed value size: 10 KiB, Update Rate: 100/s, Total key space: 3,000

```
benchmark put --rate=100 --total=300000 --compact-interval=0 \
  --key-space-size=3000 --key-size=256 --val-size=10240
```

| Compactor | DB Total Size | DB InUse Size |
| -- | -- | -- |
| Revision(5min,retention:10000) | 570 MiB | 208 MiB |
| Periodic(1m) | 232 MiB | 165 MiB |
| Periodic(30s) | 151 MiB | 127 MiB |
| NewRevision(retention:10000) | 195 MiB | 187 MiB |

* Random value size: [9 KiB, 11 KiB], Update Rate: 150/s, Total key space: 3,000

```
benchmark put --rate=150 --total=300000 --compact-interval=0 \
  --key-space-size=3000 --key-size=256 --val-size=10240 \
  --delta-val-size=1024
```

| Compactor | DB Total Size | DB InUse Size |
| -- | -- | -- |
| Revision(5min,retention:10000) | 718 MiB | 554 MiB |
| Periodic(1m) | 297 MiB | 246 MiB |
| Periodic(30s) | 185 MiB | 146 MiB |
| NewRevision(retention:10000) | 186 MiB | 178 MiB |

* Random value size: [6 KiB, 14 KiB], Update Rate: 200/s, Total key space: 3,000

```
benchmark put --rate=200 --total=300000 --compact-interval=0 \
  --key-space-size=3000 --key-size=256 --val-size=10240 \
  --delta-val-size=4096
```

| Compactor | DB Total Size | DB InUse Size |
| -- | -- | -- |
| Revision(5min,retention:10000) | 874 MiB | 221 MiB |
| Periodic(1m) | 357 MiB | 260 MiB |
| Periodic(30s) | 215 MiB | 151 MiB |
| NewRevision(retention:10000) | 182 MiB | 176 MiB |

For burst requests, we need to use a short periodic interval. Otherwise, the total size will be large. I think the new compactor can handle it well.

Additional Change:

Currently, the quota system only checks the DB total size. However, there could be a lot of free pages which can be reused for upcoming requests. Based on this proposal, I also want to extend the current quota system with the DB's InUse size.

If the InUse size is less than the max quota size, we should allow requests to update. Since the bbolt file might still be resized if no contiguous free pages are available, we should set up a hard limit for the overflow, like 1 GiB.

```diff
 // Quota represents an arbitrary quota against arbitrary requests. Each request
@@ -130,7 +134,17 @@ func (b *BackendQuota) Available(v interface{}) bool {
 		return true
 	}
 	// TODO: maybe optimize Backend.Size()
-	return b.be.Size()+int64(cost) < b.maxBackendBytes
+
+	// Since the compact comes with allocatable pages, we should check the
+	// SizeInUse first. If there is no continuous pages for key/value and
+	// the boltdb continues to resize, it should not increase more than 1
+	// GiB. It's hard limitation.
+	//
+	// TODO: It should be enabled by flag.
+	if b.be.Size()+int64(cost)-b.maxBackendBytes >= maxAllowedOverflowBytes(b.maxBackendBytes) {
+		return false
+	}
+	return b.be.SizeInUse()+int64(cost) < b.maxBackendBytes
 }
```

And the NOSPACE alarm is likely to be disabled if the compaction can free up many more pages, which can reduce downtime.

Signed-off-by: Wei Fu <fuweid89@gmail.com>	2023-08-16 23:35:08 +08:00
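The two-step quota check in the diff of commit 4db8df677c can be tried out in isolation. The following is only a sketch under stated assumptions: it works on plain integers instead of the backend, and `maxAllowedOverflow` is a hypothetical stand-in for `maxAllowedOverflowBytes`, whose body the diff does not show; a flat 1 GiB cap is assumed, following the commit message.

```go
package main

import "fmt"

const gib = int64(1) << 30

// maxAllowedOverflow is a hypothetical stand-in for maxAllowedOverflowBytes.
// The diff does not show its body; a flat 1 GiB cap is assumed here.
func maxAllowedOverflow(maxBackendBytes int64) int64 {
	return gib
}

// available mirrors the check in the diff: first reject the request if the
// total file size would overshoot the quota by the overflow cap, then compare
// the in-use size (not the total size) against the quota.
func available(size, sizeInUse, cost, maxBackendBytes int64) bool {
	if size+cost-maxBackendBytes >= maxAllowedOverflow(maxBackendBytes) {
		return false
	}
	return sizeInUse+cost < maxBackendBytes
}

func main() {
	quota := 2 * gib
	cost := int64(4096)

	// File is 512 MiB over quota, but compaction left only 1 GiB in use.
	fmt.Println(available(quota+gib/2, gib, cost, quota)) // true
	// File is 1 GiB over quota: the hard overflow limit is reached.
	fmt.Println(available(3*gib, gib, cost, quota)) // false
	// In-use size alone already fills the quota.
	fmt.Println(available(quota, quota, cost, quota)) // false
}
```

The first case is the one the old total-size check would have rejected, which is the behavior change the commit message argues for.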
Chao Chen	e6c8bf82e0	add uds test cases into e2e TestAuthority Signed-off-by: Chao Chen <chaochn@amazon.com>	2023-06-08 15:54:30 -07:00
AngstyDuck	a7344da7d3	server: default value for config file field auto-compaction-mode is now 'periodic'; added additional checks if auto-compaction-mode is undefined Signed-off-by: AngstyDuck <solsticedante@gmail.com>	2023-05-09 23:10:44 +08:00
Marek Siarkowicz	549087cd69	server: Fix defer function closure escape Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2023-03-30 13:37:31 +02:00
Marek Siarkowicz	bf12179a5a	server: Add --listen-client-http-urls flag to allow running grpc server separate from http server The difference in load configuration for the watch delay tests shows how huge the impact is. Even with the random write scheduler, grpc under the http server can only handle 500 KB with a 2 second delay. On the other hand, a separate grpc server easily hits 10, 100 or even 1000 MB within 100 milliseconds. The priority write scheduler that was used in most previous releases is far worse than the random one. Tests are configured to only 5 MB to avoid flakes and taking too long to fill etcd. Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2023-03-30 09:49:45 +02:00
Marek Siarkowicz	419a56e51a	server: Pick one address that all grpc gateways connect to Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2023-03-30 09:49:45 +02:00
Marek Siarkowicz	d1f674d624	server: Extract resolveUrl helper function Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2023-03-30 09:49:43 +02:00
Marek Siarkowicz	85c48c4a60	server: Separate client listener grouping from serving Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2023-03-30 09:48:46 +02:00
Marek Siarkowicz	372042c374	refactor: Use proper variable names for urls Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2023-03-13 14:48:01 +01:00
Tero Saarni	588b98d085	Add TLSv1.3 support. Added optional TLS min/max protocol version and command line switches to set versions for the etcd server. If max version is not explicitly set by the user, let Go select the max version which is currently TLSv1.3. Previously max version was set to TLSv1.2. Signed-off-by: Tero Saarni <tero.saarni@est.tech>	2023-01-30 16:16:53 +02:00
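The min/max behavior described in commit 588b98d085 can be sketched with Go's `crypto/tls`, where a zero `MaxVersion` leaves the choice to the standard library. This is an illustrative sketch only: `parseVersion` and `newTLSConfig` are hypothetical helpers, and the accepted strings (`TLS1.2`, `TLS1.3`) are assumptions rather than values taken from the commit.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// parseVersion maps a version string to a crypto/tls constant.
// An empty string means "not set" and yields 0. The accepted spellings
// are an assumption for this sketch.
func parseVersion(s string) (uint16, error) {
	switch s {
	case "":
		return 0, nil
	case "TLS1.2":
		return tls.VersionTLS12, nil
	case "TLS1.3":
		return tls.VersionTLS13, nil
	}
	return 0, fmt.Errorf("unsupported TLS version %q", s)
}

// newTLSConfig builds a config with optional min/max versions. When maxVer
// is empty, MaxVersion stays 0 and Go selects the highest version it supports.
func newTLSConfig(minVer, maxVer string) (*tls.Config, error) {
	lo, err := parseVersion(minVer)
	if err != nil {
		return nil, err
	}
	hi, err := parseVersion(maxVer)
	if err != nil {
		return nil, err
	}
	if hi != 0 && lo > hi {
		return nil, fmt.Errorf("min version %q is above max version %q", minVer, maxVer)
	}
	return &tls.Config{MinVersion: lo, MaxVersion: hi}, nil
}

func main() {
	cfg, err := newTLSConfig("TLS1.2", "")
	if err != nil {
		panic(err)
	}
	fmt.Printf("min=%#x max=%#x\n", cfg.MinVersion, cfg.MaxVersion)
}
```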
Bogdan Kanivets	7e8ebf7727	server: added duplicate warning-unary-request-duration flag --warning-unary-request-duration is a duplicate of --experimental-warning-unary-request-duration experimental-warning-unary-request-duration will be removed in v3.7. fixes https://github.com/etcd-io/etcd/issues/13783 Signed-off-by: Bogdan Kanivets <bkanivets@apple.com>	2022-11-18 18:06:00 +08:00
Benjamin Wang	b48641e5f2	Merge pull request #14348 from VinozzZ/add-integration-test-for-tracing embed: add integration test for distributed tracing	2022-10-13 02:11:59 +08:00
Benjamin Wang	5746d6eb86	etcdserver: added more debug log for the purgeFile goroutine Signed-off-by: Benjamin Wang <wachao@vmware.com>	2022-10-12 17:32:33 +08:00
demoManito	72cf0cc04a	etcd: modify declaring empty slices Declare empty slices as `var s []int` instead of `s := []int{}`; see https://github.com/golang/go/wiki/CodeReviewComments#declaring-empty-slices Signed-off-by: demoManito <1430482733@qq.com>	2022-09-16 14:41:14 +08:00
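The two declaration forms named in commit 72cf0cc04a differ in one observable way, which a short sketch can show: `var s []int` yields a nil slice, while `s := []int{}` yields a non-nil slice of length zero. Both work with `len` and `append`, but they encode differently in JSON. The `encode` helper below is hypothetical and exists only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encode returns the JSON encoding of s as a string.
func encode(s []int) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	var a []int  // nil slice: the form the commit switches to
	b := []int{} // non-nil, zero-length slice: the form being replaced

	fmt.Println(a == nil, b == nil, len(a), len(b)) // true false 0 0
	fmt.Println(encode(a), encode(b))               // null []

	// append works on a nil slice without any initialization.
	a = append(a, 1)
	fmt.Println(a) // [1]
}
```

Where the JSON difference matters, the non-nil form is still the appropriate one, as the linked code review guideline also notes.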
Benjamin Wang	74506738b8	Refactor the keepAliveListener and keepAliveConn Only `net.TCPConn` supports `SetKeepAlive` and `SetKeepAlivePeriod` by default, so if you want to wrap multiple layers of net.Listener, the `keepaliveListener` should be the one which is closest to the original `net.Listener` implementation, namely `TCPListener`. Signed-off-by: Benjamin Wang <wachao@vmware.com>	2022-08-18 04:24:05 +08:00
Yingrong Zhao	ea2f299ba0	embed: add integration test for distributed tracing To verify that the distributed tracing feature is correctly set up, this PR adds an integration test for this feature. In the process of writing the test, I discovered a goroutine leak due to the TraceProvider not being closed. This PR fixes this issue as well. Signed-off-by: Yingrong Zhao <yingrong.zhao@gmail.com>	2022-08-15 11:19:10 -04:00
Marek Siarkowicz	d44bbff278	server: Make corruption check optional and period configurable Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2022-07-26 09:31:15 +02:00
Marek Siarkowicz	c58ec9fe13	server: Refactor compaction checker Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2022-07-25 13:59:30 +02:00
Benjamin Wang	053ba95ed5	set max concurrent streams to the http2 server The default max stream is 250 in http2. When there are more than 250 streams, the client side may be blocked until some previous streams are released. So we need to support configuring a larger `MaxConcurrentStreams`. Signed-off-by: Benjamin Wang <wachao@vmware.com>	2022-07-06 03:43:46 +08:00
杨金珏	6220174687	support custom `grpc.MaxConcurrentStreams` There is no update on the original PR (see below) for more than 2 weeks. So Benjamin(@ahrtr) continues to work on the PR. The first step is to rebase the PR, because there are lots of conflicts with the main branch. The change to go.mod and go.sum reverted, because they are not needed. The e2e test cases are also reverted, because they are not correct. ``` https://github.com/etcd-io/etcd/pull/14081 ``` Signed-off-by: nic-chen <chenjunxu6@gmail.com> Signed-off-by: Benjamin Wang <wachao@vmware.com>	2022-07-06 03:43:46 +08:00
Piotr Tabor	a1fe0b8ea3	Merge pull request #14116 from ptabor/20220613-embed-errors Embed server should log errors (and not get stuck)	2022-06-15 22:37:40 +02:00
Piotr Tabor	fcc8fce4d2	Expand logging in case of embed server not being able to successfully start. So far the errors were directed to Etcd.Errc (channel) that is not being consumed in practice. Signed-off-by: Piotr Tabor <ptab@google.com>	2022-06-15 13:50:17 +02:00
Marek Siarkowicz	7c35dadc25	server: Extract corruption detection to dedicated struct Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>	2022-06-13 18:19:24 +02:00
Piotr Tabor	651de5a057	Rename EtcdServer.Id with EtcdServer.MemberId. It was misleading and error prone vs. ClusterId.	2022-05-20 14:32:04 +02:00
Marek Siarkowicz	26f42e7a9e	server: Apply review comments and split basic handler	2022-05-05 09:52:14 +02:00
Marek Siarkowicz	722ec487df	server: Split metrics and health code	2022-05-05 09:52:14 +02:00
Marek Siarkowicz	600ee13ac0	server: Cover V3 health with tests	2022-05-05 09:52:14 +02:00
ahrtr	3dcbbf62d9	Move clientconfig into clientv3 so that it can be reused by both etcdctl and v3 discovery	2022-03-12 06:38:41 +08:00
Piotr Tabor	088807c08e	Merge pull request #13565 from ahrtr/remove_peer_serve_client_requests Updated servePeers to remove the grpc server	2022-03-01 16:24:42 +01:00
ahrtr	2f36e0c62b	Change discovery url to endpoints Currently the discovery url is just one endpoint. But actually it should be the same as the etcdctl, which means that it should be a list of endpoints. When one endpoint is down, the clientv3 can fail over to the next endpoint automatically.	2022-02-24 09:11:41 +08:00
ahrtr	ebc86d12c0	support v3 discovery to bootstrap a new etcd cluster	2022-02-21 23:22:49 +08:00
ahrtr	a879ccf152	updated servePeers to remove the grpc server	2022-01-27 16:22:01 +08:00
ahrtr	1713dc67b5	etcd server shouldn't wait for the ready notification infinitely on startup	2022-01-27 16:19:20 +08:00
Marek Siarkowicz	ee5ef42c5c	server: --enable-v2 and --enable-v2v3 are decommissioned	2022-01-14 13:19:30 +01:00
Marek Siarkowicz	7d10899d7f	server: Require either cluster version v3.6 or --experimental-enable-lease-checkpoint-persist to persist lease remainingTTL To avoid inconsistent behavior during cluster upgrade we are feature gating persistence behind cluster version. This should ensure that all cluster members are upgraded to v3.6 before changing behavior. To allow backporting this fix to v3.5 we are also introducing flag --experimental-enable-lease-checkpoint-persist that will allow for smooth upgrade in v3.5 clusters with this feature enabled.	2021-12-02 12:26:47 +01:00
Sam Batschelet	63a1cc3fe4	add --experimental-max-learner flag Signed-off-by: Sam Batschelet <sbatsche@redhat.com>	2021-11-09 09:52:00 -05:00
Eng Zer Jun	2a151c8982	*: move from io/ioutil to io and os packages The io/ioutil package has been deprecated as of Go 1.16, see https://golang.org/doc/go1.16#ioutil. This commit replaces the existing io/ioutil functions with their new definitions in io and os packages. Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2021-10-28 00:05:28 +08:00
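The replacement described in commit 2a151c8982 is a one-to-one mapping onto functions in `io` and `os` (available since Go 1.16). A minimal sketch of the most common substitutions follows; the helper names `fileRoundTrip` and `readAll` are hypothetical and only serve this illustration.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// fileRoundTrip writes content to a file in a temporary directory and reads
// it back, using the os replacements for the deprecated ioutil helpers.
func fileRoundTrip(content string) (string, error) {
	dir, err := os.MkdirTemp("", "ioutil-migration") // was ioutil.TempDir
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "data.txt")
	if err := os.WriteFile(path, []byte(content), 0o600); err != nil { // was ioutil.WriteFile
		return "", err
	}
	b, err := os.ReadFile(path) // was ioutil.ReadFile
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// readAll drains a reader with io.ReadAll, the replacement for ioutil.ReadAll.
func readAll(s string) (string, error) {
	b, err := io.ReadAll(strings.NewReader(s))
	return string(b), err
}

func main() {
	got, err := fileRoundTrip("hello")
	fmt.Println(got, err)

	all, err := readAll("world")
	fmt.Println(all, err)
}
```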
Marek Siarkowicz	90932324b1	client: Add grpc authority header integration tests	2021-09-29 12:42:16 +02:00
Sam Batschelet	a4a82cc982	Merge pull request #13248 from lilic/add-sampling-rate server: Add sampling rate to distributed tracing	2021-08-30 08:31:00 -04:00
Lili Cosic	810f489017	server: Add sampling rate to distributed tracing ExperimentalDistributedTracingSamplingRatePerMillion is the number of samples to collect per million spans. Defaults to 0.	2021-08-30 13:55:35 +02:00
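A per-million sampling rate, as described in commit 810f489017, maps naturally onto a sampling fraction between 0 and 1. The sketch below shows only that arithmetic; `samplingFraction` is a hypothetical helper, the clamping of out-of-range values is an assumption, and how etcd hands the value to OpenTelemetry is not shown in this log.

```go
package main

import "fmt"

const perMillion = 1_000_000

// samplingFraction converts "samples per million spans" into a fraction in
// [0, 1]. Clamping out-of-range input is an assumption made for this sketch.
func samplingFraction(ratePerMillion int) float64 {
	if ratePerMillion <= 0 {
		return 0 // the default of 0 collects no samples
	}
	if ratePerMillion >= perMillion {
		return 1
	}
	return float64(ratePerMillion) / perMillion
}

func main() {
	fmt.Println(samplingFraction(0))         // 0
	fmt.Println(samplingFraction(250_000))   // 0.25
	fmt.Println(samplingFraction(1_000_000)) // 1
}
```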
Marek Siarkowicz	83a325ac46	server: Move all functions needed for storage bootstrap to storage package This is a prerequisite for moving storage bootstrap, split into a separate PR to make it easier to review.	2021-08-03 13:09:15 +02:00
Alexey Roytman	2a26f7ae4c	etcdserver: configure "expensive" requests duration When a unary request takes more than a predefined duration, the request is defined as "expensive" and a warning is printed. The expensive request duration is hard-coded to 300 ms. That may not be enough, for example, for transactions with a lot of operations. The warnings just blow up the log files and reduce throughput. This fix allows the user to configure the "expensive" request duration. Signed-off-by: Alexey Roytman <roytman@il.ibm.com>	2021-07-27 08:33:44 +03:00
AlexStocks	184b0e5d49	add sleep interval	2021-05-24 16:22:00 +08:00
Piotr Tabor	85341e08f2	Merge pull request #12968 from serathius/logger-simplify server: Simplify passing logger setup by passing only logger	2021-05-15 15:58:00 +02:00
Marek Siarkowicz	41ed74824e	server: Simplify passing logger setup by passing only logger	2021-05-14 13:14:48 +02:00
Piotr Tabor	ead81df948	Disallow -v2-deprecation>'not-yet' combined with --enable-v2	2021-05-12 18:09:34 +02:00
Piotr Tabor	e0a8484c8f	Merge pull request #12941 from serathius/defrag etcdserver: Implement running defrag if freeable space will exceed provided threshold (on boot)	2021-05-12 09:26:56 +02:00
Marek Siarkowicz	efc8505739	etcdserver: Implement running defrag if freeable space will exceed provided threshold	2021-05-11 14:00:29 +02:00
Piotr Tabor	269f22c837	Deprecate V2 API: --enable-v2 and v2v3 Flags `--experimental-enable-v2v3` and '-enable-v2' will raise a warning in 3.5; in 3.6 they are scheduled for decommissioning, so that v2store can stop being written in 3.7. Deprecation plan in: https://github.com/etcd-io/etcd/issues/12913	2021-05-10 16:19:52 +02:00
Lili Cosic	1a718a958e	Add initial Tracing with OpenTelemetry	2021-05-10 10:44:40 +02:00


68 Commits