What would you like to be added?
Add a new compactor based on revision count, instead of a fixed time interval.
To make this happen, the mvcc store needs to export a
`CompactNotify` function to notify the compactor that the configured number of
write transactions have occurred since the previous compaction. The
new compactor can observe the revision change and delete out-of-date data in time,
instead of waiting out a fixed interval, so the underlying bbolt db can
reuse the free pages as soon as possible.
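A minimal sketch of what that hook could look like (all names and fields below are illustrative assumptions based on this proposal, not etcd's actual implementation):
```go
package mvcc

// store is a stand-in for the mvcc key-value store.
type store struct {
	currentRev     int64         // revision of the latest write
	compactMainRev int64         // revision of the previous compaction
	notifyInterval int64         // e.g. every 10,000 revisions
	compactNotifyC chan struct{} // signaled when enough revisions accumulate
}

// CompactNotify returns a channel the compactor can watch; it is signaled
// once notifyInterval write transactions have occurred since the previous
// compaction.
func (s *store) CompactNotify() <-chan struct{} {
	return s.compactNotifyC
}

// notifyIfNeeded would run at the end of each write transaction.
func (s *store) notifyIfNeeded() {
	if s.currentRev-s.compactMainRev >= s.notifyInterval {
		select {
		case s.compactNotifyC <- struct{}{}:
		default: // a notification is already pending; drop this one
		}
	}
}
```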
Why is this needed?
In a Kubernetes cluster, for instance with argo workflow, there will be batch
requests to create pods, and then a lot of PATCH requests for pod status,
especially when a pod has more than 3 containers. If such burst
requests increase the db size in a short time, it is easy to exceed the max
quota size. The cluster admin then has to step in to defrag, which may cause
long downtime. So, we hope etcd can delete the out-of-date data as
soon as possible and slow down the growth of the total db size.
Currently, both the revision and periodic compactors are driven by time. It's
not easy to pick a fixed interval that copes with unexpected bursts of update
requests. The new compactor based on revision count can make the admin's life
easier. For instance, say the average object size is 50 KiB. The new
compactor, configured to compact every 10,000 revisions, effectively lets etcd
compact after roughly 500 MiB of new data has come in, no matter how long it
takes to accumulate 10,000 new revisions. It can handle burst update requests well.
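To make the mechanism concrete, here is a rough sketch of the proposed compactor loop (the `RevGetter`/`Compactable` interfaces and the `Run` signature are simplified stand-ins, not etcd's actual compactor API):
```go
package compactor

import "context"

// RevGetter and Compactable are simplified stand-ins for etcd's
// compactor plumbing.
type RevGetter interface{ Rev() int64 }

type Compactable interface {
	Compact(ctx context.Context, rev int64) error
}

// Run keeps the latest `retention` revisions. Whenever the store signals
// that enough new revisions have accumulated, it compacts everything
// older than rev-retention, regardless of how much wall time has passed.
func Run(ctx context.Context, notify <-chan struct{}, rg RevGetter, c Compactable, retention int64) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-notify:
			if rev := rg.Rev() - retention; rev > 0 {
				_ = c.Compact(ctx, rev) // real code would log and retry errors
			}
		}
	}
}
```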
There are some test results:
* Fixed value size: 10 KiB, Update Rate: 100/s, Total key space: 3,000
```
benchmark put --rate=100 --total=300000 --compact-interval=0 \
--key-space-size=3000 --key-size=256 --val-size=10240
```
| Compactor | DB Total Size | DB InUse Size |
| -- | -- | -- |
| Revision(5min,retention:10000) | 570 MiB | 208 MiB |
| Periodic(1m) | 232 MiB | 165 MiB |
| Periodic(30s) | 151 MiB | 127 MiB |
| NewRevision(retention:10000) | 195 MiB | 187 MiB |
* Random value size: [9 KiB, 11 KiB], Update Rate: 150/s, Total key space: 3,000
```
benchmark put --rate=150 --total=300000 --compact-interval=0 \
--key-space-size=3000 --key-size=256 --val-size=10240 \
--delta-val-size=1024
```
| Compactor | DB Total Size | DB InUse Size |
| -- | -- | -- |
| Revision(5min,retention:10000) | 718 MiB | 554 MiB |
| Periodic(1m) | 297 MiB | 246 MiB |
| Periodic(30s) | 185 MiB | 146 MiB |
| NewRevision(retention:10000) | 186 MiB | 178 MiB |
* Random value size: [6 KiB, 14 KiB], Update Rate: 200/s, Total key space: 3,000
```
benchmark put --rate=200 --total=300000 --compact-interval=0 \
--key-space-size=3000 --key-size=256 --val-size=10240 \
--delta-val-size=4096
```
| Compactor | DB Total Size | DB InUse Size |
| -- | -- | -- |
| Revision(5min,retention:10000) | 874 MiB | 221 MiB |
| Periodic(1m) | 357 MiB | 260 MiB |
| Periodic(30s) | 215 MiB | 151 MiB |
| NewRevision(retention:10000) | 182 MiB | 176 MiB |
For burst requests, we would need a very short periodic interval;
otherwise, the total size grows large. I think the new compactor can
handle this case well.
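If this lands, configuration could plausibly reuse the existing `--auto-compaction-mode` and `--auto-compaction-retention` flags with a new mode; the mode name below is hypothetical, not a committed flag value:
```
etcd --auto-compaction-mode=revision-count --auto-compaction-retention=10000
```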
Additional Change:
Currently, the quota system only checks the DB total size. However, there
could be a lot of free pages which can be reused for upcoming requests.
Based on this proposal, I also want to extend the current quota system with
the DB's InUse size.
If the InUse size is less than the max quota size, we should allow requests
to proceed. Since bbolt might be resized when there are no continuous free
pages available, we should set up a hard limit on the overflow, like 1 GiB.
```diff
@@ -130,7 +134,17 @@ func (b *BackendQuota) Available(v interface{}) bool {
 		return true
 	}
 	// TODO: maybe optimize Backend.Size()
-	return b.be.Size()+int64(cost) < b.maxBackendBytes
+
+	// Since compaction frees pages that can be reused, check the
+	// SizeInUse first. If there are no continuous free pages for a
+	// key/value and boltdb keeps resizing, the physical size must not
+	// grow more than a hard limit, like 1 GiB, past the quota.
+	//
+	// TODO: this should be enabled by a flag.
+	if b.be.Size()+int64(cost)-b.maxBackendBytes >= maxAllowedOverflowBytes(b.maxBackendBytes) {
+		return false
+	}
+	return b.be.SizeInUse()+int64(cost) < b.maxBackendBytes
 }
```
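The diff references a `maxAllowedOverflowBytes` helper that does not exist in etcd today; a minimal sketch under the 1 GiB suggestion above might be:
```go
// In the same package as BackendQuota (sketch; not an existing helper).
//
// maxAllowedOverflowBytes returns how far the physical DB size may exceed
// the configured quota before writes are rejected outright.
func maxAllowedOverflowBytes(maxBackendBytes int64) int64 {
	// Hard limit suggested in this proposal: 1 GiB, independent of the
	// quota size. The TODO in the diff above would make it configurable.
	const hardOverflowLimit = int64(1) << 30 // 1 GiB
	return hardOverflowLimit
}
```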
Also, it may become possible to avoid raising the NOSPACE alarm when
compaction can reclaim enough free pages. That would reduce downtime.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
etcd
Note: The main branch may be in an unstable or even broken state during development. For stable versions, see releases.
etcd is a distributed reliable key-value store for the most critical data of a distributed system, with a focus on being:
- Simple: well-defined, user-facing API (gRPC)
- Secure: automatic TLS with optional client cert authentication
- Fast: benchmarked 10,000 writes/sec
- Reliable: properly distributed using Raft
etcd is written in Go and uses the Raft consensus algorithm to manage a highly-available replicated log.
etcd is used in production by many companies, and the development team stands behind it in critical deployment scenarios, where etcd is frequently teamed with applications such as Kubernetes, locksmith, vulcand, Doorman, and many others. Reliability is further ensured by rigorous robustness testing.
See etcdctl for a simple command line client.
Original image credited to xkcd.com/2347, alterations by Josh Berkus.
Maintainers
MAINTAINERS strive to shape an inclusive open source project culture where users are heard and contributors feel respected and empowered. MAINTAINERS maintain productive relationships across different companies and disciplines. Read more about MAINTAINERS role and responsibilities.
Getting started
Getting etcd
The easiest way to get etcd is to use one of the pre-built release binaries which are available for OSX, Linux, Windows, and Docker on the release page.
For more installation guides, please check out play.etcd.io and operating etcd.
Running etcd
First start a single-member cluster of etcd.
If etcd is installed using the pre-built release binaries, run it from the installation location as below:
/tmp/etcd-download-test/etcd
The etcd command can be simply run as such if it is moved to the system path as below:
mv /tmp/etcd-download-test/etcd /usr/local/bin/
etcd
This will bring up etcd listening on port 2379 for client communication and on port 2380 for server-to-server communication.
Next, let's set a single key, and then retrieve it:
etcdctl put mykey "this is awesome"
etcdctl get mykey
etcd is now running and serving client requests.
etcd TCP ports
The official etcd ports are 2379 for client requests, and 2380 for peer communication.
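For example, the listen and advertise URLs can be set explicitly with standard etcd flags (the URLs below are just illustrative):
```
etcd --name infra0 \
  --listen-client-urls http://localhost:2379 \
  --advertise-client-urls http://localhost:2379 \
  --listen-peer-urls http://localhost:2380 \
  --initial-advertise-peer-urls http://localhost:2380 \
  --initial-cluster infra0=http://localhost:2380
```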
Running a local etcd cluster
First install goreman, which manages Procfile-based applications.
Our Procfile script will set up a local example cluster. Start it with:
goreman start
This will bring up 3 etcd members infra1, infra2 and infra3 and optionally etcd grpc-proxy, which runs locally and composes a cluster.
Every cluster member and proxy accepts key value reads and key value writes.
Follow the comments in Procfile script to add a learner node to the cluster.
Install etcd client v3
go get go.etcd.io/etcd/client/v3
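Here is a minimal Go program using the client, mirroring the etcdctl put/get example above (the endpoint and timeouts are just examples):
```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Connect to the single-member cluster started above.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	// Same put/get as the etcdctl example.
	if _, err := cli.Put(ctx, "mykey", "this is awesome"); err != nil {
		panic(err)
	}
	resp, err := cli.Get(ctx, "mykey")
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s: %s\n", kv.Key, kv.Value)
	}
}
```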
Next steps
Now it's time to dig into the full etcd API and other guides.
- Read the full documentation.
- Review etcd frequently asked questions.
- Explore the full gRPC API.
- Set up a multi-machine cluster.
- Learn the config format, env variables and flags.
- Find language bindings and tools.
- Use TLS to secure an etcd cluster.
- Tune etcd.
Contact
- Email: etcd-dev
- Slack: #etcd channel on Kubernetes (get an invite)
- Community meetings
Community meetings
etcd contributors and maintainers meet every two weeks at 11:00 AM (USA Pacific) on Thursday.
An initial agenda will be posted to the shared Google docs a day before each meeting, and everyone is welcome to suggest additional topics or other agendas.
Meeting recordings are uploaded to official etcd YouTube channel.
Get calendar invitation by joining etcd-dev mailing group.
Join Hangouts Meet: meet.google.com/umg-nrxn-qvs
Join by phone: +1 405-792-0633 PIN: 299 906#
Contributing
See CONTRIBUTING for details on setting up your development environment, submitting patches and the contribution workflow.
Please refer to community-membership.md for information on becoming an etcd project member. We welcome and look forward to your contributions to the project!
Please also refer to roadmap to get more details on the priorities for the next few major or minor releases.
Reporting bugs
See reporting bugs for details about reporting any issues. Before opening an issue please check it is not covered in our frequently asked questions.
Reporting a security vulnerability
See security disclosure and release process for details on how to report a security vulnerability and how the etcd team manages it.
Issue and PR management
See issue triage guidelines for details on how issues are managed.
See PR management for guidelines on how pull requests are managed.
etcd Emeritus Maintainers
These emeritus maintainers dedicated a part of their career to etcd and reviewed code, triaged bugs and pushed the project forward over a substantial period of time. Their contribution is greatly appreciated.
- Fanmin Shi
- Anthony Romano
- Brandon Philips
- Joe Betz
- Gyuho Lee
- Jingyi Hu
- Xiang Li
- Ben Darnell
- Sam Batschelet
License
etcd is under the Apache 2.0 license. See the LICENSE file for details.
