1312 Commits

Author SHA1 Message Date
Wei Fu
4db8df677c feature: add new compactor based revision count
What would you like to be added?

Add new compactor based revision count, instead of fixed interval time.

In order to make it happen, the mvcc store needs to export
`CompactNotify` function to notify the compactor that configured number of
write transactions have occured since previsious compaction. The
new compactor can get the revision change and delete out-of-date data in time,
instead of waiting with fixed interval time. The underly bbolt db can
reuse the free pages as soon as possible.

Why is this needed?

In the kubernetes cluster, for instance, argo workflow, there will be batch
requests to create pods , and then there are also a lot of pod status's PATCH
requests, especially when the pod has more than 3 containers. If the burst
requests increase the db size in short time, it will be easy to exceed the max
quota size. And then the cluster admin get involved to defrag, which may casue
long downtime. So, we hope the ETCD can delete the out-of-date data as
soon as possible and slow down the grow of total db size.

Currently, both revision and periodic are based on time. It's not easy
to use fixed interval time to face the unexpected burst update requests.
The new compactor based on revision count can make the admin life easier.
For instance, let's say that average of object size is 50 KiB. The new
compactor will compact based on 10,000 revisions. It's like that ETCD can
compact after new 500 MiB data in, no matter how long ETCD takes to get
new 10,000 revisions. It can handle the burst update requests well.

There are some test results:

* Fixed value size: 10 KiB, Update Rate: 100/s, Total key space: 3,000

```
enchmark put --rate=100 --total=300000 --compact-interval=0 \
  --key-space-size=3000 --key-size=256 --val-size=10240
```

|                      Compactor | DB Total Size | DB InUse Size |
|                             -- | --            |            -- |
| Revision(5min,retension:10000) | 570 MiB       |       208 MiB |
|                   Periodic(1m) | 232 MiB       |       165 MiB |
|                  Periodic(30s) | 151 MiB       |       127 MiB |
|   NewRevision(retension:10000) | 195 MiB       |       187 MiB |

* Random value size: [9 KiB, 11 KiB], Update Rate: 150/s, Total key space: 3,000

```
bnchmark put --rate=150 --total=300000 --compact-interval=0 \
  --key-space-size=3000 --key-size=256 --val-size=10240 \
  --delta-val-size=1024
```

|                      Compactor | DB Total Size | DB InUse Size |
|                             -- | --            |            -- |
| Revision(5min,retension:10000) | 718 MiB       |       554 MiB |
|                   Periodic(1m) | 297 MiB       |       246 MiB |
|                  Periodic(30s) | 185 MiB       |       146 MiB |
|   NewRevision(retension:10000) | 186 MiB       |       178 MiB |

* Random value size: [6 KiB, 14 KiB], Update Rate: 200/s, Total key space: 3,000

```
bnchmark put --rate=200 --total=300000 --compact-interval=0 \
  --key-space-size=3000 --key-size=256 --val-size=10240 \
  --delta-val-size=4096
```

|                      Compactor | DB Total Size | DB InUse Size |
|                             -- | --            |            -- |
| Revision(5min,retension:10000) | 874 MiB       |       221 MiB |
|                   Periodic(1m) | 357 MiB       |       260 MiB |
|                  Periodic(30s) | 215 MiB       |       151 MiB |
|   NewRevision(retension:10000) | 182 MiB       |       176 MiB |

For the burst requests, we needs to use short periodic interval.
Otherwise, the total size will be large. I think the new compactor can
handle it well.

Additional Change:

Currently, the quota system only checks DB total size. However, there
could be a lot of free pages which can be reused to upcoming requests.
Based on this proposal, I also want to extend current quota system with DB's
InUse size.

If the InUse size is less than max quota size, we should allow requests to
update. Since the bbolt might be resized if there is no available
continuous pages, we should setup a hard limit for the overflow, like 1
GiB.

```diff
 // Quota represents an arbitrary quota against arbitrary requests. Each request
@@ -130,7 +134,17 @@ func (b *BackendQuota) Available(v interface{}) bool {
                return true
        }
        // TODO: maybe optimize Backend.Size()
-       return b.be.Size()+int64(cost) < b.maxBackendBytes
+
+       // Since the compact comes with allocatable pages, we should check the
+       // SizeInUse first. If there is no continuous pages for key/value and
+       // the boltdb continues to resize, it should not increase more than 1
+       // GiB. It's hard limitation.
+       //
+       // TODO: It should be enabled by flag.
+       if b.be.Size()+int64(cost)-b.maxBackendBytes >= maxAllowedOverflowBytes(b.maxBackendBytes) {
+               return false
+       }
+       return b.be.SizeInUse()+int64(cost) < b.maxBackendBytes
 }
```

And it's likely to disable NOSPACE alarm if the compact can get much
more free pages. It can reduce downtime.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-08-16 23:35:08 +08:00
Benjamin Wang
21c4061d5c
Merge pull request #16288 from skitt/server-semconv-v1.17.0
server: switch to semconv v1.17.0
2023-07-26 13:30:55 +01:00
Stephen Kitt
1010115b8f
server: switch to semconv v1.17.0
This is the latest semconv package used in etcd's dependencies.
Switching to that version reduces the overall package dependencies of
the project (and helps downstream projects which track this,
e.g. Kubernetes).

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2023-07-24 15:53:04 +02:00
lan.tian
0f975acf2f
update typo in raft.go
Signed-off-by: lan.tian <lance5890@163.com>
2023-07-24 15:48:55 +08:00
iuriatan
abbfc2964a Fix goword issue
Fix `make verify` issues after updating golangci-lint

Signed-off-by: iuriatan <iuriatan@gmail.com>
2023-07-14 16:46:26 -03:00
cui fliter
6760dc9572 remove repetitive the
Signed-off-by: cui fliter <imcusg@gmail.com>
2023-07-13 17:01:01 +08:00
Benjamin Wang
2c22ca7eba dependency: bump golang.org/x/net from v0.11.0 to v0.12.0
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-07-10 18:43:30 +01:00
Benjamin Wang
843ddb4b1e dependency: bump golang.org/x/crypto from v0.10.0 to v0.11.0
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-07-10 18:40:35 +01:00
Benjamin Wang
149256735d dependency: bump golang.org/x/sys from v0.9.0 to v0.10.0
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-07-10 18:38:16 +01:00
Benjamin Wang
f4444e8fb3
Merge pull request #16154 from CaojiamingAlan/uber_applier_test
add tests for uber applier
2023-07-06 20:00:11 +01:00
Benjamin Wang
e887e5291a
Merge pull request #16067 from geetasg/pr1
Adding test for updateClusterVersionV3
2023-07-06 08:36:19 +01:00
caojiamingalan
eff9517a90 etcdserver: add cluster id check for hashKVHandler
Signed-off-by: caojiamingalan <alan.c.19971111@gmail.com>
2023-07-05 14:09:40 -05:00
Tom Wieczorek
a8a9ebd281
auth: Support for EdDSA JWT algorithm
The golang-jwt library supports this already, so supporting it is just a
matter of wiring things up.

Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com>
2023-07-05 11:33:08 +02:00
caojiamingalan
ffe73f9a15 add tests for uber applier
Signed-off-by: caojiamingalan <alan.c.19971111@gmail.com>
2023-06-30 22:03:29 -05:00
Benjamin Wang
833aabe4cd
Merge pull request #16149 from ArkaSaha30/main
Manual Dependency Bump
2023-06-28 10:04:50 +01:00
Benjamin Wang
22f9dac7b1
Merge pull request #15708 from chaochn47/confchange_raft_node_notifies_apply
raft node notifies configure when confChanged
2023-06-28 10:03:50 +01:00
ArkaSaha30
37bd1e3382
Bump dependency manually
Signed-off-by: ArkaSaha30 <arkasaha30@gmail.com>
2023-06-28 12:39:27 +05:30
Benjamin Wang
bda68d8d06
Merge pull request #16074 from geetasg/pr2
Enable test to verify membership recovery from backend
2023-06-27 20:24:51 +01:00
Chao Chen
6cdc9ae4fe server/etcdserver/raft.go:
1. rename confChangeCh to raftAdvancedC
2. rename waitApply to confChanged
3. add comments and test assertion

Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-06-26 22:42:44 -07:00
Benjamin Wang
ad3b6ee4c6 etcdserver: wait for raft is notified on confChange before responding to client
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-06-26 13:40:51 -07:00
Marek Siarkowicz
1967b8e5e6
Merge pull request #16086 from CaojiamingAlan/applier_test
etcdserver: add tests for apply_auth.go
2023-06-26 12:20:05 +02:00
Benjamin Wang
3e725aceef
Merge pull request #16128 from sharathsivakumar/shsi/bump_dependency_wip
Manual dependency Bump
2023-06-23 17:14:11 +01:00
caojiamingalan
11aa59c42d etcdserver: add tests for apply_auth.go
Signed-off-by: caojiamingalan <alan.c.19971111@gmail.com>
2023-06-23 10:58:30 -05:00
sharathsivakumar
b92981606a
dependency: bump github.com/prometheus/client_golang v1.15.1 to v1.16.0
Signed-off-by: sharathsivakumar <mailssr9@gmail.com>
2023-06-23 16:39:09 +02:00
Benjamin Wang
c8247731a2
Merge pull request #16099 from chaochn47/enable_failpoint_in_integration_test
Enable failpoint in integration test
2023-06-23 13:15:08 +01:00
Chao Chen
6d79b86219 Enable failpoint by default in integration tests
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-06-21 23:13:46 -07:00
Benjamin Wang
04d24c2128
Merge pull request #16100 from geetasg/pr4
Verify consistent index is latest at the time of snapshot
2023-06-22 07:10:23 +01:00
Geeta Gharpure
550aa152a7 Verify consistent index is latest at the time of snapshot
Signed-off-by: Geeta Gharpure <geetagh@amazon.com>
2023-06-19 16:00:04 +00:00
Benjamin Wang
b92d099360 dependency: bump golang.org/x/net from 0.10.0 to 0.11.0
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-06-19 09:25:07 +01:00
Marek Siarkowicz
cb3730a30f
Merge pull request #16005 from tjungblu/putauthshort
Early exit auth check on lease puts
2023-06-16 08:33:57 +02:00
Thomas Jungblut
84a9af17cc Add first unit test for authApplierV3
This contains a slight refactoring to expose enough information
to write meaningful tests for auth applier v3.

Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
2023-06-15 15:57:44 +02:00
Geeta Gharpure
e9fa3d30d7 Enable test to verify membership recovery from backend
Signed-off-by: Geeta Gharpure <geetagh@amazon.com>
2023-06-13 18:33:03 +00:00
Prasad Chandrasekaran
3a8c6d749f manual dependency bump
Signed-off-by: Prasad Chandrasekaran <prasadc@vmware.com>
2023-06-13 23:28:47 +05:30
Geeta Gharpure
2b81483103 Adding test for version update function used in 3.6
Signed-off-by: Geeta Gharpure <geetagh@amazon.com>
2023-06-13 17:38:04 +00:00
Benjamin Wang
c104a4d01b
Merge pull request #16031 from kkkkun/add_experimental_hash_check_to_help
add experimental-compact-hash-check-enabled to help
2023-06-12 16:03:30 +08:00
Benjamin Wang
20c4247d2c
Merge pull request #16037 from chaochn47/uds_e2e_test
add uds test cases into e2e TestAuthority
2023-06-12 14:45:59 +08:00
Chao Chen
e6c8bf82e0 add uds test cases into e2e TestAuthority
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-06-08 15:54:30 -07:00
scuzk373x@gmai.com
798d2b7923 add compact hash check to help
Signed-off-by: scuzk373x@gmai.com <zhuanwajiang@pinduoduo.com>
2023-06-08 14:29:57 +08:00
Benjamin Wang
6c5fde5138
Merge pull request #15985 from CaojiamingAlan/check_revision_before_write_hash
Check ScheduledCompactKeyName and FinishedCompactKeyName before writing hash
2023-06-08 10:10:45 +08:00
caojiamingalan
b9e30bf878 etcdserver: add e2e test to reproduce the incorrect hash issue when resuming scheduled compaction.
check ScheduledCompactKeyName and FinishedCompactKeyName
before writing hash to hashstore. If they do not match, then it means this compaction has once been interrupted and its hash value is invalid. In such cases, we won't write the hash values to the hashstore, and avoids the incorrect corruption alarm.

Signed-off-by: caojiamingalan <alan.c.19971111@gmail.com>
2023-06-07 19:54:09 -05:00
Chao Chen
b2c39fc8e6 2023-06-06: bump up dependencies update identified by dependabot
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-06-06 20:18:44 -07:00
Thomas Jungblut
dfbe2038f3 Early exit auth check on lease puts
Mitigates #15993 by not checking each key individually for permission
when auth is entirely disabled or admin user is calling the method.

Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
2023-06-06 10:23:46 +02:00
Benjamin Wang
3413f2e08d
Merge pull request #15908 from cuishuang/main
*: use strings.Builder instead of bytes.Buffer
2023-05-26 09:41:01 +08:00
Benjamin Wang
cdd9846b56
Merge pull request #15947 from chaochn47/hash_doc
update code comments
2023-05-25 18:44:49 +08:00
Bogdan Kanivets
ef91e8ae78 dependency: bump github.com/stretchr/testify from 1.8.2 to 1.8.3
Signed-off-by: Bogdan Kanivets <bkanivets@apple.com>
2023-05-24 23:58:13 -07:00
Chao Chen
9e1e378e9e update code comments
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-05-24 12:54:27 -07:00
cui fliter
0c919dc212 use the more efficient strings.Builder
Signed-off-by: cui fliter <imcusg@gmail.com>
2023-05-19 10:44:58 +08:00
Benjamin Wang
19ec574f45
Merge pull request #15915 from hexfusion/bytes
server/storage/schema: prefer equal to compare for equality comparisons
2023-05-18 10:53:42 +08:00
Sam Batschelet
a708e94749 server/storage/schema: prefer equal to compare for equality comparisons
Signed-off-by: Sam Batschelet <sbatschelet@gmail.com>
2023-05-16 21:25:34 -04:00
James Blair
5a5b5a1c5d
dependency: bump github.com/prometheus/client_golang from 1.15.0 to 1.15.1
Signed-off-by: James Blair <mail@jamesblair.net>
2023-05-16 09:26:44 +12:00