169 Commits

Author SHA1 Message Date
Fu Wei
d63ca43092
Merge 4db8df677c618b462145fce7cb926c072a0ce932 into c86c93ca2951338115159dcdd20711603044e1f1 2024-09-25 21:36:55 -07:00
Lan Liang
a966c07165 migrate experimental-initial-corrupt-check flag to feature gate.
Signed-off-by: Lan Liang <gcslyp@gmail.com>
2024-08-22 14:42:18 +00:00
Siyuan Zhang
bd228cf6d1 migrate experimental-stop-grpc-service-on-defrag flag to feature gate.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2024-08-05 13:46:51 -07:00
Siyuan Zhang
0e77563e35 Add config file field for feature-gates flag.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2024-07-23 15:27:07 -07:00
Siyuan Zhang
7b355141d9 Add "server-feature-gates" flag.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2024-07-18 13:20:30 -07:00
Benjamin Wang
b6c5262026 Differentiate the warning message for rejected client and peer connections
Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
2024-07-13 19:28:21 +01:00
Benjamin Wang
1d13fc58ff
Merge pull request #14066 from rleungx/add-config
embed: add `GRPCAdditionalServerOptions` config
2024-06-21 06:40:32 +01:00
Benjamin Wang
692e44a80b Update the error message when client certificate isn't provided for secure metrics url
Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
2024-06-18 14:03:39 +01:00
Gyuho Lee
497f1a45a3
license
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2024-06-18 07:28:43 +08:00
Gyuho Lee
22f20a827b
test(e2e): add a case where client tls is missing for https metrics url
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2024-06-17 21:09:24 +08:00
Gyuho Lee
a657f069a1
fix(server/embed): enforce non-empty client TLS if scheme is https/unixs
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2024-06-17 18:21:38 +08:00
Ryan Leung
d9cb8b80f5 address the comment
Signed-off-by: Ryan Leung <rleungx@gmail.com>
2024-06-12 11:38:41 +08:00
Ryan Leung
29abd62338 introduce GRPCAdditionalServerOptions
Signed-off-by: Ryan Leung <rleungx@gmail.com>
2024-06-07 16:39:35 +08:00
lhy1024
acc9d7c9fe Support multiple values for allowed client and peer TLS identities
Signed-off-by: lhy1024 <admin@liudos.us>
2024-06-06 21:25:17 +08:00
Stephen Kitt
c1f5a445fc
Use Go 1.20 error joining instead of multierr
This still allows the errors to be unwrapped, and drops the direct
dependency on multierr.
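
As a rough illustration of what the standard-library approach looks like, here is a minimal sketch using `errors.Join` from Go 1.20. The sentinel errors and the `validate` helper are hypothetical and are not taken from the etcd code base.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors, for illustration only.
var (
	errBadPeerURL   = errors.New("invalid peer URL")
	errBadClientURL = errors.New("invalid client URL")
)

// validate collects every failed check into a single error value.
// errors.Join returns nil when it is given no non-nil errors.
func validate(peerOK, clientOK bool) error {
	var errs []error
	if !peerOK {
		errs = append(errs, errBadPeerURL)
	}
	if !clientOK {
		errs = append(errs, errBadClientURL)
	}
	return errors.Join(errs...)
}

// hasBoth reports whether err wraps both sentinel errors.
// errors.Is can see through a joined error, so unwrapping still works.
func hasBoth(err error) bool {
	return errors.Is(err, errBadPeerURL) && errors.Is(err, errBadClientURL)
}

func main() {
	fmt.Println(hasBoth(validate(false, false)))
	fmt.Println(validate(true, true) == nil)
}
```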

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2024-05-31 09:22:52 +02:00
Edwin Xie
4c77726914 Implement flag --experimental-set-member-localaddr
Which sets the LocalAddr to an IP address from --initial-advertise-peer-urls.

Also adds e2e test that requires this flag to succeed.

Co-authored-by: HighPon <s.shiraki.business@gmail.com>
Signed-off-by: Edwin Xie <edwin.xie@broadcom.com>
2024-05-24 18:17:37 +00:00
Shim Myeongseob
769500124f embed: fix typo in comment
Signed-off-by: Shim Myeongseob <Mangsby716@gmail.com>
2024-05-19 13:59:25 +00:00
Seena Fallah
b31f23e113 config: support AllowedCN and AllowedHostname through config file
Allow setting AllowedCN and AllowedHostname tls fields through config file for peer transport security.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2024-05-07 22:39:16 +02:00
Ivan Valdes
a2bf8d7e80
server/config: address golangci var-naming issues
Addresses issues in V2 Deprecation constant names.

Signed-off-by: Ivan Valdes <ivan@vald.es>
2024-04-22 17:12:15 -07:00
Ivan Valdes
2e5188f618
server/embed: address golangci var-naming issues
Addresses issues in ListenPeerUrls, ListenClientUrls,
ListenClientHttpUrls, AdvertisePeerUrls, AdvertiseClientUrls.

Signed-off-by: Ivan Valdes <ivan@vald.es>
2024-04-18 08:30:39 -06:00
Ivan Valdes
0a1bc1208f
server/embed: address golangci var-naming issues
Addresses issues in TLSMinVersion, TLSMaxVersion, WALDir, and
MaxWALFiles.

Signed-off-by: Ivan Valdes <ivan@vald.es>
2024-04-17 16:33:28 -06:00
Fube
cf66d0f64f etcdserver: updated pre-vote flag description
To better communicate what the pre-vote phase in Raft is.

Signed-off-by: Fube <fubeitch@gmail.com>
2024-04-12 17:14:19 -04:00
Ivan Valdes
66f56d71e4
server: address golangci var-naming issues
Signed-off-by: Ivan Valdes <ivan@vald.es>
2024-03-20 21:12:12 -07:00
Ivan Valdes
14523bdc21
etcdserver: rename MemberId() to MemberID() to address var-naming
Signed-off-by: Ivan Valdes <ivan@vald.es>
2024-03-18 17:18:29 -07:00
Benjamin Wang
362f0a2fcb print error log when creating peer listener failed
Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
2024-01-24 13:47:30 +00:00
devincd
931687f87e fix Struct Config has methods on both value and pointer receivers. Such usage is not recommended by the Go Documentation.
Signed-off-by: devincd <505259926@qq.com>
2024-01-09 17:42:59 +08:00
Chao Chen
ea035471ce online defrag notifies gRPC health server to expose NOT_SERVING status
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-10-25 08:58:33 -07:00
Wei Fu
aea1cd0077 feat: enable unparam lint
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-10-17 21:24:13 +08:00
Marek Siarkowicz
9f40116fa0 Return to default write scheduler since golang.org/x/net@v0.11.0 started using round robin
Introduction of round robin 120fc906b3
Added in v0.10.0 https://github.com/golang/net/compare/v0.10.0...v0.11.0

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-10-10 16:31:42 +02:00
Marek Siarkowicz
c1fb2c2316 Use default embed config in e2e tests
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-10-05 14:17:45 +02:00
Marek Siarkowicz
e31de5e3c1 Revert "etcd server shouldn't wait for the ready notification infinitely on startup"
This reverts commit 1713dc67b5149e74cd28bad7c080c646315d0b06.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-10-03 21:37:18 +02:00
Marek Siarkowicz
50fb919318 Make AddEmbedFlags function a method on embed.Config
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-10-02 13:53:38 +02:00
Wei Fu
4704a5af3a *: fix unused issue
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-09-25 19:37:18 +08:00
Wei Fu
aa97484166 *: enable goimports in verify-lint
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-09-21 21:14:09 +08:00
Wei Fu
5e3910d96c *: fix govet-shadow lint
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-09-19 20:24:01 +08:00
Benjamin Wang
8eba295bc5 Resolve review comments: add some comments to clarify some confusing scripts or code
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-09-18 12:09:46 +01:00
Benjamin Wang
5444cdae69 remove all usage of v1 grpc-gateway
1. Manually updated go source file to remove the usage of v1 grpc-gateway;
2. Execute ./scripts/fix.sh

Signed-off-by: Benjamin Wang <wachao@vmware.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-09-18 11:22:16 +01:00
chenyahui
c0aa3b613b Use any instead of interface{}
Signed-off-by: chenyahui <cyhone@qq.com>
2023-09-17 17:41:58 +08:00
Wei Fu
4db8df677c feature: add new compactor based revision count
What would you like to be added?

Add a new compactor based on revision count, instead of a fixed time interval.

To make this happen, the mvcc store needs to export a
`CompactNotify` function that notifies the compactor once the configured number of
write transactions has occurred since the previous compaction. The
new compactor can see the revision change and delete out-of-date data in time,
instead of waiting for a fixed time interval. The underlying bbolt db can
reuse the free pages as soon as possible.
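
The notification flow described above could be sketched roughly as follows. This is a simplified, single-goroutine toy and not the actual etcd implementation: the `store` type, the `threshold` and `retention` parameters, and the `runWrites` driver are all assumptions made for illustration. In a real server the compactor would wait on the channel in its own goroutine.

```go
package main

import "fmt"

// store is a toy stand-in for the mvcc store (hypothetical, not etcd's type).
type store struct {
	rev          int64         // current revision
	lastNotified int64         // revision at the last notification
	threshold    int64         // number of writes between notifications
	notify       chan struct{} // buffered so that writers never block
}

func newStore(threshold int64) *store {
	return &store{threshold: threshold, notify: make(chan struct{}, 1)}
}

// put simulates one write transaction and signals the compactor once
// `threshold` revisions have accumulated since the last signal.
func (s *store) put() {
	s.rev++
	if s.rev-s.lastNotified >= s.threshold {
		s.lastNotified = s.rev
		select {
		case s.notify <- struct{}{}:
		default: // a notification is already pending
		}
	}
}

// compactNotify exposes the channel the compactor waits on.
func (s *store) compactNotify() <-chan struct{} { return s.notify }

// runWrites performs n writes. Whenever a notification is pending, it
// records a compaction target that keeps the last `retention` revisions.
// It returns the list of compaction targets in order.
func runWrites(s *store, n int, retention int64) []int64 {
	var compacted []int64
	for i := 0; i < n; i++ {
		s.put()
		select {
		case <-s.compactNotify():
			if target := s.rev - retention; target > 0 {
				compacted = append(compacted, target)
			}
		default:
		}
	}
	return compacted
}

func main() {
	s := newStore(100)
	fmt.Println(runWrites(s, 350, 50))
}
```

The point of the design is that compaction is triggered by write volume rather than by a timer, so a burst of updates leads to earlier compaction without any tuning of intervals.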

Why is this needed?

In a Kubernetes cluster running, for instance, argo workflow, there will be batches of
requests to create pods, followed by a lot of PATCH requests for pod status,
especially when a pod has more than 3 containers. If the burst of
requests increases the db size in a short time, it can easily exceed the max
quota size. The cluster admin then has to get involved to defrag, which may cause
long downtime. So, we hope that ETCD can delete the out-of-date data as
soon as possible and slow down the growth of the total db size.

Currently, both the revision and periodic compactors are based on time. A
fixed time interval does not cope well with unexpected bursts of update requests.
The new compactor based on revision count can make the admin's life easier.
For instance, let's say that the average object size is 50 KiB and the new
compactor compacts every 10,000 revisions. ETCD can then
compact after roughly 500 MiB of new data has arrived, no matter how long it takes to get
10,000 new revisions. It can handle burst update requests well.

There are some test results:

* Fixed value size: 10 KiB, Update Rate: 100/s, Total key space: 3,000

```
benchmark put --rate=100 --total=300000 --compact-interval=0 \
  --key-space-size=3000 --key-size=256 --val-size=10240
```

|                      Compactor | DB Total Size | DB InUse Size |
|                             -- | --            |            -- |
| Revision(5min,retention:10000) | 570 MiB       |       208 MiB |
|                   Periodic(1m) | 232 MiB       |       165 MiB |
|                  Periodic(30s) | 151 MiB       |       127 MiB |
|   NewRevision(retention:10000) | 195 MiB       |       187 MiB |

* Random value size: [9 KiB, 11 KiB], Update Rate: 150/s, Total key space: 3,000

```
benchmark put --rate=150 --total=300000 --compact-interval=0 \
  --key-space-size=3000 --key-size=256 --val-size=10240 \
  --delta-val-size=1024
```

|                      Compactor | DB Total Size | DB InUse Size |
|                             -- | --            |            -- |
| Revision(5min,retention:10000) | 718 MiB       |       554 MiB |
|                   Periodic(1m) | 297 MiB       |       246 MiB |
|                  Periodic(30s) | 185 MiB       |       146 MiB |
|   NewRevision(retention:10000) | 186 MiB       |       178 MiB |

* Random value size: [6 KiB, 14 KiB], Update Rate: 200/s, Total key space: 3,000

```
benchmark put --rate=200 --total=300000 --compact-interval=0 \
  --key-space-size=3000 --key-size=256 --val-size=10240 \
  --delta-val-size=4096
```

|                      Compactor | DB Total Size | DB InUse Size |
|                             -- | --            |            -- |
| Revision(5min,retention:10000) | 874 MiB       |       221 MiB |
|                   Periodic(1m) | 357 MiB       |       260 MiB |
|                  Periodic(30s) | 215 MiB       |       151 MiB |
|   NewRevision(retention:10000) | 182 MiB       |       176 MiB |

For burst requests, we need to use a short periodic interval.
Otherwise, the total size will be large. I think the new compactor can
handle it well.

Additional Change:

Currently, the quota system only checks the DB total size. However, there
could be a lot of free pages which can be reused for upcoming requests.
Based on this proposal, I also want to extend the current quota system with the DB's
InUse size.

If the InUse size is less than the max quota size, we should allow requests to
update. Since bbolt might still grow the file if there are no available
contiguous pages, we should set up a hard limit for the overflow, like 1
GiB.

```diff
 // Quota represents an arbitrary quota against arbitrary requests. Each request
@@ -130,7 +134,17 @@ func (b *BackendQuota) Available(v interface{}) bool {
                return true
        }
        // TODO: maybe optimize Backend.Size()
-       return b.be.Size()+int64(cost) < b.maxBackendBytes
+
+       // Since the compact comes with allocatable pages, we should check the
+       // SizeInUse first. If there is no continuous pages for key/value and
+       // the boltdb continues to resize, it should not increase more than 1
+       // GiB. It's hard limitation.
+       //
+       // TODO: It should be enabled by flag.
+       if b.be.Size()+int64(cost)-b.maxBackendBytes >= maxAllowedOverflowBytes(b.maxBackendBytes) {
+               return false
+       }
+       return b.be.SizeInUse()+int64(cost) < b.maxBackendBytes
 }
```
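
The same check can be read as a standalone function. The sketch below is only an approximation of the diff above: it replaces `maxAllowedOverflowBytes(b.maxBackendBytes)` with a fixed 1 GiB constant, and the `available` function and its parameters are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// maxOverflowBytes is the hard cap on how far the DB file may grow past
// the quota. The 1 GiB value follows the text; the name is hypothetical.
const maxOverflowBytes int64 = 1 << 30

// available reports whether a request costing `cost` bytes fits the quota.
// size is the total DB file size; inUse is the size of non-free pages.
func available(size, inUse, cost, quota int64) bool {
	// Hard limit: reject once the file would exceed the quota by the cap.
	if size+cost-quota >= maxOverflowBytes {
		return false
	}
	// Otherwise decide on in-use bytes, so reusable free pages do not count.
	return inUse+cost < quota
}

func main() {
	const gib = int64(1) << 30
	quota := 2 * gib

	// File is over quota, but half of it is free pages: allowed.
	fmt.Println(available(2*gib+gib/2, gib, 1024, quota))
	// File has overflowed the quota by 1 GiB: rejected by the hard limit.
	fmt.Println(available(3*gib, gib, 1024, quota))
	// In-use data already fills the quota: rejected.
	fmt.Println(available(2*gib, 2*gib, 1024, quota))
}
```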

It is also likely to avoid the NOSPACE alarm if compaction frees up many
more pages, which can reduce downtime.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-08-16 23:35:08 +08:00
Benjamin Wang
979102f895 clientv3: remove the experimental gRPC API grpccredentials.Bundle
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-08-02 19:35:51 +01:00
Stephen Kitt
1010115b8f
server: switch to semconv v1.17.0
This is the latest semconv package used in etcd's dependencies.
Switching to that version reduces the overall package dependencies of
the project (and helps downstream projects which track this,
e.g. Kubernetes).

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2023-07-24 15:53:04 +02:00
Chao Chen
e6c8bf82e0 add uds test cases into e2e TestAuthority
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-06-08 15:54:30 -07:00
AngstyDuck
a7344da7d3 server: default value for config file field auto-compaction-mode is now 'periodic'; added additional checks if auto-compaction-mode is undefined
Signed-off-by: AngstyDuck <solsticedante@gmail.com>
2023-05-09 23:10:44 +08:00
Marek Siarkowicz
549087cd69 server: Fix defer function closure escape
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-03-30 13:37:31 +02:00
Marek Siarkowicz
bf12179a5a server: Add --listen-client-http-urls flag to allow running grpc server separate from http server
The difference in load configuration for the watch delay tests shows how huge the
impact is. Even with the random write scheduler, grpc under the http
server can only handle 500 KB with a 2 second delay. On the other hand,
a separate grpc server easily hits 10, 100 or even 1000 MB within 100 milliseconds.

The priority write scheduler that was used in most previous releases
is far worse than the random one.

Tests are configured to only 5 MB to avoid flakes and taking too long to fill
etcd.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-03-30 09:49:45 +02:00
Marek Siarkowicz
419a56e51a server: Pick one address that all grpc gateways connect to
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-03-30 09:49:45 +02:00
Marek Siarkowicz
d1f674d624 server: Extract resolveUrl helper function
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-03-30 09:49:43 +02:00
Marek Siarkowicz
85c48c4a60 server: Separate client listener grouping from serving
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-03-30 09:48:46 +02:00
Wei Fu
a9988e2625 server/embed: fix data race when start insecure grpc
There are two goroutines accessing the `gs` grpc server var. Before the
insecure `gs` server starts, `gs` can be reassigned to the secure server, and
then the client will fail to connect to etcd with an insecure request. This
is a data race. We should pass the server as an argument to the new goroutine.
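
A minimal sketch of the pattern described above, assuming a toy `server` type rather than a real gRPC server: because the arguments of a `go` statement are evaluated when the statement runs, each goroutine keeps the server it was started with, even though `gs` is reassigned afterwards. The names here are hypothetical and do not mirror the actual embed code.

```go
package main

import (
	"fmt"
	"sync"
)

// server is a toy stand-in for a gRPC server (hypothetical type).
type server struct{ name string }

// startBoth launches one goroutine per server. Each goroutine receives
// its server as an argument, so reassigning gs later cannot change
// which server an already-launched goroutine uses.
func startBoth() []string {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		started []string
	)
	serve := func(s *server) {
		defer wg.Done()
		mu.Lock()
		started = append(started, s.name)
		mu.Unlock()
	}

	gs := &server{name: "insecure"}
	wg.Add(1)
	go serve(gs) // gs is evaluated here, before it is reassigned

	gs = &server{name: "secure"}
	wg.Add(1)
	go serve(gs)

	wg.Wait()
	return started
}

func main() {
	fmt.Println(len(startBoth()))
}
```

Had the goroutine body referenced `gs` through a closure instead, both goroutines could have observed the secure server, which is the failure the commit describes.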

fix: #15495

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-03-18 21:48:58 +08:00
Marek Siarkowicz
372042c374 refactor: Use proper variable names for urls
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-03-13 14:48:01 +01:00