79 Commits

Author SHA1 Message Date
Fu Wei
d63ca43092
Merge 4db8df677c618b462145fce7cb926c072a0ce932 into c86c93ca2951338115159dcdd20711603044e1f1 2024-09-25 21:36:55 -07:00
Siyuan Zhang
bd228cf6d1 migrate experimental-stop-grpc-service-on-defrag flag to feature gate.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2024-08-05 13:46:51 -07:00
Siyuan Zhang
0e77563e35 Add config file field for feature-gates flag.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2024-07-23 15:27:07 -07:00
Siyuan Zhang
7b355141d9 Add "server-feature-gates" flag.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2024-07-18 13:20:30 -07:00
Ryan Leung
d9cb8b80f5 address the comment
Signed-off-by: Ryan Leung <rleungx@gmail.com>
2024-06-12 11:38:41 +08:00
Ryan Leung
29abd62338 introduce GRPCAdditionalServerOptions
Signed-off-by: Ryan Leung <rleungx@gmail.com>
2024-06-07 16:39:35 +08:00
lhy1024
acc9d7c9fe Support multiple values for allowed client and peer TLS identities
Signed-off-by: lhy1024 <admin@liudos.us>
2024-06-06 21:25:17 +08:00
Stephen Kitt
c1f5a445fc
Use Go 1.20 error joining instead of multierr
This still allows the errors to be unwrapped, and drops the direct
dependency on multierr.
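
As a rough illustration of the standard-library approach this commit message refers to, the sketch below joins several validation errors with `errors.Join` (available since Go 1.20). The `validate` function and the two sentinel errors are hypothetical names made up for this example; they are not taken from the etcd code base.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors, used only for this illustration.
var (
	errEmptyName = errors.New("name must not be empty")
	errBadPort   = errors.New("port out of range")
)

// validate collects every problem it finds and joins them into one error.
// errors.Join returns nil when it is given no non-nil errors.
func validate(name string, port int) error {
	var errs []error
	if name == "" {
		errs = append(errs, errEmptyName)
	}
	if port < 1 || port > 65535 {
		errs = append(errs, errBadPort)
	}
	return errors.Join(errs...)
}

func main() {
	err := validate("", 70000)
	// The joined error can still be matched against each wrapped error.
	fmt.Println(errors.Is(err, errEmptyName), errors.Is(err, errBadPort))
	fmt.Println(validate("infra0", 2380) == nil)
}
```

Because the joined error still matches each wrapped error through `errors.Is`, callers that inspect individual errors should keep working after such a switch.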

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2024-05-31 09:22:52 +02:00
Edwin Xie
4c77726914 Implement flag --experimental-set-member-localaddr
Which sets the LocalAddr to an IP address from --initial-advertise-peer-urls.

Also adds e2e test that requires this flag to succeed.

Co-authored-by: HighPon <s.shiraki.business@gmail.com>
Signed-off-by: Edwin Xie <edwin.xie@broadcom.com>
2024-05-24 18:17:37 +00:00
Shim Myeongseob
769500124f embed: fix typo in comment
Signed-off-by: Shim Myeongseob <Mangsby716@gmail.com>
2024-05-19 13:59:25 +00:00
Seena Fallah
b31f23e113 config: support AllowedCN and AllowedHostname through config file
Allow setting AllowedCN and AllowedHostname tls fields through config file for peer transport security.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2024-05-07 22:39:16 +02:00
Ivan Valdes
a2bf8d7e80
server/config: address golangci var-naming issues
Addresses issues in V2 Deprecation constant names.

Signed-off-by: Ivan Valdes <ivan@vald.es>
2024-04-22 17:12:15 -07:00
Ivan Valdes
2e5188f618
server/embed: address golangci var-naming issues
Addresses issues in ListenPeerUrls, ListenClientUrls,
ListenClientHttpUrls, AdvertisePeerUrls, AdvertiseClientUrls.

Signed-off-by: Ivan Valdes <ivan@vald.es>
2024-04-18 08:30:39 -06:00
Ivan Valdes
0a1bc1208f
server/embed: address golangci var-naming issues
Addresses issues in TLSMinVersion, TLSMaxVersion, WALDir, and
MaxWALFiles.

Signed-off-by: Ivan Valdes <ivan@vald.es>
2024-04-17 16:33:28 -06:00
Fube
cf66d0f64f etcdserver: updated pre-vote flag description
To better communicate what the pre-vote phase in Raft is.

Signed-off-by: Fube <fubeitch@gmail.com>
2024-04-12 17:14:19 -04:00
devincd
931687f87e fix Struct Config has methods on both value and pointer receivers. Such usage is not recommended by the Go Documentation.
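
A minimal sketch of why mixing receiver kinds on one type can be confusing. The `config` type and its methods are hypothetical and much smaller than the real `Config` struct; they only illustrate the general Go behavior.

```go
package main

import "fmt"

// config is a hypothetical stand-in for a larger configuration struct.
type config struct {
	name string
}

// setNameByValue has a value receiver: it modifies a copy, so the caller's
// struct is left unchanged.
func (c config) setNameByValue(n string) { c.name = n }

// setName has a pointer receiver: it modifies the caller's struct.
func (c *config) setName(n string) { c.name = n }

func main() {
	c := config{name: "default"}
	c.setNameByValue("infra0")
	fmt.Println(c.name) // the update was applied to a copy
	c.setName("infra0")
	fmt.Println(c.name) // the update is visible to the caller
}
```

Keeping every method on the pointer receiver avoids this kind of silent no-op.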
Signed-off-by: devincd <505259926@qq.com>
2024-01-09 17:42:59 +08:00
Chao Chen
ea035471ce online defrag notifies gRPC health server to expose NOT_SERVING status
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-10-25 08:58:33 -07:00
Marek Siarkowicz
c1fb2c2316 Use default embed config in e2e tests
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-10-05 14:17:45 +02:00
Marek Siarkowicz
e31de5e3c1 Revert "etcd server shouldn't wait for the ready notification infinitely on startup"
This reverts commit 1713dc67b5149e74cd28bad7c080c646315d0b06.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-10-03 21:37:18 +02:00
Marek Siarkowicz
50fb919318 Make AddEmbedFlags function a method on embed.Config
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-10-02 13:53:38 +02:00
Wei Fu
4704a5af3a *: fix unused issue
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-09-25 19:37:18 +08:00
Wei Fu
aa97484166 *: enable goimports in verify-lint
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-09-21 21:14:09 +08:00
Wei Fu
4db8df677c feature: add new compactor based revision count
What would you like to be added?

Add new compactor based revision count, instead of fixed interval time.

In order to make it happen, the mvcc store needs to export the
`CompactNotify` function to notify the compactor that the configured number of
write transactions have occurred since the previous compaction. The
new compactor can observe the revision change and delete out-of-date data in time,
instead of waiting for a fixed time interval. The underlying bbolt db can then
reuse the free pages as soon as possible.

Why is this needed?

In a Kubernetes cluster running, for instance, Argo Workflows, there will be batch
requests to create pods, followed by a lot of PATCH requests for pod status,
especially when a pod has more than 3 containers. If the burst of
requests increases the db size in a short time, it can easily exceed the max
quota size. The cluster admin then has to get involved to defrag, which may cause
long downtime. So we hope etcd can delete out-of-date data as
soon as possible and slow down the growth of the total db size.

Currently, both the revision and periodic compactors are driven by time. A fixed
time interval does not cope well with unexpected bursts of update requests.
The new compactor based on revision count can make the admin's life easier.
For instance, say the average object size is 50 KiB and the new
compactor compacts every 10,000 revisions. Then etcd compacts
after roughly 500 MiB of new data has arrived, no matter how long it takes to
accumulate those 10,000 revisions. It can handle burst update requests well.
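
The estimate above can be checked with a few lines of Go. This is only back-of-the-envelope arithmetic; `bytesPerCompaction` is a hypothetical helper, not part of etcd.

```go
package main

import "fmt"

const (
	kib = int64(1) << 10
	mib = int64(1) << 20
)

// bytesPerCompaction estimates how much new data arrives between two
// compactions when a compaction is triggered every `revisions` revisions
// and each revision writes about `avgObjectBytes` bytes.
func bytesPerCompaction(revisions, avgObjectBytes int64) int64 {
	return revisions * avgObjectBytes
}

func main() {
	b := bytesPerCompaction(10_000, 50*kib)
	// 10,000 * 50 KiB = 512,000,000 bytes, about 488 MiB (roughly 500 MiB).
	fmt.Printf("%d bytes, about %d MiB\n", b, b/mib)
}
```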

There are some test results:

* Fixed value size: 10 KiB, Update Rate: 100/s, Total key space: 3,000

```
benchmark put --rate=100 --total=300000 --compact-interval=0 \
  --key-space-size=3000 --key-size=256 --val-size=10240
```

|                      Compactor | DB Total Size | DB InUse Size |
|                             -- | --            |            -- |
| Revision(5min,retention:10000) | 570 MiB       |       208 MiB |
|                   Periodic(1m) | 232 MiB       |       165 MiB |
|                  Periodic(30s) | 151 MiB       |       127 MiB |
|   NewRevision(retention:10000) | 195 MiB       |       187 MiB |

* Random value size: [9 KiB, 11 KiB], Update Rate: 150/s, Total key space: 3,000

```
benchmark put --rate=150 --total=300000 --compact-interval=0 \
  --key-space-size=3000 --key-size=256 --val-size=10240 \
  --delta-val-size=1024
```

|                      Compactor | DB Total Size | DB InUse Size |
|                             -- | --            |            -- |
| Revision(5min,retention:10000) | 718 MiB       |       554 MiB |
|                   Periodic(1m) | 297 MiB       |       246 MiB |
|                  Periodic(30s) | 185 MiB       |       146 MiB |
|   NewRevision(retention:10000) | 186 MiB       |       178 MiB |

* Random value size: [6 KiB, 14 KiB], Update Rate: 200/s, Total key space: 3,000

```
benchmark put --rate=200 --total=300000 --compact-interval=0 \
  --key-space-size=3000 --key-size=256 --val-size=10240 \
  --delta-val-size=4096
```

|                      Compactor | DB Total Size | DB InUse Size |
|                             -- | --            |            -- |
| Revision(5min,retention:10000) | 874 MiB       |       221 MiB |
|                   Periodic(1m) | 357 MiB       |       260 MiB |
|                  Periodic(30s) | 215 MiB       |       151 MiB |
|   NewRevision(retention:10000) | 182 MiB       |       176 MiB |

For burst requests, we need to use a short periodic interval.
Otherwise, the total size will be large. I think the new compactor can
handle this well.

Additional Change:

Currently, the quota system only checks the DB total size. However, there
could be a lot of free pages which can be reused for upcoming requests.
Based on this proposal, I also want to extend the current quota system with the DB's
InUse size.

If the InUse size is less than the max quota size, we should allow requests to
update. Since bbolt might grow the file if there are no available
contiguous pages, we should set up a hard limit for the overflow, like 1
GiB.

```diff
 // Quota represents an arbitrary quota against arbitrary requests. Each request
@@ -130,7 +134,17 @@ func (b *BackendQuota) Available(v interface{}) bool {
                return true
        }
        // TODO: maybe optimize Backend.Size()
-       return b.be.Size()+int64(cost) < b.maxBackendBytes
+
+       // Since the compact comes with allocatable pages, we should check the
+       // SizeInUse first. If there is no continuous pages for key/value and
+       // the boltdb continues to resize, it should not increase more than 1
+       // GiB. It's hard limitation.
+       //
+       // TODO: It should be enabled by flag.
+       if b.be.Size()+int64(cost)-b.maxBackendBytes >= maxAllowedOverflowBytes(b.maxBackendBytes) {
+               return false
+       }
+       return b.be.SizeInUse()+int64(cost) < b.maxBackendBytes
 }
```
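
The check in the diff above can be tried out in isolation with the sketch below. It is self-contained and does not use etcd's types: `available` takes the sizes as plain arguments, and `maxAllowedOverflow` is a hypothetical stand-in that assumes a fixed 1 GiB limit, since the body of `maxAllowedOverflowBytes` is not shown in the commit message.

```go
package main

import "fmt"

const gib = int64(1) << 30

// maxAllowedOverflow is a hypothetical stand-in for maxAllowedOverflowBytes.
// It assumes a fixed 1 GiB hard limit regardless of the configured quota.
func maxAllowedOverflow(maxBackendBytes int64) int64 {
	return gib
}

// available mirrors the two-step check from the diff:
//  1. reject if the total file size would overflow the quota by the hard limit;
//  2. otherwise compare the in-use size (not the total size) with the quota.
func available(size, sizeInUse, cost, maxBackendBytes int64) bool {
	if size+cost-maxBackendBytes >= maxAllowedOverflow(maxBackendBytes) {
		return false
	}
	return sizeInUse+cost < maxBackendBytes
}

func main() {
	quota := 2 * gib
	// File has grown past the quota, but most pages are free: allowed.
	fmt.Println(available(5*gib/2, gib, 1024, quota))
	// File overflows the quota by 1 GiB or more: rejected.
	fmt.Println(available(3*gib, gib, 1024, quota))
	// In-use data alone reaches the quota: rejected.
	fmt.Println(available(2*gib, 2*gib, 1024, quota))
}
```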

The NOSPACE alarm is also likely to be avoided if compaction can free up many
more pages, which can reduce downtime.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-08-16 23:35:08 +08:00
AngstyDuck
a7344da7d3 server: default value for config file field auto-compaction-mode is now 'periodic'; added additional checks if auto-compaction-mode is undefined
Signed-off-by: AngstyDuck <solsticedante@gmail.com>
2023-05-09 23:10:44 +08:00
Marek Siarkowicz
bf12179a5a server: Add --listen-client-http-urls flag to allow running grpc server separate from http server
The difference in load configuration for the watch delay tests shows how huge the
impact is. Even with the random write scheduler, grpc under the http
server can only handle 500 KB with a 2 second delay. On the other hand, a
separate grpc server easily hits 10, 100 or even 1000 MB within 100 milliseconds.

The priority write scheduler that was used in most previous releases
is far worse than the random one.

Tests are configured with only 5 MB to avoid flakes and taking too long to fill
etcd.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-03-30 09:49:45 +02:00
Marek Siarkowicz
372042c374 refactor: Use proper variable names for urls
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-03-13 14:48:01 +01:00
Tero Saarni
588b98d085 Add TLSv1.3 support.
Added optional TLS min/max protocol version and command line switches to set
versions for the etcd server.

If max version is not explicitly set by the user, let Go select the max
version which is currently TLSv1.3. Previously max version was set to TLSv1.2.
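
A minimal sketch of how optional min/max versions can map onto Go's `crypto/tls` configuration. In `tls.Config`, a zero `MaxVersion` lets the standard library pick its highest supported version. The helper names and the accepted strings (`"TLS1.2"`, `"TLS1.3"`) are assumptions for illustration and are not copied from the etcd implementation.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// parseVersion maps a version string to a crypto/tls constant.
// An empty string means "not set" and yields 0, which tls.Config
// interprets as "use the library default".
func parseVersion(s string) (uint16, error) {
	switch s {
	case "":
		return 0, nil
	case "TLS1.2":
		return tls.VersionTLS12, nil
	case "TLS1.3":
		return tls.VersionTLS13, nil
	}
	return 0, fmt.Errorf("unsupported TLS version %q", s)
}

// newTLSConfig builds a tls.Config from optional min/max version strings.
func newTLSConfig(minV, maxV string) (*tls.Config, error) {
	lo, err := parseVersion(minV)
	if err != nil {
		return nil, err
	}
	hi, err := parseVersion(maxV)
	if err != nil {
		return nil, err
	}
	if hi != 0 && lo > hi {
		return nil, fmt.Errorf("min version %q is above max version %q", minV, maxV)
	}
	return &tls.Config{MinVersion: lo, MaxVersion: hi}, nil
}

func main() {
	cfg, err := newTLSConfig("TLS1.2", "")
	fmt.Println(cfg.MinVersion == tls.VersionTLS12, cfg.MaxVersion == 0, err)
}
```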

Signed-off-by: Tero Saarni <tero.saarni@est.tech>
2023-01-30 16:16:53 +02:00
Chao Chen
2c46b2b299 externalize snapshot catchup entries to etcd flag
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-01-04 19:01:07 -08:00
Piotr Tabor
9e1abbab6e Fix goimports in all existing files. Execution of ./scripts/fix.sh
Signed-off-by: Piotr Tabor <ptab@google.com>
2022-12-29 09:41:31 +01:00
Bogdan Kanivets
7e8ebf7727 server: added duplicate warning-unary-request-duration flag
--warning-unary-request-duration is a duplicate of --experimental-warning-unary-request-duration
experimental-warning-unary-request-duration will be removed in v3.7.

fixes https://github.com/etcd-io/etcd/issues/13783

Signed-off-by: Bogdan Kanivets <bkanivets@apple.com>
2022-11-18 18:06:00 +08:00
nic-chen
6f6275e1ab chore: update the warn message on startup when the server name is default
Signed-off-by: nic-chen <chenjunxu6@gmail.com>
2022-10-28 20:56:45 +08:00
nic-chen
191fb306ef fix: apply review suggestion
Signed-off-by: nic-chen <chenjunxu6@gmail.com>
2022-10-27 08:23:22 +08:00
demoManito
a9c3d56508 etcd: remove redundant type conversion
Signed-off-by: demoManito <1430482733@qq.com>
2022-09-20 11:26:02 +08:00
Sam Batschelet
76a5902efa server/etcdmain: add configurable cipher list to gRPC proxy listener
Signed-off-by: Allen Ray <alray@redhat.com>
2022-08-17 10:56:27 -04:00
Marek Siarkowicz
d44bbff278 server: Make corruption check optional and period configurable
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-07-26 09:31:15 +02:00
Benjamin Wang
1a6fe4dbc6 update the comment for MaxConcurrentStreams to clearly state it's the max value for each client.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-07-07 04:51:20 +08:00
Benjamin Wang
053ba95ed5 set max concurrent streams to the http2 server
The default max stream is 250 in http2. When there are more than
250 streams, the client side may be blocked until some previous
streams are released. So we need to support configuring a larger
`MaxConcurrentStreams`.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-07-06 03:43:46 +08:00
杨金珏
6220174687 support custom grpc.MaxConcurrentStreams
There has been no update on the original PR (see below) for more than 2
weeks, so Benjamin (@ahrtr) is continuing the work on the PR. The first
step is to rebase the PR, because there are lots of conflicts with
the main branch.

The changes to go.mod and go.sum are reverted, because they are not needed.
The e2e test cases are also reverted, because they are not correct.

```
https://github.com/etcd-io/etcd/pull/14081
```

Signed-off-by: nic-chen <chenjunxu6@gmail.com>
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-07-06 03:43:46 +08:00
Dirc
ba405f64c1
Update link to tuning page 2022-06-08 10:16:04 +02:00
ahrtr
1a3822f2c3 Rename ClientConfig to ConfigSpec
The ClientConfig is a fully declarative configuration, so it makes more
sense to rename it to ConfigSpec. It can also mitigate the confusion
between Config and ClientConfig.
2022-03-13 05:41:49 +08:00
ahrtr
3dcbbf62d9 Move clientconfig into clientv3 so that it can be reused by both etcdctl and v3 discovery 2022-03-12 06:38:41 +08:00
ahrtr
2f36e0c62b Change discovery url to endpoints
Currently the discovery url is just one endpoint. But actually it
should be the same as the etcdctl, which means that it should be
a list of endpoints. When one endpoint is down, the clientv3 can
fail over to the next endpoint automatically.
2022-02-24 09:11:41 +08:00
ahrtr
ebc86d12c0 support v3 discovery to bootstrap a new etcd cluster 2022-02-21 23:22:49 +08:00
ahrtr
1713dc67b5 etcd server shouldn't wait for the ready notification infinitely on startup 2022-01-27 16:19:20 +08:00
Marek Siarkowicz
ee5ef42c5c server: --enable-v2 and --enable-v2v3 are decommissioned 2022-01-14 13:19:30 +01:00
Marek Siarkowicz
7d10899d7f server: Require either cluster version v3.6 or --experimental-enable-lease-checkpoint-persist to persist lease remainingTTL
To avoid inconsistent behavior during cluster upgrade we are feature
gating persistence behind cluster version. This should ensure that
all cluster members are upgraded to v3.6 before changing behavior.

To allow backporting this fix to v3.5 we are also introducing flag
--experimental-enable-lease-checkpoint-persist that will allow for
smooth upgrade in v3.5 clusters with this feature enabled.
2021-12-02 12:26:47 +01:00
Sam Batschelet
63a1cc3fe4 add --experimental-max-learner flag
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2021-11-09 09:52:00 -05:00
Eng Zer Jun
2a151c8982
*: move from io/ioutil to io and os packages
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.
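
For illustration, the sketch below exercises a few of the replacements introduced in Go 1.16, with the old `io/ioutil` call noted next to each line. The helper names `roundTrip` and `readAll` are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// roundTrip writes content to a file in a fresh temp dir and reads it back.
func roundTrip(content string) (string, error) {
	dir, err := os.MkdirTemp("", "example") // was ioutil.TempDir
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "data.txt")
	if err := os.WriteFile(path, []byte(content), 0o600); err != nil { // was ioutil.WriteFile
		return "", err
	}
	data, err := os.ReadFile(path) // was ioutil.ReadFile
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// readAll drains a reader built from s.
func readAll(s string) (string, error) {
	data, err := io.ReadAll(strings.NewReader(s)) // was ioutil.ReadAll
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	got, err := roundTrip("hello")
	fmt.Println(got, err)
	all, err := readAll("world")
	fmt.Println(all, err)
}
```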

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-10-28 00:05:28 +08:00
Arda Güçlü
e647995a38 Make zap encoding configurable
JSON encoding is the default zap encoding and cannot be changed.
This PR enables configuring zap encoding to console via a new flag `log-format`.
2021-09-22 15:48:47 +03:00
Sam Batschelet
a4a82cc982
Merge pull request #13248 from lilic/add-sampling-rate
server: Add sampling rate to distributed tracing
2021-08-30 08:31:00 -04:00