16720 Commits

Author SHA1 Message Date
Gyuho Lee
f46c924f10
Merge pull request #12756 from hexfusion/bump-cl-03-09-3.5
CHANGELOG: add socket option flags #12702
2021-03-09 19:52:05 -08:00
Sam Batschelet
7ac1367783 CHANGELOG: add socket option flags #12702
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2021-03-09 12:47:10 -05:00
Gyuho Lee
94a371acd7
Merge pull request #12750 from ptabor/20210306-mlock
--experimental-memory-mlock support
2021-03-09 09:13:40 -08:00
Gyuho Lee
6fd85af641
Merge pull request #12702 from hexfusion/add-so
*: add support for socket options
2021-03-09 09:02:24 -08:00
Sam Batschelet
5b49fb41c8 fixup: add ListenerOptions
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2021-03-08 11:27:03 -05:00
Piotr Tabor
a46a358577 --experimental-memory-mlock support
The flag protects etcd memory from being swapped out to disk.
This can happen in memory-constrained systems, where the mmapped bbolt
area is a natural candidate for swapping out.

This flag should provide better tail latency at the cost of higher RSS
RAM usage. If the experiment is successful, the logic should get moved
into the bbolt layer, where we can protect specific bbolt instances
(e.g. avoid protecting both during defragmentation).
2021-03-07 12:32:57 +01:00
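To illustrate what locking memory means at the syscall level, here is a minimal Go sketch that maps an anonymous region and asks the kernel to keep it resident. It is not etcd's implementation of the flag: `lockedRegion` and `release` are hypothetical helpers, and the sketch assumes a Unix-like platform where `syscall.Mmap` and `syscall.Mlock` are available.

```go
package main

import (
	"fmt"
	"syscall"
)

// lockedRegion is a hypothetical helper (not etcd code). It maps an anonymous
// memory region of the given size and tries to pin it in RAM with mlock(2),
// so the kernel will not swap it out.
func lockedRegion(size int) (region []byte, locked bool, err error) {
	region, err = syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return nil, false, err
	}
	// Mlock can fail when RLIMIT_MEMLOCK is too low or the process lacks
	// the required privilege. The region stays usable, it is just swappable.
	if lockErr := syscall.Mlock(region); lockErr != nil {
		return region, false, nil
	}
	return region, true, nil
}

// release unlocks the region (if it was locked) and unmaps it.
func release(region []byte, locked bool) error {
	if locked {
		if err := syscall.Munlock(region); err != nil {
			return err
		}
	}
	return syscall.Munmap(region)
}

func main() {
	region, locked, err := lockedRegion(4096)
	if err != nil {
		fmt.Println("mmap failed:", err)
		return
	}
	defer release(region, locked)

	region[0] = 42
	fmt.Println("locked in RAM:", locked)
}
```

Locked pages count against the resident set size for as long as they stay locked, which is the RSS trade-off the commit message describes.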
Piotr Tabor
792b7f57d3
Merge pull request #12747 from wilsonwang371/master
pkg/wait: change list from single element to an array.
2021-03-07 11:53:30 +01:00
Piotr Tabor
7556b9a011
Merge pull request #12752 from cwedgwood/master-nofsyncdowrite
[RFC (against master branch)] etcdserver: when using --unsafe-no-fsync write data
2021-03-07 11:51:57 +01:00
Chris Wedgwood
b63d31e89b etcdserver: when using --unsafe-no-fsync write data
There are situations where we don't wish to fsync but we do want to
write the data.

Typically this occurs in clusters where fsync latency (often the
result of firmware) transiently spikes.  For Kubernetes clusters this
causes (many) elections which have knock-on effects such that the API
server will transiently fail, causing other components to fail in turn.

By writing the data (buffered and asynchronously flushed, so in most
situations the write is fast) and avoiding the fsync we no longer
trigger this situation and opportunistically write out the data.

Anecdotally:
  Because the fsync is missing, there is the argument that certain
  types of failure events will cause data corruption or loss; in
  testing this wasn't seen.  If this were to occur, the expectation is
  that the member can be re-added to a cluster or, worst-case, restored
  from a robust persisted snapshot.

  The etcd members are deployed across isolated racks with different
  power feeds.  An instantaneous failure of all of them simultaneously
  is unlikely.

  Testing was usually of the form:
   * create (Kubernetes) etcd write-churn by creating replicasets of
     some 1000s of pods
   * break/fail the leader

  Failure testing included:
   * hard node power-off events
   * disk removal
   * orderly reboots/shutdown

  In all cases when the node recovered it was able to rejoin the
  cluster and synchronize.
2021-03-05 10:58:04 -08:00
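The distinction between writing the data and fsyncing it can be sketched in a few lines of Go. This is an illustration only, not etcd's WAL code: `appendRecord`, `roundTrip`, and the `unsafeNoFsync` parameter are hypothetical names chosen for this sketch.

```go
package main

import (
	"fmt"
	"os"
)

// appendRecord is a hypothetical helper (not etcd's WAL code). It always
// issues the write, so the data reaches the OS page cache, and only skips
// the fsync when unsafeNoFsync is set.
func appendRecord(f *os.File, record []byte, unsafeNoFsync bool) error {
	if _, err := f.Write(record); err != nil {
		return err
	}
	if unsafeNoFsync {
		// The kernel flushes the dirty pages later on its own schedule.
		// A power loss before that flush can lose this record.
		return nil
	}
	// Block until the data is on stable storage.
	return f.Sync()
}

// roundTrip writes one record to a temporary file and reads it back.
func roundTrip(record []byte, unsafeNoFsync bool) ([]byte, error) {
	f, err := os.CreateTemp("", "wal-sketch-*")
	if err != nil {
		return nil, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	if err := appendRecord(f, record, unsafeNoFsync); err != nil {
		return nil, err
	}
	return os.ReadFile(f.Name())
}

func main() {
	got, err := roundTrip([]byte("entry-1"), true)
	if err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Printf("read back %d bytes without fsync\n", len(got))
}
```

In both modes a reader on the same running system sees the record; the two modes differ only in what survives a crash or power loss before the kernel flushes.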
Piotr Tabor
f4001630d9
Merge pull request #12748 from ptabor/20210305-deflake
Test flakes: 1 fix + 1 diagnostic
2021-03-04 23:38:16 +01:00
Piotr Tabor
66cef61444 Detect leaked go-routines based on pre-normalization syntax. 2021-03-04 22:28:44 +01:00
Piotr Tabor
339f8fa4bd test.sh: Run integration tests with -v and shorter deadline.
The purpose of this change is to learn more about flake cases like:
  https://travis-ci.com/github/etcd-io/etcd/jobs/488324449

```
% (cd tests && 'env' 'go' 'test' '-timeout=30m' '--race=false' '--cpu=2' './integration/...')
stderr: go: downloading github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e
ok  	go.etcd.io/etcd/tests/v3/integration	197.295s
ok  	go.etcd.io/etcd/tests/v3/integration/client	0.089s
ok  	go.etcd.io/etcd/tests/v3/integration/client/examples	0.038s
ok  	go.etcd.io/etcd/tests/v3/integration/clientv3	70.365s
ok  	go.etcd.io/etcd/tests/v3/integration/clientv3/concurrency	3.169s
ok  	go.etcd.io/etcd/tests/v3/integration/clientv3/connectivity	100.535s
ok  	go.etcd.io/etcd/tests/v3/integration/clientv3/examples	1.341s
ok  	go.etcd.io/etcd/tests/v3/integration/clientv3/experimental/recipes	3.277s

No output has been received in the last 10m0s,
```
2021-03-04 22:23:05 +01:00
Wilson Wang
432fde88a9 pkg/wait: change list from single element to an array.
We found wait lock contention under a large number of write operations. Converting wait from a single element to an array improves performance.

Fixes #12731

Signed-off-by: Wilson Wang <wilsonny371@gmail.com>
2021-03-04 12:16:41 -08:00
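The idea of replacing one lock-protected map with an array of them can be sketched as follows. This is a simplified illustration, not the `pkg/wait` source: the type names, the shard count of 64, and the modulo-based shard selection are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// shardCount is an assumed value for this sketch.
const shardCount = 64

// shard holds one slice of the ID space behind its own lock.
type shard struct {
	mu sync.Mutex
	m  map[uint64]chan interface{}
}

// waitList spreads waiters over several shards so that writers working on
// different IDs rarely contend for the same mutex.
type waitList struct {
	shards []shard
}

func newWaitList() *waitList {
	w := &waitList{shards: make([]shard, shardCount)}
	for i := range w.shards {
		w.shards[i].m = make(map[uint64]chan interface{})
	}
	return w
}

// Register returns a channel that receives the value passed to Trigger
// for the same ID.
func (w *waitList) Register(id uint64) <-chan interface{} {
	s := &w.shards[id%shardCount]
	s.mu.Lock()
	defer s.mu.Unlock()
	ch, ok := s.m[id]
	if !ok {
		// Buffer of one so Trigger never blocks on a slow receiver.
		ch = make(chan interface{}, 1)
		s.m[id] = ch
	}
	return ch
}

// Trigger delivers x to the waiter registered under id, if there is one.
func (w *waitList) Trigger(id uint64, x interface{}) {
	s := &w.shards[id%shardCount]
	s.mu.Lock()
	ch := s.m[id]
	delete(s.m, id)
	s.mu.Unlock()
	if ch != nil {
		ch <- x
		close(ch)
	}
}

func main() {
	w := newWaitList()
	ch := w.Register(1)
	w.Trigger(1, "applied")
	fmt.Println(<-ch)
}
```

With a single map and a single lock, every `Register` and `Trigger` call serializes on that lock; with shards, only calls whose IDs land in the same shard do.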
Piotr Tabor
aefbd226b8
Merge pull request #12745 from ptabor/20210304-fix-wtfpl
Update version of certifi/gocertifi to get rid of WTF Public license
2021-03-04 12:47:06 +01:00
Piotr Tabor
ba1eebe2ea
Merge pull request #12744 from emilyselwood/patch-1
Metrics example 404s - fix url
2021-03-04 09:59:56 +01:00
Piotr Tabor
f7a2389992 Update version of certifi/gocertifi to get rid of WTF Public license
It seems old versions of https://github.com/certifi/gocertifi were
categorized as "Do What The F*ck You Want To Public License".

Update to a newer version that is explicitly licensed under the `Mozilla Public License` 2.0 (MPL 2.0).
2021-03-04 09:48:34 +01:00
Emily Selwood
9ebbf5f38b
Metrics example 404s - fix url
The metrics example link points to a file that appears to have moved. This change points it to what I think is the right place.
2021-03-04 08:17:48 +00:00
Piotr Tabor
61fef348f8
Merge pull request #12742 from wilsonwang371/master
debugutil: Remove extra space in trace handler route
2021-03-04 08:55:44 +01:00
Wilson Wang
7fc447fb5c debugutil: Remove extra space in trace handler route
To use trace, users had to escape an extra space in the trace handler route. This change removes that space.
2021-03-03 16:01:04 -08:00
Sam Batschelet
f02525c75d
Merge pull request #12741 from hexfusion/bump-cl-03-03
CHANGELOG: update to include experimental-apply-warning-duration
2021-03-03 13:38:22 -05:00
Sam Batschelet
e9947bc018 CHANGELOG: update to include experimental-apply-warning-duration
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2021-03-03 12:08:58 -05:00
Piotr Tabor
102c198444
Merge pull request #12705 from astromechza/bm_etcd_peer_server_cert
etcdmain: added peer-client-{client,key}-file parameters for supporting separate client and server certs when communicating between peers
2021-03-02 09:03:35 +01:00
Gyuho Lee
102096ade2
Merge pull request #12737 from davidlanouette/12718
client v2: check for empty request from the context
2021-03-01 23:57:58 -08:00
Gyuho Lee
6ace85b624
Merge pull request #12736 from ptabor/20210301-fix-flakes
tests: Fixes a few recently spotted test-flakes
2021-03-01 14:22:29 -08:00
David Lanouette
6998c5641c client v2: rename error var for revive
The revive tool complained during the build.  The error variable has been
renamed.

Fixes #12718

Signed-off-by: David Lanouette <David.Lanouette@GMail.com>
2021-03-01 17:09:16 -05:00
David Lanouette
7d02ce2073 client v2: check for empty request from the context
If simpleHTTPClient.Do is called and the context has a nil request, return an error early.

Fixes #12718

Signed-off-by: David Lanouette <David.Lanouette@GMail.com>
2021-03-01 16:46:30 -05:00
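The early-return guard described in this commit follows a common pattern, sketched below. The sketch is not the client v2 code: the `action` interface, `do`, `nilAction`, and `errNilRequest` are hypothetical names, and the real `simpleHTTPClient.Do` has a different signature.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// action is a hypothetical stand-in for something that builds a request.
type action interface {
	HTTPRequest() *http.Request
}

// errNilRequest is a hypothetical sentinel error for this sketch.
var errNilRequest = errors.New("client: action produced a nil request")

// do sends the request built by act. It checks for a nil request first, so
// the caller gets an error instead of a nil pointer dereference further down.
func do(rt http.RoundTripper, act action) (*http.Response, error) {
	req := act.HTTPRequest()
	if req == nil {
		return nil, errNilRequest
	}
	return rt.RoundTrip(req)
}

// nilAction always fails to build a request.
type nilAction struct{}

func (nilAction) HTTPRequest() *http.Request { return nil }

func main() {
	_, err := do(http.DefaultTransport, nilAction{})
	fmt.Println("error:", err)
}
```

Because the check runs before the transport is touched, the guard can be exercised without any network access.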
Piotr Tabor
5e10d12996 tests: Fixes a few recently spotted test-flakes
```
Unexpected goroutines running after all test(s).
1 instances of:
syscall.Syscall(...)
	/usr/local/go/src/syscall/asm_linux_386.s:19 +0x5
syscall.Close(...)
	/usr/local/go/src/syscall/zsyscall_linux_386.go:285 +0x3d
internal/poll.(*FD).destroy(...)
	/usr/local/go/src/internal/poll/fd_unix.go:77 +0x30
internal/poll.(*FD).decref(...)
	/usr/local/go/src/internal/poll/fd_mutex.go:213 +0x38
internal/poll.(*FD).Close(...)
	/usr/local/go/src/internal/poll/fd_unix.go:99 +0x43
net.(*netFD).Close(...)
	/usr/local/go/src/net/fd_posix.go:37 +0x49
FAIL	go.etcd.io/etcd/tests/v3/integration/client	0.039s
```

```
--- FAIL: TestServer_TCP_Secure_DelayTx (0.20s)
    server_test.go:110: took 128.026085ms with no latency
    server_test.go:125: took 62.980988ms with latency 50ms±5ms
    server_test.go:133: expected took1 128.026085ms < took2 62.980988ms (with latency)
```

https://github.com/etcd-io/etcd/issues/12372
2021-03-01 18:07:38 +01:00
Ben Meier
3d44f5bf80
*: added client-{client,key}-file parameters for supporting separate client and server certs when communicating between peers
In some environments, the CA is not able to sign certificates with both
'client auth' and 'server auth' extended usage parameters, so an operator
needs to be able to set a separate client certificate for making
requests that differs from the certificate used for accepting requests.
This applies to both proxy and etcd member mode and is available as both a CLI
flag and a config file field for peer TLS.

Signed-off-by: Ben Meier <ben.meier@oracle.com>
2021-02-28 14:37:56 +00:00
Gyuho Lee
d06d93d5b1 CHANGELOG: add go 1.16 for etcd 3.5, add release links
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2021-02-26 23:10:25 +00:00
Gyuho Lee
d4d303d908 CHANGELOG: add v3.4.15
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2021-02-26 23:04:36 +00:00
Gyuho Lee
cb040801f9
Merge pull request #12730 from ptabor/20200208-client-no-logger
clientv3: Cleaning up dead logger code
2021-02-26 10:44:42 -08:00
Piotr Tabor
a7f340216d Reformat code according to 'gotip' rules.
In practice, this adds annotations in the new syntax:
```
+//go:build !linux
 // +build !linux
```

Fixes failing gotip PASSES='fmt' check:
https://travis-ci.com/github/etcd-io/etcd/jobs/486453806
2021-02-26 10:14:46 +01:00
Piotr Tabor
54b87505a3 Remove dead legacy logger code. 2021-02-26 09:13:09 +01:00
Piotr Tabor
a769916ea2
Merge pull request #12729 from ptabor/20210225-raftexample
raftExample: Allow closing raftexample node when snapshotting.
2021-02-26 08:58:30 +01:00
Piotr Tabor
3976d68ed3 raftExample: Allow closing raftexample node when snapshotting.
Fix race that made the raftExample test fail.
2021-02-26 08:56:12 +01:00
Piotr Tabor
9563698f64
Merge pull request #12727 from ptabor/20210225-fix-resolver-ordering
ClientV3: Ordering: Fix TestEndpointSwitchResolvesViolation test
2021-02-26 08:48:20 +01:00
Sahdev Zala
39726116c5
Merge pull request #12728 from nate-double-u/12700-update-links
Updating links in .md files after removing Documentation.
2021-02-25 19:43:19 -05:00
Piotr Tabor
45b1e6b470 ClientV3: Ordering: Fix the ordering test so that it does not fail.
The test depended on very subtle timing semantics and on properties of
'copied' clients.

https://travis-ci.com/github/etcd-io/etcd/jobs/486191449

Exemplar failure:
```
{"level":"warn","ts":"2021-02-25T12:34:47.894Z","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000d6fc0/#initially=[unix://localhost:86269902489114839060]","attempt":1,"error":"rpc error: code = Unavailable desc = etcdserver: rpc not supported for learner"}
{"level":"warn","ts":"2021-02-25T12:34:48.163Z","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00035a000/#initially=[unix://localhost:78285857058450835940]","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: not leader"}
{"level":"info","ts":"2021-02-25T12:34:48.255Z","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"warn","ts":"2021-02-25T12:34:48.255Z","caller":"v3/maintenance.go:221","msg":"failed to receive from snapshot stream; closing","error":"rpc error: code = Canceled desc = context canceled"}
{"level":"info","ts":"2021-02-25T12:34:48.255Z","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2021-02-25T12:34:50.255Z","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2021-02-25T12:34:51.717Z","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"warn","ts":"2021-02-25T12:34:52.017Z","caller":"v3/maintenance.go:221","msg":"failed to receive from snapshot stream; closing","error":"rpc error: code = Canceled desc = context canceled"}
{"level":"info","ts":"2021-02-25T12:34:52.018Z","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"warn","ts":"2021-02-25T12:34:53.018Z","caller":"v3/maintenance.go:221","msg":"failed to receive from snapshot stream; closing","error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
--- FAIL: TestEndpointSwitchResolvesViolation (10.12s)
    ordering_util_test.go:81: failed to resolve order violation etcdclient: no cluster members have a revision higher than the previously received revision
```
2021-02-25 22:15:13 +01:00
Nate W
d41e18817a Updating links in .md files after removing Documentation.
Signed-off-by: Nate W <4453979+nate-double-u@users.noreply.github.com>
2021-02-25 12:59:57 -08:00
Gyuho Lee
fa82d11a95
Merge pull request #12725 from ptabor/20210225-release-scripts-fix-2
Improve release scripts: Lessons learned from 3.5.0-alpha.0
2021-02-25 11:46:12 -08:00
Piotr Tabor
60a669762f Improve release scripts:
  - Fix script that creates manifest-list based multi-arch-images.
    The images need to be pushed first.
  - Use docker instead of gcloud docker helper
  - Make sure docker pushes are properly 'dry run'
  - Added preparation instruction to the release script.
2021-02-25 12:28:52 +00:00
Piotr Tabor
ae36379800
Merge pull request #12722 from ptabor/20210225-reporting-bugs
Github: Shorten the reporting-bugs link.
2021-02-25 09:06:04 +01:00
Piotr Tabor
4af3bb3b01 Github: Shorten the reporting-bugs link. 2021-02-25 09:00:00 +01:00
Piotr Tabor
926663f8d8
Merge pull request #12720 from FogDong/master
Docs: fix the report bug link in issue template
2021-02-25 08:53:49 +01:00
Fog Dong
b0949cb49f Docs: fix the report bug link in issue template 2021-02-25 10:58:11 +08:00
Piotr Tabor
60d5159091 version: bump up to 3.5.0-alpha.0 v3.5.0-alpha.0 tests/v3.5.0-alpha.0 etcdctl/v3.5.0-alpha.0 server/v3.5.0-alpha.0 client/v3.5.0-alpha.0 client/v2.305.0-alpha.0 raft/v3.5.0-alpha.0 pkg/v3.5.0-alpha.0 api/v3.5.0-alpha.0 2021-02-24 19:55:45 +00:00
Piotr Tabor
d4d2b80608
Merge pull request #12719 from ptabor/20210224-release-scripts-fix
Release scripts: Minor fixes discovered during attempt for release 3.5.0-alpha.0
2021-02-24 20:19:19 +01:00
Piotr Tabor
c640957e2d Release scripts: Minor fixes discovered during attempt for 3.5.0-alpha.0 2021-02-24 19:16:27 +00:00
Sahdev Zala
fe277f48ad
Merge pull request #12716 from iAziz786/dial-journal-typo
systemd: Fix typo in DialJournal documentation
2021-02-24 12:24:30 -05:00
Mohammad Aziz
252dcc9bdb
Fix typo in DialJournal 2021-02-24 22:23:27 +05:30