16735 Commits

Author SHA1 Message Date
Piotr Tabor
783e26fcdf Fix gogo to 1.3.2 in api/go.mod. 2021-03-11 19:34:34 +01:00
Gyuho Lee
3ead91ca3e
Merge pull request #12739 from LeoYang90/optimization_watch_prevkv
create event do not need prevkv range
2021-03-10 09:48:42 -08:00
Gyuho Lee
633e9273db
Merge pull request #12759 from wpedrak/staticcheck_partial_fixes
*: partial staticcheck fix
2021-03-10 09:47:52 -08:00
wpedrak
2c2456bf3d *: partial staticcheck fix 2021-03-10 14:13:38 +00:00
Gyuho Lee
cb0e58942f
Merge pull request #12753 from ptabor/20210306-integration-zap
Integration tests: Multiple improvements
2021-03-09 19:52:48 -08:00
Gyuho Lee
f46c924f10
Merge pull request #12756 from hexfusion/bump-cl-03-09-3.5
CHANGELOG: add socket option flags #12702
2021-03-09 19:52:05 -08:00
Sam Batschelet
7ac1367783 CHANGELOG: add socket option flags #12702
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2021-03-09 12:47:10 -05:00
Piotr Tabor
c8243a9927 Tests: Functional - in case of failure, log the exception. 2021-03-09 18:19:52 +01:00
Piotr Tabor
b6c2e87a74 Testing: Integration tests does not check whether t==nil 2021-03-09 18:19:52 +01:00
Piotr Tabor
5ddabfdb24 tests: Make tests operate in /tmp director instead of src.
Thanks to this, unix sockets should be not longer
created by integration tests in the the source code directory,
so potentially trigger IDE reloads and unnecessery load (and mess).
2021-03-09 18:19:52 +01:00
Piotr Tabor
bfe02c0526 tests: Cluster creation that failed shouldn't leak goroutines. 2021-03-09 18:19:52 +01:00
Piotr Tabor
9ba1287334 travis script. Turning off verbose for grpcproxy.
+ Upgrade to '*.sh' variant of the scripts.
2021-03-09 18:19:51 +01:00
Piotr Tabor
41f6cc7234 Tests: Better isolation between store_v2v3 integration tests. 2021-03-09 18:19:51 +01:00
Piotr Tabor
fb1d48e98e Integration tests: Use BeforeTest(t) instead of defer AfterTest().
Thanks to this change, a single method BeforeTest(t) can handle
before-test logic as well as registration of cleanup code
(t.Cleanup(func)).
2021-03-09 18:19:51 +01:00
Piotr Tabor
87258efd90 Integration tests: Use zaptest.Logger based testing.TB
Thanks to this the logs:
  - are automatically printed if the test fails.
  - are in pretty consistent format.
  - are annotated by 'member' information of the cluster emitting them.

Side changes:
  - Set propert default got DefaultWarningApplyDuration (used to be '0')
  - Name the members based on their 'place' on the list (as opposed to
'random')
2021-03-09 18:19:51 +01:00
Piotr Tabor
efb584cc9b leak.go: Make the per-test AfterTest strictly wait for none of the unwanted rountines. 2021-03-09 18:19:51 +01:00
Gyuho Lee
94a371acd7
Merge pull request #12750 from ptabor/20210306-mlock
--experimental-memory-mlock support
2021-03-09 09:13:40 -08:00
Gyuho Lee
6fd85af641
Merge pull request #12702 from hexfusion/add-so
*: add support for socket options
2021-03-09 09:02:24 -08:00
Sam Batschelet
5b49fb41c8 fixup: add ListenerOptions
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2021-03-08 11:27:03 -05:00
Piotr Tabor
a46a358577 --experimental-memory-mlock support
The flag protects etcd memory from being swapped out to disk.
This can happen in memory constrained systems where mmaped bbolt
area is natural condidate for swapping out.

This flag should provide better tail latency on the cost of higher RSS
ram usage. If the experiment is successful, the logic should get moved
into bbolt layer, where we can protect specific bbolt instances
(e.g. avoid protecting both during defragmentation).
2021-03-07 12:32:57 +01:00
Piotr Tabor
792b7f57d3
Merge pull request #12747 from wilsonwang371/master
pkg/wait: change list from single element to an array.
2021-03-07 11:53:30 +01:00
Piotr Tabor
7556b9a011
Merge pull request #12752 from cwedgwood/master-nofsyncdowrite
[RFC (against master branch)] etcdserver: when using --unsafe-no-fsync write data
2021-03-07 11:51:57 +01:00
Chris Wedgwood
b63d31e89b etcdserver: when using --unsafe-no-fsync write data
There are situations where we don't wish to fsync but we do want to
write the data.

Typically this occurs in clusters where fsync latency (often the
result of firmware) transiently spikes.  For Kubernetes clusters this
causes (many) elections which have knock-on effects such that the API
server will transiently fail causing other components fail in turn.

By writing the data (buffered and asynchronously flushed, so in most
situations the write is fast) and avoiding the fsync we no longer
trigger this situation and opportunistically write out the data.

Anecdotally:
  Because the fsync is missing there is the argument that certain
  types of failure events will cause data corruption or loss, in
  testing this wasn't seen.  If this was to occur the expectation is
  the member can be readded to a cluster or worst-case restored from a
  robust persisted snapshot.

  The etcd members are deployed across isolated racks with different
  power feeds.  An instantaneous failure of all of them simultaneously
  is unlikely.

  Testing was usually of the form:
   * create (Kubernetes) etcd write-churn by creating replicasets of
     some 1000s of pods
   * break/fail the leader

  Failure testing included:
   * hard node power-off events
   * disk removal
   * orderly reboots/shutdown

  In all cases when the node recovered it was able to rejoin the
  cluster and synchronize.
2021-03-05 10:58:04 -08:00
Piotr Tabor
f4001630d9
Merge pull request #12748 from ptabor/20210305-deflake
Test flakes: 1 fix + 1 diagnostic
2021-03-04 23:38:16 +01:00
Piotr Tabor
66cef61444 Detect leaked go-routines bases on pre-normalization syntax. 2021-03-04 22:28:44 +01:00
Piotr Tabor
339f8fa4bd test.sh: Run integration tests with -v and shorter deadline.
The purpose of this change is to learn more about flake cases like:
  https://travis-ci.com/github/etcd-io/etcd/jobs/488324449

```
% (cd tests && 'env' 'go' 'test' '-timeout=30m' '--race=false' '--cpu=2' './integration/...')
stderr: go: downloading github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e
ok  	go.etcd.io/etcd/tests/v3/integration	197.295s
ok  	go.etcd.io/etcd/tests/v3/integration/client	0.089s
ok  	go.etcd.io/etcd/tests/v3/integration/client/examples	0.038s
ok  	go.etcd.io/etcd/tests/v3/integration/clientv3	70.365s
ok  	go.etcd.io/etcd/tests/v3/integration/clientv3/concurrency	3.169s
ok  	go.etcd.io/etcd/tests/v3/integration/clientv3/connectivity	100.535s
ok  	go.etcd.io/etcd/tests/v3/integration/clientv3/examples	1.341s
ok  	go.etcd.io/etcd/tests/v3/integration/clientv3/experimental/recipes	3.277s

No output has been received in the last 10m0s,
```
2021-03-04 22:23:05 +01:00
Wilson Wang
432fde88a9 pkg/wait: change list from single element to an array.
We found wait lock contention when a large amount of write operations. Converting wait from single element to an array helps to improve the performance.

Fixes #12731

Signed-off-by: Wilson Wang <wilsonny371@gmail.com>
2021-03-04 12:16:41 -08:00
Piotr Tabor
aefbd226b8
Merge pull request #12745 from ptabor/20210304-fix-wtfpl
Update version of certifi/gocertifi to get rid of WTF Public license
2021-03-04 12:47:06 +01:00
Piotr Tabor
ba1eebe2ea
Merge pull request #12744 from emilyselwood/patch-1
Metrics example 404s - fix url
2021-03-04 09:59:56 +01:00
Piotr Tabor
f7a2389992 Update version of certifi/gocertifi to get rid of WTF Public license
Seems old versions of https://github.com/certifi/gocertifi where
categorized as "Do What The F*ck You Want To Public License".

Update to newer version that is explicit `Mozilla Public License` 2.0 (MPL 2.0).
2021-03-04 09:48:34 +01:00
Emily Selwood
9ebbf5f38b
Metrics example 404s - fix url
The metrics example link points to a file that appears to have moved. This change points it to what I think is the right place.
2021-03-04 08:17:48 +00:00
Piotr Tabor
61fef348f8
Merge pull request #12742 from wilsonwang371/master
debugutil: Remove extra space in trace handler route
2021-03-04 08:55:44 +01:00
Wilson Wang
7fc447fb5c debugutil: Remove extra space in trace handler route
debugutil: Remove extra space in trace handler route. To use trace, user needed to escape the extra space and the extra space needs to be removed.
2021-03-03 16:01:04 -08:00
Sam Batschelet
f02525c75d
Merge pull request #12741 from hexfusion/bump-cl-03-03
CHANGELOG: update to include experimental-apply-warning-duration
2021-03-03 13:38:22 -05:00
Sam Batschelet
e9947bc018 CHANGELOG: update to include experimental-apply-warning-duration
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2021-03-03 12:08:58 -05:00
leoyang.yl
d70f35f8d1 create event do not need prevkv range 2021-03-02 17:43:24 +08:00
Piotr Tabor
102c198444
Merge pull request #12705 from astromechza/bm_etcd_peer_server_cert
etcdmain: added peer-client-{client,key}-file parameters for supporting separate client and server certs when communicating between peers
2021-03-02 09:03:35 +01:00
Gyuho Lee
102096ade2
Merge pull request #12737 from davidlanouette/12718
client v2: check for empty request from the context
2021-03-01 23:57:58 -08:00
Gyuho Lee
6ace85b624
Merge pull request #12736 from ptabor/20210301-fix-flakes
tests: Fixes a few recently spotted test-flakes
2021-03-01 14:22:29 -08:00
David Lanouette
6998c5641c client v2: rename error var for revive
The revive tool complained durring the build.  Error variable has been
renamed.

Fixes #12718

Signed-off-by: David Lanouette <David.Lanouette@GMail.com>
2021-03-01 17:09:16 -05:00
David Lanouette
7d02ce2073 client v2: check for empty request from the context
If the simpleHTTPClient.Do is called and the context has a nil request, return an error early.

Fixes #12718

Signed-off-by: David Lanouette <David.Lanouette@GMail.com>
2021-03-01 16:46:30 -05:00
Piotr Tabor
5e10d12996 tests: Fixes a few recently spotted test-flakes
```
Unexpected goroutines running after all test(s).
1 instances of:
syscall.Syscall(...)
	/usr/local/go/src/syscall/asm_linux_386.s:19 +0x5
syscall.Close(...)
	/usr/local/go/src/syscall/zsyscall_linux_386.go:285 +0x3d
internal/poll.(*FD).destroy(...)
	/usr/local/go/src/internal/poll/fd_unix.go:77 +0x30
internal/poll.(*FD).decref(...)
	/usr/local/go/src/internal/poll/fd_mutex.go:213 +0x38
internal/poll.(*FD).Close(...)
	/usr/local/go/src/internal/poll/fd_unix.go:99 +0x43
net.(*netFD).Close(...)
	/usr/local/go/src/net/fd_posix.go:37 +0x49
FAIL	go.etcd.io/etcd/tests/v3/integration/client	0.039s
```

```
--- FAIL: TestServer_TCP_Secure_DelayTx (0.20s)
    server_test.go:110: took 128.026085ms with no latency
    server_test.go:125: took 62.980988ms with latency 50ms��5ms
    server_test.go:133: expected took1 128.026085ms < took2 62.980988ms (with latency)
```

https://github.com/etcd-io/etcd/issues/12372
2021-03-01 18:07:38 +01:00
Ben Meier
3d44f5bf80
*: added client-{client,key}-file parameters for supporting separate client and server certs when communicating between peers
In some environments, the CA is not able to sign certificates with both
'client auth' and 'server auth' extended usage parameters and so an operator
needs to be able to set a seperate client certificate to use when making
requests which is different to the certificate used for accepting requests.
This applies to both proxy and etcd member mode and is available as both a CLI
 flag and config file field for peer TLS.

Signed-off-by: Ben Meier <ben.meier@oracle.com>
2021-02-28 14:37:56 +00:00
Gyuho Lee
d06d93d5b1 CHANGELOG: add go 1.16 for etcd 3.5, add release links
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2021-02-26 23:10:25 +00:00
Gyuho Lee
d4d303d908 CHANGELOG: add v3.4.15
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2021-02-26 23:04:36 +00:00
Gyuho Lee
cb040801f9
Merge pull request #12730 from ptabor/20200208-client-no-logger
clientv3: Cleaning up dead logger code
2021-02-26 10:44:42 -08:00
Piotr Tabor
a7f340216d Reformat code according to 'gotip' rules.
In practices adds annotations in the new syntax:
```
+//go:build !linux
 // +build !linux
```

Fixes failing gotip PASSES='fmt' check:
https://travis-ci.com/github/etcd-io/etcd/jobs/486453806
2021-02-26 10:14:46 +01:00
Piotr Tabor
54b87505a3 Remove dead legacy logger code. 2021-02-26 09:13:09 +01:00
Piotr Tabor
a769916ea2
Merge pull request #12729 from ptabor/20210225-raftexample
raftExample: Allow closing raftexample node when snapshotting.
2021-02-26 08:58:30 +01:00
Piotr Tabor
3976d68ed3 raftExample: Allow closing raftexample node when snapshotting.
Fix race that made the raftExample test fail.
2021-02-26 08:56:12 +01:00