16759 Commits

Author SHA1 Message Date
wpedrak
dac6e37ea1 *: over 20 staticcheck fixes 2021-03-18 15:06:17 +01:00
Piotr Tabor
2932969b91
Merge pull request #12781 from ptabor/20210315-flaki-balancer
Integration tests: Use testing.T logger through zap for grpc
2021-03-18 08:30:54 +01:00
Piotr Tabor
63c51709b4
Merge pull request #12784 from ptabor/20210317-readme-go-version
README: Update required go version.
2021-03-17 15:41:21 +01:00
Piotr Tabor
7d53a06957 README: Update required go version. 2021-03-17 15:37:07 +01:00
Piotr Tabor
725a8c5e02 Enable configuring delegated zap-logging for embed server. 2021-03-17 08:17:36 +01:00
Piotr Tabor
a84bd093b0 Integration with grpc-settable logger. 2021-03-16 22:50:41 +01:00
Gyuho Lee
e599f4a482
Merge pull request #12775 from ptabor/20210314-zip
etcd-raft-zap logger fixes.
2021-03-14 11:37:11 -07:00
Piotr Tabor
1e7c1805d8 Unify logic of building raft-loggers for etcd.
1. We had the same code copied 3 times.
2. For no good reason, the code did not reuse an existing logger when one was given.
2021-03-14 16:02:50 +01:00
Piotr Tabor
44bd22307e Merge get_logger() & Logger() method. 2021-03-14 14:05:17 +01:00
Piotr Tabor
527c765ece
Merge pull request #12773 from ptabor/20210310-test-fixes
Minor test fixes
2021-03-14 13:36:19 +01:00
Piotr Tabor
de67806175 mend 2021-03-14 13:35:47 +01:00
Piotr Tabor
67491a00ea e2e/expect: In case of SUT process failure, print the last 40 lines of logs. 2021-03-13 23:41:29 +01:00
Piotr Tabor
a47c18d30a Fix 2 remaining 'defer AfterTest' calls. 2021-03-13 23:41:29 +01:00
Piotr Tabor
0c1e6d05e7
Merge pull request #12772 from ptabor/20210312-3.5-todos
Fix/remove broken: TestMetricDbSizeDefragDebugging
2021-03-12 23:11:25 +01:00
Piotr Tabor
b406647dd7 Fix/remove broken: TestMetricDbSizeDefragDebugging 2021-03-12 23:05:53 +01:00
Gyuho Lee
efce58d1ec
Merge pull request #12770 from ptabor/20210312-3.5-todos
TODO's 3.5: Decommission metrics, PreVote=true.
2021-03-12 08:51:21 -08:00
Piotr Tabor
948e32ae15 Delete etcd_debug metrics scheduled for deletion in 3.5. 2021-03-12 16:30:47 +01:00
Piotr Tabor
54189f2f60 Enable --pre-vote=true by default in 3.5. 2021-03-12 16:23:23 +01:00
Gyuho Lee
4eba403ccc
Merge pull request #12765 from ptabor/20210312-move-config
Move config (ServerConfig) out of etcdserver package.
2021-03-11 15:05:29 -08:00
Piotr Tabor
fd7fed1511 Move config (ServerConfig) out of etcdserver package.
Motivation:
  - ServerConfig is part of 'embed' public API, while etcdserver is more 'internal'
  - EtcdServer is already too big and config is pretty wide-spread leaf
if we were to split etcdserver (e.g. into pre & post-apply part).
2021-03-11 20:56:22 +01:00
Piotr Tabor
6dcd0de075
Merge pull request #12764 from ptabor/20210311-update-gogo
Fix gogo to 1.3.2 in api/go.mod.
2021-03-11 20:21:29 +01:00
Piotr Tabor
783e26fcdf Fix gogo to 1.3.2 in api/go.mod. 2021-03-11 19:34:34 +01:00
Piotr Tabor
b9226d03f4
Merge pull request #12763 from hexfusion/bump-proto
vendor: bump gogo/proto to v1.3.2
2021-03-11 17:59:07 +01:00
Sam Batschelet
8ff0ff836a *: regen proto
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2021-03-11 11:27:52 -05:00
Sam Batschelet
d3aa3fb486 vendor: bump gogo/proto to v1.3.2
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2021-03-11 11:27:25 -05:00
Gyuho Lee
3ead91ca3e
Merge pull request #12739 from LeoYang90/optimization_watch_prevkv
create events do not need prevkv range
2021-03-10 09:48:42 -08:00
Gyuho Lee
633e9273db
Merge pull request #12759 from wpedrak/staticcheck_partial_fixes
*: partial staticcheck fix
2021-03-10 09:47:52 -08:00
wpedrak
2c2456bf3d *: partial staticcheck fix 2021-03-10 14:13:38 +00:00
Gyuho Lee
cb0e58942f
Merge pull request #12753 from ptabor/20210306-integration-zap
Integration tests: Multiple improvements
2021-03-09 19:52:48 -08:00
Gyuho Lee
f46c924f10
Merge pull request #12756 from hexfusion/bump-cl-03-09-3.5
CHANGELOG: add socket option flags #12702
2021-03-09 19:52:05 -08:00
Sam Batschelet
7ac1367783 CHANGELOG: add socket option flags #12702
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2021-03-09 12:47:10 -05:00
Piotr Tabor
c8243a9927 Tests: Functional - in case of failure, log the exception. 2021-03-09 18:19:52 +01:00
Piotr Tabor
b6c2e87a74 Testing: Integration tests do not check whether t==nil 2021-03-09 18:19:52 +01:00
Piotr Tabor
5ddabfdb24 tests: Make tests operate in /tmp directory instead of src.
Thanks to this, unix sockets should no longer be
created by integration tests in the source code directory,
where they could trigger IDE reloads and unnecessary load (and mess).
2021-03-09 18:19:52 +01:00
Piotr Tabor
bfe02c0526 tests: Cluster creation that failed shouldn't leak goroutines. 2021-03-09 18:19:52 +01:00
Piotr Tabor
9ba1287334 travis script. Turning off verbose for grpcproxy.
+ Upgrade to '*.sh' variant of the scripts.
2021-03-09 18:19:51 +01:00
Piotr Tabor
41f6cc7234 Tests: Better isolation between store_v2v3 integration tests. 2021-03-09 18:19:51 +01:00
Piotr Tabor
fb1d48e98e Integration tests: Use BeforeTest(t) instead of defer AfterTest().
Thanks to this change, a single method BeforeTest(t) can handle
before-test logic as well as registration of cleanup code
(t.Cleanup(func)).
2021-03-09 18:19:51 +01:00
Piotr Tabor
87258efd90 Integration tests: Use zaptest.Logger based testing.TB
Thanks to this the logs:
  - are automatically printed if the test fails.
  - are in pretty consistent format.
  - are annotated by 'member' information of the cluster emitting them.

Side changes:
  - Set proper default for DefaultWarningApplyDuration (used to be '0')
  - Name the members based on their 'place' on the list (as opposed to
'random')
2021-03-09 18:19:51 +01:00
Piotr Tabor
efb584cc9b leak.go: Make the per-test AfterTest strictly wait for none of the unwanted routines. 2021-03-09 18:19:51 +01:00
Gyuho Lee
94a371acd7
Merge pull request #12750 from ptabor/20210306-mlock
--experimental-memory-mlock support
2021-03-09 09:13:40 -08:00
Gyuho Lee
6fd85af641
Merge pull request #12702 from hexfusion/add-so
*: add support for socket options
2021-03-09 09:02:24 -08:00
Sam Batschelet
5b49fb41c8 fixup: add ListenerOptions
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2021-03-08 11:27:03 -05:00
Piotr Tabor
a46a358577 --experimental-memory-mlock support
The flag protects etcd memory from being swapped out to disk.
This can happen in memory-constrained systems where the mmapped bbolt
area is a natural candidate for swapping out.

This flag should provide better tail latency at the cost of higher RSS
RAM usage. If the experiment is successful, the logic should get moved
into the bbolt layer, where we can protect specific bbolt instances
(e.g. avoid protecting both during defragmentation).
2021-03-07 12:32:57 +01:00
Piotr Tabor
792b7f57d3
Merge pull request #12747 from wilsonwang371/master
pkg/wait: change list from single element to an array.
2021-03-07 11:53:30 +01:00
Piotr Tabor
7556b9a011
Merge pull request #12752 from cwedgwood/master-nofsyncdowrite
[RFC (against master branch)] etcdserver: when using --unsafe-no-fsync write data
2021-03-07 11:51:57 +01:00
Chris Wedgwood
b63d31e89b etcdserver: when using --unsafe-no-fsync write data
There are situations where we don't wish to fsync but we do want to
write the data.

Typically this occurs in clusters where fsync latency (often the
result of firmware) transiently spikes.  For Kubernetes clusters this
causes (many) elections which have knock-on effects such that the API
server will transiently fail, causing other components to fail in turn.

By writing the data (buffered and asynchronously flushed, so in most
situations the write is fast) and avoiding the fsync we no longer
trigger this situation and opportunistically write out the data.

Anecdotally:
  Because the fsync is missing there is the argument that certain
  types of failure events will cause data corruption or loss; in
  testing this wasn't seen.  If this was to occur the expectation is
  the member can be re-added to a cluster or worst-case restored from a
  robust persisted snapshot.

  The etcd members are deployed across isolated racks with different
  power feeds.  An instantaneous failure of all of them simultaneously
  is unlikely.

  Testing was usually of the form:
   * create (Kubernetes) etcd write-churn by creating replicasets of
     some 1000s of pods
   * break/fail the leader

  Failure testing included:
   * hard node power-off events
   * disk removal
   * orderly reboots/shutdown

  In all cases when the node recovered it was able to rejoin the
  cluster and synchronize.
2021-03-05 10:58:04 -08:00
Piotr Tabor
f4001630d9
Merge pull request #12748 from ptabor/20210305-deflake
Test flakes: 1 fix + 1 diagnostic
2021-03-04 23:38:16 +01:00
Piotr Tabor
66cef61444 Detect leaked go-routines based on pre-normalization syntax. 2021-03-04 22:28:44 +01:00
Piotr Tabor
339f8fa4bd test.sh: Run integration tests with -v and shorter deadline.
The purpose of this change is to learn more about flake cases like:
  https://travis-ci.com/github/etcd-io/etcd/jobs/488324449

```
% (cd tests && 'env' 'go' 'test' '-timeout=30m' '--race=false' '--cpu=2' './integration/...')
stderr: go: downloading github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e
ok  	go.etcd.io/etcd/tests/v3/integration	197.295s
ok  	go.etcd.io/etcd/tests/v3/integration/client	0.089s
ok  	go.etcd.io/etcd/tests/v3/integration/client/examples	0.038s
ok  	go.etcd.io/etcd/tests/v3/integration/clientv3	70.365s
ok  	go.etcd.io/etcd/tests/v3/integration/clientv3/concurrency	3.169s
ok  	go.etcd.io/etcd/tests/v3/integration/clientv3/connectivity	100.535s
ok  	go.etcd.io/etcd/tests/v3/integration/clientv3/examples	1.341s
ok  	go.etcd.io/etcd/tests/v3/integration/clientv3/experimental/recipes	3.277s

No output has been received in the last 10m0s,
```
2021-03-04 22:23:05 +01:00