wpedrak
e9779231ec
server: add 500ms retries to ReadIndex requests for l-reads
...
It is second approach (with first being #12762 ) to solve #12680
2021-03-16 16:34:15 +01:00
wpedrak
4b21e38381
refactored l-read loop in v3_server.go
2021-03-16 11:03:45 +01:00
Piotr Tabor
1e7c1805d8
Unify logic of building raft-loggers for etcd.
...
1. We had the same code copied 3 times.
2. For no good reason the code was not reusing existing logger if this one is given.
2021-03-14 16:02:50 +01:00
Piotr Tabor
44bd22307e
Merge get_logger() & Logger() method.
2021-03-14 14:05:17 +01:00
Piotr Tabor
948e32ae15
Delete etcd_debug metrics scheduled for deletion in 3.5.
2021-03-12 16:30:47 +01:00
Piotr Tabor
54189f2f60
Enable --pre-vote=true by default in 3.5.
2021-03-12 16:23:23 +01:00
Gyuho Lee
4eba403ccc
Merge pull request #12765 from ptabor/20210312-move-config
...
Move config (ServerConfig) out of etcdserver package.
2021-03-11 15:05:29 -08:00
Piotr Tabor
fd7fed1511
Move config (ServerConfig) out of etcdserver package.
...
Motivation:
- ServerConfig is part of 'embed' public API, while etcdserver is more 'internal'
- EtcdServer is already too big and config is pretty wide-spread leaf
if we were to split etcdserver (e.g. into pre & post-apply part).
2021-03-11 20:56:22 +01:00
Piotr Tabor
b9226d03f4
Merge pull request #12763 from hexfusion/bump-proto
...
vendor: bump gogo/proto to v1.3.2
2021-03-11 17:59:07 +01:00
Sam Batschelet
d3aa3fb486
vendor: bump gogo/proto to v1.3.2
...
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2021-03-11 11:27:25 -05:00
Gyuho Lee
3ead91ca3e
Merge pull request #12739 from LeoYang90/optimization_watch_prevkv
...
create event do not need prevkv range
2021-03-10 09:48:42 -08:00
Piotr Tabor
fb1d48e98e
Integration tests: Use BeforeTest(t) instead of defer AfterTest().
...
Thanks to this change, a single method BeforeTest(t) can handle
before-test logic as well as registration of cleanup code
(t.Cleanup(func)).
2021-03-09 18:19:51 +01:00
Gyuho Lee
94a371acd7
Merge pull request #12750 from ptabor/20210306-mlock
...
--experimental-memory-mlock support
2021-03-09 09:13:40 -08:00
Gyuho Lee
6fd85af641
Merge pull request #12702 from hexfusion/add-so
...
*: add support for socket options
2021-03-09 09:02:24 -08:00
Sam Batschelet
5b49fb41c8
fixup: add ListenerOptions
...
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2021-03-08 11:27:03 -05:00
Piotr Tabor
a46a358577
--experimental-memory-mlock support
...
The flag protects etcd memory from being swapped out to disk.
This can happen in memory constrained systems where mmaped bbolt
area is natural condidate for swapping out.
This flag should provide better tail latency on the cost of higher RSS
ram usage. If the experiment is successful, the logic should get moved
into bbolt layer, where we can protect specific bbolt instances
(e.g. avoid protecting both during defragmentation).
2021-03-07 12:32:57 +01:00
Chris Wedgwood
b63d31e89b
etcdserver: when using --unsafe-no-fsync write data
...
There are situations where we don't wish to fsync but we do want to
write the data.
Typically this occurs in clusters where fsync latency (often the
result of firmware) transiently spikes. For Kubernetes clusters this
causes (many) elections which have knock-on effects such that the API
server will transiently fail causing other components fail in turn.
By writing the data (buffered and asynchronously flushed, so in most
situations the write is fast) and avoiding the fsync we no longer
trigger this situation and opportunistically write out the data.
Anecdotally:
Because the fsync is missing there is the argument that certain
types of failure events will cause data corruption or loss, in
testing this wasn't seen. If this was to occur the expectation is
the member can be readded to a cluster or worst-case restored from a
robust persisted snapshot.
The etcd members are deployed across isolated racks with different
power feeds. An instantaneous failure of all of them simultaneously
is unlikely.
Testing was usually of the form:
* create (Kubernetes) etcd write-churn by creating replicasets of
some 1000s of pods
* break/fail the leader
Failure testing included:
* hard node power-off events
* disk removal
* orderly reboots/shutdown
In all cases when the node recovered it was able to rejoin the
cluster and synchronize.
2021-03-05 10:58:04 -08:00
Piotr Tabor
f7a2389992
Update version of certifi/gocertifi to get rid of WTF Public license
...
Seems old versions of https://github.com/certifi/gocertifi where
categorized as "Do What The F*ck You Want To Public License".
Update to newer version that is explicit `Mozilla Public License` 2.0 (MPL 2.0).
2021-03-04 09:48:34 +01:00
leoyang.yl
d70f35f8d1
create event do not need prevkv range
2021-03-02 17:43:24 +08:00
Ben Meier
3d44f5bf80
*: added client-{client,key}-file parameters for supporting separate client and server certs when communicating between peers
...
In some environments, the CA is not able to sign certificates with both
'client auth' and 'server auth' extended usage parameters and so an operator
needs to be able to set a seperate client certificate to use when making
requests which is different to the certificate used for accepting requests.
This applies to both proxy and etcd member mode and is available as both a CLI
flag and config file field for peer TLS.
Signed-off-by: Ben Meier <ben.meier@oracle.com>
2021-02-28 14:37:56 +00:00
Piotr Tabor
a7f340216d
Reformat code according to 'gotip' rules.
...
In practices adds annotations in the new syntax:
```
+//go:build !linux
// +build !linux
```
Fixes failing gotip PASSES='fmt' check:
https://travis-ci.com/github/etcd-io/etcd/jobs/486453806
2021-02-26 10:14:46 +01:00
Piotr Tabor
45b1e6b470
ClientV3: Ordering: Fix the ordering test such it does not fail.
...
The test depended on very subtle timing semantic and on properties of
'copied' clients.
https://travis-ci.com/github/etcd-io/etcd/jobs/486191449
Examplar failure:
```
{"level":"warn","ts":"2021-02-25T12:34:47.894Z","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000d6fc0/#initially=[unix://localhost:86269902489114839060]","attempt":1,"error":"rpc error: code = Unavailable desc = etcdserver: rpc not supported for learner"}
{"level":"warn","ts":"2021-02-25T12:34:48.163Z","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00035a000/#initially=[unix://localhost:78285857058450835940]","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: not leader"}
{"level":"info","ts":"2021-02-25T12:34:48.255Z","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"warn","ts":"2021-02-25T12:34:48.255Z","caller":"v3/maintenance.go:221","msg":"failed to receive from snapshot stream; closing","error":"rpc error: code = Canceled desc = context canceled"}
{"level":"info","ts":"2021-02-25T12:34:48.255Z","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2021-02-25T12:34:50.255Z","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2021-02-25T12:34:51.717Z","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"warn","ts":"2021-02-25T12:34:52.017Z","caller":"v3/maintenance.go:221","msg":"failed to receive from snapshot stream; closing","error":"rpc error: code = Canceled desc = context canceled"}
{"level":"info","ts":"2021-02-25T12:34:52.018Z","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"warn","ts":"2021-02-25T12:34:53.018Z","caller":"v3/maintenance.go:221","msg":"failed to receive from snapshot stream; closing","error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
--- FAIL: TestEndpointSwitchResolvesViolation (10.12s)
ordering_util_test.go:81: failed to resolve order violation etcdclient: no cluster members have a revision higher than the previously received revision
```
2021-02-25 22:15:13 +01:00
Piotr Tabor
60d5159091
version: bump up to 3.5.0-alpha.0
2021-02-24 19:55:45 +00:00
Piotr Tabor
1a9c81abda
Update grpc dependency to 1.32.
...
Simplify grpc testing infrastructure to align with upstream changes.
2021-02-23 11:31:50 +01:00
Piotr Tabor
57dcb037c0
Merge pull request #12706 from ptabor/20210218
...
clientv3: PS: Replace balancer with upstream grpc solution
2021-02-23 10:56:20 +01:00
Piotr Tabor
4a1c24556c
clientv3: PS: Replace balancer with upstream grpc solution
...
Addresses comments from: https://github.com/etcd-io/etcd/pull/12671#pullrequestreview-593942302
2021-02-23 10:03:15 +01:00
yangweiwei
7ef6ebd5eb
mvcc: optimize watch logic of watchableStore
...
Optimize watchableStore.watch func
2021-02-22 14:10:25 +08:00
Gyuho Lee
3d7aac948b
Merge pull request #12196 from ironcladlou/metrics-watch-error-fix
...
etcdserver: fix incorrect metrics generated when clients cancel watches
2021-02-19 12:46:49 -08:00
Sam Batschelet
49078c683b
*: add support for socket options
...
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2021-02-19 13:31:23 -05:00
Piotr Tabor
b67ed4e4aa
Merge pull request #12671 from ptabor/20210207-grpc-del-balancer
...
clientv3: Replace balancer with upstream grpc solution
2021-02-17 22:45:13 +01:00
Piotr Tabor
47b2506fcb
Merge pull request #12670 from postgrespro/customizable_raft_connection_timeouts
...
raft: makes 'ConnReadTimeout/ConnWriteTimeout' customizable
2021-02-16 09:07:41 +01:00
Piotr Tabor
77e6df28cf
Merge pull request #12675 from ptabor/20210209-grpc-remove-legacy-resolver
...
Cleanup grpc client/v3/naming API
2021-02-11 09:31:48 +01:00
Maksim Buldukyan
7e38cfcc8d
raft: makes 'ConnReadTimeout/ConnWriteTimeout' customizable
2021-02-10 10:36:50 +07:00
Piotr Tabor
a836a8045b
Get rid of legacy client/v3/naming API.
...
Update grpcproxy to use the new abstractions.
2021-02-09 11:56:28 +01:00
Chao Chen
2ae3e82f07
etcdserver/api/etcdhttp: log successful etcd server side health check in debug level
...
When we have an external component that checks /health periodically, the
etcd server logs can be quite verbose (e.g., DDOS-ing against insure
etcd health check can lead to disk space full due to large log files).
This change was introduced in #11704 .
While we keep the warning logs for etcd health check failures, the
success (or OK) log level should be set to DEBUG.
Fixes #12676
2021-02-08 17:15:43 -08:00
Piotr Tabor
0b75fede64
Replace client/v3/balancer with standard components: resolver + round_robin LB
...
This commit significantly reduces volume of custom code
in etcd client v3, while preserving full existing functionality.
2021-02-08 18:50:31 +01:00
Brad Davidson
603d975599
Fix cluster peer HTTP SRV discovery
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-02-03 03:08:13 -08:00
Yanhao Mo
6d82778a4e
etcdserver: export method EtcdServer.leaderChangedNotify ( #12378 )
2021-02-02 18:13:32 +08:00
Piotr Tabor
d6d03beaea
Merge pull request #12538 from lzhfromustc/12_9_GoroutineLeak
...
test: change channel operations to avoid potential goroutine leaks
2021-02-01 21:16:43 +01:00
Piotr Tabor
958f6f9878
Merge pull request #12481 from kalexmills/fix-defer-log
...
fix: pass argument url in defer to avoid loopclosure
2021-01-31 23:20:32 +01:00
Piotr Tabor
1a890a4659
Merge branch 'master' into update
2021-01-16 09:35:05 +01:00
Piotr Tabor
58f78df1de
Raft: Expand raft documentation, in particular point on godocs.
2021-01-15 12:34:02 +01:00
Sahdev Zala
69e99e80fa
Merge pull request #12465 from spacewander/fdoc
...
chore: update the documentation link in the comment
2021-01-14 00:39:25 -05:00
Jingyi Hu
bfc6e2ff30
Merge pull request #12611 from ptabor/20210111-fix-flakes
...
e2e tests flakes & leaks fixes: In particular TestIssue6361
2021-01-12 21:26:54 +08:00
Piotr Tabor
0d9cfc11c8
Fix usage of reflect.SliceHeader: reported by vet on tip golang
...
Example:
https://travis-ci.com/github/etcd-io/etcd/jobs/470404938
```
% (cd server && go vet ./...)
stderr: # go.etcd.io/etcd/server/v3/etcdserver/api/v2store
stderr: etcdserver/api/v2store/node_extern_test.go:107:9: possible misuse of reflect.SliceHeader
stderr: etcdserver/api/v2store/node_extern_test.go:107:16: possible misuse of reflect.SliceHeader
```
2021-01-12 00:14:51 +01:00
Piotr Tabor
74274f4417
e2e: Adding better diagnostic and location for temporary files to Snapshot tests.
2021-01-12 00:14:51 +01:00
Piotr Tabor
8ccd4e1146
Fix flaky tests reported due to data race on grpc logging registration.
...
Example:
```
==================
WARNING: DATA RACE
Write at 0x000002178320 by goroutine 575:
google.golang.org/grpc/grpclog.SetLoggerV2()
/home/ptab/private/golang/pkg/mod/google.golang.org/grpc@v1.29.1/grpclog/loggerv2.go:70 +0x444
go.etcd.io/etcd/server/v3/embed.(*Config).setupLogging.func1.1()
/home/ptab/corp/etcd/server/embed/config_logging.go:119 +0x345
sync.(*Once).doSlow()
/usr/lib/google-golang/src/sync/once.go:66 +0x109
sync.(*Once).Do()
/usr/lib/google-golang/src/sync/once.go:57 +0x68
go.etcd.io/etcd/server/v3/embed.(*Config).setupLogging.func1()
/home/ptab/corp/etcd/server/embed/config_logging.go:109 +0x3b1
go.etcd.io/etcd/server/v3/embed.(*Config).setupLogging()
/home/ptab/corp/etcd/server/embed/config_logging.go:174 +0x6af
go.etcd.io/etcd/server/v3/embed.(*Config).Validate()
/home/ptab/corp/etcd/server/embed/config.go:553 +0x55
go.etcd.io/etcd/server/v3/embed.StartEtcd()
/home/ptab/corp/etcd/server/embed/etcd.go:93 +0x84
go.etcd.io/etcd/tests/v3/integration.TestKVWithEmptyValue()
/home/ptab/corp/etcd/tests/integration/v3_kv_test.go:33 +0x18c
testing.tRunner()
/usr/lib/google-golang/src/testing/testing.go:1123 +0x202
Previous read at 0x000002178320 by goroutine 956:
[failed to restore the stack]
Goroutine 575 (running) created at:
testing.(*T).Run()
/usr/lib/google-golang/src/testing/testing.go:1168 +0x5bb
testing.runTests.func1()
/usr/lib/google-golang/src/testing/testing.go:1441 +0xa6
testing.tRunner()
/usr/lib/google-golang/src/testing/testing.go:1123 +0x202
testing.runTests()
/usr/lib/google-golang/src/testing/testing.go:1439 +0x612
testing.(*M).Run()
/usr/lib/google-golang/src/testing/testing.go:1347 +0x3c4
go.etcd.io/etcd/pkg/v3/testutil.MustTestMainWithLeakDetection()
/home/ptab/corp/etcd/pkg/testutil/leak.go:150 +0x38
go.etcd.io/etcd/tests/v3/integration.TestMain()
/home/ptab/corp/etcd/tests/integration/main_test.go:14 +0x272
main.main()
_testmain.go:349 +0x269
Goroutine 956 (finished) created at:
google.golang.org/grpc/internal/transport.newHTTP2Server()
/home/ptab/private/golang/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_server.go:288 +0x18a4
google.golang.org/grpc/internal/transport.NewServerTransport()
/home/ptab/private/golang/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/transport.go:534 +0x2f5
google.golang.org/grpc.(*Server).newHTTP2Transport()
/home/ptab/private/golang/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:726 +0x2ca
google.golang.org/grpc.(*Server).handleRawConn()
/home/ptab/private/golang/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:693 +0x60f
google.golang.org/grpc.(*Server).Serve.func3()
/home/ptab/private/golang/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:663 +0x4c
==================
...
{"level":"info","ts":"2021-01-09T22:21:04.550+0100","caller":"embed/etcd.go:330","msg":"closed etcd server","name":"default","data-dir":"/tmp/etcd-017337431","advertise-peer-urls":["http://localhost:2380 "],"advertise-client-urls":["http://localhost:2379 "]}
--- FAIL: TestKVWithEmptyValue (1.08s)
v3_kv_test.go:62: my-namespace/foobar = data
v3_kv_test.go:62: my-namespace/foobar1 = data
v3_kv_test.go:62: namespace/foobar1 = data
v3_kv_test.go:72: foobar = data
v3_kv_test.go:72: foobar1 = data
v3_kv_test.go:87: delete keys:2
testing.go:1038: race detected during execution of test
```
2021-01-11 10:06:31 +01:00
Dan Lorenc
5b90402082
Switch from dgrijalva/jwt-go to form3tech-oss/jwt-go.
...
dgrijalva/jwt-go has been abandoned and contains several serious
security issues. Most projects are now switching to the form3tech fork.
See https://snyk.io/vuln/SNYK-GOLANG-GITHUBCOMDGRIJALVAJWTGO-596515 for
info on the issues.
Signed-off-by: Dan Lorenc <dlorenc@google.com>
2021-01-10 08:04:20 -06:00
Piotr Tabor
23340bb62a
Refresh proto generation script after moving modules files.
...
With modulatiozation server protos get moved into ./server directory,
but it was not reflected in scripts/genproto.sh.
2021-01-08 16:33:12 +01:00
lzhfromustc
f2a912a4e6
test: change channel operations to avoid potential goroutine leaks
...
In these unit tests, goroutines may leak if certain branches are chosen. This commit edits channel operations and buffer sizes, so no matter what branch is chosen, the test will end correctly. This commit doesn't change the semantics of unit tests.
2020-12-09 22:23:21 -05:00