Mirroristas/etcd

mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Author	SHA1	Message	Date
Chris Wedgwood	b63d31e89b	etcdserver: when using --unsafe-no-fsync write data There are situations where we don't wish to fsync but we do want to write the data. Typically this occurs in clusters where fsync latency (often the result of firmware) transiently spikes. For Kubernetes clusters this causes (many) elections which have knock-on effects such that the API server will transiently fail, causing other components to fail in turn. By writing the data (buffered and asynchronously flushed, so in most situations the write is fast) and avoiding the fsync we no longer trigger this situation and opportunistically write out the data. Anecdotally: Because the fsync is missing there is the argument that certain types of failure events will cause data corruption or loss; in testing this wasn't seen. If this were to occur the expectation is the member can be re-added to a cluster or worst-case restored from a robust persisted snapshot. The etcd members are deployed across isolated racks with different power feeds. An instantaneous failure of all of them simultaneously is unlikely. Testing was usually of the form: * create (Kubernetes) etcd write-churn by creating replicasets of some 1000s of pods * break/fail the leader Failure testing included: * hard node power-off events * disk removal * orderly reboots/shutdown In all cases when the node recovered it was able to rejoin the cluster and synchronize.	2021-03-05 10:58:04 -08:00
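The write-without-fsync behaviour described in b63d31e89b can be illustrated with a minimal sketch. It is not etcd's WAL code: `persist`, `roundTrip`, and the `noFsync` parameter are hypothetical names standing in for the effect of `--unsafe-no-fsync`. The point is only that the buffered write always happens, and the durability barrier (`Sync`) is the part that gets skipped.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// persist always writes data to f (buffered by the OS) and only calls
// Sync when noFsync is false. Hypothetical helper, not etcd's API.
func persist(f *os.File, data []byte, noFsync bool) error {
	if _, err := f.Write(data); err != nil {
		return err
	}
	if noFsync {
		// Data is in the page cache and will be flushed asynchronously.
		return nil
	}
	return f.Sync()
}

// roundTrip writes data to a temporary file via persist and reads it back.
func roundTrip(data string, noFsync bool) (string, error) {
	dir, err := os.MkdirTemp("", "nofsync-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "wal.bin")
	f, err := os.Create(path)
	if err != nil {
		return "", err
	}
	if err := persist(f, []byte(data), noFsync); err != nil {
		f.Close()
		return "", err
	}
	if err := f.Close(); err != nil {
		return "", err
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	got, err := roundTrip("entry-1", true)
	fmt.Println(got, err)
}
```

Without the `Sync`, a reader on the same running system still sees the data; what is lost is the guarantee that it survives a sudden power failure, which is the trade-off the commit message discusses.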
Piotr Tabor	f7a2389992	Update version of certifi/gocertifi to get rid of WTF Public license Seems old versions of https://github.com/certifi/gocertifi were categorized as "Do What The F*ck You Want To Public License". Update to newer version that is explicit `Mozilla Public License` 2.0 (MPL 2.0).	2021-03-04 09:48:34 +01:00
Ben Meier	3d44f5bf80	*: added client-{client,key}-file parameters for supporting separate client and server certs when communicating between peers In some environments, the CA is not able to sign certificates with both 'client auth' and 'server auth' extended usage parameters and so an operator needs to be able to set a separate client certificate to use when making requests which is different to the certificate used for accepting requests. This applies to both proxy and etcd member mode and is available as both a CLI flag and config file field for peer TLS. Signed-off-by: Ben Meier <ben.meier@oracle.com>	2021-02-28 14:37:56 +00:00
Piotr Tabor	a7f340216d	Reformat code according to 'gotip' rules. In practice, this adds annotations in the new syntax: ``` +//go:build !linux // +build !linux ``` Fixes failing gotip PASSES='fmt' check: https://travis-ci.com/github/etcd-io/etcd/jobs/486453806	2021-02-26 10:14:46 +01:00
Piotr Tabor	45b1e6b470	ClientV3: Ordering: Fix the ordering test such it does not fail. The test depended on very subtle timing semantic and on properties of 'copied' clients. https://travis-ci.com/github/etcd-io/etcd/jobs/486191449 Examplar failure: ``` {"level":"warn","ts":"2021-02-25T12:34:47.894Z","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000d6fc0/#initially=[unix://localhost:86269902489114839060]","attempt":1,"error":"rpc error: code = Unavailable desc = etcdserver: rpc not supported for learner"} {"level":"warn","ts":"2021-02-25T12:34:48.163Z","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00035a000/#initially=[unix://localhost:78285857058450835940]","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: not leader"} {"level":"info","ts":"2021-02-25T12:34:48.255Z","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"} {"level":"warn","ts":"2021-02-25T12:34:48.255Z","caller":"v3/maintenance.go:221","msg":"failed to receive from snapshot stream; closing","error":"rpc error: code = Canceled desc = context canceled"} {"level":"info","ts":"2021-02-25T12:34:48.255Z","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"} {"level":"info","ts":"2021-02-25T12:34:50.255Z","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"} {"level":"info","ts":"2021-02-25T12:34:51.717Z","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"} {"level":"warn","ts":"2021-02-25T12:34:52.017Z","caller":"v3/maintenance.go:221","msg":"failed to receive from snapshot stream; closing","error":"rpc error: code = Canceled desc = context canceled"} {"level":"info","ts":"2021-02-25T12:34:52.018Z","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"} 
{"level":"warn","ts":"2021-02-25T12:34:53.018Z","caller":"v3/maintenance.go:221","msg":"failed to receive from snapshot stream; closing","error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"} --- FAIL: TestEndpointSwitchResolvesViolation (10.12s) ordering_util_test.go:81: failed to resolve order violation etcdclient: no cluster members have a revision higher than the previously received revision ```	2021-02-25 22:15:13 +01:00
Piotr Tabor	60d5159091	version: bump up to 3.5.0-alpha.0	2021-02-24 19:55:45 +00:00
Piotr Tabor	1a9c81abda	Update grpc dependency to 1.32. Simplify grpc testing infrastructure to align with upstream changes.	2021-02-23 11:31:50 +01:00
Piotr Tabor	57dcb037c0	Merge pull request #12706 from ptabor/20210218 clientv3: PS: Replace balancer with upstream grpc solution	2021-02-23 10:56:20 +01:00
Piotr Tabor	4a1c24556c	clientv3: PS: Replace balancer with upstream grpc solution Addresses comments from: https://github.com/etcd-io/etcd/pull/12671#pullrequestreview-593942302	2021-02-23 10:03:15 +01:00
yangweiwei	7ef6ebd5eb	mvcc: optimize watch logic of watchableStore Optimize watchableStore.watch func	2021-02-22 14:10:25 +08:00
Gyuho Lee	3d7aac948b	Merge pull request #12196 from ironcladlou/metrics-watch-error-fix etcdserver: fix incorrect metrics generated when clients cancel watches	2021-02-19 12:46:49 -08:00
Piotr Tabor	b67ed4e4aa	Merge pull request #12671 from ptabor/20210207-grpc-del-balancer clientv3: Replace balancer with upstream grpc solution	2021-02-17 22:45:13 +01:00
Piotr Tabor	47b2506fcb	Merge pull request #12670 from postgrespro/customizable_raft_connection_timeouts raft: makes 'ConnReadTimeout/ConnWriteTimeout' customizable	2021-02-16 09:07:41 +01:00
Piotr Tabor	77e6df28cf	Merge pull request #12675 from ptabor/20210209-grpc-remove-legacy-resolver Cleanup grpc client/v3/naming API	2021-02-11 09:31:48 +01:00
Maksim Buldukyan	7e38cfcc8d	raft: makes 'ConnReadTimeout/ConnWriteTimeout' customizable	2021-02-10 10:36:50 +07:00
Piotr Tabor	a836a8045b	Get rid of legacy client/v3/naming API. Update grpcproxy to use the new abstractions.	2021-02-09 11:56:28 +01:00
Chao Chen	2ae3e82f07	etcdserver/api/etcdhttp: log successful etcd server side health check in debug level When we have an external component that checks /health periodically, the etcd server logs can be quite verbose (e.g., DDOS-ing against insure etcd health check can lead to disk space full due to large log files). This change was introduced in #11704. While we keep the warning logs for etcd health check failures, the success (or OK) log level should be set to DEBUG. Fixes #12676	2021-02-08 17:15:43 -08:00
Piotr Tabor	0b75fede64	Replace client/v3/balancer with standard components: resolver + round_robin LB This commit significantly reduces volume of custom code in etcd client v3, while preserving full existing functionality.	2021-02-08 18:50:31 +01:00
Brad Davidson	603d975599	Fix cluster peer HTTP SRV discovery Signed-off-by: Brad Davidson <brad.davidson@rancher.com>	2021-02-03 03:08:13 -08:00
Yanhao Mo	6d82778a4e	etcdserver: export method EtcdServer.leaderChangedNotify (#12378)	2021-02-02 18:13:32 +08:00
Piotr Tabor	d6d03beaea	Merge pull request #12538 from lzhfromustc/12_9_GoroutineLeak test: change channel operations to avoid potential goroutine leaks	2021-02-01 21:16:43 +01:00
Piotr Tabor	958f6f9878	Merge pull request #12481 from kalexmills/fix-defer-log fix: pass argument url in defer to avoid loopclosure	2021-01-31 23:20:32 +01:00
Piotr Tabor	1a890a4659	Merge branch 'master' into update	2021-01-16 09:35:05 +01:00
Piotr Tabor	58f78df1de	Raft: Expand raft documentation, in particular point to godocs.	2021-01-15 12:34:02 +01:00
Sahdev Zala	69e99e80fa	Merge pull request #12465 from spacewander/fdoc chore: update the documentation link in the comment	2021-01-14 00:39:25 -05:00
Jingyi Hu	bfc6e2ff30	Merge pull request #12611 from ptabor/20210111-fix-flakes e2e tests flakes & leaks fixes: In particular TestIssue6361	2021-01-12 21:26:54 +08:00
Piotr Tabor	0d9cfc11c8	Fix usage of reflect.SliceHeader: reported by vet on tip golang Example: https://travis-ci.com/github/etcd-io/etcd/jobs/470404938 ``` % (cd server && go vet ./...) stderr: # go.etcd.io/etcd/server/v3/etcdserver/api/v2store stderr: etcdserver/api/v2store/node_extern_test.go:107:9: possible misuse of reflect.SliceHeader stderr: etcdserver/api/v2store/node_extern_test.go:107:16: possible misuse of reflect.SliceHeader ```	2021-01-12 00:14:51 +01:00
Piotr Tabor	74274f4417	e2e: Adding better diagnostic and location for temporary files to Snapshot tests.	2021-01-12 00:14:51 +01:00
Piotr Tabor	8ccd4e1146	Fix flaky tests reported due to data race on grpc logging registration. Example: ``` ================== WARNING: DATA RACE Write at 0x000002178320 by goroutine 575: google.golang.org/grpc/grpclog.SetLoggerV2() /home/ptab/private/golang/pkg/mod/google.golang.org/grpc@v1.29.1/grpclog/loggerv2.go:70 +0x444 go.etcd.io/etcd/server/v3/embed.(Config).setupLogging.func1.1() /home/ptab/corp/etcd/server/embed/config_logging.go:119 +0x345 sync.(Once).doSlow() /usr/lib/google-golang/src/sync/once.go:66 +0x109 sync.(Once).Do() /usr/lib/google-golang/src/sync/once.go:57 +0x68 go.etcd.io/etcd/server/v3/embed.(Config).setupLogging.func1() /home/ptab/corp/etcd/server/embed/config_logging.go:109 +0x3b1 go.etcd.io/etcd/server/v3/embed.(Config).setupLogging() /home/ptab/corp/etcd/server/embed/config_logging.go:174 +0x6af go.etcd.io/etcd/server/v3/embed.(Config).Validate() /home/ptab/corp/etcd/server/embed/config.go:553 +0x55 go.etcd.io/etcd/server/v3/embed.StartEtcd() /home/ptab/corp/etcd/server/embed/etcd.go:93 +0x84 go.etcd.io/etcd/tests/v3/integration.TestKVWithEmptyValue() /home/ptab/corp/etcd/tests/integration/v3_kv_test.go:33 +0x18c testing.tRunner() /usr/lib/google-golang/src/testing/testing.go:1123 +0x202 Previous read at 0x000002178320 by goroutine 956: [failed to restore the stack] Goroutine 575 (running) created at: testing.(T).Run() /usr/lib/google-golang/src/testing/testing.go:1168 +0x5bb testing.runTests.func1() /usr/lib/google-golang/src/testing/testing.go:1441 +0xa6 testing.tRunner() /usr/lib/google-golang/src/testing/testing.go:1123 +0x202 testing.runTests() /usr/lib/google-golang/src/testing/testing.go:1439 +0x612 testing.(M).Run() /usr/lib/google-golang/src/testing/testing.go:1347 +0x3c4 go.etcd.io/etcd/pkg/v3/testutil.MustTestMainWithLeakDetection() /home/ptab/corp/etcd/pkg/testutil/leak.go:150 +0x38 go.etcd.io/etcd/tests/v3/integration.TestMain() /home/ptab/corp/etcd/tests/integration/main_test.go:14 +0x272 main.main() _testmain.go:349 
+0x269 Goroutine 956 (finished) created at: google.golang.org/grpc/internal/transport.newHTTP2Server() /home/ptab/private/golang/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_server.go:288 +0x18a4 google.golang.org/grpc/internal/transport.NewServerTransport() /home/ptab/private/golang/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/transport.go:534 +0x2f5 google.golang.org/grpc.(Server).newHTTP2Transport() /home/ptab/private/golang/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:726 +0x2ca google.golang.org/grpc.(Server).handleRawConn() /home/ptab/private/golang/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:693 +0x60f google.golang.org/grpc.(*Server).Serve.func3() /home/ptab/private/golang/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:663 +0x4c ================== ... {"level":"info","ts":"2021-01-09T22:21:04.550+0100","caller":"embed/etcd.go:330","msg":"closed etcd server","name":"default","data-dir":"/tmp/etcd-017337431","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://localhost:2379"]} --- FAIL: TestKVWithEmptyValue (1.08s) v3_kv_test.go:62: my-namespace/foobar = data v3_kv_test.go:62: my-namespace/foobar1 = data v3_kv_test.go:62: namespace/foobar1 = data v3_kv_test.go:72: foobar = data v3_kv_test.go:72: foobar1 = data v3_kv_test.go:87: delete keys:2 testing.go:1038: race detected during execution of test ```	2021-01-11 10:06:31 +01:00
Dan Lorenc	5b90402082	Switch from dgrijalva/jwt-go to form3tech-oss/jwt-go. dgrijalva/jwt-go has been abandoned and contains several serious security issues. Most projects are now switching to the form3tech fork. See https://snyk.io/vuln/SNYK-GOLANG-GITHUBCOMDGRIJALVAJWTGO-596515 for info on the issues. Signed-off-by: Dan Lorenc <dlorenc@google.com>	2021-01-10 08:04:20 -06:00
Piotr Tabor	23340bb62a	Refresh proto generation script after moving modules files. With modularization, server protos got moved into the ./server directory, but this was not reflected in scripts/genproto.sh.	2021-01-08 16:33:12 +01:00
lzhfromustc	f2a912a4e6	test: change channel operations to avoid potential goroutine leaks In these unit tests, goroutines may leak if certain branches are chosen. This commit edits channel operations and buffer sizes, so no matter what branch is chosen, the test will end correctly. This commit doesn't change the semantics of unit tests.	2020-12-09 22:23:21 -05:00
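The kind of leak f2a912a4e6 describes can be sketched as follows. This is an illustrative pattern, not code from the etcd tests: `runWithTimeout` is a hypothetical helper. With an unbuffered channel, the worker goroutine would block forever on its send if the caller took the timeout branch; giving the channel a buffer of one lets the send complete regardless of which branch is chosen.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is a hypothetical sentinel used only by this sketch.
var errTimeout = errors.New("timed out")

// runWithTimeout runs work in a goroutine and waits at most timeout for it.
func runWithTimeout(work func() error, timeout time.Duration) error {
	// Capacity 1: the worker can always deliver its result and exit,
	// even when nobody is left to receive it.
	done := make(chan error, 1)
	go func() { done <- work() }()

	select {
	case err := <-done:
		return err
	case <-time.After(timeout):
		return errTimeout
	}
}

func main() {
	err := runWithTimeout(func() error { return nil }, time.Second)
	fmt.Println(err)
}
```

The buffer size does not change what the caller observes; it only guarantees that the goroutine terminates once `work` returns.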
K. Alex Mills	3f6e0ec94b	fix: pass argument url in defer to avoid loopclosure Because of the well-known range loop closure issue, the value of u may have changed by the time the anonymous function mentioned in the defer is run. To address this, the simplest fix is to pass the url used in the loop as an argument to the function run in defer.	2020-11-19 15:29:26 -06:00
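The fix described in 3f6e0ec94b can be sketched like this. The function and URLs are hypothetical, not taken from the etcd source. Arguments to a deferred call are evaluated when the `defer` statement executes, so passing the loop variable as an argument pins the value of that iteration. (Since Go 1.22 each iteration gets its own loop variable, so the original hazard applies to code built with earlier language versions; passing the argument is correct either way.)

```go
package main

import "fmt"

// closeAll pretends to release one resource per URL and returns the
// order in which the deferred cleanups ran. Hypothetical helper.
func closeAll(urls []string) (closed []string) {
	for _, u := range urls {
		// u is evaluated here, at defer time, and bound to the
		// parameter url, so later iterations cannot change it.
		defer func(url string) {
			closed = append(closed, url)
		}(u)
	}
	return closed
}

func main() {
	fmt.Println(closeAll([]string{"http://a:2380", "http://b:2380"}))
}
```

Deferred calls run in last-in, first-out order, so the cleanups are recorded in reverse of the input order.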
Dan Mace	9571325fe8	etcdserver: fix incorrect metrics generated when clients cancel watches Before this patch, a client which cancels the context for a watch results in the server generating a `rpctypes.ErrGRPCNoLeader` error that leads to the recording of a gRPC `Unavailable` metric in association with the client watch cancellation. The metric looks like this: grpc_server_handled_total{grpc_code="Unavailable",grpc_method="Watch",grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"} So, the watch server has misidentified the error as a server error and then propagates the mistake to metrics, leading to a false indicator that the leader has been lost. This false signal then leads to false alerting. The commit `9c103dd0de` introduced an interceptor which wraps watch streams requiring a leader, causing those streams to be actively canceled when leader loss is detected. However, the error handling code assumes all stream context cancellations are from the interceptor. This assumption is broken when the context was canceled because of a client stream cancellation. The core challenge is lack of information conveyed via `context.Context` which is shared by both the send and receive sides of the stream handling and is subject to cancellation by all paths (including the gRPC library itself). If any piece of the system cancels the shared context, there's no way for a context consumer to understand who cancelled the context or why. To solve the ambiguity of the stream interceptor code specifically, this patch introduces a custom context struct which the interceptor uses to expose a custom error through the context when the interceptor decides to actively cancel a stream. Now the consuming side can more safely assume a generic context cancellation can be propagated as a cancellation, and the server generated leader error is preserved and propagated normally without any special inference. 
When a client cancels the stream, there remains a race in the error handling code between the send and receive goroutines whereby the underlying gRPC error is lost in the case where the send path returns and is handled first, but this issue can be taken separately as no matter which path wins, we can detect a generic cancellation. This is a replacement of https://github.com/etcd-io/etcd/pull/11375. Fixes #10289, #9725, #9576, #9166	2020-11-18 17:02:09 -05:00
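The idea in 9571325fe8 of a context that carries its own cancellation reason can be sketched as below. This is an illustration under assumptions, not the code from the patch: the type, its methods, `demo`, and the stand-in `errNoLeader` value are all hypothetical. A consumer calling `Err()` gets the custom error only when the owner cancelled with one, and the ordinary `context.Canceled` when the cancellation came from elsewhere (for example a parent context cancelled on behalf of the client).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// errNoLeader is a stand-in for the server's leader-loss error.
var errNoLeader = errors.New("no leader (stand-in)")

// cancellableContext wraps a context and remembers why its owner
// cancelled it. Hypothetical type for this sketch.
type cancellableContext struct {
	context.Context
	cancel context.CancelFunc

	mu     sync.RWMutex
	reason error
}

func newCancellableContext(parent context.Context) *cancellableContext {
	ctx, cancel := context.WithCancel(parent)
	return &cancellableContext{Context: ctx, cancel: cancel}
}

// Cancel records the reason and then cancels the wrapped context.
func (c *cancellableContext) Cancel(reason error) {
	c.mu.Lock()
	c.reason = reason
	c.mu.Unlock()
	c.cancel()
}

// Err prefers the recorded reason over the generic context error.
func (c *cancellableContext) Err() error {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if c.reason != nil {
		return c.reason
	}
	return c.Context.Err()
}

// demo returns the error seen after an owner-initiated cancel and after
// a cancel that came from the parent context.
func demo() (interceptorErr, clientErr error) {
	a := newCancellableContext(context.Background())
	a.Cancel(errNoLeader)
	<-a.Done()
	interceptorErr = a.Err()

	parent, cancelParent := context.WithCancel(context.Background())
	b := newCancellableContext(parent)
	cancelParent()
	<-b.Done()
	clientErr = b.Err()
	return interceptorErr, clientErr
}

func main() {
	fmt.Println(demo())
}
```

Go 1.20 and later offer `context.WithCancelCause` and `context.Cause` for a similar purpose; the explicit struct is used here because the commit message describes a custom context type.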
Ankur Gargi	c1c681adc3	server: Added config parameter experimental-warning-apply-duration	2020-11-17 17:33:19 -05:00
Gyuho Lee	1b8d2b1a47	Merge pull request #12452 from ptabor/20201104-release-mod-scripts Release scripts for modules	2020-11-14 03:42:42 -08:00
Gyuho Lee	dc586a5ad2	Merge pull request #12459 from jingyih/proper_request_cancellation server: proper cancellation for range request	2020-11-13 12:20:41 -08:00
spacewander	67f040f921	Update other Documentation/v2 links	2020-11-11 09:57:01 +08:00
spacewander	f2eb15a81b	chore: update the documentation link in the comment Close #12462.	2020-11-11 09:53:18 +08:00
Maciej Borsz	0bea7df7c1	Add metric tracking apply method duration: * etcd_server_apply_duration_seconds It can be used to understand which operations are slow, in addition to the warning log message.	2020-11-06 11:11:16 +01:00
jingyih	0558e379c3	server: proper request cancellation for range	2020-11-05 21:30:02 -08:00
Piotr Tabor	eeafcef0d2	Use "v3.5.0-pre" to reference within-etcd modules instead of v3.0.0-000101010000000-00000000000, that might be misleading as we don't develop etcd v3.0.0 any longer. This version is a virtual version and is not supposed to be tagged within the repository. We should tag real versions like: 3.5.0-alpha.0. Please notice that go.etcd.io/etcd/client/v2 will be versioned as `v2.305.0-pre`. The reason is that client v2 must have v2 version. I propose a convention to encode the major version as 100x in minor version to make the association to the underlying repository clear, staying within v2 version family. The change was generated using: ``` DRY_RUN=false TARGET_VERSION="v3.5.0-pre" ./scripts/release_mod.sh update_versions ```	2020-11-04 18:28:43 +01:00
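The versioning convention proposed in eeafcef0d2 (repository version `v3.5.0-pre` becomes `v2.305.0-pre` for client v2) amounts to folding the repository major version into the minor number as major×100 + minor. A small sketch of that arithmetic follows; `clientV2Version` is a hypothetical helper, not part of `scripts/release_mod.sh`.

```go
package main

import (
	"fmt"
	"strings"
)

// clientV2Version maps a repository version such as "v3.5.0-pre" to the
// corresponding client/v2 module version under the proposed convention.
func clientV2Version(repoVersion string) (string, error) {
	// Separate "3.5.0" from an optional pre-release suffix such as "pre".
	parts := strings.SplitN(strings.TrimPrefix(repoVersion, "v"), "-", 2)

	var major, minor, patch int
	if _, err := fmt.Sscanf(parts[0], "%d.%d.%d", &major, &minor, &patch); err != nil {
		return "", fmt.Errorf("parse %q: %w", repoVersion, err)
	}

	// Stay in the v2 family; encode the repo major as 100x in the minor.
	out := fmt.Sprintf("v2.%d.%d", major*100+minor, patch)
	if len(parts) == 2 {
		out += "-" + parts[1]
	}
	return out, nil
}

func main() {
	v, err := clientV2Version("v3.5.0-pre")
	fmt.Println(v, err) // v2.305.0-pre <nil>
}
```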
Piotr Tabor	6e800b9b01	20201103 no commit title check (#12447) * Turn off checking of format of commit message. * scripts/fix.sh: Fix fixing whitespaces in .sh scripts Apparently there is a difference between: find ./ -print0 -name .sh and find ./ -name .sh -print0 etcdserver unit tests: Do not call .Fatalf(...) from a non-test goroutine. Fixes following test failures: https://travis-ci.com/github/etcd-io/etcd/jobs/425920416 ``` % (cd server && go vet ./...) stderr: # go.etcd.io/etcd/server/v3/etcdserver stderr: etcdserver/server_test.go:1002:4: call to (T).Fatalf from a non-test goroutine stderr: etcdserver/server_test.go:1166:4: call to (T).Fatalf from a non-test goroutine FAIL: (code:2): % (cd server && go vet ./...) FAIL: 'run go vet ./...' checking failed (!=0 return code) FAIL: 'govet' failed at Tue Nov 3 04:07:47 UTC 2020 ```	2020-11-03 07:59:42 -08:00
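The vet failure quoted in 6e800b9b01 arises because `Fatalf` (via `FailNow`) must be called from the goroutine running the test. One common way to satisfy that, sketched here under assumptions rather than copied from the etcd tests, is to have the background goroutine send its error over a channel and let the test goroutine report it. The `failer` interface and `recorder` type are hypothetical stand-ins for `*testing.T` so the sketch runs as a plain program.

```go
package main

import (
	"errors"
	"fmt"
)

// errBoom is a hypothetical failure used by the demo.
var errBoom = errors.New("boom")

// failer is the subset of *testing.T that the helper needs.
type failer interface {
	Fatalf(format string, args ...interface{})
}

// checkAsync runs check in a separate goroutine but reports any failure
// from the calling goroutine.
func checkAsync(t failer, check func() error) {
	errc := make(chan error, 1)
	go func() {
		// No t.Fatalf here: only hand the result back.
		errc <- check()
	}()
	if err := <-errc; err != nil {
		t.Fatalf("background check failed: %v", err)
	}
}

// recorder collects Fatalf messages instead of stopping a test.
type recorder struct{ msgs []string }

func (r *recorder) Fatalf(format string, args ...interface{}) {
	r.msgs = append(r.msgs, fmt.Sprintf(format, args...))
}

func main() {
	r := &recorder{}
	checkAsync(r, func() error { return errBoom })
	fmt.Println(r.msgs)
}
```

In a real test, `t` would be the `*testing.T` passed to the test function, which already satisfies the `failer` interface.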
Jingyi Hu	f224fa4e42	Merge pull request #12425 from viviyww/cluster-set-version etcdserver: updated cluster version	2020-11-03 22:41:24 +08:00
tangcong	a960d6b1c7	*: add self-signed-cert-validity flag	2020-10-30 10:10:26 +08:00
yangweiwei	aa1024a16e	etcdserver: updated cluster version During a cluster version update in an etcd cluster, the log should report the change from XX to XX.	2020-10-27 16:32:40 +08:00
Piotr Tabor	aaf423e962	server: Update imports. find -name '*.go' \| xargs sed -i --follow-symlinks 's\|etcd/v3/\|etcd/server/v3/\|g'	2020-10-26 13:02:32 +01:00
Piotr Tabor	6c1efd6ba5	server: Update go.mod	2020-10-26 13:02:32 +01:00
Piotr Tabor	4a5e9d1261	server: Move server files to 'server' directory. 26 git mv mvcc wal auth etcdserver etcdmain proxy embed/ lease/ server 36 git mv go.mod go.sum server	2020-10-26 12:57:19 +01:00
Piotr Tabor	e62417297d	*: Rename imports of raft (as it's now a module) % find -name '*.go' -o -name '*.md' -o -name '*.sh' \| xargs sed -i --follow-symlinks 's\|etcd/v3/raft\|etcd/raft/v3\|g'	2020-10-16 13:58:18 +02:00


436 Commits