Mirroristas/etcd

mirror of https://github.com/etcd-io/etcd.git synced 2024-09-27 06:25:44 +00:00

Author	SHA1	Message	Date
Piotr Tabor	067521981e	v2 etcdctl backup: producing consistent state of membership	2021-04-27 19:34:34 +02:00
Piotr Tabor	a70386a1a4	Simplify membership interface: Does not pass the 'unused' token.	2021-04-27 17:17:31 +02:00
Piotr Tabor	7ae3d25f91	Membership: Add additional methods to trim/manage membership data in backend.	2021-04-27 17:17:31 +02:00
Piotr Tabor	768da490ed	server: v2store deprecation: Fix `etcdctl snapshot restore` to restore the correct 'backend' (bbolt) context with respect to membership. Prior to this change the 'restored' backend:
- still contained the old member IDs (MVCC deletion was used, while the membership lives in a bolt bucket, not in the MVCC part):
```
mvs := mvcc.NewStore(s.lg, be, lessor, ci, mvcc.StoreConfig{CompactionBatchLimit: math.MaxInt32})
defer mvs.Close()
txn := mvs.Write(traceutil.TODO())
btx := be.BatchTx()
del := func(k, v []byte) error {
	txn.DeleteRange(k, nil)
	return nil
}

// delete stored members from old cluster since using new members
btx.UnsafeForEach([]byte("members"), del)
```
- didn't get the new members added.	2021-04-27 17:17:30 +02:00
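To make the restore bug in 768da490ed concrete, here is a minimal Go sketch. It assumes a toy in-memory `backend` (bucket name to key/value map) in place of bbolt, and the helpers `buggyPurgeMembers`, `resetMembers` and `memberIDs` are hypothetical: the first reproduces the shape of the old code (walking the "members" bucket but deleting through the MVCC key space), the second clears and rewrites the "members" bucket itself.

```go
package main

import "sort"

// backend is a hypothetical in-memory stand-in for bbolt:
// bucket name -> key -> value.
type backend map[string]map[string]string

// buggyPurgeMembers mirrors the shape of the old restore path: it walks the
// "members" bucket but deletes from the MVCC key space ("key"), so the
// membership bucket itself is left untouched.
func buggyPurgeMembers(be backend) {
	for k := range be["members"] {
		delete(be["key"], k)
	}
}

// resetMembers clears the "members" bucket directly and then stores the new
// membership, which is the state a restore should end up with.
func resetMembers(be backend, newMembers map[string]string) {
	be["members"] = map[string]string{}
	for id, attrs := range newMembers {
		be["members"][id] = attrs
	}
}

// memberIDs returns the sorted member IDs currently in the "members" bucket.
func memberIDs(be backend) []string {
	ids := make([]string, 0, len(be["members"]))
	for id := range be["members"] {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}
```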
Piotr Tabor	9a4b2bdccc	Errors: `context cancelled` or `context deadline exceeded` are exposed as codes.Canceled, codes.DeadlineExceeded instead of 'codes.Unknown'	2021-04-22 14:35:24 +02:00
Piotr Tabor	ea287dd9f8	Merge pull request #12854 from ptabor/20210410-shouldApplyV3 (no)StoreV2 (Part 3): Applying consistency fix: ClusterVersionSet (and co) might not get applied on v2store	2021-04-21 09:31:38 +02:00
Sam Batschelet	0f2c940f64	Merge pull request #12880 from chaochn47/exclude_alarms_from_health_check etcdhttp/metrics.go: exclude alarms from health check conditionally with `?exclude=NOSPACE`	2021-04-20 21:18:15 -04:00
Chao Chen	140ea4fa29	etcdhttp/metrics.go: exclude alarms from health check conditionally with ?exclude=NOSPACE	2021-04-20 13:17:09 -07:00
Piotr Tabor	17b982382e	Fix TestSnapshotV3RestoreMultiMemberAdd flakes (leaks)
- Most important: the unix socket transport should not keep idle connections. For the top-level Transport we close them using: `f3c518025e/server/etcdserver/api/rafthttp/transport.go (L226)`, but currently we don't have access to close them within the nested (unix) transport. A short idle deadline is good enough.
- Use dialContext (instead of dial) to make sure the context is passed down the stack.
- Make sure the Context is cancelled as soon as the operation in the pipeline is done.
- nit: use a dedicated method to yield goroutines.
Tested with:
```
d=$(date +"%Y%m%d_%H%M")
(cd tests && go test --timeout=60m ./integration/snapshot -run TestSnapshotV3RestoreMultiMemberAdd -v --count=180 2>&1 | tee log_${d}.log)
```
Transports & cmux were leaked:
```
leak.go:118: Test appears to have leaked a Transport:
internal/poll.runtime_pollWait(0x7f6c5c3784c8, 0x72, 0xffffffffffffffff)
	/usr/lib/google-golang/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc003296298, 0x72, 0x0, 0x18, 0xffffffffffffffff)
	/usr/lib/google-golang/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
	/usr/lib/google-golang/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc003296280, 0xc0031f60a8, 0x18, 0x18, 0x0, 0x0, 0x0)
	/usr/lib/google-golang/src/internal/poll/fd_unix.go:166 +0x1d5
net.(*netFD).Read(0xc003296280, 0xc0031f60a8, 0x18, 0x18, 0x18, 0xc0009056e2, 0x203000)
	/usr/lib/google-golang/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc000010258, 0xc0031f60a8, 0x18, 0x18, 0x0, 0x0, 0x0)
	/usr/lib/google-golang/src/net/net.go:183 +0x91
github.com/soheilhy/cmux.(*bufferedReader).Read(0xc0003d24e0, 0xc0031f60a8, 0x18, 0x18, 0xc0003d24d0, 0xc0009056e2, 0xc000278400)
	/home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/buffer.go:53 +0x12d
github.com/soheilhy/cmux.hasHTTP2Preface(0x1367e20, 0xc0003d24e0, 0x7f6c5c699f40)
	/home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/matchers.go:195 +0x8a
github.com/soheilhy/cmux.matchersToMatchWriters.func1(0x7f6c5c699f40, 0xc000010258, 0x1367e20, 0xc0003d24e0, 0xc000010258)
	/home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/cmux.go:128 +0x39
github.com/soheilhy/cmux.(*cMux).serve(0xc003228690, 0x138c410, 0xc000010258, 0xc00327f740, 0xc0059ba860)
	/home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/cmux.go:192 +0x1e7
created by github.com/soheilhy/cmux.(*cMux).Serve
	/home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/cmux.go:179 +0x191

internal/poll.runtime_pollWait(0x7f6c5c60f3f0, 0x72, 0xffffffffffffffff)
	/usr/lib/google-golang/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc000d53018, 0x72, 0x1000, 0x1000, 0xffffffffffffffff)
	/usr/lib/google-golang/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
	/usr/lib/google-golang/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc000d53000, 0xc000cfd000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/usr/lib/google-golang/src/internal/poll/fd_unix.go:166 +0x1d5
net.(*netFD).Read(0xc000d53000, 0xc000cfd000, 0x1000, 0x1000, 0x3, 0x3, 0x1000000000001)
	/usr/lib/google-golang/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc00031a570, 0xc000cfd000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/usr/lib/google-golang/src/net/net.go:183 +0x91
net/http.(*persistConn).Read(0xc00093b320, 0xc000cfd000, 0x1000, 0x1000, 0x577750, 0x60, 0x0)
	/usr/lib/google-golang/src/net/http/transport.go:1933 +0x77
bufio.(*Reader).fill(0xc005702fc0)
	/usr/lib/google-golang/src/bufio/bufio.go:101 +0x108
bufio.(*Reader).Peek(0xc005702fc0, 0x1, 0xc00077c660, 0xc003b082a0, 0xc000d08de0, 0x5ae586, 0x11dd6c0)
	/usr/lib/google-golang/src/bufio/bufio.go:139 +0x4f
net/http.(*persistConn).readLoop(0xc00093b320)
	/usr/lib/google-golang/src/net/http/transport.go:2094 +0x1a8
created by net/http.(*Transport).dialConn
	/usr/lib/google-golang/src/net/http/transport.go:1754 +0xdaa

net/http.(*persistConn).writeLoop(0xc00093b320)
	/usr/lib/google-golang/src/net/http/transport.go:2393 +0xf7
created by net/http.(*Transport).dialConn
	/usr/lib/google-golang/src/net/http/transport.go:1755 +0xdcf

sync.runtime_Semacquire(0xc0059ba868)
	/usr/lib/google-golang/src/runtime/sema.go:56 +0x45
sync.(*WaitGroup).Wait(0xc0059ba860)
	/usr/lib/google-golang/src/sync/waitgroup.go:130 +0x65
github.com/soheilhy/cmux.(*cMux).Serve.func1(0xc003228690, 0xc0059ba860)
	/home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/cmux.go:158 +0x56
github.com/soheilhy/cmux.(*cMux).Serve(0xc003228690, 0x13698c0, 0xc00377a0f0)
	/home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/cmux.go:173 +0x115
go.etcd.io/etcd/server/v3/embed.(*Etcd).servePeers.func1(0xc0007cc360, 0x122b75f)
	/home/ptab/corp/etcd/server/embed/etcd.go:518 +0x2b9
go.etcd.io/etcd/server/v3/embed.(*Etcd).servePeers.func3(0xc00036d080, 0xc0059330a0)
	/home/ptab/corp/etcd/server/embed/etcd.go:549 +0x182
created by go.etcd.io/etcd/server/v3/embed.(*Etcd).servePeers
	/home/ptab/corp/etcd/server/embed/etcd.go:543 +0x73a
--- FAIL: TestSnapshotV3RestoreMultiMemberAdd (17.74s)
```	2021-04-16 20:17:28 +02:00
Piotr Tabor	d69e46ea47	Make ShouldApplyV3 an enum - not bool	2021-04-13 23:01:03 +02:00
Piotr Tabor	b1c04ce043	Applying consistency fix: ClusterVersionSet (and co) might not get applied on v2store. ClusterVersionSet, ClusterMemberAttrSet and DowngradeInfoSet functions write both to the V2store and the backend. Prior to this CL they were in a branch that was not executed if shouldApplyV3 was false, e.g. during restore when the Backend is up-to-date (has a high consistency-index) while the v2store requires replay from the WAL log. The most serious consequence of this bug was that the v2store after restore could have a different index (revision) than the same exact store before restore, so potentially different content between replicas. This change also suppresses double-applying of Membership (ClusterConfig) changes on the Backend (store v3), which luckily are not part of the MVCC/KeyValue store, so they didn't cause Revisions to be bumped. Inspired by jingyih@'s comment: https://github.com/etcd-io/etcd/pull/12820#issuecomment-815299406	2021-04-12 09:43:48 +02:00
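A sketch of the pattern behind d69e46ea47 and b1c04ce043, assuming a named boolean type with two named constants and toy map-backed stores (`stores`, `setClusterVersion` and the key name are hypothetical). The v2store write sits outside the branch, so replaying the WAL after a restore always updates it, while the backend write stays conditional and is not double-applied.

```go
package main

// ShouldApplyV3 is a named boolean so call sites state intent instead of
// passing a bare true/false. The constant names are illustrative.
type ShouldApplyV3 bool

const (
	ApplyBoth        = ShouldApplyV3(true)
	ApplyV2storeOnly = ShouldApplyV3(false)
)

// stores is a hypothetical stand-in for the v2store and the bbolt backend.
type stores struct {
	v2      map[string]string
	backend map[string]string
}

// setClusterVersion always updates the v2store (which may need WAL replay
// after a restore) and updates the backend only when it has not yet applied
// this entry, i.e. when shouldApplyV3 is ApplyBoth.
func (s *stores) setClusterVersion(ver string, shouldApplyV3 ShouldApplyV3) {
	s.v2["cluster_version"] = ver
	if shouldApplyV3 {
		s.backend["cluster_version"] = ver
	}
}
```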
Piotr Tabor	3bb7acc8cf	Migrate dependencies pkg/foo -> client/pkg/foo	2021-04-07 00:38:47 +02:00
Piotr Tabor	55ccbe62a2	membership/cluster_test: Use zaptest logger.	2021-03-26 13:54:59 +01:00
Piotr Tabor	7d7a9c6f23	codec.go: should use the google runtime. The golang-proto runtime can deal with both gogo- and golang-generated protobufs; it does not work vice versa with protobuf-1.5.1.	2021-03-24 22:06:47 +01:00
Joel Smith	19f7c6ef3e	*: Update gogo/protobuf to v1.3.2, rerun ./scripts/genproto.sh. While it appears that etcd is not vulnerable to CVE-2021-3121, it is a good idea to update to the new generator so that new vulnerable code isn't generated in any future APIs. This also lays to rest the question of whether etcd has any issue with CVE-2021-3121.	2021-03-23 11:48:06 -06:00
Gyuho Lee	3ead91ca3e	Merge pull request #12739 from LeoYang90/optimization_watch_prevkv create events do not need a prevKV range	2021-03-10 09:48:42 -08:00
Piotr Tabor	fb1d48e98e	Integration tests: Use BeforeTest(t) instead of defer AfterTest(). Thanks to this change, a single method BeforeTest(t) can handle before-test logic as well as registration of cleanup code (t.Cleanup(func)).	2021-03-09 18:19:51 +01:00
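A sketch of the `BeforeTest(t)` idea from fb1d48e98e: setup runs immediately and teardown is registered with `Cleanup`, so each test needs one line instead of a paired `defer AfterTest()`. The goroutine-count check is a simplified, hypothetical stand-in for real leak detection (which typically retries, because goroutines exit asynchronously), and `recordingT` is a minimal fake that exists only so the helper can run outside `go test`; `*testing.T` satisfies the same interface.

```go
package main

import (
	"fmt"
	"runtime"
)

// testingT is the subset of testing.TB the helper needs; *testing.T and
// *testing.B satisfy it.
type testingT interface {
	Helper()
	Cleanup(func())
	Errorf(format string, args ...interface{})
}

// BeforeTest runs setup now and registers teardown via Cleanup, so callers
// write a single BeforeTest(t) instead of pairing it with defer AfterTest(t).
// The goroutine-count comparison is a simplified, hypothetical leak check.
func BeforeTest(t testingT) {
	t.Helper()
	before := runtime.NumGoroutine()
	t.Cleanup(func() {
		if after := runtime.NumGoroutine(); after > before {
			t.Errorf("possible goroutine leak: %d goroutines before the test, %d after", before, after)
		}
	})
}

// recordingT is a minimal fake so the helper can run outside `go test`.
type recordingT struct {
	cleanups []func()
	errors   []string
}

func (r *recordingT) Helper()          {}
func (r *recordingT) Cleanup(f func()) { r.cleanups = append(r.cleanups, f) }
func (r *recordingT) Errorf(format string, args ...interface{}) {
	r.errors = append(r.errors, fmt.Sprintf(format, args...))
}

// runCleanups runs registered functions last-in, first-out, as the testing
// package does.
func (r *recordingT) runCleanups() {
	for i := len(r.cleanups) - 1; i >= 0; i-- {
		r.cleanups[i]()
	}
}
```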
Gyuho Lee	6fd85af641	Merge pull request #12702 from hexfusion/add-so *: add support for socket options	2021-03-09 09:02:24 -08:00
Sam Batschelet	5b49fb41c8	fixup: add ListenerOptions Signed-off-by: Sam Batschelet <sbatsche@redhat.com>	2021-03-08 11:27:03 -05:00
leoyang.yl	d70f35f8d1	create events do not need a prevKV range	2021-03-02 17:43:24 +08:00
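The optimization in d70f35f8d1 rests on the fact that a put which creates a key has no previous value, so the extra range at `ModRevision-1` can be skipped. A minimal sketch with hypothetical stand-in types for `mvccpb.Event` and `mvccpb.KeyValue`: a PUT is treated as a creation when its `CreateRevision` equals its `ModRevision`.

```go
package main

// EventType, KeyValue and Event are hypothetical stand-ins for the mvccpb
// types, reduced to the fields this check needs.
type EventType int

const (
	PUT EventType = iota
	DELETE
)

type KeyValue struct {
	Key            string
	CreateRevision int64
	ModRevision    int64
}

type Event struct {
	Type EventType
	Kv   KeyValue
}

// isCreateEvent reports whether a PUT created the key: CreateRevision equals
// ModRevision only on the put that (re)created it.
func isCreateEvent(e Event) bool {
	return e.Type == PUT && e.Kv.CreateRevision == e.Kv.ModRevision
}

// needsPrevKVRange reports whether the watcher must range at ModRevision-1
// to fetch the previous key-value; creations have none to fetch.
func needsPrevKVRange(prevKVRequested bool, e Event) bool {
	return prevKVRequested && !isCreateEvent(e)
}
```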
Gyuho Lee	3d7aac948b	Merge pull request #12196 from ironcladlou/metrics-watch-error-fix etcdserver: fix incorrect metrics generated when clients cancel watches	2021-02-19 12:46:49 -08:00
Sam Batschelet	49078c683b	*: add support for socket options Signed-off-by: Sam Batschelet <sbatsche@redhat.com>	2021-02-19 13:31:23 -05:00
Maksim Buldukyan	7e38cfcc8d	raft: makes 'ConnReadTimeout/ConnWriteTimeout' customizable	2021-02-10 10:36:50 +07:00
Chao Chen	2ae3e82f07	etcdserver/api/etcdhttp: log successful etcd server-side health check at debug level. When an external component checks /health periodically, the etcd server logs can be quite verbose (e.g., DDOS-ing an insecure etcd health check can fill up the disk with large log files). This logging was introduced in #11704. While we keep the warning logs for etcd health check failures, the success (or OK) log level should be set to DEBUG. Fixes #12676	2021-02-08 17:15:43 -08:00
Yanhao Mo	6d82778a4e	etcdserver: export method EtcdServer.leaderChangedNotify (#12378)	2021-02-02 18:13:32 +08:00
Sahdev Zala	69e99e80fa	Merge pull request #12465 from spacewander/fdoc chore: update the documentation link in the comment	2021-01-14 00:39:25 -05:00
Jingyi Hu	bfc6e2ff30	Merge pull request #12611 from ptabor/20210111-fix-flakes e2e tests flakes & leaks fixes: In particular TestIssue6361	2021-01-12 21:26:54 +08:00
Piotr Tabor	0d9cfc11c8	Fix usage of reflect.SliceHeader: reported by vet on tip golang
Example: https://travis-ci.com/github/etcd-io/etcd/jobs/470404938
```
% (cd server && go vet ./...)
stderr: # go.etcd.io/etcd/server/v3/etcdserver/api/v2store
stderr: etcdserver/api/v2store/node_extern_test.go:107:9: possible misuse of reflect.SliceHeader
stderr: etcdserver/api/v2store/node_extern_test.go:107:16: possible misuse of reflect.SliceHeader
```	2021-01-12 00:14:51 +01:00
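For the vet warning in 0d9cfc11c8, one vet-clean way to ask whether two slices share the same backing array is `reflect.Value.Pointer`, which for a slice returns the address of its first element. This is a sketch of that alternative (`sameBacking` is a hypothetical name), not necessarily the change the commit made.

```go
package main

import "reflect"

// sameBacking reports whether two slices start at the same element of the
// same backing array and have equal length and capacity. It compares
// reflect.Value.Pointer results instead of reading reflect.SliceHeader.
// Non-slice arguments yield false.
func sameBacking(a, b interface{}) bool {
	va, vb := reflect.ValueOf(a), reflect.ValueOf(b)
	if va.Kind() != reflect.Slice || vb.Kind() != reflect.Slice {
		return false
	}
	return va.Pointer() == vb.Pointer() && va.Len() == vb.Len() && va.Cap() == vb.Cap()
}
```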
Piotr Tabor	74274f4417	e2e: Adding better diagnostic and location for temporary files to Snapshot tests.	2021-01-12 00:14:51 +01:00
Piotr Tabor	23340bb62a	Refresh proto generation script after moving module files. With modularization, server protos were moved into the ./server directory, but this was not reflected in scripts/genproto.sh.	2021-01-08 16:33:12 +01:00
Dan Mace	9571325fe8	etcdserver: fix incorrect metrics generated when clients cancel watches
Before this patch, a client which cancels the context for a watch results in the server generating a `rpctypes.ErrGRPCNoLeader` error that leads to the recording of a gRPC `Unavailable` metric in association with the client watch cancellation. The metric looks like this:
```
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="Watch",grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}
```
So the watch server has misidentified the error as a server error and then propagates the mistake to metrics, leading to a false indicator that the leader has been lost. This false signal then leads to false alerting.
The commit 9c103dd0dedfc723cd4f33b6a5e81343d8a6bae7 introduced an interceptor which wraps watch streams requiring a leader, causing those streams to be actively canceled when leader loss is detected. However, the error handling code assumes all stream context cancellations are from the interceptor. This assumption is broken when the context was canceled because of a client stream cancellation.
The core challenge is the lack of information conveyed via `context.Context`, which is shared by both the send and receive sides of the stream handling and is subject to cancellation by all paths (including the gRPC library itself). If any piece of the system cancels the shared context, there's no way for a context consumer to understand who cancelled the context or why.
To solve the ambiguity of the stream interceptor code specifically, this patch introduces a custom context struct which the interceptor uses to expose a custom error through the context when the interceptor decides to actively cancel a stream. Now the consuming side can more safely assume a generic context cancellation can be propagated as a cancellation, and the server-generated leader error is preserved and propagated normally without any special inference.
When a client cancels the stream, there remains a race in the error handling code between the send and receive goroutines whereby the underlying gRPC error is lost in the case where the send path returns and is handled first, but this issue can be taken separately, as no matter which path wins, we can detect a generic cancellation.
This is a replacement of https://github.com/etcd-io/etcd/pull/11375.
Fixes #10289, #9725, #9576, #9166	2020-11-18 17:02:09 -05:00
spacewander	67f040f921	Update other Documentation/v2 links	2020-11-11 09:57:01 +08:00
spacewander	f2eb15a81b	chore: update the documentation link in the comment Close #12462.	2020-11-11 09:53:18 +08:00
jingyih	0558e379c3	server: proper request cancellation for range	2020-11-05 21:30:02 -08:00
yangweiwei	aa1024a16e	etcdserver: updated cluster version. During a cluster version update in an etcd cluster, the info log should state the change from XX to XX.	2020-10-27 16:32:40 +08:00
Piotr Tabor	aaf423e962	server: Update imports. `find -name '*.go' | xargs sed -i --follow-symlinks 's|etcd/v3/|etcd/server/v3/|g'`	2020-10-26 13:02:32 +01:00
Piotr Tabor	4a5e9d1261	server: Move server files to 'server' directory. `git mv mvcc wal auth etcdserver etcdmain proxy embed/ lease/ server`, `git mv go.mod go.sum server`	2020-10-26 12:57:19 +01:00

37 Commits