573 Commits

Author SHA1 Message Date
Piotr Tabor
d53d2db1e2 Tests: Backend hooks support. 2021-05-04 15:38:22 +02:00
Piotr Tabor
9f11b16b2d backend: Hooks interface & implementation. 2021-05-04 15:38:22 +02:00
Piotr Tabor
a3fc535a5a
Merge pull request #12914 from ptabor/20210430-read-membership
No-storeV2: Read membership information from the backend (Part5)
2021-05-04 10:15:01 +02:00
Wilson Wang
56154216b7 update variable declaration location 2021-05-03 10:30:15 -07:00
wpedrak
6623c008ee server: reapply Mlock flag after defrag 2021-05-03 11:01:02 +02:00
Piotr Tabor
451f65d661
Merge pull request #12908 from ptabor/20210429-client-retry-logging
Clientv3 (retry interceptor) logs should use the configured logger
2021-04-29 19:25:04 +02:00
Piotr Tabor
cedbea6c81
Merge pull request #12904 from wpedrak/limit_mlocked_memory
server: replace mlockall with `Mlock` in `--experimental-memory-mlock`
2021-04-29 18:21:24 +02:00
Piotr Tabor
ffea1537d4 ClientV3 tests use integration.NewClient that configures proper logger. 2021-04-29 18:18:34 +02:00
Piotr Tabor
205a1a442a Read membership information from the backend. 2021-04-29 13:45:45 +02:00
Piotr Tabor
bc8d3f6639 Clientv3 (retry) logs should use the configured logger.
clientv3 logs (especially tests) were poluted with unattributed to testing.T log lines:

```
{"level":"warn","ts":"2021-04-29T12:42:11.055+0200","logger":"etcd-client","caller":"v3/retry_interceptor.go:64","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000fafc0/#initially=[unix://localhost:m10]","attempt":0,"error":"rpc error: code = ResourceExhausted desc = etcdserver: mvcc: database space exceeded"}
```

The reasons were 2 fold:
  - Interceptors were copying logger before "WithLogger" could modify it.
  - We were not propagating the loggers in a few testing contexts.
2021-04-29 12:57:09 +02:00
wpedrak
1145c57601 server: replace mlockall with Mlock in --experimental-memory-mlock
Implementation of `--experimental-memory-mlock` backed by `mlockall` syscall is replaced by `Mlock` flag (backed by mlock syscall) of bboltDB.
2021-04-29 12:08:52 +02:00
wpedrak
927b3a3152 server: replace mlockall with Mlock in --experimental-memory-mlock
Implementation of `--experimental-memory-mlock` backed by `mlockall` syscall is replaced by `Mlock` flag (backed by mlock syscall) of bboltDB.
2021-04-29 12:08:20 +02:00
Piotr Tabor
e90504fe62 Unify shared code (and constants) with cindex package. 2021-04-29 11:51:25 +02:00
Piotr Tabor
f53b70facb Embed: In case KVStoreHash verification fails, close the backend.
In case of failed verification, the server used to keep opened backend
(so the file was locked on OS level).
2021-04-29 11:51:25 +02:00
Piotr Tabor
911204cd76 Fix ETCDCTL_API=2 etcdctl backup --with-v3 consistent index consistency
Prior to this CL, `ETCDCTL_API=2 etcdctl backup --with-v3` was readacting WAL log
(by removal of some entries), but was NOT updating consistent_index in the backend.
Also the WAL editing logic was buggy, as it didn't took in consideration the fact
that when TERM changes, there can be entries with duplicated indexes in
the log. So its NOT sufficient to subtract number of removed entries to
get accurate log indexes.

The PR replaces removing and shifting of WAL entries with replacing them with an no-op entries.
Thanks to this consistent-index references are staying up to date.

The PR also:
  - updates 'verification' logic to check whether consistent_index does not lag befor last snapshot
  - env-gated execution of verification framework in `etcdctl backup`.

Tested with:
```
(./build.sh && cd tests && EXPECT_DEBUG=TRUE 'env' 'go' 'test' '-timeout=300m' 'go.etcd.io/etcd/tests/v3/e2e' -run=TestCtlV2Backup --count=1000 2>&1 | tee TestCtlV2BackupV3.log)
```
2021-04-29 11:51:24 +02:00
Wilson Wang
8d8d0377a2 server: applier uses ReadTx instead of ConcurrentTx 2021-04-28 11:06:24 -07:00
Piotr Tabor
c4b13a5c83 Integrate verification framework
Verification framework is integrated with:
  - integration tests (by default)
  - `ETCD_VERIFY=all etcdctl snapshot restore` command
  - etcd shutdown when running with `ETCD_VERIFY=all` env.
2021-04-28 07:56:16 +02:00
Piotr Tabor
47b28b600a Verification package: Verified given data-dir.
For now verifies whete Backend.cindex is consistent with WAL log,
but should get expanded to cover memberships & revisions.
2021-04-28 07:56:15 +02:00
Piotr Tabor
6f8f506cf4 Create 'datadir' package responsible for paths. 2021-04-28 07:56:13 +02:00
Piotr Tabor
067521981e v2 etcdctl backup: producing consistent state of membership 2021-04-27 19:34:34 +02:00
Piotr Tabor
a70386a1a4 Simplify membership interface: Does not pass the 'unused' token. 2021-04-27 17:17:31 +02:00
Piotr Tabor
7ae3d25f91 Membership: Add additional methods to trim/manage membership data in backend. 2021-04-27 17:17:31 +02:00
Piotr Tabor
768da490ed sever: v2store deprecation: Fix etcdctl snapshot restore to restore
correct 'backend' (bbolt) context in aspect of membership.

Prior to this change the 'restored' backend used to still contain:
  - old memberid (mvcc deletion used, why the membership is in bolt
bucket, but not mvcc part):
    ```
	mvs := mvcc.NewStore(s.lg, be, lessor, ci, mvcc.StoreConfig{CompactionBatchLimit: math.MaxInt32})
	defer mvs.Close()
	txn := mvs.Write(traceutil.TODO())
	btx := be.BatchTx()
	del := func(k, v []byte) error {
		txn.DeleteRange(k, nil)
		return nil
	}

	// delete stored members from old cluster since using new members
	btx.UnsafeForEach([]byte("members"), del)
    ```
  - didn't get new members added.
2021-04-27 17:17:30 +02:00
Makdon
ddcb463822
etcdserver/mvcc: update trace.Step condition 2021-04-25 23:07:45 +08:00
Piotr Tabor
9a4b2bdccc Errors: context cancelled or context deadline exceeded are exposed as codes.Canceled, codes.DeadlineExceeded instead of 'codes.Unknown' 2021-04-22 14:35:24 +02:00
Piotr Tabor
d7d110b5a8 mvcc/backend tests: Refactor: Do not mix testing&prod code. 2021-04-21 09:43:13 +02:00
Piotr Tabor
ea287dd9f8
Merge pull request #12854 from ptabor/20210410-shouldApplyV3
(no)StoreV2 (Part 3): Applying consistency fix: ClusterVersionSet (and co) might get not applied on v2store
2021-04-21 09:31:38 +02:00
Sam Batschelet
0f2c940f64
Merge pull request #12880 from chaochn47/exclude_alarms_from_health_check
etcdhttp/metrics.go: exclude alarms from health check conditionally with `?exclude=NOSPACE`
2021-04-20 21:18:15 -04:00
Chao Chen
140ea4fa29 etcdhttp/metrics.go: exclude alarms from health check conditionally with ?exclude=NOSPACE 2021-04-20 13:17:09 -07:00
Piotr Tabor
11249fdee9
Merge pull request #12874 from ptabor/20210417-go1.16.3
Update go for 3.5: 1.15.x -> 1.16.3
2021-04-19 16:51:49 +02:00
Piotr Tabor
3423a949c0 Update go for 3.5: 1.15 -> 1.16.(3).
https://github.com/etcd-io/etcd/issues/12732
2021-04-19 16:50:54 +02:00
Piotr Tabor
06ba0fc5a2
Merge pull request #12846 from pyiyun/fix-snaptmpfile-bug
etcdserver: remove temp files in snap dir when etcdserver starting
2021-04-17 12:58:46 +02:00
chao
80586c5b47
etcdserver/util.go: reduce memory when logging range requests (#12871)
* etcdserver/util.go: reduce memory when logging range requests

Fixes #12835

* Update CHANGELOG-3.5.md
2021-04-16 16:39:34 -07:00
Piotr Tabor
17b982382e Fix TestSnapshotV3RestoreMultiMemberAdd flakes (leaks)
- most important: unix's socket transport should not keep idle
connections. For top-level Transport we close them using:
    f3c518025e/server/etcdserver/api/rafthttp/transport.go (L226)
    but currently we don't have access to close them witing the nest (unix) transport. Short idle deadline is good enough.

  - Use dialContext (instead of dial) to make sure context is passed down the stack
  - Make sure Context is cancelled as soon as the operation is done in pipeline
  - nit: use dedicated method to yeld goroutines.

Tested with:
```
d=$(date +"%Y%m%d_%H%M")
(cd tests && go test --timeout=60m ./integration/snapshot -run TestSnapshotV3RestoreMultiMemberAdd -v --count=180 2>&1 | tee log_${d}.log)
```

There were transports & cmux leaked:

```
   leak.go:118: Test appears to have leaked a Transport:
        internal/poll.runtime_pollWait(0x7f6c5c3784c8, 0x72, 0xffffffffffffffff)
        	/usr/lib/google-golang/src/runtime/netpoll.go:222 +0x55
        internal/poll.(*pollDesc).wait(0xc003296298, 0x72, 0x0, 0x18, 0xffffffffffffffff)
        	/usr/lib/google-golang/src/internal/poll/fd_poll_runtime.go:87 +0x45
        internal/poll.(*pollDesc).waitRead(...)
        	/usr/lib/google-golang/src/internal/poll/fd_poll_runtime.go:92
        internal/poll.(*FD).Read(0xc003296280, 0xc0031f60a8, 0x18, 0x18, 0x0, 0x0, 0x0)
        	/usr/lib/google-golang/src/internal/poll/fd_unix.go:166 +0x1d5
        net.(*netFD).Read(0xc003296280, 0xc0031f60a8, 0x18, 0x18, 0x18, 0xc0009056e2, 0x203000)
        	/usr/lib/google-golang/src/net/fd_posix.go:55 +0x4f
        net.(*conn).Read(0xc000010258, 0xc0031f60a8, 0x18, 0x18, 0x0, 0x0, 0x0)
        	/usr/lib/google-golang/src/net/net.go:183 +0x91
        github.com/soheilhy/cmux.(*bufferedReader).Read(0xc0003d24e0, 0xc0031f60a8, 0x18, 0x18, 0xc0003d24d0, 0xc0009056e2, 0xc000278400)
        	/home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/buffer.go:53 +0x12d
        github.com/soheilhy/cmux.hasHTTP2Preface(0x1367e20, 0xc0003d24e0, 0x7f6c5c699f40)
        	/home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/matchers.go:195 +0x8a
        github.com/soheilhy/cmux.matchersToMatchWriters.func1(0x7f6c5c699f40, 0xc000010258, 0x1367e20, 0xc0003d24e0, 0xc000010258)
        	/home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/cmux.go:128 +0x39
        github.com/soheilhy/cmux.(*cMux).serve(0xc003228690, 0x138c410, 0xc000010258, 0xc00327f740, 0xc0059ba860)
        	/home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/cmux.go:192 +0x1e7
        created by github.com/soheilhy/cmux.(*cMux).Serve
        	/home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/cmux.go:179 +0x191

        internal/poll.runtime_pollWait(0x7f6c5c60f3f0, 0x72, 0xffffffffffffffff)
        	/usr/lib/google-golang/src/runtime/netpoll.go:222 +0x55
        internal/poll.(*pollDesc).wait(0xc000d53018, 0x72, 0x1000, 0x1000, 0xffffffffffffffff)
        	/usr/lib/google-golang/src/internal/poll/fd_poll_runtime.go:87 +0x45
        internal/poll.(*pollDesc).waitRead(...)
        	/usr/lib/google-golang/src/internal/poll/fd_poll_runtime.go:92
        internal/poll.(*FD).Read(0xc000d53000, 0xc000cfd000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        	/usr/lib/google-golang/src/internal/poll/fd_unix.go:166 +0x1d5
        net.(*netFD).Read(0xc000d53000, 0xc000cfd000, 0x1000, 0x1000, 0x3, 0x3, 0x1000000000001)
        	/usr/lib/google-golang/src/net/fd_posix.go:55 +0x4f
        net.(*conn).Read(0xc00031a570, 0xc000cfd000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        	/usr/lib/google-golang/src/net/net.go:183 +0x91
        net/http.(*persistConn).Read(0xc00093b320, 0xc000cfd000, 0x1000, 0x1000, 0x577750, 0x60, 0x0)
        	/usr/lib/google-golang/src/net/http/transport.go:1933 +0x77
        bufio.(*Reader).fill(0xc005702fc0)
        	/usr/lib/google-golang/src/bufio/bufio.go:101 +0x108
        bufio.(*Reader).Peek(0xc005702fc0, 0x1, 0xc00077c660, 0xc003b082a0, 0xc000d08de0, 0x5ae586, 0x11dd6c0)
        	/usr/lib/google-golang/src/bufio/bufio.go:139 +0x4f
        net/http.(*persistConn).readLoop(0xc00093b320)
        	/usr/lib/google-golang/src/net/http/transport.go:2094 +0x1a8
        created by net/http.(*Transport).dialConn
        	/usr/lib/google-golang/src/net/http/transport.go:1754 +0xdaa

        net/http.(*persistConn).writeLoop(0xc00093b320)
        	/usr/lib/google-golang/src/net/http/transport.go:2393 +0xf7
        created by net/http.(*Transport).dialConn
        	/usr/lib/google-golang/src/net/http/transport.go:1755 +0xdcf

        sync.runtime_Semacquire(0xc0059ba868)
        	/usr/lib/google-golang/src/runtime/sema.go:56 +0x45
        sync.(*WaitGroup).Wait(0xc0059ba860)
        	/usr/lib/google-golang/src/sync/waitgroup.go:130 +0x65
        github.com/soheilhy/cmux.(*cMux).Serve.func1(0xc003228690, 0xc0059ba860)
        	/home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/cmux.go:158 +0x56
        github.com/soheilhy/cmux.(*cMux).Serve(0xc003228690, 0x13698c0, 0xc00377a0f0)
        	/home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/cmux.go:173 +0x115
        go.etcd.io/etcd/server/v3/embed.(*Etcd).servePeers.func1(0xc0007cc360, 0x122b75f)
        	/home/ptab/corp/etcd/server/embed/etcd.go:518 +0x2b9
        go.etcd.io/etcd/server/v3/embed.(*Etcd).servePeers.func3(0xc00036d080, 0xc0059330a0)
        	/home/ptab/corp/etcd/server/embed/etcd.go:549 +0x182
        created by go.etcd.io/etcd/server/v3/embed.(*Etcd).servePeers
        	/home/ptab/corp/etcd/server/embed/etcd.go:543 +0x73a
--- FAIL: TestSnapshotV3RestoreMultiMemberAdd (17.74s)
```
2021-04-16 20:17:28 +02:00
pyiyun
28a490b09c etcdserver: remove temp files in snap dir when etcdServer starting
When etcd exits abnormally, tmp files will remain in snap dir, so clean up tmp files in snap dir when etcdserver starting.

Fixes #12837
2021-04-16 20:30:04 +08:00
Piotr Tabor
b47c5fcc12 Address review comments a.d. logging. 2021-04-15 17:54:37 +02:00
Piotr Tabor
d72f7ef5cc Give control to Embedded servers whether they override global loggers
So far each instance of embed server was overriding the grpc loggers and zap.global loggers.
It's counter intutitive that last created Embedded server was 'wining' and more-over it was breaking grpc expectation to change it "only" before the grpc stack is being used.

This PR introduces explicit call: `embed.Config::SetupGlobalLoggers()`, that changes the loggers where requested. The call is used by etcd main binary.

The immediate benefit from this change is reduction of  test flakiness, as there were flakes due to not a proper logger being used across tests.
2021-04-14 12:47:38 +02:00
Piotr Tabor
eafbc8c57e Update zap logging dependency.
In particular bring up zapgrpc V2 code:
89e382035d
https://pkg.go.dev/google.golang.org/grpc/grpclog#LoggerV2
2021-04-14 12:15:48 +02:00
Piotr Tabor
d69e46ea47 Make ShouldApplyV3 an enum - not bool 2021-04-13 23:01:03 +02:00
Piotr Tabor
b1c04ce043 Applying consistency fix: ClusterVersionSet (and co) might get no applied on v2store
ClusterVersionSet, ClusterMemberAttrSet, DowngradeInfoSet functions are
writing both to V2store and backend. Prior this CL there were
in a branch not executed if shouldApplyV3 was false,
e.g. during restore when Backend is up-to-date (has high
consistency-index) while v2store requires replay from WAL log.

The most serious consequence of this bug was that v2store after restore
could have different index (revision) than the same exact store before restore,
so potentially different content between replicas.

Also this change is supressing double-applying of Membership
(ClusterConfig) changes on Backend (store v3) - that lackilly are not
part of MVCC/KeyValue store, so they didn't caused Revisions to be
bumped.

Inspired by jingyih@ comment:
https://github.com/etcd-io/etcd/pull/12820#issuecomment-815299406
2021-04-12 09:43:48 +02:00
wpedrak
3991a8c9fa etcdserver: replace forceVersionC with FirstCommitInTermNotify 2021-04-09 11:30:42 +02:00
wpedrak
3d485faac5 etcdserver: resend ReadIndex request on empty apply request
Empty apply indicates first commit in current term. It is first time when follower is sure, that it's ReadIndex request can be processed.
2021-04-09 11:30:42 +02:00
Piotr Tabor
63c25bf378
Merge pull request #12804 from ptabor/20210326-v3-publish
server: v2store deprecation: Prepare to use publishV3 instead of publish V2.
2021-04-08 09:22:22 +02:00
Piotr Tabor
e776efbb2a
Merge pull request #12828 from ptabor/20210404-embed-etcd
embed: etcd.Close() is closing Errc() channel as well.
2021-04-08 01:20:07 +02:00
Piotr Tabor
5da9cac193 embed: etcd.Close() is closing Errc() channel as well.
Inspired by https://github.com/etcd-io/etcd/pull/9612 by purpleidea@.
2021-04-08 01:19:13 +02:00
Piotr Tabor
cc0f812f51 server_test.go: Use context.Background() instead of TODO in tests.
Suggected in: https://golang.org/pkg/context/#Background
2021-04-08 01:15:16 +02:00
Piotr Tabor
931af493cf Merge pull request #12830 from ptabor/20210405-split-pkg
Split client/pkg as dedicated low-dependencies module for client
2021-04-08 01:12:17 +02:00
Piotr Tabor
3bb7acc8cf Migrate dependencies pkg/foo -> client/pkg/foo 2021-04-07 00:38:47 +02:00
Sam Batschelet
a40a6e9ad8
Merge pull request #12805 from ptabor/20210326-minor-test-fixes
tests: logging & temp-dir fixes
2021-03-28 19:14:13 -04:00
Piotr Tabor
f290ab2e60 Update dependecies:
github.com/grpc-ecosystem/grpc-gateway v1.14.6 -> grpc-gateway v1.16.0
  golang.org/x/time v0.0.0-20200630173020-3af7569d3a1e->v0.0.0-20210220033141-f8bda1e9f3ba
2021-03-27 20:48:33 +01:00