131 Commits

Author SHA1 Message Date
Benjamin Wang
646ba66c5e
Merge pull request #14434 from tjungblu/bz_1918413_3.5
etcdctl: allow move-leader to connect to multiple endpoints
2022-09-08 17:58:03 +08:00
Thomas Jungblut
243b7a125b etcdctl: fix move-leader for multiple endpoints
Due to a duplicate call of clientConfigFromCmd, the move-leader command
would fail with "conflicting environment variable is shadowed by corresponding command-line flag",
even in scenarios where no command-line flag was supplied.

Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
2022-09-08 11:20:15 +02:00
Vivek Patani
7639d93f15 server,test: refresh cache on each NewAuthStore
- permissions were incorrectly loaded on restarts.
- #14355
- Backport of https://github.com/etcd-io/etcd/pull/14358

Signed-off-by: vivekpatani <9080894+vivekpatani@users.noreply.github.com>
2022-09-07 10:22:05 -07:00
Marek Siarkowicz
2ddb9e0883 tests: Fix member id in CORRUPT alarm
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-09-07 15:11:56 +02:00
Marek Siarkowicz
5660bf0e7f server: Make corruption check optional and period configurable
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-09-07 15:11:56 +02:00
Marek Siarkowicz
21fb173f76 server: Implement compaction hash checking
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-09-07 15:11:56 +02:00
Marek Siarkowicz
a56ec0be4b tests: Cover periodic check in tests
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-09-07 15:11:56 +02:00
Marek Siarkowicz
8d4ca10ece tests: Move CorruptBBolt to testutil
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-09-07 15:11:55 +02:00
Marek Siarkowicz
a8020a0320 tests: Rename corruptHash to CorruptBBolt
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-09-07 15:11:55 +02:00
Vitalii Levitskii
67e4c59e01 Backport of pull/14354 to 3.5.5
Signed-off-by: Vitalii Levitskii <vitalii@uber.com>
2022-08-29 15:58:17 +03:00
Benjamin Wang
ff447b4a35 add e2e test cases to cover the maxConcurrentStreams
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-07-13 14:43:44 +08:00
Marek Siarkowicz
a060b42e47 server: Use default logging configuration instead of zap production one
This fixes a problem where the JSON logs changed their timestamp format.
2022-04-01 12:23:44 +02:00
Marek Siarkowicz
25556a08a8 tests: Keep logs in expect to allow their analysis 2022-04-01 12:23:14 +02:00
ahrtr
7345d4211b always print raft_term in decimal when displaying member list in json 2022-02-22 17:09:21 +08:00
Marek Siarkowicz
79f9a45574 client: Use first endpoint as http2 authority header 2021-09-30 12:15:33 +02:00
Marek Siarkowicz
7f25a500e3 tests: Add grpc authority e2e tests 2021-09-30 12:15:33 +02:00
Arda Güçlü
6e2fe84ebd Decouple prefixArgs from os.Env dependency
prefixArgs uses os.Setenv in e2e tests instead of envMap.
This causes overwrites in some test cases and has an impact
on test quality and isolation between tests.
This PR uses the ctlcontext envMap in each test with high priority
and merges OS environment variables with low priority.
2021-09-30 12:04:31 +02:00
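The priority rule described above (the per-test envMap wins, the OS environment fills in the rest) can be sketched in Go as follows. `mergeEnv` is a hypothetical helper, not the actual function from the etcd e2e framework; a caller would pass `os.Environ()` as `osEnv` and use the result as `exec.Cmd.Env`.

```go
package main

import (
	"sort"
	"strings"
)

// mergeEnv (hypothetical) combines a test-specific envMap (high priority)
// with the process environment (low priority) and returns "KEY=VALUE"
// entries suitable for exec.Cmd.Env.
func mergeEnv(envMap map[string]string, osEnv []string) []string {
	merged := make(map[string]string, len(envMap)+len(osEnv))
	// Low priority: start from the OS environment.
	for _, kv := range osEnv {
		if k, v, ok := strings.Cut(kv, "="); ok {
			merged[k] = v
		}
	}
	// High priority: test-specific values overwrite OS values.
	for k, v := range envMap {
		merged[k] = v
	}
	out := make([]string, 0, len(merged))
	for k, v := range merged {
		out = append(out, k+"="+v)
	}
	sort.Strings(out) // deterministic order for logs and assertions
	return out
}
```

Because nothing is written back with `os.Setenv`, each test's settings stay local to the spawned command and cannot leak into other tests.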
J. David Lowe
ae194c1470 etcdserver: don't activate alarm w/missing AlarmType
Narrowly prevent etcd from crashing when given a bad ACTIVATE payload, e.g.:

$ curl -d "{\"action\":\"ACTIVATE\"}" ${ETCD}/v3/maintenance/alarm
curl: (52) Empty reply from server
2021-06-04 14:21:04 -07:00
Piotr Tabor
3f13d3a2d5 integration.BeforeTest can be run without leak-detection. 2021-05-28 10:01:36 +02:00
Piotr Tabor
d99d0df5a5 Adding etcdutl test coverage. 2021-05-17 11:54:03 +02:00
Piotr Tabor
c09aca1ba4 Split etcdctl into etcdctl (public API access) & etcdutl (direct surgery on files)
Motivation is as follows:

  - etcdctl depends only on clientv3 APIs, with no dependencies on bolt, backend, mvcc, or file-layout
  - etcdctl can be officially supported across a wide range of versions, while etcdutl is specific to the file format of a particular version.
It's a step towards the desired module layout, documented in: https://etcd.io/docs/next/dev-internal/modules/
2021-05-17 11:54:03 +02:00
Piotr Tabor
099fd65821 Fix coverage test failure: e2e TestIssue6361.
Tested with:
```
(cd tests && COVERDIR='../../c' 'env' 'go' 'test' '-tags=cov' '-timeout' '30m' 'go.etcd.io/etcd/tests/v3/e2e' -run TestIssue6361 -v 2>&1 | tee log.log)
```
2021-05-16 10:58:41 +02:00
Piotr Tabor
c7a76470d5 Fix path to the coverage folder for e2e tests. 2021-05-16 09:49:50 +02:00
Piotr Tabor
13ef6fc343 Fix coverage tests
2 problems:
  - spawnCmdWithLogger was not implemented (when built with the 'cov' tag)
  - the logic depended on relative paths. We change them to absolute paths
to be able to run in the test-specific temporary directories.
2021-05-16 09:49:50 +02:00
Piotr Tabor
c18010cf42 etcdproxy e2e tests should run in dedicated directories.
So far all proxies were sharing the same (current) directory,
leading to test flakes, e.g. due to certificates being overridden
in autoTLS mode.
2021-05-14 22:42:31 +02:00
Piotr Tabor
582d02e7f5 E2E tests should log commandlines used to spawn etcd or etcd proxy binaries. 2021-05-14 22:42:31 +02:00
Piotr Tabor
79e3d7bd3e Add e2e tests for --v2-deprecation flag. 2021-05-12 19:20:49 +02:00
Marek Siarkowicz
efc8505739 etcdserver: Implement running defrag if freeable space will exceed provided threshold 2021-05-11 14:00:29 +02:00
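The threshold rule in the commit above can be sketched as a small predicate: freeable space is the database file size minus the size actually in use, and defragmentation runs only when that reaches the threshold. `shouldDefrag` is a hypothetical helper, and treating a zero threshold as "disabled" is an assumption of this sketch.

```go
package main

// shouldDefrag (hypothetical) reports whether defragmentation is worth
// running: the space it could reclaim (dbSize - dbSizeInUse) must reach
// the configured threshold. A non-positive threshold disables the check
// in this sketch.
func shouldDefrag(dbSizeBytes, dbSizeInUseBytes, thresholdBytes int64) bool {
	if thresholdBytes <= 0 {
		return false
	}
	freeable := dbSizeBytes - dbSizeInUseBytes
	return freeable >= thresholdBytes
}
```

For example, a 100 MiB file with 40 MiB in use has 60 MiB freeable, so a 50 MiB threshold triggers a defrag, while 80 MiB in use (20 MiB freeable) does not.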
Piotr Tabor
f53b70facb Embed: In case KVStoreHash verification fails, close the backend.
In case of failed verification, the server used to keep the backend open
(so the file remained locked at the OS level).
2021-04-29 11:51:25 +02:00
Piotr Tabor
2ad893b110 Integrate verification into e2e tests. 2021-04-29 11:51:24 +02:00
Piotr Tabor
4725567d5e e2e tests: More logging and expect adapted to 3.4. 2021-04-27 17:17:31 +02:00
Piotr Tabor
d4a8093ea5 Switch release-test (upgrade test) to use etcd 3.4 (instead of 3.3) as upgrade-base. 2021-04-08 01:15:16 +02:00
Piotr Tabor
931af493cf Merge pull request #12830 from ptabor/20210405-split-pkg
Split client/pkg as dedicated low-dependencies module for client
2021-04-08 01:12:17 +02:00
Piotr Tabor
3bb7acc8cf Migrate dependencies pkg/foo -> client/pkg/foo 2021-04-07 00:38:47 +02:00
garenchan
c047ed593c etcdctl: lock return exit code of exec-command
Sometimes we expect to get the exit code of the command being
executed.
2021-04-06 14:34:31 +08:00
Piotr Tabor
03f55eeb2c Make NewTmpBackend use testing tmp location (so it gets cleaned up). 2021-03-26 13:54:55 +01:00
Piotr Tabor
fb1d48e98e Integration tests: Use BeforeTest(t) instead of defer AfterTest().
Thanks to this change, a single method BeforeTest(t) can handle
before-test logic as well as registration of cleanup code
(t.Cleanup(func)).
2021-03-09 18:19:51 +01:00
Ben Meier
3d44f5bf80
*: added client-{cert,key}-file parameters for supporting separate client and server certs when communicating between peers
In some environments, the CA is not able to sign certificates with both
'client auth' and 'server auth' extended usage parameters and so an operator
needs to be able to set a separate client certificate to use when making
requests which is different to the certificate used for accepting requests.
This applies to both proxy and etcd member mode and is available as both a CLI
flag and config file field for peer TLS.

Signed-off-by: Ben Meier <ben.meier@oracle.com>
2021-02-28 14:37:56 +00:00
Piotr Tabor
a7f340216d Reformat code according to 'gotip' rules.
In practice, this adds annotations in the new syntax:
```
+//go:build !linux
 // +build !linux
```

Fixes failing gotip PASSES='fmt' check:
https://travis-ci.com/github/etcd-io/etcd/jobs/486453806
2021-02-26 10:14:46 +01:00
Gyuho Lee
3d7aac948b
Merge pull request #12196 from ironcladlou/metrics-watch-error-fix
etcdserver: fix incorrect metrics generated when clients cancel watches
2021-02-19 12:46:49 -08:00
Piotr Tabor
f0ecad00e3 Use temp-directory that is covered by framework level cleanup
Prior to this PR, the e2e tests were creating dirs like:
```
/tmp/testname1.etcd030299846
/tmp/testname0.etcd039445123
/tmp/testname0.etcd206372065
```
and not cleaning them up, which led to disk-space-exceeded flakes.

After the PR, the testing.TB tempdir mechanism is used, so the directories are
cleaned up and their names are more meaningful:

```
../../bin/etcd --name test-TestCtlV3EndpointHashKV-2 --listen-client-urls http://localhost:20010 --advertise-client-urls http://localhost:20010 --listen-peer-urls https://localhost:20011 --initial-advertise-peer-urls https://localhost:20011 --initial-cluster-token new --data-dir /tmp/TestCtlV3EndpointHashKV429176179/003 --snapshot-count 100000 --experimental-initial-corrupt-check --peer-auto-tls --initial-cluster test-TestCtlV3EndpointHashKV-0=https://localhost:20001,test-TestCtlV3EndpointHashKV-1=https://localhost:20006,test-TestCtlV3EndpointHashKV-2=https://localhost:20011
```
2021-01-30 13:25:55 +01:00
Piotr Tabor
70b5ef1d3a Fix tests flakiness: in particular TestIssue6361.
The root cause of the flakes was that the server was considered ready too
early.
In particular:
```
../../bin/etcd-2456648: {"level":"info","ts":"2021-01-11T09:56:44.474+0100","caller":"rafthttp/stream.go:274","msg":"established TCP streaming connection with remote peer","stream-writer-type":"stream Message","local-member-id":"ed5f620d34a8e61b","remote-peer-id":"ca50e9357181d758"}
../../bin/etcd-2456648: {"level":"warn","ts":"2021-01-11T09:56:49.040+0100","caller":"etcdserver/server.go:1942","msg":"failed to publish local member to cluster through raft","local-member-id":"ed5f620d34a8e61b","local-member-attributes":"{Name:infra2 ClientURLs:[http://localhost:20030]}","request-path":"/0/members/ed5f620d34a8e61b/attributes","publish-timeout":"7s","error":"etcdserver: request timed out, possibly due to connection lost"}
../../bin/etcd-2456648: {"level":"info","ts":"2021-01-11T09:56:49.049+0100","caller":"etcdserver/server.go:1921","msg":"published local member to cluster through raft","local-member-id":"ed5f620d34a8e61b","local-member-attributes":"{Name:infra2 ClientURLs:[http://localhost:20030]}","request-path":"/0/members/ed5f620d34a8e61b/attributes","cluster-id":"34f27e83b3bc2ff","publish-timeout":"7s"}
```
Publishing the local member to the cluster (shown above) was taking 5s. If this
happened concurrently with etcdctl, etcdctl could time out.

The fix requires servers to report 'ready to serve client requests' before considering them up.

Also fixed some whitelisted 'goroutines'.
2021-01-12 00:14:51 +01:00
Piotr Tabor
74274f4417 e2e: Adding better diagnostic and location for temporary files to Snapshot tests. 2021-01-12 00:14:51 +01:00
Piotr Tabor
26f9b4be8f e2e tests were leaking 'defunct' etcdctl processes.
The commit ensures that spawned etcdctl processes are "closed",
so that a proper OS-level wait is performed on them.
This might have contributed to the file-descriptor/open-files limit being
exceeded.
2021-01-11 11:55:30 +01:00
Sahdev Zala
a1ff0d5373
Merge pull request #12328 from viviyww/tests-e2e-panic-case
tests: fix test case panic error
2020-12-24 14:54:33 -05:00
Gyuho Lee
b5cefb5b3d
Merge pull request #12392 from ironcladlou/fixture-mutations
tests: prevent cross-test contamination via shared state
2020-11-19 10:05:42 -08:00
Dan Mace
9571325fe8 etcdserver: fix incorrect metrics generated when clients cancel watches
Before this patch, a client which cancels the context for a watch results in the
server generating a `rpctypes.ErrGRPCNoLeader` error that leads to the recording of
a gRPC `Unavailable` metric in association with the client watch cancellation.
The metric looks like this:

    grpc_server_handled_total{grpc_code="Unavailable",grpc_method="Watch",grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}

So, the watch server has misidentified the error as a server error and then
propagates the mistake to metrics, leading to a false indicator that the leader
has been lost. This false signal then leads to false alerting.

The commit 9c103dd0dedfc723cd4f33b6a5e81343d8a6bae7 introduced an interceptor which wraps
watch streams requiring a leader, causing those streams to be actively canceled
when leader loss is detected.

However, the error handling code assumes all stream context cancellations are
from the interceptor. This assumption is broken when the context was canceled
because of a client stream cancelation.

The core challenge is lack of information conveyed via `context.Context` which
is shared by both the send and receive sides of the stream handling and is
subject to cancellation by all paths (including the gRPC library itself). If any
piece of the system cancels the shared context, there's no way for a context
consumer to understand who cancelled the context or why.

To solve the ambiguity of the stream interceptor code specifically, this patch
introduces a custom context struct which the interceptor uses to expose a custom
error through the context when the interceptor decides to actively cancel a
stream. Now the consuming side can more safely assume a generic context
cancellation can be propagated as a cancellation, and the server generated
leader error is preserved and propagated normally without any special inference.

When a client cancels the stream, there remains a race in the error handling
code between the send and receive goroutines whereby the underlying gRPC error
is lost in the case where the send path returns and is handled first, but this
issue can be handled separately since, no matter which path wins, we can detect a
generic cancellation.

This is a replacement of https://github.com/etcd-io/etcd/pull/11375.

Fixes #10289, #9725, #9576, #9166
2020-11-18 17:02:09 -05:00
Piotr Tabor
aaf423e962 server: Update imports.
```
find -name '*.go' | xargs sed -i --follow-symlinks 's|etcd/v3/|etcd/server/v3/|g'
```
2020-10-26 13:02:32 +01:00
Gyuho Lee
bc3a77d298
Merge pull request #12099 from YoyinZyc/downgrade-httphandler
[Etcd downgrade] Add http handler to enable downgrade info communication between each member
2020-10-26 04:42:24 -07:00
Piotr Tabor
09679d29ad etcdctl: Rename of imports after making etcdctl a module.
```
find -name '*.go' | xargs sed -i --follow-symlinks 's|etcd/v3/etcdctl|etcd/etcdctl/v3|g'
```
2020-10-21 11:15:35 +02:00