% (cd tests && go vet ./...)
stderr: # go.etcd.io/etcd/tests/v3/integration/clientv3/concurrency_test
stderr: integration/clientv3/concurrency/election_test.go:74:6: call to (*T).Fatal from a non-test goroutine
stderr: integration/clientv3/concurrency/mutex_test.go:57:4: call to (*T).Fatal from a non-test goroutine
Prior to this PR, the e2e tests where creating dirs like:
```
/tmp/testname1.etcd030299846
/tmp/testname0.etcd039445123
/tmp/testname0.etcd206372065
```
and not cleaning them, that led to disk-space-exceeded flakes.
After the PR, the testing.TB tempdir mechanism is used and the names are
being cleaned and are more miningful:
```
../../bin/etcd --name test-TestCtlV3EndpointHashKV-2 --listen-client-urls http://localhost:20010 --advertise-client-urls http://localhost:20010 --listen-peer-urls https://localhost:20011 --initial-advertise-peer-urls https://localhost:20011 --initial-cluster-token new --data-dir /tmp/TestCtlV3EndpointHashKV429176179/003 --snapshot-count 100000 --experimental-initial-corrupt-check --peer-auto-tls --initial-cluster test-TestCtlV3EndpointHashKV-0=https://localhost:20001,test-TestCtlV3EndpointHashKV-1=https://localhost:20006,test-TestCtlV3EndpointHashKV-2=https://localhost:20011
```
This is not yet implementation, just API and tests to be filled
with implementation in next CLs,
tracked by: https://github.com/etcd-io/etcd/issues/12652
We propose here 3 packages:
- clientv3/naming/endpoints ->
That is abstraction layer over etcd that allows to write, read &
watch Endpoints information. It's independent from GRPC API. It hides
the storage details.
- clientv3/naming/endpoints/internal ->
That contains the grpc's compatible Update class to preserve the
internal JSON mashalling format.
- clientv3/naming/resolver ->
That implements the GRPC resolver API, such that etcd can be
used for connection.Dial in grpc.
Please see the grpc_naming.md document changes & grpcproxy/cluster.go
new integration, to see how the new abstractions work.
They used to take >10min with coverage, so were causing interrupted
Travis runs.
Know thay fit in 100-150s (together), thanks also to parallel
execution.
- making sure the DRY_RUN mode can finish e2e, so e.g. commits to
local copy of repository are OK in dry-run (while git pushes are NOT).
- better interaction with ./test_lib.sh script.
- more consistent logging
- bringing back s390x architecture that on go 1.14.3 seems to work as
expected.
Changes:
- signing tags.
- allows to override BRANCH and REPOSITORY using env variables.
Tested by a release in my private fork:
BRANCH="20201126-ptabor-release" REPOSITORY="git@github.com:ptabor/etcd.git" ./scripts/release 3.5.0-alpha.20
The root reason of flakes, was that server was considered as ready to
early.
In particular:
```
../../bin/etcd-2456648: {"level":"info","ts":"2021-01-11T09:56:44.474+0100","caller":"rafthttp/stream.go:274","msg":"established TCP streaming connection with remote peer","stream-writer-type":"stream Message","local-member-id":"ed5f620d34a8e61b","remote-peer-id":"ca50e9357181d758"}
../../bin/etcd-2456648: {"level":"warn","ts":"2021-01-11T09:56:49.040+0100","caller":"etcdserver/server.go:1942","msg":"failed to publish local member to cluster through raft","local-member-id":"ed5f620d34a8e61b","local-member-attributes":"{Name:infra2 ClientURLs:[http://localhost:20030]}","request-path":"/0/members/ed5f620d34a8e61b/attributes","publish-timeout":"7s","error":"etcdserver: request timed out, possibly due to connection lost"}
../../bin/etcd-2456648: {"level":"info","ts":"2021-01-11T09:56:49.049+0100","caller":"etcdserver/server.go:1921","msg":"published local member to cluster through raft","local-member-id":"ed5f620d34a8e61b","local-member-attributes":"{Name:infra2 ClientURLs:[http://localhost:20030]}","request-path":"/0/members/ed5f620d34a8e61b/attributes","cluster-id":"34f27e83b3bc2ff","publish-timeout":"7s"}
```
was taking 5s. If this was happening concurrently with etcdctl, the
etcdctl could timeout.
The fix, requires servers to report 'ready to serve client requests' to consider them up.
Fixed also some whitelisted 'goroutines'.
The commit ensures that spawned etcdctl processes are "closed",
so they perform proper os wait processing.
This might have contributed to file-descriptor/open-files limit being
exceeded.
Before:
```
{"level":"info","ts":1610273495.3791487,"caller":"agent/handler.go:668","msg":"cleaning up page cache"}
{"level":"info","ts":1610273495.3793094,"caller":"agent/handler.go:94","msg":"created etcd log file","path":"/tmp/etcd-functional-2/etcd.log"}
{"level":"info","ts":1610273495.379328,"caller":"agent/handler.go:668","msg":"cleaning up page cache"}
[sudo] password for ptab:
pam_glogin: invalid password
Sorry, try again.
[sudo] password for ptab:
```
Now the caches are dropped if the current users is in sudoers, bot not
in the other cases.
To be honest I don't see the purpose for dropping the caches at all in
the test.
dgrijalva/jwt-go has been abandoned and contains several serious
security issues. Most projects are now switching to the form3tech fork.
See https://snyk.io/vuln/SNYK-GOLANG-GITHUBCOMDGRIJALVAJWTGO-596515 for
info on the issues.
Signed-off-by: Dan Lorenc <dlorenc@google.com>
In these unit tests, goroutines may leak if certain branches are chosen. This commit edits channel operations and buffer sizes, so no matter what branch is chosen, the test will end correctly. This commit doesn't change the semantics of unit tests.
Before this patch, a client which cancels the context for a watch results in the
server generating a `rpctypes.ErrGRPCNoLeader` error that leads the recording of
a gRPC `Unavailable` metric in association with the client watch cancellation.
The metric looks like this:
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="Watch",grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}
So, the watch server has misidentified the error as a server error and then
propagates the mistake to metrics, leading to a false indicator that the leader
has been lost. This false signal then leads to false alerting.
The commit 9c103dd0dedfc723cd4f33b6a5e81343d8a6bae7 introduced an interceptor which wraps
watch streams requiring a leader, causing those streams to be actively canceled
when leader loss is detected.
However, the error handling code assumes all stream context cancellations are
from the interceptor. This assumption is broken when the context was canceled
because of a client stream cancelation.
The core challenge is lack of information conveyed via `context.Context` which
is shared by both the send and receive sides of the stream handling and is
subject to cancellation by all paths (including the gRPC library itself). If any
piece of the system cancels the shared context, there's no way for a context
consumer to understand who cancelled the context or why.
To solve the ambiguity of the stream interceptor code specifically, this patch
introduces a custom context struct which the interceptor uses to expose a custom
error through the context when the interceptor decides to actively cancel a
stream. Now the consuming side can more safely assume a generic context
cancellation can be propagated as a cancellation, and the server generated
leader error is preserved and propagated normally without any special inference.
When a client cancels the stream, there remains a race in the error handling
code between the send and receive goroutines whereby the underlying gRPC error
is lost in the case where the send path returns and is handled first, but this
issue can be taken separately as no matter which paths wins, we can detect a
generic cancellation.
This is a replacement of https://github.com/etcd-io/etcd/pull/11375.
Fixes#10289, #9725, #9576, #9166
instead of v3.0.0-000101010000000-00000000000,
that might be misleading as we don't develop etcd v3.0.0 any longer.
This version is a virtual version and is not supposed to be tagged
within the repository. We should tag real versions like: 3.5.0-alpha.0.
Please notice that go.etcd.io/etcd/client/v2 will be versioned as `v2.305.0-pre`.
The reason is that client v2 must have v2 version. I propose a
convention to envode the major version as 100x in minor version to make
the association to the underlying repository clear, staying within v2
version family.
The change was generated using:
```
DRY_RUN=false TARGET_VERSION="v3.5.0-pre" ./scripts/release_mod.sh update_versions
```
We make v2 client code a module go.etcd.io/etcd/client/v2.
Pretty mechanical change that can be summarized as:
mkdir client/v2
cd client/v2 && git mod init go.etcd.io/etcd/client/v2
git mv client/*.go client/v2/
find -name '*.go' | xargs sed -i --follow-symlinks 's|/v3/client["]|/client/v2\"|g'
+ fixing changelog, bom, go.mod etc.
The e2e tests can be flaky due to various tests mutating shared mutable
fixtures, causing non-deterministic behavior depending on the test set, order,
etc.
For example, `configTLS` is mutated in at least two tests in such a way that the
config is potentially invalidated for any subsequent test running in the same
process (e.g. by setting the `enableV2` field). This particular example caused
a substantial amount of confusion diagnosing the new test introduced for
https://github.com/etcd-io/etcd/pull/12370.
Independent tests should not share mutable state unless deliberately. This patch
refactors the e2e test config fixtures to safeguard against these problems by
replacing the package variables (which cannot easily be made immutable) with
functions that return new instances.
This brings consistency between proto-generation code and actual versions of libraries being used in runtime:
github.com/gogo/protobuf v1.2.1,v1.0.0 -> v1.3.1
github.com/golang/protobuf v1.3.2 -> v1.3.5
github.com/grpc-ecosystem/grpc-gateway v1.9.5,v1.4.1,v1.15.2 -> v1.14.6
google.golang.org/grpc v1.26.0 -> v1.29.1
Moved as far as possible, without bumping on grpc 1.30.0 "naming" decomissioning.
Please also notice that gogo/protobuf is likely to reach EOL: https://github.com/gogo/protobuf/issues/691
Fix test case in tests/e2e/etcd_config_test.go:311, here should
check if it is nil. As the following errors:
--- FAIL: TestGrpcproxyAndCommonName (0.00s)
etcd_config_test.go:304: Unexpected error: fork/exec ../../bin/etcd: no such file or directory
etcd_config_test.go:309: Unexpected error: fork/exec ../../bin/etcd: no such file or directory
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x56bd96]
Replace ./scripts/install_tool.sh with `gobin`, such that we have
consistent handling for all tools needed for build and consistent
versioning within ./tools/mod/go.mod.
Side changes:
- Expose /scripts/fix.sh that fixes formatting and bom across modules
- Expose *.sh variants of scripts like build and ./test (first step
towards replacement).
- Make stderr output of commands explicit and make commands use
different color than callouts.