They used to take >10min with coverage, so were causing interrupted
Travis runs.
Know thay fit in 100-150s (together), thanks also to parallel
execution.
The root reason of flakes, was that server was considered as ready to
early.
In particular:
```
../../bin/etcd-2456648: {"level":"info","ts":"2021-01-11T09:56:44.474+0100","caller":"rafthttp/stream.go:274","msg":"established TCP streaming connection with remote peer","stream-writer-type":"stream Message","local-member-id":"ed5f620d34a8e61b","remote-peer-id":"ca50e9357181d758"}
../../bin/etcd-2456648: {"level":"warn","ts":"2021-01-11T09:56:49.040+0100","caller":"etcdserver/server.go:1942","msg":"failed to publish local member to cluster through raft","local-member-id":"ed5f620d34a8e61b","local-member-attributes":"{Name:infra2 ClientURLs:[http://localhost:20030]}","request-path":"/0/members/ed5f620d34a8e61b/attributes","publish-timeout":"7s","error":"etcdserver: request timed out, possibly due to connection lost"}
../../bin/etcd-2456648: {"level":"info","ts":"2021-01-11T09:56:49.049+0100","caller":"etcdserver/server.go:1921","msg":"published local member to cluster through raft","local-member-id":"ed5f620d34a8e61b","local-member-attributes":"{Name:infra2 ClientURLs:[http://localhost:20030]}","request-path":"/0/members/ed5f620d34a8e61b/attributes","cluster-id":"34f27e83b3bc2ff","publish-timeout":"7s"}
```
was taking 5s. If this was happening concurrently with etcdctl, the
etcdctl could timeout.
The fix, requires servers to report 'ready to serve client requests' to consider them up.
Fixed also some whitelisted 'goroutines'.
The commit ensures that spawned etcdctl processes are "closed",
so they perform proper os wait processing.
This might have contributed to file-descriptor/open-files limit being
exceeded.
In these unit tests, goroutines may leak if certain branches are chosen. This commit edits channel operations and buffer sizes, so no matter what branch is chosen, the test will end correctly. This commit doesn't change the semantics of unit tests.
Use golang.org/x/sys/unix for F_OFD_* constants.
This fixes the issue that F_OFD_GETLK was defined incorrectly,
resulting in bugs such as https://github.com/moby/moby/issues/31182
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This brings consistency between proto-generation code and actual versions of libraries being used in runtime:
github.com/gogo/protobuf v1.2.1,v1.0.0 -> v1.3.1
github.com/golang/protobuf v1.3.2 -> v1.3.5
github.com/grpc-ecosystem/grpc-gateway v1.9.5,v1.4.1,v1.15.2 -> v1.14.6
google.golang.org/grpc v1.26.0 -> v1.29.1
Moved as far as possible, without bumping on grpc 1.30.0 "naming" decomissioning.
Please also notice that gogo/protobuf is likely to reach EOL: https://github.com/gogo/protobuf/issues/691
This package depends on raft and is solelly used by etcdserver/raft.go.
So it does not fullfills conditions of:
```
pkg/ is a collection of utility packages used by etcd without being specific to etcd itself. A package belongs here
only if it could possibly be moved out into its own repository in the future.
```
Mechanical execution of:
```
git mv ./pkg/mock/mockserver ./client/pkg/mock
git mv ./pkg/mock/{mockstorage,mockstore,mockwait} ./server/pkg/mock
```
The packages depend on etcd API / protos - so they are NOT etcd-dependencies.
In such situation thay should be placed in 'pkg' folder.
We introduce a LazyCluster abstraction (instead of copy-pasted logic)
that makes clusters to be created only if there are runnable tests
in need for the infrastructure.
This CL tries to connect 2 objectives:
- Examples should be close (the same package) to the original code,
such that they can participate in documentation.
- Examples should be runnable - such that they are not getting out of
sync with underlying API/implementation.
In case of etcd-client, the examples are assuming running 'integration'
style, i.e. thay do connect to fully functional etcd-server.
That would lead to a cyclic dependencies between modules:
- server depends on client (as client need to be lightweight)
- client (for test purposes) depend on server.
Go modules does not allow to distingush testing dependency from
prod-code dependency.
Thus to meet the objective:
- The examples are getting executed within testing/integration packages against real etcd
- The examples are symlinked to 'unit' tests, such that they included in documentation.
- Long-term the unit examples should get rewritten to use 'mocks' instead of real integration tests.
- We were leaking goroutines in auth-test
- The go-routines were depending / modifying global test environment
variables (simpleTokenTTLDefault) leading to races
Removed the leaked go-routines, and expanded 'auth' package to
be covered we leaked go-routines detection.
To improve debuggability of `agreement among raft nodes before
linearized reading`, we added some tracing inside
`linearizableReadLoop`.
This will allow us to know the timing of `s.r.ReadIndex` vs
`s.applyWait.Wait(rs.Index)`.
Examplar flake: https://travis-ci.com/github/etcd-io/etcd/jobs/388806782
```
go test -timeout=5m -cpu=1 --run=Example ./client/...
ok go.etcd.io/etcd/v3/client 0.085s
testing: warning: no tests to run
PASS
Unexpected goroutines running after all test(s).
1 instances of:
text/template/parse.(*lexer).emit(...)
/usr/local/go/src/text/template/parse/lex.go:157
text/template/parse.lexText(...)
/usr/local/go/src/text/template/parse/lex.go:269 +0x4f0
text/template/parse.(*lexer).run(...)
/usr/local/go/src/text/template/parse/lex.go:230 +0x37
created by text/template/parse.lex
/usr/local/go/src/text/template/parse/lex.go:223 +0x190
FAIL go.etcd.io/etcd/v3/client/integration 0.013s
```
SubTraceStart and SubTraceEnd steps are only placeholders, not really
steps, we should skip them when logging the long duration steps,
otherwise these steps will lead to incorrect start time and duration
of subsequent steps.
Direct syscalls using syscall.Syscall(SYS_*, ...) should no longer be
used on darwin, see [1]. Instead, use the fcntl libSystem wrappers
provided by the golang.org/x/sys/unix package which implement the same
functionality.
[1] https://golang.org/doc/go1.12#darwin
The flake happened e.g. in:
https://travis-ci.com/github/etcd-io/etcd/jobs/386607570
```
--- PASS: TestWatchClose (0.37s)
PASS
Unexpected goroutines running after all test(s).
1 instances of:
testing.runTests.func1.1(...)
/usr/local/go/src/testing/testing.go:1289 +0x60
created by testing.runTests.func1
/usr/local/go/src/testing/testing.go:1289 +0xdb
FAIL go.etcd.io/etcd/v3/clientv3/integration 344.389s
FAIL
```
This is implementation detail of Go testing.lib and we should not worry.
Marked all 'integrational, e2e' as skipped in the --short mode.
Thanks to this we will be able to significantly simplify ./test script.
The run currently takes ~23s.
With (follow up) move of ~clientv3/snapshot to integration tests (as
part of modularization), we can expect this to fall to 5-10s.
```
% time go test --short ./... --count=1
ok go.etcd.io/etcd/v3 0.098s
? go.etcd.io/etcd/v3/Documentation/learning/lock/client [no test files]
? go.etcd.io/etcd/v3/Documentation/learning/lock/storage [no test files]
ok go.etcd.io/etcd/v3/auth 0.724s
? go.etcd.io/etcd/v3/auth/authpb [no test files]
ok go.etcd.io/etcd/v3/client 0.166s
ok go.etcd.io/etcd/v3/client/integration 0.166s
ok go.etcd.io/etcd/v3/clientv3 3.219s
ok go.etcd.io/etcd/v3/clientv3/balancer 1.102s
? go.etcd.io/etcd/v3/clientv3/balancer/connectivity [no test files]
? go.etcd.io/etcd/v3/clientv3/balancer/picker [no test files]
? go.etcd.io/etcd/v3/clientv3/balancer/resolver/endpoint [no test files]
ok go.etcd.io/etcd/v3/clientv3/clientv3util 0.096s [no tests to run]
ok go.etcd.io/etcd/v3/clientv3/concurrency 3.323s
? go.etcd.io/etcd/v3/clientv3/credentials [no test files]
ok go.etcd.io/etcd/v3/clientv3/integration 0.131s
? go.etcd.io/etcd/v3/clientv3/leasing [no test files]
? go.etcd.io/etcd/v3/clientv3/mirror [no test files]
ok go.etcd.io/etcd/v3/clientv3/namespace 0.041s
ok go.etcd.io/etcd/v3/clientv3/naming 0.115s
ok go.etcd.io/etcd/v3/clientv3/ordering 0.121s
ok go.etcd.io/etcd/v3/clientv3/snapshot 19.325s
ok go.etcd.io/etcd/v3/clientv3/yaml 0.090s
ok go.etcd.io/etcd/v3/contrib/raftexample 7.572s
? go.etcd.io/etcd/v3/contrib/recipes [no test files]
ok go.etcd.io/etcd/v3/embed 0.282s
ok go.etcd.io/etcd/v3/etcdctl 0.054s
? go.etcd.io/etcd/v3/etcdctl/ctlv2 [no test files]
ok go.etcd.io/etcd/v3/etcdctl/ctlv2/command 0.117s
? go.etcd.io/etcd/v3/etcdctl/ctlv3 [no test files]
ok go.etcd.io/etcd/v3/etcdctl/ctlv3/command 0.070s
ok go.etcd.io/etcd/v3/etcdmain 0.172s
ok go.etcd.io/etcd/v3/etcdserver 1.698s
? go.etcd.io/etcd/v3/etcdserver/api [no test files]
ok go.etcd.io/etcd/v3/etcdserver/api/etcdhttp 0.075s
ok go.etcd.io/etcd/v3/etcdserver/api/membership 0.104s
? go.etcd.io/etcd/v3/etcdserver/api/membership/membershippb [no test files]
ok go.etcd.io/etcd/v3/etcdserver/api/rafthttp 0.181s
ok go.etcd.io/etcd/v3/etcdserver/api/snap 0.078s
? go.etcd.io/etcd/v3/etcdserver/api/snap/snappb [no test files]
ok go.etcd.io/etcd/v3/etcdserver/api/v2auth 0.142s
ok go.etcd.io/etcd/v3/etcdserver/api/v2discovery 0.035s
ok go.etcd.io/etcd/v3/etcdserver/api/v2error 0.043s
ok go.etcd.io/etcd/v3/etcdserver/api/v2http 0.070s
ok go.etcd.io/etcd/v3/etcdserver/api/v2http/httptypes 0.031s
? go.etcd.io/etcd/v3/etcdserver/api/v2stats [no test files]
ok go.etcd.io/etcd/v3/etcdserver/api/v2store 0.645s
ok go.etcd.io/etcd/v3/etcdserver/api/v2v3 0.218s
? go.etcd.io/etcd/v3/etcdserver/api/v3alarm [no test files]
? go.etcd.io/etcd/v3/etcdserver/api/v3client [no test files]
ok go.etcd.io/etcd/v3/etcdserver/api/v3compactor 1.765s
? go.etcd.io/etcd/v3/etcdserver/api/v3election [no test files]
? go.etcd.io/etcd/v3/etcdserver/api/v3election/v3electionpb [no test files]
? go.etcd.io/etcd/v3/etcdserver/api/v3election/v3electionpb/gw [no test files]
? go.etcd.io/etcd/v3/etcdserver/api/v3lock [no test files]
? go.etcd.io/etcd/v3/etcdserver/api/v3lock/v3lockpb [no test files]
? go.etcd.io/etcd/v3/etcdserver/api/v3lock/v3lockpb/gw [no test files]
ok go.etcd.io/etcd/v3/etcdserver/api/v3rpc 0.091s
ok go.etcd.io/etcd/v3/etcdserver/api/v3rpc/rpctypes 0.012s
ok go.etcd.io/etcd/v3/etcdserver/cindex 0.054s
ok go.etcd.io/etcd/v3/etcdserver/etcdserverpb 0.039s
? go.etcd.io/etcd/v3/etcdserver/etcdserverpb/gw [no test files]
ok go.etcd.io/etcd/v3/functional/agent 0.094s
? go.etcd.io/etcd/v3/functional/cmd/etcd-agent [no test files]
? go.etcd.io/etcd/v3/functional/cmd/etcd-proxy [no test files]
? go.etcd.io/etcd/v3/functional/cmd/etcd-runner [no test files]
? go.etcd.io/etcd/v3/functional/cmd/etcd-tester [no test files]
ok go.etcd.io/etcd/v3/functional/rpcpb 0.060s
? go.etcd.io/etcd/v3/functional/runner [no test files]
ok go.etcd.io/etcd/v3/functional/tester 0.079s
ok go.etcd.io/etcd/v3/integration 0.684s
ok go.etcd.io/etcd/v3/integration/embed 0.101s
ok go.etcd.io/etcd/v3/lease 3.455s
ok go.etcd.io/etcd/v3/lease/leasehttp 2.185s
? go.etcd.io/etcd/v3/lease/leasepb [no test files]
ok go.etcd.io/etcd/v3/mvcc 7.246s
ok go.etcd.io/etcd/v3/mvcc/backend 0.354s
? go.etcd.io/etcd/v3/mvcc/mvccpb [no test files]
ok go.etcd.io/etcd/v3/pkg/adt 0.025s
? go.etcd.io/etcd/v3/pkg/contention [no test files]
? go.etcd.io/etcd/v3/pkg/cpuutil [no test files]
ok go.etcd.io/etcd/v3/pkg/crc 0.008s
? go.etcd.io/etcd/v3/pkg/debugutil [no test files]
ok go.etcd.io/etcd/v3/pkg/expect 0.015s
ok go.etcd.io/etcd/v3/pkg/fileutil 0.268s
ok go.etcd.io/etcd/v3/pkg/flags 0.021s
ok go.etcd.io/etcd/v3/pkg/httputil 0.020s
ok go.etcd.io/etcd/v3/pkg/idutil 0.008s
ok go.etcd.io/etcd/v3/pkg/ioutil 0.025s
ok go.etcd.io/etcd/v3/pkg/logutil 0.047s
? go.etcd.io/etcd/v3/pkg/mock/mockserver [no test files]
? go.etcd.io/etcd/v3/pkg/mock/mockstorage [no test files]
? go.etcd.io/etcd/v3/pkg/mock/mockstore [no test files]
? go.etcd.io/etcd/v3/pkg/mock/mockwait [no test files]
ok go.etcd.io/etcd/v3/pkg/netutil 1.024s
ok go.etcd.io/etcd/v3/pkg/osutil 0.021s
ok go.etcd.io/etcd/v3/pkg/pathutil 0.008s
ok go.etcd.io/etcd/v3/pkg/pbutil 0.008s
ok go.etcd.io/etcd/v3/pkg/proxy 4.081s
ok go.etcd.io/etcd/v3/pkg/report 0.008s
? go.etcd.io/etcd/v3/pkg/runtime [no test files]
ok go.etcd.io/etcd/v3/pkg/schedule 0.009s
ok go.etcd.io/etcd/v3/pkg/srv 0.019s
ok go.etcd.io/etcd/v3/pkg/stringutil 0.008s
? go.etcd.io/etcd/v3/pkg/systemd [no test files]
ok go.etcd.io/etcd/v3/pkg/testutil 0.023s
ok go.etcd.io/etcd/v3/pkg/tlsutil 3.965s
ok go.etcd.io/etcd/v3/pkg/traceutil 0.034s
ok go.etcd.io/etcd/v3/pkg/transport 0.532s
ok go.etcd.io/etcd/v3/pkg/types 0.028s
ok go.etcd.io/etcd/v3/pkg/wait 0.023s
ok go.etcd.io/etcd/v3/proxy/grpcproxy 0.101s
? go.etcd.io/etcd/v3/proxy/grpcproxy/adapter [no test files]
? go.etcd.io/etcd/v3/proxy/grpcproxy/cache [no test files]
ok go.etcd.io/etcd/v3/proxy/httpproxy 0.044s
ok go.etcd.io/etcd/v3/proxy/tcpproxy 0.047s
ok go.etcd.io/etcd/v3/raft 0.312s
ok go.etcd.io/etcd/v3/raft/confchange 0.183s
ok go.etcd.io/etcd/v3/raft/quorum 0.316s
ok go.etcd.io/etcd/v3/raft/raftpb 0.024s
ok go.etcd.io/etcd/v3/raft/rafttest 0.640s
ok go.etcd.io/etcd/v3/raft/tracker 0.026s
ok go.etcd.io/etcd/v3/tests/e2e 0.077s
? go.etcd.io/etcd/v3/tools/benchmark [no test files]
? go.etcd.io/etcd/v3/tools/benchmark/cmd [no test files]
? go.etcd.io/etcd/v3/tools/etcd-dump-db [no test files]
ok go.etcd.io/etcd/v3/tools/etcd-dump-logs 0.088s
? go.etcd.io/etcd/v3/tools/etcd-dump-metrics [no test files]
? go.etcd.io/etcd/v3/tools/local-tester/bridge [no test files]
? go.etcd.io/etcd/v3/version [no test files]
ok go.etcd.io/etcd/v3/wal 1.517s
? go.etcd.io/etcd/v3/wal/walpb [no test files]
go test --short ./... --count=1 76.12s user 12.57s system 375% cpu 23.635 total
```
We have following communication schema:
client --- 1 ---> grpc-proxy --- 2 --- > etcd-server
There are 2 sets of flags/certs in grpc proxy [ https://github.com/etcd-io/etcd/blob/master/etcdmain/grpc_proxy.go#L140 ]:
A. (cert-file, key-file, trusted-ca-file, auto-tls) this are controlling [1] so client to proxy connection and in particular they are describing proxy public identity.
B. (cert,key, cacert ) - these are controlling [2] so what's the identity that proxy uses to make connections to the etcd-server.
If 2 (B.) contains certificate with CN and etcd-server is running with --client-cert-auth=true, the CN can be used as identity of 'client' from service perspective. This is permission escalation, that we should forbid.
If 1 (A.) contains certificate with CN - it should be considered perfectly valid. The server can (should) have full identity.
So only --cert flag (and not --cert-file flag) should be validated for empty CN.
The source of problem was the fact that multiple tests were creating
their clusters (and some of them were setting global grpclog).
If the test was running after some other test that created HttpServer
(so accessed grpclog), this was reported as race.
Tested with:
go test ./clientv3/. -v "--run=(Example).*" --count=2
go test ./clientv3/. -v "--run=(Test).*" --count=2
go test ./integration/embed/. -v "--run=(Test).*" --count=2
The fix is needed to mitigate consequences of
https://github.com/golang/go/issues/29458 "golang breaking change" that
causes following test failures on etcd end:
--- FAIL: TestCtlV2Set (0.00s)
ctl_v2_test.go:552: could not start etcd process cluster (fork/exec ../../bin/etcd: Setctty set but Ctty not valid in child)
--- FAIL: TestCtlV2SetQuorum (0.00s)
ctl_v2_test.go:552: could not start etcd process cluster (fork/exec ../../bin/etcd: Setctty set but Ctty not valid in child)
--- FAIL: TestCtlV2SetClientTLS (0.00s)
ctl_v2_test.go:552: could not start etcd process cluster (fork/exec ../../bin/etcd: Setctty set but Ctty not valid in child)
os.MkdirAll creates directory before umask so make sure that a desired
permission is set after creating a directory with MkdirAll. Use the
existing TouchDirAll function which checks for permission if dir is already
exist and when create a new dir.