Thanks to this, unix sockets should be not longer
created by integration tests in the the source code directory,
so potentially trigger IDE reloads and unnecessery load (and mess).
The root reason of flakes, was that server was considered as ready to
early.
In particular:
```
../../bin/etcd-2456648: {"level":"info","ts":"2021-01-11T09:56:44.474+0100","caller":"rafthttp/stream.go:274","msg":"established TCP streaming connection with remote peer","stream-writer-type":"stream Message","local-member-id":"ed5f620d34a8e61b","remote-peer-id":"ca50e9357181d758"}
../../bin/etcd-2456648: {"level":"warn","ts":"2021-01-11T09:56:49.040+0100","caller":"etcdserver/server.go:1942","msg":"failed to publish local member to cluster through raft","local-member-id":"ed5f620d34a8e61b","local-member-attributes":"{Name:infra2 ClientURLs:[http://localhost:20030]}","request-path":"/0/members/ed5f620d34a8e61b/attributes","publish-timeout":"7s","error":"etcdserver: request timed out, possibly due to connection lost"}
../../bin/etcd-2456648: {"level":"info","ts":"2021-01-11T09:56:49.049+0100","caller":"etcdserver/server.go:1921","msg":"published local member to cluster through raft","local-member-id":"ed5f620d34a8e61b","local-member-attributes":"{Name:infra2 ClientURLs:[http://localhost:20030]}","request-path":"/0/members/ed5f620d34a8e61b/attributes","cluster-id":"34f27e83b3bc2ff","publish-timeout":"7s"}
```
was taking 5s. If this was happening concurrently with etcdctl, the
etcdctl could timeout.
The fix, requires servers to report 'ready to serve client requests' to consider them up.
Fixed also some whitelisted 'goroutines'.
We introduce a LazyCluster abstraction (instead of copy-pasted logic)
that makes clusters to be created only if there are runnable tests
in need for the infrastructure.
This CL tries to connect 2 objectives:
- Examples should be close (the same package) to the original code,
such that they can participate in documentation.
- Examples should be runnable - such that they are not getting out of
sync with underlying API/implementation.
In case of etcd-client, the examples are assuming running 'integration'
style, i.e. thay do connect to fully functional etcd-server.
That would lead to a cyclic dependencies between modules:
- server depends on client (as client need to be lightweight)
- client (for test purposes) depend on server.
Go modules does not allow to distingush testing dependency from
prod-code dependency.
Thus to meet the objective:
- The examples are getting executed within testing/integration packages against real etcd
- The examples are symlinked to 'unit' tests, such that they included in documentation.
- Long-term the unit examples should get rewritten to use 'mocks' instead of real integration tests.
- We were leaking goroutines in auth-test
- The go-routines were depending / modifying global test environment
variables (simpleTokenTTLDefault) leading to races
Removed the leaked go-routines, and expanded 'auth' package to
be covered we leaked go-routines detection.
Examplar flake: https://travis-ci.com/github/etcd-io/etcd/jobs/388806782
```
go test -timeout=5m -cpu=1 --run=Example ./client/...
ok go.etcd.io/etcd/v3/client 0.085s
testing: warning: no tests to run
PASS
Unexpected goroutines running after all test(s).
1 instances of:
text/template/parse.(*lexer).emit(...)
/usr/local/go/src/text/template/parse/lex.go:157
text/template/parse.lexText(...)
/usr/local/go/src/text/template/parse/lex.go:269 +0x4f0
text/template/parse.(*lexer).run(...)
/usr/local/go/src/text/template/parse/lex.go:230 +0x37
created by text/template/parse.lex
/usr/local/go/src/text/template/parse/lex.go:223 +0x190
FAIL go.etcd.io/etcd/v3/client/integration 0.013s
```
The flake happened e.g. in:
https://travis-ci.com/github/etcd-io/etcd/jobs/386607570
```
--- PASS: TestWatchClose (0.37s)
PASS
Unexpected goroutines running after all test(s).
1 instances of:
testing.runTests.func1.1(...)
/usr/local/go/src/testing/testing.go:1289 +0x60
created by testing.runTests.func1
/usr/local/go/src/testing/testing.go:1289 +0xdb
FAIL go.etcd.io/etcd/v3/clientv3/integration 344.389s
FAIL
```
This is implementation detail of Go testing.lib and we should not worry.
The source of problem was the fact that multiple tests were creating
their clusters (and some of them were setting global grpclog).
If the test was running after some other test that created HttpServer
(so accessed grpclog), this was reported as race.
Tested with:
go test ./clientv3/. -v "--run=(Example).*" --count=2
go test ./clientv3/. -v "--run=(Test).*" --count=2
go test ./integration/embed/. -v "--run=(Test).*" --count=2
This change makes the etcd package compatible with the existing Go
ecosystem for module versioning.
Used this tool to update package imports:
https://github.com/KSubedi/gomove
Leak detector is catching goroutines trying to close files which appear
runtime related:
1 instances of:
syscall.Syscall(...)
/usr/local/golang/1.8.3/go/src/syscall/asm_linux_386.s:20 +0x5
syscall.Close(...)
/usr/local/golang/1.8.3/go/src/syscall/zsyscall_linux_386.go:296 +0x3d
os.(*file).close(...)
/usr/local/golang/1.8.3/go/src/os/file_unix.go:140 +0x62
It's unlikely a user goroutine will leak on file close; whitelist it.
Calling a WaitGroup.Done() in a defer will sometimes trigger the leak
detector since the WaitGroup.Wait() will unblock before the defer
block completes. If the leak detector runs before the Done() is
rescheduled, it will spuriously report the finishing Done() as a leak.
This happens enough in CI to be irritating; whitelist it and ignore.