This CL fixes:
COVERDIR=./coverage PASSES="build_cov" && go test --tags cov -v ./tests/e2e
and is part of the effort to make:
COVERDIR=coverage PASSES="build_cov cov" ./test
fully pass.
The args passed to the ./bin/etcd_test and ./bin/etcdctl_test binaries were
mismatched. The protocol of passing the arguments using
environment variables has been replaced with proper passing of flags.
How the measurement of coverage by e2e tests works:
1. COVERDIR=./coverage PASSES="build_cov" generates the
./bin/etcd_test and ./bin/etcdctl_test binaries.
2. These binaries are tests (as coverage can be computed only for
tests) [see ./main_test.go ./etcdctl/main_test.go], but these tests
run the main logic of the server and upon termination (or a SIGTERM
signal) write the proper .coverprofile files to the $COVERDIR folder.
The binaries used to take arguments via env variables, but that is no
longer needed. The binaries can consume any command line arguments
that either the test framework (so --test.fooo) or the original binary
can consume.
3. The tests/e2e (when compiled with --tags cov) start the
_test binaries instead of the original binaries, so that coverage
is collected.
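As an illustration of point 2, here is a minimal sketch of how a wrapper
could tell test flags apart from flags meant for the original binary.
splitArgs is a hypothetical helper written for this note, not the code in
./main_test.go, and it assumes test flags are given in the -test.name=value
or boolean form:

```go
package main

import (
	"fmt"
	"strings"
)

// splitArgs is a hypothetical helper (not taken from etcd). It separates
// Go test flags (-test.* or --test.*) from arguments meant for the wrapped
// binary. Assumption: test flags use the name=value or boolean form, so a
// value is never passed as a separate argument.
func splitArgs(args []string) (testFlags, appArgs []string) {
	for _, a := range args {
		if strings.HasPrefix(a, "-test.") || strings.HasPrefix(a, "--test.") {
			testFlags = append(testFlags, a)
		} else {
			appArgs = append(appArgs, a)
		}
	}
	return testFlags, appArgs
}

func main() {
	testFlags, appArgs := splitArgs([]string{
		"--test.coverprofile=etcd.coverprofile", "--name", "infra1", "-test.v",
	})
	fmt.Println("test flags:", testFlags)
	fmt.Println("app args:  ", appArgs)
}
```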
This is to aid with debugging the effectiveness of systems that
manually take care of cluster compaction, and to give greater visibility
into recent compactions.
It can be handy to alert on exactly how long it has been since the last
compaction (and also to put that on dashboards).
---
Tested using a test cluster, the final result looks like this:
```
root@etcd-1:~# ETCDCTL_API=3 /tmp/test-etcd/etcdctl --endpoints=192.168.232.10:2379 compact 1012
compacted revision 1012
root@etcd-1:~# curl -s 192.168.232.10:2379/metrics | grep last
# HELP etcd_debugging_mvcc_db_compaction_last The unix time since the last db compaction. Resets to 0 on start.
# TYPE etcd_debugging_mvcc_db_compaction_last gauge
etcd_debugging_mvcc_db_compaction_last 1.595873939e+09
```
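For alerting, the gauge value has to be turned into an age. A minimal
sketch, assuming the gauge holds the unix time (in seconds) of the last
compaction and stays at 0 until the first compaction after a start, as the
HELP text suggests. secondsSinceCompaction is a hypothetical helper, not
part of etcd:

```go
package main

import (
	"fmt"
	"time"
)

// secondsSinceCompaction is a hypothetical helper (not part of etcd).
// gauge is the scraped value of etcd_debugging_mvcc_db_compaction_last,
// nowUnix is the current unix time in seconds. The second return value is
// false when the gauge is 0, i.e. no compaction has been seen since start.
func secondsSinceCompaction(gauge, nowUnix float64) (float64, bool) {
	if gauge == 0 {
		return 0, false
	}
	return nowUnix - gauge, true
}

func main() {
	// Value taken from the curl output above.
	gauge := 1.595873939e+09
	now := float64(time.Now().Unix())
	if age, ok := secondsSinceCompaction(gauge, now); ok {
		fmt.Printf("last compaction was %.0f seconds ago\n", age)
	} else {
		fmt.Println("no compaction since start")
	}
}
```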
From etcd-dev discussion:
https://groups.google.com/u/2/g/etcd-dev/c/oMGSBqs_7sc
I have been working on this system called Asset Transparency[1] which
helps users verify they have received the correct contents from a URL.
You may be familiar with the "download a file, download a SHA256SUM
file, run `sha256sum -c`, etc" process. This tool helps to automate
that for users into something like this[2]:
$ tl get https://github.com/etcd-io/etcd/releases/download/v3.4.12/etcd-v3.4.12-darwin-amd64.zip
And a best practice for this Asset Transparency system is that URLs
are registered with the log as soon as possible. Why? Well, the sooner
a URL is entered the longer it can protect people consuming a URL from
unexpected content modification from say a GitHub credential
compromise.
To that end I have written a GitHub Action[3] that will automatically
do that on every release. It is easy to activate and should be hands
free after installation. So, before I enable it I want to see if there
are any concerns from maintainers. The only change to our repo will be
a new file in .github/workflows.
[1] https://www.transparencylog.com
[2] https://github.com/transparencylog/tl
[3] https://github.com/transparencylog/publish-releases-asset-transparency-action
Similar counts are exposed via Prometheus.
This adds the ones as perceived by the etcd server.
e.g.
os_fd_limit 120000
os_fd_used 14
process_cpu_seconds_total 0.31
process_max_fds 120000
process_open_fds 17
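A rough sketch of how a process can read its own file descriptor limit and
usage, assuming Linux (it relies on /proc/self/fd). fdLimit and fdUsed are
illustrative names, and this is not necessarily how the etcd server
computes os_fd_limit and os_fd_used:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// fdLimit returns the soft limit on open file descriptors for this process.
func fdLimit() (uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	return uint64(rl.Cur), nil
}

// fdUsed counts the entries in /proc/self/fd (Linux only). The count
// includes the descriptor opened to read the directory itself.
func fdUsed() (int, error) {
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

func main() {
	limit, err := fdLimit()
	if err != nil {
		fmt.Println("fd limit unavailable:", err)
		return
	}
	used, err := fdUsed()
	if err != nil {
		fmt.Println("fd usage unavailable:", err)
		return
	}
	fmt.Printf("fd limit %d, fd used %d\n", limit, used)
}
```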
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
The test is very flaky on Travis.
It seems that since (https://github.com/etcd-io/etcd/issues/7724) the
client is expected to simply ignore whether the server is in AuthDisabled
mode, even if the user supplies credentials.
The tests used to:
* use a very large cluster (10 nodes)
* set a very low timeout (1 sec)
Such a setup led to frequent deadlineExceed errors or the following failures:
=== RUN TestGetTokenWithoutAuth
{"level":"warn","ts":"2020-08-04T16:50:48.686+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-35573307-1ee5-441b-acc7-d073f0bd7de5/localhost:69820396562031027440","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: leader changed"}
user_test.go:151: other errors:etcdserver: leader changed
--- FAIL: TestGetTokenWithoutAuth (10.91s)
The source of the problem was the fact that multiple tests were creating
their own clusters (and some of them were setting the global grpclog).
If a test ran after some other test that created an HttpServer
(and so accessed grpclog), this was reported as a race.
Tested with:
go test ./clientv3/. -v "--run=(Example).*" --count=2
go test ./clientv3/. -v "--run=(Test).*" --count=2
go test ./integration/embed/. -v "--run=(Test).*" --count=2