We have the following communication schema:
client --- 1 ---> grpc-proxy --- 2 ---> etcd-server
There are 2 sets of flags/certs in grpc-proxy [ https://github.com/etcd-io/etcd/blob/master/etcdmain/grpc_proxy.go#L140 ]:
A. (cert-file, key-file, trusted-ca-file, auto-tls): these control [1], the client-to-proxy connection, and in particular describe the proxy's public identity.
B. (cert, key, cacert): these control [2], i.e. the identity that the proxy uses when it connects to the etcd-server.
If 2 (B.) contains a certificate with a CN and etcd-server is running with --client-cert-auth=true, the CN can be used as the identity of the 'client' from the service's perspective. This is a permission escalation that we should forbid.
If 1 (A.) contains a certificate with a CN, it should be considered perfectly valid. The server can (should) have a full identity.
So only the --cert flag (and not the --cert-file flag) should be validated for an empty CN.
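A minimal sketch of such a check, using only the Go standard library. The
helper names checkEmptyCN and selfSigned are hypothetical and are not taken
from the etcd code base; the certificate is generated in memory purely so
the example has something to validate.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// checkEmptyCN is a hypothetical helper: it rejects a DER-encoded
// certificate whose Subject carries a non-empty CommonName.
func checkEmptyCN(der []byte) error {
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		return err
	}
	if cert.Subject.CommonName != "" {
		return errors.New("certificate has non-empty CN: " + cert.Subject.CommonName)
	}
	return nil
}

// selfSigned builds a throwaway self-signed certificate with the given CN.
// It exists only to feed the example; real certs come from files on disk.
func selfSigned(cn string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{Organization: []string{"example"}, CommonName: cn},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	return x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
}

func main() {
	for _, cn := range []string{"", "proxy.example"} {
		der, err := selfSigned(cn)
		if err != nil {
			panic(err)
		}
		fmt.Printf("CN=%q -> %v\n", cn, checkEmptyCN(der))
	}
}
```

In this sketch only the certificate used for connection [2] would be passed
through checkEmptyCN; the serving certificate for [1] is left alone.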
Executed:
(cd ./integration/fixtures && ./gencerts.sh)
In particular, this created a new client-nocn.crt (and key) that can be
used for testing grpc-proxy -> etcd-server connections.
integration/fixtures/gencerts.sh:
- refactored common logic to a helper function
- added definition for client-nocn certificate
(used for grpc-proxy -> etcd-server communication).
This is to aid with debugging the effectiveness of systems that
manually take care of cluster compaction, and to give greater visibility
into recent compactions.
It can be handy to alert on exactly how long it has been since the last
compaction (and also to show that on dashboards).
---
Tested using a test cluster, the final result looks like this:
```
root@etcd-1:~# ETCDCTL_API=3 /tmp/test-etcd/etcdctl --endpoints=192.168.232.10:2379 compact 1012
compacted revision 1012
root@etcd-1:~# curl -s 192.168.232.10:2379/metrics | grep last
# HELP etcd_debugging_mvcc_db_compaction_last The unix time since the last db compaction. Resets to 0 on start.
# TYPE etcd_debugging_mvcc_db_compaction_last gauge
etcd_debugging_mvcc_db_compaction_last 1.595873939e+09
```
From etcd-dev discussion:
https://groups.google.com/u/2/g/etcd-dev/c/oMGSBqs_7sc
I have been working on this system called Asset Transparency[1] which
helps users verify they have received the correct contents from a URL.
Are you familiar with the "download a file, download a SHA256SUM
file, run `sha256sum -c`, etc" process? This tool helps to automate
that for users into something like this[2]:
$ tl get https://github.com/etcd-io/etcd/releases/download/v3.4.12/etcd-v3.4.12-darwin-amd64.zip
A best practice for this Asset Transparency system is that URLs
are registered with the log as soon as possible. Why? Well, the sooner
a URL is entered, the longer it can protect people consuming that URL
from unexpected content modification caused by, say, a GitHub credential
compromise.
To that end I have written a GitHub Action[3] that will automatically
do that on every release. It is easy to activate and should be hands
free after installation. So, before I enable it I want to see if there
are any concerns from maintainers. The only change to our repo will be
a new file in .github/workflows.
[1] https://www.transparencylog.com
[2] https://github.com/transparencylog/tl
[3] https://github.com/transparencylog/publish-releases-asset-transparency-action
Similar counts are exposed via Prometheus.
This adds the ones that are perceived by the etcd server.
e.g.
os_fd_limit 120000
os_fd_used 14
process_cpu_seconds_total 0.31
process_max_fds 120000
process_open_fds 17
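For reference, a sketch of how a process can observe these two values for
itself. It assumes Linux (it reads /proc/self/fd), and the names fdLimit and
fdUsed are hypothetical, not etcd's own functions.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// fdLimit returns the soft limit on open file descriptors for this process.
func fdLimit() (uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	return rl.Cur, nil
}

// fdUsed counts the entries under /proc/self/fd (Linux-specific). The count
// includes the descriptor opened to read the directory itself.
func fdUsed() (int, error) {
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

func main() {
	limit, err := fdLimit()
	if err != nil {
		panic(err)
	}
	used, err := fdUsed()
	if err != nil {
		panic(err)
	}
	fmt.Printf("fd limit %d, fd used %d\n", limit, used)
}
```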
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
The test is very flaky on Travis.
It seems that since (https://github.com/etcd-io/etcd/issues/7724) the
client is expected to simply ignore whether the server is in AuthDisabled
mode, even if the user supplies credentials.
The tests used to:
* use a very large cluster (10 nodes)
* set a very low timeout (1 sec)
Such a setup led to frequent DeadlineExceeded errors or to failures like the following:
=== RUN TestGetTokenWithoutAuth
{"level":"warn","ts":"2020-08-04T16:50:48.686+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-35573307-1ee5-441b-acc7-d073f0bd7de5/localhost:69820396562031027440","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: leader changed"}
user_test.go:151: other errors:etcdserver: leader changed
--- FAIL: TestGetTokenWithoutAuth (10.91s)
The source of the problem was the fact that multiple tests were creating
their own clusters (and some of them were setting the global grpclog).
If a test ran after some other test that had created an HttpServer
(and so accessed grpclog), this was reported as a race.
Tested with:
go test ./clientv3/. -v "--run=(Example).*" --count=2
go test ./clientv3/. -v "--run=(Test).*" --count=2
go test ./integration/embed/. -v "--run=(Test).*" --count=2
Fixes the following problems during "./etcd# go test ./..."
> go.etcd.io/etcd/v3/etcdserver/api/v2store_test
etcdserver/api/v2store/store_test.go:847:24: conversion from int to string yields a string of one rune, not a string of digits (did you mean fmt.Sprint(x)?)
> go.etcd.io/etcd/v3/wal
wal/wal_test.go:242:68: conversion from int to string yields a string of one rune, not a string of digits (did you mean fmt.Sprint(x)?)
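The difference the vet check points at can be shown with a short sketch
(the helper names are hypothetical): converting an integer through rune
yields the character with that code point, while strconv.Itoa yields the
decimal digits.

```go
package main

import (
	"fmt"
	"strconv"
)

// runeString makes the "one rune" conversion explicit: 65 becomes "A".
func runeString(x int) string {
	return string(rune(x))
}

// digitString formats the integer as decimal digits: 65 becomes "65".
func digitString(x int) string {
	return strconv.Itoa(x)
}

func main() {
	fmt.Println(runeString(65), digitString(65)) // prints "A 65"
}
```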
The fix is needed to mitigate the consequences of
https://github.com/golang/go/issues/29458 ("golang breaking change"),
which causes the following test failures on the etcd end:
--- FAIL: TestCtlV2Set (0.00s)
ctl_v2_test.go:552: could not start etcd process cluster (fork/exec ../../bin/etcd: Setctty set but Ctty not valid in child)
--- FAIL: TestCtlV2SetQuorum (0.00s)
ctl_v2_test.go:552: could not start etcd process cluster (fork/exec ../../bin/etcd: Setctty set but Ctty not valid in child)
--- FAIL: TestCtlV2SetClientTLS (0.00s)
ctl_v2_test.go:552: could not start etcd process cluster (fork/exec ../../bin/etcd: Setctty set but Ctty not valid in child)