This fixes failed RPC rate query, where we do not need
subtraction because we already query by the status code.
Also adds grpc_method to make it more specific. Most of the
time, the failure recovers within 10-second, which is our
Prometheus scrap interval, so 'rate' query might not cover
that time window, showing as 0s, but still shows up in the graph.
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
It is almost same to Documentation/v2/authentication.md because a
major part of its user interface is shared with the v2 auth. The newly
added doc includes some refinements for the v3 auth.
This commit adds jwt token support in v3 auth API.
Remaining major ToDos:
- Currently token type isn't hidden from etcdserver. In the near
future the information should be completely invisible from
etcdserver package.
- Configurable expiration of token. Currently tokens can be valid
until keys are changed.
How to use:
1. generate keys for signing and verfying jwt tokens:
$ openssl genrsa -out app.rsa 1024
$ openssl rsa -in app.rsa -pubout > app.rsa.pub
2. add command line options to etcd like below:
--auth-token-type jwt \
--auth-jwt-pub-key app.rsa.pub --auth-jwt-priv-key app.rsa \
--auth-jwt-sign-method RS512
3. launch etcd cluster
Below is a performance comparison of serializable read w/ and w/o jwt
token. Every (3) etcd node is executed on a single machine. Signing
method is RS512 and key length is 1024 bit. As the results show, jwt
based token introduces a performance overhead but it would be
acceptable for a case that requires authentication.
w/o jwt token auth (no auth):
Summary:
Total: 1.6172 secs.
Slowest: 0.0125 secs.
Fastest: 0.0001 secs.
Average: 0.0002 secs.
Stddev: 0.0004 secs.
Requests/sec: 6183.5877
Response time histogram:
0.000 [1] |
0.001 [9982] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
0.003 [1] |
0.004 [1] |
0.005 [0] |
0.006 [0] |
0.008 [6] |
0.009 [0] |
0.010 [1] |
0.011 [5] |
0.013 [3] |
Latency distribution:
10% in 0.0001 secs.
25% in 0.0001 secs.
50% in 0.0001 secs.
75% in 0.0001 secs.
90% in 0.0002 secs.
95% in 0.0002 secs.
99% in 0.0003 secs.
w/ jwt token auth:
Summary:
Total: 2.5364 secs.
Slowest: 0.0182 secs.
Fastest: 0.0002 secs.
Average: 0.0003 secs.
Stddev: 0.0005 secs.
Requests/sec: 3942.5185
Response time histogram:
0.000 [1] |
0.002 [9975] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
0.004 [0] |
0.006 [1] |
0.007 [11] |
0.009 [2] |
0.011 [4] |
0.013 [5] |
0.015 [0] |
0.016 [0] |
0.018 [1] |
Latency distribution:
10% in 0.0002 secs.
25% in 0.0002 secs.
50% in 0.0002 secs.
75% in 0.0002 secs.
90% in 0.0003 secs.
95% in 0.0003 secs.
99% in 0.0004 secs.
Keep more wal entries in memory for fast follower recovery.
10,000 was a too small number that triggers quite a few snapshots.
ZK proves that 100,000 is a reasonable number for even old less prowerful
machines.
Eventually we should provide both count and max memory (for large entries).