This commit adds flags for profiling with runtime/pprof to storage
put:
- --cpuprofile: specify a path of CPU profiling result, if it is not
empty, profiling is activated
- --memprofile: specify a path of heap profiling result, if it is not
empty, profiling is activated
Of course, the flags should be added to RootCmd ideally. However,
adding common flags that shared by children command requires the
ongoing PR: https://github.com/spf13/cobra/pull/220 . Therefore this
commit adds the flags to storage put only.
Current etcd repository has a test for benchmarking a storage backend
in storage/kvstore_bench_test.go. However, it is hard to test various
parameters (e.g. batch interval, a number of keys, etc) with the test.
This commit adds a new benchmarking subcommand "storage" to
tools/benchmark. It will encourage analysis of storage backends with
various parameter and complex workloads.
Exmaple usage:
$ ./benchmark storage put
total: 9.894173792s
average: 9.894173ms
minimum latency: 6.596991ms
maximum latency: 29.455695ms
Reports depended on writing all results to a large buffered channel and
reading from that synchronously. Similarly, requests were buffered the
same way which can take significant memory on big request strings. Instead,
have reports stream in results as they're produced then print when the
results channel closes.
Watch command run benchmark tests for watch releated operations:
1. watch keys 2. sending events to watchers.
To test 2, this test also puts keys into etcd cluster to trigger
event sending.
Besides the benchmark results showed at client side, the tester
can also monitor the server-side mem/cpu usage and also check
the metrics of slow watchers. If there are a lot of slow watchers,
it means etcd server is over-loaded by watchers.
Extend the timeout from 1s to defaultRequestTimeout 5s.
The 1s may bring unwanted burden to the target member. If the member is
busy at recovering, it has limited bandwidth for client requests. A
short timeout at client side will retry quickly while keeping the
on-going connections. Thus, etcd will queue lots of requests and
connections and takes long time to clear them. This finally causes the
timeout of member health check.
This problem is a general one that how etcd handles amounts of requests
at the same time in a good way. We don't plan to address it at current
stage.