Current benchmark doesn't have an option for configuring dial timeout
of gRPC. This commit adds --dial-timeout for the purpose. It is useful
for stopping long sticking benchmarks.
functional tester sometime experiences timeout during compaction phase. I changed the timeout calculation base on number of entries created and deleted.
FIX#6805
getting lease and keys info through raw rpcs rarely experience error such as EOF. This is considered as a failure and causes tester to clean up.
however, they are just transient problem with temporary connection issue which should not be considered as a testing failure. so we add retry logic in case of transient failure.
FIX#6754
The checkers and stressers should be composable without special cases; this
patch tries to address that while refactoring out some old cruft.
Namely,
* Single stresser/checker for a tester; built from composition
* Composite stresser via comma-separated list of stressers
* Split stressers into separate files
* Removed v2 only flags and special cases
* Rate limiter shared among key stresser and leases stresser
* Composite checker is now concurrent
* Stresser can return a Checker to check its invariants
* Each lease checker only operates on a single lease stresser
These really belong in tester code; the stressers and
checkers are higher order operations that are orchestrated
by the tester. They're not really cluster primitives.
The current tester doesn't not clean up if any of the failure injection/recovery fails. if tester fails to recover a dead node, tester hangs in the next round because the tester will keep waiting until cluster becomes healthy which is impossible since a node is down. To fix this issue, we will always clean up if any error happens during each round so that cluster will be healthy for next round.
FIX#6743
lease stresser now generates short lived leases that will expire before invariant checking.
this addition verifies that the expired leases are indeed being deleted on the sever side.
increasing lease TTL ensure that lease doesn't expire during hashes stabilization period.
I observed that it can take a long time for etcd cluster to become stable.