Persistent data should be configured in agent side.
There is no need to specify the data-dir in tester side.
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
etcd tester runs etcd runner as a separate binary.
it signals sigstop to the runner when tester wants to stop stressing.
it signals sigcont to the runner when tester wants to start stressing.
when tester needs to clean up, it signals sigint to runner.
FIXES#7026
etcd panic-ed, so defrag response just blocked for "days"
when the actual 'v3rpc' path never returned.
We should catch this earlier.
ref. https://github.com/coreos/etcd/issues/7526
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
functional tester sometime experiences timeout during compaction phase. I changed the timeout calculation base on number of entries created and deleted.
FIX#6805
getting lease and keys info through raw rpcs rarely experience error such as EOF. This is considered as a failure and causes tester to clean up.
however, they are just transient problem with temporary connection issue which should not be considered as a testing failure. so we add retry logic in case of transient failure.
FIX#6754
The checkers and stressers should be composable without special cases; this
patch tries to address that while refactoring out some old cruft.
Namely,
* Single stresser/checker for a tester; built from composition
* Composite stresser via comma-separated list of stressers
* Split stressers into separate files
* Removed v2 only flags and special cases
* Rate limiter shared among key stresser and leases stresser
* Composite checker is now concurrent
* Stresser can return a Checker to check its invariants
* Each lease checker only operates on a single lease stresser
These really belong in tester code; the stressers and
checkers are higher order operations that are orchestrated
by the tester. They're not really cluster primitives.
The current tester doesn't not clean up if any of the failure injection/recovery fails. if tester fails to recover a dead node, tester hangs in the next round because the tester will keep waiting until cluster becomes healthy which is impossible since a node is down. To fix this issue, we will always clean up if any error happens during each round so that cluster will be healthy for next round.
FIX#6743
lease stresser now generates short lived leases that will expire before invariant checking.
this addition verifies that the expired leases are indeed being deleted on the sever side.
increasing lease TTL ensure that lease doesn't expire during hashes stabilization period.
I observed that it can take a long time for etcd cluster to become stable.
I move the checker logic from tester to cluster so that stressers and checkers can be initialized at the same time.
this is useful because some checker depends on stressers.
This commit adds a new option -failure-wrapper to etcd-tester. The
option receives a path of script that is used for enabling/disabling
external fault injectors. The script is called with an option "enable"
when it needs to be enabled (when failure.Inject() is called) and
called with "disabled" in an opposite case (when failure.Recover() is
called).
This commit adds a new option --failures to etcd-tester. The option
receives a comma-delimited argument like this:
"default,failpoints". The given arguments are interpreted as names of
failures and they are injected to an etcd cluster. Available failures
are default (default scenario in etcd-tester) and failpoints. If no
args are passed to the option (--failures=""), no failures are
injected during testing.
This commit decouples stresser from the tester of
functional-tester. For doing it, this commit adds a new option
--stresser to etcd-tester. The option accepts two types of stresser:
"default" and "nop". If the option is "default", etcd-tester stresses
its etcd cluster with the existing stresser. If the option is "nop",
etcd-tester does nothing for stressing.
Partially fixes https://github.com/coreos/etcd/issues/6446
The following format "http://localhost:1234" causes existing port parser to fail. Add new logic to parse the host name first then extract port.
Fixes#6409