175 Commits

Author SHA1 Message Date
Marek Siarkowicz
5a8c8b703b
Merge pull request #17807 from serathius/robustness-resumable-revision-zero
Resumable handles watch with revision zero
2024-04-16 19:41:53 +02:00
Marek Siarkowicz
dc187ce6e8 Validate bookmarkable checks the last event before progress notify
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-04-16 09:17:40 +02:00
Marek Siarkowicz
94a47a7cbd Resumable handles watch with revision zero
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-04-15 20:23:51 +02:00
Marek Siarkowicz
042e7d1a0c Add filter validation to ensure watch only includes events within selector
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-04-15 20:05:08 +02:00
Marek Siarkowicz
a95a307698 Add tests to watch validation
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-04-14 21:38:03 +02:00
Marek Siarkowicz
569693be8d Utilize WAL to patch operation history
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-04-14 12:09:38 +02:00
Marek Siarkowicz
452445e2d8
Merge pull request #17781 from serathius/robustness-read-limit
Remove limit from read requests after a failed write
2024-04-14 12:05:23 +02:00
Marek Siarkowicz
2e6eebef85
Merge pull request #17759 from serathius/robustness-assumptions
Add explicit checks for assumptions in robustness test validation
2024-04-13 00:19:25 +02:00
Marek Siarkowicz
d0bf8ddca4 Improve description for Kubernetes CAS operations
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-04-12 16:18:31 +02:00
Marek Siarkowicz
cadfc407e9 Remove limit from read requests after a failed write
Limit can cause multiple request due to pagination.
For reads after a failed write we would like to return to normal write
request as soon as possible.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-04-12 15:01:17 +02:00
Marek Siarkowicz
f8de338ab2 Add explicit checks for assumptions in robustness test validation
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-04-12 14:18:22 +02:00
Marek Siarkowicz
bfbfee0afa
Merge pull request #17768 from serathius/robustness-success-rate
[Robustness] Collect failed read operations to calculate request success rate
2024-04-12 09:46:20 +02:00
Marek Siarkowicz
718d5ba2b4 Calculate request success rate to provide signal to performance debugging
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-04-11 09:36:17 +02:00
Marek Siarkowicz
ae7f79fd63 Refactor append from appendFailed and appendSuccesfull
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-04-11 09:36:17 +02:00
Marek Siarkowicz
65130c6d21 Refactor merge succesfull and failed operation in history
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-04-10 21:11:46 +02:00
Marek Siarkowicz
229275d46e Refactor appendSuccesful and appendFailed methods to match
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-04-10 10:33:19 +02:00
Marek Siarkowicz
41ac7e33a1 Don't cache test-robustness-reports
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-04-09 15:59:58 +02:00
Marek Siarkowicz
6cb4c3f90d Document re-evaluating existing robustness test reports
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-04-08 16:58:12 +02:00
Marek Siarkowicz
3a23994fbf Make no failpoint error more readable
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-04-07 15:13:59 +02:00
Marek Siarkowicz
e2bb8c698f Limit a timeout in testing robustness validation
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-04-06 12:28:57 +02:00
Marek Siarkowicz
09769c4be7
Merge pull request #17642 from fuweid/fix-17506
*: LeaseTimeToLive returns error if leader changed
2024-04-02 14:55:41 +02:00
Wei Fu
d3bb6f688b *: LeaseTimeToLive returns error if leader changed
The old leader demotes lessor and all the leases' expire time will be
updated. Instead of returning incorrect remaining TTL, we should return
errors to force client retry.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2024-03-26 18:55:01 +08:00
Ivan Valdes
0976398964
tests/robustness: address golangci var-naming issues
Signed-off-by: Ivan Valdes <ivan@vald.es>
2024-03-25 16:27:05 -07:00
thirdkeyword
fbda591866 fix some typos
Signed-off-by: thirdkeyword <fliterdashen@gmail.com>
2024-03-25 10:34:44 +08:00
Chao Chen
405862e807 Fix event loss after compaction
Signed-off-by: Chao Chen <chaochn@amazon.com>
2024-03-15 14:22:37 -07:00
Madhav Jivrajani
0b27570368 tests/robustness: use WithRequireLeader in Kubernetes traffic
Kubernetes uses WithRequireLeader to modify the context passed
to the Watch() and RequestProgress() calls in order to ensure
that a leader is present in the cluster before serving the request.

This commit mimics that behaviour in the Kubernetes traffic.

Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
2024-02-22 15:01:33 +05:30
Marek Siarkowicz
3a351c2fec Revert "tests/robustness: check for compaction before prevKV validation"
This reverts commit 5d7f58d14be1da5135d158dad6fc43391cbf6283.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-02-21 16:15:40 +01:00
Madhav Jivrajani
5d7f58d14b tests/robustness: check for compaction before prevKV validation
We can check for the condition that Kubernetes checks for, i.e.
prevKV can be nil iff the event is not a create a event, only if
we know whether compaction has occured or not. If compaction has
occured, prevKV can be nil and that is completely valid.

This commit checks if compaction took place during the test run
by querying the /metrics endpoint. Based on if compaction occured,
we now check the Kubernetes condition in the prevKV robustness test.

Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
2024-02-19 17:05:59 +05:30
Madhav Jivrajani
b51a834645 tests/robustness: allow persisting result reports for successful runs
Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
2024-02-14 16:28:47 +05:30
Madhav Jivrajani
cdd018ad2a tests/robustness: add a robustness test for validating create events
Split off valdiating create events from the prevKV test.
The added test tests the following two:
- A create event should not exist in our past history
- A non-create event should exist in our past history

Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
2024-02-14 16:28:44 +05:30
Madhav Jivrajani
4fa07a1c8a tests/robustness: make merging histories work on []PersistedEvent
Event histories after merging serve as an authorotative list of
events that can be seen as ones persisted by etcd, we don't need
PrevValue as part of it.

Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
2024-02-14 15:44:08 +05:30
Madhav Jivrajani
9aad6700d5 tests/robustness: add robustness test for watch with PrevKV()
Kubernetes relies on the PrevKV() option in the watches it opens
against etcd. This commit adds a robustness test to validate the
same.

A watch response returned with PrevKV() is valid if:
The value in current event's prevKV matches the previous
event's value of the same key if this is not a create event.

There are cases where there can be a prevKV for a create event
as well, for example if a watch is opened after the key is creatd.
Since we don't simulate for that, we don't check for that.

Further, this adjusts revision numbers such that we can successfully create
a new replay. Needed now since we will have unit tests with
and without PrevKV co-existing and we requite creation of a
new replay everytime we validate PrevKV.

We also regenerate test data with so that prevKV exists in it

Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
2024-02-13 22:55:57 +05:30
Madhav Jivrajani
f0f4e8a4e8 tests/robustness: fix out of index panic in model replay
Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
2024-02-01 16:14:35 +05:30
Marek Siarkowicz
4d3108246d
Merge pull request #17260 from serathius/validate-watch-without-event-history
Validate watch even if event history cannot be created
2024-01-25 16:01:01 +01:00
Marek Siarkowicz
f0d73c9d12 Separate robustness test scenarios and increase number of times we run exploratory tests in nightly
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-01-16 17:17:54 +01:00
Marek Siarkowicz
c37991cf8b Validate watch even if event history cannot be created
Creation of event history requires each client to return consistent
events. If clients observed inconsistent view of some revision, merging
will fail and return diff between two clients. This however doesn't
provide hint on what kind of issue happend.

This PR helps cases where there is an error with single watch
stream (like event duplication) by running normal watch validation even
without full event history.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-01-16 16:04:03 +01:00
Marek Siarkowicz
3471ef133d Add an e2e test and robustness failpoint around recovering from snapshot backend
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2024-01-04 15:25:24 +01:00
Jongwoo Han
08d799c4cc
Correct typo from 'Kuberntes' to 'Kubernetes'
Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
2023-12-20 18:09:31 +09:00
Benjamin Wang
3ab54f720f install gofail in module-aware mode and ignore go.mod file
Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
2023-12-11 12:37:05 +00:00
Marek Siarkowicz
5175652a8e Abort if failpoint injecton failed
If one of nodes is unhealthy the test would never finish as watchers
would never reach max revision.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-12-03 17:26:51 +01:00
Marek Siarkowicz
b71686d1e6 Refactor mocking rand
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-11-15 10:26:39 +01:00
Siyuan Zhang
834fac9fb2 robustness test: add with functions of randomizable config params in robustness test
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2023-11-15 02:08:07 +00:00
ZhouJianMS
55516234d3 exclude sleep failpoint from 1 node scenario
Signed-off-by: ZhouJianMS <zhoujian@microsoft.com>
2023-11-13 16:19:44 +08:00
ZhouJianMS
d208985aec error handling for gofailpoint
Signed-off-by: ZhouJianMS <zhoujian@microsoft.com>
2023-11-03 19:25:17 +08:00
ZhouJianMS
827dc18682 Add IO stall failpoint in raft loop
Signed-off-by: ZhouJianMS <zhoujian@microsoft.com>
2023-11-03 16:42:33 +08:00
Marek Siarkowicz
45fb4565e3
Merge pull request #16786 from serathius/robustness-drop-packet
Implement random packet dropping
2023-10-19 08:44:23 +02:00
Marek Siarkowicz
aa28a69ce0 Implement random packet dropping
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-10-18 10:14:43 +02:00
Wei Fu
aea1cd0077 feat: enable unparam lint
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-10-17 21:24:13 +08:00
Marek Siarkowicz
e51b639520
Merge pull request #16766 from serathius/robustness-member-replace
Add member replace failpoint to robustness tests
2023-10-17 13:36:21 +02:00
Marek Siarkowicz
5fed813f2e
Merge pull request #16767 from serathius/robustness-main-test
Make the main_test the entrypoint and move senario generation to separate file
2023-10-17 13:09:16 +02:00