Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
Co-authored-by: chao <54131596+chaochn47@users.noreply.github.com>
* Use mod revision for optimistic concurrency.
* Introduce range requests as more general than get
* Add kubernetes specific traffic generation, for now using pull, but
expected to evolve to use watch.
* Introduce kubernetes specific test scenario
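For illustration, here is a minimal in-memory sketch of optimistic concurrency keyed on the mod revision. The `store` type and its `get`/`putIfUnmodified` methods are hypothetical and are not the robustness test code. It assumes a single goroutine, and that a mod revision of 0 means the key does not exist.

```go
package main

import "fmt"

// keyValue holds a value and the store revision at which it was last modified.
type keyValue struct {
	value       string
	modRevision int64
}

// store is a hypothetical, non-thread-safe key-value store with a global revision.
type store struct {
	revision int64
	data     map[string]keyValue
}

func newStore() *store { return &store{data: map[string]keyValue{}} }

// get returns the value and its mod revision; 0 means the key is absent.
func (s *store) get(key string) (string, int64) {
	kv := s.data[key]
	return kv.value, kv.modRevision
}

// putIfUnmodified writes value only when the key's current mod revision
// equals expected, i.e. nobody changed the key since the caller read it.
func (s *store) putIfUnmodified(key, value string, expected int64) bool {
	if s.data[key].modRevision != expected {
		return false
	}
	s.revision++
	s.data[key] = keyValue{value: value, modRevision: s.revision}
	return true
}

func main() {
	s := newStore()
	fmt.Println(s.putIfUnmodified("pod/a", "v1", 0)) // true: create
	_, rev := s.get("pod/a")
	fmt.Println(s.putIfUnmodified("pod/a", "v2", rev)) // true: revision matches
	fmt.Println(s.putIfUnmodified("pod/a", "v3", rev)) // false: rev is stale
}
```

Against a real etcd cluster the same check would be expressed as a transaction that compares the key's mod revision before writing.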
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Increase the number of requests to 1000 to enlarge the sample size and reduce variability, and raise the tolerance threshold from 10% to 15%.
Signed-off-by: James Blair <mail@jamesblair.net>
In a [scheduled test][1], the error shows
```
2023-04-19T11:16:15.8166316Z traffic.go:96: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout
```
According to [grpc-keepalive@v1.51.0][2], each frame from the server
refreshes `lastRead`, so the client won't fire a `Ping` frame at the
server. But the client used by the [`tombstone` request][3] might hit a
race. Since we use 5ms as the timeout, the client might not receive the
result of the `Ping` from the server in time. The keepalive then marks
the ping as timed out and closes the connection.
I couldn't reproduce it locally. If we add a sleep before updating
`lastRead`, it reproduces sometimes. I'm still investigating this
part.
```diff
diff --git a/internal/transport/http2_client.go b/internal/transport/http2_client.go
index d518b07e..bee9c00a 100644
--- a/internal/transport/http2_client.go
+++ b/internal/transport/http2_client.go
@@ -1560,6 +1560,7 @@ func (t *http2Client) reader(errCh chan<- error) {
t.controlBuf.throttle()
frame, err := t.framer.fr.ReadFrame()
if t.keepaliveEnabled {
+ time.Sleep(2 * time.Millisecond)
atomic.StoreInt64(&t.lastRead, time.Now().UnixNano())
}
if err != nil {
```
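To make the suspected race easier to reason about, here is a simplified, deterministic model of a keepalive loop. It is a sketch under my own assumptions, not grpc-go's implementation: the `keepalive` type, `tick`, and `closedAfterPing` are hypothetical names, time is simulated rather than measured, and the 7ms figure stands for a 5ms round trip plus the 2ms sleep from the diff above.

```go
package main

import (
	"fmt"
	"time"
)

// keepalive is a hypothetical model of a client keepalive loop.
// Timestamps are durations since the start of the simulation.
type keepalive struct {
	interval        time.Duration // idle time before a ping is sent
	timeout         time.Duration // how long to wait for read activity after a ping
	prevRead        time.Duration // lastRead value seen at the previous tick
	outstandingPing bool
	timeoutLeft     time.Duration
}

// tick models one timer expiry. It returns how long to sleep next and
// whether the connection gets closed because no read followed the ping.
func (k *keepalive) tick(lastRead time.Duration) (sleep time.Duration, closed bool) {
	if lastRead > k.prevRead {
		// Read activity since the previous tick: no ping is needed.
		k.outstandingPing = false
		k.prevRead = lastRead
		return k.interval, false
	}
	if k.outstandingPing && k.timeoutLeft <= 0 {
		return 0, true
	}
	if !k.outstandingPing {
		k.outstandingPing = true // the ping is sent here
		k.timeoutLeft = k.timeout
	}
	sleep = k.interval
	if k.timeoutLeft < sleep {
		sleep = k.timeoutLeft
	}
	k.timeoutLeft -= sleep
	return sleep, false
}

// closedAfterPing simulates an idle connection that sends a ping at time 0
// and whose lastRead is updated ackSeenAfterMs (> 0) milliseconds later.
func closedAfterPing(timeoutMs, ackSeenAfterMs int) bool {
	k := &keepalive{interval: 10 * time.Second, timeout: time.Duration(timeoutMs) * time.Millisecond}
	ackSeenAt := time.Duration(ackSeenAfterMs) * time.Millisecond
	var lastRead time.Duration
	now, _ := k.tick(lastRead) // ping sent; the next tick happens after the timeout
	if ackSeenAt <= now {
		lastRead = ackSeenAt // the ACK was recorded before the timer fired
	}
	_, closed := k.tick(lastRead)
	return closed
}

func main() {
	fmt.Println(closedAfterPing(5, 7))   // true: lastRead updated after the 5ms timeout
	fmt.Println(closedAfterPing(100, 7)) // false: a larger timeout absorbs the delay
}
```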
`DialKeepAliveTime` is always >= [10s][4]. I think we should increase
the timeout to avoid flakiness caused by an unstable environment.
And in a [scheduled test][5], the error shows
```
logger.go:130: 2023-04-22T10:45:52.646Z INFO Failed to trigger failpoint {"failpoint": "blackhole", "error": "context deadline exceeded"}
```
Before sending `Status` to the member, the client doesn't [pick][6] the
connection in time (100ms) and returns the error.
`waitTillSnapshot` is used to ensure that conditions are sufficient to
trigger a snapshot transfer. Since we already have a 1min timeout for
injectFailpoints, I think we can remove the 100ms timeout to avoid
stopping unnecessarily.
```
injectFailpoints(1min timeout)
  failpoint.Inject
    triggerBlockhole.Trigger
      blackhole
        waitTillSnapshot
```
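A small sketch of why the inner timeout hurts: a short per-step deadline fails the whole call even though the outer deadline still has plenty of budget. The helper names (`slowStep`, `trigger`, `run`) and the durations are hypothetical; only the nesting of timeouts mirrors the situation above.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// slowStep stands in for a step such as picking a connection.
// It finishes after d unless the context expires first.
func slowStep(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// trigger runs slowStep under the outer context. When inner > 0, it adds
// an extra, shorter timeout on top of the outer one.
func trigger(outer context.Context, inner, work time.Duration) error {
	ctx := outer
	if inner > 0 {
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(outer, inner)
		defer cancel()
	}
	return slowStep(ctx, work)
}

// run is a convenience wrapper taking all durations in milliseconds.
func run(outerMs, innerMs, workMs int) error {
	ms := time.Millisecond
	outer, cancel := context.WithTimeout(context.Background(), time.Duration(outerMs)*ms)
	defer cancel()
	return trigger(outer, time.Duration(innerMs)*ms, time.Duration(workMs)*ms)
}

func main() {
	fmt.Println(run(1000, 10, 50)) // context deadline exceeded: inner timeout fires first
	fmt.Println(run(1000, 0, 50))  // <nil>: only the outer timeout applies
}
```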
> NOTE: I didn't reproduce it either. :(
Reference:
[1]: <https://github.com/etcd-io/etcd/actions/runs/4741737098/jobs/8419176899>
[2]: <eeb9afa1f6/internal/transport/http2_client.go (L1647)>
[3]: <7450cd886d/tests/robustness/traffic.go (L94)>
[4]: <eeb9afa1f6/dialoptions.go (L445)>
[5]: <https://github.com/etcd-io/etcd/actions/runs/4772033408/jobs/8484334015>
[6]: <eeb9afa1f6/clientconn.go (L932)>
REF: #15763
Signed-off-by: Wei Fu <fuweid89@gmail.com>
schwag was introduced in 2017 to generate swagger with authorization
support [1][1]. In 2018, grpc-gateway added support for rendering
security fields via protoc-gen-swagger [2][2]. After several years, I
think it's good to switch to the upstream protoc support.
NOTE:
The JSON keys in `rpc.swagger.json` have been reordered, so the diff
looks much larger than it really is. To verify:
```bash
$ # use jq -S to sort the key
$ latest_commit="https://raw.githubusercontent.com/etcd-io/etcd/228f493c7697ce3e9d3a1d831bcffad175846c75/Documentation/dev-guide/apispec/swagger/rpc.swagger.json"
$ curl -s "${latest_commit}" | jq -S . > /tmp/old.json
$ cat Documentation/dev-guide/apispec/swagger/rpc.swagger.json | jq -S . > /tmp/new.json
$ diff --color -u /tmp/old.json /tmp/new.json
```
```diff
--- /tmp/old.json 2023-04-26 10:58:07.142311861 +0800
+++ /tmp/new.json 2023-04-26 10:58:12.170299194 +0800
@@ -1523,11 +1523,14 @@
"type": "object"
},
"protobufAny": {
+ "description": "`Any` contains an arbitrary serialized protocol buffer message along with a\nURL that describes the type of the serialized message.\n\nProtobuf library provides support to pack/unpack Any values in the form\nof utility functions or additional generated methods of the Any type.\n\nExample 1: Pack and unpack a message in C++.\n\n Foo foo = ...;\n Any any;\n any.PackFrom(foo);\n ...\n if (any.UnpackTo(&foo)) {\n ...\n }\n\nExample 2: Pack and unpack a message in Java.\n\n Foo foo = ...;\n Any any = Any.pack(foo);\n ...\n if (any.is(Foo.class)) {\n foo = any.unpack(Foo.class);\n }\n\n Example 3: Pack and unpack a message in Python.\n\n foo = Foo(...)\n any = Any()\n any.Pack(foo)\n ...\n if any.Is(Foo.DESCRIPTOR):\n any.Unpack(foo)\n ...\n\n Example 4: Pack and unpack a message in Go\n\n foo := &pb.Foo{...}\n any, err := ptypes.MarshalAny(foo)\n ...\n foo := &pb.Foo{}\n if err := ptypes.UnmarshalAny(any, foo); err != nil {\n ...\n }\n\nThe pack methods provided by protobuf library will by default use\n'type.googleapis.com/full.type.name' as the type URL and the unpack\nmethods only use the fully qualified type name after the last '/'\nin the type URL, for example \"foo.bar.com/x/y.z\" will yield type\nname \"y.z\".\n\n\nJSON\n====\nThe JSON representation of an `Any` value uses the regular\nrepresentation of the deserialized, embedded message, with an\nadditional field `@type` which contains the type URL. Example:\n\n package google.profile;\n message Person {\n string first_name = 1;\n string last_name = 2;\n }\n\n {\n \"@type\": \"type.googleapis.com/google.profile.Person\",\n \"firstName\": <string>,\n \"lastName\": <string>\n }\n\nIf the embedded message type is well-known and has a custom JSON\nrepresentation, that representation will be embedded adding a field\n`value` which holds the custom JSON in addition to the `@type`\nfield. Example (for message [google.protobuf.Duration][]):\n\n {\n \"@type\": \"type.googleapis.com/google.protobuf.Duration\",\n \"value\": \"1.212s\"\n }",
"properties": {
"type_url": {
+ "description": "A URL/resource name that uniquely identifies the type of the serialized\nprotocol buffer message. This string must contain at least\none \"/\" character. The last segment of the URL's path must represent\nthe fully qualified name of the type (as in\n`path/google.protobuf.Duration`). The name should be in a canonical form\n(e.g., leading \".\" is not accepted).\n\nIn practice, teams usually precompile into the binary all types that they\nexpect it to use in the context of Any. However, for URLs which use the\nscheme `http`, `https`, or no scheme, one can optionally set up a type\nserver that maps type URLs to message definitions as follows:\n\n* If no scheme is provided, `https` is assumed.\n* An HTTP GET on the URL must yield a [google.protobuf.Type][]\n value in binary format, or produce an error.\n* Applications are allowed to cache lookup results based on the\n URL, or have them precompiled into a binary to avoid any\n lookup. Therefore, binary compatibility needs to be preserved\n on changes to types. (Use versioned type names to manage\n breaking changes.)\n\nNote: this functionality is not currently available in the official\nprotobuf release, and it is not used for type URLs beginning with\ntype.googleapis.com.\n\nSchemes other than `http`, `https` (or the empty scheme) might be\nused with implementation specific semantics.",
"type": "string"
},
"value": {
+ "description": "Must be a valid serialized protocol buffer of the above specified type.",
"format": "byte",
"type": "string"
}
```
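The same key-order-insensitive comparison can be sketched in Go, in case `jq` is not at hand. This is only an illustration: `sameJSON` is a hypothetical helper, and the sample documents are made-up fragments, not the real `rpc.swagger.json`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// sameJSON reports whether two JSON documents are equal once decoded,
// which ignores object key order and whitespace.
func sameJSON(a, b string) (bool, error) {
	var x, y interface{}
	if err := json.Unmarshal([]byte(a), &x); err != nil {
		return false, err
	}
	if err := json.Unmarshal([]byte(b), &y); err != nil {
		return false, err
	}
	return reflect.DeepEqual(x, y), nil
}

func main() {
	old := `{"type":"object","properties":{"value":{"format":"byte","type":"string"}}}`
	reordered := `{"properties":{"value":{"type":"string","format":"byte"}},"type":"object"}`
	described := `{"properties":{"value":{"description":"d","type":"string","format":"byte"}},"type":"object"}`

	same, err := sameJSON(old, reordered)
	fmt.Println(same, err) // true <nil>: only the key order differs

	same, err = sameJSON(old, described)
	fmt.Println(same, err) // false <nil>: a description field was added
}
```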
REF:
1: <https://github.com/etcd-io/etcd/pull/7999#issuecomment-307512043>
2: <https://github.com/grpc-ecosystem/grpc-gateway/pull/547>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This deflakes TestAuthMemberRemove.
When the client has multiple endpoints, it might send a request with a
valid token to a follower member that hasn't received the replicated log
entry for the token yet. The member will reject the request.
For instance, the maintenance.Status API returns "auth: invalid auth
token", but the client doesn't recognize the error, so it won't retry by
refreshing the auth token. maintenance.Status should call togRPCError
before returning so that the client can refresh the token. This aligns
with the existing APIs.
Since the maintenance client always creates one connection to the target
member, the member will have the token after the auth refresh.
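The effect of the conversion can be sketched without gRPC. Everything below is a hypothetical stand-in: `rpcError` plays the role of a typed status error, `toRPCError` the role of the conversion step, and `member` the role of a follower that only knows tokens it issued itself.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidAuthToken mimics the server's internal auth error.
var errInvalidAuthToken = errors.New("auth: invalid auth token")

// rpcError is a hypothetical typed error carrying a status code.
type rpcError struct {
	code string
	msg  string
}

func (e *rpcError) Error() string { return e.code + ": " + e.msg }

// toRPCError converts known internal errors into typed errors that the
// client can recognize; other errors pass through unchanged.
func toRPCError(err error) error {
	if errors.Is(err, errInvalidAuthToken) {
		return &rpcError{code: "Unauthenticated", msg: err.Error()}
	}
	return err
}

// member models a follower that only accepts tokens it already knows.
type member struct{ tokens map[string]bool }

func newMember() *member { return &member{tokens: map[string]bool{}} }

// authenticate issues a fresh token that this member knows immediately.
func (m *member) authenticate() string {
	token := fmt.Sprintf("token-%d", len(m.tokens)+1)
	m.tokens[token] = true
	return token
}

// status rejects unknown tokens; convert toggles the error conversion.
func (m *member) status(token string, convert bool) error {
	if m.tokens[token] {
		return nil
	}
	if convert {
		return toRPCError(errInvalidAuthToken)
	}
	return errInvalidAuthToken
}

// callStatus retries once with a refreshed token, but only when the
// error is a typed Unauthenticated error that the client understands.
func callStatus(m *member, token string, convert bool) error {
	err := m.status(token, convert)
	var re *rpcError
	if errors.As(err, &re) && re.code == "Unauthenticated" {
		return m.status(m.authenticate(), convert)
	}
	return err
}

func main() {
	// "from-leader" is a token this follower has not learned about yet.
	fmt.Println(callStatus(newMember(), "from-leader", false)) // auth: invalid auth token
	fmt.Println(callStatus(newMember(), "from-leader", true))  // <nil>
}
```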
Maybe we can introduce a sync that waits until the member is ready with
the token, instead of refreshing.
Fixes: #15758
Signed-off-by: Wei Fu <fuweid89@gmail.com>