19647 Commits

Author SHA1 Message Date
Marek Siarkowicz
92366a5338 tests/robustness: Split model code into deterministic and non-deterministic
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
Co-authored-by: chao <54131596+chaochn47@users.noreply.github.com>
2023-05-05 12:25:10 +02:00
Marek Siarkowicz
cfe154209c tests/robustness: Separate describe model functions to dedicated file
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-04 14:03:18 +02:00
Marek Siarkowicz
2c16812841
Merge pull request #15817 from serathius/robustness-k8s-1
tests/robustness: Implement first step in validating the Kubernetes-etcd contract
2023-05-04 13:52:25 +02:00
Marek Siarkowicz
9b5680c5f1 tests/robustness: Implement first step in validating the Kubernetes-etcd contract.
* Use mod revision for optimistic concurrency.
* Introduce range requests as more general than get
* Add kubernetes specific traffic generation, for now using polling, but
  expected to evolve to use watch.
* Introduce kubernetes specific test scenario
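
The "mod revision for optimistic concurrency" item above can be pictured with a small sketch. This is not the robustness test code: `store`, `get`, and `putIfUnmodified` are hypothetical names for an in-memory stand-in, assuming a write is accepted only if the key's mod revision is unchanged since it was read.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a stored value together with the revision of its last modification.
type entry struct {
	value       string
	modRevision int64
}

// store is a hypothetical in-memory stand-in for a revisioned key-value store.
type store struct {
	mu       sync.Mutex
	revision int64
	data     map[string]entry
}

func newStore() *store {
	return &store{data: make(map[string]entry)}
}

// get returns the value and mod revision of key; a missing key has mod revision 0.
func (s *store) get(key string) (string, int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e := s.data[key]
	return e.value, e.modRevision
}

// putIfUnmodified writes value only when the key's mod revision still equals
// expectedModRevision, and reports whether the write was applied.
func (s *store) putIfUnmodified(key, value string, expectedModRevision int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.data[key].modRevision != expectedModRevision {
		return false // the key changed since the caller read it
	}
	s.revision++
	s.data[key] = entry{value: value, modRevision: s.revision}
	return true
}

func main() {
	s := newStore()
	fmt.Println(s.putIfUnmodified("pod", "v1", 0)) // create: key absent, mod revision 0
	_, rev := s.get("pod")
	fmt.Println(s.putIfUnmodified("pod", "v2", rev)) // update using the revision just read
	fmt.Println(s.putIfUnmodified("pod", "v3", rev)) // same revision again, now stale
}
```

A writer whose update is rejected would re-read the key and retry with the new mod revision.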

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-05-04 13:26:54 +02:00
Benjamin Wang
ecb64030fb
Merge pull request #15821 from jmhbnz/upgrade-go-patch-release
Updated go to latest patch release 1.19.9
2023-05-04 07:56:42 +08:00
James Blair
b84e4273f7
Updated go to latest patch release 1.19.9.
Signed-off-by: James Blair <mail@jamesblair.net>
2023-05-04 11:00:08 +12:00
Benjamin Wang
3ef5985bcd
Merge pull request #15813 from Rajalakshmi-Girish/continue-on-failure
Keep going with other test suite runs when one fails
2023-05-04 05:54:05 +08:00
Rajalakshmi Girish
c9998a7e63 keep_going with other suites when one fails
Signed-off-by: Rajalakshmi Girish <rajalakshmi.girish1@ibm.com>
2023-05-03 00:57:49 -07:00
Benjamin Wang
5021cd924c
Merge pull request #15816 from chaochn47/update_dependency_management
dependency_management.md: document go.opentelemetry.io/otel version update is blocked
2023-05-03 12:27:05 +08:00
Chao Chen
bb060586ce dependency_management.md: document go.opentelemetry.io/otel version update is blocked
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-05-02 20:40:06 -07:00
Hitoshi Mitake
49b59cc8e5
Merge pull request #15656 from mitake/lease-timetolive-auth
protect LeaseTimeToLive with RBAC
2023-05-02 23:02:29 +09:00
Benjamin Wang
4785f5a7ba
Merge pull request #15809 from etcd-io/dependabot/github_actions/github/codeql-action-2.3.2
build(deps): bump github/codeql-action from 2.3.0 to 2.3.2
2023-05-02 06:39:40 +08:00
Benjamin Wang
b089474b01
Merge pull request #15795 from jmhbnz/deflake-roundrobin-resolver-test
tests: Deflake TestEtcdGrpcResolverRoundRobin
2023-05-02 06:09:46 +08:00
dependabot[bot]
4c4bd63fa1
build(deps): bump github/codeql-action from 2.3.0 to 2.3.2
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.3.0 to 2.3.2.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](b2c19fb9a2...f3feb00acb)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-05-01 18:03:06 +00:00
Benjamin Wang
0deaec0e4f
Merge pull request #15762 from aimuz/fix-logger
refactor(util): remove duplicate lg check
2023-04-30 05:52:11 +08:00
Marek Siarkowicz
7462c61b31
Merge pull request #15792 from fuweid/deflake-robustness-cases
tests/robustness: tune timeout policy
2023-04-29 09:04:38 +02:00
James Blair
b9533ca98b
Deflake TestEtcdGrpcResolverRoundRobin.
Increase the request count to 1000 to enlarge the sample size and reduce variability, and raise the tolerance threshold from 10% to 15%.
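
A minimal sketch of a tolerance check of this kind. How the real test defines its threshold is not shown in this message, so the sketch assumes it means relative deviation from an even share; `withinTolerance` and the sample counts are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// withinTolerance reports whether every endpoint's request count deviates from
// the even share by at most tolerance (a fraction, e.g. 0.15 for 15%).
func withinTolerance(counts []int, tolerance float64) bool {
	total := 0
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		return false // no requests recorded, nothing to judge
	}
	expected := float64(total) / float64(len(counts))
	for _, c := range counts {
		if math.Abs(float64(c)-expected)/expected > tolerance {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical distribution of 1000 requests over two endpoints.
	counts := []int{560, 440}
	fmt.Println(withinTolerance(counts, 0.10)) // 12% deviation exceeds 10%
	fmt.Println(withinTolerance(counts, 0.15)) // 12% deviation fits within 15%
}
```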

Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-29 14:14:16 +12:00
Wei Fu
09d053e035 tests/robustness: tune timeout policy
In a [scheduled test][1], the error shows

```
2023-04-19T11:16:15.8166316Z     traffic.go:96: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout
```

According to [grpc-keepalive@v1.51.0][2], each frame from the server
refreshes `lastRead`, so the client won't fire a `Ping` frame to the
server. But the client used by the [`tombstone` request][3] might hit a
race. Since we use 5ms as the timeout, the client might not receive the
`Ping` result from the server in time. The keepalive then marks it as
timed out and closes the connection.

I couldn't reproduce it locally. If we add a sleep before updating
`lastRead`, it reproduces sometimes. Still investigating this
part.

```diff
diff --git a/internal/transport/http2_client.go b/internal/transport/http2_client.go
index d518b07e..bee9c00a 100644
--- a/internal/transport/http2_client.go
+++ b/internal/transport/http2_client.go
@@ -1560,6 +1560,7 @@ func (t *http2Client) reader(errCh chan<- error) {
                t.controlBuf.throttle()
                frame, err := t.framer.fr.ReadFrame()
                if t.keepaliveEnabled {
+                       time.Sleep(2 * time.Millisecond)
                        atomic.StoreInt64(&t.lastRead, time.Now().UnixNano())
                }
                if err != nil {
```

`DialKeepAliveTime` is always >= [10s][4]. I think we should increase
the timeout to avoid flakiness caused by an unstable environment.

And in a [scheduled test][5], the error shows

```
logger.go:130: 2023-04-22T10:45:52.646Z	INFO	Failed to trigger failpoint	{"failpoint": "blackhole", "error": "context deadline exceeded"}
```

Before sending `Status` to the member, the client doesn't [pick][6] the
connection in time (100ms) and returns the error.

`waitTillSnapshot` is used to ensure that conditions are sufficient to
trigger a snapshot transfer. Since we already have a 1min timeout for
injectFailpoints, I think we can remove the 100ms timeout to avoid
stopping unnecessarily.

```
injectFailpoints(1min timeout)
  failpoint.Inject
    triggerBlockhole.Trigger
      blackhole
        waitTillSnapshot
```
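
The effect of a short timeout nested under a long one can be sketched with the standard library alone. `slowStep`, `run`, `compare`, and the durations below are hypothetical and are not taken from the etcd test code.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// slowStep is a hypothetical operation that needs d to finish unless ctx ends first.
func slowStep(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// run executes slowStep under an outer timeout and, if inner > 0, under an
// additional nested timeout derived from the outer context.
func run(outer, inner, work time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), outer)
	defer cancel()
	if inner > 0 {
		var cancelInner context.CancelFunc
		ctx, cancelInner = context.WithTimeout(ctx, inner)
		defer cancelInner()
	}
	return slowStep(ctx, work)
}

// compare runs the same 100ms step with and without a nested 20ms timeout,
// both under a 1s outer timeout.
func compare() (withInner, withoutInner error) {
	withInner = run(time.Second, 20*time.Millisecond, 100*time.Millisecond)
	withoutInner = run(time.Second, 0, 100*time.Millisecond)
	return withInner, withoutInner
}

func main() {
	withInner, withoutInner := compare()
	fmt.Println("nested timeout:", withInner)
	fmt.Println("outer timeout only:", withoutInner)
}
```

With the nested timeout the step fails even though the outer deadline leaves plenty of time; without it the step completes.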

> NOTE: I didn't reproduce it either. :(

Reference:

[1]: <https://github.com/etcd-io/etcd/actions/runs/4741737098/jobs/8419176899>
[2]: <eeb9afa1f6/internal/transport/http2_client.go (L1647)>
[3]: <7450cd886d/tests/robustness/traffic.go (L94)>
[4]: <eeb9afa1f6/dialoptions.go (L445)>
[5]: <https://github.com/etcd-io/etcd/actions/runs/4772033408/jobs/8484334015>
[6]: <eeb9afa1f6/clientconn.go (L932)>

REF: #15763

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-29 07:03:47 +08:00
Marek Siarkowicz
46ab121cb7
Merge pull request #15786 from etcd-io/serathius-patch-1
Provide release date for v3.5.8
2023-04-28 15:21:53 +02:00
Marek Siarkowicz
7e2e5c68de Provide release date for v3.5.8
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-28 15:21:06 +02:00
aimuz
b052092297
refactor(util): remove duplicate lg check
lg always has a value

Signed-off-by: aimuz <mr.imuz@gmail.com>
2023-04-28 10:43:30 +08:00
Marek Siarkowicz
7450cd886d
Merge pull request #15790 from Rajalakshmi-Girish/add-failfast-flag
Add -failfast flag when the mode is fail_fast
2023-04-27 21:16:38 +02:00
Marek Siarkowicz
cd24847086
Merge pull request #15789 from ahrtr/save_data_20230427
test: forcibly save data on panicking
2023-04-27 16:23:49 +02:00
Rajalakshmi Girish
81fccc13da Add -failfast flag when the mode is fail_fast
Signed-off-by: Rajalakshmi Girish <rajalakshmi.girish1@ibm.com>
2023-04-27 05:26:38 -07:00
Benjamin Wang
c7d81acaf0 test: forcibly save data on panicking
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-27 14:54:35 +08:00
Benjamin Wang
6d11f8ceb5
Merge pull request #15785 from Mskxn/fix_session
close the session to avoid leaking goroutines
2023-04-27 04:24:17 +08:00
Msk233
26fdf46001 close the session to avoid leaking goroutines
Signed-off-by: Mskxn <118117161+Mskxn@users.noreply.github.com>
2023-04-26 20:45:13 +08:00
Hitoshi Mitake
c9b368119e tests: e2e and integration test for timetolive
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
2023-04-26 20:35:20 +09:00
Hitoshi Mitake
975854f07f etcdserver: protect lease timetolive with auth
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
2023-04-26 20:35:20 +09:00
Marek Siarkowicz
e04120042e
Merge pull request #15779 from fuweid/deprecate-schwag
chore: deprecate github.com/hexfusion/schwag
2023-04-26 11:36:31 +02:00
Marek Siarkowicz
f13b7502ef
Merge pull request #15781 from serathius/meme-readme
Incorporate xkcd dependency meme into README
2023-04-26 11:05:35 +02:00
Marek Siarkowicz
f6822b4225
Merge pull request #15783 from jmhbnz/consolidate-dockerfiles
Consolidate etcd dockerfiles
2023-04-26 11:01:44 +02:00
James Blair
ab65ee3d01
Consolidate etcd dockerfiles.
We can consolidate by using docker build args to create the individual platform Dockerfile.

Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-26 17:09:25 +12:00
Wei Fu
b4f49a55a5 chore: deprecate github.com/hexfusion/schwag
schwag was introduced in 2017 to generate swagger with authorization
support [1][1]. In 2018, grpc-gateway added support for rendering
security fields via protoc-gen-swagger [2][2]. After several years, I
think it's good to switch to the upstream protoc support.

NOTE:

The keys in `rpc.swagger.json` have been reordered, so the diff appears
to contain a lot of changes. To verify it:

```bash
$ # use jq -S to sort the key
$ latest_commit="https://raw.githubusercontent.com/etcd-io/etcd/228f493c7697ce3e9d3a1d831bcffad175846c75/Documentation/dev-guide/apispec/swagger/rpc.swagger.json"
$ curl -s "${latest_commit}"  | jq -S . > /tmp/old.json
$ cat Documentation/dev-guide/apispec/swagger/rpc.swagger.json | jq -S . > /tmp/new.json
$ diff --color -u /tmp/old.json /tmp/new.json
```
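
The same key-order-independent comparison can be sketched in Go. It relies on `encoding/json` sorting map keys when marshaling; `canonicalJSON`, `sameJSON`, and the sample fragments are hypothetical. Numbers are decoded as `float64` here, which is an assumption that suits a swagger file but not documents with very large integers.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// canonicalJSON re-encodes a JSON document with object keys in sorted order.
// encoding/json sorts map keys when marshaling, so decoding into generic
// values and encoding again yields a key-order-independent form.
func canonicalJSON(doc []byte) ([]byte, error) {
	var v interface{}
	if err := json.Unmarshal(doc, &v); err != nil {
		return nil, err
	}
	return json.MarshalIndent(v, "", "  ")
}

// sameJSON reports whether two documents are equal once key order is ignored.
func sameJSON(a, b []byte) (bool, error) {
	ca, err := canonicalJSON(a)
	if err != nil {
		return false, err
	}
	cb, err := canonicalJSON(b)
	if err != nil {
		return false, err
	}
	return bytes.Equal(ca, cb), nil
}

func main() {
	// Hypothetical fragments that differ only in key order.
	old := []byte(`{"type": "string", "format": "byte"}`)
	reordered := []byte(`{"format": "byte", "type": "string"}`)
	equal, err := sameJSON(old, reordered)
	fmt.Println(equal, err)
}
```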

```diff
--- /tmp/old.json       2023-04-26 10:58:07.142311861 +0800
+++ /tmp/new.json       2023-04-26 10:58:12.170299194 +0800
@@ -1523,11 +1523,14 @@
       "type": "object"
     },
     "protobufAny": {
+      "description": "`Any` contains an arbitrary serialized protocol buffer message along with a\nURL that describes the type of the serialized message.\n\nProtobuf library provides support to pack/unpack Any values in the form\nof utility functions or additional generated methods of the Any type.\n\nExample 1: Pack and unpack a message in C++.\n\n    Foo foo = ...;\n    Any any;\n    any.PackFrom(foo);\n    ...\n    if (any.UnpackTo(&foo)) {\n      ...\n    }\n\nExample 2: Pack and unpack a message in Java.\n\n    Foo foo = ...;\n    Any any = Any.pack(foo);\n    ...\n    if (any.is(Foo.class)) {\n      foo = any.unpack(Foo.class);\n    }\n\n Example 3: Pack and unpack a message in Python.\n\n    foo = Foo(...)\n    any = Any()\n    any.Pack(foo)\n    ...\n    if any.Is(Foo.DESCRIPTOR):\n      any.Unpack(foo)\n      ...\n\n Example 4: Pack and unpack a message in Go\n\n     foo := &pb.Foo{...}\n     any, err := ptypes.MarshalAny(foo)\n     ...\n     foo := &pb.Foo{}\n     if err := ptypes.UnmarshalAny(any, foo); err != nil {\n       ...\n     }\n\nThe pack methods provided by protobuf library will by default use\n'type.googleapis.com/full.type.name' as the type URL and the unpack\nmethods only use the fully qualified type name after the last '/'\nin the type URL, for example \"foo.bar.com/x/y.z\" will yield type\nname \"y.z\".\n\n\nJSON\n====\nThe JSON representation of an `Any` value uses the regular\nrepresentation of the deserialized, embedded message, with an\nadditional field `@type` which contains the type URL. 
Example:\n\n    package google.profile;\n    message Person {\n      string first_name = 1;\n      string last_name = 2;\n    }\n\n    {\n      \"@type\": \"type.googleapis.com/google.profile.Person\",\n      \"firstName\": <string>,\n      \"lastName\": <string>\n    }\n\nIf the embedded message type is well-known and has a custom JSON\nrepresentation, that representation will be embedded adding a field\n`value` which holds the custom JSON in addition to the `@type`\nfield. Example (for message [google.protobuf.Duration][]):\n\n    {\n      \"@type\": \"type.googleapis.com/google.protobuf.Duration\",\n      \"value\": \"1.212s\"\n    }",
       "properties": {
         "type_url": {
+          "description": "A URL/resource name that uniquely identifies the type of the serialized\nprotocol buffer message. This string must contain at least\none \"/\" character. The last segment of the URL's path must represent\nthe fully qualified name of the type (as in\n`path/google.protobuf.Duration`). The name should be in a canonical form\n(e.g., leading \".\" is not accepted).\n\nIn practice, teams usually precompile into the binary all types that they\nexpect it to use in the context of Any. However, for URLs which use the\nscheme `http`, `https`, or no scheme, one can optionally set up a type\nserver that maps type URLs to message definitions as follows:\n\n* If no scheme is provided, `https` is assumed.\n* An HTTP GET on the URL must yield a [google.protobuf.Type][]\n  value in binary format, or produce an error.\n* Applications are allowed to cache lookup results based on the\n  URL, or have them precompiled into a binary to avoid any\n  lookup. Therefore, binary compatibility needs to be preserved\n  on changes to types. (Use versioned type names to manage\n  breaking changes.)\n\nNote: this functionality is not currently available in the official\nprotobuf release, and it is not used for type URLs beginning with\ntype.googleapis.com.\n\nSchemes other than `http`, `https` (or the empty scheme) might be\nused with implementation specific semantics.",
           "type": "string"
         },
         "value": {
+          "description": "Must be a valid serialized protocol buffer of the above specified type.",
           "format": "byte",
           "type": "string"
         }
```

REF:

1: <https://github.com/etcd-io/etcd/pull/7999#issuecomment-307512043>
2: <https://github.com/grpc-ecosystem/grpc-gateway/pull/547>

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-26 11:14:50 +08:00
Marek Siarkowicz
045192683c Move credits to subscript
Co-authored-by: James Blair <mail@jamesblair.net>
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-25 15:33:03 +02:00
Marek Siarkowicz
2fd9a1914e Incorporate xkcd dependency meme into README
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-25 14:49:50 +02:00
Benjamin Wang
4485db379e
Merge pull request #15577 from jmhbnz/add-round-robin-test
tests: Add new test for round robin resolver
2023-04-25 18:37:54 +08:00
Benjamin Wang
9b310ea316
Merge pull request #15776 from fuweid/update-deps
[2023-04-25] Bump dependencies identified by dependabot
2023-04-25 16:37:34 +08:00
Wei Fu
aa787d9f51 dependency: bump github.com/alexkohler/nakedret from 1.0.1 to 1.0.2
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-25 14:44:34 +08:00
James Blair
18e3acae0e
Add new test for round robin resolver.
Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-25 18:44:24 +12:00
Benjamin Wang
8c5e9ad455
Merge pull request #15759 from fuweid/deflake-TestAuthMemberRemove
server/etcdserver: togRPCError for maintenance API
2023-04-25 09:26:28 +08:00
Benjamin Wang
0f3fb04f1f
Merge pull request #15744 from ahrtr/dependency_management_20230419
Document: add guidance on dependency management
2023-04-25 06:14:16 +08:00
Benjamin Wang
1dbc9db621
Merge pull request #15772 from etcd-io/dependabot/github_actions/github/codeql-action-2.3.0
build(deps): bump github/codeql-action from 2.2.12 to 2.3.0
2023-04-25 06:12:33 +08:00
dependabot[bot]
a2426712cc
build(deps): bump github/codeql-action from 2.2.12 to 2.3.0
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.2.12 to 2.3.0.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](7df0ce3489...b2c19fb9a2)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-04-24 18:01:36 +00:00
Benjamin Wang
d589a0b5f6 Document: add guidance on dependency management
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-24 18:29:09 +08:00
Marek Siarkowicz
c2d78a316a
Merge pull request #15761 from ahrtr/min_version_20230424
Change the minimum recommended etcd versions to run in production to 3.4.22+ and 3.5.6+
2023-04-24 10:26:35 +02:00
Benjamin Wang
146f44d35e change the minimum recommended etcd versions to run in production to 3.4.22+ and 3.5.6+
Please read https://groups.google.com/g/etcd-dev/c/8S7u6NqW6C4

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-24 07:08:28 +08:00
Benjamin Wang
211b4320c3
Merge pull request #15752 from sharathsivakumar/main
fixes for "improve description of --initial-cluster-state flag" #15743
2023-04-23 07:17:37 +08:00
sharathsivakumar
32c83becf5
fix review: Updated description of --initial-cluster-state flag
Signed-off-by: sharathsivakumar <mailssr9@gmail.com>
2023-04-22 23:16:33 +02:00
Wei Fu
1ba577e499 server/etcdserver: togRPCError for maintenance API
This deflakes TestAuthMemberRemove.

When the client has multiple endpoints, it might send a request with a
valid token to a follower member which hasn't received the replicated
token log entry yet. The member will reject the request.

For instance, the maintenance.Status API will return "auth: invalid auth
token". But the client doesn't recognize the error, so it won't refresh
the auth token and retry. maintenance.Status should call togRPCError
before returning so that the client can refresh the token. This aligns
with the existing APIs.

Since the maintenance client always creates one connection to the target
member, the member will have the token after the auth refresh.

Maybe we can introduce a sync that waits until the member is ready with
the token, instead of refreshing.
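
A rough sketch of why the translation matters, using only the standard library. It is not the etcd implementation: `statusError`, `toStatusError`, and `shouldRefreshToken` are hypothetical stand-ins for a gRPC status error, the server-side translation, and the client-side check.

```go
package main

import (
	"errors"
	"fmt"
)

// code is a hypothetical stand-in for a gRPC status code.
type code int

const (
	codeUnknown code = iota
	codeUnauthenticated
)

// statusError carries a machine-readable code next to the message.
type statusError struct {
	code code
	msg  string
}

func (e *statusError) Error() string { return e.msg }

// errInvalidAuthToken is the server-internal error.
var errInvalidAuthToken = errors.New("auth: invalid auth token")

// toStatusError translates known internal errors into coded errors;
// anything else is tagged as unknown.
func toStatusError(err error) error {
	if err == nil {
		return nil
	}
	if errors.Is(err, errInvalidAuthToken) {
		return &statusError{code: codeUnauthenticated, msg: err.Error()}
	}
	return &statusError{code: codeUnknown, msg: err.Error()}
}

// shouldRefreshToken is the client-side check: only a recognized code
// triggers a token refresh and retry.
func shouldRefreshToken(err error) bool {
	var se *statusError
	return errors.As(err, &se) && se.code == codeUnauthenticated
}

func main() {
	// Untranslated error: the client cannot tell it should refresh.
	fmt.Println(shouldRefreshToken(errInvalidAuthToken))
	// Translated error: the client recognizes it and can refresh.
	fmt.Println(shouldRefreshToken(toStatusError(errInvalidAuthToken)))
}
```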

Fixes: #15758

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-22 18:35:53 +08:00