15688 Commits

Author SHA1 Message Date
Chris Wedgwood
c499d9b047 integration: relax leader timeout from 3s to 4s
The integration jobs fail with timeouts slightly over 3s, increase
this marginally so false failures are less prevalent.
2021-03-29 10:17:44 -07:00
Piotr Tabor
2702f9e5f2
Merge pull request #12751 from cwedgwood/nofsyncdowrite
When using --unsafe-no-fsync still write out the data
2021-03-07 11:52:33 +01:00
Chris Wedgwood
94634fc258 etcdserver: when using --unsafe-no-fsync write data
There are situations where we don't wish to fsync but we do want to
write the data.

Typically this occurs in clusters where fsync latency (often the
result of firmware) transiently spikes.  For Kubernetes clusters this
causes (many) elections which have knock-on effects such that the API
server will transiently fail causing other components fail in turn.

By writing the data (buffered and asynchronously flushed, so in most
situations the write is fast) and avoiding the fsync we no longer
trigger this situation and opportunistically write out the data.

Anecdotally:
  Because the fsync is missing there is the argument that certain
  types of failure events will cause data corruption or loss, in
  testing this wasn't seen.  If this was to occur the expectation is
  the member can be readded to a cluster or worst-case restored from a
  robust persisted snapshot.

  The etcd members are deployed across isolated racks with different
  power feeds.  An instantaneous failure of all of them simultaneously
  is unlikely.

  Testing was usually of the form:
   * create (Kubernetes) etcd write-churn by creating replicasets of
     some 1000s of pods
   * break/fail the leader

  Failure testing included:
   * hard node power-off events
   * disk removal
   * orderly reboots/shutdown

  In all cases when the node recovered it was able to rejoin the
  cluster and synchronize.
2021-03-05 10:09:52 -08:00
Sam Batschelet
afd6d8a40d
Merge pull request #12740 from hexfusion/cp-12448--release-3.4
Manual cherry pick of #12448 on release 3.4
2021-03-03 13:37:20 -05:00
Sam Batschelet
9aeabe447d server: Added config parameter experimental-warning-apply-duration
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2021-03-03 12:14:30 -05:00
Gyuho Lee
aa7126864d version: 3.4.15
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
v3.4.15
2021-02-26 22:08:24 +00:00
Gyuho Lee
3be9460ddc
Merge pull request #12679 from chaochn47/backport_3.4_#12677
[Backport-3.4] etcdserver/api/etcdhttp: log successful etcd server side health check in debug level
2021-02-09 15:01:19 -08:00
Chao Chen
f27ef4d343 [Backport-3.4] etcdserver/api/etcdhttp: log successful etcd server side health check in debug level
ref. #12677
ref. 0b9cfa8677
2021-02-08 21:44:44 -08:00
Piotr Tabor
a1c5f59b59
Merge pull request #12402 from vitalif/release-3.4
etcdserver: Fix 64 KB websocket notification message limit
2021-02-03 09:19:21 +01:00
Vitaliy Filippov
a40f14d92c etcdserver: Fix 64 KB websocket notification message limit
This fixes etcd being unable to send any message longer than 64 KB as
a notification over the websocket. This was because the older version
of grpc-websocket-proxy was used and WithMaxRespBodyBufferSize option
wasn't set.
2021-01-30 00:37:02 +03:00
Sam Batschelet
d51c6c689b
Merge pull request #12645 from hexfusion/bump-dep
vendor: bump gorilla/websocket
2021-01-23 13:49:45 -05:00
Sam Batschelet
becc228c5a vendor: bump gorilla/websocket
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2021-01-23 11:20:53 -05:00
Piotr Tabor
0880605772
Merge pull request #12551 from kolyshkin/3.4-fix-lock
[3.4 backport] pkg/fileutil: fix F_OFD_ constants
2021-01-15 23:16:49 +01:00
Kir Kolyshkin
bea35fd2c6 pkg/fileutil: fix F_OFD_ constants
Use golang.org/x/sys/unix for F_OFD_* constants.

This fixes the issue that F_OFD_GETLK was defined incorrectly,
resulting in bugs such as https://github.com/moby/moby/issues/31182

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-12-14 10:42:13 -08:00
Gyuho Lee
8a03d2e961 version: 3.4.14
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
v3.4.14
2020-11-25 11:31:52 -08:00
Gyuho Lee
a4b43b388d pkg/netutil: remove unused "iptables" wrapper
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-11-25 11:31:17 -08:00
Gyuho Lee
e3b29b66a4 tools/etcd-dump-metrics: validate exec cmd args
To prevent arbitrary command invocations.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-11-25 11:30:31 -08:00
Gyuho Lee
eb0fb0e799
Merge pull request #12356 from cfc4n/automated-cherry-pick-of-#12264-upstream-release-3.4
Automated cherry pick of #12264
2020-10-12 14:26:50 -07:00
CFC4N
40b71074e8
clientv3: get AuthToken automatically when clientConn is ready.
fixes: #11954
2020-09-30 17:14:22 +08:00
Jingyi Hu
7e2d426ec0
Merge pull request #12299 from galal-hussein/fix_panic_34
[Backport 3.4] etcdserver: add ConfChangeAddLearnerNode to the list of config changes
2020-09-15 09:04:18 -07:00
galal-hussein
3019246742 etcdserver: add ConfChangeAddLearnerNode to the list of config changes
To fix a panic that happens when trying to get ids of etcd members in
force new cluster mode, the issue happen if the cluster previously had
etcd learner nodes added to the cluster

Fixes #12285
2020-09-14 17:50:57 +02:00
Joe Betz
dd1b699fc4
Merge pull request #12280 from jingyih/automated-cherry-pick-of-#12271-upstream-release-3.4
Automated cherry pick of #12271 on release 3.4
2020-09-10 11:07:54 -07:00
jingyih
f44aaf8248 integration: add flag WatchProgressNotifyInterval in integration test 2020-09-09 12:39:42 -07:00
Gyuho Lee
ae9734ed27 version: 3.4.13
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
v3.4.13
2020-08-24 12:11:28 -07:00
Gyuho Lee
781bde75e2
Merge pull request #12250 from spzala/automated-cherry-pick-of-#12242-upstream-release-3.4
Automated cherry pick of #12242
2020-08-24 12:05:03 -07:00
Sahdev P. Zala
d5ebbbceb8 pkg: file stat warning
Provide warning and doc instead of enforcing file permission.
2020-08-24 11:21:29 -04:00
Sam Batschelet
7cd5872656
Merge pull request #12244 from hexfusion/automated-cherry-pick-of-#12243-upstream-release-3.4
Automated cherry pick of #12243 on release 3.4
2020-08-21 11:24:21 -04:00
Sam Batschelet
46a0a44f95 Automated cherry pick of #12243 on release 3.4
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2020-08-21 10:14:07 -04:00
Gyuho Lee
17cef6e3e9 version: 3.4.12
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
v3.4.12
2020-08-19 09:56:24 -07:00
Gyuho Lee
c07cba001b
Merge pull request #12239 from liggitt/slow-v2-panic-3.4
[3.4] etcdserver: Avoid panics logging slow v2 requests in integration tests
2020-08-19 09:55:08 -07:00
Jordan Liggitt
b8878eac45 etcdserver: Avoid panics logging slow v2 requests in integration tests 2020-08-19 11:30:39 -04:00
Gyuho Lee
e71e0c5c88
Merge pull request #12226 from jingyih/fix_backport_PR12216
*: add plog logging to the backport of PR12216
v3.4.11
2020-08-18 08:48:09 -07:00
Gyuho Lee
bc44e367c3 version: 3.4.11
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-18 08:46:13 -07:00
Gyuho Lee
299e0f17aa Revert "etcdserver/api/v3rpc: "MemberList" never return non-empty ClientURLs"
This reverts commit 0372cfc7ab1052ac616ca34551f83657a8fd2e3e.
2020-08-18 08:45:38 -07:00
jingyih
75d5e78d1f *: fix backport of PR12216
Fix bugs introduced in commit c60dabf
2020-08-16 15:01:18 +08:00
jingyih
c60dabf2f3 *: add experimental flag for watch notify interval
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-15 10:24:25 -07:00
Gyuho Lee
8a4afdbcc2
Merge pull request #12189 from jingyih/automated-cherry-pick-of-#11452-#12187-upstream-release-3.4
Automated cherry pick of #11452 #12187 on release 3.4
2020-08-13 21:38:05 -07:00
jingyih
6fcab5af9f clientv3: remove excessive watch cancel logging 2020-08-13 13:51:21 +08:00
Gyuho Lee
008074187c etcdserver: add OS level FD metrics
Similar counts are exposed via Prometheus.
This adds the one that are perceived by etcd server.

e.g.

os_fd_limit 120000
os_fd_used 14
process_cpu_seconds_total 0.31
process_max_fds 120000
process_open_fds 17

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-12 18:38:35 -07:00
Gyuho Lee
cf558ee8b7 pkg/runtime: optimize FDUsage by removing sort
No need sort when we just want the counts.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-12 18:38:17 -07:00
Ted Yu
e800c62eca clientv3: log warning in case of error sending request 2020-07-30 22:54:35 +08:00
Gyuho Lee
0372cfc7ab etcdserver/api/v3rpc: "MemberList" never return non-empty ClientURLs
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>

cr https://code.amazon.com/reviews/CR-29712724
2020-07-16 16:29:51 -07:00
Gyuho Lee
18dfb9cca3 version: 3.4.10
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
v3.4.10
2020-07-16 15:16:20 -07:00
Jingyi Hu
7b8270416d
Merge pull request #12106 from bart0sh/PR001-cherry-pick-change-protobuf-field-type-from-int-to-int64
etcdserver: change protobuf field type from int to int64 (#12000)
2020-07-16 00:45:16 +08:00
Sahdev Zala
a2c37485dd
Merge pull request #12127 from spzala/automated-cherry-pick-of-#12012-upstream-release-3.4
Automated cherry pick of #12012
2020-07-13 10:53:52 -04:00
Hitoshi Mitake
67bfc310f0 Documentation: note on data encryption 2020-07-13 09:50:30 -04:00
Yuchen Zhou
ed28c768a3 etcdserver: change protobuf field type from int to int64 (#12000) 2020-07-08 10:21:10 +03:00
Gyuho Lee
d3a702a09d
Merge pull request #12112 from spzala/automated-cherry-pick-of-#12018-upstream-release-3.4
Automated cherry pick of #12018
2020-07-07 10:32:18 -07:00
Sahdev P. Zala
319331192e pkg: consider umask when use MkdirAll
os.MkdirAll creates directory before umask so make sure that a desired
permission is set after creating a directory with MkdirAll. Use the
existing TouchDirAll function which checks for permission if dir is already
exist and when create a new dir.
2020-07-07 11:46:31 -04:00
Gyuho Lee
2acdf88406
Merge pull request #12076 from cfc4n/automated-cherry-pick-of-#11987-upstream-release-3.4
Automated cherry pick of #11987
2020-07-06 13:01:24 -07:00