142 Commits

Author SHA1 Message Date
Benjamin Wang
109873dcb6 etctserver: add failpoints walBeforeSync and walAfterSync
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-02-09 07:06:46 +08:00
Chris Wedgwood
94634fc258 etcdserver: when using --unsafe-no-fsync write data
There are situations where we don't wish to fsync but we do want to
write the data.

Typically this occurs in clusters where fsync latency (often the
result of firmware) transiently spikes.  For Kubernetes clusters this
causes (many) elections which have knock-on effects such that the API
server will transiently fail causing other components fail in turn.

By writing the data (buffered and asynchronously flushed, so in most
situations the write is fast) and avoiding the fsync we no longer
trigger this situation and opportunistically write out the data.

Anecdotally:
  Because the fsync is missing there is the argument that certain
  types of failure events will cause data corruption or loss, in
  testing this wasn't seen.  If this was to occur the expectation is
  the member can be readded to a cluster or worst-case restored from a
  robust persisted snapshot.

  The etcd members are deployed across isolated racks with different
  power feeds.  An instantaneous failure of all of them simultaneously
  is unlikely.

  Testing was usually of the form:
   * create (Kubernetes) etcd write-churn by creating replicasets of
     some 1000s of pods
   * break/fail the leader

  Failure testing included:
   * hard node power-off events
   * disk removal
   * orderly reboots/shutdown

  In all cases when the node recovered it was able to rejoin the
  cluster and synchronize.
2021-03-05 10:09:52 -08:00
Gyuho Lee
4571e528f4 wal: check out of range slice in "ReadAll", "decoder"
wal: add slice bound checks in decoder

CHANGELOG-3.5: add wal slice bound check
CHANGELOG-3.5: add "decodeRecord"

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-06-22 11:57:43 -04:00
Sahdev P. Zala
7d1cf64049 wal: fix panic when decoder not set
Handle the related panic and clarify doc.
2020-06-21 16:21:34 -04:00
David Crawshaw
78f67988aa
etcdserver, et al: add --unsafe-no-fsync flag
This makes it possible to run an etcd node for testing and development
without placing lots of load on the file system.

Fixes #11930.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2020-06-04 20:19:28 -07:00
tangcong
ed231df7c0 wal: fix crc mismatch crash bug 2020-05-20 11:37:04 -07:00
Viacheslav Biriukov
87fc3c9e57 etcdserver,wal: fix inconsistencies in WAL and snapshot
etcdserver/*, wal/*: changes to snapshots and wal logic
etcdserver/*: changes to snapshots and wal logic to fix #10219
etcdserver/*, wal/*: add Sync method
etcdserver/*, wal/*: find valid snapshots by cross checking snap files and wal snap entries
etcdserver/*, wal/*:Add comments, clean up error messages and tests
etcdserver/*, wal/*: Remove orphaned .snap.db files during Release

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-05-15 08:40:09 -07:00
Gyuho Lee
e99399d0dc wal: add "etcd_wal_writes_bytes_total"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-04-01 09:30:09 -07:00
Gyuho Lee
34bd797e67 *: revert module import paths
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-05-28 15:39:35 -07:00
Joshua Coutinho
a0c889d14b wal: add a test for wal cleanup, improve comments
To add test coverage of wal cleanup.
2019-05-10 22:36:26 +01:00
jcoutin
f7f7e9c762 wal: Improve cleanup for robustness and debuggability
Rename wal with '.suffix.<timestamp>' instead of delete it and call cleanup when perr in a 'defer'ed statement.
2019-05-07 21:38:40 +01:00
Joshua Coutinho
51035bfd84 wal: cleanup wal directory if creation fails
delete <data-dir>/member/wal if any operation after the rename in
wal.Create fails to avoid reading an inconsistent WAL on restart.

Fixes #10688
2019-05-04 01:58:57 +01:00
shivaramr
9150bf52d6 go modules: Fix module path version to include version number 2019-04-26 15:29:50 -07:00
Shreyas Rao
914e5edb00 wal: include logger in WAL returned by openAtIndex
Signed-off-by: Shreyas Rao <shreyas.sriganesh.rao@sap.com>
2019-04-02 13:09:10 +05:30
shreyas-s-rao
3d6862fe0d wal: add Verify function to perform corruption check on wal contents
Signed-off-by: Shreyas Rao <shreyas.sriganesh.rao@sap.com>
2019-03-12 22:25:25 +05:30
Gyuho Lee
038fd844ac wal: update Go import paths to "go.etcd.io/etcd"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-28 17:47:55 -07:00
Gyuho Lee
b0b966c43c wal: document, clean up fsync histogram
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-23 14:03:28 -07:00
Gyuho Lee
e15ce28168 wal: add missing logs, improve pipeline test coverage
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-17 11:01:31 -07:00
Gyuho Lee
f3d9a85697 wal: add warnings on fsync, flock fail paths
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-03 14:01:06 -07:00
Gyuho Lee
fdbedacc83 wal: support structured logger
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-04-16 17:36:00 -07:00
lorneli
7c50c06fb8 wal: tiny refactor
a. add comment of reopening file in cut function.
b. add const frameSizeBytes in decoder.
c. return directly if locked files empty in ReleaseLockTo function.
2017-09-07 02:50:37 +08:00
Anthony Romano
fe1ddab714 wal: fall back to closing wal if locked dir rename fails
Detecting windows at compile time isn't enough since etcd might be
on linux but the fs is backed by windows.

Fixes: #8178
Fixes: #6984
2017-07-20 13:30:41 -07:00
Gyu-Ho Lee
aca2abd8fe *: use 'io.Seek*' for go1.7+
For https://github.com/coreos/etcd/issues/6174.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-03-20 15:15:24 -07:00
Tess Rinearson
39c733ebe7 wal: use path/filepath instead of path
Use the path/filepath package instead of the path package. The
path package assumes slash-separated paths, which doesn't work
on Windows. But path/filepath manipulates filename paths in a way
that's compatible across OSes.
2017-03-15 17:30:23 -07:00
Peter Mattis
ab03a42f06 raft: add Ready.MustSync
Add Ready.MustSync which indicates that the hard state and raft log
entries in a Ready message must be synchronously written to persistent
storage.
2017-02-13 15:13:21 -05:00
Anthony Romano
c26ebe3262 Merge pull request #6453 from vimalk78/wal-optimize-marshal-outside-lock
wal/wal.go: optimized WAL.SaveSnapshot to do Marshal outside the mutex lock
2016-10-03 11:50:11 -07:00
Gyu-Ho Lee
f5588526cc wal: set PageWriter offset in file encoder 2016-09-28 11:03:24 -07:00
Vimal Kumar
f4ec303d1b wal/wal.go: modified WAL.SaveSnapshot to do the Marshal before aquiring the mutex 2016-09-28 10:35:19 +05:30
Vimal Kumar
064411b51c wal/wal.go : improved coverage by testing WAL.Save which causes a WAL.cut to happen 2016-09-21 16:50:55 +05:30
Gyu-Ho Lee
ccb46d2024 wal: simplify dir.Close call 2016-09-09 09:23:55 +09:00
Anthony Romano
bd7107bd4b wal: fsync directory after wal file rename
Fixes #6368
2016-09-08 00:09:16 -07:00
Aaron Lehmann
af4f82228c wal: hold file lock while renaming WAL directory on non-Windows
Windows requires this lock to be released before the directory is
renamed. But on unix-like operating systems, releasing the lock and
trying to reacquire it immediately can be flaky if a process is forked
around the same time. The file descriptors are marked as close-on-exec
by the Go runtime, but there is a window between the fork and exec where
another process will be holding the lock.
2016-08-26 09:27:51 -07:00
Anthony Romano
f1ead43482 wal: zero out wal tail past its first zero record
Whenever the WAL is opened for writes, it should write zeroes to its tail
starting from the first zero record. Otherwise, if there are entries past
the first zero record due to a torn write, any new writes that overlap the
old entries will lead to a garbage record on the tail and cause a CRC
mismatch.
2016-08-25 14:24:46 -07:00
Aaron Lehmann
2b996b6038 wal: Export SegmentSizeBytes as a variable
In test situations, it's useful to create smaller than usual WAL files
to test rotation and to avoid the overhead of preallocation on old-style
filesystems that don't handle it efficiently. This commit changes
segmentSizeBytes to an exported variable so that tests can override it
from an init() function.
2016-08-09 15:38:30 -07:00
Anthony Romano
5991209c2d wal: release wal locks before renaming directory on init
Fixes #5852
2016-07-02 12:14:37 -07:00
Gyu-Ho Lee
6cfc03a5f9 wal: use CreateDirAll 2016-06-22 15:57:55 -07:00
Gyu-Ho Lee
b4aa4607cb wal: use bytes.Equal, other minor updates
- Replace reflect.Equal with bytes.Equal where possible
- Remove some TODOs
- Some minor simplifications
2016-06-13 01:33:53 -07:00
Gyu-Ho Lee
3243795522 wal: simplify boolean return 2016-06-11 10:36:52 -07:00
Gyu-Ho Lee
4570eddc2c wal: PrivateFileMode/DirMode as in pkg/fileutil
To make it consistent with pkg/fileutil
2016-06-10 15:20:57 -07:00
Anthony Romano
39eaa37dcf wal: warn if sync exceeds a second 2016-06-08 11:03:18 -07:00
Anthony Romano
5be39d2c84 wal: don't preallocate on old tail file
Code is only there to handle an edge case where the tail wasn't preallocated
already (e.g., via old etcd version or a crash). It also triggers tmpfs
corruption, so remove it.
2016-06-06 11:31:25 -07:00
Anthony Romano
71a9d6fc8b wal: don't warn when opening wal directory with stale tmp files 2016-05-31 06:25:23 -07:00
Gyu-Ho Lee
4a5befc2de wal: update LICENSE header 2016-05-12 20:50:04 -07:00
Anthony Romano
17391336af wal: atomically initialize wal directory
Fixes #5270
2016-05-11 16:50:17 -07:00
Xiang Li
0fb7cb8b00 *: add disk operation metrics for monitoring 2016-05-11 09:36:45 -07:00
Anthony Romano
bfe3a3d08e wal: fix tail corruption
On ReadAll, WAL seeks to the end of the last record in the tail. If the tail did not
end with preallocated space, the decoder would report 0 as the last offset and begin
writing at offset 0 of the tail.

Fixes #4903
2016-04-01 15:05:52 -07:00
Anthony Romano
bd832e5b0a *: migrate Godeps to vendor/ 2016-03-22 17:10:28 -07:00
Anthony Romano
0df732c052 wal: pre-create segment files
Pipeline file creation and allocation so it overlaps writes to the log.

Fixes #4773
2016-03-21 11:56:53 -07:00
Anthony Romano
24b806d2ee wal: preallocate WAL files with initial size equal to segment size
Avoids having to update file size metadata during fdatasync on common path.

Fixes #4755
2016-03-21 11:56:53 -07:00
Anthony Romano
aafe717f2f fileutil: support file extending preallocate 2016-03-21 09:42:30 -07:00