Suppose a command outputs one line and then exits with an error.
Three goroutines then compete for the lock:
1. ep.read()
2. ep.waitSaveExitErr()
3. ep.Expect()
If the ep.read goroutine has read the line but does not acquire the lock
in time, ep.waitSaveExitErr acquires the lock first and sets `exitErr`.
ep.Expect then acquires the lock, does not see the line yet, and returns
the error.
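The interleaving above can be replayed deterministically in a small
sketch. This is not the pkg/expect code: `proc`, `saveExitErr`,
`appendLine`, `expectOnce` and `replayRace` are hypothetical stand-ins,
and the three steps are called in the losing order by hand rather than
from real goroutines.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// proc is a hypothetical, stripped-down stand-in for ExpectProcess.
type proc struct {
	mu      sync.Mutex
	lines   []string
	exitErr error
}

// saveExitErr plays the role of ep.waitSaveExitErr.
func (p *proc) saveExitErr(err error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.exitErr = err
}

// appendLine plays the role of ep.read storing a line it has already read.
func (p *proc) appendLine(l string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.lines = append(p.lines, l)
}

// expectOnce plays the role of a single check inside ep.Expect.
func (p *proc) expectOnce(want string) (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, l := range p.lines {
		if l == want {
			return l, nil
		}
	}
	if p.exitErr != nil {
		return "", p.exitErr
	}
	return "", errors.New("no match yet")
}

// replayRace runs the three steps in the order that loses the line.
func replayRace() (string, error) {
	p := &proc{}
	pending := "hello\n" // read by ep.read, but not yet stored under the lock

	p.saveExitErr(errors.New("exit status 1")) // waitSaveExitErr wins the lock
	got, err := p.expectOnce(pending)          // Expect sees exitErr, no line
	p.appendLine(pending)                      // read stores the line too late
	return got, err
}

func main() {
	_, err := replayRace()
	fmt.Println("Expect returned:", err)
}
```

In this replay the line is stored only after the check has already
returned the exit error, which matches the failure described above.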
It is hard to reproduce locally, but adding an extra sleep reproduces it:
```diff
diff --git a/pkg/expect/expect.go b/pkg/expect/expect.go
index a512a3ce4..602bea73f 100644
--- a/pkg/expect/expect.go
+++ b/pkg/expect/expect.go
@@ -128,6 +128,7 @@ func (ep *ExpectProcess) tryReadNextLine(r *bufio.Reader) error {
printDebugLines := os.Getenv("EXPECT_DEBUG") != ""
l, err := r.ReadString('\n')
+ time.Sleep(10 * time.Millisecond)
ep.mu.Lock()
defer ep.mu.Unlock()
```
This was seen once in GitHub Actions [1]. To fix it, this patch introduces
`readCloseCh` to wait until ep.read has read all the data and then retry.
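A minimal sketch of the wait-then-retry idea follows. It assumes a
channel that the reader closes once it has drained the output. Only the
name `readCloseCh` comes from this patch; `proc`, `find`, `expect` and
`runFix` are hypothetical, and the real change in pkg/expect may be
structured differently.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
	"sync"
	"time"
)

// proc is a hypothetical stand-in for ExpectProcess.
type proc struct {
	mu          sync.Mutex
	lines       []string
	exitErr     error
	readCloseCh chan struct{} // closed once the reader has drained the output
}

// read stores every line under the lock and closes readCloseCh at EOF.
func (p *proc) read(r io.Reader) {
	defer close(p.readCloseCh)
	br := bufio.NewReader(r)
	for {
		l, err := br.ReadString('\n')
		time.Sleep(10 * time.Millisecond) // widen the race window on purpose
		p.mu.Lock()
		if l != "" {
			p.lines = append(p.lines, l)
		}
		p.mu.Unlock()
		if err != nil {
			return
		}
	}
}

// find returns the first line containing want, or the saved exit error.
func (p *proc) find(want string) (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, l := range p.lines {
		if strings.Contains(l, want) {
			return l, nil
		}
	}
	return "", p.exitErr
}

// expect retries once after the reader has finished before giving up.
func (p *proc) expect(want string) (string, error) {
	for {
		l, err := p.find(want)
		if l != "" {
			return l, nil
		}
		if err == nil {
			time.Sleep(time.Millisecond) // process still running, poll again
			continue
		}
		<-p.readCloseCh // process exited: wait until all output is stored
		if l, _ := p.find(want); l != "" {
			return l, nil
		}
		return "", err
	}
}

// runFix records the exit error before the reader can store the line.
func runFix() (string, error) {
	p := &proc{readCloseCh: make(chan struct{})}
	go p.read(strings.NewReader("hello\n"))
	p.mu.Lock()
	p.exitErr = errors.New("exit status 1")
	p.mu.Unlock()
	return p.expect("hello")
}

func main() {
	l, err := runFix()
	fmt.Printf("%q %v\n", l, err)
}
```

Closing the channel rather than sending on it lets any number of waiters
proceed, and the close happens only after the last line has been stored,
so the retry sees the line even though the exit error was recorded first.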
[1]: https://github.com/etcd-io/etcd/pull/16137#issuecomment-1605838518
Signed-off-by: Wei Fu <fuweid89@gmail.com>