The file `zap_raft.go` adds a raft.Logger proxy on top of `*zap.Logger`.
Because the proxy adds a stack frame, it must be created with the option `zap.AddCallerSkip(1)`
so that each log message reports the correct caller.
Two of the three constructors in `zap_raft.go` already add this option;
this commit fixes the third constructor so that it also adds `zap.AddCallerSkip`.
Before fix:
`{"level":"info","ts":"2021-07-22T17:46:01.435Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"bd07d29169ff0c5a [logterm: 2, index: 8, vote: 38447ba545569bbe] ignored MsgPreVote from c7baeaad79d6d5ed [logterm: 2, index: 8] at term 2: lease is not expired (remaining ticks: 10)"}`
After fix:
`{"level":"info","ts":"2021-07-22T17:46:51.227Z","logger":"raft","caller":"raft/raft.go:859","msg":"bd07d29169ff0c5a [logterm: 2, index: 8, vote: c7baeaad79d6d5ed] ignored MsgPreVote from 38447ba545569bbe [logterm: 2, index: 8] at term 2: lease is not expired (remaining ticks: 9)"}`
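To see why the skip matters, here is a minimal, dependency-free sketch in Go. It does not use zap: a toy `Logger` records its caller with `runtime.Callers`, and its `callerSkip` field plays the role of `zap.AddCallerSkip`. The names `Logger`, `raftProxy`, `newProxyWithoutSkip`, `newProxyWithSkip`, and `stepRaft` are hypothetical, and `//go:noinline` only keeps the frames stable for the demonstration.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// callerName returns the short name of the function `skip` frames above the
// function that calls callerName (skip=0 is that calling function itself).
//
//go:noinline
func callerName(skip int) string {
	pcs := make([]uintptr, 1)
	// +2 skips runtime.Callers and callerName.
	if runtime.Callers(skip+2, pcs) == 0 {
		return "unknown"
	}
	frame, _ := runtime.CallersFrames(pcs).Next()
	name := frame.Function // e.g. "main.stepRaft"
	if i := strings.LastIndex(name, "/"); i >= 0 {
		name = name[i+1:]
	}
	if i := strings.Index(name, "."); i >= 0 {
		name = name[i+1:] // drop the package qualifier
	}
	return name
}

// Logger is a toy structured logger; callerSkip plays the role of
// zap.AddCallerSkip (hypothetical type, not zap's API).
type Logger struct {
	callerSkip int
	Callers    []string // recorded "caller" field of each entry
	Msgs       []string
}

//go:noinline
func (l *Logger) Infof(format string, args ...any) {
	// 1 = the direct caller of Infof; callerSkip skips wrapper frames on top.
	l.Callers = append(l.Callers, callerName(1+l.callerSkip))
	l.Msgs = append(l.Msgs, fmt.Sprintf(format, args...))
}

// raftProxy forwards raft-style calls to Logger, adding one stack frame.
type raftProxy struct{ l *Logger }

//go:noinline
func (p raftProxy) Infof(format string, args ...any) { p.l.Infof(format, args...) }

// Two hypothetical constructors: only the second compensates for the proxy frame.
func newProxyWithoutSkip() raftProxy { return raftProxy{l: &Logger{}} }
func newProxyWithSkip() raftProxy    { return raftProxy{l: &Logger{callerSkip: 1}} }

// stepRaft stands in for the raft code that actually emits the log line.
//
//go:noinline
func stepRaft(p raftProxy) {
	p.Infof("ignored MsgPreVote from %s", "c7baeaad79d6d5ed")
}
```

Calling `stepRaft(newProxyWithoutSkip())` records `raftProxy.Infof` as the caller, the analogue of `etcdserver/zap_raft.go:77` above, while `stepRaft(newProxyWithSkip())` records `stepRaft`, the analogue of `raft/raft.go:859`.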
During the refactoring process, duplicate logging
of the send buffer overflow event was added.
Each of these log lines records exactly the same information, and the logging
context is sufficient to distinguish the cause, so the duplicate is redundant.
Additionally, the unnecessary parenthesized context was removed from the log
message: it was needed with the old logger, which had no zap context,
but now it only causes confusion.
If one of the nodes in the cluster has lost its DNS record,
restarting another node will break it.
This PR attempts to add a comparison that does not use a resolver,
which protects the cluster from DNS errors and does not break
the current URL comparison logic in the URLStringsEqual function.
You can read more in issue #7798.
Fixes #7798
We found a lease leak issue:
if a new member (added by member add) is recovered from a snapshot and then
becomes leader, the lease will never expire afterwards. The leader will
log the revoke failure caused by "invalid auth token", because the
token provider is not functional and drops all tokens generated
for the upper layer, which in this case is the lease revoking
routine.
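The failure mode can be modeled with a toy sketch. Everything in it (`tokenProvider`, `lessor`, `revokeExpired`) is hypothetical and far simpler than etcd's auth store and lessor: while the provider is not functional, every token request from the revoke routine fails with "invalid auth token", so the lease is never deleted.

```go
package main

import "errors"

// errInvalidAuthToken mirrors the "invalid auth token" failure from the text.
var errInvalidAuthToken = errors.New("invalid auth token")

// tokenProvider is a toy stand-in (hypothetical). While it is not functional,
// it refuses every token request from the upper layer.
type tokenProvider struct{ functional bool }

func (tp *tokenProvider) assign() (string, error) {
	if !tp.functional {
		return "", errInvalidAuthToken
	}
	return "simple-token", nil
}

// lessor holds leases until a revoke succeeds (hypothetical, simplified).
type lessor struct {
	tp       *tokenProvider
	leases   map[int64]bool
	failures []string
}

// revokeExpired plays the lease revoking routine: every revoke needs a token.
func (l *lessor) revokeExpired() {
	for id := range l.leases {
		if _, err := l.tp.assign(); err != nil {
			// The leader logs the failure, but the lease is kept: it leaks.
			l.failures = append(l.failures, "failed to revoke lease: "+err.Error())
			continue
		}
		delete(l.leases, id) // deleting during range is allowed in Go
	}
}
```

Calling `revokeExpired` repeatedly on a lessor whose provider is not functional only accumulates failure messages; with a functional provider, the same call revokes the lease.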
When run 100 times in a row, these tests flaked about 10-20% of the time. Based on
some experimentation, 10 keys were enough to ensure that a WAL snapshot is
created, and this prevented any flakes.