Address https://github.com/coreos/etcd/issues/6754.
In case there are network errors or unexpected EOF errors
in TimeToLive http requests to leader, we translate that into
ErrNoLeader, and expects the client to retry its request.
The cost of bcrypt password checking is quite high (almost 100ms on a
modern machine) so executing it in apply loop will be
problematic. This commit exclude the checking mechanism to the API
layer. The password checking is validated with the OCC like way
similar to the auth of serializable get.
This commit also removes a unit test of Authenticate RPC from
auth/store_test.go. It is because the RPC now accepts an auth request
unconditionally and delegates the checking functionality to
authStore.CheckPassword() (so a unit test for CheckPassword() is
added). The combination of the two functionalities can be tested by
e2e (e.g. TestCtlV3AuthWriteKey).
Fixes https://github.com/coreos/etcd/issues/6530
When running test suites for a client locally I'm getting spammed by log
lines such as:
etcdserver: apply entries took too long [14.226771ms for 1 entries]
The comments in #6278 mention there were future plans of increasing the
threshold for logging these warnings, but it hadn't been done yet.
If we promote the lessor before finish applying all
entries from the last term, we might incorrectly renew
the already revoked leases.
Here is an example:
- Term 1: revoke lease A accepted by raft
- Old leader failed, new election happened
- Term 2: promote
- Term 2: keep alive A succeed. A now has 10 seconds TTL
- Term 2: revoke lease A from Term 1 got committed and applied
- Term 2: the lease A with 10 seconds TTL is revoked
To solve this, the new leader MUST apply all entries from old term
before promote its lessor to start accept renew requests.
When the non Leader etcd server receives a LeaseTimeToLive on a nonexistent lease, it responds with a nil resp and a nil error The invoking function parses the nil resp and results a segmentation fault.
I fix the bug by making sure the lease not found error is returned so that the invoking function parses the the error message instead.
fix#6537
When 1000 leases expired at the same time, etcd takes more than 5 seconds to clean them. This means that even after the leases have expired, keys associated with leases are still accessible. I increase the deletion throughput by parallelizing leases deletion process.
All outstanding goroutines now go into the etcdserver waitgroup. goroutines are
shutdown with a "stopping" channel which is closed when the run() goroutine
shutsdown. The done channel will only close once the waitgroup is totally cleared.
Added support into the v2 API to fix an issue (6433) where if there is a semicolon
and fields after it the API would return an "invalid Content-type" message even
if the content type was actually correct
If a user upgrades etcd from 2.3.x to 3.0 and shutdown the
cluster immediately without triggering any new backend writes,
then the consistent index in backend would be zero.
The user cannot restart etcdserver due to today's strick index
match checking. We now have to lose this a bit for this case.