mirror of
https://github.com/etcd-io/etcd.git
synced 2024-09-27 06:25:44 +00:00

The current membership-change functionality of etcd has a problem that can cause deadlock.

How to reproduce:
1. Construct an N-node cluster.
2. Add N new nodes with etcdctl member add, without starting the new members.

What happens:
After the N nodes are added, the cluster has 2 * N members and its quorum becomes N + 1. A membership change therefore requires at least N + 1 running nodes, because Raft treats membership information in its log like any other ordinary log append request.

Assume the peer URLs of the added nodes are wrong because of operator error or a bug in the wrapper program that launches etcd. In that case, neither adding nor removing members is possible, because quorum cannot be reached. Ordinary requests cannot be served either. The cluster is effectively deadlocked.

Of course, the best practice is to add one node at a time and start each node before adding the next. However, the effect of this problem is serious enough that I think forcibly preventing it is valuable.

Solution:
This patch lets etcd forbid adding a new node if the operation changes the quorum and the new quorum is larger than the number of running nodes. If etcd is launched with the newly added option -strict-reconfig-check, the checking logic is activated. If the option is not passed, the default reconfiguration behavior is kept.

Fixes https://github.com/coreos/etcd/issues/3477
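The rule described in the commit message can be illustrated with a small standalone sketch. This is not the code from the patch: the names quorum and canAddMember are hypothetical, and the sketch assumes that a newly added member counts toward the cluster size but not toward the started members until it is actually launched.

```go
package main

import "fmt"

// quorum returns the majority size for a cluster of n members.
func quorum(n int) int {
	return n/2 + 1
}

// canAddMember is a hypothetical helper that mirrors the rule in the
// commit message: adding a member is rejected when the addition changes
// the quorum and the new quorum exceeds the number of started members.
// total is the current member count; started is how many are running.
func canAddMember(total, started int) bool {
	oldQuorum := quorum(total)
	newQuorum := quorum(total + 1)
	if newQuorum == oldQuorum {
		// The quorum is unchanged, so the addition cannot make it unreachable.
		return true
	}
	return newQuorum <= started
}

func main() {
	// Start from a healthy 3-node cluster and keep adding members
	// without ever starting them, as in the reproduction steps.
	total, started := 3, 3
	for i := 1; i <= 3; i++ {
		if !canAddMember(total, started) {
			fmt.Printf("add #%d rejected: members=%d started=%d\n", i, total, started)
			break
		}
		total++
		fmt.Printf("add #%d accepted: members=%d quorum=%d started=%d\n",
			i, total, quorum(total), started)
	}
}
```

Under this rule the started members can always still form a quorum after an accepted addition, so a later member remove remains possible even if the added peer URLs turn out to be wrong.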
41 lines
1.7 KiB
Go
// Copyright 2015 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package etcdserver

import (
	"errors"

	etcdErr "github.com/coreos/etcd/error"
)

// Sentinel errors returned by the etcd server.
var (
	ErrUnknownMethod              = errors.New("etcdserver: unknown method")
	ErrStopped                    = errors.New("etcdserver: server stopped")
	ErrIDRemoved                  = errors.New("etcdserver: ID removed")
	ErrIDExists                   = errors.New("etcdserver: ID exists")
	ErrIDNotFound                 = errors.New("etcdserver: ID not found")
	ErrPeerURLexists              = errors.New("etcdserver: peerURL exists")
	ErrCanceled                   = errors.New("etcdserver: request cancelled")
	ErrTimeout                    = errors.New("etcdserver: request timed out")
	ErrTimeoutDueToLeaderFail     = errors.New("etcdserver: request timed out, possibly due to previous leader failure")
	ErrTimeoutDueToConnectionLost = errors.New("etcdserver: request timed out, possibly due to connection lost")
	ErrNotEnoughStartedMembers    = errors.New("etcdserver: re-configuration failed due to not enough started members")
)

// isKeyNotFound reports whether err is an etcd error whose code is
// EcodeKeyNotFound.
func isKeyNotFound(err error) bool {
	e, ok := err.(*etcdErr.Error)
	return ok && e.ErrorCode == etcdErr.EcodeKeyNotFound
}