contrib/systemd: remove v2 service files
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
This commit is contained in:
parent 0ef9ef3c74
commit c0f6597c02
@ -1,4 +0,0 @@
rclone.conf
bin
etcd2-backup.tgz
*~
@ -1,9 +0,0 @@
[Service]
Environment="ETCD_RESTORE_MASTER_ADV_PEER_URLS=http://172.17.4.51:2379"
Environment="RCLONE_ENDPOINT=s3-chom-testing-backups:chom-testing-backups/mytest"
Environment="RCLONE_CONFIG_PATH=/etc/rclone.conf"
Environment="ETCD_DATA_DIR=/var/lib/etcd2"
Environment="ETCD_BACKUP_DIR=/var/lib/etcd2-backup"
Environment="ETCD_RESTORE_DIR=/var/lib/etcd2-restore"
Environment="RCLONE_CHECKSUM=true"
@ -1,251 +0,0 @@
# etcd2-backup-coreos

Remote backup and multi-node restore services for etcd2 clusters on CoreOS Linux.

**Warning:** This package is only intended for use on CoreOS Linux.

## Terminology

**Founding member**: the node that is the first member of the newly recovered cluster. Only this node's rclone backup data is used to restore the cluster; the remaining nodes join the cluster with no data and simply catch up with the **founding member**.

## Configuration

Before installing etcd2-backup, configure `30-etcd2-backup-restore.conf`:

```
[Service]
Environment="ETCD_RESTORE_MASTER_ADV_PEER_URLS=<http://host:port>"
Environment="RCLONE_ENDPOINT=remote-name:path/to/backups"
```

Assuming a deployment to CoreOS with etcd2, only change:

* `ETCD_RESTORE_MASTER_ADV_PEER_URLS`

    The advertised peer URL of the etcd2 node that will be the founding member of the restored cluster. We will call this node the **founding member**.

* `RCLONE_ENDPOINT`

    The rclone endpoint to which backups will be stored.

    Any number of machines may point at the same `RCLONE_ENDPOINT`, path and all. Backups for each machine are stored in a sub-folder named with the machine ID (`%m` in systemd parlance).

* `./rclone.conf`

    The rclone configuration file which will be installed. It must list a `[section]` whose name matches `RCLONE_ENDPOINT`'s remote-name component.

    An easy way to generate this config file is to [install rclone](http://rclone.org/install/) on a local machine, then follow the [configuration instructions](http://rclone.org/docs/) to generate an `rclone.conf` file.

To adjust backup frequency, edit `./etcd2-backup.timer`.
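For example, a minimal sketch of `etcd2-backup.timer`'s `[Timer]` section with a five-minute interval instead of the default 30 seconds (the five-minute value is purely illustrative):

```
[Timer]
OnBootSec=1min
OnUnitActiveSec=5min
```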
## Installation

Once those things are configured, run `./build`.

The `build` script generates a tarball for copying to CoreOS instances. The tarball contains the `etcd2-backup-install` script.
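As a quick sanity check, the archive contents can be listed before copying it anywhere (the unit files, `etcd2-join`, `bin/etcd2-restore`, `rclone.conf`, and the install script should all appear):

```sh
tar tzf etcd2-backup.tgz
```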
After extracting the contents of the tar file and running the install script, three new systemd services are added. One service, `etcd2-backup`, performs periodic etcd backups, while the other two services, `etcd2-restore` and `etcd2-join`, handle restore procedures.

* `etcd2-backup.service`

    A oneshot service which calls `etcdctl backup` and syncs the backups to the rclone endpoint (using an rclone container, of course). `etcd2-backup.timer` is responsible for periodically running this service.

* `etcd2-restore.service`

    A oneshot service which wipes all etcd2 data and restores a single-node cluster from the rclone backup. This is for restoring the **founding member** only.

* `etcd2-join.service`

    A oneshot service which wipes all etcd2 data and re-joins the new cluster. This is for adding members **after** the **founding member** has successfully established the new cluster via `etcd2-restore.service`.
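Once installed, the units can be inspected with systemd's standard tooling, for example:

```sh
# Confirm the units were installed and inspect their merged definitions
systemctl list-unit-files 'etcd2-*'
systemctl cat etcd2-backup.service
```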
## Recovery

This assumes that the cluster has lost quorum and is not recoverable. Otherwise, try to heal the cluster first.

### Backup Freshness

Two factors contribute to the relative freshness or staleness of a backup. The `etcd2-backup.timer` takes a backup every 30 seconds by default, and the etcd `snapshot-count` option controls how many transactions are committed between each write of the snapshot to permanent storage. Given those parameters, we can compute an upper bound on how stale a backup can be.

Assumptions:

* the transaction rate is a constant `1000 transactions / second`
* `etcd2-backup.timer` is configured for a 30 second interval
* `etcd2 snapshot-count=10000`

```
max-missed-seconds = (10000 transactions / (1000 transactions / second)) + 30 seconds = 40 seconds
```
### Recovery Procedure

1. Make sure `etcd2.service` and `etcd2-backup.timer` are stopped on all nodes in the cluster.

2. Restore the **founding member** by starting `etcd2-restore.service` and then, if successful, `etcd2.service`.

3. Restore the rest of the cluster **one at a time**. Start `etcd2-join.service`, and then, if successful, `etcd2.service`. Verify with `etcdctl cluster-health` that the expected set of nodes is present and healthy after each node joins.

4. Verify that the data is sane (enough). If so, kick off `etcd2-backup.timer` on all nodes and, hopefully, go back to bed.
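Expressed as commands, the procedure looks roughly like the sketch below (node names `e1`..`e3` are placeholders; the Example section later shows a complete run):

```sh
# 1. Stop etcd2 and the backup timer everywhere
for node in e{1..3}; do
    ssh core@$node "sudo systemctl stop etcd2.service etcd2-backup.timer"
done

# 2. Restore the founding member, then start etcd2 on it
ssh core@e1 "sudo systemctl start etcd2-restore.service && sudo systemctl start etcd2.service"

# 3. Re-join the remaining members one at a time, checking health after each
for node in e{2..3}; do
    ssh core@$node "sudo systemctl start etcd2-join.service && sudo systemctl start etcd2.service"
    ssh core@e1 "etcdctl cluster-health"
done

# 4. Once the data looks sane, turn periodic backups back on
for node in e{1..3}; do
    ssh core@$node "sudo systemctl start etcd2-backup.timer"
done
```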
## Retroactively change the founding member

To restore the cluster from any other node's backup data, the cluster's founding member must be changed.

Change the value of `ETCD_RESTORE_MASTER_ADV_PEER_URLS` in `30-etcd2-backup-restore.conf` to the advertised peer url of the new founding member. Repeat the install process above on all nodes in the cluster, then proceed with the [recovery procedure](README.md#recovery-procedure).
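A minimal sketch of that edit, assuming the new founding member advertises `http://172.17.4.52:2379` (i.e. `e2` from the example below):

```sh
# Rewrite the restore master URL in place, then rebuild the install tarball
sed -i 's|^Environment="ETCD_RESTORE_MASTER_ADV_PEER_URLS=.*|Environment="ETCD_RESTORE_MASTER_ADV_PEER_URLS=http://172.17.4.52:2379"|' \
    30-etcd2-backup-restore.conf
./build
```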
## Example

Let's pretend that we have an initial 3-node CoreOS cluster that we want to back up to S3.

| ETCD_NAME | ETCD_ADVERTISED_PEER_URL |
| ------------- |:-------------:|
| e1 | http://172.17.4.51:2379 |
| e2 | http://172.17.4.52:2379 |
| e3 | http://172.17.4.53:2379 |

In the event that the cluster fails, we want to restore from `e1`'s backup.

## Configuration

```
[Service]
Environment="ETCD_RESTORE_MASTER_ADV_PEER_URLS=http://172.17.4.51:2379"
Environment="RCLONE_ENDPOINT=s3-testing-conf:s3://etcd2-backup-bucket/backups"
```

The `./rclone.conf` file must contain a `[section]` matching `RCLONE_ENDPOINT`'s remote-name component.
```
[s3-testing-conf]
type = s3
access_key_id = xxxxxxxx
secret_access_key = xxxxxx
region = us-west-1
endpoint =
location_constraint =
```

## Installation

```sh
cd etcd2-backup
./build
scp etcd2-backup.tgz core@e1:~/
ssh core@e1

e1 $ mkdir -p ~/etcd2-backup
e1 $ mv etcd2-backup.tgz etcd2-backup/
e1 $ cd etcd2-backup
e1 $ tar zxvf etcd2-backup.tgz
e1 $ ./etcd2-backup-install

# Only do the following two commands if this node should generate backups
e1 $ sudo systemctl enable etcd2-backup.timer
e1 $ sudo systemctl start etcd2-backup.timer

e1 $ exit
```

Now `e1`'s etcd data will be backed up to `s3://etcd2-backup-bucket/backups/<e1-machine-id>/` according to the schedule described in `etcd2-backup.timer`.
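To confirm uploads are landing where expected, one option (assuming the AWS CLI is configured for this bucket) is:

```sh
# The machine ID that names e1's backup sub-folder
ssh core@e1 "cat /etc/machine-id"
# List what has been uploaded so far
aws s3 ls --recursive s3://etcd2-backup-bucket/backups/
```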
Repeat the process for `e2` and `e3`. If a node should not generate backups, skip enabling and starting `etcd2-backup.timer` on it.

## Restore the cluster

Let's assume that a mischievous friend decided it would be a good idea to corrupt the etcd2 data-dir on ALL of the nodes (`e1`, `e2`, `e3`). Simply restore the cluster from `e1`'s backup.

Here's how to recover:

```sh
# First, ENSURE etcd2 and etcd2-backup are not running on any nodes
for node in e{1..3};do
    ssh core@$node "sudo systemctl stop etcd2.service etcd2-backup.{timer,service}"
done

ssh core@e1 "sudo systemctl start etcd2-restore.service && sudo systemctl start etcd2.service"

for node in e{2..3};do
    ssh core@$node "sudo systemctl start etcd2-join.service && sudo systemctl start etcd2.service"
    sleep 10
done
```

After `e2` and `e3` finish catching up, the cluster should be back to normal.
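To confirm, run the same health check used in the recovery procedure from any node, for example:

```sh
ssh core@e1 "etcdctl cluster-health"
ssh core@e1 "etcdctl member list"
```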
## Migrate the cluster

The same friend who corrupted the etcd2 data-dirs decided to have some more fun. This time, the friend dumped coffee on the machines hosting `e1`, `e2` and `e3`. There is a horrible smell, and the machines are dead.

Luckily, there's a new 3-node etcd2 cluster ready to go, along with the S3 backup of `e1` from the old cluster.

The new cluster configuration looks like this. Assume that etcd2-backup is not installed. (If it is, make sure it's not running on any nodes.)

| ETCD_NAME | ETCD_ADVERTISED_PEER_URL |
| ------------- |:-------------:|
| q1 | http://172.17.8.201:2379 |
| q2 | http://172.17.8.202:2379 |
| q3 | http://172.17.8.203:2379 |

We will assume `q1` is the chosen founding member, though picking any node is fine.

## Migrate the remote backup

First, copy the backup from `e1`'s backup folder to `q1`'s backup folder. The S3 case looks like this:

```sh
# Make sure to remove q1's backup directory, if it exists already
aws s3 rm --recursive s3://etcd2-backup-bucket/backups/<q1-machine-id>
aws s3 cp --recursive s3://etcd2-backup-bucket/backups/<e1-machine-id> s3://etcd2-backup-bucket/backups/<q1-machine-id>
```

## Configure the New Cluster

```
[Service]
Environment="ETCD_RESTORE_MASTER_ADV_PEER_URLS=http://172.17.8.201:2379"
Environment="RCLONE_ENDPOINT=s3-testing-conf:s3://etcd2-backup-bucket/backups"
```

Since this is a new cluster, each new node will have a new `machine-id` and will not clobber the backups from the old cluster, even though `RCLONE_ENDPOINT` is the same for both the old `e` cluster and the new `q` cluster.
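The machine IDs that name the per-node backup folders can be read directly from each host (`%m` in the unit files resolves to the same value):

```sh
for node in q{1..3}; do
    ssh core@$node "cat /etc/machine-id"
done
```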
## Installation

We first want to install the configured etcd2-backup package on all nodes, but not start any services yet. Note that each `ssh` invocation runs in its own shell, so the remote steps are chained into a single command:

```sh
cd etcd2-backup
./build
for node in q{1..3};do
    scp etcd2-backup.tgz core@$node:~/
    ssh core@$node "mkdir -p ~/etcd2-backup && \
        mv ~/etcd2-backup.tgz ~/etcd2-backup/ && \
        cd ~/etcd2-backup && \
        tar zxvf etcd2-backup.tgz && \
        ./etcd2-backup-install"
done
```
## Migrate the Cluster

With `q1` as the founding member:

```sh
# First, make SURE etcd2 and etcd2-backup are not running on any nodes

for node in q{1..3};do
    ssh core@$node "sudo systemctl stop etcd2.service"
done

ssh core@q1 "sudo systemctl start etcd2-restore.service && sudo systemctl start etcd2.service"

for node in q{2..3};do
    ssh core@$node "sudo systemctl start etcd2-join.service && sudo systemctl start etcd2.service"
    sleep 10
done
```

After confirming the cluster has migrated properly, start and enable `etcd2-backup.timer` on at least one node.

```sh
ssh core@q1 "sudo systemctl enable etcd2-backup.timer && sudo systemctl start etcd2-backup.timer"
```

There should now be periodic backups going to `s3://etcd2-backup-bucket/backups/<q1-machine-id>`.
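To check that the timer is scheduled and that new objects are appearing (again assuming the AWS CLI is available), something like the following works:

```sh
ssh core@q1 "systemctl list-timers etcd2-backup.timer"
aws s3 ls --recursive s3://etcd2-backup-bucket/backups/
```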
## Words of caution

1. Notice the `sleep 10` commands that follow starting `etcd2-join.service` and then `etcd2.service`. This sleep is there to give the member that just joined the cluster time to catch up on the cluster state before we attempt to add the next member, which involves sending the entire snapshot over the network. If the dataset is large, the network between nodes is slow, disks are already bogged down, or the system is otherwise over-utilized, try increasing the sleep.

    In the case of large data sets, it is recommended to copy the data directory produced by `etcd2-restore` on the founding member to the other nodes before running `etcd2-join` on them. This avoids etcd transferring the entire snapshot to every node after it joins the cluster.

2. It is not recommended that clients be allowed to access the etcd2 cluster **until** all members have been added and have finished catching up.
@ -1,25 +0,0 @@
#!/bin/bash -e

SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
cd "${SCRIPT_DIR}"

if [ ! -f "./rclone.conf" ];then
    echo "Could not find $(pwd)/rclone.conf"
    exit 1
fi

mkdir -p ./bin

GOPATH=$(pwd) go build -o ./bin/etcd2-restore etcd2-restore.go

tar cfz ./etcd2-backup.tgz \
    *.{service,timer,conf} \
    etcd2-join \
    bin/etcd2-restore \
    rclone.conf \
    etcd2-backup-install

printf "Install package saved at\n\t -> $(pwd)/etcd2-backup.tgz\n\n"

printf "Copy to target machine and deploy.\n $> tar zxvf etcd2-backup.tgz && ./etcd2-backup-install\n\n"
echo "WARNING: this tarball contains your rclone secrets. Be careful!"
@ -1,33 +0,0 @@
#!/bin/bash -e

if [ ! -f /etc/os-release ];then
    echo "Could not find /etc/os-release. This is not CoreOS Linux"
    exit 1
fi
. /etc/os-release
if [ ! "$ID" == "coreos" ];then
    echo "os-release error: Detected ID=$ID: this is not CoreOS Linux"
    exit 1
fi

SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
cd "${SCRIPT_DIR}"

sudo cp ./rclone.conf /etc/

sudo mkdir -p /opt/bin

sudo mv etcd2-join bin/etcd2-restore /opt/bin
sudo mv *.{service,timer} /etc/systemd/system

sudo systemctl daemon-reload

for jobtype in restore backup join;do
    sudo mkdir -p /var/run/systemd/system/etcd2-${jobtype}.service.d
    sudo cp 30-etcd2-backup-restore.conf /var/run/systemd/system/etcd2-${jobtype}.service.d/
    sudo ln -sf /var/run/systemd/system/etcd2{,-${jobtype}}.service.d/20-cloudinit.conf
done

sudo systemctl daemon-reload

echo "etcd2-backup install complete!"
@ -1,29 +0,0 @@
[Unit]
Description=rclone powered etcd2 backup service
After=etcd2.service

[Service]
Type=oneshot

ExecStartPre=/usr/bin/rm -rf ${ETCD_BACKUP_DIR}
ExecStartPre=/usr/bin/mkdir -p ${ETCD_BACKUP_DIR}/member/snap
ExecStartPre=/usr/bin/echo ETCD_DATA_DIR: ${ETCD_DATA_DIR}
ExecStartPre=/usr/bin/echo ETCD_BACKUP_DIR: ${ETCD_BACKUP_DIR}
ExecStartPre=/usr/bin/etcdctl backup --data-dir=${ETCD_DATA_DIR} --backup-dir=${ETCD_BACKUP_DIR}
ExecStartPre=/usr/bin/touch ${ETCD_BACKUP_DIR}/member/snap/iamhere.txt

# Copy the last backup, in case the new upload gets corrupted
ExecStartPre=-/usr/bin/docker run --rm \
    -v ${RCLONE_CONFIG_PATH}:/etc/rclone.conf \
    quay.io/coreos/rclone:latest --config /etc/rclone.conf --checksum=${RCLONE_CHECKSUM} \
    copy ${RCLONE_ENDPOINT}/%m ${RCLONE_ENDPOINT}/%m_backup

# Upload new backup
ExecStart=/usr/bin/docker run --rm \
    -v ${ETCD_BACKUP_DIR}:/etcd2backup \
    -v ${RCLONE_CONFIG_PATH}:/etc/rclone.conf \
    quay.io/coreos/rclone:latest --config ${RCLONE_CONFIG_PATH} --checksum=${RCLONE_CHECKSUM} \
    copy /etcd2backup/ ${RCLONE_ENDPOINT}/%m/

[Install]
WantedBy=multi-user.target
@ -1,9 +0,0 @@
[Unit]
Description=etcd2-backup service timer

[Timer]
OnBootSec=1min
OnUnitActiveSec=30sec

[Install]
WantedBy=timers.target
@ -1,39 +0,0 @@
#!/bin/bash -e

# Copyright 2015 CoreOS, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

if [ $# -lt 3 ];then
    echo "USAGE: $0 <master_advertise_peer_urls> <target_name> <target_peer_url>"
    exit 1
fi

# Convert `etcdctl member add` output into a systemd [Service] drop-in
function convertDropin {
    sed -e 's/^Added.*$/[Service]/g' -e 's/="/=/g' -e 's/^ETCD_/Environment="ETCD_/g'
}
masterAdvUrl=$1
targetName=$2
targetUrl=$3

cmd="etcdctl --peers ${masterAdvUrl} member add ${targetName} ${targetUrl}"

ENV_VARS=`$cmd`
echo "${ENV_VARS}" | convertDropin > 40-boostrap-cluster.conf

sudo mv 40-boostrap-cluster.conf /var/run/systemd/system/etcd2.service.d/
sudo systemctl daemon-reload
sudo systemctl cat etcd2.service
echo "You can now start etcd2"
@ -1,13 +0,0 @@
[Unit]
Description=Add etcd2 node to existing cluster
Conflicts=etcd2.service etcd2-backup.service
Before=etcd2.service etcd2-backup.service

[Service]
Type=oneshot
ExecStartPre=/usr/bin/rm -rf ${ETCD_DATA_DIR}/member
ExecStartPre=/usr/bin/chown -R etcd:etcd ${ETCD_DATA_DIR}
ExecStart=/opt/bin/etcd2-join ${ETCD_RESTORE_MASTER_ADV_PEER_URLS} ${ETCD_NAME} ${ETCD_INITIAL_ADVERTISE_PEER_URLS}

[Install]
WantedBy=multi-user.target
@ -1,126 +0,0 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package main

import (
	"flag"
	"fmt"
	"os"
	"os/exec"
	"path"
	"regexp"
	"time"
)

var (
	etcdctlPath    string
	etcdPath       string
	etcdRestoreDir string
	etcdName       string
	etcdPeerUrls   string
)

func main() {
	flag.StringVar(&etcdctlPath, "etcdctl-path", "/usr/bin/etcdctl", "absolute path to etcdctl executable")
	flag.StringVar(&etcdPath, "etcd-path", "/usr/bin/etcd2", "absolute path to etcd2 executable")
	flag.StringVar(&etcdRestoreDir, "etcd-restore-dir", "/var/lib/etcd2-restore", "absolute path to etcd2 restore dir")
	flag.StringVar(&etcdName, "etcd-name", "default", "name of etcd2 node")
	flag.StringVar(&etcdPeerUrls, "etcd-peer-urls", "", "advertise peer urls")

	flag.Parse()

	if etcdPeerUrls == "" {
		panic("must set -etcd-peer-urls")
	}

	if finfo, err := os.Stat(etcdRestoreDir); err != nil {
		panic(err)
	} else {
		if !finfo.IsDir() {
			panic(fmt.Errorf("%s is not a directory", etcdRestoreDir))
		}
	}

	if !path.IsAbs(etcdctlPath) {
		panic(fmt.Sprintf("etcdctl-path %s is not absolute", etcdctlPath))
	}

	if !path.IsAbs(etcdPath) {
		panic(fmt.Sprintf("etcd-path %s is not absolute", etcdPath))
	}

	if err := restoreEtcd(); err != nil {
		panic(err)
	}
}

func restoreEtcd() error {
	etcdCmd := exec.Command(etcdPath, "--force-new-cluster", "--data-dir", etcdRestoreDir)

	etcdCmd.Stdout = os.Stdout
	etcdCmd.Stderr = os.Stderr

	if err := etcdCmd.Start(); err != nil {
		return fmt.Errorf("Could not start etcd2: %s", err)
	}
	defer etcdCmd.Wait()
	defer etcdCmd.Process.Kill()

	return runCommands(10, 2*time.Second)
}

var (
	clusterHealthRegex = regexp.MustCompile(".*cluster is healthy.*")
	lineSplit          = regexp.MustCompile("\n+")
	colonSplit         = regexp.MustCompile(`\:`)
)

func runCommands(maxRetry int, interval time.Duration) error {
	var retryCnt int
	for retryCnt = 1; retryCnt <= maxRetry; retryCnt++ {
		out, err := exec.Command(etcdctlPath, "cluster-health").CombinedOutput()
		if err == nil && clusterHealthRegex.Match(out) {
			break
		}
		fmt.Printf("Error: %s: %s\n", err, string(out))
		time.Sleep(interval)
	}

	if retryCnt > maxRetry {
		return fmt.Errorf("Timed out waiting for healthy cluster\n")
	}

	var (
		memberID string
		out      []byte
		err      error
	)
	if out, err = exec.Command(etcdctlPath, "member", "list").CombinedOutput(); err != nil {
		return fmt.Errorf("Error calling member list: %s", err)
	}
	members := lineSplit.Split(string(out), 2)
	if len(members) < 1 {
		return fmt.Errorf("Could not find a cluster member from: \"%s\"", members)
	}
	parts := colonSplit.Split(members[0], 2)
	if len(parts) < 2 {
		return fmt.Errorf("Could not parse member id from: \"%s\"", members[0])
	}
	memberID = parts[0]

	out, err = exec.Command(etcdctlPath, "member", "update", memberID, etcdPeerUrls).CombinedOutput()
	fmt.Printf("member update result: %s\n", string(out))
	return err
}
@ -1,26 +0,0 @@
[Unit]
Description=Restore single-node etcd2 node from rclone endpoint
Conflicts=etcd2.service etcd2-backup.service
Before=etcd2.service etcd2-backup.service

[Service]
Type=oneshot
ExecStartPre=/usr/bin/rm -rf ${ETCD_DATA_DIR}/member
ExecStartPre=/usr/bin/mkdir -p ${ETCD_RESTORE_DIR}
ExecStartPre=/usr/bin/rm -rf ${ETCD_RESTORE_DIR}/member

# Copy the last backup from rclone endpoint
ExecStartPre=/usr/bin/docker run --rm \
    -v ${RCLONE_CONFIG_PATH}:/etc/rclone.conf \
    -v ${ETCD_RESTORE_DIR}:/etcd2backup \
    quay.io/coreos/rclone:latest \
    --config /etc/rclone.conf --checksum=${RCLONE_CHECKSUM} \
    copy ${RCLONE_ENDPOINT}/%m /etcd2backup

ExecStartPre=/usr/bin/ls -R ${ETCD_RESTORE_DIR}
ExecStartPre=/opt/bin/etcd2-restore -etcd-name ${ETCD_NAME} -etcd-peer-urls ${ETCD_INITIAL_ADVERTISE_PEER_URLS}
ExecStartPre=/usr/bin/cp -r ${ETCD_RESTORE_DIR}/member ${ETCD_DATA_DIR}/member
ExecStart=/usr/bin/chown -R etcd:etcd ${ETCD_DATA_DIR}/member

[Install]
WantedBy=multi-user.target