Currently our sample systemd service file `contrib/systemd/etcd.service`
have startup/shutdown dependency as below:
[Unit]
After=network.target
For some rare condition, e.g. bare matel deployment with slow network
startup, IP could not be assigned e arly enough before etcd default
`ETCD_HEARTBEAT_INTERVAL="100"` and `ETCD_ELECTION_TIMEOUT="1000"` get
timeouted, after graceful system reboot.
This cause etcd false negative classify itself use unhealthy, therefore
stop rejoining the remaining online cluster members.
This PR introduce:
- `etcd.service`: Ensure startup after `network-online.target` and
`time-sync.target`, so effective network connectivity and synced
time is available.
The logic is concept proof by
<https://github.com/alvistack/ansible-role-etcd/tree/develop>; also
works as expected with Ceph + Kubernetes deployment by
<https://github.com/alvistack/ansible-collection-kubernetes/tree/develop>.
No more deadlock happened during graceful system reboot, both AIO
single/multiple node with loopback mount.
Also see:
- <https://github.com/ceph/ceph/pull/36776>
- <https://github.com/etcd-io/etcd/pull/12259>
- <https://github.com/cri-o/cri-o/pull/4128>
- <https://github.com/kubernetes/release/pull/1504>
Signed-off-by: Wong Hoi Sing Edison <hswong3i@gmail.com>
From (systemd NetworkTarget description)[https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/]:
```
[...]since the shutdown ordering of units in systemd is the reverse of the startup ordering, any unit that is order After=network.target can be sure that it is stopped before the network is shut down if the system is powered off. This allows services to cleanly terminate connections before going down, instead of abruptly losing connectivity for ongoing connections, leaving them in an undefined state.[...]
```