How to migrate etcd data directory to a dedicated filesystem?

This document (000020050) is provided subject to the disclaimer at the end of this document.

Situation

Task

When running large Rancher installations or large clusters, it may be necessary to reduce I/O contention on the disks that back etcd. By default, etcd data is stored in /var/lib/etcd, which usually resides on the root file system. Migrating the etcd data directory to a dedicated file system keeps etcd from sharing disk IOPS with other system components and can improve performance.

Pre-requisites

RKE cluster
Root access to all etcd nodes.
A new file system with at least 2 GB free; 8 GB or more is recommended. Please work with your systems team to create and mount the file system.
Etcd backups should be configured and verified.
Schedule at least an hour of downtime during your change management maintenance window.
It is highly recommended to pause/halt any new deployments and CI/CD jobs during this change window.

Resolution

Before making any changes, please take an etcd snapshot using one of the following:
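
As one possible way to take that snapshot, the sketch below wraps the RKE CLI's `rke etcd snapshot-save` command in a small helper. The helper name `take_snapshot`, the `pre-etcd-migration-` name prefix, and the default `cluster.yml` path are assumptions made for illustration; the sketch presumes the `rke` binary and the cluster configuration file are available on the workstation used to manage the cluster.

```shell
# Hypothetical helper: take a one-time etcd snapshot with a timestamped name.
# $1 = path to the RKE cluster configuration file (assumed default: cluster.yml)
take_snapshot() {
    config=${1:-cluster.yml}
    name="pre-etcd-migration-$(date +%Y%m%d-%H%M%S)"
    # Ask RKE for a one-time snapshot; stop if the command fails.
    rke etcd snapshot-save --config "$config" --name "$name" || return 1
    # Print the snapshot name so it can be recorded in the change ticket.
    echo "$name"
}
```

A typical call would be `take_snapshot cluster.yml`, run before touching any etcd node.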

For new clusters

For a new cluster, please see our installation documentation

NOTE: Please make sure you have a file system mounted to "/var/lib/etcd/" before creating the cluster.

For existing clusters

Option A - In-place migration

SSH into the first etcd node and become root.

Stop the etcd container and set its restart policy to "no" so that Docker does not restart it during the migration

docker update --restart=no etcd && docker stop etcd

Verify that etcd is stopped and that no files under the data directory are still open. The command below should print nothing.
```
lsof | grep '/var/lib/etcd/'
```
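As a complementary check, Docker itself can report whether the container is still running. The following is a minimal sketch, assuming the container is named `etcd` as in the commands above; the function name `etcd_container_stopped` is hypothetical.

```shell
# Hypothetical helper: succeed only if the etcd container is not running.
# Returns 0 when stopped, 1 when still running, 2 if docker inspect fails.
etcd_container_stopped() {
    running=$(docker inspect -f '{{.State.Running}}' etcd) || return 2
    [ "$running" = "false" ]
}
```

For example, `etcd_container_stopped && echo "etcd is stopped"` prints the message only once the container has actually exited.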
Move etcd data to a temporary location
```
mv /var/lib/etcd /var/lib/etcd_tmp
```
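Create a new file system and mount it at "/var/lib/etcd". Because the previous step moved the original directory, recreate the empty "/var/lib/etcd" directory first so that it can serve as the mount point. Please work with your systems team for this step.
Verify that the new file system is mounted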
```
df -H /var/lib/etcd
```
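Note that `df` reports the file system containing a path even when nothing is mounted there, so a failed mount would simply show the root file system. A scripted check can compare the "Mounted on" column with the directory itself. This is a minimal sketch; the function name `is_own_mountpoint` is hypothetical, and it assumes an absolute path whose mount point contains no spaces.

```shell
# Hypothetical helper: succeed only if DIR is itself a mount point.
# $1 = absolute directory path, e.g. /var/lib/etcd
is_own_mountpoint() {
    dir=${1%/}                 # drop a single trailing slash, if any
    [ -n "$dir" ] || dir=/     # "/" would otherwise become empty
    # With -P, df prints one line per file system; column 6 is "Mounted on".
    mounted_on=$(df -P "$dir" 2>/dev/null | awk 'NR == 2 { print $6 }')
    [ "$mounted_on" = "$dir" ]
}
```

Running `is_own_mountpoint /var/lib/etcd || echo "not mounted"` before copying data back guards against writing etcd data onto the root file system by mistake.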
Copy the etcd data from the temporary location to the new file system
```
rsync -av --progress /var/lib/etcd_tmp/ /var/lib/etcd/
```
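Before starting etcd again, it may be worth confirming that every file from the temporary location arrived intact. The sketch below is one way to do that with `find` and `cmp`; the function name `verify_copy` is hypothetical. It checks file contents only (not ownership or permissions) and ignores extra entries in the destination, such as a `lost+found` directory on a freshly created file system.

```shell
# Hypothetical helper: check that every regular file under SRC exists
# under DST with identical content. Runs in a subshell, so the "exit"
# below ends only this function, not the calling shell.
# $1 = source directory, $2 = destination directory
verify_copy() (
    src=${1%/}
    dst=${2%/}
    find "$src" -type f | while IFS= read -r file; do
        rel=${file#"$src"/}    # path relative to the source directory
        if ! cmp -s "$file" "$dst/$rel"; then
            echo "missing or different: $rel" >&2
            exit 1
        fi
    done
)
```

In this procedure the call would be `verify_copy /var/lib/etcd_tmp /var/lib/etcd`.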

Restart etcd

docker update --restart=always etcd && docker start etcd

Verify etcd health
```
docker exec -it etcd etcdctl member list
```
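etcd can take a short while to rejoin the cluster after a restart, so a scripted health check may need to retry. The following sketch polls `etcdctl endpoint health` until it succeeds. The function name `wait_for_etcd_healthy` and its default retry values are hypothetical, and it assumes `etcdctl` inside the container is already configured through environment variables; if it is not, endpoint and certificate flags would have to be added.

```shell
# Hypothetical helper: poll etcd health until it succeeds or retries run out.
# $1 = number of attempts (assumed default 30)
# $2 = seconds to wait between attempts (assumed default 10)
wait_for_etcd_healthy() {
    attempts=${1:-30}
    delay=${2:-10}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # No -it here: the check is meant to run non-interactively.
        if docker exec etcd etcdctl endpoint health >/dev/null 2>&1; then
            return 0
        fi
        # Wait before the next attempt, but not after the final one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

Calling `wait_for_etcd_healthy || echo "etcd did not become healthy"` on each node before moving on to the next one helps avoid taking down a second member while the first is still recovering.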
Repeat the process on each remaining etcd node, one node at a time, until all etcd nodes have been updated.
Once all nodes have been updated, clean up the temporary data.
```
rm -rf /var/lib/etcd_tmp/
```

Option B - Rolling replacement

Create a new node with the dedicated file system mounted at "/var/lib/etcd".
Join the new node to the existing cluster.
Wait for the cluster update to finish.
Verify etcd health
```
docker exec -it etcd etcdctl member list
```
Remove the old node from the cluster, following the documentation.
Repeat the process until all etcd nodes have been replaced.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.