SUSE Support


How to shutdown a Kubernetes cluster (Rancher Kubernetes Engine (RKE) CLI provisioned or Rancher v2.x Custom clusters)

This document (000020031) is provided subject to the disclaimer at the end of this document.

Situation

Task

This article provides instructions for safely shutting down a Kubernetes cluster provisioned via the Rancher Kubernetes Engine (RKE) CLI or a Rancher v2.x provisioned Custom Cluster.

Requirements

  • A Kubernetes cluster launched with the RKE CLI or from Rancher 2.x as a Custom Cluster

Background

If you need to shut down the infrastructure running a Kubernetes cluster (for datacenter maintenance, migration, etc.), this guide provides the steps, in the proper order, to ensure a safe cluster shutdown. The command examples are for RKE-deployed clusters, but the order of operations and the overall process are similar for most Kubernetes distributions.

Please ensure you complete an etcd backup before continuing this process. A guide regarding the backup and restore process can be found here.

Solution

N.B. If you have nodes that share the worker, control plane, or etcd roles, postpone the docker stop and shutdown operations on those nodes until the containers for every role hosted on the node have been stopped.

 

Draining the nodes

Before stopping any containers, list the nodes to identify those to drain:
kubectl get nodes
Then drain each node:
kubectl drain <node name>
This safely evicts any pods from the node. Once all nodes are drained, proceed with the shutdown steps below.
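The draining step can be sketched as a small script. The node names and the DRY_RUN guard are assumptions added for illustration: with DRY_RUN=1 (the default here) each command is only printed, so the sequence can be reviewed before running it for real. The --ignore-daemonsets flag is usually needed because DaemonSet-managed pods cannot be evicted.

```shell
#!/usr/bin/env bash
# Minimal sketch of draining every node before shutdown (illustrative names).
DRY_RUN="${DRY_RUN:-1}"
run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

drain_nodes() {
  # List the nodes first, then drain each one in turn.
  run kubectl get nodes
  for node in "$@"; do
    run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  done
}

drain_nodes worker-1 worker-2
```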
 

Shutting down the worker nodes

For each worker node:

  1. ssh into the worker node
  2. stop kubelet and kube-proxy by running sudo docker stop kubelet kube-proxy
  3. stop docker by running sudo service docker stop or sudo systemctl stop docker
  4. shutdown the system sudo shutdown now
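The four worker-node steps can be batched into one loop. Host names are placeholders, and the DRY_RUN guard is an assumption added for safety: with DRY_RUN=1 (the default here) the commands are printed for review rather than executed.

```shell
#!/usr/bin/env bash
# Sketch: run the worker shutdown sequence over ssh for each worker host.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

shutdown_worker() {
  local host="$1"
  # Order matters: containers first, then Docker, then the OS.
  run ssh "$host" "sudo docker stop kubelet kube-proxy"
  run ssh "$host" "sudo systemctl stop docker"
  run ssh "$host" "sudo shutdown now"
}

for host in worker-1 worker-2; do
  shutdown_worker "$host"
done
```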
Shutting down the control plane nodes

For each control plane node:

  1. ssh into the control plane node
  2. stop kubelet and kube-proxy by running sudo docker stop kubelet kube-proxy
  3. stop kube-scheduler and kube-controller-manager by running sudo docker stop kube-scheduler kube-controller-manager
  4. stop kube-apiserver by running sudo docker stop kube-apiserver
  5. stop docker by running sudo service docker stop or sudo systemctl stop docker
  6. shutdown the system sudo shutdown now
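The same pattern applies to the control plane nodes, preserving the stop order of steps 2-4: kubelet and kube-proxy first, then the scheduler and controller-manager, then the API server last. The host name and DRY_RUN guard are illustrative assumptions; with DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: control plane shutdown sequence for one host over ssh.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

shutdown_controlplane() {
  local host="$1"
  run ssh "$host" "sudo docker stop kubelet kube-proxy"
  run ssh "$host" "sudo docker stop kube-scheduler kube-controller-manager"
  run ssh "$host" "sudo docker stop kube-apiserver"
  run ssh "$host" "sudo systemctl stop docker"
  run ssh "$host" "sudo shutdown now"
}

shutdown_controlplane controlplane-1
```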
Shutting down the etcd nodes

For each etcd node:

  1. ssh into the etcd node
  2. stop kubelet and kube-proxy by running sudo docker stop kubelet kube-proxy
  3. stop etcd by running sudo docker stop etcd
  4. stop docker by running sudo service docker stop or sudo systemctl stop docker
  5. shutdown the system sudo shutdown now
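For the etcd nodes, steps 2-5 can be batched into a single ssh command per host. Host names are placeholders; with DRY_RUN=1 (the default here) each command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch: etcd node shutdown, one combined ssh command per host.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

for host in etcd-1 etcd-2 etcd-3; do
  run ssh "$host" "sudo docker stop kubelet kube-proxy etcd && sudo systemctl stop docker && sudo shutdown now"
done
```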
Shutting down storage

Shut down any persistent storage devices in your datacenter (such as NAS devices), if applicable. It is important to do this only after everything else has been shut down, to prevent data loss or corruption for containers that require persistence.

N.B. If you are running a cluster that was not deployed through RKE, the order of operations is still the same, but the commands may vary. For instance, some distributions run the kubelet and other control plane components as services on the node rather than in Docker. Check the documentation for your specific Kubernetes distribution for information on how to stop these services.

Starting a Kubernetes cluster up after shutdown

Kubernetes recovers well from a full cluster shutdown and requires little intervention, though components should be powered back on in a specific order to minimize errors.

  1. Power on any storage devices if applicable.

    Check with your storage vendor on how to properly power on your storage devices and verify that they are ready.

  2. For each etcd node:

    1. Power on the system/start the instance.
    2. Log into the system via ssh.
    3. Ensure Docker has started: sudo service docker status or sudo systemctl status docker
    4. Ensure the etcd and kubelet containers show a status of Up in sudo docker ps
  3. For each control plane node:

    1. Power on the system/start the instance.
    2. Log into the system via ssh.
    3. Ensure Docker has started: sudo service docker status or sudo systemctl status docker
    4. Ensure the kube-apiserver, kube-scheduler, kube-controller-manager, and kubelet containers show a status of Up in sudo docker ps
  4. For each worker node:

    1. Power on the system/start the instance.
    2. Log into the system via ssh.
    3. Ensure Docker has started: sudo service docker status or sudo systemctl status docker
    4. Ensure the kubelet container shows a status of Up in sudo docker ps
  5. Log into the Rancher UI (or use kubectl) and check your various projects to ensure workloads have started as expected. This may take a few minutes depending on the number of workloads and your server capacity.
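Once the API server is reachable again, the node check in step 5 can be scripted. The helper below is a sketch under stated assumptions: it reads the output of kubectl get nodes --no-headers on stdin (NAME in the first column, STATUS in the second) and reports any node that is not Ready.

```shell
#!/usr/bin/env bash
# Reads `kubectl get nodes --no-headers` output on stdin and prints any
# node whose STATUS column is not "Ready"; returns non-zero if one is found.
# Usage once the cluster is back up:
#   kubectl get nodes --no-headers | check_ready
check_ready() {
  local rc=0 name status rest
  while read -r name status rest; do
    [ -z "$name" ] && continue
    if [ "$status" != "Ready" ]; then
      echo "node $name is $status"
      rc=1
    fi
  done
  return $rc
}
```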

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000020031
  • Creation Date: 27-Sep-2022
  • Modified Date: 27-Sep-2022
    • SUSE Rancher
