What is the process performed by Rancher v2.x when upgrading a Rancher managed Kubernetes cluster?

This document (000020193) is provided subject to the disclaimer at the end of this document.

Situation

Question

What is the process performed by Rancher v2.x when upgrading a Rancher managed Kubernetes cluster?

Pre-requisites

  • Running Rancher v2.0.x - v2.3.x. Note that the Kubernetes upgrade process changes in v2.4.x; see Further Reading below.

OR

  • RKE CLI v0.2.x+

Answer

Rancher, either through the UI or API, can be used to upgrade a Kubernetes cluster that was provisioned using the "Custom" option or on cloud infrastructure such as AWS EC2 or Azure. This can be accomplished by editing the cluster and selecting the desired Kubernetes version. Clusters provisioned with the RKE CLI can also be upgraded by editing the kubernetes_version key in the cluster YAML file. This will trigger an update of all the Kubernetes components in the order listed below:
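For a cluster managed through the Rancher UI or API, the version change is applied by editing the cluster and saving it. For an RKE CLI provisioned cluster, the change is made in the cluster configuration file and applied with rke up. Below is a minimal sketch of the relevant part of cluster.yml; the node address and the version string are examples only, and the version strings actually available depend on your RKE release:

    # cluster.yml (excerpt) - node definitions and most settings omitted.
    nodes:
      - address: 203.0.113.10            # example node, for illustration only
        user: rancher
        role: [controlplane, etcd, worker]

    # Changing this value and re-running rke up triggers the rolling
    # upgrade described below. The version shown is an example only.
    kubernetes_version: "v1.17.17-rancher1-1"

Running rke up --config cluster.yml after saving the change performs the upgrade in the order described in the following sections.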

Etcd plane

Each etcd container is updated, one node at a time. If the etcd version has not changed between versions of Kubernetes, no action is taken. The process consists of:

  1. Downloading etcd image
  2. Stopping and renaming old etcd container (backend datastore is preserved on host)
  3. Creating and starting new etcd container
  4. Running etcd health check
  5. Removing old etcd container

For RKE CLI provisioned clusters, the etcd-rolling-snapshot container is also upgraded if a new version is available.
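The recurring snapshots taken by this container are configured through the etcd service section of cluster.yml. A minimal sketch, assuming the recurring snapshot options available since RKE v0.2.x; the interval and retention values are examples only:

    services:
      etcd:
        snapshot: true     # runs the rolling snapshot sidecar container
        creation: 6h       # example: take a snapshot every 6 hours
        retention: 24h     # example: keep 24 hours worth of snapshots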

Control plane

Every Kubernetes update will require the control plane components to be updated. All control plane nodes are updated in parallel. The process consists of:

  1. Downloading hyperkube image, which is used by all control plane components (see the example after this list).
  2. Stopping and renaming old kube-apiserver container
  3. Creating and starting new kube-apiserver container
  4. Running kube-apiserver health check
  5. Removing old kube-apiserver container
  6. Stopping and renaming old kube-controller-manager container
  7. Creating and starting new kube-controller-manager container
  8. Running kube-controller-manager health check
  9. Removing old kube-controller-manager container
  10. Stopping and renaming old kube-scheduler container
  11. Creating and starting new kube-scheduler container
  12. Running kube-scheduler health check
  13. Removing old kube-scheduler container
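The hyperkube image used for these components is normally resolved automatically from the requested Kubernetes version. When images are pinned explicitly in cluster.yml, they appear under the system_images section; a minimal sketch, where the image tags are examples only:

    # cluster.yml (excerpt) - generated automatically when kubernetes_version
    # is set; the tags below are examples only.
    system_images:
      etcd: rancher/coreos-etcd:v3.4.3-rancher1
      kubernetes: rancher/hyperkube:v1.17.17-rancher1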

Worker plane

Every Kubernetes update will require the worker components to be updated. These components run on all nodes, including the control plane and etcd nodes. Nodes are updated in parallel. The process consists of:

  1. Downloading hyperkube image (if not already present)
  2. Stopping and renaming old kubelet container
  3. Creating and starting new kubelet container
  4. Running kubelet health check
  5. Removing old kubelet container
  6. Stopping and renaming old kube-proxy container
  7. Creating and starting new kube-proxy container
  8. Running kube-proxy health check
  9. Removing old kube-proxy container

Addons & user workloads

Once the etcd, control plane, and worker components have been updated, the latest manifests for the cluster addons are applied. These include, but are not limited to, KubeDNS/CoreDNS, the Nginx Ingress controller, Metrics Server, and the CNI plugin (Calico, Weave, Flannel, or Canal). Depending on the manifest deltas and the upgrade strategy defined in each manifest, pods and their corresponding containers may or may not be removed and recreated. Please be aware that some of these addons are critical for your cluster to operate correctly, and you may experience brief outages if these workloads are restarted. For example:

  • When KubeDNS/CoreDNS is restarted, you could have issues resolving hostnames to IP addresses.
  • When the Nginx Ingress controller is restarted, layer 7 HTTP/HTTPS traffic from outside your cluster to your workloads may be interrupted.
  • When the CNI plugin is restarted on a node, workloads running on that node may temporarily be unable to reach workloads on other nodes.

The best way to minimize outages or disruptions is to make sure your cluster has proper fault tolerance.

The kubelet automatically destroys and recreates all user workload pods when the pod spec hash value changes. This value changes for a pod if the Kubernetes upgrade involves any field changes in the pod manifest, such as a new field or the removal of a deprecated field. As a best practice, assume all of your pods and containers will be destroyed and recreated during a Kubernetes upgrade; this is more likely for major/minor releases and less likely for patch releases.
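One way to provide that fault tolerance is to run multiple replicas of critical workloads and spread them across nodes, so that a restart on any single node does not take the whole service down. A minimal sketch using a standard Deployment with preferred pod anti-affinity; the names and image are placeholders:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example-web                  # placeholder name
    spec:
      replicas: 3                        # tolerate the restart of any single replica
      selector:
        matchLabels:
          app: example-web
      template:
        metadata:
          labels:
            app: example-web
        spec:
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  podAffinityTerm:
                    labelSelector:
                      matchLabels:
                        app: example-web
                    topologyKey: kubernetes.io/hostname   # prefer one replica per node
          containers:
            - name: web
              image: nginx:1.19          # placeholder image
              ports:
                - containerPort: 80

Note that in the Rancher versions covered here worker nodes are upgraded in parallel, so spreading replicas reduces, but does not eliminate, the chance of a brief interruption.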

Further Reading

Upgrade refactor in v2.4: https://github.com/rancher/rancher/issues/23038

Kubeadm upgrades: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000020193
  • Creation Date: 14-Jul-2021
  • Modified Date: 14-Jul-2021
  • SUSE Rancher
