Best Practices: Rancher
This document (000020105) is provided subject to the disclaimer at the end of this document.
Situation
This article aims to provide a number of checks that can be evaluated to ensure best practices are in place when planning, building or preparing a Rancher 2.x and Kubernetes environment.
1. Architecture
1.1 Nodes
Understanding workload resource needs in downstream clusters up front can help in choosing an appropriate node configuration; nodes with different roles may need different configurations, but all nodes of the same role are generally configured identically.
Standardize on supported versions and ensure minimum requirements are met:
- Confirm the OS is covered in the supported versions
- Resource needs can vary based on cluster size and workload; however, in general, at least 8GB of memory and 2 vCPUs are recommended
- SSD storage is recommended, especially for nodes with the etcd role
- Firewall rules allow connectivity for nodes (k3s, RKE)
- A static IP is required for all nodes; if using DHCP, each node should have a reserved address
- Swap is disabled on the nodes
- NTP is enabled on the nodes
1.2 Separation of concerns
The Rancher management cluster should be dedicated to running the Rancher deployment; additional workloads added to the cluster can contend for resources and impact the performance and predictability of Rancher.
This is also important to consider in downstream clusters: the etcd and control plane nodes (RKE) and server nodes (k3s) should be dedicated to that purpose. When possible, it is recommended that each node has a single role, for example, separate nodes for the etcd and control plane roles.
Rancher management cluster
- Check for any unexpected pods running in the cluster:
kubectl get pods --all-namespaces
- Check for any single points of failure or discrepancies in OS, kernel and CRI version:
kubectl get nodes -o wide
Downstream cluster
- Check for any unexpected pods running on server nodes:
for n in $(kubectl get nodes -l node-role.kubernetes.io/master=true --no-headers | cut -d " " -f1)
do
kubectl get nodes --field-selector metadata.name=${n} --no-headers
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=${n}; echo
done
- Check for any unexpected pods running on etcd nodes:
for n in $(kubectl get nodes -l node-role.kubernetes.io/etcd=true --no-headers | cut -d " " -f1)
do
kubectl get nodes --field-selector metadata.name=${n} --no-headers
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=${n}; echo
done
- Check for any unexpected pods running on control plane nodes:
for n in $(kubectl get nodes -l node-role.kubernetes.io/controlplane=true --no-headers | cut -d " " -f1)
do
kubectl get nodes --field-selector metadata.name=${n} --no-headers
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=${n}; echo
done
1.3 High Availability
Ensure nodes within a cluster are spread across separate failure boundaries as much as possible. This could mean VMs running on separate physical hosts, data centres, switches, or storage pools. If running in a cloud environment, use instances in separate availability zones.
For High Availability in Rancher, a Kubernetes install is required.
- When deploying the Rancher management cluster it is recommended to use the following configuration:
Distribution | Recommendation |
k3s | 2 server nodes |
RKE | 3 nodes with all roles |
- Confirm the components of all clusters and external datastores (k3s) are satisfying minimum HA requirements:
Component (k3s) | Minimum | Recommended | Notes |
external datastore | 2 | 2 or greater | The external datastore should provide failover to a standby using the datastore-endpoint |
server nodes | 2 | 2 or greater | Allow tolerance for at least 1 server node failure |
agent nodes | 2 | N/A | Allow tolerance for at least 1 agent node failure, scale up to meet the workload needs |
Component (RKE) | Minimum | Recommended | Notes |
etcd nodes | 3 | 3 | To maintain quorum, an odd number of nodes is required; 3 nodes provides tolerance for at least 1 node failure |
control plane nodes | 2 | 2 | Allow tolerance for at least 1 node failure |
worker nodes | 2 | N/A | Allow tolerance for at least 1 worker node failure, scale up to meet the workload needs |
Cloud provider
The following commands can also be used with clusters configured with a cloud provider to review the instance type and availability zones of each node.
- Kubernetes v1.16 or earlier:
kubectl get nodes -L beta.kubernetes.io/instance-type -L failure-domain.beta.kubernetes.io/zone
- Kubernetes v1.17 or greater:
kubectl get nodes -L node.kubernetes.io/instance-type -L topology.kubernetes.io/zone
These labels may not be available on all cloud providers.
1.4 Load balancer
To provide a consistent endpoint for the Rancher management cluster, a load balancer is highly recommended to ensure Rancher agent, UI, and API traffic can reliably reach the Rancher deployment.
The load balancer is configured:
- Within close proximity of the Rancher management cluster to reduce latency
- For high availability, with all Rancher management nodes configured as upstream targets
- With a health check to the following path:
Distribution | Health check path |
k3s | /ping |
RKE | /healthz |
A health check interval of 30 seconds or less is generally recommended.
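As a hedged spot check, the health check path can be queried directly from the load balancer's network; the scheme and port below are assumptions and should be adjusted to match the ingress configuration on the Rancher management nodes:
# RKE - expect an HTTP 200 response
curl -sk -o /dev/null -w '%{http_code}\n' https://<rancher-node-ip>/healthz
# k3s - expect an HTTP 200 response
curl -sk -o /dev/null -w '%{http_code}\n' https://<rancher-node-ip>/ping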
1.5 Proximity and latency
For performance reasons, it is recommended to avoid spreading cluster nodes over long distances and unreliable networks. For example, nodes could be in separate AZs in the same region, the same datacenter, or separate nearby data centres.
This is particularly important for etcd nodes, which are sensitive to network latency; the RTT between etcd nodes in the cluster determines the minimum time to complete a commit.
- Network latency and bandwidth are adequate between the locations where the cluster nodes will be provisioned
A tool like mtr can be used to gather connectivity statistics between locations over a long sample period, reporting on packet loss and latency.
Generally, latency between etcd nodes is recommended to be 5ms or less.
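For example, a long-running mtr report between two node locations can summarise packet loss and latency; the cycle count below is only illustrative:
# 100 probe cycles from one node location to a peer node
mtr --report --report-wide --report-cycles 100 <peer-node-ip>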
1.6 Datastore
It is important to ensure that the chosen datastore is capable of handling requests inline with the workload of the cluster.
Allocation of resources, storage performance, and tuning of the datastore may need to be revisited over time; this could be due to increased churn in a cluster, downstream clusters growing in size, or an increase in the number of downstream clusters Rancher is managing.
Confirm the recommended options are met for the distribution in use:
With an external datastore the general performance requirements include:
- SSD or similar storage providing 1,000 IOPS or greater
- Datastore servers are assigned 2 vCPUs and 4GB memory or greater
- A low latency connection to the datastore endpoint from all k3s server nodes
MySQL 5.7 is recommended. If running in a cloud provider, you may wish to utilise a managed database service.
To confirm the storage performance of etcd nodes is capable of handling the workload, a benchmark tool like fio can be used.
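A minimal sketch of an fio run, modelled on the approach commonly used to measure fsync latency for etcd; the test directory, size, and block size below are illustrative assumptions:
# Run against the disk that backs etcd (for example the device under /var/lib/etcd)
mkdir -p /var/lib/etcd/fio-test
fio --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/etcd/fio-test --size=22m --bs=2300 --name=etcd-bench
# The 99th percentile of the fdatasync durations should ideally be in the low milliseconds
rm -rf /var/lib/etcd/fio-test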
- Nodes with the etcd role have SSD or similar storage providing high IOPS and low latency
On large downstream or Rancher environments, tuning etcd may be needed over time, including adding a dedicated disk for etcd.
1.7 CIDR selection
The cluster and service CIDRs cannot be changed once a cluster is provisioned.
For this reason, it is important to future-proof by choosing ranges that avoid routing overlaps with other areas of the network and potential cluster IP exhaustion if the defaults are not suitable.
- The default CIDR ranges do not overlap with any area of the network
Network | Default CIDR |
Cluster | 10.42.0.0/16 |
Service | 10.43.0.0/16 |
Reducing the CIDR sizes lowers the number of available IPs and therefore the total number of pods and services in the cluster. In a large cluster, the CIDR ranges may need to be increased.
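As an illustrative example for k3s, non-default ranges can be set at install time; the CIDR values below are placeholders and should be chosen to fit the surrounding network (RKE exposes equivalent settings in cluster.yml):
# Hypothetical k3s server install with custom cluster and service CIDRs
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --cluster-cidr=10.64.0.0/16 --service-cidr=10.65.0.0/16" sh -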
1.8 Authorized cluster endpoint
At times, connecting directly to a downstream cluster may be desired; this could be to reduce latency, to avoid interruption if Rancher is unavailable, or because a high frequency of external API calls occurs, for example from external monitoring or a CI/CD pipeline.
- Check for any use cases where an authorized cluster endpoint is needed
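When the authorized cluster endpoint is enabled, the kubeconfig downloaded from Rancher contains an additional context that connects directly to the downstream cluster rather than through the Rancher server; the context name below is a placeholder:
# List available contexts and switch to the direct (non-proxied) context
kubectl config get-contexts
kubectl config use-context <cluster-name>-<direct-context>
kubectl get nodes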
2. Best Practices
2.1 Installing Rancher
It is highly encouraged to install Rancher on a Kubernetes cluster in an HA configuration.
If starting with small resource requirements, at the very minimum always install on a Kubernetes cluster, even if it has only a single node; this provides a future path to adding nodes at a later date.
The single node Docker install is designed for short-lived testing environments; migration from a Docker install to a Kubernetes install is not possible.
- Rancher is installed on a Kubernetes cluster, even if that is a single node cluster
2.2 Rancher Resources
The minimum resource requirements for nodes in the Rancher management cluster scale with the number of downstream clusters and nodes; this may change over time and should be reviewed as the environment changes.
- Verify that nodes in the Rancher management cluster meet at least the minimum requirements:
- Resource requirements
- CPU/Memory (Rancher v2.4.0 and later)
- CPU/Memory (Rancher prior to v2.4.0)
- Network port requirements
2.3 Chart options
When installing the Rancher helm chart, the default options may not always be the best fit for specific environments.
- The Rancher helm chart is installed with the desired options
- replicas - the default number of Rancher replicas (3) may not suit your cluster; for example, a k3s cluster with 2 server nodes using a replicas value of 2 will ensure only one Rancher pod is running per node.
- antiAffinity - the default preferred scheduling can mean Rancher pods become imbalanced during the lifetime of a cluster; using required can ensure Rancher is always scheduled on unique nodes.
To confirm the options provided on an existing Rancher install with helm v3, the following command can be used:
helm get values rancher -n cattle-system
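As a hedged example, the replicas and antiAffinity values can also be changed on an existing install with helm v3; the chart repository name below is an assumption and should match the repository used at install time:
helm upgrade rancher rancher-stable/rancher \
  --namespace cattle-system \
  --reuse-values \
  --set replicas=2 \
  --set antiAffinity=required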
2.4 Supported versions
When choosing or maintaining the components for Rancher and Kubernetes clusters the product lifecycle and support matrix can be used to ensure the versions and OS configurations are certified and maintained.
- All Rancher and Kubernetes cluster versions are under maintenance and certified
As versions are a moving target, checking the current stable releases and planning for future upgrades on a schedule is recommended.
2.5 Recurring snapshots and backups
It is important to configure snapshots on a recurring schedule and store these externally to the cluster for disaster recovery.
- Recurring snapshots are configured for the distribution in use
Distribution | Configuration |
k3s | Configure snapshots and backups on the external datastore; this can differ depending on the chosen database |
RKE | Configure recurring snapshots of etcd, with an S3 compatible endpoint for off-node copies |
In addition to a recurring schedule, it is important to take one-time snapshots of etcd (RKE) or the datastore (k3s) before and after significant changes.
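For RKE, a one-time snapshot can be taken with the rke CLI from the workstation or node that holds the cluster configuration; the snapshot name below is only an example:
rke etcd snapshot-save --config cluster.yml --name pre-change-snapshot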
The Rancher backup operator can also be used on any distribution to back up the objects that Rancher needs to function; this can also be used to migrate Rancher between clusters.
2.6 Provisioning
Provisioning nodes and resources for Rancher and downstream clusters in a repeatable and automated way will greatly improve the supportability of Rancher and Kubernetes. This allows nodes to be replaced in a cluster easily, and new clusters created in a consistent way.
The below points can help prepare the Rancher and Kubernetes environment with integrations and modern approaches to managing resources, such as infrastructure as code, CI/CD, immutable infrastructure, and configuration management:
- Manifests and configuration data are stored in source control, treated as the source of truth for containerized applications
- Automated build, deployment and/or configuration management
The rancher2 Terraform provider and Pulumi package can be used to manage clusters and resources as code.
2.7 Managing node lifecycle
When making significant planned changes to a node, such as restarting Docker, patching, shutting it down, or removing it, it is important to drain the affected node first to avoid disrupting in-flight connections.
For example, the kube-proxy component manages iptables rules on nodes to maintain service endpoints; if a node is suddenly shut down, stale endpoints and orphaned pods can be left in place for a period of time, causing connectivity issues.
In some cases, draining can be automated to cover unplanned events, such as when a node is terminated, restarted, or shut down.
- A process is in place to drain before planned disruptive changes are performed on a node
- Where possible, node draining during the shutdown sequence is automated, for example, with a systemd or similar service
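A minimal sketch of a manual drain around planned maintenance; the node name is a placeholder, and older kubectl releases use --delete-local-data in place of --delete-emptydir-data:
# Evict workloads from the node before the disruptive change
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# ...perform the maintenance (patching, Docker restart, shutdown)...
# Return the node to service afterwards
kubectl uncordon <node-name>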
3. Operating Kubernetes
3.1 Capacity planning and Monitoring
It is recommended to measure resource usage of all clusters by enabling monitoring in Rancher, or your chosen solution, and to alert on resource thresholds and events in the cluster.
On supported platforms, the Cluster Autoscaler can be used to ensure the number of nodes is right-sized for the pod workload. Combining this with the Horizontal Pod Autoscaler provides both application and infrastructure scaling capabilities.
- Monitoring is enabled for the Rancher and downstream clusters
- Alert notifiers are configured to stay informed if an alarm or event occurs
- A process for adding/removing nodes is established, automated if possible
3.2 Probes
To defend against service and pod-related failures, liveness and readiness probes are very useful; these can be in the form of HTTP requests, commands, or TCP connections.
- Liveness and Readiness probes are configured where necessary
- Probes do not rely on the success of upstream dependencies, only the running application in the pod
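As a simple spot check, configured probes are listed in the pod description; the pod and namespace names below are placeholders:
kubectl describe pod <pod-name> -n <namespace> | grep -E 'Liveness|Readiness'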
3.3 Resources
Assigning resource requests to pods allows the kube-scheduler to make more informed placement decisions, avoiding the "bin packing" of pods onto nodes and resource contention.
Limits also offer value in the form of a safety net against pods consuming an undesired amount of resources.
In addition to defining requests and limits for pods, it can also be useful to reserve capacity on nodes to prevent allocating resources that may be consumed by the kubelet and other system daemons, like Docker.
- All pods define resource requests and have limits configured where necessary
- Nodes have system and daemon reservations where necessary
When Rancher Monitoring is enabled, the graphs in Grafana can be used to find a baseline of CPU and memory for resource requests.
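As a hedged example, current requests, limits, and usage can be reviewed with the commands below; kubectl top requires metrics-server or Rancher Monitoring to be installed:
# Requests and limits currently allocated on a node, including any system reservations
kubectl describe node <node-name> | grep -A 8 'Allocated resources'
# Live CPU and memory usage to help baseline resource requests
kubectl top pods --all-namespaces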
3.4 OS Limits
Containerized applications can consume high amounts of OS resources, such as open files, connections, processes, filesystem space and inodes.
Often the defaults are adequate; however, establishing a standardized image for all nodes can help establish a baseline for all configuration and tuning.
In general, the below can be used to confirm the OS limits allow adequate headroom for the workloads:
- File descriptor usage:
cat /proc/sys/fs/file-nr
- User ulimits:
ulimit -a
Or, a particular process can be checked:
cat /proc/PID/limits
- Conntrack limits:
cat /proc/sys/net/netfilter/nf_conntrack_max
cat /proc/sys/net/netfilter/nf_conntrack_count
- Filesystem space and inode usage:
df -h
df -ih
Requirements for Linux can differ slightly depending on the distribution; refer to the Linux Requirements for more information.
3.5 Log rotation
To prevent large log files from accumulating and to apply a desired retention period, it is recommended to rotate OS and pod log files, and to configure an external log service to stream logs off the nodes for longer-term retention and easier searching.
Containers
- Log rotation is configured for the container logs
- An external logging service is configured as needed
The below arguments for the INSTALL_K3S_EXEC environment variable can be used as an example to rotate container logs on k3s:
INSTALL_K3S_EXEC="--kubelet-arg container-log-max-files=5 --kubelet-arg container-log-max-size=100Mi"
On RKE nodes using Docker, rotating container logs can be accomplished by configuring logrotate or the /etc/docker/daemon.json file with a size and retention configuration.
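A minimal sketch of a Docker daemon.json log rotation configuration for RKE nodes; the size and file count are illustrative, and any existing daemon.json settings should be merged rather than overwritten:
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "5"
  }
}
EOF
# Restart Docker to apply; drain the node first, as this restarts running containers
sudo systemctl restart docker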
OS
Rotation of log files on nodes is also important, especially if a long node lifecycle is expected.
3.6 DNS scalability
DNS is a critical service running within the cluster. DNS queries are distributed throughout the cluster, and availability depends on the accessibility of the CoreDNS pods backing the service.
The Nodelocal DNS cache is a redesign of the cluster DNS architecture and is recommended for clusters that may experience a high DNS workload or DNS-related issues.
If a cluster has experienced a DNS issue, or high DNS workload is expected:
- Check the output of conntrack -S on the related nodes:
conntrack -S
A high insert_failed counter can be indicative of a conntrack race condition; the Nodelocal DNS cache is recommended to mitigate this.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000020105
- Creation Date: 09-Nov-2021
- Modified Date: 09-Nov-2021
- SUSE Rancher
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com