Tuning for nodes with a high number of CPUs allocated

This document (000020731) is provided subject to the disclaimer at the end of this document.

Environment

An RKE cluster built by Rancher, or the RKE CLI

Situation

Some components in a Kubernetes cluster apply a linear scaling mechanism, often based on the number of CPU cores allocated.

For nodes that have a high number of CPU cores allocated, the defaults can create a steep scaling curve and introduce issues.

Two components provided with RKE that scale in this way are kube-proxy and ingress-nginx. However, additional workloads (like nginx) may be deployed to the cluster and also need consideration.

Adjusting the scaling for these components can avoid these issues.

Resolution

kube-proxy

As explained in the Kubernetes GitHub issue here, kube-proxy sizes the conntrack table from its conntrack-max-per-core setting, which defaults to 32768 (32K) entries per CPU core.

On affected nodes this can surface as the following event in the OS logs:

kernel: nf_conntrack: falling back to vmalloc.

Because the default is a fixed per-core value, nodes with many cores can end up with a conntrack table that is larger than needed, or too large for the kernel to allocate as contiguous memory. When this message is observed frequently, it has been associated with network instability.

As a starting point, the suggestion is to halve this value for a cluster with affected nodes. This can be done by editing the cluster as YAML, or by editing the cluster.yml file when using the RKE CLI.

services:
  kubeproxy:
    extra_args:
      conntrack-max-per-core: '16384'
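
To get a feel for what halving the per-core value does, the resulting table limit can be estimated for a few node sizes. The sketch below is only an approximation of kube-proxy's behaviour: it assumes the limit is the larger of (cores * conntrack-max-per-core) and the conntrack-min floor, and that conntrack-min is left at its upstream default of 131072. The function name and the core counts are illustrative.

```python
# Rough estimate of the conntrack table limit kube-proxy would request.
# Assumption: limit = max(cores * conntrack-max-per-core, conntrack-min),
# with conntrack-min left at its upstream default.

DEFAULT_PER_CORE = 32768  # default conntrack-max-per-core
DEFAULT_MIN = 131072      # default conntrack-min (assumed unchanged)


def conntrack_max(cores, per_core=DEFAULT_PER_CORE, floor=DEFAULT_MIN):
    """Return the estimated conntrack entry limit for a node."""
    return max(cores * per_core, floor)


if __name__ == "__main__":
    # Illustrative node sizes, not taken from any specific cluster.
    for cores in (4, 32, 96):
        default_limit = conntrack_max(cores)
        halved_limit = conntrack_max(cores, per_core=16384)
        print(f"{cores} cores: default={default_limit} halved={halved_limit}")
```

On small nodes the floor dominates, so the change has no effect there; the reduction only becomes visible as the core count grows.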

ingress-nginx

A common configuration of nginx is to set worker_processes to auto. When set, nginx scales the number of worker processes to the number of CPU cores on the node. On nodes with many cores this can result in a high number of PIDs and open files, along with the threads created (number of cores * 32, the default thread_pool size).
* http://nginx.org/en/docs/ngx_core_module.html#worker_processes
* http://nginx.org/en/docs/ngx_core_module.html#thread_pool
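
The arithmetic above can be sketched as a small calculation. This is an upper-bound estimate only: it applies the cores * 32 formula from the text and assumes every worker process creates a full default thread pool, which depends on how nginx is configured. The function name and the 96-core figure are illustrative.

```python
# Estimate worker processes and threads for an nginx instance.
# Assumption: each worker creates one thread pool of the default size.

THREAD_POOL_DEFAULT = 32  # default thread_pool size, per the nginx docs


def nginx_footprint(cores, worker_processes=None,
                    threads_per_pool=THREAD_POOL_DEFAULT):
    """Return (workers, threads); worker_processes=None models 'auto'."""
    workers = cores if worker_processes is None else worker_processes
    return workers, workers * threads_per_pool


if __name__ == "__main__":
    # Hypothetical 96-core node: 'auto' versus a fixed value of 8.
    print("auto:", nginx_footprint(96))
    print("fixed:", nginx_footprint(96, worker_processes=8))
```

Pinning worker_processes to a fixed value removes the dependency on the core count, so the footprint stays the same as nodes get larger.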

This can be adjusted by editing the cluster as YAML, or by editing the cluster.yml file when using the RKE CLI. The example below sets worker_processes to 8. For nodes that process a high amount of ingress traffic, you may wish to use a higher number.

ingress:
  provider: nginx
  options:
    worker-processes: "8"

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.