Run NVIDIA GPU workloads on a Kubernetes RKE2 cluster

This document (000021962) is provided subject to the disclaimer at the end of this document.

Environment

RKE2: v1.32.5+rke2r1

Node OS: Ubuntu 24.04

Platform: Proxmox

GPU: PCI passthrough

Situation

Installing the NVIDIA GPU Operator on RKE2 Clusters

There are two supported methods for installing the NVIDIA GPU Operator on an RKE2 cluster, depending on the underlying operating system:

1. RHEL & Ubuntu (Container-Native Approach)

This is the method fully supported by NVIDIA. It allows for a truly container-native deployment, where all required components, including the GPU driver and the NVIDIA Container Toolkit, are managed and run as containers directly on the cluster. No manual pre-installation on the host is needed.

2. Other Linux Distributions (Manual NVIDIA Driver and Container Toolkit Installation)

For other Linux distributions, you must manually install the following on each node before deploying the GPU Operator:

  • NVIDIA GPU Driver

  • NVIDIA Container Toolkit

Once the prerequisites are in place, you can install the GPU Operator to manage the GPUs in the cluster.

⚠️ Note: Ensure that the versions of the NVIDIA GPU Driver, Container Toolkit, and GPU Operator are compatible with each other.

Also, because NVIDIA drivers are not signed by default, Secure Boot will likely need to be disabled on any VMs used.
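To check the current firmware state from inside an Ubuntu guest before installing the driver, the following is a minimal sketch, assuming the mokutil package is installed (it usually is on Ubuntu):

# Reports "SecureBoot enabled" or "SecureBoot disabled"
mokutil --sb-state

If Secure Boot is enabled, either disable it in the VM firmware or sign the NVIDIA kernel modules before proceeding.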

Resolution

1. RHEL & Ubuntu (Container-Native Approach)

  • Prepare the namespace and Helm repository
kubectl create ns gpu-operator
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
  • Install the NVIDIA GPU Operator
helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --version=v25.3.2 \
    --set toolkit.env[0].name=CONTAINERD_CONFIG \
    --set toolkit.env[0].value=/var/lib/rancher/rke2/agent/etc/containerd/config.toml \
    --set toolkit.env[1].name=CONTAINERD_SOCKET \
    --set toolkit.env[1].value=/run/k3s/containerd/containerd.sock \
    --set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
    --set toolkit.env[2].value=nvidia \
    --set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
    --set-string toolkit.env[3].value=true

⚠️ Note: the version specified in the --version flag was current at the time of writing and should be reviewed before installing.
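After the chart finishes deploying, it is worth confirming that the operator components are healthy and that the node now advertises a GPU resource. A minimal check on a single-GPU node might look like this:

# All operator pods (toolkit, device plugin, validator, etc.) should reach Running or Completed
kubectl -n gpu-operator get pods

# The node should now list nvidia.com/gpu under its capacity and allocatable resources
kubectl get nodes -o yaml | grep "nvidia.com/gpu"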

2. Other Linux Distributions (Manual NVIDIA Driver and Container Toolkit Installation)

  • Install the NVIDIA GPU Driver
sudo apt install nvidia-driver-570
  • Install the NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
 && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
   sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
   sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
sudo apt-get update
sudo apt-get install -y \
    nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}

sudo nvidia-ctk runtime configure --runtime=containerd

⚠️ Note: the version specified in the NVIDIA_CONTAINER_TOOLKIT_VERSION variable was current at the time of writing and should be reviewed before installing.
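Before deploying the GPU Operator, it is worth confirming on each GPU node that the manually installed components work. The following is a sketch of such a check; a reboot may be needed after the driver installation before the kernel module loads:

# Driver check: the passed-through GPU should be listed
nvidia-smi

# Toolkit check: the CLI should report the pinned version
nvidia-ctk --version

# After restarting the rke2-server or rke2-agent service, RKE2 should add the NVIDIA
# runtime to the containerd configuration it generates
grep -i nvidia /var/lib/rancher/rke2/agent/etc/containerd/config.toml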

  • Prepare the namespace and Helm repository
kubectl create ns gpu-operator
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
  • Install the NVIDIA GPU Operator on RKE2
helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --version=v25.3.2 \
    --set driver.enabled=false \
    --set toolkit.enabled=false \
    --set toolkit.env[0].name=CONTAINERD_CONFIG \
    --set toolkit.env[0].value=/var/lib/rancher/rke2/agent/etc/containerd/config.toml \
    --set toolkit.env[1].name=CONTAINERD_SOCKET \
    --set toolkit.env[1].value=/run/k3s/containerd/containerd.sock \
    --set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
    --set toolkit.env[2].value=nvidia \
    --set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
    --set-string toolkit.env[3].value=true

⚠️ Note: the version specified in the --version flag was current at the time of writing and should be reviewed before installing.
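With the operator installed, a small test pod is a quick way to confirm that workloads can actually reach the GPU. The manifest below is only a sketch: the pod name is arbitrary, the CUDA image tag is an example that should be replaced with one matching the installed driver, and the runtime class assumes the nvidia class configured above.

# Sketch: run nvidia-smi inside a pod that requests one GPU
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  runtimeClassName: nvidia
  restartPolicy: Never
  containers:
  - name: nvidia-smi
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # example tag; adjust to match the driver
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

# Once the pod completes, its logs should show the GPU details
kubectl logs gpu-smoke-test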

Additional Information

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000021962
  • Creation Date: 06-Aug-2025
  • Modified Date: 10-Aug-2025
  • SUSE Rancher
