SUSE Virtualization Node Failure Strategy

SUSE Virtualization is a cloud-native hyperconverged infrastructure (HCI) platform optimized for running virtual machine and container workloads in data center, multi-cloud, and edge environments.

This article focuses on the node failure strategy of SUSE Virtualization. You will learn how Kubernetes handles node failures and how SUSE Virtualization builds on these capabilities to ensure high availability and resilience of workloads.

Kubernetes Node Failure Handling


Before diving into SUSE Virtualization’s specific strategies, it’s important to understand how Kubernetes manages node failures.

First, we need to understand the common node statuses that Kubernetes reports through the node's Ready condition:

  • Ready: The node is healthy and able to accept pods.
  • NotReady: The node is not healthy, for example because it is short on resources or the kubelet is not functioning properly.
  • Unknown: The node's status is not known to the node controller, often due to network issues.
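
For example, a quick way to check a node's Ready condition is shown below (the node name is just a placeholder):

# Print the status of the Ready condition: True, False, or Unknown
kubectl get node <node-name> -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'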

So, how does Kubernetes transition nodes between Ready and Unknown states? According to the Kubernetes documentation, Kubernetes uses a heartbeat mechanism to monitor the health of nodes. There are three key parameters involved in this process:

  • node-status-update-frequency: The interval at which the kubelet posts its node status to the API server. Default is 10 seconds.
  • node-monitor-period: The frequency at which the node controller checks the reported node status. Default is 5 seconds.
  • node-monitor-grace-period: How long the node controller waits after the last heartbeat before marking a node as NotReady or Unknown. Default is 40 seconds.

node-status-update-frequency determines how often the kubelet reports its status. The node controller then checks the reported status every node-monitor-period. If a node fails to send a heartbeat within node-monitor-grace-period, the node controller marks it as NotReady or Unknown.

So, by default, if a node misses 4 consecutive heartbeats (40 seconds), it will be marked as NotReady or Unknown.
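
For reference, on an RKE2-based node (SUSE Virtualization is built on RKE2), these parameters could in principle be tuned through the RKE2 configuration file. The snippet below is only a minimal sketch, assuming a standard RKE2 node configuration, and simply shows the default values:

# /etc/rancher/rke2/config.yaml (sketch, default values shown)
kubelet-arg:
  - "node-status-update-frequency=10s"    # kubelet heartbeat interval
kube-controller-manager-arg:
  - "node-monitor-period=5s"              # node controller check interval
  - "node-monitor-grace-period=40s"       # time before a silent node is marked Unknown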

SUSE Virtualization Node Failure Strategy


SUSE Virtualization provides cloud-native capabilities for virtual machine workloads, which means it needs to handle node failures effectively to ensure the high availability and resilience of those workloads.

SUSE Virtualization leverages Kubevirt as the virtualization layer to manage virtual machines on Kubernetes. The key point for the node failure strategy in SUSE Virtualization is to ensure that virtual machines are automatically rescheduled to healthy nodes when a node failure is detected.

So, we need to look at two things:

  1. How Kubevirt detects node failures.
  2. How SUSE Virtualization reschedules virtual machines.

For the first point, SUSE Virtualization relies on the same Kubernetes node failure detection mechanism described above: when a node is marked as NotReady or Unknown, Kubevirt treats it as failed and initiates its node failure handling logic.

A node failure usually means a non-graceful shutdown. In this case, even though Kubevirt triggers a migration, the migration usually fails because of an orphaned VolumeAttachment left on the failed node. Since Kubernetes v1.28, where the non-graceful node shutdown feature became GA, the node.kubernetes.io/out-of-service taint can be used to mark a node as out of service, and the corresponding orphaned resources (e.g. VolumeAttachments) are then cleaned up automatically.

So, for the second point, SUSE Virtualization automatically reschedules the virtual machines and adds the node.kubernetes.io/out-of-service taint to the failed node. This ensures that any orphaned resources are cleaned up and that the virtual machines can be successfully rescheduled to healthy nodes.
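
SUSE Virtualization adds this taint for you, but for illustration, the equivalent manual step on a plain Kubernetes cluster would look roughly like this (node name taken from the demo below):

# Mark a non-gracefully shut down node as out of service so orphaned
# resources (e.g. VolumeAttachments) are cleaned up and pods can fail over
kubectl taint nodes harvester-node-1 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute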

Demo

  1. Create a SUSE Virtualization cluster with multiple nodes. (A two-node cluster is used for this demo.)
# kubectl get nodes
NAME               STATUS   ROLES                AGE   VERSION
harvester-node-0   Ready    control-plane,etcd   37h   v1.34.2+rke2r1
harvester-node-1   Ready    <none>               37h   v1.34.2+rke2r1
  2. Create a virtual machine on harvester-node-1.
# kubectl get vmi
NAME        AGE   PHASE     IP           NODENAME           READY
demo-vm01   52s   Running   10.52.1.37   harvester-node-1   True
  3. Simulate a node failure by shutting down harvester-node-1 or disconnecting it from the network.
# date -u +"%Y-%m-%d %H:%M:%S %Z"; ip link set dev mgmt-br down
2025-12-28 21:04:26 UTC
  4. Check the node condition.
  - lastHeartbeatTime: "2025-12-28T21:01:47Z"
    lastTransitionTime: "2025-12-28T21:05:12Z"
    message: Kubelet stopped posting node status.
    reason: NodeStatusUnknown
    status: Unknown
    type: Ready

We can see that the node is marked as Unknown after around 40 seconds (here, 2025-12-28T21:05:12Z - 2025-12-28T21:04:26Z = 46 seconds).
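
For reference, the condition block above comes from the node object's status.conditions and can be viewed with, for example:

# Inspect the failed node; the Ready condition appears under status.conditions
kubectl get node harvester-node-1 -o yaml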

  5. Check that the migration was triggered; the kubevirt.io/drain taint is added to the failed node.
  taints:
  - effect: NoSchedule
    key: kubevirt.io/drain
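
Both this taint and the out-of-service taint checked in a later step can be listed with, for example:

# List the taints currently applied to the failed node
kubectl get node harvester-node-1 -o jsonpath='{.spec.taints}'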
  6. Check that the virtual machine was rescheduled to harvester-node-0.
# kubectl get vmi
NAME        AGE     PHASE     IP           NODENAME           READY
demo-vm01   2m11s   Running   10.52.0.88   harvester-node-0   True
  7. Check that the out-of-service taint was added to harvester-node-1.
  taints:
  - effect: NoExecute
    key: node.kubernetes.io/out-of-service

Conclusion


In summary, SUSE Virtualization effectively leverages Kubernetes’ node failure detection mechanisms and Kubevirt’s virtualization capabilities to ensure high availability and resilience of virtual machine workloads. By automatically rescheduling virtual machines to healthy nodes and cleaning up orphaned resources, SUSE Virtualization provides a robust solution for handling node failures in a cloud-native environment.

Here are additional configuration options you may consider for tuning the node failure detection sensitivity on your SUSE Virtualization cluster:

VMForceResetPolicy:
  Enabled: true
  Period: 15  # in seconds, default is 15 seconds
  VMMigrationTimeout: 180  # in seconds, default is 180 seconds
  • Period: How long to wait before adding the kubevirt.io/drain taint to the failed node. This taint triggers the migration.
  • VMMigrationTimeout: How long to wait before adding the node.kubernetes.io/out-of-service taint to the failed node. This taint forces cleanup of the orphaned resources.

With these configurations, you can adjust the sensitivity of node failure detection and the timing of virtual machine rescheduling to better suit your specific use case and workload requirements.

By default, the whole failover takes about 45 seconds (Kubernetes default detection time) + 15 seconds (Period) + 180 seconds (VMMigrationTimeout), plus the time taken for the migration itself. So you might need 4 to 5 minutes to complete the whole process.

To decrease this time, consider adjusting the Period and VMMigrationTimeout values first. If you need even more sensitivity, you can also adjust the Kubernetes node failure detection parameters mentioned in the first section.
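
For example, a hypothetical, more aggressive tuning (the values below are illustrative only, not a recommendation) would shorten the window to roughly 45 + 5 + 60 = 110 seconds plus the time of the migration itself:

VMForceResetPolicy:
  Enabled: true
  Period: 5                # illustrative: add the kubevirt.io/drain taint after 5 seconds
  VMMigrationTimeout: 60   # illustrative: add the out-of-service taint after another 60 seconds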

Appendix


The Harvester CSI Driver, which manages volume-related operations in downstream clusters on SUSE Virtualization, also implements its own node failure strategy. It helps downstream clusters stay resilient when their corresponding nodes fail.
