Cluster or node provisioning stuck with node taint "node.cloudprovider.kubernetes.io/uninitialized"
This document (000022013) is provided subject to the disclaimer at the end of this document.
Environment
- A Rancher-provisioned RKE2 cluster with a Cloud Provider configured
Situation
During the provisioning of RKE2 clusters, the machines are stuck with the status 'waiting for cluster agent'. The rke2-server service is running and pods are being created, but a number of them remain in a Pending state because they cannot be scheduled.
Example: The vSphere CPI (Cloud Provider Interface) is unable to locate the virtual machine in vSphere, so the node remains uninitialised. In the downstream cluster, the cloud controller manager pod logs show errors such as the following when locating the virtual machine:
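To confirm that the Pending pods are blocked by the uninitialized taint rather than by some other scheduling constraint, the pod list can be filtered on its PodScheduled condition. The following is a minimal sketch, assuming the input is the parsed JSON output of `kubectl get pods -A -o json` and that the scheduler's failure message contains the taint key (the exact wording of that message varies between Kubernetes versions). The function name is hypothetical.

```python
# Taint key added to new nodes when a cloud provider is configured.
UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def pods_blocked_by_uninitialized_taint(pod_list: dict) -> list[str]:
    """Return "namespace/name" for Pending pods whose scheduling failure mentions the taint.

    Assumption: pod_list is the parsed output of `kubectl get pods -A -o json`.
    """
    blocked = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        for cond in status.get("conditions", []):
            # An unschedulable pod has PodScheduled=False; the message lists the
            # untolerated taints (format assumed, it differs across versions).
            if (
                cond.get("type") == "PodScheduled"
                and cond.get("status") == "False"
                and UNINITIALIZED_TAINT in cond.get("message", "")
            ):
                meta = pod.get("metadata", {})
                blocked.append(f"{meta.get('namespace')}/{meta.get('name')}")
                break
    return blocked
```

Save the pod list to a file with kubectl, load it with `json.load`, and pass the result to the function; an empty list suggests the Pending pods are waiting on something other than this taint.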
search.go:186] Did not find node node1.example.com in vc=example.com and datacenter=datacentre1
nodemanager.go:160] WhichVCandDCByNodeID failed using VM name. Err: No VM found
nodemanager.go:205] shakeOutNodeIDLookup failed. Err=No VM found
node_controller.go:233] error syncing 'node1.example.com: failed to get instance metadata for node node1.example.com: failed to get instance ID from cloud provider: No VM found, requeuing
node_controller.go:244] "Unhandled Error" err="error syncing 'node1.example.com': failed to get instance metadata for node node1.example.com: failed to get instance ID from cloud provider: No VM found, requeuing"
node_controller.go:271] Update 1 nodes status took 57.912µs.
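The "Did not find node" line records which vCenter and datacenter the CPI searched. Extracting those values makes it easier to compare them against the Add-on: vSphere CPI configuration. The following is a minimal sketch, assuming the log text has been saved from the cloud controller manager pod (for example with `kubectl logs`) and that node, vCenter and datacenter names contain no whitespace. The function name is hypothetical.

```python
import re

# Matches the vSphere CPI message shown above. Assumes no whitespace in the
# node, vCenter (vc=) or datacenter (datacenter=) values.
NOT_FOUND = re.compile(
    r"Did not find node (?P<node>\S+) in vc=(?P<vc>\S+) and datacenter=(?P<dc>\S+)"
)


def failed_vm_lookups(log_text: str) -> list[tuple[str, str, str]]:
    """Return unique (node, vCenter, datacenter) triples the CPI failed to resolve, in first-seen order."""
    seen = []
    for match in NOT_FOUND.finditer(log_text):
        triple = (match["node"], match["vc"], match["dc"])
        if triple not in seen:  # the CPI requeues, so the same failure repeats
            seen.append(triple)
    return seen
```

If the reported vCenter or datacenter differs from where the virtual machine actually resides, the CPI configuration is the likely culprit; if they match, the VM name (hostname) or VMware Tools is the next thing to check.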
Resolution
To resolve this issue, validate the Cloud Provider configuration for the affected cluster and correct it as required.
In the example above, with the vSphere Cloud Provider, check the Add-on: vSphere CPI configuration for the cluster to ensure that the correct vCenter and Data Center are configured. Also verify that VMware Tools is running successfully in the virtual machine and that its hostname is configured correctly.
Cause
The node.cloudprovider.kubernetes.io/uninitialized taint is added to new nodes in clusters where a Cloud Provider is configured. The CPI removes this taint once it successfully queries the cloud provider and sets spec.providerID on the node. If the CPI configuration is incorrect and the provider ID cannot be queried, the node keeps the taint and fails to complete provisioning. If this is the first node in the cluster, the cluster itself will be stuck in provisioning.
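Because the taint is only removed once spec.providerID is set, a node that still carries the taint with an empty providerID points to the CPI lookup failing. The following is a minimal sketch, assuming the input is the parsed JSON output of `kubectl get nodes -o json`; the function name is hypothetical.

```python
# Taint key added to new nodes when a cloud provider is configured.
UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def uninitialized_nodes(node_list: dict) -> list[dict]:
    """List nodes still carrying the uninitialized taint, with their spec.providerID.

    Assumption: node_list is the parsed output of `kubectl get nodes -o json`.
    """
    report = []
    for node in node_list.get("items", []):
        spec = node.get("spec", {})
        taints = spec.get("taints") or []  # taints may be absent entirely
        if any(t.get("key") == UNINITIALIZED_TAINT for t in taints):
            report.append({
                "name": node.get("metadata", {}).get("name"),
                # Empty string means the CPI has not set a provider ID yet.
                "providerID": spec.get("providerID", ""),
            })
    return report
```

An empty result means no node is waiting on the CPI; any entry with an empty providerID is a node whose instance the CPI has not yet resolved.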
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000022013
- Creation Date: 26-Aug-2025
- Modified Date: 19-Sep-2025
- SUSE Rancher
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com