SUSE Support

RKE2 Snapshots failing due to large ConfigMap

This document (000021272) is provided subject to the disclaimer at the end of this document.

Environment

RKE2 releases older than v1.25.15, v1.26.10, v1.27.7 and v1.28.3 on their respective minor lines


Situation

At some point, etcd snapshots may start failing to complete. The rke2-server service logs (for example, 'journalctl -u rke2-server') show an error such as:

level=error msg="failed to save local snapshot data to configmap: ConfigMap \"rke2-etcd-snapshots\" is invalid: []: Too long: must have at most 1048576 bytes"
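To see how close the ConfigMap is to the limit quoted in the error, its size can be measured directly. The Python sketch below assumes that the size the API server checks is the sum of the byte lengths of the ConfigMap's data and binaryData values, rather than the size of the whole serialized object. The function names and the script name cm_size.py are hypothetical.

```python
import base64
import json
import sys

# Limit quoted in the rke2-server error message.
CONFIGMAP_LIMIT_BYTES = 1048576


def configmap_payload_bytes(cm: dict) -> int:
    """Return the summed byte length of all data and binaryData values.

    Assumption: this approximates the API server's ConfigMap size check,
    which counts stored values rather than the whole serialized object.
    """
    total = 0
    for value in (cm.get("data") or {}).values():
        total += len(value.encode("utf-8"))
    for value in (cm.get("binaryData") or {}).values():
        # binaryData values appear base64-encoded in 'kubectl ... -o json'.
        total += len(base64.b64decode(value))
    return total


def summarize(cm: dict) -> str:
    """Describe how many entries the ConfigMap holds and how full it is."""
    entries = len(cm.get("data") or {}) + len(cm.get("binaryData") or {})
    size = configmap_payload_bytes(cm)
    pct = 100 * size / CONFIGMAP_LIMIT_BYTES
    return f"{entries} entries, {size} bytes ({pct:.1f}% of {CONFIGMAP_LIMIT_BYTES})"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Expects a file produced by:
    #   kubectl get configmap -n kube-system rke2-etcd-snapshots -o json
    with open(sys.argv[1], encoding="utf-8") as f:
        print(summarize(json.load(f)))
```

Example usage: save the object with 'kubectl get configmap -n kube-system rke2-etcd-snapshots -o json > cm.json', then run 'python3 cm_size.py cm.json'. A value near 100% indicates that the next snapshot record is likely to be rejected.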

Resolution

This issue is fixed in v1.28.3 and has been backported to v1.25.15, v1.26.10 and v1.27.7.

If an upgrade is not possible, the following steps can be taken to clean up the ConfigMap manually:

  • As a precaution, save copies of the etcd snapshots to another folder.
  • Reduce the etcd snapshot retention count on the downstream cluster and temporarily disable S3 backups.
  • Edit the 'rke2-etcd-snapshots' ConfigMap in the 'kube-system' namespace on the downstream cluster and empty its data, keeping only the manifest metadata: 'kubectl edit configmap -n kube-system rke2-etcd-snapshots'.
  • After the edits above are saved, Fleet should trigger all of the snapshots it missed.
  • Change the snapshot schedule to run every 5 minutes so that the retention settings are applied and old snapshots are cleaned up once the next 5-minute interval has passed.
  • Clean up the on-demand snapshots manually, since the retention settings do not remove them automatically. To do this, delete them from the local filesystem of each node. After a few minutes Rancher reconciles the change, and the old on-demand snapshots are removed from the UI.
  • Re-enable S3 snapshots and verify that new snapshots are being taken there.
  • Set the snapshot schedule back to its previous value.
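As an alternative to editing the ConfigMap interactively, the "empty its data, keep the metadata" step can be scripted. The Python sketch below is illustrative and not an official SUSE tool. It drops the data and binaryData fields and leaves apiVersion, kind and metadata untouched. The function name and the script name empty_cm.py are hypothetical.

```python
import copy
import json
import sys


def emptied_configmap(cm: dict) -> dict:
    """Return a copy of the ConfigMap with data and binaryData removed.

    apiVersion, kind and metadata are kept unchanged, matching the
    "only keep the manifest metadata" step. The input is not modified.
    """
    return {
        key: copy.deepcopy(value)
        for key, value in cm.items()
        if key not in ("data", "binaryData")
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Input: the saved output of
    #   kubectl get configmap -n kube-system rke2-etcd-snapshots -o json
    with open(sys.argv[1], encoding="utf-8") as f:
        original = json.load(f)
    json.dump(emptied_configmap(original), sys.stdout, indent=2)
    print()
```

Example usage: 'kubectl get configmap -n kube-system rke2-etcd-snapshots -o json > cm-backup.json', then 'python3 empty_cm.py cm-backup.json > cm-empty.json', then 'kubectl replace -f cm-empty.json'. This approach keeps the original object in cm-backup.json. Because metadata.resourceVersion is preserved, the replace should be rejected with a conflict if rke2-server updated the ConfigMap in the meantime. In that case, repeat the steps.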

Cause

If the number of etcd nodes and the snapshot retention count are high enough, the rke2-etcd-snapshots ConfigMap grows too large. Eventually the rke2-server process can no longer save it, because it exceeds the 1 MiB (1048576 bytes) ConfigMap size limit.
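The growth described above scales roughly with the number of etcd nodes multiplied by the retention count. The Python sketch below turns that into arithmetic. It assumes one ConfigMap entry per retained snapshot per etcd node, and an average entry size measured from your own ConfigMap (total data size divided by number of entries). The 1,000-byte figure in the example is hypothetical, not a measured RKE2 value, and the function names are illustrative only.

```python
# Limit quoted in the rke2-server error message.
CONFIGMAP_LIMIT_BYTES = 1048576


def projected_bytes(etcd_nodes: int, retention: int, avg_entry_bytes: int) -> int:
    """Estimate the ConfigMap size, assuming one entry per retained
    snapshot per etcd node (on-demand and S3 entries are not modelled)."""
    return etcd_nodes * retention * avg_entry_bytes


def max_retention(etcd_nodes: int, avg_entry_bytes: int,
                  limit: int = CONFIGMAP_LIMIT_BYTES) -> int:
    """Largest retention count whose projected size stays within limit."""
    if etcd_nodes <= 0 or avg_entry_bytes <= 0:
        raise ValueError("etcd_nodes and avg_entry_bytes must be positive")
    return limit // (etcd_nodes * avg_entry_bytes)


if __name__ == "__main__":
    # Hypothetical figures: 3 etcd nodes, ~1,000 bytes per entry.
    print(projected_bytes(3, 100, 1000))  # 300000
    print(max_retention(3, 1000))         # 349
```

Because any additional entries (for example on-demand or S3 records, if present) consume the same budget, it is prudent to keep the retention count well below the computed maximum.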

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000021272
  • Creation Date: 15-Nov-2023
  • Modified Date: 10-Jan-2024
  • SUSE Rancher

For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com