RKE2 upgrades causing data loss in applications deployed using Helm charts

This document (000020726) is provided subject to the disclaimer at the end of this document.

Environment

  • On RKE2 versions below 1.21.12

Situation

Applications managed by Helm are uninstalled and reinstalled if their releases are in a broken state during an RKE2 upgrade. This behavior has been observed on upgrades of all RKE2 versions below 1.21.12.

Resolution

Recent releases of RKE2 allow customization of the Helm job behavior to reduce the probability of data loss when deploying stateful applications. Users may:

  • Set failurePolicy: abort in the HelmChart spec to tell Helm to leave the release in a failed state if the upgrade does not succeed.
  • Set the helmcharts.helm.cattle.io/unmanaged annotation on the HelmChart resource to prevent the Helm controller from acting on the chart at all, so that the HelmChart resource may be removed from the cluster without triggering uninstallation of the Helm Release.
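As a rough illustration of the two options above, a user-provided HelmChart manifest might look like the following sketch. The chart name, repository URL, version, and target namespace are hypothetical placeholders, and the annotation value "true" is an assumption: this article only states that the annotation must be present.

```yaml
# Hypothetical HelmChart manifest; all names, the repo URL, and the version
# are placeholders, not values taken from this article.
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: example-app                  # hypothetical release name
  namespace: kube-system
  # Uncomment only if you intend to remove this HelmChart resource and manage
  # the release by other means. The value "true" is an assumption; the
  # article only requires the annotation to be present.
  # annotations:
  #   helmcharts.helm.cattle.io/unmanaged: "true"
spec:
  repo: https://charts.example.com   # hypothetical chart repository
  chart: example-app
  version: "1.2.3"                   # hypothetical chart version
  targetNamespace: example-system    # hypothetical namespace
  # Leave the release in a failed state instead of uninstalling and
  # reinstalling it when an install or upgrade does not succeed.
  failurePolicy: abort
```

On RKE2 server nodes, manifests of this kind are typically placed under /var/lib/rancher/rke2/server/manifests/ so that they are applied at startup.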

If you are currently experiencing data loss during upgrades, it may be necessary to upgrade the RKE2 cluster manually and coordinate the upgrade with changes to the HelmChart manifests to take advantage of the new features. Perform the upgrade using the following steps.

NOTE: If you are not confident in following these steps, please open a ticket with the Rancher Support team to involve the engineering team for further assistance.

  1. Stop the rke2-server service on all server nodes.
  2. Upgrade the RKE2 binary or package to the latest patch release available for your current Kubernetes minor version.
  3. Update the affected manifests to add the new fields as necessary to obtain the desired behavior, on any nodes where the manifests are present. If no nodes contain the manifests, pick one node to deploy the manifests and place them on disk so that they are applied immediately during system startup. Details of the fields are explained below.
  4. Start the rke2-server service on all server nodes.

New Fields:

  • helmcharts.helm.cattle.io/unmanaged annotation on the HelmChart Custom Resource: HelmChart resources with this annotation present will not be processed by the Helm controller. Add this annotation if you plan to remove the HelmChart resources and begin managing the application via another method.
  • spec.failurePolicy on the HelmChart and HelmChartConfig Custom Resources: when the HelmChart or its corresponding HelmChartConfig sets the failurePolicy field to abort, a failed operation leaves the Helm release in a failed state instead of uninstalling and reinstalling it. The administrator is expected to manually assess the failure and restore the release to a functional state, using commonly available Helm CLI tools.
  • spec.repoCA on the HelmChart Custom Resource: this new field allows the use of a private CA for the Helm repository. Use it when hosting charts on a server whose certificate is not signed by a public CA, in order to avoid certificate errors when installing or upgrading the chart.
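The following sketch shows how the failurePolicy and repoCA fields might be set. It assumes a hypothetical chart named example-app and a hypothetical private repository; the certificate body is a placeholder that must be replaced with the PEM-encoded CA certificate of your own repository server.

```yaml
# Hypothetical HelmChartConfig: sets failurePolicy for an existing HelmChart
# with the same name and namespace, without editing the HelmChart itself.
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: example-app                  # must match the HelmChart name (hypothetical)
  namespace: kube-system
spec:
  failurePolicy: abort
---
# Hypothetical HelmChart pulling from a repository signed by a private CA.
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: private-app                  # hypothetical release name
  namespace: kube-system
spec:
  repo: https://charts.internal.example.com   # hypothetical private repository
  chart: private-app
  targetNamespace: private-app       # hypothetical namespace
  # PEM-encoded CA certificate used to verify the repository server.
  repoCA: |
    -----BEGIN CERTIFICATE-----
    <PEM-encoded CA certificate goes here>
    -----END CERTIFICATE-----
```

Using a HelmChartConfig is one way to change the failure behavior of a chart whose HelmChart manifest you do not want to modify directly.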

Cause

RKE2 upgrades packaged components using bundled HelmChart manifests. These resources trigger Jobs that wrap the Helm CLI tool. Because all packaged components must be upgraded to ensure a functional system, any Helm releases that are stuck in an invalid state (Failed, Pending, etc.) at the time of the upgrade are uninstalled and reinstalled to reset the system to a known-good state.

If user-provided HelmChart manifests are used to deploy stateful applications where uninstallation of the Helm chart may cause data loss, this behavior may not be desired. For example, when Longhorn is deployed using a HelmChart manifest, an uninstall of the release will also delete all the Longhorn Custom Resources, potentially causing data loss. The actual volume content is not deleted, but Longhorn will lose the data mapping the content to Persistent Volumes.

Status

Reported to Engineering

Additional Information

https://jira.suse.com/browse/SURE-3911

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000020726
  • Creation Date: 16-Aug-2022
  • Modified Date: 06-Sep-2022
    • SUSE Rancher Longhorn
