Cattle Cluster Agent flapping with runtime error: slice bounds out of range [:3] with capacity 0

This document (000021006) is provided subject to the disclaimer at the end of this document.

Environment

This bug seems to affect all current Rancher versions as of March 2023.

Situation

A downstream cluster flaps between Active and Unavailable. The cattle-cluster-agent logs show errors like the following:
0308 15:04:59.702627 55 runtime.go:78] Observed a panic: runtime.boundsError{x:3, y:0, signed:true, code:0x2} (runtime error: slice bounds out of range [:3] with capacity 0)
goroutine 3160 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3cee760, 0xc00bef0660})
/go/pkg/mod/k8s.io/apimachinery@v0.23.3/pkg/util/runtime/runtime.go:74 +0x7d
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xfffffffe})
/go/pkg/mod/k8s.io/apimachinery@v0.23.3/pkg/util/runtime/runtime.go:48 +0x75
panic({0x3cee760, 0xc00bef0660})
/usr/lib64/go/1.17/src/runtime/panic.go:1038 +0x215
github.com/rancher/rancher/pkg/catalogv2/helm.decodeHelm3({0x0, 0xc9d32d})
/go/src/github.com/rancher/rancher/pkg/catalogv2/helm/helm3.go:124 +0x1b1
github.com/rancher/rancher/pkg/catalogv2/helm.fromHelm3Data({0x0, 0xc004d60540}, 0x3fc4c34)
/go/src/github.com/rancher/rancher/pkg/catalogv2/helm/helm3.go:23 +0x25
github.com/rancher/rancher/pkg/catalogv2/helm.ToRelease({0x47b3a20, 0xc004d60540}, 0x6c696877206e6564)
/go/src/github.com/rancher/rancher/pkg/catalogv2/helm/release.go:74 +0x3eb
github.com/rancher/rancher/pkg/controllers/dashboard/helm.(*appHandler).OnSecretChange(0xc00baa1950, {0xc005b81080, 0x2d}, 0xc004d60540)
/go/src/github.com/rancher/rancher/pkg/controllers/dashboard/helm/apps.go:170 +0xa5

Resolution

This error appears to be caused by bad Helm release data on the downstream cluster. First, check whether any Helm release secret has no release data stored. The command below, run against the downstream cluster, lists all Helm release secrets. If the DATA column shows 0 for a secret, that release secret has no release data.
kubectl get secrets -A | grep helm.sh/release.v1 
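On a cluster with many releases, scanning the DATA column by eye is error-prone. The following is a minimal sketch of one way to narrow the listing to empty release secrets only. It assumes the default column order of kubectl get secrets -A (NAMESPACE, NAME, TYPE, DATA, AGE). The function name list_empty_release_secrets is hypothetical and is not part of kubectl or Rancher.

```shell
# Hypothetical helper: reads `kubectl get secrets -A` output on stdin and
# prints "NAMESPACE NAME" for each Helm release secret whose DATA count is 0.
# Assumes the default column order: NAMESPACE NAME TYPE DATA AGE.
list_empty_release_secrets() {
  awk '$3 == "helm.sh/release.v1" && $4 == 0 { print $1, $2 }'
}

# Usage (run against the downstream cluster):
#   kubectl get secrets -A | list_empty_release_secrets
```

Each line of output is a namespace and secret name pair that can be reviewed before deletion.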
All secrets with no release data on the downstream cluster need to be deleted to allow cattle-cluster-agent to start properly.
kubectl delete secrets -n <NAMESPACE> <SECRET_NAME>
Once the bad Helm release secrets are removed, the cattle-cluster-agent pods should start successfully. If desired, any cattle-cluster-agent pods currently in CrashLoopBackOff can be deleted to speed up this process.
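As a sketch of that optional step, the agent pods can be deleted by label so that their Deployment recreates them. This assumes the agent runs in the cattle-system namespace with the label app=cattle-cluster-agent; neither is stated in this article, so verify both on the cluster first. The function name and the KUBECTL override are hypothetical conveniences.

```shell
# Hypothetical helper: deletes the cattle-cluster-agent pods so that their
# Deployment recreates them. The namespace (cattle-system) and the label
# (app=cattle-cluster-agent) are assumptions; confirm them first with:
#   kubectl get pods -n cattle-system --show-labels
restart_cluster_agent_pods() {
  # KUBECTL may be overridden (for example with "echo") for a dry run.
  "${KUBECTL:-kubectl}" delete pods -n cattle-system -l app=cattle-cluster-agent
}

# Usage (run against the downstream cluster):
#   restart_cluster_agent_pods
```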

 

Cause

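Occasionally, a Helm release secret is stored improperly and ends up containing no release data. cattle-cluster-agent checks the Helm release data every time it starts, and fails if a secret has no release data to check. The stack trace above reflects this: the panic is raised in decodeHelm3, which attempts to slice the first three bytes ([:3]) of an empty buffer.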

Additional Information

Reported in GitHub issue 35971: https://github.com/rancher/rancher/issues/35971
Per the GitHub issue, a fix is scheduled for the 2023-Q2 releases.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000021006
  • Creation Date: 08-Mar-2023
  • Modified Date: 08-Mar-2023
    • SUSE Rancher
