Automate Pacemaker Cluster Failure Message Cleanup
This document (000021926) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 15 High Availability all Service Packs
Situation
A cluster shows failed resource messages that are not cleaned up automatically, so clearing them requires manual intervention. The desired behavior is for a failure message to be cleared automatically after a set number of seconds.
Resolution
To have a SLES High Availability Pacemaker cluster clear resource and monitor failure messages automatically after a set time (in seconds), run the command below to set a default failure timeout for all resources:
crm configure rsc_defaults failure-timeout=86400
This command allows the cluster to remove the failed resource message after 86400 seconds (24 hours).
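The value is simply the desired time in hours multiplied by 3600 (24 × 3600 = 86400). As a minimal sketch, the helper below converts a timeout given in whole hours to seconds and applies it with the same `crm configure rsc_defaults` command. The function name `set_default_failure_timeout_hours` and the `DRY_RUN` variable are hypothetical conveniences, not part of crmsh; with `DRY_RUN=1` the command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper (not part of crmsh): set the cluster-wide default
# failure-timeout from a value given in whole hours.
# With DRY_RUN=1 the crm command is printed instead of executed.
set_default_failure_timeout_hours() {
    hours=$1
    # Accept only a positive integer without leading zeros, so the shell
    # arithmetic below never treats the input as an octal number.
    case $hours in
        '' | *[!0-9]* | 0*)
            echo "usage: set_default_failure_timeout_hours <positive hours>" >&2
            return 1
            ;;
    esac
    seconds=$((hours * 3600))   # e.g. 24 hours -> 86400 seconds
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo crm configure rsc_defaults "failure-timeout=$seconds"
    else
        crm configure rsc_defaults "failure-timeout=$seconds"
    fi
}
```

For example, `DRY_RUN=1 set_default_failure_timeout_hours 24` prints `crm configure rsc_defaults failure-timeout=86400`, which can be reviewed before running the helper without `DRY_RUN`.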
This is a global default: if a resource has its own failure-timeout set explicitly, that value takes precedence and the global default does not apply to that resource.
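A per-resource value is set as a meta attribute on the individual resource and overrides the global default for that resource only. The sketch below assumes crmsh's `crm resource meta <resource> set <attribute> <value>` subcommand; the function name `set_resource_failure_timeout`, the `DRY_RUN` switch, and the resource name `rsc_web_ip` are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: give one resource its own failure-timeout (in seconds),
# which takes precedence over the rsc_defaults value for that resource.
# With DRY_RUN=1 the crm command is printed instead of executed.
set_resource_failure_timeout() {
    rsc=$1
    seconds=$2
    if [ -z "$rsc" ]; then
        echo "usage: set_resource_failure_timeout <resource> <seconds>" >&2
        return 1
    fi
    # Seconds must be a non-negative integer; 0 disables expiry for this resource.
    case $seconds in
        '' | *[!0-9]*)
            echo "seconds must be a non-negative integer" >&2
            return 1
            ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo crm resource meta "$rsc" set failure-timeout "$seconds"
    else
        crm resource meta "$rsc" set failure-timeout "$seconds"
    fi
}
```

For example, `set_resource_failure_timeout rsc_web_ip 3600` would let failures of the hypothetical resource `rsc_web_ip` expire after one hour, while resources without their own value keep the global default.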
Cause
By default, SLES High Availability clusters have no failure-timeout set (the value is 0, so failures never expire), and failure messages must be cleared manually.
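To check whether a cluster-wide default is already configured, the output of `crm configure show` can be searched for a failure-timeout inside an rsc_defaults definition. The sketch below assumes crmsh's usual layout, in which a definition continues onto the next line when a line ends in a backslash; the function name `default_failure_timeout` is hypothetical. It prints the value and exits successfully if one is found, and it ignores failure-timeout values set on individual resources.

```shell
#!/bin/sh
# Hypothetical helper: read `crm configure show` output on stdin and print the
# failure-timeout set in an rsc_defaults definition, if any.
# Exit status is 0 when a default is found and 1 otherwise.
default_failure_timeout() {
    awk '
        # Start tracking when an rsc_defaults definition begins.
        /^rsc_defaults/ { in_defaults = 1 }
        in_defaults {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^failure-timeout=/) {
                    value = $i
                    sub(/^failure-timeout=/, "", value)
                    print value
                    found = 1
                }
            }
            # A definition ends on the first line without a trailing backslash.
            if ($NF != "\\") in_defaults = 0
        }
        END { exit(found ? 0 : 1) }
    '
}
```

For example, `crm configure show | default_failure_timeout || echo "no default failure-timeout configured"`. Until a timeout is configured, a failure can still be cleared manually with `crm resource cleanup <resource>`.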
Additional Information
More information about the failure-timeout setting is available in the SUSE documentation:
https://documentation.suse.com/sle-ha/15-SP7/html/SLE-HA-all/sec-ha-config-basics-constraints.html#sec-ha-config-basics-failover
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021926
- Creation Date: 21-Jul-2025
- Modified Date: 24-Jul-2025
- SUSE Linux Enterprise High Availability Extension
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com