Logs about an incident on a Pacemaker Cluster are lost because of the log file turn-over policy

This document (000020390) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 15
SUSE Linux Enterprise High Availability Extension 12
SUSE Linux Enterprise Server for SAP Applications 15
SUSE Linux Enterprise Server for SAP Applications 12
 

Situation

An incident occurs on a Pacemaker cluster, and the system's log entries point to a pengine log file that has since been overwritten under the cluster log's turn-over policy. The information in that file therefore does not match the date and time of the incident, and nothing is available to perform a proper analysis.

Resolution

Include the directory /var/lib/pacemaker/pengine, which contains all the pengine log files, in your daily backup plan or policy.
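As an illustration of such a backup step, the following is a minimal sketch that archives the directory into a dated tarball. The function name backup_pengine and the destination directory are assumptions made for this example; an existing backup tool that already covers /var/lib/pacemaker/pengine serves the same purpose.

```shell
# Sketch only: archive a pengine directory into a dated tarball.
# backup_pengine is a hypothetical helper name, not part of Pacemaker.
backup_pengine() {
    local src_dir="$1"     # e.g. /var/lib/pacemaker/pengine
    local dest_dir="$2"    # a directory covered by the daily backup (assumption)
    local archive
    archive="${dest_dir}/pengine-$(uname -n)-$(date +%Y%m%d).tar.gz"

    mkdir -p "$dest_dir" || return 1
    # -C switches to the parent directory so the archive stores "pengine/..." paths
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
    # Print the archive path so a calling job can log or copy it
    echo "$archive"
}

# Possible use from a daily job (destination path is a placeholder):
#   backup_pengine /var/lib/pacemaker/pengine /backup/pacemaker
```

The node name and date in the file name are there so that archives from several cluster nodes and several days do not overwrite each other.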

Also add the creation of a cluster hb_report to your incident procedures and/or disaster recovery procedures, to be carried out immediately after experiencing a problem or an incident with the cluster. The hb_report collects the pengine log files and other information about the cluster. Do not let many hours or days pass between the incident and the creation of the hb_report.

For the creation of the cluster hb_report, see TID 000017501.
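For orientation only, the following sketch shows how such a report might be collected for a time window around the incident. It assumes that the crmsh command "crm report" is available on the node and uses its -f (from) and -t (to) options; the wrapper name collect_incident_report, the times and the destination are placeholders. The referenced TID remains the authoritative procedure.

```shell
# Sketch only: collect a cluster report for the time window around an incident.
# collect_incident_report is a hypothetical wrapper name.
collect_incident_report() {
    local from="$1"    # start of the window, e.g. "2021/09/30 10:00"
    local to="$2"      # end of the window,   e.g. "2021/09/30 12:00"
    local dest="$3"    # report name or path, e.g. /tmp/incident_report
    crm report -f "$from" -t "$to" "$dest"
}

# Possible use (all values are placeholders):
#   collect_incident_report "2021/09/30 10:00" "2021/09/30 12:00" /tmp/incident_report
```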

Cause

The resource agent uses the cluster to store information. When that information changes, Pacemaker treats it as a change to the cluster configuration and writes a new pengine log file.

In the SAP HANA related use cases, especially in the Scale-Out scenario, the cluster uses attributes to store information about HANA. These attributes are updated every few seconds, and each update writes an entry in the logs, which leads to a very fast turn-over of the logs.

Additional Information

The following options specify how many pe* files should be kept:

 

        pe-error-series-max=
        pe-warn-series-max=
        pe-input-series-max=

 

They are to be added to the "property cib-bootstrap-options:" section of the Cluster Information Base (cib) using the "crm configure edit" command.
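As an alternative to the interactive editor, the same properties can usually be set non-interactively with "crm configure property". The following is a minimal sketch under that assumption; the function name set_pe_series_limits is hypothetical and the values in the usage comment are placeholders, not recommendations.

```shell
# Sketch only: set the three series limits in one crmsh call.
# set_pe_series_limits is a hypothetical wrapper name.
set_pe_series_limits() {
    local error_max="$1"    # value for pe-error-series-max
    local warn_max="$2"     # value for pe-warn-series-max
    local input_max="$3"    # value for pe-input-series-max
    crm configure property \
        pe-error-series-max="$error_max" \
        pe-warn-series-max="$warn_max" \
        pe-input-series-max="$input_max"
}

# Possible use (placeholder values):
#   set_pe_series_limits -1 5000 50000
```

Afterwards, "crm configure show" should list the three options in the cib-bootstrap-options property section.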

Integer values are allowed. The value "-1" keeps an unlimited number of files and will probably lead to an out-of-disk-space condition at some point. To choose values, monitor how many files are written per day and calculate how many files need to be kept for the desired period. This is an example from a test system:

 

hana01:/var/lib/pacemaker/pengine # ls -l * | grep "Sep 30"| wc -l
3511

 

This is the total number of files created on a given day, including input, warn and error files.
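To size each option separately, the daily total can be broken down per file series. The following is a sketch that assumes GNU find and GNU date; count_pe_files is a hypothetical helper name, and the file name patterns follow the pe-input, pe-warn and pe-error prefixes mentioned in this document.

```shell
# Sketch only: count pengine files per series that were modified on one day.
# count_pe_files is a hypothetical helper; it relies on GNU find and GNU date.
count_pe_files() {
    local dir="$1"    # e.g. /var/lib/pacemaker/pengine
    local day="$2"    # e.g. 2021-09-30
    local next_day series count
    next_day=$(date -d "$day +1 day" +%Y-%m-%d) || return 1

    for series in pe-input pe-warn pe-error; do
        # Keep files modified after 00:00 of $day but not after 00:00 of the next day
        count=$(find "$dir" -maxdepth 1 -type f -name "${series}-*" \
                    -newermt "$day" ! -newermt "$next_day" | wc -l)
        printf '%s %s\n' "$series" "$count"
    done
}

# Possible use:
#   count_pe_files /var/lib/pacemaker/pengine 2021-09-30
```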
The majority of the files created will be state changes (pe-input*bz2). If the goal is to keep files for, e.g., two weeks, the following values might be considered:

 

        pe-error-series-max="-1"
        pe-warn-series-max="5000"
        pe-input-series-max="50000"
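The pe-input-series-max value above appears consistent with the observed daily count multiplied by the retention period: 3511 files per day over 14 days gives 49154 files, which rounds up to 50000. The following sketch reproduces that estimate; estimate_series_max is a hypothetical helper name, and using the total daily count for the input series is an assumption that slightly overestimates, since the total also includes warn and error files.

```shell
# Sketch only: estimate a *-series-max value from an observed daily file count.
# estimate_series_max is a hypothetical helper, not a Pacemaker tool.
estimate_series_max() {
    local files_per_day="$1"     # e.g. the result of the ls | grep | wc -l count
    local retention_days="$2"    # how many days of files should be kept
    echo $(( files_per_day * retention_days ))
}

estimate_series_max 3511 14    # prints 49154
```

Rounding the result up leaves some headroom for days with more cluster activity than the one that was measured.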

 

Once the maximum number has been reached, Pacemaker will start to overwrite existing log files. If /var/lib/pacemaker/pengine is backed up every day, the numbers might be adjusted so that the backup contains only the most recent changes.

 

For more information see: 
SLE 12 SP5 based pacemaker deployment:
https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_available_cluster_options.html


SLE 15 based pacemaker deployment:
https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/html/options.html#cluster-options
man 7 pacemaker-schedulerd

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000020390
  • Creation Date: 17-Sep-2021
  • Modified Date: 04-Oct-2021
    • SUSE Linux Enterprise High Availability Extension
    • SUSE Linux Enterprise Server for SAP Applications


For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback@suse.com
