HANA cluster failures due to full /tmp filesystem

This document (000021331) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server for SAP Applications 15 SP5
SUSE Linux Enterprise Server for SAP Applications 15 SP4
SUSE Linux Enterprise Server for SAP Applications 15 SP3
SUSE Linux Enterprise Server for SAP Applications 15 SP2
SUSE Linux Enterprise Server for SAP Applications 15 SP1
SUSE Linux Enterprise Server for SAP Applications 12 SP5

Situation

The primary node in the HANA cluster experienced a resource failure, resulting in an instance stop and subsequent failover to the secondary node. Analysis of the logs from the HANA cluster resource agents showed that all HANA_CALL operations failed, leading to the HANA instance being marked with a status of HANA_STATE_DEFECT, as indicated by the logs:

SAPHanaTopology(rsc_SAPHanaTopology_HDB00)[280472]: INFO: ACT: hdbnsutil not answering - using global.ini as fallback - srmode=
SAPHanaTopology(rsc_SAPHanaTopology_HDB00)[280472]: ERROR: ACT: check_for_primary:  we didn't expect srmode to be: DUMP: <00000000  0a  |.|#01200000001>
SAPHanaTopology(rsc_SAPHanaTopology_HDB00)[280472]: WARNING: ACT: sht_monitor_clone: HANA_STATE_DEFECT (primary/secondary state could not be detected at this point of time)
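
To locate such events in a larger log, the resource agent lines can be filtered for the HANA_STATE_DEFECT marker. The following is a minimal sketch, not part of the resource agents. The regular expression is an assumption derived only from the three log lines shown above, and the function name find_defect_events is hypothetical.

```python
import re

# Pattern inferred from the sample lines above (an assumption, not an official format):
#   AGENT(RESOURCE)[PID]: LEVEL: MESSAGE
LOG_LINE = re.compile(
    r"(?P<agent>\w+)\((?P<resource>[^)]+)\)\[(?P<pid>\d+)\]: "
    r"(?P<level>[A-Z]+): (?P<message>.*)"
)


def find_defect_events(lines, marker="HANA_STATE_DEFECT"):
    """Return the parsed fields of every resource agent line containing `marker`."""
    events = []
    for line in lines:
        match = LOG_LINE.search(line)  # search() tolerates a syslog prefix before the agent name
        if match and marker in match.group("message"):
            events.append(match.groupdict())
    return events


if __name__ == "__main__":
    sample = [
        "SAPHanaTopology(rsc_SAPHanaTopology_HDB00)[280472]: INFO: ACT: "
        "hdbnsutil not answering - using global.ini as fallback - srmode=",
        "SAPHanaTopology(rsc_SAPHanaTopology_HDB00)[280472]: WARNING: ACT: "
        "sht_monitor_clone: HANA_STATE_DEFECT (primary/secondary state could "
        "not be detected at this point of time)",
    ]
    for event in find_defect_events(sample):
        print(event["resource"], event["level"], event["message"])
```

The function accepts any iterable of lines, so an open log file can be passed directly. Which log file contains these messages depends on the system's logging setup.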


The cluster responded as designed: it stopped the HANA instance on the primary node and failed over to the secondary node, which restored the functionality of the HANA cluster. After the failover, however, the HANA instance on the original primary node could not be started. Further investigation revealed that the /tmp filesystem on the primary node was completely full, with 100% usage recorded at the time of the incident.
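
A condition like this can be spotted before it causes a failure by checking the fill level of /tmp. The following is a minimal sketch using only the Python standard library. The 90% threshold and the function names are illustrative assumptions, not values recommended by this document.

```python
import shutil


def usage_percent(path="/tmp"):
    """Return the used space of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # avoid division by zero on pseudo-filesystems
    return 100.0 * usage.used / usage.total


def is_nearly_full(percent, threshold=90.0):
    """True if `percent` reaches the (assumed, adjustable) warning threshold."""
    return percent >= threshold


if __name__ == "__main__":
    pct = usage_percent("/tmp")
    state = "WARNING" if is_nearly_full(pct) else "OK"
    print(f"{state}: /tmp is {pct:.1f}% used")
```

The percentage may differ slightly from the value reported by df, which accounts for blocks reserved for root differently.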

Resolution

The '/tmp' filesystem plays a critical role in the operation of both the HANA instance and the overall cluster stack. When failures are caused by this filesystem being full, the fix is to either clean up or enlarge the /tmp filesystem. Once free space is available again in /tmp, the HANA cluster can be restarted and is expected to resume normal operations.
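
When cleaning up, it helps to first see which files occupy the most space. The following is a minimal sketch that only lists candidates and deletes nothing. The function name largest_files is hypothetical. Each file should be reviewed before removal, since files in /tmp may still be in use.

```python
import heapq
import os
import stat


def largest_files(root="/tmp", count=10):
    """Return up to `count` (size_in_bytes, path) tuples, largest first."""
    sizes = []
    # Unreadable directories are skipped silently via the onerror callback.
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symbolic links
            except OSError:
                continue  # file vanished or is not accessible
            if stat.S_ISREG(info.st_mode):
                sizes.append((info.st_size, path))
    return heapq.nlargest(count, sizes)


if __name__ == "__main__":
    for size, path in largest_files("/tmp"):
        print(f"{size / (1024 * 1024):10.1f} MiB  {path}")
```

Running the script as root gives a more complete picture, because directories that the current user cannot read are skipped.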

To address the issue of '/tmp' filesystem usage, a maintenance update has been released for HANA cluster resource agents for both Scale-Up and Scale-Out configurations. This fix avoids the use of the '/tmp' filesystem, ensuring that the SAP HANA resource agents remain operational even when '/tmp' is full. It is recommended to update the SAPHanaSR or SAPHanaSR-ScaleOut package to the latest version, or at least to the versions specified below:

SUSE version | SAPHanaSR version | SAPHanaSR-ScaleOut version
SLES12 SP5 for SAP | SAPHanaSR-0.162.2-3.32.2 | SAPHanaSR-ScaleOut-0.185.0-3.32.1
SLES15 (SP1, SP2, SP3, SP4, SP5) for SAP | SAPHanaSR-0.162.2-150000.4.34.1 | SAPHanaSR-ScaleOut-0.185.1-150000.39.1
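
Whether an installed package meets these minimum versions can be checked with a short script. The following is a minimal sketch. The function names are hypothetical, and the comparison is a simplification that handles only purely numeric, dot-separated version and release strings like those in the table above. It does not replace RPM's own version comparison.

```python
import subprocess


def parse(version_release):
    """Split 'VERSION-RELEASE' into two tuples of integers.

    Simplification: raises ValueError if any segment is not purely numeric.
    """
    version, _, release = version_release.partition("-")

    def to_ints(text):
        return tuple(int(part) for part in text.split(".") if part)

    return to_ints(version), to_ints(release)


def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum` (both 'VERSION-RELEASE')."""
    return parse(installed) >= parse(minimum)


def installed_version(package):
    """Ask rpm for the installed 'VERSION-RELEASE'; None if unavailable."""
    try:
        result = subprocess.run(
            ["rpm", "-q", "--queryformat", "%{VERSION}-%{RELEASE}", package],
            capture_output=True, text=True,
        )
    except FileNotFoundError:
        return None  # rpm is not available on this system
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    # Minimum taken from the SLES15 row of the table; adjust for SLES12 SP5.
    minimum = "0.162.2-150000.4.34.1"
    current = installed_version("SAPHanaSR")
    if current is None:
        print("SAPHanaSR is not installed or rpm is unavailable")
    else:
        verdict = "OK" if meets_minimum(current, minimum) else "update recommended"
        print(f"SAPHanaSR {current}: {verdict}")
```

For Scale-Out systems, the package name and minimum version would be replaced with the SAPHanaSR-ScaleOut values from the table.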

Cause

The root cause of the problem was a completely full /tmp filesystem. With no space left in /tmp, all HANA_CALL operations initiated by the HANA cluster resource agents failed, and the HANA instance was marked with a status of HANA_STATE_DEFECT. In such a scenario, the cluster triggers a failover to the secondary node, provided that HANA system replication is in 'SOK' status. To recover from this issue and restart the original primary instance, space must be freed in the /tmp filesystem.

Status

Reported to Engineering

Additional Information

Note: Beginning with the SAPHanaSR-0.162.2 resource agent version, only SAP HANA releases that utilize Python 3 are supported in a Pacemaker HA cluster. SAP HANA 2.0 SPS05 revision 059 and subsequent versions provide Python 3 and the HA/DR provider hook method srConnectionChanged() with multi-target aware parameters. For further details, refer to the following TID:
https://www.suse.com/support/kb/doc/?id=000021361


 

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000021331
  • Creation Date: 19-Feb-2024
  • Modified Date: 19-Feb-2024
    • SUSE Linux Enterprise Server for SAP Applications
