HANA fail-over and secondary recovery operations fail due to excessive os.system() execution times

This document (000020835) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server for SAP Applications 15
SUSE Linux Enterprise Server for SAP Applications 12

Situation

Use of the SAPHanaMultiTarget Python hook script appears to lead to system failures: recovery timeouts are exceeded and cluster failover operations are blocked because the hook script is still executing. (The hook script is used to recover from a situation where an isolated 'primary' HANA server has no accessible 'secondary'.)

 

Resolution

To reduce the impact of the issue, the hook script was optimized to make one os.system() call instead of three. The duration of each individual fork() call does not change; the optimization lies in reducing the number of calls the hook script needs to do its work.
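The general idea of paying the fork() cost once instead of three times can be sketched as follows. This is a minimal illustration, not the actual change made in SAPHanaSR-ScaleOut; the helper names `build_command` and `run_batched` and the `echo` commands in the usage note are hypothetical.

```python
import os
from typing import Sequence


def build_command(commands: Sequence[str]) -> str:
    """Join several shell commands into a single command line (hypothetical helper)."""
    # ' && ' runs each command only if the previous one exited successfully.
    return " && ".join(commands)


def run_batched(commands: Sequence[str]) -> int:
    """Run all commands with ONE os.system() call, i.e. one fork() of the caller."""
    if not commands:
        return 0  # nothing to run, so no fork at all
    # os.system() returns the wait status of the shell; 0 means every command succeeded.
    return os.system(build_command(commands))
```

For example, `run_batched(["echo step1", "echo step2", "echo step3"])` forks the calling process once instead of three times. The trade-off is that the caller only sees the combined status, not the result of each individual command, and all commands must be trusted because they are interpreted by the shell.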

The script optimization first appeared in:

SUSE Linux Enterprise Server for SAP Applications 15.x
SAPHanaSR-ScaleOut-0.181.0-30.1.noarch.rpm (March 2022)

SUSE Linux Enterprise Server for SAP Applications 12.x
SAPHanaSR-ScaleOut-0.181.0-3.26.1.noarch.rpm (March 2022)

Cause

Core dump analysis showed long waits (tens of seconds) while cloning a running process. clone() is the key system call made by the glibc fork() function, and cloning a process should normally take milliseconds or less.

Using kernel trace data for the entry and exit events of the clone system call, the 'long clones' could be matched one-to-one with the os.system() calls made by the hook script. In addition, the elapsed times between the long clones in the trace data exactly matched the elapsed times between the hook script's logged os.system() calls.
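The matching described above can be sketched in Python. This is an illustration under stated assumptions, not the tooling used in the original analysis: the trace is assumed to be already reduced to (entry, exit) timestamp pairs in seconds, the 1-second "long clone" threshold and 0.5-second tolerance are arbitrary, and all data values are invented. Gaps are compared rather than absolute times because the kernel trace and the hook log may use different clocks.

```python
from typing import List, Sequence, Tuple


def long_clones(events: Sequence[Tuple[float, float]], threshold_s: float = 1.0) -> List[float]:
    """Return entry timestamps of clone calls that lasted at least threshold_s."""
    return [enter for enter, exit_ in events if exit_ - enter >= threshold_s]


def gaps(timestamps: Sequence[float]) -> List[float]:
    """Elapsed time between consecutive timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def matches_log(clone_starts: Sequence[float], log_times: Sequence[float],
                tolerance_s: float = 0.5) -> bool:
    """True if the long clones match the logged os.system() calls in count and spacing."""
    if len(clone_starts) != len(log_times):
        return False
    return all(abs(c - l) <= tolerance_s
               for c, l in zip(gaps(clone_starts), gaps(log_times)))


# Invented example data: two fast clones and three long ones (30 s, 30 s, 25 s).
trace = [(100.0, 100.001), (105.0, 135.0), (140.0, 140.0005),
         (160.0, 190.0), (200.0, 225.0)]
hook_log = [1000.2, 1055.1, 1095.3]  # hypothetical logged os.system() times
```

Here `long_clones(trace)` yields three entries whose gaps (55 s, 40 s) line up with the gaps in `hook_log`, so `matches_log` reports a match.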

From this it was concluded that the long hook script durations were caused solely by os.system() calls made from HANA's Python, and solely because of the nature of the HANA process: its size, the number of its executing elements, and how busy it is. This was reported to SAP with supporting data.

It is important to note that no SUSE component is at fault here. Such long fork() durations are not normal; they result from the way os.system() calls are made from HANA's Python.

Additional Information

The Python hook script runs external programs via Python's os.system() function. The Python interpreter that runs the script is part of the SAP HANA database system and effectively runs as a thread inside a HANA process. os.system() uses fork() as part of executing the external program, so each call clones that HANA process.
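To show where the cost arises, here is a minimal sketch of the fork/exec/wait sequence that os.system() conceptually performs. It is not CPython's or glibc's actual code: os.system() hands the command to the C library's system(), which runs it through /bin/sh, and the exact process-creation primitive used internally depends on the C library version. The function name `system_via_fork` is hypothetical.

```python
import os
from typing import Sequence


def system_via_fork(argv: Sequence[str]) -> int:
    """Conceptual stand-in for os.system(): fork the caller, exec argv, wait for it."""
    # fork() duplicates the calling process. The kernel must copy the parent's
    # memory mappings and page tables, so the cost tends to grow with the size
    # of the parent, which for the hook script is a whole HANA process.
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.execvp(argv[0], list(argv))  # replaces the child image on success
        finally:
            os._exit(127)  # only reached if exec failed, e.g. command not found
    # Parent: block until the child finishes, as os.system() does.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # negative value: child was killed by a signal
```

While the parent waits in fork(), the script can make no progress, which is why a slow clone translates directly into a slow hook script.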

It was also found that using the HANA feature to perform an "RTE Dump" can introduce long recovery wait times and timeouts.

An RTE dump holds a global lock on the HANA database, so the server cannot be stopped safely (for example, by a cluster failover) while the dump is running. A secondary server cannot rejoin the primary until the dump has completed. SAP advises against using the RTE Dump feature on production systems.

HANA external authentication (e.g. via LDAP) is also suspected of adding to os.system() call delays. Using a local SAP HANA administrator user should greatly reduce these delays.

Note that any other scripts making os.system() calls from HANA's Python are liable to suffer the same performance issues until SAP fixes the underlying problem.
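For such scripts, one way to make slow calls visible is to time each os.system() call and log the ones that exceed a limit. This is a minimal sketch under stated assumptions: the wrapper name `timed_system`, the logger name "hook" and the 5-second threshold are hypothetical, and the measured time covers the whole call (fork, exec, the command's own runtime and the wait), so it cannot isolate the fork() portion.

```python
import logging
import os
import time

log = logging.getLogger("hook")  # hypothetical logger name


def timed_system(command: str, warn_after_s: float = 5.0) -> int:
    """Run command via os.system() and log a warning if the call was slow."""
    start = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
    status = os.system(command)
    elapsed = time.monotonic() - start
    if elapsed >= warn_after_s:
        log.warning("os.system(%r) took %.1f s (status %d)", command, elapsed, status)
    return status
```

Comparing these warnings with the cluster's recovery timeouts can show whether a script's os.system() calls are at risk of blocking failover.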

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000020835
  • Creation Date: 02-Nov-2022
  • Modified Date: 02-Nov-2022
    • SUSE Linux Enterprise Server for SAP Applications
