SAP Instances failed stop on shutdown (PACEMAKER, SYSTEMD, SAP)

This document (7022671) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 12
SUSE Linux Enterprise Server for SAP Applications 12

Situation

This document describes a deprecated advanced tuning. Do not apply or use it unless you understand the consequences. If in doubt, ask your Linux system administrator for help.

This document was intended to provide a workaround for issues between systemd and an LSB standard init script. It will become completely obsolete in the future, as a working systemd service for SAP Instances will be available for SLES 15.

On a Linux system with systemd, any application started with

    su - <somenameofsomeuser>
   
is moved into a user slice. This applies in particular to SAP Instances and databases handled by the pacemaker cluster service.

The resource agent will invoke a command with

    su - <SID>adm
   
and this will lead to a user slice, visible in the output of systemd-cgls, that looks like

| `-user-1003.slice
|   |-session-c24.scope
|   | |-7148 /usr/sap/HA1/ASCS00/exe/sapstartsrv pf=/sapmnt/HA1/profile/HA1_ASCS00_sapro0as -D -u ha1adm
|   | |-7311 sapstart pf=/sapmnt/HA1/profile/HA1_ASCS00_sapro0as
|   | |-7321 sapstart pf=/sapmnt/HA1/profile/HA1_ASCS00_sapro0as
|   | |-7336 ms.sapHA1_ASCS00 pf=/usr/sap/HA1/SYS/profile/HA1_ASCS00_sapro0as
|   | `-7337 en.sapHA1_ASCS00 pf=/usr/sap/HA1/SYS/profile/HA1_ASCS00_sapro0as


which in general poses no problem or issue.
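
To check on a live system which slice a given process ended up in, one can read its cgroup membership from /proc/<pid>/cgroup. The following is a minimal sketch and not part of the original procedure. The function name classify_cgroup is hypothetical, and ha1adm is the example user from the listing above.

```shell
#!/bin/sh
# classify_cgroup (hypothetical helper): given the content of
# /proc/<pid>/cgroup, print "user" if any entry points into a user slice,
# "system" if it points into system.slice, and "other" otherwise.
# user.slice is tested first, so it wins if several hierarchies are listed.
classify_cgroup() {
    case "$1" in
        */user.slice/*)   echo "user" ;;
        */system.slice/*) echo "system" ;;
        *)                echo "other" ;;
    esac
}

# Example use on a running node (assumes the SAP user is ha1adm):
#   for pid in $(pgrep -u ha1adm); do
#       printf '%s %s\n' "$pid" "$(classify_cgroup "$(cat /proc/$pid/cgroup)")"
#   done
```

For the processes shown in the tree above, this sketch would be expected to report "user".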

There might be an issue if an Administrator forgets about the Cluster and the resources and issues a

    shutdown
   
or

    reboot
   
on the system. As systemd allows only 90 seconds for user slices to stop, and there might be other dependencies, this kind of system shutdown or reboot frequently leads to a "failed stop" of some SAP resource, or to a large SAP resource being killed.

One indication of this is that the administrator issues a "shutdown" but the machine unexpectedly "reboots". The log files may then show that the node was fenced by the cluster because of a "failed stop".

Please keep in mind that this issue stems from the administrator issuing this command without taking the cluster into consideration. With a proper sequence, for example

    systemctl stop pacemaker
   
    shutdown -h now

   
one will never encounter this issue.
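
The sequence above can be wrapped in a small function so that the machine is only halted once the cluster stack has stopped. This is a sketch under assumptions: the function name safe_shutdown and the RUN dry-run variable are hypothetical and not part of any SUSE tooling.

```shell
#!/bin/sh
# safe_shutdown (hypothetical wrapper): stop the cluster stack first and
# only halt the machine if that succeeded.
# RUN is a hypothetical dry-run hook: with RUN=echo the commands are
# printed instead of executed.
safe_shutdown() {
    $RUN systemctl stop pacemaker || {
        echo "pacemaker did not stop cleanly, not shutting down" >&2
        return 1
    }
    $RUN shutdown -h now
}

# Dry run:   RUN=echo safe_shutdown
# Real run:  safe_shutdown
```

Stopping pacemaker first lets the cluster stop its resources with their own timeouts before systemd begins the system shutdown.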

Resolution

Because the systemd session setup moves any application started with

    su - <somenameofsomeuser>
   
into a user slice, this behavior can be bypassed by modifying the su part of the PAM stack.

First, add a new su-session file:

    cp /etc/pam.d/common-session /etc/pam.d/su-session

common-session normally looks like

session required        pam_limits.so  
session required        pam_unix.so     try_first_pass
session optional        pam_umask.so   
session optional        pam_systemd.so
session optional        pam_gnome_keyring.so    auto_start only_if=gdm,gdm-password,lxdm,lightdm
session optional        pam_env.so     


Add the following line to su-session, directly before the pam_systemd.so line:

session [success=1 new_authtok_reqd=ok default=ignore] pam_listfile.so item=user sense=allow file=/etc/SAPUsers

The file then looks like

session required        pam_limits.so
session required        pam_unix.so     try_first_pass
session optional        pam_umask.so
session [success=1 new_authtok_reqd=ok default=ignore] pam_listfile.so item=user sense=allow file=/etc/SAPUsers
session optional        pam_systemd.so
session optional        pam_gnome_keyring.so    auto_start only_if=gdm,gdm-password,lxdm,lightdm
session optional        pam_env.so


Then add a file

/etc/SAPUsers

which contains the names of the SAP admin users. In this example, where the SID is HA1, these would be

ardmore:~ # cat /etc/SAPUsers
ha1adm
sapadm
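
The two steps so far (creating su-session with the extra line, and creating /etc/SAPUsers) could be scripted roughly as follows. This is a hedged sketch: the helper names add_sap_skip, sidadm and write_sap_users are hypothetical, and the script assumes that the copied stack contains a pam_systemd.so session line as in the listing above.

```shell
#!/bin/sh
# All function names here are hypothetical helpers, not SUSE tooling.

# add_sap_skip: filter that copies a PAM session stack from stdin to stdout
# and inserts the pam_listfile line directly before the first
# pam_systemd.so session entry. Without such an entry nothing is inserted.
add_sap_skip() {
    awk '
        /^[ \t]*session[ \t].*pam_systemd\.so/ && !inserted {
            print "session [success=1 new_authtok_reqd=ok default=ignore] pam_listfile.so item=user sense=allow file=/etc/SAPUsers"
            inserted = 1
        }
        { print }
    '
}

# sidadm: derive the <sid>adm user name from a SID, e.g. HA1 -> ha1adm.
sidadm() {
    printf '%sadm\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# write_sap_users: write one user name per line to the file given as $1.
write_sap_users() {
    target=$1
    shift
    printf '%s\n' "$@" > "$target"
}

# Example use as root (paths as in this document):
#   add_sap_skip < /etc/pam.d/common-session > /etc/pam.d/su-session
#   write_sap_users /etc/SAPUsers "$(sidadm HA1)" sapadm
```

Writing the functions as filters keeps them easy to try on a copy of the files before touching /etc/pam.d.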



With the su-session file in place, modify

/etc/pam.d/su

from the default

#%PAM-1.0
auth     sufficient     pam_rootok.so
auth     include        common-auth
account  sufficient     pam_rootok.so
account  include        common-account
password include        common-password
session  include        common-session
session  optional       pam_xauth.so


to

#%PAM-1.0
auth     sufficient     pam_rootok.so
auth     include        common-auth
account  sufficient     pam_rootok.so
account  include        common-account
password include        common-password
session  include        su-session
session  optional       pam_xauth.so
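
The change to /etc/pam.d/su amounts to replacing one include. A minimal sketch of doing this with sed is shown below; the function name use_su_session and the backup file name su.orig are hypothetical.

```shell
#!/bin/sh
# use_su_session (hypothetical helper): filter that rewrites the
# "session include common-session" line of /etc/pam.d/su so that it
# includes su-session instead. All other lines pass through unchanged.
use_su_session() {
    sed 's/^\(session[[:space:]]\{1,\}include[[:space:]]\{1,\}\)common-session[[:space:]]*$/\1su-session/'
}

# Example use as root; keeps a backup of the original file:
#   cp -p /etc/pam.d/su /etc/pam.d/su.orig
#   use_su_session < /etc/pam.d/su.orig > /etc/pam.d/su
```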


This new entry works as follows. During a session setup called by su, PAM checks whether the user name provided is listed in the file /etc/SAPUsers. If it is, "success=1" takes effect, meaning PAM will skip the next one (1) line in the PAM stack, which in this case is pam_systemd.so. This bypasses the creation of the user slice by systemd.
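
The decision described here can be modeled in a few lines of shell. This is only an illustration of the logic, not PAM itself: the function name pam_systemd_outcome is hypothetical, and the model assumes that a user counts as listed only when a whole line of the file equals the user name.

```shell
#!/bin/sh
# pam_systemd_outcome (hypothetical model, not PAM itself):
#   $1 = user name passed to su, $2 = list file (e.g. /etc/SAPUsers)
# Prints "skipped" if the user is listed (pam_listfile succeeds, so
# success=1 jumps over the next module, pam_systemd.so) and "runs"
# otherwise (default=ignore, the stack continues with pam_systemd.so).
pam_systemd_outcome() {
    if grep -qxF -- "$1" "$2"; then
        echo "skipped"
    else
        echo "runs"
    fi
}

# Example use:
#   pam_systemd_outcome ha1adm /etc/SAPUsers
```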

The result is that the SAP Instance now stays with pacemaker.service in the system.slice:

`-system.slice
  |-pacemaker.service
  | |-2196 /usr/sbin/pacemakerd -f
  | |-2198 /usr/lib64/pacemaker/cib
  | |-2199 /usr/lib64/pacemaker/stonithd
  | |-2200 /usr/lib64/pacemaker/lrmd
  | |-2201 /usr/lib64/pacemaker/attrd
  | |-2202 /usr/lib64/pacemaker/pengine
  | |-2203 /usr/lib64/pacemaker/crmd
  | |-4125 /usr/sap/HA1/ASCS00/exe/sapstartsrv pf=/sapmnt/HA1/profile/HA1_ASCS00_sapro0as -D -u ha1adm
  | |-4296 sapstart pf=/sapmnt/HA1/profile/HA1_ASCS00_sapro0as
  | |-4311 ms.sapHA1_ASCS00 pf=/usr/sap/HA1/SYS/profile/HA1_ASCS00_sapro0as
  | `-4312 en.sapHA1_ASCS00 pf=/usr/sap/HA1/SYS/profile/HA1_ASCS00_sapro0as


There is no need to worry about the confidentiality of the /etc/SAPUsers file, as it is only a list of user names and contains no credentials.

Generally, this approach could be used for any resource that is not systemd-aware, inside or outside of the cluster.

Please keep in mind that there is a TasksMax Limit for Slices, which could be hit in case too many applications end up in the pacemaker system.slice, so increasing

    DefaultTasksMax=

in /etc/systemd/system.conf might be advisable.
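
Before changing the limit, it may help to see how close the pacemaker unit is to it. The sketch below assumes that the installed systemd version reports the TasksCurrent and TasksMax properties for pacemaker.service; the function name tasks_usage_percent is hypothetical.

```shell
#!/bin/sh
# tasks_usage_percent (hypothetical helper):
#   $1 = TasksCurrent, $2 = TasksMax, both as reported by "systemctl show".
# Prints the share of the limit in use as a whole percentage, "unlimited"
# if TasksMax is not a finite number, or "unknown" if TasksCurrent is not
# a number (for example when task accounting is off).
tasks_usage_percent() {
    case "$1" in
        ''|*[!0-9]*) echo "unknown"; return 1 ;;
    esac
    case "$2" in
        ''|*[!0-9]*|18446744073709551615) echo "unlimited"; return 0 ;;
    esac
    echo $(( $1 * 100 / $2 ))
}

# Example use on a running node:
#   cur=$(systemctl show -p TasksCurrent pacemaker.service | cut -d= -f2)
#   max=$(systemctl show -p TasksMax pacemaker.service | cut -d= -f2)
#   tasks_usage_percent "$cur" "$max"
```

A value close to 100 would suggest that the limit deserves attention once the SAP processes run inside the pacemaker unit.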
 

Cause

Jan 5, 2021 - GMARCROFT - Clarified where to find the user-.slice information.
 

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7022671
  • Creation Date: 20-Feb-2018
  • Modified Date: 31-Mar-2022
    • SUSE Linux Enterprise High Availability Extension
    • SUSE Linux Enterprise Server for SAP Applications
