SUSE HA for HANA cluster node fenced at shutdown, despite systemd integration

This document (000021046) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise for SAP Applications 15

Situation

It is best practice to stop SAP instances manually, following a defined procedure, before shutting down the overall system. However, in some cases an automated shutdown might be desirable.

If the whole system is shut down, including HANA and SUSE HA, the node gets fenced. This happens because the sapstartsrv of the systemd service SAP<sid>_<nr> is stopped before HANA, and systemd prevents the resource agent (RA) from restarting sapstartsrv.
This leads to an RA stop failure and a node fence.

It looks like this in the system log:
# last reboot -n 1 
reboot   system boot  5.14.21-150400.2 Tue Apr 18 12:34   still running

# grep "2023-04-18T12:3.*rsc_SAPHana.*stop.fail" /var/log/messages
2023-04-18T12:31:26.386765+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14955]: ERROR: ACT: SAP Instance S07-HDB00 stop failed

# grep "2023-04-18T12:3.*rsc_SAPHana.*\[14955\]" /var/log/messages
2023-04-18T12:31:24.725478+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14955]: INFO: RA ==== begin action stop_clone (0.162.1) ====
2023-04-18T12:31:26.359308+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14955]: WARNING: ACT: systemd service SAPS07_00.service is not active, it will be started using systemd
2023-04-18T12:31:26.377772+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14955]: ERROR: ACT: error during start of systemd unit SAPS07_00.service!
2023-04-18T12:31:26.386765+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14955]: ERROR: ACT: SAP Instance S07-HDB00 stop failed:
2023-04-18T12:31:26.397922+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14955]: INFO: RA ==== end action stop_clone with rc=1 (0.162.1) (3s)====
2023-04-18T12:31:26.404212+02:00 pizbuin02 pacemaker-execd[11397]:  notice: rsc_SAPHana_S07_HDB00_stop_0[14955] error output [ Error: NIECONN_BROKEN (No such file or directory), NiRawRead failed in plugin_sapfrecv() ]
2023-04-18T12:31:26.404282+02:00 pizbuin02 pacemaker-execd[11397]:  notice: rsc_SAPHana_S07_HDB00_stop_0[14955] error output [ Error: NIECONN_REFUSED (Connection refused), NiRawConnect failed in plugin_fopen() ]

# grep "2023-04-18T12:3.*systemd.*SAP.*service" /var/log/messages 
2023-04-18T12:31:21.437203+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14272]: INFO: ACT: systemd service SAPS07_00.service is active
2023-04-18T12:31:26.359308+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14955]: WARNING: ACT: systemd service SAPS07_00.service is not active, it will be started using systemd
2023-04-18T12:31:26.371081+02:00 pizbuin02 systemd[1]: Requested transaction contradicts existing jobs: Transaction for SAPS07_00.service/start is destructive (cryptsetup.target has 'stop' job queued, but 'start' is included in transaction).
2023-04-18T12:31:26.377772+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14955]: ERROR: ACT: error during start of systemd unit SAPS07_00.service!
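The two tell-tale messages above (the RA stop failure and the rejected systemd start job) can be searched for in one pass. The following is a minimal sketch, not part of the original procedure. The function name `shutdown_fence_signature` is hypothetical, and the patterns assume the message wording shown in the excerpts above.

```shell
# Hypothetical helper: reads a system log on stdin and succeeds (exit 0)
# only if BOTH messages from the excerpts above are present:
#   1. the SAPHana RA reporting a failed instance stop
#   2. systemd rejecting the start of the SAP instance unit as destructive
shutdown_fence_signature() {
    awk '
        /SAPHana.*ERROR: ACT: SAP Instance .* stop failed/ { ra_failed = 1 }
        /systemd.*Transaction for SAP.*\.service\/start is destructive/ { start_rejected = 1 }
        END { exit !(ra_failed && start_rejected) }
    '
}
```

Possible usage, assuming the log location used in this document: `shutdown_fence_signature </var/log/messages && echo "signature found"`.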

Resolution

If systemd-style init is used for the HANA database, it may be
desirable to have the SAP instance service stop only after pacemaker at
system shutdown. A drop-in file for the pacemaker service can define this ordering.
The example below uses SID S07 and instance number 00.
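For other systems, the unit name follows the SAP<sid>_<nr> pattern mentioned above. The following is a small sketch of that mapping. The function name `sap_unit_name` is hypothetical and only illustrates the naming scheme. The actual unit name should still be confirmed with the check in step 1.

```shell
# Hypothetical helper: build the systemd unit name of an SAP instance
# from its SID and instance number, following the SAP<sid>_<nr> pattern.
sap_unit_name() {
    printf 'SAP%s_%s.service\n' "$1" "$2"
}

# For the example values used in this document:
sap_unit_name S07 00
```

For SID S07 and instance number 00, this prints SAPS07_00.service, the unit name used in the steps below.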

1. Check the HANA database instance's systemd service:
---
# systemctl list-unit-files | grep -i sap
...
# systemd-cgls -u SAP.slice
...
---

2. Create and display the pacemaker service drop-in file that defines the dependency:
---
# mkdir -p /etc/systemd/system/pacemaker.service.d/
# cat <<EOF >/etc/systemd/system/pacemaker.service.d/00-pacemaker.conf
[Unit]
Description=pacemaker needs SAP instance service
Documentation=man:SAPHanaSR_basic_cluster(7)
Wants=SAPS07_00.service
After=SAPS07_00.service
EOF
# cat /etc/systemd/system/pacemaker.service.d/00-pacemaker.conf
...
---
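The drop-in content from step 2 can also be generated for a different SID and instance number. The following is a sketch under the assumption that the unit is named SAP<sid>_<nr>.service. The function name `render_pacemaker_dropin` is hypothetical. It only prints the content and does not write any file.

```shell
# Hypothetical helper: print the pacemaker drop-in content from step 2
# for a given SID ($1) and instance number ($2). Nothing is written to disk.
render_pacemaker_dropin() {
    unit="SAP${1}_${2}.service"
    cat <<EOF
[Unit]
Description=pacemaker needs SAP instance service
Documentation=man:SAPHanaSR_basic_cluster(7)
Wants=${unit}
After=${unit}
EOF
}
```

Possible usage as root, with the path from step 2: `render_pacemaker_dropin S07 00 >/etc/systemd/system/pacemaker.service.d/00-pacemaker.conf`. Because pacemaker is ordered After= the SAP instance service, systemd stops pacemaker before that service at shutdown.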

3. Activate and check the pacemaker dependency on the SAP instance service:
---
# systemctl daemon-reload
# systemctl show pacemaker.service | grep SAPS07_00
Wants=SAPS07_00.service resource-agents-deps.target dbus.service
After=system.slice network.target corosync.service resource-agents-deps.target basic.target rsyslog.service SAPS07_00.service systemd-journald.socket sysinit.target time-sync.target dbus.service sbd.service
# systemd-delta | grep pacemaker
[EXTENDED]   /usr/lib/systemd/system/pacemaker.service -> /etc/systemd/system/pacemaker.service.d/00-pacemaker.conf
---
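The check in step 3 can be scripted instead of read by eye. The following is a minimal sketch. The function name `has_sap_dependency` is hypothetical, and it assumes the `Key=value value ...` output format of `systemctl show` as shown above.

```shell
# Hypothetical helper: reads "systemctl show <unit>" output on stdin and
# succeeds only if the given unit ($1) is listed in both Wants= and After=.
has_sap_dependency() {
    unit=$1
    props=$(cat)
    for key in Wants After; do
        # Split the property line into one word per line, then look for an
        # exact match of the unit name.
        printf '%s\n' "$props" | grep "^${key}=" | tr ' =' '\n\n' \
            | grep -qxF "$unit" || return 1
    done
    return 0
}
```

Possible usage: `systemctl show pacemaker.service | has_sap_dependency SAPS07_00.service && echo "dependency active"`.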

Cause

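If the whole system is shut down, including HANA and SUSE HA, the node gets fenced. This happens because the sapstartsrv of the systemd service SAP<sid>_<nr> is stopped before HANA, and systemd prevents the resource agent (RA) from restarting sapstartsrv: with the shutdown already in progress, systemd rejects the start request for the unit as a destructive transaction.
This leads to an RA stop failure and a node fence.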

Additional Information

See also:

- Manual pages
  systemctl(1), systemd.unit(5), SAPHanaSR_basic_cluster(7)
- Blog article
  https://www.suse.com/c/handover-for-the-next-round-sap-on-suse-cluster-and-systemd-native-integration/

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000021046
  • Creation Date: 18-Apr-2023
  • Modified Date: 19-Apr-2023
    • SUSE Linux Enterprise Server for SAP Applications
