SAP HANA Cockpit with SUSE HA integration greatly improves data integrity
The new SAPHanaSR takeover blocker improves the SUSE HA integration with SAP administration solutions like SAP HANA Cockpit. It closes the gap between the native SAP tools and the SUSE SAPHanaSR cluster solution. One of the most critical administration mistakes is to run a manual takeover on the secondary SAP HANA instance without taking the SUSE cluster into account. With the takeover blocker, SAP HANA can now distinguish between a takeover requested by the cluster and one started manually. Manual takeover requests are blocked during normal cluster operation. This safeguards data integrity. This blog describes how to set up the takeover blocker and how to perform a manual takeover using the correct SUSE cluster maintenance procedure.
How can SAP HANA distinguish between cluster-requested and manual takeover requests?
SAP HANA has so-called HA/DR providers. These are Python methods that SAP HANA calls when a defined event occurs. For a takeover, events are raised before and after the takeover, and the matching methods are named preTakeover() and postTakeover(). SUSE provides the new hook script susTkOver.py, which interacts with the cluster to determine whether a takeover request is allowed or must be blocked.
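To make the rule concrete, here is a small Python model of the decision the preTakeover() hook has to make. It is a conceptual sketch only, not the code of susTkOver.py. The class and function names are hypothetical, and the real hook obtains the cluster state through SAPHanaSR-hookHelper rather than from two booleans. It encodes the behavior described in this blog: cluster-requested takeovers pass, manual takeovers pass only within the maintenance procedure, and everything else is blocked.

```python
# Conceptual model of the takeover-blocker rule described in this blog.
# NOT the code of susTkOver.py; all names here are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class TakeoverRequest:
    # Hypothetical inputs; the real hook queries the cluster via SAPHanaSR-hookHelper.
    requested_by_cluster: bool
    resource_in_maintenance: bool


def pre_takeover_allowed(req: TakeoverRequest) -> bool:
    """Return True if the modelled preTakeover() check lets the takeover proceed."""
    if req.requested_by_cluster:
        return True   # the cluster itself drives the takeover
    if req.resource_in_maintenance:
        return True   # manual takeover within the maintenance procedure
    return False      # manual takeover during normal cluster operation: blocked


if __name__ == "__main__":
    for by_cluster, in_maintenance in [(True, False), (False, True), (False, False)]:
        req = TakeoverRequest(by_cluster, in_maintenance)
        verdict = "allowed" if pre_takeover_allowed(req) else "blocked"
        print(f"cluster request={by_cluster}, maintenance={in_maintenance}: {verdict}")
```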
What are the requirements for SAP HANA integration using susTkOver.py?
You need the newest update package for SAPHanaSR. The minimum version that includes susTkOver.py is 0.160.1. On the SAP HANA side, you should run a version newer than 2.0 SPS4. If you want the improved error message that tells you why a takeover has been blocked, you need SAP HANA 2.0 SPS6. susTkOver.py will be available for SLES for SAP 15 SP4 and later also for SP3. At the time of writing, susTkOver.py is only available for SAP HANA SR scale-up.
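If you want to script the package check, the following Python sketch compares an installed SAPHanaSR version against the minimum 0.160.1. It assumes the version string comes from `rpm -q --qf '%{VERSION}' SAPHanaSR` and consists of dot-separated integers only. Other version formats are not handled.

```python
# Sketch: is the installed SAPHanaSR new enough to ship susTkOver.py?
# Assumption: the version comes from `rpm -q --qf '%{VERSION}' SAPHanaSR`
# and consists of dot-separated integers only, e.g. "0.160.1".
import subprocess

MIN_SAPHANASR = "0.160.1"  # minimum version that includes susTkOver.py


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.160.1' into (0, 160, 1) so that versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def has_sustkover(installed: str, minimum: str = MIN_SAPHANASR) -> bool:
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def installed_saphanasr_version() -> str:
    """Query rpm for the installed SAPHanaSR version; raises if rpm reports an error."""
    result = subprocess.run(
        ["rpm", "-q", "--qf", "%{VERSION}", "SAPHanaSR"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = installed_saphanasr_version()
    print(f"SAPHanaSR {version}: susTkOver.py available = {has_sustkover(version)}")
```

Comparing integer tuples instead of strings matters here. As plain strings, "0.99.0" would sort after "0.160.1".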
Which steps are needed to implement the takeover blocker?
Add susTkOver to the SAP HANA global.ini
Use the hook from the SAPHanaSR package, /usr/share/SAPHanaSR/susTkOver.py. The hook must be configured on all SAP HANA cluster nodes. In global.ini, create the section [ha_dr_provider_sustkover]. Optionally, adapt the section [trace]. Refer to the man page susTkOver.py(7) for details on this HA/DR provider hook script.
[ha_dr_provider_sustkover]
provider = susTkOver
path = /usr/share/SAPHanaSR/
execution_order = 2

[trace]
ha_dr_sustkover = info
...
Use the methods documented by SAP to change global.ini. Never edit global.ini manually while SAP HANA is running.
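Because global.ini should only be changed with SAP's own methods, a read-only check can still help confirm the result. This Python sketch parses global.ini text with configparser and compares the [ha_dr_provider_sustkover] section with the values from the example above. The function names are hypothetical. The file you pass in is an assumption about your installation (for example, the custom global.ini under /hana/shared/<SID>/global/hdb/custom/config/). The value execution_order = 2 is simply the example value, not necessarily the only valid one.

```python
# Read-only sketch: does global.ini contain the susTkOver provider section?
# Expected values are those of the example in this blog; the file path is an assumption.
import configparser

SECTION = "ha_dr_provider_sustkover"
EXPECTED = {
    "provider": "susTkOver",
    "path": "/usr/share/SAPHanaSR/",
    "execution_order": "2",  # example value, not necessarily the only valid one
}


def check_sustkover_config(ini_text: str) -> list[str]:
    """Return a list of problems; an empty list means the section matches the example."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section(SECTION):
        return [f"section [{SECTION}] is missing"]
    problems = []
    for key, wanted in EXPECTED.items():
        found = parser.get(SECTION, key, fallback=None)
        if found is None:
            problems.append(f"{key} is missing")
            continue
        # Treat "/usr/share/SAPHanaSR" and "/usr/share/SAPHanaSR/" as the same path.
        same = found.rstrip("/") == wanted.rstrip("/") if key == "path" else found == wanted
        if not same:
            problems.append(f"{key} = {found!r}, expected {wanted!r}")
    return problems


def check_sustkover_file(path: str) -> list[str]:
    """Read global.ini from disk without modifying it and check the section."""
    with open(path, encoding="utf-8") as handle:
        return check_sustkover_config(handle.read())
```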
Allowing <sid>adm to access the SUSE cluster
SAP HANA and the HA/DR providers run as user <sid>adm. To allow this user to interact with the cluster, the hook script uses the sudo command.
The user <sid>adm must be able to set the cluster attributes hana_<sid>_site_srHook_*, and the SAP HANA system replication hook needs passwordless access. The following example limits sudo access to exactly the commands needed: setting these attributes and calling SAPHanaSR-hookHelper. Replace <sid> with the lowercase SAP system ID (for example ha1). The entries can be added to a new file such as /etc/sudoers.d/SAPHanaSR, so the original /etc/sudoers file does not need to be edited.
# SAPHanaSR-ScaleUp entries for writing srHook cluster attribute and SAPHanaSR-hookHelper
<sid>adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_<sid>_site_srHook_*
<sid>adm ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper *
While the crm_attribute entry allows SAPHanaSR.py to interact with the cluster, susTkOver.py uses the new hook helper SAPHanaSR-hookHelper.
Check the resulting sudo configuration using the sudo-list mode:
sudo -U <sid>adm -l | grep "NOPASSWD.*/usr/sbin/SAPHanaSR-hookHelper"
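If you manage more than one system, you can render the sudoers entries from the SID instead of typing them by hand. The Python sketch below assumes a three-character SAP system ID (a letter followed by letters or digits). It produces the content for a file such as /etc/sudoers.d/SAPHanaSR. The function names are hypothetical. Syntax-check the result with `visudo -c -f <file>` before installing it. The second function is a Python equivalent of the grep on the `sudo -U <sid>adm -l` output.

```python
# Sketch: render the sudoers entries for one SAP system and check `sudo -l` output.
# Assumption: a SID has three characters, a letter followed by letters or digits.
import re

SID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]{2}")


def render_sudoers(sid: str) -> str:
    """Return the sudoers file content (e.g. for /etc/sudoers.d/SAPHanaSR)."""
    if not SID_PATTERN.fullmatch(sid):
        raise ValueError(f"not a valid SAP system ID: {sid!r}")
    sid = sid.lower()  # the entries use the lowercase SID, for example ha1
    user = f"{sid}adm"
    return (
        "# SAPHanaSR-ScaleUp entries for writing srHook cluster attribute"
        " and SAPHanaSR-hookHelper\n"
        f"{user} ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_{sid}_site_srHook_*\n"
        f"{user} ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper *\n"
    )


def lists_hookhelper(sudo_list_output: str) -> bool:
    """Python equivalent of grepping sudo -l output for the hookHelper NOPASSWD rule."""
    return re.search(r"NOPASSWD.*/usr/sbin/SAPHanaSR-hookHelper", sudo_list_output) is not None


if __name__ == "__main__":
    print(render_sudoers("HA1"), end="")
```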
Testing your cluster
As always, test the new setup carefully before releasing your cluster for production use. These tests should also include:
- a manual takeover, which must be blocked
- a takeover triggered by the cluster
- a manual takeover within a valid maintenance procedure
Testing is essential to find configuration errors. Also have a look at the man page susTkOver.py(7), which describes how to find the corresponding log entries in the SAP HANA trace files. Some examples:
Example for checking the system log for susTkOver setting the HANA system replication status in the CIB properties section. To be executed on the respective HANA primary site’s master nameserver.
# grep "sudo.*SAPHanaSR-hookHelper" /var/log/messages
Example for checking the HANA trace files for when the hook script has been loaded. To be executed on both sites’ master nameservers.
# su - <sid>adm
~> cdtrace
~> grep "HADR.*load.*susTkOver" nameserver_*.trc
~> grep "susTkOver.init" nameserver_*.trc
Example for showing intentionally blocked manual takeover attempts. To be executed on the respective HANA secondary site’s master nameserver.
# su - <sid>adm
~> cdtrace
~> grep "susTkOver.preTakeover.*failed.*50277" nameserver_*.trc
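The grep commands above can also be applied in one pass, for example when collecting test evidence. This Python sketch uses the same regular expressions and counts matching lines in the nameserver trace files of a directory. It assumes you run it as <sid>adm from the trace directory that `cdtrace` switches to. The pattern labels and function names are hypothetical. The error code 50277 is taken from the grep example.

```python
# Sketch: count susTkOver-related lines in HANA nameserver trace files.
# Uses the same regular expressions as the grep examples; run it as <sid>adm
# in the directory that `cdtrace` switches to (an assumption about your setup).
import re
from pathlib import Path
from typing import Iterable

PATTERNS = {  # hypothetical labels for the grep patterns shown in this blog
    "hook_loaded": re.compile(r"HADR.*load.*susTkOver"),
    "hook_init": re.compile(r"susTkOver.init"),
    "takeover_blocked": re.compile(r"susTkOver.preTakeover.*failed.*50277"),
}


def count_in_lines(lines: Iterable[str]) -> dict[str, int]:
    """Count, per pattern, how many lines match."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def count_in_trace_dir(trace_dir: str = ".") -> dict[str, int]:
    """Sum count_in_lines over all nameserver_*.trc files in trace_dir."""
    totals = {name: 0 for name in PATTERNS}
    for trace_file in sorted(Path(trace_dir).glob("nameserver_*.trc")):
        with trace_file.open(encoding="utf-8", errors="replace") as handle:
            for name, n in count_in_lines(handle).items():
                totals[name] += n
    return totals


if __name__ == "__main__":
    print(count_in_trace_dir())
```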
How to perform a manual takeover using the new SAP HANA integration?
SUSE has defined a generic maintenance procedure which is valid for all typical kinds of maintenance in a SUSE cluster.
The generic maintenance procedure has three major steps. See also the man page SAPHanaSR_maintenance_examples(7).
- Set the multi-state resource to maintenance mode.
- Perform your manual maintenance, ending up again with a running primary and a running secondary. This might include swapping the replication roles, so that the former primary instance is now the secondary and vice versa.
- Leave maintenance by:
  - refreshing the multi-state resource, so that the cluster can detect the changed SR topology
  - ending the maintenance mode of the multi-state resource, which returns it to normal cluster operation
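The three steps can be written down as a dry-run plan. The Python sketch below only builds and prints crmsh commands. It executes nothing. The resource name msl_SAPHana_HA1_HDB10 and the function name are hypothetical examples. Take the exact command sequence for your setup from SAPHanaSR_maintenance_examples(7).

```python
# Dry-run sketch of the generic maintenance procedure; nothing is executed.
# Assumptions: crmsh is used; the resource name in the demo is hypothetical.
import shlex
from typing import Optional


def maintenance_plan(msl_resource: str) -> list[tuple[str, Optional[list[str]]]]:
    """Return (description, command) pairs; None marks the manual step."""
    return [
        ("set the multi-state resource to maintenance mode",
         ["crm", "resource", "maintenance", msl_resource, "on"]),
        ("manual maintenance, e.g. a takeover; end with a running primary and secondary",
         None),
        ("refresh the resource so the cluster detects the changed SR topology",
         ["crm", "resource", "refresh", msl_resource]),
        ("end maintenance mode and return to normal cluster operation",
         ["crm", "resource", "maintenance", msl_resource, "off"]),
    ]


if __name__ == "__main__":
    for step, (what, cmd) in enumerate(maintenance_plan("msl_SAPHana_HA1_HDB10"), start=1):
        shown = shlex.join(cmd) if cmd else "(manual step, not automated here)"
        print(f"{step}. {what}: {shown}")
```

The manual step stays an explicit entry without a command. This makes clear that the takeover itself is done with the SAP tools, between entering and leaving maintenance mode.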
Where to get more information?
SUSE will publish an updated setup guide for the performance-optimized scenario in the scale-up configuration. This guide will cover the step-by-step procedure to set up a cluster, including the new takeover blocker SAP HANA integration. The guide for SLES for SAP 15 SP4 will be available at: https://documentation.suse.com/sbp/all/single-html/SLES4SAP-hana-sr-guide-PerfOpt-15/.
As always, please have a look at the man pages of the packages SAPHanaSR, ClusterTools2 and others. For this use case in particular, consult the man pages SAPHanaSR(7), SAPHanaSR-hookHelper(8), susTkOver.py(7), SAPHanaSR-showAttr(8) and SAPHanaSR_maintenance_examples(7).
To learn more about how to do a manual takeover, you can read my blog Optimal SAP HANA maintenance procedure using handshake takeover for SUSE clusters.
What to take away
SAPHanaSR version 0.160.1 includes the new hook script susTkOver.py, which blocks manual takeover requests during normal cluster operation. This feature is very easy to implement in the SUSE cluster. You only need to activate susTkOver.py on all SAP HANA instances in your SUSE cluster and configure sudo properly on all cluster nodes, so that the script can interact with the cluster.
The new takeover blocker is the next step in SAP HANA integration and greatly improves data integrity.
Please also read our other blogs about #TowardsZeroDowntime.