SAP HANA Scale-Out System Replication for large ERP Systems
Today we see increasing demand for running transactional workloads, such as ERP based on S/4HANA, on SAP HANA scale-out systems. Accordingly, multi-target system replication now also comes into focus for our customers.
Since 2016 SUSE offers SAPHanaSR-ScaleOut, a high availability solution for SAP HANA scale-out systems. SAP HANA scale-out systems typically covered analytical workloads only (BW on SAP HANA). Multi-target system replication was not needed at that time, as the replication could be done by rebuilding the analytical data from scratch from the operational data. Consequently, SUSE’s SAPHanaSR-ScaleOut package initially targeted BW on SAP HANA deployments and did not support multi-target system replication.
In this blog series you will learn how SUSE HA and SAPHanaSR-ScaleOut can cover the specific needs of ERP-style HANA scale-out systems.
The blog series consists of the following parts:
- Scale-out large ERP systems (this article)
- SAPHanaSR-ScaleOut for Multi-Target Architecture and Principles
- Migrating to multi-target
- Scale-out maintenance examples (will follow soon)
- A reference part with links to more detailed information, like specific requirements and configuration items. (link will follow soon)
This might differ from classical blog structures – we will see how it works for the given topic.
Just in case, let us explain the term “Scale Out”: “Scale out means combining multiple independent computers into one system. The main reason … is to overcome the limitations of a single … server.” Find more details on how SAP defines and explains the SAP HANA scale-out concept at the SAP help portal. A general introduction to SUSE SAP HANA system replication automation is given by the blog article SAPHanaSR-ScaleOut: Automating SAP HANA System Replication for Scale-Out Installations with SLES for SAP Applications. Please also read the setup guide SAP HANA System Replication Scale-Out – Performance Optimized Scenario and the manual page SAPHanaSR-ScaleOut(7).
Now let us start with SUSE HA for SAP HANA Scale-Out System Replication for large ERP Systems. We will work along five questions:
- Why is HANA scale-out for large ERP systems a challenge?
- What is the basic solution for HANA scale-out ERP systems?
- How do HANA scale-out ERP systems look from the SUSE HA perspective?
- Where can I find further information?
- What to take away?
Why is HANA scale-out for large ERP systems a challenge?
When comparing the two workloads, analytical (BW) versus transactional (ERP), the latter shows specific characteristics and requirements:
- Size: Large ERP systems do not fit into one single VM or cloud instance; SAP might allow ERP scale-out on request.
- Regular start and stop of a large SAP HANA database instance takes more than an hour.
- Shutdown of a broken large indexserver takes around an hour. Adding a special HA/DR provider hook covering the srServiceStateChanged() event could shorten these long stop times.
- Small number of worker nodes, usually two per site. This supports the good practice to limit the number of cross-node joins in the transactional use case.
- No standby nodes. This implies there is only one master name server candidate. If that node gets lost, the cluster needs to manage a “headless” SAP HANA. This also implies that neither data nor log can be failed over from one node to another. So the solution does not need to switch /hana/data or /hana/log. Local disks are sufficient for these areas.
- SAP HANA tends to stay inactive on infrastructure failures instead of reporting errors. In particular, a failing NFS server needs good recovery.
- Optionally active/active read-enabled setup.
- Easy setup.
- But demanding SLAs.
What is the basic solution for HANA scale-out ERP systems?
SAPHanaSR-ScaleOut version 0.164 already provides all functionality to implement high availability for large scale-out ERP systems without multi-target system replication. An appropriate configuration contains:
- Only one HANA master name server configured
- SAPHanaController and SAPHanaTopology resources as usual, updated RPM with adaptive start/stop timeouts
- HA/DR provider hook srConnectionChanged() script for correct detection of the system replication status in failure situations
- HA/DR provider hook srServiceStateChanged() script for fast shutdown of HANA indexserver (please ask SAP for details)
- SAP HANA data and logs on local disks, no host auto-failover, no storage API needed
- NFS share /hana/shared/ is provided locally, not across sites
- Optional Filesystem resource for bind mount of NFS share, see https://www.suse.com/support/kb/doc/?id=000019904
- Optional location constraints for IP address on secondary HANA, for active-active read-enabled feature
- Diskless SBD, or disk-based SBD with three devices
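To illustrate the local-disk layout described above, the relevant /etc/fstab entries on a worker node might look like the following sketch. Device names, SID and NFS server are placeholders, not taken from a real setup:
---
/dev/mapper/vg_hana-lv_data  /hana/data/HA1  xfs   defaults  0 0
/dev/mapper/vg_hana-lv_log   /hana/log/HA1   xfs   defaults  0 0
nfs1:/export/shared_HA1      /hana/shared    nfs4  defaults  0 0
---
Example: sketch of /etc/fstab entries, device names and NFS server are placeholders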
Do not forget about the third site, which is always needed for scale-out. Our graphic shows the majority maker (MM) on a 3rd site. This also allows the cluster to continue running if one complete site goes down. The resource agents SAPHanaTopology and SAPHanaController and the HA/DR provider hook script SAPHanaSR.py, all from package SAPHanaSR-ScaleOut, are part of the cluster setup. At this point you might expect a configuration example.
The short story: The cluster configuration looks quite similar to the ones described in our setup guides.
Even more details and configuration examples are available with the RA’s manual pages ocf_suse_SAPHanaTopology(7), ocf_suse_SAPHanaController(7), ocf_heartbeat_IPAddr2(7) and the hook script manual page SAPHanaSR.py(7). Manual page SAPHanaSR-ScaleOut_basic_cluster(7) covers details on cluster basic configuration, like parallel fencing.
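As a sketch only, the core resource configuration in crm shell syntax could look as follows. Resource names, SID HA1, instance number 00 and all timeouts are illustrative and must be adapted to the individual system; please verify against the setup guides and manual pages named above:
---
primitive rsc_SAPHanaTop_HA1_HDB00 ocf:suse:SAPHanaTopology \
 op monitor interval=10 timeout=600 \
 params SID=HA1 InstanceNumber=00
clone cln_SAPHanaTop_HA1_HDB00 rsc_SAPHanaTop_HA1_HDB00 \
 meta clone-node-max=1 interleave=true
primitive rsc_SAPHanaCon_HA1_HDB00 ocf:suse:SAPHanaController \
 op start interval=0 timeout=3600 \
 op stop interval=0 timeout=3600 \
 op promote interval=0 timeout=900 \
 op monitor interval=60 role=Master timeout=700 \
 op monitor interval=61 role=Slave timeout=700 \
 params SID=HA1 InstanceNumber=00 PREFER_SITE_TAKEOVER=true \
 DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
ms msl_SAPHanaCon_HA1_HDB00 rsc_SAPHanaCon_HA1_HDB00 \
 meta clone-node-max=1 master-max=1 interleave=true
order ord_SAPHanaTop_first Optional: cln_SAPHanaTop_HA1_HDB00 \
 msl_SAPHanaCon_HA1_HDB00
---
Example: sketch of the RA configuration in crm shell syntax, names and timeouts are illustrative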
As with all SUSE HA for SAP HANA scale-out setups, diskless SBD as well as disk-based SBD are recommended STONITH mechanisms. Disk-based SBD should always use three devices. Advantages of diskless SBD are lower complexity and shorter fencing times.
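For illustration, a disk-based SBD setup with three devices might be configured in /etc/sysconfig/sbd like the following sketch. The device paths are placeholders; watchdog device and start mode depend on the hardware and the desired behavior:
---
SBD_DEVICE="/dev/disk/by-id/sbd-a;/dev/disk/by-id/sbd-b;/dev/disk/by-id/sbd-c"
SBD_WATCHDOG_DEV="/dev/watchdog"
SBD_PACEMAKER="yes"
SBD_STARTMODE="clean"
---
Example: sketch of /etc/sysconfig/sbd for disk-based SBD, device paths are placeholders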
Specific to the ERP-style setup are the non-existing standby nodes and missing SAP HANA host auto-failover, the large nodes and long timeouts, and finally the high SLA expectations. These aspects might be addressed by adapting the configuration of SAP HANA, the SUSE HA cluster and the infrastructure.
Only one Master Name Server Candidate
On SAP HANA level, only one master name server candidate should be configured.
---
[landscape]
...
master = suse11:31001
worker = suse11 suse12
active_master = suse11:31001
roles_suse11 = worker
roles_suse12 = worker
...
---
Example: SAP HANA nameserver.ini
The HA/DR provider
An HA/DR provider hook script for the srConnectionChanged() event is needed to ensure correct detection of the system replication status in failure situations. The package SAPHanaSR-ScaleOut includes this script already. Add the script to the SAP HANA config file global.ini on both sites.
---
[ha_dr_provider_saphanasr]
provider = SAPHanaSR
path = /usr/share/SAPHanaSR-ScaleOut/
execution_order = 1
...
[trace]
ha_dr_saphanasr = info
---
Example: HANA global.ini
Please note this script is different from the one shipped for scale-up.
Configure proper Permissions
The SAPHanaSR HA/DR provider hook script is called by SAP HANA on srConnectionChanged() events and writes the current system replication status into a SUSE HA cluster attribute. For this purpose, the Linux user <sid>adm will run the command crm_attribute. A sudo permission needs to be defined, like this simple example for SID SLE:
---
sleadm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_sle_*
---
Example: /etc/sudoers.d/SAPHanaSR for SID SLE
The given example fits for scale-up and scale-out multi-target setups as well. More granular rules are possible. Please see manual page SAPHanaSR.py(7) for details on attributes and permissions.
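As a sketch of such more granular rules, the wildcard could be narrowed down to the srHook attributes of the two sites, here assumed to be named S1 and S2. The exact attribute names and values need to be checked against manual page SAPHanaSR.py(7):
---
sleadm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_sle_site_srHook_S1 -v SOK -t crm_config -s SAPHanaSR
sleadm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_sle_site_srHook_S1 -v SFAIL -t crm_config -s SAPHanaSR
sleadm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_sle_site_srHook_S2 -v SOK -t crm_config -s SAPHanaSR
sleadm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_sle_site_srHook_S2 -v SFAIL -t crm_config -s SAPHanaSR
---
Example: sketch of more granular /etc/sudoers.d/SAPHanaSR rules for SID SLE, site names S1 and S2 are assumptions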
Optionally, another HA/DR provider hook script for the srServiceStateChanged() event can be used to handle indexserver issues. Finally, ask SAP whether SAP HANA’s built-in NFS server can be used here, as we have no host auto-failover. Please contact SAP for details on configuration options, particularly on the hook script for srServiceStateChanged() events.
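Purely to illustrate the mechanism: such an additional hook would be registered in global.ini following the same pattern as the SAPHanaSR section shown above. Provider name, path and section name below are hypothetical placeholders; the real script and its registration need to be obtained from SAP:
---
[ha_dr_provider_example]
provider = ExampleSrServiceHook
path = /hana/shared/myHooks/
execution_order = 2
---
Example: hypothetical global.ini section for a srServiceStateChanged() hook script, all names are placeholders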
Corosync and Cluster Resources
Good practice is to use redundant corosync rings on SUSE HA cluster level. Find details in the above-mentioned setup guides, our HA product documentation and manual page corosync.conf(5). Another good practice is to tune the resource agent timeouts to match the individual SAP HANA’s behavior. Please refer to manual pages SAPHanaSR-ScaleOut(7), ocf_suse_SAPHanaController(7), ocf_suse_SAPHanaTopology(7), SAPHanaSR-ScaleOut_basic_cluster(7) and SAPHanaSR.py(7) for details. If the SAP HANA active/active read-enabled secondary feature is required, the secondary IP address can be configured as an HA cluster resource as shown below. The example works only for SAP HANA scale-out without host auto-failover.
---
primitive rsc_ip_ro_SLE_HDB00 IPAddr2 \
 op monitor interval=10s timeout=20s \
 params ip=192.168.178.199
colocation col_ip_ro_on_secondary_SLE_HDB00 2000: \
 rsc_ip_ro_SLE_HDB00:Started msl_SAPHanaCon_SLE_HDB00:Slave
location loc_ip_ro_not_master_SLE_HDB00 rsc_ip_ro_SLE_HDB00 \
 rule -inf: hana_sle_roles ne master1:master:worker:master
---
Example: CIB second IP address for HANA read-enabled
Find more information on cluster resource constraints in our HA product documentation. Manual pages ocf_heartbeat_IPAddr2(7) and SAPHanaSR-showAttr(8) cover details on the IPAddr2 resource agent and the SAP HANA resource attributes.
Further Infrastructure Components
It’s important to prepare all infrastructure components on all levels for fault tolerance. The storage for SAP HANA data and log filesystems and the NFS servers for /hana/shared/ are crucial. Storage for HANA data and log filesystems can be either on local RAID, on SAN storage or on dedicated highly available and performant NFS shares. As usual, it is important to obey SAP certification rules.
The SAP HANA Fast Restart feature on tmpfs in RAM, as well as HANA on persistent memory (Intel Optane DC Persistent Memory and IBM vPMEM), can be used as long as they are transparent to SUSE HA.
For more information from SAP HANA perspective, please look at SAP, e.g. blogs HANA startup tuning – part 2 and SAP HANA and Persistent Memory. Details on persistent memory from Linux perspective can be found in SUSE product release notes and TIDs.
One highly available NFS service for /hana/shared/ is set up per site. This filesystem holds programs for managing and monitoring the HANA database. Human admins and HA clusters are helpless if this filesystem is missing. The network should be redundant on infrastructure level, and Linux bonding might be used. In the ideal case, SUSE HA corosync communication runs on redundant links and has one dedicated network. For example, one corosync ring runs on the bonded application network, the other runs on a dedicated corosync link.
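The two-ring idea can be sketched in corosync.conf as follows. This is a reduced fragment, not a complete configuration; node addresses are placeholders, and token timeout and transport must be chosen per system as described in corosync.conf(5) and the setup guides:
---
totem {
    version: 2
    token: 30000
    transport: udpu
    rrp_mode: passive
}
nodelist {
    node {
        ring0_addr: 192.168.178.111
        ring1_addr: 10.0.0.111
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.178.112
        ring1_addr: 10.0.0.112
        nodeid: 2
    }
    # ... one node entry per cluster node, including the majority maker
}
---
Example: sketch of a corosync.conf fragment with two rings, addresses are placeholders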
How do HANA scale-out ERP systems look from the SUSE HA perspective?
First we have a look at the Linux cluster nodes and resources with the common command crm_mon -1r:
---
Stack: corosync
Current DC: suse00 (version 2.0.1+20190417.13d370ca9-3.9.1-2.0.1+20190417.13d370ca9)
Last updated: Tue Jul 20 16:30:50 2021
Last change: Tue Jul 20 16:30:06 2021 by root via crm_attribute on suse11

5 nodes configured
12 resources configured

Online: [ suse00 suse11 suse12 suse21 suse22 ]

Full list of resources:

rsc_stonith_sbd (stonith:external/sbd): Started suse00
Clone Set: cln_SAPHanaTop_HA1_HDB00 [rsc_SAPHanaTop_HA1_HDB00]
    Started: [ suse11 suse12 suse21 suse22 ]
    Stopped: [ suse00 ]
Clone Set: msl_SAPHanaCon_HA1_HDB00 [rsc_SAPHanaCon_HA1_HDB00] (promotable)
    Masters: [ suse11 ]
    Slaves: [ suse12 suse21 suse22 ]
    Stopped: [ suse00 ]
rsc_ip_sec_HA1_HDB00 (ocf::heartbeat:IPaddr2): Started suse21
rsc_ip_HA1_HDB00 (ocf::heartbeat:IPaddr2): Started suse11
---
Example: Output of crm_mon -1r
As outlined in the overview picture, we see five nodes. Four nodes are running HANA resources. One node does not run HANA resources – this is the majority maker. Something special for scale-out is the second IP address. This IP address provides access to the HANA active/active read-enabled site. See also our explanation above. Other than that, there is nothing special in this resource list.
The SAPHanaSR cluster attributes
The cluster manages SAP HANA and decides based on stored attributes. Now let us look closer at these attributes. The command SAPHanaSR-showAttr prints the Linux cluster representation of SAP HANA specifics like sites, server roles and system replication status.
---
Glo cib-time                 prim sec srHook srmode  sync_state
----------------------------------------------------------------
HA1 Tue Jul 20 16:34:08 2021 S1   S2  SOK    syncmem SOK

Resource                 maintenance
-------------------------------------
cln_SAPHanaTop_HA1_HDB00 false
msl_SAPHanaCon_HA1_HDB00 false

Site lpt        lss mns    srr
-------------------------------
S1   1626791599 4   suse11 P
S2   30         4   suse21 S

Hosts  clone_state node_state roles                        score  site
-----------------------------------------------------------------------
suse00             online     :::
suse11 PROMOTED    online     master1:master:worker:master 150    S1
suse12 DEMOTED     online     slave:slave:worker:slave     -10000 S1
suse21 DEMOTED     online     master1:master:worker:master 100    S2
suse22 DEMOTED     online     slave:slave:worker:slave     -12200 S2
---
Example: Output of SAPHanaSR-showAttr
Let’s put a spotlight on some of the interesting items.
- The “Glo(bal)” section shows the global “srHook” attribute, which reflects the SAP HANA system replication status as reported by the HA/DR provider script. The attribute changes whenever SAP HANA raises an srConnectionChanged() event (and the Linux cluster is functional). This information helps to decide whether an SAP HANA take-over can be initiated in case the primary site fails. Only if the “srHook” attribute is “SOK” will the cluster initiate a take-over. The “sync_state” attribute reflects the SAP HANA system replication status as reported by the SAPHanaController RA monitor. The RA sets this attribute whenever processing a monitor or probe action – basically the RA calls SAP HANA’s systemReplicationStatus.py. This happens on a regular basis, defined by the monitor interval, as well as on start/stop/promote/demote operations. We have two attributes for SAP HANA’s system replication status? – Yes! The polling-based attribute “sync_state” is a fallback for lost srHook events.
- The “Site” section’s “lss” column shows SAP HANA’s overall landscape status per site. The SAPHanaTopology RA monitor basically calls SAP HANA’s landscapeHostConfiguration.py script and updates this attribute accordingly. As long as SAP HANA does not report “lss” as “1”, no take-over will happen. A value of “0” indicates a fatal internal communication error that made it impossible to detect the current landscape status. See manual page ocf_suse_SAPHanaController(7) for details. The attribute “srr” indicates the detected system replication role. “P” is the abbreviation for “primary” and “S” (easy to guess) for “secondary”. The SAPHanaTopology resource agent sets these values to allow operation during maintenance windows of SAP HANA. The attribute “mns” indicates the currently identified active master name server of the site.
- In the “Hosts” section the “roles” column shows actual and configured roles for HANA on each node. Since we do not have standby nodes, actual and configured roles are always the same for a given host once SAP HANA is running. This output reflects the entries we made in HANA’s nameserver.ini file. The SAPHanaTopology RA updates these attributes during each monitor run. The majority maker suse00 has no SAP HANA roles at all, obviously. The “score” column shows which scores SAPHanaController uses for placing the roles, like primary SAP HANA master name server, primary worker and more, on the right hosts. Mapping the HANA roles and sites to Linux HA resource status is the tricky part.
Manual page SAPHanaSR-showAttr(8) explains the meaning of all fields.
Where can I find further information?
Find more details on requirements, parameters and configuration options in the setup guide as well as in manual pages SAPHanaSR-ScaleOut(7), ocf_suse_SAPHanaController(7) and SAPHanaSR.py(7).
Please have a look at the reference part of this blog series. (link will follow soon)
What to take away?
- HANA scale-out systems for ERP have different characteristics and higher SLAs than BW.
- The outlined concept shows how to use the existing SAPHanaSR-ScaleOut 0.164 package to meet those requirements, except multi-target system replication.
- The new SAPHanaSR-ScaleOut 0.180 package adds scale-out multi-target system replication.
- Existing scale-out systems are ready for upgrade and preparation for multi-target system replication. Procedure and tools are provided. Again this is for RA version 0.180.x and newer.
- Increased SLAs, multi-target system replication and HA automation make HANA scale-out installations even more complex.
- Well documented configuration, test cases and maintenance procedures will help.