This article is part of a blog series explaining the SAPHanaSR Scale-Out solution. It covers the SAPHanaSR technology and HANA System Replication at an overview level. The follow-up article in this series, written by my colleague Fabian Herschel, describes the solution in more detail.
The SAP HANA System Replication Automation (SAPHanaSR) is now also available for SAP HANA Scale-Out scenarios, starting with SUSE Linux Enterprise Server for SAP Applications 12 Service Pack 2. SAPHanaSR Scale-Out provides automatic failover between SAP HANA nodes with configured System Replication for complex HANA Scale-Out configurations running in two different locations. This technology is included in SLES for SAP 12 SP2 at no additional cost.
SAPHanaSR makes use of the SAP HANA System Replication technology, included with SAP HANA, which replicates data between HANA nodes over the network, directly from the memory of a primary HANA system into the memory of a secondary HANA system (memory preload). It ensures that data between HANA systems is reliably in sync and consistent. Usually the HANA systems are located at different sites, which can be, for example, two different rooms in a datacenter or even different datacenters in different geographical locations. This makes it possible to implement various High Availability and / or Disaster Recovery scenarios. With HANA Scale-Up System Replication, a HANA system consists of exactly two HANA nodes, one per site. With HANA Scale-Out System Replication, a HANA system consists of two or more nodes per site. In this case each node of the primary HANA Scale-Out system gets a corresponding System Replication node of the secondary system, a so-called System Replication pair.
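To make the idea of System Replication pairs more concrete, here is a minimal Python sketch. It is only an illustration: the host names, the `ReplicationPair` class and the `build_replication_pairs` helper are made up for this example, and it assumes for simplicity that both sites have the same number of nodes and that nodes are paired by position.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReplicationPair:
    """One primary node and its counterpart on the secondary site."""
    primary_node: str
    secondary_node: str


def build_replication_pairs(primary_nodes: List[str],
                            secondary_nodes: List[str]) -> List[ReplicationPair]:
    """Pair each primary node with the node at the same position on the other site.

    Assumption for this sketch: both sites have the same number of nodes.
    """
    if len(primary_nodes) != len(secondary_nodes):
        raise ValueError("both sites need the same number of nodes in this sketch")
    return [ReplicationPair(p, s) for p, s in zip(primary_nodes, secondary_nodes)]


# Hypothetical three-node Scale-Out system per site.
pairs = build_replication_pairs(
    ["hana-a1", "hana-a2", "hana-a3"],
    ["hana-b1", "hana-b2", "hana-b3"],
)
for pair in pairs:
    print(f"{pair.primary_node} replicates to {pair.secondary_node}")
```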
Since the data is already pre-loaded in memory, a secondary HANA system can usually take over operation quickly. However, a take-over is a complex manual process that requires interaction on the Linux command line. First, a system or node failure has to be detected. Second, the HANA System Replication take-over process has to be initiated. It is the System Administrator's responsibility to get a consistent picture of the HANA System Replication topology in order to execute the right steps on the right nodes. Third, the service IP addresses have to be re-configured to the new primary nodes to ensure that clients can re-connect to the system.
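The three manual steps above can be sketched as a small decision routine. This is a simplified model, not the procedure an administrator would follow verbatim: the `Site` class and `plan_takeover` function are hypothetical, the sketch treats any failed primary node as a reason for a site take-over, and it only returns a list of human-readable steps instead of running any command.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    name: str
    nodes_total: int   # nodes configured at this site
    nodes_up: int      # nodes currently responding


def plan_takeover(primary: Site, secondary: Site) -> List[str]:
    """Return the manual steps for a take-over, or an empty list if none is needed."""
    steps: List[str] = []

    # Step 1: detect a system or node failure on the primary site.
    if primary.nodes_up == primary.nodes_total:
        return steps  # primary is healthy, nothing to do
    steps.append(f"failure detected on primary site {primary.name}")

    # Check the topology first: the secondary must be able to take over.
    if secondary.nodes_up < secondary.nodes_total:
        steps.append(f"abort: secondary site {secondary.name} is not fully available")
        return steps

    # Step 2: initiate the System Replication take-over on the secondary site.
    steps.append(f"initiate System Replication take-over on site {secondary.name}")

    # Step 3: move the service IP addresses so clients can re-connect.
    steps.append(f"move service IP addresses to site {secondary.name}")
    return steps


# Hypothetical example: one of three primary nodes has failed.
for step in plan_takeover(Site("A", 3, 2), Site("B", 3, 3)):
    print(step)
```

Even in this toy form, the order of the checks matters: the topology has to be known before the take-over is triggered, which is exactly the part that is easy to get wrong by hand.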
The SUSE SAPHanaSR technology automates the processes required for a successful System Replication take-over. This includes the monitoring of all SAP HANA nodes, the System Replication take-over process and the failover of the service IP addresses. Furthermore, SAPHanaSR constantly gathers information about the HANA system landscape and therefore always has a consistent picture of the HANA System Replication topology, which allows it to take appropriate actions on the right nodes.
As a result, SAPHanaSR significantly reduces the time needed for the System Replication take-over, increases the reliability of the whole failover process (monitoring, SR take-over, service IP failover) and thus reduces the downtime of a HANA system to a minimum in case of a node failure.
The underlying technology of SAPHanaSR is based on the Open Source cluster framework Pacemaker & Corosync plus dedicated Resource Agents for SAP HANA (SAPHanaController, SAPHanaTopology). All nodes of the primary and secondary HANA systems are configured in an active-active multi-node high availability cluster. Additionally, a dedicated majority-maker node helps the cluster decide which of the two HANA sites will survive. The SAP HANA resource agents monitor and control the SAP HANA nodes, that is, they start and stop them and perform the System Replication take-over operations. In case of a network connectivity loss between the two HANA sites, the cluster's STONITH fencing mechanism makes sure that the cluster will never start two SAP HANA primary systems at the same time. This avoids the very dangerous dual-primary scenario, where both HANA systems might be able to accept client connections, causing diverging data sets in the two databases. Furthermore, a special algorithm (LPA, last primary arbitration) makes sure that even some administrative mistakes causing a dual-primary situation are detected and handled correctly.
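Why a majority-maker node helps can be shown with a bit of vote arithmetic. The following Python sketch is an illustration only, not how Pacemaker and Corosync are implemented: it assumes one vote per node and a strict-majority rule, and the function names `has_quorum` and `surviving_side` are invented for this example.

```python
from typing import Optional


def has_quorum(partition_votes: int, total_votes: int) -> bool:
    """A partition has quorum only with a strict majority of all votes."""
    return partition_votes > total_votes / 2


def surviving_side(site_a_nodes: int, site_b_nodes: int,
                   majority_maker_side: Optional[str]) -> Optional[str]:
    """Return the site ("A" or "B") that keeps quorum after the link between sites is lost.

    majority_maker_side is the site the majority maker can still reach,
    or None if the cluster has no majority maker at all.
    Assumption for this sketch: every node, including the majority maker, has one vote.
    """
    maker_votes = 0 if majority_maker_side is None else 1
    total = site_a_nodes + site_b_nodes + maker_votes
    votes = {
        "A": site_a_nodes + (1 if majority_maker_side == "A" else 0),
        "B": site_b_nodes + (1 if majority_maker_side == "B" else 0),
    }
    winners = [side for side, count in votes.items() if has_quorum(count, total)]
    return winners[0] if winners else None


# Three nodes per site plus a majority maker that still reaches site A: 4 of 7 votes.
print(surviving_side(3, 3, "A"))
# Without a majority maker the split is 3 against 3 and neither side has a majority.
print(surviving_side(3, 3, None))
```

With an equal number of nodes per site, the extra vote is what breaks the tie. The side without quorum is then the one the cluster fences, so that only one primary can remain.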
SAPHanaSR Scale-Out extends the System Replication automation from Scale-Up to complex HANA Scale-Out scenarios. These systems consist of two or more HANA nodes per site. SAPHanaSR Scale-Out has been tested with clusters of up to 37 nodes. Similar to SAPHanaSR Scale-Up, multiple System Replication configuration scenarios are supported:
- HANA Scale-Out with System Replication between two HANA systems located at two different sites
- HANA Scale-Out with enabled autohost failover and System Replication between two HANA systems at two different sites
- HANA Scale-Out with multiple containers in one database (MDC, multitenant database containers)
SAPHanaSR Scale-Out is included with SLES for SAP Applications 12 Service Pack 2. Best-practice guides that explain the setup and configuration in detail are available on the SUSE website as well as in the SAPHanaSR RPM package.
In addition to the SAPHanaSR technology, SUSE Linux Enterprise Server for SAP Applications offers many further features around the topic “Towards Zero Downtime”. This includes a comprehensive set of RAS features (Reliability, Availability, Serviceability), a one-click full system rollback based on BTRFS and Snapper, and an optionally available Kernel Live Patching functionality, which allows System Administrators to install Linux Kernel security patches without a reboot. More information about these features can be found on our website: https://www.suse.com/products/sles-for-sap/