Drive Superior SAP High Availability on Azure with SUSE and Microsoft


Co-authored by Sanoop Nampoothiri, Principal Software Architect at Microsoft  

Running mission-critical SAP workloads requires robust high availability (HA) solutions. For years, SUSE and Microsoft have partnered to deliver secure, reliable and cost-effective SAP deployments on Azure. This collaboration focuses on innovation, joint quality assurance and continuous improvement, ensuring your SAP systems remain resilient and performant. 

Forge a Strong Partnership for SAP on Azure

SUSE and Microsoft have a long-standing partnership, particularly in supporting SAP workloads on Azure. Today, Azure hosts some of the world’s largest SAP deployments. From the very beginning, SUSE and Microsoft played a pioneering role in bringing SAP to Azure, with SUSE providing the operating system foundation for the initial reference architectures. The collaboration extends to security, reliability and cost of operations, bringing innovations to customers quickly and with high quality. Whenever SUSE introduces an innovation, Microsoft works to ensure the scenario is supported on Azure; likewise, SUSE works to bring Microsoft-driven innovations to its platforms, making this a genuine two-way partnership. Ongoing support for a given feature may be split between Microsoft and SUSE, depending on whether you run a bring-your-own-subscription (BYOS) or pay-as-you-go (PAYG) system. Regular meetings between the product management and engineering teams keep both companies closely synchronized. 

Rigorously Testing for Unmatched Quality 

Extensive quality assurance testing is conducted for SAP workloads on Azure. Every Linux release, whether a major release such as the move from SLES 12 to SLES 15 or a minor service pack update, goes through a large set of QA tests. An internal framework validates the scenarios before the latest release is approved for running SAP workloads on Azure. The goal is not just to make sure that SAP workloads can run on Azure, but that they run with the level of performance, high availability and automation that customers expect from Azure. 

Key aspects of this joint QA effort include: 

  1. High Availability Validation: A heavy focus is placed on high availability. The teams validate every change to the HA configuration, simulate several failover scenarios, such as network failures and disk failures, and check how the cluster responds. This ensures the failover experience remains seamless with the latest release. 
  2. SDAF (SAP Deployment Automation Framework): SDAF streamlines and standardizes SAP deployments by covering a wide range of scenarios. Comprehensive testing of SDAF on each new SLES release ensures that the entire SAP stack, from infrastructure to application components, can be provisioned and configured automatically, reducing manual effort and minimizing deployment risks. 

To address concerns about approval delays caused by issues found after general availability (GA), SUSE and Microsoft launched joint QA improvements for SAP on Azure. The idea is to run these quality assurance tests much earlier in the release cycle, during the preview phase, instead of after the product is generally available. The teams combined Microsoft's SDAF with SUSE's openQA framework and automated many of these tests. Automating the tests and running them early lets the teams catch issues much faster and engage SUSE engineering to fix them before GA. As a result, new releases can be approved for SAP on Azure as soon as they become available. Overall, the goal is a truly integrated platform, so that SAP customers can run their mission-critical applications on Azure with SUSE confidently. 
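To make the idea of a failover scenario matrix more concrete, here is a minimal Python sketch. It assumes a hypothetical `inject_and_observe` callable that injects a named fault, waits for the cluster to settle, and reports which node was primary before and after. The scenario names, the expected outcomes and the `fake_cluster` stand-in are illustrative assumptions only; this is not code from openQA or SDAF.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Scenario:
    name: str              # fault to inject, e.g. "network failure"
    expect_takeover: bool  # should the secondary become primary?


@dataclass
class Result:
    scenario: str
    passed: bool
    detail: str


# Hypothetical hook: inject a named fault, wait for the cluster to settle,
# and return {"primary_before": ..., "primary_after": ...}.
Observer = Callable[[str], Dict[str, str]]


def run_matrix(scenarios: List[Scenario], inject_and_observe: Observer) -> List[Result]:
    """Run every scenario and compare the observed takeover with the expectation."""
    results = []
    for sc in scenarios:
        obs = inject_and_observe(sc.name)
        took_over = obs["primary_before"] != obs["primary_after"]
        results.append(Result(
            scenario=sc.name,
            passed=(took_over == sc.expect_takeover),
            detail=f"primary {obs['primary_before']} -> {obs['primary_after']}",
        ))
    return results


# Stand-in for a real test cluster: faults on the primary cause a takeover.
def fake_cluster(fault: str) -> Dict[str, str]:
    after = "node2" if fault in ("network failure", "disk failure") else "node1"
    return {"primary_before": "node1", "primary_after": after}


SCENARIOS = [
    Scenario("network failure", expect_takeover=True),
    Scenario("disk failure", expect_takeover=True),
    Scenario("no fault", expect_takeover=False),
]

for r in run_matrix(SCENARIOS, fake_cluster):
    print(f"{r.scenario}: {'PASS' if r.passed else 'FAIL'} ({r.detail})")
```

Keeping the expectation next to each scenario means a changed cluster behavior surfaces as a failed row rather than a silent difference. Whether a given fault, such as a disk failure, should trigger a takeover depends on how the cluster is configured, so the expected outcomes would need to match the configuration under test.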

Evolving SAP Reference Architectures for Azure 

Years of collaboration with SUSE have produced tangible results, and the SAP on Azure reference architectures are a prime example of the partnership. Over the years, we have released several reference architectures covering different business scenarios and customer requirements. 

Key advancements in the reference architectures include: 

  1. Increased Uptime: Whenever an update to the reference architecture is released, new features from SUSE and Azure are adopted. For example, when Azure started supporting availability zones, the reference architectures were updated to support availability zones. Support for virtual machine scale sets is another example. The SUSE HANA check server hook (susChkSrv.py) detects HANA index server crashes and initiates a failover, which enables faster recovery than waiting for the crashed index server to restart. 
  2. Lowering Total Cost of Ownership: The reference architectures also aim to lower total cost of ownership, giving customers a flexible option to balance cost against high availability. The multi-SID reference architecture is one example: customers asked to consolidate multiple SAP systems in a single ASCS cluster in non-production environments to reduce cost and complexity. 
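The check server hook is enabled in HANA's `global.ini` as an HA/DR provider section. The Python sketch below renders such a section with the standard `configparser` module. The option names and values reflect our understanding of the susChkSrv.py documentation (`action_on_lost` accepts `ignore`, `stop`, `kill` or `fence`), and the default `path` assumes the SAPHanaSR-angi package; the classic SAPHanaSR package installs the hook under `/usr/share/SAPHanaSR` instead. The helper name `suschksrv_section` is hypothetical, so check the susChkSrv.py man page for your release before using any of these values.

```python
import configparser
import io

# Values accepted by action_on_lost, per our reading of the susChkSrv.py docs.
ALLOWED_ACTIONS = {"ignore", "stop", "kill", "fence"}


def suschksrv_section(action_on_lost: str = "fence",
                      path: str = "/usr/share/SAPHanaSR-angi") -> str:
    """Render a global.ini HA/DR provider section for the susChkSrv hook (hypothetical helper)."""
    if action_on_lost not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action_on_lost: {action_on_lost!r}")
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names exactly as written
    cfg["ha_dr_provider_suschksrv"] = {
        "provider": "susChkSrv",
        "path": path,
        "execution_order": "3",
        # "fence" lets a lost index server trigger fencing, and thus a takeover,
        # instead of waiting for a local restart.
        "action_on_lost": action_on_lost,
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(suschksrv_section())
```

Generating the stanza from one function keeps the values consistent across both HANA sites; in practice the section is added to `global.ini` on each site and HANA must reload its HA/DR providers.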
     

For SAP ASCS clusters, the reference architectures have evolved over the years. The first used ASCS/ERS with an NFS cluster based on DRBD, dating from before availability zones and native storage capabilities were available in Azure. Once Azure started offering native storage with Azure NetApp Files (ANF), a reference architecture followed that supports NFS shares on ANF, a high-performance, low-latency managed service. Another architecture uses NFS on Azure Files, a cost-effective option with higher resiliency because the shares are zone-redundant. The multi-SID guide followed when customers asked for a reference architecture that supports running multiple SAP systems in a single cluster. Today's recommendation for an ASCS cluster is to use Azure native storage, such as NFS on Azure Files, together with the simple mount configuration, which makes the overall cluster setup less complex. 
 
The HANA reference architectures evolved much like the ASCS ones, offering several deployment patterns so customers can choose the right balance between availability and cost. They started with a simple scale-up HANA system replication (HSR) option with Pacemaker, with individual virtual machines using local disks and simple mount. For customers who wanted additional monitoring of their NFS-mounted file systems, there is scale-up HSR with NFS-mounted shares, where the HANA data, log and shared volumes are hosted on Azure NetApp Files and Pacemaker additionally monitors the NFS mounts for failures. For larger HANA implementations that require scale-out, there are two solutions. The first is scale-out with standby nodes (N+m), which uses shared storage and keeps a standby node ready to take over. The second, for customers requiring higher availability, is scale-out HSR with Pacemaker, where the scale-out nodes are deployed across two zones and Pacemaker handles failover and monitors the NFS shares. 

Introducing SAPHanaSR-angi: The Next Generation of HANA Resource Agents 

SAPHanaSR-angi (Advanced Next Generation Interface) is a new set of resource agents for HANA system replication, designed to carry SUSE's SAP HANA high availability solution through the next decade. It was developed based on customer feedback and needs (scenarios, usability, troubleshooting), engineering considerations (maintenance, QA automation, future readiness) and the workload needs of HANA (features, required interfaces, runtimes, road map). A single package contains the resource agents for both scale-up and scale-out. It is available starting with SLES 15 SP5, with SP4 and SP6 also supported, and existing clusters can migrate to the new resource agents without downtime. SAPHanaSR-angi will be the default for SLES 16 and later, and new features will be implemented in this new set of resource agents. 

Benefits of SAPHanaSR-angi include: 

  1. Improved Outage Handling: Better handling of short outages, especially of the HANA tools and of communication with the Pacemaker back end. 
  2. Enhanced Resilience: More resilient against admin mistakes during maintenance procedures. 
  3. Faster Takeovers: Faster takeover in case of file system failures. In scale-out setups, faster takeover in several cases such as failing index servers, file system failures or node failures. Finally, faster takeover in scale-up setups if SAP HANA is unresponsive (this feature is experimental, and feedback is welcome). 
  4. Reduced Complexity: Identical software packages and similar configurations for scale-up and scale-out. 

SAPHanaSR-angi has two modes: 

  1. Conservative Configuration: The cluster is patient, prefers stopping over fencing the nodes, and does not react to filesystem failures. 
  2. Progressive Configuration: On failure, the affected node, or all affected nodes of the affected site, is fenced. This allows takeover time to be optimized further. 

We recommend evaluating both configurations carefully and making an informed decision about how to set them up in your landscapes. 
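To contrast the two modes, here is a toy Python model of the policies exactly as described above. It is a hypothetical illustration, not the SAPHanaSR-angi agents' actual logic: the failure labels and the `cluster_reaction` function are invented for this sketch, and the real behavior is set through resource-agent parameters documented in the SAPHanaSR-angi man pages.

```python
from enum import Enum
from typing import List, Tuple


class Mode(Enum):
    CONSERVATIVE = "conservative"
    PROGRESSIVE = "progressive"


def cluster_reaction(mode: Mode, failure: str,
                     affected_nodes: List[str]) -> Tuple[str, List[str]]:
    """Return (action, nodes) for a failure on one site.

    Toy model only: encodes the two policies as described in the text,
    not the real resource-agent implementation.
    """
    if mode is Mode.CONSERVATIVE:
        # Conservative: filesystem failures are not acted on at all.
        if failure == "filesystem":
            return ("none", [])
        # Otherwise the cluster is patient and prefers stopping to fencing.
        return ("stop", list(affected_nodes))
    # Progressive: fence every affected node of the affected site,
    # which is what allows a faster takeover.
    return ("fence", list(affected_nodes))


site_a = ["hana-a1", "hana-a2"]  # hypothetical scale-out site
print(cluster_reaction(Mode.CONSERVATIVE, "filesystem", site_a))  # ('none', [])
print(cluster_reaction(Mode.PROGRESSIVE, "filesystem", site_a))   # ('fence', ['hana-a1', 'hana-a2'])
```

The trade-off is visible in the two printed lines: the conservative mode avoids fencing at the cost of not reacting to a broken filesystem, while the progressive mode reacts immediately by fencing the whole affected site.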

Proactive HA with Azure Scheduled Events 

Azure Scheduled Events is a service that lets virtual machines query the Azure Instance Metadata Service for early warnings about upcoming events, such as planned maintenance or impending hardware issues. In this setup, the azure-events resource agent is integrated with the Pacemaker cluster to continuously monitor these notifications. When a scheduled event is detected, the agent can take proactive steps such as safely draining workloads from the active node, placing it into maintenance mode, and transferring services to a secondary node. Acting before the event affects the system reduces the risk of service disruptions. Several enhancements have been made to the agent and its deployment process to improve the efficiency and reliability of API communication. 
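The polling side of this can be sketched in a few lines of Python. The endpoint `http://169.254.169.254/metadata/scheduledevents`, the `api-version=2020-07-01` parameter, the required `Metadata: true` header and the `StartRequests` approval body come from the public Scheduled Events documentation. The helper names, the set of event types treated as disruptive, and the assumption that Pacemaker node names match Azure VM names are our own; this is not the azure-events agent's code, and the sample document is made up.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable, List

URL = "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"

# Assumption: event types we treat as disruptive enough to move resources away.
DISRUPTIVE = {"Reboot", "Redeploy", "Preempt", "Terminate"}


def fetch_events(timeout: float = 5.0) -> Dict[str, Any]:
    """Query the metadata endpoint (only works from inside an Azure VM)."""
    req = urllib.request.Request(URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def events_for_node(doc: Dict[str, Any], node: str,
                    disruptive=DISRUPTIVE) -> List[Dict[str, Any]]:
    """Scheduled, disruptive events that list this VM in Resources."""
    return [
        e for e in doc.get("Events", [])
        if node in e.get("Resources", [])
        and e.get("EventStatus") == "Scheduled"
        and e.get("EventType") in disruptive
    ]


def start_request_body(event_ids: Iterable[str]) -> bytes:
    """Body for POSTing to URL to acknowledge events so they can start early."""
    return json.dumps({"StartRequests": [{"EventId": i} for i in event_ids]}).encode()


# Made-up sample document in the shape returned by the endpoint.
SAMPLE = {
    "DocumentIncarnation": 3,
    "Events": [
        {"EventId": "A123", "EventType": "Reboot", "ResourceType": "VirtualMachine",
         "Resources": ["hana-vm1"], "EventStatus": "Scheduled",
         "NotBefore": "Mon, 03 Nov 2025 10:00:00 GMT"},
        {"EventId": "B456", "EventType": "Freeze", "ResourceType": "VirtualMachine",
         "Resources": ["hana-vm1"], "EventStatus": "Scheduled",
         "NotBefore": "Mon, 03 Nov 2025 10:05:00 GMT"},
        {"EventId": "C789", "EventType": "Redeploy", "ResourceType": "VirtualMachine",
         "Resources": ["hana-vm2"], "EventStatus": "Started", "NotBefore": ""},
    ],
}

for event in events_for_node(SAMPLE, "hana-vm1"):
    print(event["EventId"], event["EventType"], event["NotBefore"])
```

In a cluster, a node that finds a matching event would move its resources to the peer first and only then, optionally, acknowledge the event; filtering on `EventStatus == "Scheduled"` ensures that only events which have not yet started are acted on.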

Conclusion

The continuous collaboration between SUSE and Microsoft on high availability innovations for SAP on Azure demonstrates a shared commitment to empowering your enterprise. By leveraging advanced resource agents like SAPHanaSR-angi and intelligent features like Azure Scheduled Events, you gain robust, reliable and highly available SAP environments. Our proven solutions help ensure that your mission-critical applications run reliably on Azure. 

For a deeper dive into these innovations, watch the full session: Unlocking Synergies: SUSE & Microsoft High Availability Innovations. 

We invite you to visit our dedicated SUSE/Microsoft webpage.

We would also be happy to discuss your specific SAP on Azure needs at Microsoft Ignite 2025 in San Francisco. Come meet us in person!

Tobias Kutning - Senior Manager Product Management for SAP Solutions