This blog was written based on the SUSECON 2019 presentation given by Stephen Mogg, Technical Strategist for SAP and Public Cloud, and Mark Gonnelly, Senior Consultant for SUSE Consulting.
Microsoft Azure is one of the best hyper-scale, enterprise-grade, hybrid cloud platforms available. In fact, it’s the only global public cloud service provider to offer SLAs. But where do you turn when your business needs a higher SLA?
When High Availability Matters
SUSE High Availability Extension is the perfect companion for keeping your business-critical systems up and running. Azure provides superior reliability and security for cloud computing. SUSE High Availability adds open source high availability clustering technology on top of that, giving you an extra layer of protection against downtime. Clustering helps safeguard your workloads from system failures and increases service availability through greater reliability, redundancy or fast failover to standby systems.
What’s the Big Deal About Clustering?
Clustering can seem like a very complex and daunting system, and to a degree, it is. The SUSE High Availability documentation is full of intricate diagrams with a lot of moving parts. But when you break it down, there are really only two core parts: Corosync and Pacemaker.
Corosync is the cluster membership and communication layer. It’s the piece the nodes use to talk to each other and confirm that they’re all still up and running.
Pacemaker sits on top of Corosync and acts as the resource manager. It’s the piece that does all the work. It continuously monitors the system and manages dependencies between resources. Through a set of scripts, it also automatically starts, stops and migrates services based on the rules and policies you have configured.
Like most managers, Pacemaker has a group of employees, if you will. These “employees” are called “Resource Agents” (RAs). Each RA knows how to start, stop and monitor one type of resource and reports that resource’s state back to Pacemaker. That is how Pacemaker knows when to stop, start and/or migrate the resource. In other words, resource agents provide the “intelligence” behind Pacemaker’s decisions.
Next we have fencing. Why do we need it? To the rest of the cluster, losing a peer node looks exactly the same as losing communication with that node. There is a big difference between a node that is down because it is physically broken and a node that only appears down because of a network failure. Yet the surviving nodes cannot tell the two apart. Fencing comes in whenever the state of a node or resource cannot be established with certainty. It isolates or powers off the node. That way, even when the cluster does not know what is happening on a given node, it can be sure that node is not running any important resources. Fencing is about moving from an UNKNOWN state to a KNOWN state.
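To make the membership idea concrete, here is a toy Python model of how one node might decide which peers are still part of the cluster. It is a simplified heartbeat-with-timeout sketch, not how Corosync actually works (Corosync uses the Totem protocol, which passes a token around the ring). `MembershipView`, `heartbeat` and `members` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MembershipView:
    """Toy model of cluster membership as seen from a single node.

    Hypothetical sketch: real Corosync uses the Totem protocol with a
    circulating token, not simple per-peer heartbeat timestamps.
    """

    timeout: float  # seconds of silence before a peer is dropped
    last_heard: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, peer: str, now: float) -> None:
        # Record that a message from `peer` arrived at time `now`.
        self.last_heard[peer] = now

    def members(self, now: float) -> set[str]:
        # A peer counts as a member only if it was heard from recently enough.
        return {p for p, t in self.last_heard.items() if now - t <= self.timeout}


view = MembershipView(timeout=5.0)
view.heartbeat("node1", now=0.0)
view.heartbeat("node2", now=0.0)
view.heartbeat("node1", now=4.0)
print(sorted(view.members(now=7.0)))  # ['node1']
```

Note that from node1's point of view, node2 dropping out of the view says nothing about *why* it stopped answering. That gap is exactly what fencing addresses.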
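Resource agents usually follow the OCF (Open Cluster Framework) resource agent API. The cluster calls the agent with an action such as `start`, `stop` or `monitor` and reads the result from the exit code, while parameters arrive as `OCF_RESKEY_<name>` environment variables. The sketch below mimics that contract in Python for a dummy resource whose “running” state is just a marker file. The `statefile` parameter and the default file path are assumptions for illustration. Real agents are typically shell scripts and must also implement actions such as `meta-data` and `validate-all`.

```python
import os
import tempfile

# Standard OCF exit codes (subset) from the OCF resource agent API.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7

# Hypothetical parameter: OCF agents receive parameters as OCF_RESKEY_<name>
# environment variables; "statefile" is invented for this sketch.
STATE_FILE = os.environ.get(
    "OCF_RESKEY_statefile", os.path.join(tempfile.gettempdir(), "dummy-ra.state")
)


def start() -> int:
    # Starting an already running resource must still succeed (idempotent).
    try:
        with open(STATE_FILE, "w") as f:
            f.write("running\n")
    except OSError:
        return OCF_ERR_GENERIC
    return OCF_SUCCESS


def stop() -> int:
    # Stopping an already stopped resource must also succeed.
    try:
        os.remove(STATE_FILE)
    except FileNotFoundError:
        pass
    except OSError:
        return OCF_ERR_GENERIC
    return OCF_SUCCESS


def monitor() -> int:
    # Report status: OCF_SUCCESS if running, OCF_NOT_RUNNING if cleanly stopped.
    return OCF_SUCCESS if os.path.exists(STATE_FILE) else OCF_NOT_RUNNING


ACTIONS = {"start": start, "stop": stop, "monitor": monitor}


def main(argv: list[str]) -> int:
    # argv[1] is the action name, as the cluster would pass it.
    action = ACTIONS.get(argv[1]) if len(argv) > 1 else None
    return action() if action else OCF_ERR_UNIMPLEMENTED


# Installed as an agent, the script would end with: sys.exit(main(sys.argv))
```

Idempotent `start`/`stop` and an accurate `monitor` are what let Pacemaker safely retry actions and decide when to recover or migrate a resource.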
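The UNKNOWN-to-KNOWN idea can be expressed as a tiny decision function. This is a conceptual sketch, not Pacemaker code. `fence` is a hypothetical callable standing in for a real fencing mechanism (such as SBD or a cloud API call). The only point the sketch makes is that a lost node’s resources are recovered elsewhere only after fencing has confirmed the node is off.

```python
from enum import Enum
from typing import Callable


class NodeState(Enum):
    ONLINE = "online"
    UNKNOWN = "unknown"  # contact lost: crashed, or merely unreachable?
    FENCED = "fenced"    # confirmed powered off or isolated


def handle_lost_contact(node: str, fence: Callable[[str], bool]) -> tuple[NodeState, bool]:
    """Return the node's new state and whether its resources may move elsewhere.

    `fence` is hypothetical: it returns True only when the node is confirmed off.
    """
    state = NodeState.UNKNOWN
    if fence(node):
        state = NodeState.FENCED
    # Only a KNOWN state makes recovery safe. Recovering while the node is
    # UNKNOWN risks two nodes running the same resource (split brain).
    return state, state is NodeState.FENCED


print(handle_lost_contact("node2", lambda n: True))   # (<NodeState.FENCED: 'fenced'>, True)
print(handle_lost_contact("node2", lambda n: False))  # (<NodeState.UNKNOWN: 'unknown'>, False)
```

If fencing fails, the sketch deliberately leaves the resources where they are. Waiting is safer than guessing.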
Implementing High Availability in the Cloud
Now that we have discussed the basic components of SUSE High Availability, we need to talk about how to implement that technology inside Azure. The first thing to note is that whether you get clustering depends on how you purchase SUSE in Azure. You can buy it either via the Azure Marketplace or as “bring your own subscription” (BYOS).
There are two products to consider: the standard SUSE Linux Enterprise Server and SUSE Linux Enterprise Server for SAP. If you buy SUSE Linux Enterprise Server for SAP, the High Availability Extension is included. If you buy the standard SUSE Linux Enterprise Server through the Azure Marketplace, you cannot add anything on to it. That is, you can’t get the High Availability Extension to protect your applications, so you’ll have to use the BYOS model instead. Keep that in mind if you want to provide high availability for some of your applications.
Additionally, clustering in a public cloud like Azure comes with certain technical considerations. It’s not impossible, but there are differences. Fencing is one such example. The most popular form of fencing is STONITH Block Device (SBD), which traditionally relies on a shared disk. However, most cloud providers don’t allow a raw block device to be shared between multiple VMs. Likewise, with shared storage (NFS/SMB), you might or might not get NFS, depending on your public cloud provider. So, compared with on-premises clustering, there are a few things to work around in the cloud. There are Corosync configuration changes to make, as well as fencing roles and permissions to tweak.
The More You Learn…
For more detailed technical information, as well as best practices, links to SUSE and Microsoft resources and a demo of how it all works together, watch the video from SUSECON ’19 below or check out the PDF presentation here.
Join Us in Dublin!
If this has piqued your interest, why not plan to attend the next SUSECON? Registration for SUSECON 2020 is now open! Stay on the cutting edge of what’s happening in open source technology. Register today and attend next year’s conference in Dublin, Ireland.