SAP HANA Cluster – automated OS patching with SUSE Manager and Salt states

October 11, 2022 | By: Meike Chabowski

The following article has been contributed by Bo Jin, Sales Engineer at SUSE and Linux Consultant.

Challenge and motivation

SUSE Linux Enterprise Server for SAP Applications is not just a great product for running SAP workloads. SUSE also provides best practice guides for building reliable SAP HANA System Replication (SR) high availability clusters on top of it.

Customers often struggle with patching SAP systems that run SUSE Linux Enterprise Server in a Pacemaker cluster. The main reason is that the operating system (OS) must be rebooted after a kernel patch has been installed. Even if SUSE Live Patching is in use, you still need to apply all outstanding patches to the OS once in a while, within a scheduled “maintenance window”. During that window, you as an SAP Basis administrator need to run cluster commands to move, stop and start resources.

But what if you could automate the OS patching by using SUSE Manager and Salt states while keeping SAP HANA downtime short?

I have developed several Salt execution and state modules which interact with the Pacemaker cluster configuration and management tools crm, crm_mon and the SAPHanaSR-showAttr command in order to query the cluster status.

These Salt modules will be used in Salt states which in turn enable a fully automated patching process for SAP HANA SR scale-up clusters.

The solution in brief

The SUSE Best Practices guide “SAP HANA System Replication Scale-Up – Performance Optimized Scenario” describes the maintenance of a cluster in considerable detail. Unless they are automated, the steps must be executed manually. The Salt states, modules, runners and reactors that I have developed and that are described here follow the best practice instructions exactly.

Some of the “golden rules” of working with Pacemaker clusters I strictly follow are:

“Never change a cluster if the cluster state is not IDLE”
“Don’t change or configure an SAP HANA master-slave cluster resource if the system replication status is not SOK.”

Based on these rules, the patch workflow has been tested as described below.
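
The two rules can be read as a single gate that every change has to pass. The following is a minimal sketch of that idea, not the code from the actual modules: the function name safe_to_change is hypothetical, and it assumes that the cluster state has already been read as a string such as S_IDLE (the name Pacemaker uses for the idle state of the designated coordinator) and the system replication status as a string such as SOK.

```python
def safe_to_change(cluster_state: str, sr_status: str) -> bool:
    """Return True only if both golden rules are satisfied.

    Hypothetical helper for illustration. Both values are assumed to have
    been queried from the cluster beforehand and are passed in as strings.
    """
    # Rule 1: never change a cluster that is not idle.
    cluster_is_idle = cluster_state.strip().upper() in {"IDLE", "S_IDLE"}
    # Rule 2: never touch the master-slave resource unless replication is SOK.
    replication_ok = sr_status.strip().upper() == "SOK"
    return cluster_is_idle and replication_ok
```

A caller would evaluate this gate before each step that modifies the cluster and stop the workflow as soon as it returns False.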

The patching workflow

The following section explains the patching workflow at a glance. For a two-node SAP HANA scale-up cluster, the stage “Patch diskless node” is not needed, and you can continue directly with the primary node.

Stage 1: Patch secondary site

Apply the Salt state to all member nodes of the cluster:

# salt "hana-*" state.apply myhana

The Salt module will detect the node roles as primary and secondary – and diskless_node in case of a three-node cluster.
Start with the secondary node.
The SAP HANA SR scale-up cluster master-slave resource will be set into maintenance mode.
The secondary node will be patched and rebooted.
After the secondary node has been restarted, Pacemaker will be started.
The master-slave resource will be activated (unset maintenance mode).

Stage 2: Patch diskless node (optional)

Start patching the diskless_node in case of a diskless setup.
diskless_node will be rebooted after patching.

Stage 3: Patch primary site

Re-discover the node roles as primary, secondary and diskless_node in case of a three-node cluster.
Execute Salt states on the primary node.
Move the master-slave resource to the other node which is secondary at the moment.
The SAP HANA SR scale-up cluster master-slave resource will be set into maintenance mode.
The old primary node will be patched and rebooted.
After the old primary node has been restarted, Pacemaker will be started.
Clear the Pacemaker cli-ban location constraint so that this node can be used again as the new secondary site.
The master-slave resource will be activated (unset maintenance mode).
The old primary has become the new secondary.
Now you are finished 😀.
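
The order of the three stages can be summarized as a simple plan. The sketch below only illustrates the sequence described above and is not how the Salt states are implemented. The function build_plan and the step labels are hypothetical names chosen for readability, and the roles are assumed to be passed in as a dictionary.

```python
from typing import Dict, List, Optional, Tuple

# Step labels are illustrative only; they mirror the stages described above.
SECONDARY_STEPS = [
    "set_msl_maintenance",
    "patch_and_reboot",
    "start_pacemaker",
    "unset_msl_maintenance",
]

PRIMARY_STEPS = [
    "move_msl_resource",      # the current secondary takes over as primary
    "set_msl_maintenance",
    "patch_and_reboot",
    "start_pacemaker",
    "delete_cli_ban_rule",    # allow the node to act as secondary again
    "unset_msl_maintenance",
]


def build_plan(roles: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Return (node, step) pairs in the order the workflow runs them."""
    plan = [(roles["secondary"], step) for step in SECONDARY_STEPS]
    # Stage 2 only exists in a three-node setup with a diskless node.
    if roles.get("diskless_node"):
        plan.append((roles["diskless_node"], "patch_and_reboot"))
    # In the real workflow the roles are re-discovered before stage 3;
    # this sketch reuses the roles it was given.
    plan += [(roles["primary"], step) for step in PRIMARY_STEPS]
    return plan
```

For a two-node cluster with hana-1 as primary and hana-2 as secondary, the plan starts with the maintenance step on hana-2 and ends with the maintenance mode being unset on hana-1.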

The workflow uses Salt reactor and requisite systems.

Requisites: The Salt requisite system is used to create relationships between states. This provides a method to easily define inter-dependencies between states.
Reactors: Salt’s Reactor system allows Salt to trigger actions in response to an event.

Feel free to adjust the reactor and requisites to map the workflow steps to your needs.

High level architecture

Salt modules for SAP HANA System Replication scale-up cluster

My colleagues from SUSE development created a great set of Salt execution modules, called salt-shaptools, that allows us to set up and configure new SAP HANA and NetWeaver clusters. In order to automate the patching of the cluster nodes, I have developed a few additional Salt modules that use crm, crm_mon and SAPHanaSR-showAttr to query the status of the SAP HANA cluster resources and nodes prior to patching the OS.

These execution modules are:

bocrm.check_if_maintenance
bocrm.check_if_nodes_online
bocrm.check_sr_status
bocrm.delete_cli_ban_rule
bocrm.find_cluster_nodes
bocrm.get_dc
bocrm.get_msl_resource_info
bocrm.if_cluster_state_idle
bocrm.is_cluster_idle
bocrm.is_quorum
bocrm.move_msl_resource
bocrm.off_msl_maintenance
bocrm.pacemaker
bocrm.patch_diskless_node
bocrm.set_msl_maintenance
bocrm.set_off_msl_maintenance
bocrm.set_on_msl_maintenance
bocrm.start_pacemaker
bocrm.stop_pacemaker
bocrm.sync_status
bocrm.wait_for_cluster_idle
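
To give an idea of what waiting for an idle cluster involves, here is a minimal polling sketch. It is not the implementation of bocrm.wait_for_cluster_idle. The function wait_until_idle and its parameters are hypothetical, and the callable get_state is assumed to return the current cluster state as a string (for example S_IDLE), however that state is obtained on the node.

```python
import time
from typing import Callable


def wait_until_idle(
    get_state: Callable[[], str],
    timeout: float = 600.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_state() until it reports S_IDLE or the timeout expires.

    Returns True if the cluster became idle in time, otherwise False.
    """
    deadline = clock() + timeout
    while True:
        if get_state().strip() == "S_IDLE":
            return True
        if clock() >= deadline:
            return False
        # Wait before asking again so the cluster is not flooded with queries.
        sleep(interval)
```

Passing sleep and clock as parameters keeps the loop easy to test without real waiting. In production use, the defaults from the time module apply.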

SUSE Manager in action

In order to create patch and reboot jobs, I also created Salt runner module scripts which call the SUSE Manager API. The main advantage of scheduling these jobs through SUSE Manager, instead of running the commands directly with the cmd.run state function, is that the patch jobs are recorded, which helps with audit and compliance requirements. These runner modules are:

checkjob_status.py
patch_hana.py
reboot_host.py
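
As a rough illustration of what such a runner does, the sketch below schedules patch and reboot actions through the XML-RPC API of SUSE Manager. It is not the code of the runner modules listed above. The function names are hypothetical, the client is assumed to be an xmlrpc.client.ServerProxy pointing at the API endpoint of your SUSE Manager server, and the session key is assumed to come from a previous auth.login call.

```python
from datetime import datetime
from typing import Optional


def schedule_patches(client, session_key: str, system_id: int,
                     when: Optional[datetime] = None):
    """Schedule all relevant patches for one system.

    Returns the result of system.scheduleApplyErrata, or None if the
    system has no relevant patches.
    """
    when = when or datetime.now()
    errata = client.system.getRelevantErrata(session_key, system_id)
    errata_ids = [erratum["id"] for erratum in errata]
    if not errata_ids:
        return None
    return client.system.scheduleApplyErrata(
        session_key, system_id, errata_ids, when
    )


def schedule_reboot(client, session_key: str, system_id: int,
                    when: Optional[datetime] = None):
    """Schedule a reboot; meant to be called once the patch job is done."""
    when = when or datetime.now()
    return client.system.scheduleReboot(session_key, system_id, when)


# Possible usage (server name and credentials are placeholders):
#   from xmlrpc.client import ServerProxy
#   client = ServerProxy("https://susemanager.example.com/rpc/api")
#   key = client.auth.login("apiuser", "secret")
#   schedule_patches(client, key, 1000010000)
#   client.auth.logout(key)
```

Because the actions are created on the SUSE Manager server, they show up in its action history, which is the record keeping referred to above.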

More information

More detailed information about the SaltStack configurations, modules and states which I have created for a fully automated patching of SAP HANA Database Scale-up clusters can be found in my GitHub repository at https://github.com/bjin01/salt-sap-patching which is licensed under GPL v3.0. Long live Salt, SUSE Manager and Pacemaker 😀 !

Meike Chabowski Meike Chabowski works as Documentation Strategist at SUSE. Before joining the SUSE Documentation team, she was Product Marketing Manager for Enterprise Linux Servers at SUSE, with a focus on Linux for Mainframes, Linux in Retail, and High Performance Computing. Prior to joining SUSE more than 20 years ago, Meike held marketing positions with several IT companies like defacto and Siemens, and was working as Assistant Professor for Mass Media. Meike holds a Master of Arts in Science of Mass Media and Theatre, as well as a Master of Arts in Education from University of Erlangen-Nuremberg/ Germany, and in Italian Literature and Language from University of Parma/Italy.