Solution: Configure Amazon CloudWatch to Alert During an SAP HANA SUSE Cluster Takeover Event
Running workloads in the public cloud is something that customers expect from SUSE, and at SUSE we take it a step further: we strive to ensure that our products integrate with the public cloud platforms that customers choose. In this blog, I describe how to create an alert when a SUSE scale-up HANA cluster node is promoted to primary during a takeover event. Before I describe how to set up the alert, I give a brief background on SUSE and AWS’s experience in running SAP workloads for customers. Links to references are included at the bottom of the blog so that you can learn more about the topic.
SAP on AWS
Since 2015, SUSE Linux Enterprise Server for SAP Applications has been the operating system of choice for Amazon Web Services (AWS) customers running SAP applications. It was the first SAP-certified operating system for SAP HANA on AWS. Currently, more than 5,000 customers have deployed SAP on AWS.
Many AWS and SUSE customers have taken advantage of the services offered by AWS. With regard to SAP on AWS, a large number of customers use “Quick Start: SAP HANA on AWS”, which automates the deployment of a SUSE scale-up cluster. When SUSE Linux Enterprise Server for SAP Applications is chosen as the operating system, the Quick Start follows the best practices that AWS and SUSE have published. Whichever method customers choose to deploy their SAP HANA SUSE scale-up cluster, I always recommend that they read the white paper “SUSE Linux Enterprise Server for SAP Applications 12 SP3 for the AWS Cloud Set-Up Guide”, which provides detailed information on the settings and cluster agents used to run a SUSE SAP HANA scale-up cluster on AWS.
Once your environment is properly set up and tested (please see the test case scenarios in the previously mentioned white paper), an important next step before the system is placed into production is to configure alerts that notify the appropriate teams when a cluster takeover event occurs.
AWS Services: CloudTrail, Simple Notification Service and CloudWatch
The SUSE scale-up cluster uses two specific SUSE cluster agents developed for AWS: the aws-vpc-move-ip resource agent and the ec2 STONITH agent. AWS does not offer a virtual floating IP address; instead, the cluster uses an AWS Overlay IP address. The AWS Overlay IP address is a specific route table entry that sends network traffic to the primary cluster node. The route is controlled by the aws-vpc-move-ip resource agent, which ensures that the primary cluster node is the target of the route for SAP HANA traffic.
During a cluster takeover event, the SUSE aws-vpc-move-ip resource agent changes the AWS VPC routing table so that the route points to the newly promoted primary cluster node. This action makes the event ideal for event-driven alerts. To learn more about how the SUSE SAP HANA scale-up cluster works on AWS, please read the white paper “SUSE Linux Enterprise Server for SAP Applications 12 SP3 for the AWS Cloud Set-Up Guide”.
To send a notification based on the route table change, you will need three core AWS services, which can be configured using the API or the AWS Management Console:
- “CloudTrail provides event history of your AWS account activity, including actions taken through the AWS Management Console, AWS SDKs, command line tools, and other AWS services.”
- Amazon Simple Notification Service (SNS) is the alert push notification service used to deliver alerts.
- Amazon CloudWatch is a monitoring service that will perform an action based on an event.
As I stated previously, the SUSE aws-vpc-move-ip resource agent changes the AWS VPC routing table to direct traffic destined for the primary server to the newly promoted cluster node when a takeover/failover occurs. The resource agent performs the route change operation using an AWS API call. CloudTrail is the service that captures the routing change made by the SUSE aws-vpc-move-ip resource agent via the AWS API. The event details that we will alert on are as follows: “eventSource”: “ec2.amazonaws.com” and “eventName”: “ReplaceRoute”. CloudTrail is enabled by default on your AWS account.
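To make the two event details concrete, here is a minimal Python sketch that checks whether a CloudTrail-style record is the route change we care about. The function name and the sample records are made up for illustration and trimmed to the two fields named above; a real CloudTrail record carries many more attributes.

```python
def is_takeover_route_change(record):
    """Return True when a CloudTrail-style record is an EC2 ReplaceRoute call."""
    return (
        record.get("eventSource") == "ec2.amazonaws.com"
        and record.get("eventName") == "ReplaceRoute"
    )


# Made-up, trimmed-down records for illustration only.
sample_records = [
    {"eventSource": "ec2.amazonaws.com", "eventName": "ReplaceRoute"},
    {"eventSource": "ec2.amazonaws.com", "eventName": "DescribeRouteTables"},
    {"eventSource": "sns.amazonaws.com", "eventName": "Publish"},
]

# Keep only the records that look like a takeover route change.
matches = [r for r in sample_records if is_takeover_route_change(r)]
print(len(matches))
```

Only the first sample record satisfies both conditions, which is the same filter the CloudWatch rule applies later in this blog.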
The next step is to configure SNS. You will need to create a Topic, which SNS uses as a way to group notifications. I used HANA-alert as the “Topic Name” and “Display Name”, which allows me to easily identify which alerts I am pushing to the SNS topic. After that, create subscriptions. Subscriptions define how the alerts are delivered to the configured recipients. For my account, I configured email and SMS (text) as the notification methods for my email address and phone number. Follow “Getting Started with Amazon SNS” as a walk-through guide on setting up SNS topics and subscriptions.
At this point the event, ReplaceRoute, is being captured by CloudTrail and we have configured the notification method using SNS. CloudWatch is the next service that needs to be set up. It is responsible for creating the alert for the API event, ReplaceRoute, and publishing the alert to SNS to send the notification.
To configure CloudWatch, select “Create rule” under the CloudWatch Events menu selection in the console. Below is a screenshot of the rule that I created. Since you are matching the alert based on an API action, select “Event Pattern”. The “Service Name” is EC2, since it is the service that generates the ReplaceRoute event. The “Event Type” is “AWS API Call via CloudTrail” and the specific operation is ReplaceRoute.
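For reference, the console selections above should produce an event pattern along the lines of the following sketch, written here as a Python dictionary and printed as JSON. Treat it as an approximation and compare it with the pattern preview that the console shows for your rule.

```python
import json

# Event pattern matching EC2 ReplaceRoute API calls recorded by CloudTrail.
event_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ec2.amazonaws.com"],
        "eventName": ["ReplaceRoute"],
    },
}

# Pretty-print the pattern so it can be compared with the console preview.
pattern_json = json.dumps(event_pattern, indent=2)
print(pattern_json)
```

Each value is a list because a pattern can accept several alternatives per field; here every list has a single entry.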
Now that we have configured CloudWatch to alert on the ReplaceRoute event, the next part of the event rule configuration is the “Targets”. Essentially, we are formatting the alert and telling CloudWatch what to do with it. In our case, select “SNS topic” as the target and select the topic that was created for the notifications. “Amazon CloudWatch Events are represented as JSON objects”. I selected the “Input Transformer” option, which allows you to configure the output based on the details provided by the event.
A couple of helpful tips:
- I recommend creating the event rule in CloudWatch and sending the notification using “Matched Event” for the first time. The alert will contain the entire event in JSON. There will be ~30 event attributes that can then be used to create a custom message in the “Input Transformer” section.
- After selecting the “Input Transformer”, create the “Input Path” without any white space. The “Input Path” example in the console shows white space, but do not include it because it will generate JSON errors.
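The two tips can be tried out locally before touching the console. The sketch below builds an “Input Path” as compact JSON with no white space and then simulates how a template with placeholders would be filled in. The variable names, the message wording, and the sample event values are my own assumptions for illustration; `resolve` and `render` are hypothetical helpers that only mimic the substitution for simple `$.a.b` paths and are not how AWS implements it.

```python
import json

# Hypothetical variable names mapped to paths inside the matched event.
input_paths = {
    "eventName": "$.detail.eventName",
    "eventTime": "$.detail.eventTime",
    "region": "$.detail.awsRegion",
}

# Compact JSON with no white space, as recommended for the "Input Path" field.
input_path_json = json.dumps(input_paths, separators=(",", ":"))

# Hypothetical message template; placeholders use the <name> form.
input_template = '"SAP HANA takeover: <eventName> at <eventTime> in <region>"'


def resolve(path, event):
    """Follow a simple dotted path such as $.detail.eventName."""
    node = event
    for key in path[2:].split("."):  # drop the leading "$."
        node = node[key]
    return node


def render(template, paths, event):
    """Replace each <name> placeholder with the value found at its path."""
    message = template
    for name, path in paths.items():
        message = message.replace("<" + name + ">", str(resolve(path, event)))
    return message


# Made-up event values for illustration only.
sample_event = {
    "detail": {
        "eventName": "ReplaceRoute",
        "eventTime": "2019-01-01T00:00:00Z",
        "awsRegion": "us-east-1",
    }
}

print(input_path_json)
print(render(input_template, input_paths, sample_event))
```

Starting from a “Matched Event” notification, as suggested in the first tip, tells you which attributes actually exist in your event before you reference them in the paths.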
The AWS API integration of the aws-vpc-move-ip resource agent and the ec2 fencing agent allows the SUSE SAP HANA scale-up cluster to leverage native AWS services. This blog focuses on configuring alerts when the ReplaceRoute event occurs. Additionally, a CloudWatch event can be created when the ec2 fencing agent initiates a shutdown on a problematic node. I am working on an upcoming blog that describes the cluster behavior during a takeover when a cluster is running on AWS.