Protecting SAP HANA workloads from memory reclamation on large memory systems
This document (000021841) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server for SAP Applications 15 SP3
SUSE Linux Enterprise Server for SAP Applications 15 SP4
SUSE Linux Enterprise Server for SAP Applications 15 SP5
SUSE Linux Enterprise Server for SAP Applications 15 SP6
Situation
SAP HANA systems with substantial memory can experience sudden CPU spikes, leading to performance degradation and potential failovers in clustered environments. Available memory often appears depleted at the same time: most of it is occupied by page cache, while the active/inactive memory of SAP HANA itself accounts for only a fraction of the total memory.
Resolution
To mitigate this, SAP workload memory can be protected against external memory demands using the cgroup2 memory controller. SUSE simplified this configuration with the introduction of Workload Memory Protection (WMP) in SLES 15, as detailed in the SUSE documentation:
https://documentation.suse.com/sles-sap/15-SP6/html/SLES-SAP-guide/cha-memory-protection.html
The memory.low protection value should ideally match the working set size of the protected SAP workload. This protection does not reserve memory exclusively; it makes reclaiming that memory less preferred. The documentation cited above may suggest using the Global Allocation Limit (GAL) as the memory.low value on large memory systems, but on smaller systems that setting may leave too little memory for the OS and vendor tools. In such cases, follow the recommendation in the SUSE planning guide to keep a minimum of 512 MB of RAM per CPU core for the operating system.
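As a rough illustration of the sizing consideration above, the following Python sketch caps a candidate memory.low value so that at least 512 MiB per CPU core stays outside the protection. Taking the smaller of the GAL and that cap is an assumption made for this example, not a formula from the SUSE documentation. The function name and the sample figures are hypothetical.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def suggest_memory_low(total_ram_bytes, cpu_cores, gal_bytes,
                       os_reserve_per_core=512 * MIB):
    """Return a candidate memory.low value in bytes.

    Assumption for this sketch: protect the smaller of the GAL and
    whatever remains after reserving 512 MiB per core for the OS.
    """
    os_reserve = cpu_cores * os_reserve_per_core
    headroom = max(total_ram_bytes - os_reserve, 0)
    return min(gal_bytes, headroom)


if __name__ == "__main__":
    # Hypothetical small system: 256 GiB RAM, 64 cores, GAL of 240 GiB.
    value = suggest_memory_low(256 * GIB, 64, 240 * GIB)
    print(value // GIB, "GiB")  # prints "224 GiB"
```

In this example the OS reserve is 32 GiB, so the candidate value is limited to 224 GiB even though the GAL is higher. The result is only a starting point and should be checked against the actual working set size of the workload.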
Cause
When external processes (e.g., backups) request significant memory, the kernel needs to reclaim memory pages. On systems with terabytes of memory, this process can consume considerable CPU resources. In severe cases, excessive memory reclamation can lead to "thrashing" of the main workload, where actively used memory is repeatedly reclaimed and reloaded, causing system instability.
Additional Information
The goal of protecting SAP workloads from such behavior has existed for a decade. SUSE provided the Page Cache Limit for this purpose from SUSE Linux Enterprise Server for SAP Applications 11 SP1 onwards and in SUSE Linux Enterprise Server for SAP Applications 12.
However, the vm.pagecache_limit_mb setting has been deprecated in favor of the cgroup2 memory controller for several reasons (e.g., the limit constrained both the disposable page cache and the page cache that the workload actually needs).
Technical Background:
The kernel maintains three watermarks per memory zone (or, simplifying somewhat, per NUMA node): min, low, and high.
If free memory drops below the low watermark, the kernel thread kswapd is woken up to reclaim memory until free memory reaches the high watermark, and then sleeps again. Userspace threads continue to get memory immediately until free memory drops to the min watermark.
What does this mean for threaded workloads?
If many threads suddenly demand memory and kswapd cannot reclaim fast enough, free memory falls to the min watermark and direct reclaim sets in: the HANA threads stop executing in userspace and are blocked in the kernel while they reclaim memory themselves.
On a HANA system without swap, this memory is reclaimed from the page cache.
What are the performance implications for large workloads with a large page cache (non-shared memory) if HANA suddenly requests memory, given that HANA requests memory for allocation on a specific NUMA node?
If the HANA policy is to prefer a node rather than strictly bind to it, an allocation may be satisfied from a non-preferred node that has enough free memory instead of reclaiming memory on the preferred node. If the policy is to bind, the allocation can be satisfied from the bound NUMA node only. The performance impact depends on whether the memory requests are spread over time so that kswapd can keep up, or arrive as bursts from multiple threads requesting memory in parallel.
How to monitor reclaim activity?
There is no configurable mechanism for logging reclaim activity. The amount of direct reclaim activity can be observed from /proc/vmstat counters such as pgscan_direct.
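Because these counters are cumulative since boot, reclaim activity shows up as the difference between two samples. The following Python sketch reads /proc/vmstat twice and prints the deltas. Only pgscan_direct is named in this document; the other counter names (pgsteal_direct, pgscan_kswapd, pgsteal_kswapd) are included on the assumption that the running kernel exposes them, and any that are missing are simply skipped. The function names and the one-second interval are hypothetical choices for this example.

```python
import os
import time

COUNTERS = ("pgscan_direct", "pgsteal_direct", "pgscan_kswapd", "pgsteal_kswapd")


def parse_vmstat(text, names=COUNTERS):
    """Extract the selected cumulative counters from /proc/vmstat content."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in names:
            values[fields[0]] = int(fields[1])
    return values


def counter_deltas(before, after):
    """Return the per-counter increase between two samples."""
    return {name: after[name] - before[name] for name in after if name in before}


def sample(path="/proc/vmstat"):
    """Read one sample of the selected counters."""
    with open(path) as f:
        return parse_vmstat(f.read())


if __name__ == "__main__":
    if os.path.exists("/proc/vmstat"):
        first = sample()
        time.sleep(1)  # sampling interval; adjust as needed
        print(counter_deltas(first, sample()))
```

A pgscan_direct delta that keeps growing while the workload stalls suggests that threads are performing direct reclaim rather than being served by kswapd.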
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021841
- Creation Date: 16-May-2025
- Modified Date: 16-May-2025
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com