may become unresponsive, may suffer out-of-memory issues, or may appear
to stall for a while and then resume normal function minutes or many
hours later.
Plus general tuning guidance for
customers running SUSE Linux Enterprise Server for SAP Applications.
General guidelines about using pagecache_limit and optimizing some of
the I/O-related settings:
If, on the server in question,
you are *not* mixing a heavy file I/O workload with
a memory-intensive application workload, then this setting
(pagecache_limit) will probably cause more harm than good. However,
most SAP environments have both high I/O and memory-intensive
workloads.
Ideally, vm.pagecache_limit_mb should be zero
until pagecache is seen to exhaust memory. If it does
exhaust memory, then trial-and-error tuning must be used to find
values that work for the specific server and workload in question.
As regards settings that have both a fixed-value option and a
'ratio' option, keep in mind that ratio settings become less
and less precise as the amount of memory in the server grows.
Therefore, specific 'byte' settings should be used instead of
'ratio' settings. The 'ratio' settings can allow too much
dirty memory to accumulate, which has been proven to lead to
processing stalls during heavy fsync or sync write loads. Setting
dirty_bytes to a reasonable value (which depends on the storage
performance) leads to much less unexpected behavior.
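To illustrate why ratio settings lose precision on large machines, the following sketch computes how much memory a single 1% step of a ratio setting represents. It is a rough approximation only: the function name is illustrative, and it applies the percentage to total RAM, whereas the kernel applies it to available memory.

```shell
#!/bin/sh
# Approximate size, in MiB, of one 1% step of a 'ratio' setting.
# Argument: total RAM in GiB (an approximation; the kernel uses
# available memory rather than total RAM).
ratio_step_mib() {
    ram_gib=$1
    echo $(( ram_gib * 1024 / 100 ))
}

for ram in 16 142 1024; do
    echo "${ram} GiB RAM: 1% is about $(ratio_step_mib "$ram") MiB"
done
```

On a server with 1 TiB of RAM, the smallest possible adjustment of a ratio setting is therefore roughly 10 GiB, which is why byte values give finer control.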
Setting, say, a 4 GB pagecache limit on a 142 GB machine is asking for trouble,
especially when you consider that this would be much smaller than a
default dirty ratio limit (which is by default 40% of available
memory). If the pagecache_limit is used, it should always
be set to a value well above the 'dirty' limit, be it a fixed value
or a percentage.
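The 142G example can be checked with simple arithmetic. The sketch below is only an approximation: the function name is hypothetical, and it applies the dirty ratio to total RAM rather than to available memory. It also only tests for "above", whereas the guidance here is "well above".

```shell
#!/bin/sh
# Succeeds (exit status 0) if the pagecache limit is larger than the
# dirty limit implied by a ratio setting.
# Arguments: RAM in MiB, dirty ratio in percent, pagecache limit in MiB.
pagecache_limit_ok() {
    ram_mib=$1
    dirty_ratio=$2
    limit_mib=$3
    dirty_mib=$(( ram_mib * dirty_ratio / 100 ))
    [ "$limit_mib" -gt "$dirty_mib" ]
}

# 142 GiB of RAM, 40% dirty ratio, 4 GiB pagecache limit.
if pagecache_limit_ok $(( 142 * 1024 )) 40 4096; then
    echo "pagecache limit is above the dirty limit"
else
    echo "pagecache limit is BELOW the dirty limit"
fi
```

With these numbers the dirty limit works out to roughly 57 GiB, far above the 4 GiB pagecache limit.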
There are no universally
'correct' values for these settings. You are always balancing
throughput against sync latency. If the kernel contained code to
auto-tune these settings based on the amount of RAM in the
server, it would be very prone to regressions, because the right
values depend on server-specific load. So, necessarily, it falls to the server
admins to come up with the best values for these settings (via
trial and error).
*If* we know for a fact that the server
does encounter issues with pagecache_limit set to 0 (not active),
then choose a pagecache_limit that is suitable in relation to how
much memory is in the server.
Let's assume that you have a
server with 1 TB of RAM. These are *suggested* values which could be
used as a starting point:
vm.pagecache_limit_mb = 20972 # 20gb - different values could be tried, starting from, say, 20gb
vm.pagecache_limit_ignore_dirty = 1 # see the section below on this variable to decide what it should be set to
vm.dirty_bytes = 629145600 # this could be reduced or increased based on actual hardware performance, but keep vm.dirty_background_bytes at approximately 50% of this
vm.dirty_background_ratio = 0
vm.dirty_background_bytes = 314572800 # set this value to approximately 50% of vm.dirty_bytes
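When testing different dirty limits, the byte values can be derived from a size in MiB rather than typed by hand. This is a minimal sketch: the function name is illustrative, and the 50% background rule is taken from the guidance above.

```shell
#!/bin/sh
# Print vm.dirty_bytes and vm.dirty_background_bytes for a dirty limit
# given in MiB, keeping the background value at 50% of the limit.
dirty_settings() {
    dirty_mib=$1
    dirty_bytes=$(( dirty_mib * 1024 * 1024 ))
    echo "vm.dirty_bytes = ${dirty_bytes}"
    echo "vm.dirty_background_bytes = $(( dirty_bytes / 2 ))"
}

# 600 MiB reproduces the suggested starting values.
dirty_settings 600
```

The output could be saved to a file under /etc/sysctl.d/ (the file name is up to the administrator) and loaded with 'sysctl -p <file>' or 'sysctl --system'.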
NOTE: If it is
decided to try setting pagecache_limit to 0 (not active), it is
still a good idea to test different values for dirty_bytes and
dirty_background_bytes in an I/O-intensive environment to arrive at
suitable values.
The heart of this patch is a function called shrink_page_cache(). It is
called from balance_pgdat (which is the worker for kswapd) if the
pagecache is above the limit. The function is also called in
shrink_page_cache() calculates the
number of pages the cache is over its limit. It reduces this number
by a factor (so you have to call it several times to get down to the
target) then shrinks the pagecache (using the Kernel
shrink_page_cache() does several passes:
- The first pass reclaims only from inactive pagecache memory. This is fast,
but it might not find enough free pages; if that happens, the
second pass will happen.
- In the second pass,
pages from the active list will also be considered.
- The third pass will only happen if pagecache_limit_ignore_dirty is
not 1. In that case, the third pass is a repetition of the second
pass, but this time we allow pages to be written out.
In all passes, only unmapped pages will be considered.
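The effect of reducing the excess by a factor on each call can be illustrated with a toy model. This is not the kernel code: the divisor of 2 and the page counts are assumptions chosen purely for illustration, since the actual factor is not stated here.

```shell
#!/bin/sh
# Toy model: each call reclaims only a fraction of the amount by which
# the cache exceeds its limit, so several calls are needed.
# Arguments: cache size, limit, divisor (all hypothetical values).
simulate_shrink() {
    cache=$1
    limit=$2
    divisor=$3
    calls=0
    while [ "$cache" -gt "$limit" ]; do
        over=$(( cache - limit ))
        step=$(( over / divisor ))
        if [ "$step" -lt 1 ]; then
            step=1    # always reclaim at least one page per call
        fi
        cache=$(( cache - step ))
        calls=$(( calls + 1 ))
    done
    echo "$calls"
}

# Number of calls needed to bring 1000 pages down to a limit of 200.
simulate_shrink 1000 200 2
```

The model shows why a single call does not bring the cache down to the target: each call removes only part of the remaining excess.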
How it changes memory management
If pagecache_limit_mb is set to zero (default), nothing changes. If it is
set to a positive value, there will be three different operating
modes:
(1) If we still have plenty of free pages, the pagecache
limit will NOT be enforced. Memory management decisions are taken as
normal.
(2) However, as soon as someone consumes those free
pages, we'll start freeing pagecache. As those pages are returned to the
free page pool, freeing a few pages from pagecache will return us to
state (1). If, however, someone consumes these free pages quickly,
we'll continue freeing up pages from the pagecache until we
reach the low watermark.
(3) Once we are at or below the low
watermark, pagecache_limit_mb, the pages in the page cache will be
governed by normal paging memory management decisions; if it starts
growing above the limit (corrected by the free pages), we'll free
some up again.
This feature is useful for machines that
have large workloads, carefully sized to eat most of the memory.
Depending on the application's page access pattern, the kernel may too
easily swap the application memory out in favor of pagecache. This
can happen even for low values of swappiness. With this feature, the
admin can tell the kernel that only a certain amount of pagecache is
really considered useful and that it otherwise should favor the
application's memory.
The default for pagecache_limit_ignore_dirty is 1; this means that we don't consider
dirty memory to be part of the limited pagecache, as we cannot
easily free up dirty memory (we'd need to do writes for this). By
setting this to 0, we actually consider dirty (unmapped) memory
to be freeable and do a third pass in shrink_page_cache() where we
schedule the pages for write-out. Values larger than 1 are also
possible and result in a fraction of the dirty pages to be considered
From SAP on the subject:
If there are a
lot of local writes and it is OK to throttle them by limiting the
writeback caching, we recommend that you set the value to 0. If
writing mainly happens to NFS filesystems, the default 1 should be
left untouched. A value of 2 would be a middle ground, not limiting
local write back caching as much, but potentially resulting in some
Some customers are not tuning I/O settings for large-memory systems, and
some SAP customers are setting a pagecache limit unnecessarily, or setting a
limit which is much too low for the amount of memory present and the
workload pattern of the server.
Also, since the advent of systemd v228 and beyond, included in SLES12 SP2, SAP systems will likely need to adjust the DefaultTasksMax setting, which was introduced by Linux upstream as a security feature to prevent any one service from spawning too many threads and consuming all server resources.
Please note that the usage of the pagecache feature is only supported on
SLES for SAP.
Please note that the pagecache feature is not used on SLES for SAP version 15 and above. Instead, control groups (cgroups) are used.
To view the current effective DefaultTasksMax, view the contents of /etc/systemd/system.conf, or use the command shown here:
$ systemctl show --property DefaultTasksMax
To change the global value for DefaultTasksMax, uncomment the line in /etc/systemd/system.conf and set to the desired value.
If you wish to change the DefaultTasksMax on a per-service basis, then the TasksMax setting can be added to the appropriate systemd unit file.
To enable the new settings, you can use 'systemctl daemon-reload' or just reboot the server.
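One way to apply a per-service TasksMax without editing the unit file itself is a systemd drop-in file. The sketch below writes such a drop-in. The function name, the unit name example.service, the value 8192, and the file name tasksmax.conf are all hypothetical, and the demonstration writes to a scratch directory rather than to /etc/systemd/system.

```shell
#!/bin/sh
# Write a drop-in that sets TasksMax for a single service.
# Arguments: base directory, unit name, TasksMax value.
write_tasksmax_dropin() {
    base_dir=$1
    unit=$2
    value=$3
    dropin_dir="${base_dir}/${unit}.d"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nTasksMax=%s\n' "$value" > "${dropin_dir}/tasksmax.conf"
}

# Demonstrate against a scratch directory, not the live systemd config.
scratch=$(mktemp -d)
write_tasksmax_dropin "$scratch" example.service 8192
cat "${scratch}/example.service.d/tasksmax.conf"
```

On a real system the base directory would be /etc/systemd/system, and the affected service would need to be restarted after the reload for the new limit to apply.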
SAP note 1557506 - Linux paging improvements
SAP note 2456149 - Diagnostics Agents fails to start with error OutOfMemoryError on Linux X86 SLES12 SPS2
SLES12 SP2 Release Notes - 2.3.2 Support for PIDs cgroup Controller (DefaultTasksMax)
SUSE Linux Enterprise Server for SAP Applications 12 SP2 - Guide: 7.1 Kernel: Page-Cache Limit