Memory management best practices

This document (7015019) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 11 SP3
SUSE Linux Enterprise Server 11 SP2
SUSE Linux Enterprise Server 11 SP1

Situation

When there are concerns that a system is in trouble because it is running short of memory, a few important questions need to be answered first.

Why am I worried about the memory?
Is the system or application running slowly, and is the system running out of memory?
Does the system still lack free memory even though additional memory was added?
Is the system using only part of the memory while the rest remains unused?

The following text explains why, quite often, there is nothing to worry about.

Resolution

1. There are concerns about the amount of free memory because, for example, the "free" command shows only a small amount of free memory.

system@domain:~> free -m
             total       used       free     shared    buffers     cached
Mem:         11904      11756        148          0         84       8325
-/+ buffers/cache:       3346       8558
Swap:         2053        281       1772

If this is the scenario, there is no reason to worry. The operating system manages all available memory efficiently, and from the user's perspective memory management is fully automated. Unless one of the symptoms described below occurs, the system is handling memory efficiently.

For example, the kernel keeps frequently used data in the page cache, because access to memory is much faster than access to disk. If the system cannot satisfy the memory requirements of new or greedy processes, the kernel starts dropping page cache to free the memory they need. Conversely, when a system leaves available memory unused, there is most probably something wrong.
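The relationship between the two lines of the "free" output above can be checked with simple arithmetic. The following is a minimal sketch only; the function name buffers_cache_line is hypothetical and the input values are copied from the "Mem:" line of the sample output.

```shell
# buffers_cache_line USED FREE BUFFERS CACHED (all in MB)
# Prints the two values of the "-/+ buffers/cache" line:
#   memory used by applications  = used - buffers - cached
#   memory available on demand   = free + buffers + cached
# The function name is hypothetical and used for illustration only.
buffers_cache_line() {
    echo "$(( $1 - $3 - $4 )) $(( $2 + $3 + $4 ))"
}

# Values from the "Mem:" line of the sample output above:
#   buffers_cache_line 11756 148 84 8325
```

With the sample values this yields 3347 and 8557, within 1 MB of the 3346 and 8558 that "free" reports; the small difference is presumably due to rounding when the kB counters are converted to MB. In other words, only about 3.3 GB is really held by applications, and about 8.4 GB can be made available when needed.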

To check whether the system is handling memory correctly, examine the output of the following command:

system@domain:~> cat /proc/meminfo
MemTotal:       12190624 kB
MemFree:          155804 kB
Buffers:           86812 kB
Cached:          8257812 kB
SwapCached:        38220 kB
Active:          8640440 kB
Inactive:        2834544 kB
Active(anon):    6236740 kB
Inactive(anon):   924040 kB
Active(file):    2403700 kB
Inactive(file): 1910504 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       2103292 kB
SwapFree:        1814756 kB
Dirty:              5144 kB
Writeback:             0 kB
AnonPages:       3113680 kB
Mapped:          6662064 kB
Shmem:           4030420 kB
Slab:             276132 kB
SReclaimable:     229984 kB
SUnreclaim:        46148 kB
KernelStack:        3640 kB
PageTables:        41784 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     8198604 kB
Committed_AS:    9079364 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      152392 kB
VmallocChunk:   34359567868 kB
HardwareCorrupted:     0 kB
AnonHugePages:     63488 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      325604 kB
DirectMap2M:    12255232 kB

This output is a snapshot of memory usage (in kB) at the moment the command was run. There are many counters in the output, but the ones that help to decide whether the system is running well are the following:

MemTotal (total amount of physical RAM)
MemFree (amount of physical RAM which is unused by the system)
Cached (amount of physical RAM used as page cache)
Active (amount of memory, both anonymous and page cache, which has been used recently - usually not reclaimed unless necessary)
Inactive (amount of memory, both anonymous and page cache, which has not been used recently - can be reclaimed for other purposes)
Dirty (amount of memory waiting to be written to disk)

Using a simplified formula:

Inactive - Dirty = amount of memory which can be dropped or reclaimed if other processes need it

When the result of this formula shows enough memory, the system is handling memory well, and there is no reason to be concerned about memory usage.
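The simplified formula above can be evaluated directly from /proc/meminfo. The following is a minimal sketch; the function name reclaimable_kb is hypothetical, and the optional file argument exists only so that a saved snapshot can be analyzed instead of the live system.

```shell
# reclaimable_kb [FILE]
# Prints Inactive - Dirty in kB, following the simplified formula above.
# FILE defaults to /proc/meminfo; a saved snapshot can be passed instead.
# The function name is hypothetical and used for illustration only.
reclaimable_kb() {
    awk '$1 == "Inactive:" { inactive = $2 }
         $1 == "Dirty:"    { dirty = $2 }
         END { printf "%d\n", inactive - dirty }' "${1:-/proc/meminfo}"
}

# Usage:
#   reclaimable_kb
#   reclaimable_kb /path/to/saved-meminfo-snapshot
```

For the sample output above, this gives 2834544 - 5144 = 2829400 kB, or roughly 2.7 GB that could be reclaimed if needed.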

Taking repeated snapshots of this file (for example every 1-2 seconds) while the suspected misbehavior is occurring might help to identify problems in the kernel memory management under a particular load.
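Repeated snapshots can be collected with a small loop. The following is a minimal sketch; the function name snapshot_meminfo, the timestamp separator format, and the log file name in the usage comment are assumptions chosen for illustration.

```shell
# snapshot_meminfo COUNT INTERVAL [FILE]
# Prints COUNT timestamped copies of FILE (default /proc/meminfo),
# waiting INTERVAL seconds between them.
# The function name and separator format are hypothetical.
snapshot_meminfo() {
    count=$1
    interval=$2
    src=${3:-/proc/meminfo}
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        cat "$src"
        i=$((i + 1))
        # No need to wait after the final snapshot.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Usage: 30 snapshots, 2 seconds apart, saved for later analysis:
#   snapshot_meminfo 30 2 > meminfo-snapshots.log
```

Redirecting the output to a file keeps all snapshots together, so that counters such as MemFree, Inactive and Dirty can be compared over time.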

2. Another possible situation is that the system is not using all available physical RAM.
This can happen on NUMA (Non-Uniform Memory Access) hardware, where a remote memory node is considerably more expensive for a particular CPU to access than its local memory node.

Using the following command:

system@domain:~> numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3
node 0 size: 12285 MB
node 0 free: 146 MB
node distances:
node 0
0: 10

it is possible to check whether there is more than one memory node. If the value of:

/proc/sys/vm/zone_reclaim_mode

is other than 0, the system may be affected by the memory issue described above (not all available physical memory is used).

This parameter controls whether memory reclaim is performed on the local NUMA node even if there is plenty of free memory on other nodes. It is turned on automatically on machines with more pronounced NUMA characteristics.

For more details, see:

https://www.suse.com/documentation/sles11/singlehtml/book_sle_tuning/book_sle_tuning.html

Changing the value back to 0 will most probably fix the issue.
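The two checks described in this section, and the change of the parameter, could look as follows. This is a minimal sketch; the function names numa_node_count and zone_reclaim_status are hypothetical, and the file argument is only there so the check can be tried on a copy of the value.

```shell
# numa_node_count: reads "numactl --hardware" output on stdin and
# prints the number of memory nodes from the "available:" line.
# Both function names in this block are hypothetical.
numa_node_count() {
    awk '$1 == "available:" { print $2 }'
}

# zone_reclaim_status [FILE]: reports whether zone reclaim is enabled.
# FILE defaults to /proc/sys/vm/zone_reclaim_mode.
zone_reclaim_status() {
    mode=$(cat "${1:-/proc/sys/vm/zone_reclaim_mode}")
    if [ "$mode" -eq 0 ]; then
        echo "zone_reclaim_mode=$mode (disabled)"
    else
        echo "zone_reclaim_mode=$mode (enabled)"
    fi
}

# Usage:
#   numactl --hardware | numa_node_count
#   zone_reclaim_status
#
# To set the value back to 0 at runtime (as root):
#   sysctl -w vm.zone_reclaim_mode=0
# To keep the setting across reboots, add this line to /etc/sysctl.conf:
#   vm.zone_reclaim_mode = 0
```

If numa_node_count reports 1, as in the sample output above, there is only a single memory node and zone reclaim is unlikely to be the cause.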

3. If there are still concerns that the system is running out of memory because:

the formula above shows that only a tiny portion of memory is reclaimable
the system is swapping heavily (check "vmstat 1", where swap in (si) and swap out (so) are the important values)
there are any other memory-related issues

please do the following:

take several snapshots of /proc/meminfo while the system is under memory stress
answer the questions listed at the beginning of this TID
collect a supportconfig from the system and provide all the logs to SUSE Technical Services for further analysis
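For the swapping check mentioned in the list above, the si and so columns of "vmstat" can be summarized automatically. The following is a minimal sketch; the function name swap_activity is hypothetical, and it assumes the usual vmstat layout in which a header line starting with "r" names the columns.

```shell
# swap_activity: reads vmstat output on stdin and reports how many
# samples had a non-zero swap-in (si) or swap-out (so) value.
# The function name is hypothetical and used for illustration only.
swap_activity() {
    awk '
        # Header line: remember which columns hold si and so.
        $1 == "r" {
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            next
        }
        # Data lines start with a number; the banner line does not.
        si && $1 ~ /^[0-9]+$/ {
            n++
            if ($si > 0 || $so > 0) busy++
        }
        END { printf "%d of %d samples show swap activity\n", busy, n }
    '
}

# Usage: ten one-second samples
#   vmstat 1 10 | swap_activity
```

Note that the first data line printed by vmstat shows averages since boot, so it may report swap activity that is no longer happening. Occasional non-zero values are usually harmless; sustained non-zero si and so values across most samples indicate heavy swapping.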

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.