Low write performance on Linux servers with large RAM
This document (7010287) is provided subject to the disclaimer at the end of this document.
SUSE Linux Enterprise Server 12
SUSE Linux Enterprise Server 11
SLES support has recommended the following values (through this TID and through active support cases) to a large number of customers for more than a decade, and has seen complete relief of related performance problems without any negative side effects.
To set the values on the fly, with immediate effect, use the following commands. This is best done when large writes are not already underway. These commands set a 600 MB limit on the dirty cache and cause background threads to begin flushing the cache once it reaches 300 MB:
echo 629145600 > /proc/sys/vm/dirty_bytes
echo 314572800 > /proc/sys/vm/dirty_background_bytes
To make the settings persistent across reboots, add the following lines to /etc/sysctl.conf:
vm.dirty_bytes = 629145600
vm.dirty_background_bytes = 314572800
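The byte values above are simply 600 MB and 300 MB expressed in bytes, with 1 MB taken as 1024 * 1024 bytes. The following is a minimal sketch of that arithmetic, useful when choosing different sizes. The function name mb_to_bytes and the two variables are illustrative only and are not part of any SUSE tool.

```shell
#!/bin/sh
# Convert a size in MB (1 MB = 1024 * 1024 bytes) to bytes.
# mb_to_bytes is a hypothetical helper name used for this sketch.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

dirty_mb=600
background_mb=300

# Print the result in sysctl.conf syntax. Nothing is written to the system.
echo "vm.dirty_bytes = $(mb_to_bytes "$dirty_mb")"
echo "vm.dirty_background_bytes = $(mb_to_bytes "$background_mb")"
```

The script only prints the lines. Review them before copying them into /etc/sysctl.conf.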
Deeper explanations, alternative settings, and other helpful knowledge can be found in the "Cause" and "Additional Information" sections below. It is best to become familiar with all sections of this document.
Decreasing cache to improve performance may seem counter-intuitive, because most caches are read caches which give better performance as you increase their size. However, write caches have trade-offs. Write caches allow you to write to memory very quickly, but at some point you have to pay that "debt" and actually get the work done. Writing out all the data can take considerable time. This is especially true when an application is writing large amounts of data to a file system which resides over a network, such as an NFS mount. The faster the network, the less likely this will cause a problem. However, even in the best scenarios, network I/O is usually slower than local disk I/O.
Therefore, it is especially important that NFS Client machines (which mount NFS shares from remote NFS Servers) have a small dirty cache. Of course, it is also possible (but less common) that Linux NFS *Servers* (or any Linux machine) might need these values tuned lower, if the amount of dirty cache is too large. For dirty cache, "too large" simply means: Any size that can't be flushed quickly and efficiently. Of course, this will depend on the hardware in use, how it is configured, whether it is functioning perfectly or having intermittent errors, etc. Therefore, it is difficult to give a rule of thumb about when and where tuning is most needed. The best that can be said is, "If you have problems that involve performance during large writes, try tuning the dirty cache."
In ratio form, the tunable settings are:
vm.dirty_ratio: The maximum percentage of RAM devoted to dirty cache.
The default on SLES 11 is 40%; on SLES 12 and 15 the default is 20%.
When the dirty cache reaches this percentage of memory, processes will not be allowed to write more data until some of their cached data is written out. This ensures that the ratio is enforced. By itself, this enforcement can slow down writes noticeably, but not tremendously. However, if an application has written a large amount of data which is still in the dirty cache, and then issues a "sync" command to have it all written to disk, this can take a significant amount of time to accomplish. During that time, some applications may appear stuck or hung. Some applications which have timers watching those processes may even believe that too much time has passed and the operation needs to be aborted, also known as a "timeout".
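To see how much data is waiting to be flushed while a large write or sync is in progress, the Dirty and Writeback counters in /proc/meminfo can be inspected. The following is a small sketch. The function name dirty_status is illustrative, and its optional file argument exists only so the function can be tried against a saved copy of meminfo.

```shell
#!/bin/sh
# Print the Dirty and Writeback lines from a meminfo-style file.
# The file argument is optional and defaults to /proc/meminfo.
# dirty_status is a hypothetical helper name used for this sketch.
dirty_status() {
    grep -E '^(Dirty|Writeback):' "${1:-/proc/meminfo}"
}

# Take one snapshot if the file is available on this machine.
if [ -r /proc/meminfo ]; then
    dirty_status
fi
```

Running the function repeatedly during a large write (for example, once per second in a loop) shows whether the dirty cache is growing toward its limit or draining.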
Lowering the ratio from its default of 20 or 40 to something lower, such as 5 or 10, may be helpful enough. However, some kernels will not accept a value lower than 5%. For systems with extremely large RAM, 5% may still be too much. Therefore, it is commonly necessary to use the "bytes" settings rather than the "ratio" settings.
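To illustrate why even 5% can be too much, the following sketch estimates the dirty cache ceiling for a hypothetical server with 512 GiB of RAM. The RAM size and the function name ratio_to_mib are assumptions made for the example. The kernel applies the ratio to available memory (free plus reclaimable pages) rather than to total RAM, so these figures are rough upper estimates.

```shell
#!/bin/sh
# Approximate dirty cache ceiling in MiB for a RAM size (MiB) and a ratio (%).
# ratio_to_mib is a hypothetical helper name; integer division rounds down.
ratio_to_mib() {
    echo $(( $1 * $2 / 100 ))
}

# Hypothetical machine with 512 GiB of RAM.
ram_mib=$(( 512 * 1024 ))

for ratio in 40 20 10 5; do
    echo "dirty_ratio=${ratio}% of ${ram_mib} MiB -> about $(ratio_to_mib "$ram_mib" "$ratio") MiB"
done
```

On such a machine, 5% still allows roughly 25 GiB of dirty data, far more than the 600 MB limit recommended earlier in this document.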
vm.dirty_background_ratio: When dirty cache reaches this percentage of system memory, background writes will start.
The default is 10%.
"Background writes" get writing done even when the application isn't forcing a sync, and even if the dirty_ratio has not yet been reached. The goal of this setting is to prevent the dirty cache from growing too large. When reducing dirty_ratio, it is common to reduce dirty_background_ratio as well.
A good rule of thumb is:
dirty_background_ratio = 1/4 to 1/2 of the dirty_ratio
If the dirty_background_ratio is set equal to or higher than the dirty_ratio, the kernel will instead automatically use dirty_background_ratio = 1/2 dirty_ratio. The same type of rule exists when using "..._bytes" settings instead of "..._ratio".
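The fallback rule just described can be modeled in a few lines. This is a sketch of the rule as stated in this document, not a copy of kernel code, and the function name effective_background is illustrative.

```shell
#!/bin/sh
# Model of the rule described above: if the background value is equal to or
# higher than the dirty value, half of the dirty value is used instead.
# effective_background is a hypothetical helper name used for this sketch.
effective_background() {
    dirty=$1
    background=$2
    if [ "$background" -ge "$dirty" ]; then
        echo $(( dirty / 2 ))
    else
        echo "$background"
    fi
}

effective_background 10 5    # background below dirty: used as configured
effective_background 10 10   # background equal to dirty: halved dirty value
effective_background 20 30   # background above dirty: halved dirty value
```

The same function works for the "..._bytes" settings, since it only compares two integers.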
Just like the "*_bytes" values, these ratios can be observed or modified with the sysctl utility (see man pages for sysctl(8), sysctl.conf(5)). But simply put, these can be set (to come into effect upon boot) in /etc/sysctl.conf, as:
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
Or for temporary testing, values can be echoed into their respective /proc areas:
echo 10 > /proc/sys/vm/dirty_ratio
echo 5 > /proc/sys/vm/dirty_background_ratio
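After changing any of these settings, it can help to confirm which values are actually in effect. The following sketch reads all four tunables. The function name show_dirty_settings is illustrative, and its optional directory argument exists only so the function can be tried against a copy of the files.

```shell
#!/bin/sh
# Print the current dirty cache tunables in sysctl.conf syntax.
# The directory argument is optional and defaults to /proc/sys/vm.
# show_dirty_settings is a hypothetical helper name used for this sketch.
show_dirty_settings() {
    dir=${1:-/proc/sys/vm}
    for name in dirty_ratio dirty_background_ratio dirty_bytes dirty_background_bytes; do
        if [ -r "$dir/$name" ]; then
            echo "vm.$name = $(cat "$dir/$name")"
        fi
    done
}

show_dirty_settings
```

A "ratio" setting and its "bytes" counterpart are mutually exclusive: writing one causes the other to read as 0, so a 0 in the output normally means the other form is in use.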
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7010287
- Creation Date: 09-Mar-2012
- Modified Date: 20-Apr-2023
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com