Cluster status reports MDSs behind on trimming

This document (000019740) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Enterprise Storage 6

Situation

The cluster status shows the following health warnings:

HEALTH_WARN <x> MDSs behind on trimming
HEALTH_WARN <x> clients failing to respond to cache pressure

Resolution

Adjust each of the values below by 10% in the direction noted, then observe the cluster. If the warnings persist, adjust by another 10%. Depending on the results, repeat this up to 5 times:
 
ceph config set mds mds_cache_trim_threshold xxK (should initially be increased)
ceph config set mds mds_cache_trim_decay_rate x.x (should initially be decreased)
ceph config set mds mds_cache_memory_limit xxxxxxxxxx (should initially be increased)
ceph config set mds mds_recall_max_caps xxxx (should initially be increased)
ceph config set mds mds_recall_max_decay_rate x.xx (should initially be decreased)
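
The 10% step can be worked out by hand, but a small helper reduces arithmetic slips when repeating it several times. The following is a minimal sketch only: `scale_by_pct` and `next_step_cmd` are hypothetical helper names, not part of the ceph CLI, and the current value passed in is assumed to have been read from the cluster beforehand. The second helper only prints the resulting command so that it can be reviewed before it is run.

```shell
# Hypothetical helper (not part of the ceph CLI): scale a value by a percentage.
# $1 = current value
# $2 = percent step (10 to increase, -10 to decrease)
# $3 = decimal places to keep in the result (default 0)
scale_by_pct() {
    awk -v v="$1" -v p="$2" -v d="${3:-0}" \
        'BEGIN { fmt = "%." d "f\n"; printf fmt, v * (1 + p / 100) }'
}

# Print (do not run) the command for one adjustment step.
# $1 = option name, $2 = current value, $3 = percent step, $4 = decimal places
next_step_cmd() {
    echo "ceph config set mds $1 $(scale_by_pct "$2" "$3" "$4")"
}
```

For example, assuming a current mds_recall_max_caps value of 5000 (an illustrative input, not a recommendation), `next_step_cmd mds_recall_max_caps 5000 10` prints `ceph config set mds mds_recall_max_caps 5500`. For a decay rate that should be decreased, pass a negative step and the number of decimal places to keep, for example `next_step_cmd mds_recall_max_decay_rate 2.5 -10 2`.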

 
Also see the Additional Information Section.

Cause

The "MDS behind on trimming" warning indicates that, with the current settings, the MDS daemon cannot trim its cache quickly enough. Cache trimming is throttled to prevent the MDS from spending too much time on it. However, under some cache-heavy workloads the default settings can be too conservative.

Additional Information

The following command can be used to obtain the current / default values before adjusting the settings:
 
ceph config show-with-defaults mds.<ins_mds> | egrep "mds_cache_trim_threshold|mds_cache_trim_decay_rate|mds_cache_memory_limit|mds_recall_max_caps|mds_recall_max_decay_rate"
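
To keep a record of the values before each adjustment round, the same five options can also be read one at a time. This is a minimal sketch, assuming the `ceph config get` subcommand is available on the cluster. `show_mds_tunables` and `MDS_TUNABLES` are hypothetical names introduced for illustration. The show-with-defaults command above remains the reference for what a running daemon actually uses.

```shell
# The five options named in the Resolution section.
MDS_TUNABLES="mds_cache_trim_threshold mds_cache_trim_decay_rate mds_cache_memory_limit mds_recall_max_caps mds_recall_max_decay_rate"

# Hypothetical helper: print "option = value" for one MDS daemon.
# $1 = daemon name as used with "ceph config", for example mds.<ins_mds>
show_mds_tunables() {
    for opt in $MDS_TUNABLES; do
        printf '%s = %s\n' "$opt" "$(ceph config get "$1" "$opt")"
    done
}
```

Redirecting the output to a dated file before each change, for example `show_mds_tunables mds.<ins_mds> > mds-settings-before.txt`, makes it easier to step back if an adjustment has adverse effects.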

Note that settings adjusted as described in the Resolution section are not permanent and will revert to their defaults once an MDS is restarted. The appropriate value for "mds_cache_memory_limit" depends on the total amount of memory available on the server. If feasible, double the current setting.

If the customized settings clear the "MDS behind on trimming" warnings and no adverse effects are observed (the main concerns are high CPU load on the MDS and slower metadata operations on the client side), consider setting the adjusted mds_cache_trim.* settings permanently.

Also see TID 000019591: When running "du" command on a cephfs mount, ceph -s reports 1 MDSs report oversized cache.

To get more details on client caps usage, the following commands can be useful:
 
ceph tell mds.<ins_mds_server_name> client ls
ceph daemonperf mds.<ins_mds_server_name> (needs to be executed on the MDS host)

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000019740
  • Creation Date: 23-Nov-2021
  • Modified Date: 24-Nov-2021
    • SUSE Enterprise Storage
