When running the "du" command on a CephFS mount, "ceph -s" reports "1 MDSs report oversized cache".

This document (000019591) is provided subject to the disclaimer at the end of this document.

Environment

SES6

Situation

The customer reports the health warnings "MDSs report oversized cache" and "clients failing to respond to cache pressure". The warnings reappear whenever the du command is executed on a 1.8 TB directory. The du command takes approximately 10 minutes to complete, and the warnings remain active for 75 minutes.
#==[ Command ]======================================#
# /usr/bin/ceph --connect-timeout=5 -s
  cluster:
    id:     30eacb3f-6207-4c08-bd83-7d3f0e5bb97e
    health: HEALTH_WARN
            1 MDSs report oversized cache
            1 clients failing to respond to cache pressure
 
  services:
    mon: 3 daemons, quorum mon01,mon02,mon03 (age 3h)
    mgr: mon03(active, since 4d)
    mds: cephfs:1 {0=mds=up:active} 1 up:standby
    osd: 48 osds: 48 up (since 5h), 48 in (since 9w)
    rgw: 2 daemons active (rgw01, rgw02)
 
  data:
    pools:   10 pools, 2544 pgs
    objects: 37.21M objects, 55 TiB
    usage:   112 TiB used, 150 TiB / 262 TiB avail
    pgs:     2541 active+clean
             3    active+clean+scrubbing+deep
 
  io:

 

Resolution

Option A:
Increase the MDS cache limit, for example "mds_cache_memory_limit = 8589934592" (8 GiB). 8 GiB is a good baseline, assuming the MDS node has sufficient RAM. The limit can be raised above 8 GiB if needed.
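As a minimal sketch of Option A, the value can be derived from a size in GiB and then applied at runtime. The helper name gib_to_bytes is hypothetical. The commented "ceph config set" lines assume the centralized configuration database of Nautilus-based releases and an admin keyring on the node. If the cluster configuration is managed by deployment tooling, the setting may need to be made there instead.

```shell
#!/bin/sh
# Hypothetical helper: convert a size in GiB to bytes,
# the unit expected by mds_cache_memory_limit.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

limit=$(gib_to_bytes 8)
echo "mds_cache_memory_limit = $limit"

# Assumption: run on a node with an admin keyring, using the
# centralized config database. Left commented out so the sketch
# has no side effects when executed as-is.
# ceph config set mds mds_cache_memory_limit "$limit"
# ceph config get mds mds_cache_memory_limit
```

Check that the MDS node has enough free RAM before raising the limit. The MDS process can use more memory than the configured cache limit.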

Option B:
Use "getfattr" to read the recursive directory statistics that CephFS maintains as extended attributes. This provides the information without the overhead that "du" requires.
# getfattr -d -m ceph.dir.* /mnt/cephfs
getfattr: Removing leading '/' from absolute path names
# file: mnt/cephfs
ceph.dir.entries="4"
ceph.dir.files="0"
ceph.dir.rbytes="522096128"
ceph.dir.rctime="1584374176.593833820"
ceph.dir.rentries="1004"
ceph.dir.rfiles="999"
ceph.dir.rsubdirs="5"
ceph.dir.subdirs="4"
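The attribute ceph.dir.rbytes holds the recursive size of a directory in bytes, so a du-style total can be read from a single attribute. The following is a rough sketch only. The function names dir_rbytes and bytes_to_mib are hypothetical, and /mnt/cephfs is assumed as the default mount point, taken from the example above.

```shell
#!/bin/sh
# Hypothetical helper: read the recursive byte count that CephFS
# exposes as a virtual extended attribute on a directory.
dir_rbytes() {
    getfattr -n ceph.dir.rbytes --only-values "$1" 2>/dev/null
}

# Hypothetical helper: convert bytes to whole MiB (rounded down).
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Assumption: the directory to inspect is passed as the first
# argument and defaults to the mount point used in this document.
target="${1:-/mnt/cephfs}"

if bytes=$(dir_rbytes "$target") && [ -n "$bytes" ]; then
    echo "$target: $bytes bytes ($(bytes_to_mib "$bytes") MiB)"
else
    echo "could not read ceph.dir.rbytes from $target" >&2
fi
```

Because the value comes from metadata that the MDS already tracks, the client does not have to walk the tree and acquire capabilities for every inode.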

Cause

Depending on the number of files and directories that du traverses, the client process needs to acquire capabilities (caps) for a large number of inodes (4732300 in this case). When this makes the MDS cache grow beyond its target size, the MDS tries to recall caps. However, the client cannot give those up until du finishes (for consistent results).

ls -R will likely show similar behavior.

Status

Top Issue

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000019591
  • Creation Date: 18-Mar-2020
  • Modified Date: 18-Mar-2020
    • SUSE Enterprise Storage
