SES cluster pools marked read-only, OSDs are full or near full.

This document (000019724) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Enterprise Storage 5.5
SUSE Enterprise Storage 6

Situation

Customer allowed OSD(s) to fill up, marking the pools/cluster "Read Only", or OSD(s) are near full.

"ceph -s" will report number of osd's full and number of pools affected: 
#==[ Command ]======================================#
# /usr/bin/ceph --connect-timeout=5 -s
  cluster:
    id:     8007d21c-6c85-3f03-85df-f56fc7cf85eb
    health: HEALTH_ERR
            1 full osd(s)
            20 pool(s) full
 
  services:
    mon: 3 daemons, quorum mon-03,mon-02,mon-01
    mgr: mon-03(active), standbys: mon-02, mon-01
    mds: cephfs-1/1/1 up  {0=mon-03=up:active}
    osd: 62 osds: 62 up, 62 in
    rgw: 1 daemon active
 
  data:
    pools:   20 pools, 2456 pgs
    objects: 25.07M objects, 93.0TiB
    usage:   168TiB used, 167TiB / 335TiB avail
    pgs:     2451 active+clean
             5    active+clean+scrubbing+deep

#==[ Command ]======================================#
# /usr/bin/ceph --connect-timeout=5 health detail
HEALTH_ERR 1 full osd(s); 20 pool(s) full; clock skew detected on mon.mon-02, mon.mon-01
osd.52 is full
pool 'cephfs_data' is full (no space)
pool 'cephfs_metadata' is full (no space)
pool '.rgw.root' is full (no space)
pool 'default.rgw.control' is full (no space)
pool 'default.rgw.meta' is full (no space)
pool 'default.rgw.log' is full (no space)
pool 'default.rgw.buckets.index' is full (no space)
--[cut here]--
pool 'default.rgw.buckets.data' is full (no space)
pool 'default.rgw.buckets.non-ec' is full (no space)
--[cut here]--

"ceph report" will also provide similar ouput:
#==[ Command ]======================================#
# /usr/bin/ceph --connect-timeout=5 report
report 4224157997
{
    "cluster_fingerprint": "1bf16fc2-bff3-4a0d-b2ff-28fd11d349d8",
    "version": "12.2.12-594-g02236657ca",
    "commit": "02236657ca915367985ddf280fed3699124fa76d",
    "timestamp": "2020-09-21 14:41:27.871676",
    "tag": "",
    "health": {
        "checks": {
            "OSD_FULL": {
                "severity": "HEALTH_ERR",
                "summary": {
                    "message": "1 full osd(s)"
                },
                "detail": [
                    {
                        "message": "osd.52 is full"
                    }
                ]
            },

"ceph osd df tree" will provide detail about osd(s) and fullcapacity:
#==[ Command ]======================================#
# /usr/bin/ceph --connect-timeout=5 osd df tree
ID CLASS WEIGHT    REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS TYPE NAME                    
-1       335.08347        -  335TiB  168TiB  167TiB 50.25 1.00   - root default
---[cut here]--- 
52   ssd   0.72769  1.00000  745GiB  708GiB 37.2GiB 95.00 1.89  24         osd.52               
57   ssd   0.72769  1.00000  745GiB  436GiB  309GiB 58.54 1.17  19         osd.57  

Note: osd.52 is 95% full.

Resolution

"osd's" should never be full in theory and administrators should monitor how full osd's are with "ceph osd df tree", if osds are approaching 80% full, it’s time for the administrator to take action to prevent osd from filling up.  Action could include reweighting the osd's in question and or adding more osd's to the cluster.

Ceph has several parameters to help notify the administrator when OSDs are filling up:

    # ceph osd dump | grep ratio
    full_ratio 0.95
    backfillfull_ratio 0.9
    nearfull_ratio 0.85

By default, when an OSD reaches 85% capacity, the nearfull_ratio warning is triggered.
By default, when an OSD reaches 90% capacity, the backfillfull_ratio warning is triggered.  At this point the cluster will deny backfilling to the OSD in question.
By default, when an OSD reaches 95% capacity, the full_ratio is triggered; all PGs on the OSD in question are marked read-only, as well as all pools associated with those PGs.  The cluster is marked read-only to prevent corruption from occurring.
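
For example, on a 745GiB OSD such as osd.52 above, these default ratios correspond roughly to:

    745GiB * 0.85 = ~633GiB used  (nearfull warning)
    745GiB * 0.90 = ~670GiB used  (backfill denied)
    745GiB * 0.95 = ~708GiB used  (OSD marked full, pools read-only)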

It is good practice to set the "noout" flag in this situation, to avoid rebalancing if OSDs go down.
How to set/unset noout:

    ceph osd set noout
    ceph osd unset noout

To get the cluster out of this state, data needs to be pushed away or removed from the OSD(s) in question.  In this example it is a single OSD (osd.52), but there could be many OSDs that are marked full.  To push data away from an OSD, run:

    ceph osd reweight $osdID $Weight

Where $osdID is the osd number "52 and $Weigh is a value below 1:

    ceph osd reweight 52 .85

The first objective is to get the OSDs that are full below 95% capacity, so the OSD(s)/pool(s)/cluster are no longer marked read-only.  It is possible to achieve this goal with a $Weight of .95 in some cases; in other cases, it may require a lower value: .90, .85, .80, etc.

The second objective is to get the OSD(s) in question below 90%, then below 85% of capacity.  This is achieved by continuing to monitor the cluster and reweighting down the OSDs in question.  This process can take a few hours to complete.
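
One simple way to keep an eye on progress during this period (a sketch; the interval is arbitrary) is to re-run the status and usage commands periodically, for example:

    # Re-check cluster status and OSD utilization every 5 minutes
    watch -n 300 'ceph -s; ceph osd df tree'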

Keep in mind that when data is being pushed away or removed from an OSD, that data is being placed on other OSDs in the cluster.  This could cause other OSDs to fill up as well.  The administrator will need to monitor ALL OSDs with "ceph osd df tree" to ensure that proper action is taken. 
In some cases it will be necessary to change the following settings temporarily: 

    ceph osd set-nearfull-ratio <ratio>
    ceph osd set-backfillfull-ratio <ratio>
    ceph osd set-full-ratio <ratio>

"ceph osd set-full-ratio .96" will change the "full_ratio to 96% and remove the Read Only flag on on osd(s) which are 95% -96%.  If osd(s) are 96% full ist possible to set "ceph osd set-full-ratio .97". However, do not set this value to high.

"ceph osd set-backfillfull-ratio 91" will change the "backfillfull_ratio to 91% and allowing backfill to occure on osd's which are 90-91% full.  This setting is helpful when there are multiple osd's which are full. 

In some cases, it will appear that the cluster is trying to add data to the OSD(s) before it starts pushing data away from the OSD(s) in question.

Once the OSD(s) are below 95% capacity, or below 90% capacity respectively, set the settings back to their defaults:

    ceph osd set-nearfull-ratio .85
    ceph osd set-backfillfull-ratio .90
    ceph osd set-full-ratio .95

This will ensure that there is breathing room the next time an OSD gets marked full.

If the administrator is confident the issue is addressed and it is safe to reweight the OSD(s) back up, this can be done in the same way:

    ceph osd reweight $osdID $Weight

Where $osdID is the osd number "52 and $Weigh is a value up to 1:

    ceph osd reweight 52 .90
    or
    ceph osd reweight 52 .95
    or
    ceph osd reweight 52 1

Monitor with:

    ceph -s
    ceph health detail
    ceph osd df tree

If an OSD starts filling up again, reweight it back down again.

Note: osd.52 is now 81% full and has a reweight value of "0.84999"
#==[ Command ]======================================#
# /usr/bin/ceph --connect-timeout=5 osd df tree
ID CLASS WEIGHT    REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS TYPE NAME                    
-1       335.08347        -  335TiB  168TiB  167TiB 50.25 1.00   - root default 
52   ssd   0.72769  0.84999  745GiB  606GiB  140GiB 81.27 1.41  18         osd.52               
57   ssd   0.72769  1.00000  745GiB  436GiB  310GiB 58.46 1.01  19         osd.57  
 

When the administrator is confident the issue is resolved, remove the noout flag:

    ceph osd unset noout

Cause

OSD(s) filled up to 95% of capacity.
Other contributors:
  • The cluster was filled beyond the capacity of its failure domain; a failure then occurred, filling up the remaining OSDs.
  • The cluster is configured with OSDs of various sizes.  It is recommended that all OSDs be the same size for even data distribution.
  • If the cluster is 70% full, it is time to add more OSDs to the cluster or remove unwanted data; see the example below.
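
Overall cluster utilization can be seen on the "root default" line of "ceph osd df tree" (50.25% used in the example above), or with the cluster-wide and per-pool summary reported by:

    ceph df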

 

Status

Top Issue

Additional Information

https://docs.ceph.com/en/latest/rados/operations/health-checks/#osd-full
https://docs.ceph.com/en/latest/rados/operations/health-checks/#osd-backfillfull
https://docs.ceph.com/en/latest/rados/operations/health-checks/#osd-nearfull
https://docs.ceph.com/en/latest/rados/operations/health-checks/#osd-out-of-order-full

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000019724
  • Creation Date: 24-Sep-2020
  • Modified Date: 24-Sep-2020
    • SUSE Enterprise Storage
