How to speed up or slow down OSD recovery

This document (000019693) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Enterprise Storage 5.5
SUSE Enterprise Storage 6
 

Situation

When OSDs (Object Storage Daemons) are stopped or removed from the cluster, or when new OSDs are added to a cluster, it may be necessary to adjust the OSD recovery settings.

Also see:
https://docs.ceph.com/docs/master/dev/osd_internals/backfill_reservation/
 
The following values can be increased if the cluster needs to recover more quickly, as they help OSDs perform recovery faster.
 
  • osd max backfills: The maximum number of backfill operations allowed to or from a single OSD. The higher the number, the quicker the recovery, which might impact overall cluster performance until recovery finishes.
  • osd recovery max active: The maximum number of active recovery requests per OSD. The higher the number, the quicker the recovery, which might impact overall cluster performance until recovery finishes.
  • osd recovery op priority: The priority set for recovery operations, relative to client operations (osd client op priority, default 63). The higher the number, the higher the recovery priority. Higher recovery priority might cause performance degradation until recovery completes.

Note that changing these values can impact the overall performance of the cluster, and clients may see lower performance.

The default values for the above settings are:
 
ceph-conf --show-config | egrep "osd_recovery_max_active|osd_recovery_op_priority|osd_max_backfills"
osd_max_backfills = 1
osd_recovery_max_active = 3
osd_recovery_op_priority = 3

Resolution

The following command should be sufficient to speed up backfilling/recovery.  On the Admin node run:
 
ceph tell 'osd.*' injectargs --osd-max-backfills=2 --osd-recovery-max-active=6
or 
ceph tell 'osd.*' injectargs --osd-max-backfills=3 --osd-recovery-max-active=9

NOTE: The above commands return a message similar to the one below. The message can be confusing, but for these specific settings it can be ignored: the new values are applied and in use.
 
osd.0: osd_max_backfills = '2' osd_recovery_max_active = '6' (not observed, change may require restart)

IMPORTANT: Setting the values too high can cause OSDs to restart, which makes the cluster unstable. Also see the "Additional Information" section below.
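
Because values that are too high can destabilize the cluster, it may help to validate the numbers before running the command. The following is a minimal sketch, not part of the original procedure: the helper names and the MAX_BACKFILLS_CAP limit are hypothetical, and the cap of 3 simply mirrors the highest value used in this article. The function only prints the command so it can be reviewed before being run on the Admin node.

```shell
#!/bin/sh
# Hypothetical safety limit for this sketch; it is not a Ceph setting.
MAX_BACKFILLS_CAP=${MAX_BACKFILLS_CAP:-3}

# Return success only if the argument is a non-empty string of digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print the injectargs command for the given values without executing it.
# $1 = osd-max-backfills, $2 = osd-recovery-max-active
recovery_cmd() {
    backfills=$1
    active=$2
    if ! is_uint "$backfills" || ! is_uint "$active"; then
        echo "error: both values must be non-negative integers" >&2
        return 1
    fi
    if [ "$backfills" -lt 1 ] || [ "$backfills" -gt "$MAX_BACKFILLS_CAP" ]; then
        echo "error: osd-max-backfills must be between 1 and $MAX_BACKFILLS_CAP" >&2
        return 1
    fi
    echo "ceph tell 'osd.*' injectargs --osd-max-backfills=$backfills --osd-recovery-max-active=$active"
}

# Print the first command from this article for review.
recovery_cmd 2 6
```

Raising the cap is a deliberate step in this sketch (for example MAX_BACKFILLS_CAP=5), which makes it harder to inject an unusually high value by accident.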

Cause

The cluster is in a recovery state, or will enter one due to upcoming planned changes, and recovery needs to be sped up or slowed down.

Status

Top Issue

Additional Information

To view the currently active setting(s), run the following on the node where the OSD being checked is running, for example:
 
ceph daemon osd.<insert_id> config get osd_max_backfills
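
To check all three recovery settings for one OSD in a single step, the command above can be wrapped in a small loop. This is a sketch under assumptions: the function names are hypothetical, and the parser assumes "config get" prints a one-key JSON object such as {"osd_max_backfills": "2"}. As with the command above, it must be run on the node that hosts the OSD.

```shell
#!/bin/sh
# Extract the value for setting $1 from "config get" output read on stdin.
# Assumes the one-key JSON form {"<name>": "<value>"}.
config_value() {
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Print the three recovery settings for one local OSD (hypothetical helper).
# $1 = numeric OSD id, for example 0 for osd.0
show_recovery_settings() {
    osd_id=$1
    for name in osd_max_backfills osd_recovery_max_active osd_recovery_op_priority; do
        value=$(ceph daemon "osd.$osd_id" config get "$name" | config_value "$name")
        echo "osd.$osd_id $name = $value"
    done
}

# Usage on the OSD node: show_recovery_settings 0
```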

To set back to default:
ceph tell 'osd.*' injectargs --osd-max-backfills=1 --osd-recovery-max-active=3

With SES 6, "ceph config set" can alternatively be used:
 
ceph config set osd osd_max_backfills 2
ceph config set osd osd_recovery_max_active 6

To set back to default:
 
ceph config rm osd osd_recovery_max_active
ceph config rm osd osd_max_backfills

To view the current settings:
 
ceph config show osd.<insert_id>
 
Recovery can be monitored with "ceph -s".

After increasing the settings, if any OSDs become unstable (restarting) or clients are negatively impacted by the additional recovery overhead, reduce the values or set them back to the defaults.

Once the cluster is finished with recovery and back in a HEALTH_OK state, set the values back to default. 
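
Resetting the values once recovery has finished can be scripted. The sketch below polls the cluster health and returns once it reports HEALTH_OK. It is an illustration only: the function names and the HEALTH_CMD, POLL_SECONDS and MAX_POLLS variables are assumptions introduced here, and the reset uses the injectargs defaults listed earlier in this document.

```shell
#!/bin/sh
# Overridable so the loop can be tried without a cluster (assumption of this sketch).
HEALTH_CMD=${HEALTH_CMD:-ceph health}
POLL_SECONDS=${POLL_SECONDS:-60}
MAX_POLLS=${MAX_POLLS:-1440}

# Poll until the health status starts with HEALTH_OK.
# Returns 0 on HEALTH_OK, 1 if MAX_POLLS is reached first.
wait_for_health_ok() {
    polls=0
    while [ "$polls" -lt "$MAX_POLLS" ]; do
        status=$($HEALTH_CMD 2>/dev/null)
        case "$status" in
            HEALTH_OK*) return 0 ;;
        esac
        polls=$((polls + 1))
        sleep "$POLL_SECONDS"
    done
    return 1
}

# Restore the default values given in this document.
reset_recovery_defaults() {
    ceph tell 'osd.*' injectargs --osd-max-backfills=1 --osd-recovery-max-active=3
}

# Usage on the Admin node: wait_for_health_ok && reset_recovery_defaults
```

The poll limit keeps the loop from running indefinitely if the cluster stays in HEALTH_WARN for reasons unrelated to recovery. In that case, check "ceph -s" manually before resetting.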

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000019693
  • Creation Date: 06-Apr-2022
  • Modified Date: 06-Apr-2022
    • SUSE Enterprise Storage
