Performance Degradation Observed After SUSE Enterprise Storage 6 Patch Cycle

This document (000019829) is provided subject to the disclaimer at the end of this document.


Environment

SUSE Enterprise Storage 6


Situation

Performance issues arise after patching the SUSE Enterprise Storage 6 environment.


Resolution

Change the 'bluefs_buffered_io' setting back to 'true'.

This can be accomplished via these methods:

Without restarting OSDs (temporary setting until the OSD is restarted):

On the OSD node:
        # ceph daemon osd.<id> config set bluefs_buffered_io true

From the monitor:
        # ceph tell osd.<id> injectargs '--bluefs_buffered_io=true'    # single OSD
e.g.    # ceph tell osd.56 injectargs '--bluefs_buffered_io=true'

        # ceph tell osd.* injectargs '--bluefs_buffered_io=true'    # all OSDs in cluster
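To confirm that a running OSD has picked up the temporary setting, the value can be read back through the admin socket on the node that hosts the OSD. The following is a minimal sketch, not an official procedure. The helper names buffered_io_value and show_buffered_io are hypothetical, and the sketch assumes that 'ceph daemon osd.<id> config get' prints a small JSON object with the value as a quoted string.

```shell
#!/bin/sh
# Extract the value from JSON such as {"bluefs_buffered_io": "true"} on stdin.
# Assumption: the option name and a quoted value appear on the same line.
buffered_io_value() {
    sed -n 's/.*"bluefs_buffered_io"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Print the value currently in effect for one OSD id.
# Must be run on the node hosting that OSD (it uses the admin socket).
show_buffered_io() {
    ceph daemon "osd.$1" config get bluefs_buffered_io | buffered_io_value
}
```

For example, 'show_buffered_io 56' queries osd.56 and prints only the value that the daemon reports.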

Permanent setting for the entire cluster (without running the more invasive 'stage' commands):
On the Salt master, create or edit: /srv/salt/ceph/configuration/files/ceph.conf.d/osd.conf

Add the following line to osd.conf:

        bluefs_buffered_io = true

The following command builds the ceph.conf to be pushed out, incorporating osd.conf:

        # salt '<salt_master_node_name>' state.apply ceph.configuration.create

e.g.   # salt '' state.apply ceph.configuration.create

mon1:/srv/salt/ceph/configuration/files/ceph.conf.d # salt '' state.apply ceph.configuration.create
  Name: /var/cache/salt/minion/files/base/ceph/configuration - Function: file.absent - Result: Changed Started: - 18:13:46.846499 Duration: 11.657 ms
  Name: /srv/salt/ceph/configuration/cache/ceph.conf - Function: file.managed - Result: Changed Started: - 18:13:46.858342 Duration: 5640.245 ms
  Name: find /var/cache/salt/master/jobs -user root -exec chown salt:salt {} ';' - Function: - Result: Changed Started: - 18:13:52.525985 Duration: 43.848 ms

Summary for
Succeeded: 3 (changed=3)
Failed:    0
Total states run:     3
Total run time:   5.696 s

The next command pushes out the ceph.conf to all nodes in the cluster:

        # salt '*' state.apply ceph.configuration

mon1:/srv/salt/ceph/configuration/files/ceph.conf.d # salt '*' state.apply ceph.configuration
  Name: /etc/ceph/ceph.conf - Function: file.managed - Result: Changed Started: - 18:16:47.268270 Duration: 101.657 ms

Summary for
Succeeded: 1 (changed=1)
Failed:    0
Total states run:     1
Total run time: 101.657 ms

.......    [text removed to shorten example - there should be one 'entry' for each node in the cluster]

Summary for
Succeeded: 1 (changed=1)
Failed:    0
Total states run:     1
Total run time: 131.195 ms
mon1:/srv/salt/ceph/configuration/files/ceph.conf.d #

Check the /etc/ceph/ceph.conf on some of the nodes to make sure the change has been made.
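The check of ceph.conf can be scripted. The sketch below is only an illustration: conf_sets_buffered_io is a hypothetical helper name, and the pattern assumes the option is written on its own line as 'name = true'. Ceph accepts spaces, underscores or hyphens between the words of an option name, so all three are matched.

```shell
#!/bin/sh
# Succeed (exit status 0) if the given ceph.conf-style file sets
# bluefs_buffered_io to true on a line of its own.
conf_sets_buffered_io() {
    grep -Eq '^[[:space:]]*bluefs[_ -]buffered[_ -]io[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$1"
}
```

On a cluster node this could be used as: conf_sets_buffered_io /etc/ceph/ceph.conf && echo "setting present". To look at every node from the Salt master in one step, a plain grep run through Salt's cmd.run module is an alternative: salt '*' cmd.run 'grep bluefs_buffered_io /etc/ceph/ceph.conf'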

NOTE: If only the 'permanent' change is made, each OSD must be restarted before it picks up the parameter change.
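If the OSDs are restarted to pick up the permanent setting, restarting them one at a time limits the impact on the cluster. The following is a rough sketch under several assumptions, not a tested procedure: the OSDs run as systemd units named ceph-osd@<id>.service, the cluster normally reports HEALTH_OK, and the function names and the WAIT_SECONDS variable are hypothetical. A cluster with unrelated warnings never reaches HEALTH_OK, so the wait condition would need adjusting there.

```shell
#!/bin/sh
# Poll "ceph health" up to $1 times until it reports HEALTH_OK.
# WAIT_SECONDS (hypothetical knob) is the delay between polls, default 10.
wait_for_health_ok() {
    tries=$1
    while [ "$tries" -gt 0 ]; do
        case "$(ceph health)" in
            HEALTH_OK*) return 0 ;;
        esac
        tries=$((tries - 1))
        sleep "${WAIT_SECONDS:-10}"
    done
    return 1
}

# Restart the listed OSD ids one at a time on the local node.
# Stops at the first OSD after which the cluster does not return to HEALTH_OK.
restart_osds_one_by_one() {
    for id in "$@"; do
        systemctl restart "ceph-osd@${id}.service" || return 1
        wait_for_health_ok 60 || return 1
    done
}
```

For example, 'restart_osds_one_by_one 3 7' on the node that hosts osd.3 and osd.7 restarts them in sequence.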



Cause

Up until the 'Nautilus' release of Ceph, the bluefs_buffered_io setting defaulted to 'false'. However, with the upstream release of the 'Nautilus' version of Ceph, bluefs_buffered_io defaulted to 'true'.

Since SUSE Enterprise Storage 6 is based on the 'Nautilus' release, prior to version of SUSE Enterprise Storage 6, the default setting for 'bluefs_buffered_io' was also 'true' (enabled).

In version and later, the default was changed to 'false' (disabled).

It was decided that defaulting the parameter to 'true' was essentially a regression in the Nautilus release and that it should again default to 'false'. Setting 'bluefs_buffered_io' to 'true' has also been linked with performance issues for files over 2GB.



Status

Top Issue

Additional Information

When 'bluefs_buffered_io' is enabled, BlueFS will in some cases perform buffered reads. This allows the kernel page cache to act as a secondary cache for things like RocksDB compaction. For example, if the RocksDB block cache isn't large enough to hold blocks from the compressed SST files, they can be read from the page cache instead of from disk. This option was previously enabled by default; however, in some test cases it appears to cause excessive swap utilization by the Linux kernel and a large negative performance impact after several hours of run time.

The current recommendation is that if you have not seen a problem with bluefs_buffered_io enabled, you should be safe to continue to use it, but kernel swap usage should be regularly monitored for any sign of thrashing.
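One simple way to keep an eye on swap usage on an OSD node is to compute the share of swap in use from /proc/meminfo. The sketch below is an illustration only: swap_used_percent is a hypothetical helper name, and any threshold for alerting is left to the administrator.

```shell
#!/bin/sh
# Print the percentage of swap in use, computed from a /proc/meminfo-style
# file (defaults to /proc/meminfo). Prints 0 if no swap is configured.
swap_used_percent() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { swfree = $2 }
         END {
             if (total > 0) printf "%d\n", (total - swfree) * 100 / total
             else print 0
         }' "${1:-/proc/meminfo}"
}
```

A high percentage alone does not prove thrashing. Sustained non-zero values in the 'si' and 'so' columns of 'vmstat 5' are a more direct sign that the kernel is actively swapping.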

Evidence appears to suggest that this issue will only occur where most or all of the OSDs are on slower hardware (spinning drives) as opposed to an environment based mostly on SSD / NVMe.
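To get a rough idea of how much of a cluster sits on spinning drives, the OSDs can be counted per CRUSH device class. This is a sketch under assumptions: count_osds_in_class is a hypothetical helper name, and it relies on the device classes (typically hdd, ssd and nvme) having been assigned correctly, which is normally automatic but can be overridden by an administrator.

```shell
#!/bin/sh
# Print the number of OSDs assigned to the given CRUSH device class.
# Assumption: "ceph osd crush class ls-osd <class>" prints one OSD id per line.
count_osds_in_class() {
    ceph osd crush class ls-osd "$1" | grep -c '^[0-9]'
}
```

For example, comparing the output of 'count_osds_in_class hdd' with that of 'count_osds_in_class ssd' shows whether the cluster is mostly based on spinning drives.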

NOTE: This setting never defaulted to 'true' in SUSE Enterprise Storage 5.x.


This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000019829
  • Creation Date: 20-Jan-2021
  • Modified Date: 20-Jan-2021
  • Product: SUSE Enterprise Storage
