Low disk performance: high I/O stalls the system

This document (7023297) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 12
SUSE Linux Enterprise Server 11

Situation

Some applications, for example a database, perform a lot of I/O, and from an administrator's point of view this I/O may appear to stall the system. The number of kworker processes may also be noticeably higher than usual. Software RAID on top of the devices, for example as a host-based mirror, can aggravate the problem. The stall can become severe enough to trigger failures in a cluster, in the form of failed monitor operations due to system stress.

This could be an indication that there is

    barrier
   
set on the filesystem even though the underlying device does not have a write cache.
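
To see which mounted filesystems already have barriers explicitly disabled, the mount options in /proc/mounts can be inspected. The following is only a minimal sketch: the function name barrier_state is made up for illustration, and it detects only the explicit options nobarrier and barrier=0. A filesystem showing neither option is reported as "default", meaning the filesystem's own default applies.

```shell
#!/bin/sh
# Hypothetical helper: reads lines in /proc/mounts format on stdin and prints
# "<mountpoint> <state>" for every filesystem backed by a /dev/ device.
# state is "disabled" if nobarrier or barrier=0 appears in the mount options,
# otherwise "default" (the filesystem's own default applies).
barrier_state() {
    awk '$1 ~ /^\/dev\// {
        state = "default"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "nobarrier" || opts[i] == "barrier=0")
                state = "disabled"
        print $2, state
    }'
}

# Usage on a live system:
#   barrier_state < /proc/mounts
```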

Resolution

With barriers enabled, the filesystem issues flush requests that pass through all intermediate layers, only to be discarded by the SCSI disks. This overhead slows down the system and leads to the observed performance issue.

To alleviate this issue it is recommended to mount the relevant filesystems with

    nobarrier
   
as a mount option.

Extreme care should be taken to verify that the device underneath this filesystem really has no volatile write cache.

If the device does have a volatile write cache, setting

    nobarrier

can result in data loss. Never set nobarrier on a filesystem whose device has its write cache enabled!
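
For a device that has been confirmed to have no volatile write cache, the option can be applied by remounting the filesystem and made persistent through /etc/fstab. The sketch below only prints the command and the fstab line so they can be reviewed first. The helper names, the device /dev/sda1, the mount point /data and the filesystem type ext4 are placeholders, not values from this document. Whether a remount is sufficient may depend on the filesystem; if in doubt, unmount and mount again.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the remount command for a
# mount point, so it can be reviewed before being executed as root.
# $1 = mount point (placeholder)
nobarrier_remount_cmd() {
    printf 'mount -o remount,nobarrier %s\n' "$1"
}

# Hypothetical helper: prints an /etc/fstab line that adds nobarrier.
# $1 = device, $2 = mount point, $3 = filesystem type (all placeholders)
nobarrier_fstab_line() {
    printf '%s %s %s defaults,nobarrier 0 2\n' "$1" "$2" "$3"
}

nobarrier_remount_cmd /data
nobarrier_fstab_line /dev/sda1 /data ext4
```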

To identify whether the device has a write cache, search the output of

   dmesg

for "cache", for example

       dmesg | grep cache

The result might look like this:

[    3.685928] sd 0:2:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[    5.140281] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

As can be seen, the device identified as

    sda

reports the write cache as disabled, so a filesystem on this device can be mounted with

    nobarrier

The opposite case in this example is the device identified as

    sdb

which reports the write cache as enabled. No filesystem on this device should
have its barriers removed, otherwise data loss can occur.
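
When many disks are present, the same check can be condensed into one line per device. The following is a minimal sketch that assumes the kernel messages have the format shown in the example above; the function name write_cache_from_dmesg is made up for illustration. The messages are only found if the kernel ring buffer still holds the boot-time entries.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log lines on stdin and prints
# "<device> <write-cache-state>" for each sd "Write cache:" message.
write_cache_from_dmesg() {
    sed -n 's/.*\[\(sd[a-z]*\)\] Write cache: \([a-z]*\),.*/\1 \2/p'
}

# Usage on a live system:
#   dmesg | write_cache_from_dmesg
```

A device listed as "disabled" is a candidate for nobarrier; a device listed as "enabled" must keep its barriers.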

This can also be checked on the running system with the tool

    sdparm

On the same system as in the example above, the output of sdparm reads:

belphegore:~ # sdparm --get=WCE=1 /dev/sda
    /dev/sda: DELL      PERC H730 Mini    4.27
WCE         0

This means the write cache is disabled for sda, so nobarrier is possible.

belphegore:~ # sdparm --get=WCE=1 /dev/sdb
    /dev/sdb: IFT       DS 1000 Series    555Q
WCE         1

This means the write cache is enabled for sdb, so barriers are necessary.
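
The interpretation of the WCE value can be scripted. The following is a minimal sketch that assumes sdparm output in the format shown above; the function names wce_value and wce_advice are made up for illustration. Any value other than 0 or 1, including empty output, is treated as unknown, and barriers are kept in that case.

```shell
#!/bin/sh
# Hypothetical helper: reads sdparm output on stdin and prints the WCE value.
wce_value() {
    awk '$1 == "WCE" { print $2; exit }'
}

# Hypothetical helper: maps a WCE value to the recommendation in this document.
wce_advice() {
    case "$1" in
        0) echo "write cache disabled, nobarrier possible" ;;
        1) echo "write cache enabled, barrier necessary" ;;
        *) echo "unknown, keep barriers" ;;
    esac
}

# Usage on a live system (as root); the device name is a placeholder:
#   wce_advice "$(sdparm --get=WCE=1 /dev/sda | wce_value)"
```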

Cause

The kernel currently cannot determine the best setting for the device by itself.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7023297
  • Creation Date: 23-Aug-2018
  • Modified Date: 03-Mar-2020
  • SUSE Linux Enterprise Server

For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback@suse.com