Kernel soft lockup with blk_mq_update in traces

This document (000020248) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 12
SUSE Linux Enterprise Server 15

 

Situation

Sporadically, kernel soft lockups are reported, such as

   kernel: [1726320.308008] NMI watchdog: BUG: soft lockup - CPU#104 stuck for 23s!

followed by

   kernel: [1726320.336383]  ? blk_mq_update_queue_map+0x20/0x20

in the logs.
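
To check whether a system shows this pattern, the log can be searched for both messages. The following is a minimal sketch, not a SUSE-provided tool; the function name find_blk_mq_lockups and the default log path /var/log/messages are assumptions, so pass the log file your system actually uses.

```shell
#!/bin/sh
# Hypothetical helper: print soft lockup reports and blk_mq_update frames
# found in a kernel log file.
# The default path is an assumption; pass another file as the first argument.
find_blk_mq_lockups() (
    logfile="${1:-/var/log/messages}"
    grep -E 'soft lockup|blk_mq_update' "$logfile"
)
```

The function returns a non-zero status when neither message is found, so it can also be used in a condition.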

Resolution

In kernels prior to 5.0, the blk-mq code that collects disk I/O stats sometimes does not work well on NUMA systems.
These stats are the ones shown in

   /proc/diskstats
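
For orientation, the counters in question can be read directly from that file. The sketch below prints the device name and the completed read and write counts, which are fields 3, 4 and 8 of each line; the function name show_diskstats is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print device name, completed reads and completed
# writes from a diskstats-format file (fields 3, 4 and 8 of each line).
show_diskstats() (
    statsfile="${1:-/proc/diskstats}"
    awk '{ printf "%s reads=%s writes=%s\n", $3, $4, $8 }' "$statsfile"
)
```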

In the Azure environment, the Hyper-V storvsc driver in Linux can set the

   can_queue

parameter value too high, which can result in allocating too many "tags" when operating with blk-mq enabled.
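
The current settings can be inspected through sysfs. This is a sketch under stated assumptions: that the running kernel exposes the scsi_mod use_blk_mq parameter under /sys/module and a can_queue attribute for each SCSI host under /sys/class/scsi_host. The function name show_scsi_queue_settings is hypothetical. Entries that do not exist are skipped silently.

```shell
#!/bin/sh
# Hypothetical helper: show whether blk-mq is enabled for SCSI and which
# can_queue value each SCSI host reports.
# The sysfs root can be overridden with the first argument.
show_scsi_queue_settings() (
    sysroot="${1:-/sys}"
    param="$sysroot/module/scsi_mod/parameters/use_blk_mq"
    # This parameter is assumed to exist only on kernels that still offer
    # the older block subsystem.
    if [ -r "$param" ]; then
        echo "scsi_mod.use_blk_mq=$(cat "$param")"
    fi
    for f in "$sysroot"/class/scsi_host/host*/can_queue; do
        [ -r "$f" ] || continue
        host=$(basename "$(dirname "$f")")
        echo "$host can_queue=$(cat "$f")"
    done
)
```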

There are several workarounds possible:

1. Move to a Linux kernel version 5.0 or later. This is the most obvious workaround, but also the most problematic one in a production environment, because it depends on whether a kernel 5.0 or later is available.

2. Disable blk-mq and use the older block subsystem in the Linux kernel.

This workaround could negate some I/O performance gains of the parallelism that the blk-mq subsystem provides.  

To apply this workaround, remove

  scsi_mod.use_blk_mq=y

from the kernel boot line and reboot.
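
After the reboot, the active kernel command line can be checked to confirm the option is gone. A minimal sketch, with the hypothetical function name show_use_blk_mq; it prints any scsi_mod.use_blk_mq setting it finds and returns a non-zero status when none is present.

```shell
#!/bin/sh
# Hypothetical helper: print any scsi_mod.use_blk_mq=... token found on the
# kernel command line. Exit status is non-zero when no such token exists.
show_use_blk_mq() (
    cmdline_file="${1:-/proc/cmdline}"
    # Split the command line into one option per line, then match the option.
    tr ' ' '\n' < "$cmdline_file" | grep -E '^scsi_mod\.use_blk_mq='
)
```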
 
3. Disable disk I/O stats.  

This can be done on-the-fly on a per-disk basis on a running system by

   echo 0 > /sys/block/<device>/queue/iostats

This has to be done for each disk device in the system and does not persist across a reboot.
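
Because the setting is per device, a loop over all block devices saves typing. The following sketch writes 0 to the iostats attribute of every device it finds; the function name disable_iostats_all is hypothetical, and it must be run as root to write to the real /sys. Devices without a writable iostats attribute are skipped.

```shell
#!/bin/sh
# Hypothetical helper: disable I/O stats for every block device under the
# given sysfs root (default /sys). Needs root for the real /sys.
disable_iostats_all() (
    sysroot="${1:-/sys}"
    for f in "$sysroot"/block/*/queue/iostats; do
        # Skip devices that do not expose a writable iostats attribute.
        [ -w "$f" ] || continue
        echo 0 > "$f"
    done
)
```

Since the change is lost on reboot, the function would have to be called again after each boot.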

4. Reduce the "can_queue" value.

This value cannot be set directly. The desired effect can be achieved by adding the kernel boot line options

   hv_storvsc.storvsc_ringbuffer_size=131072  hv_storvsc.storvsc_vcpus_per_sub_channel=1024

A reboot is required, during which these values are passed to the hv_storvsc module.
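
After the reboot, the values in effect can be compared with the ones given on the boot line. This sketch assumes that both parameters are exported read-only under /sys/module/hv_storvsc/parameters; if they are not present on a given kernel, the function says so instead of failing. The function name show_storvsc_params is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the hv_storvsc parameters named in this
# document as seen by the running kernel.
# Assumes they are readable under <sysroot>/module/hv_storvsc/parameters.
show_storvsc_params() (
    sysroot="${1:-/sys}"
    for p in storvsc_ringbuffer_size storvsc_vcpus_per_sub_channel; do
        f="$sysroot/module/hv_storvsc/parameters/$p"
        if [ -r "$f" ]; then
            echo "hv_storvsc.$p=$(cat "$f")"
        else
            echo "hv_storvsc.$p: not available"
        fi
    done
)
```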

Option 4 only works on Azure. Option 3 is not persistent. Option 1 might not be possible.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000020248
  • Creation Date: 09-Jun-2021
  • Modified Date: 09-Jun-2021
    • SUSE Linux Enterprise Server

For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback@suse.com
