SUSE Support


Failing paths to SAN disks connected via qla2xxx module

This document (000021056) is provided subject to the disclaimer at the end of this document.

Environment

Systems with the qla2xxx module loaded and running one of the following kernel versions:

SLES 12 SP5:
  • 4.12.14-122.150
  • 4.12.14-122.153
SLES 15 SP4:
  • 5.14.21-150400.24.55

Situation

Physical servers with SAN Fibre Channel LUNs attached may, after some time, start failing paths for no visible reason. In the worst case, all paths to the disks can be lost because the qla2xxx driver does not synchronize IOCBs (I/O Control Blocks) in order. Depending on I/O load, kworker processes may also saturate the CPU, and paths to all disks may eventually fail.

Examples of symptoms:

- tur checker timeouts and failing paths:
 
 2023-03-23T16:00:34.585286+00:00 slesnode multipathd[3993]: DATA-LUN-01: sdu - tur checker timed out
 2023-03-23T16:00:34.585567+00:00 slesnode multipathd[3993]: checker failed path 65:64 in map DATA-LUN-01
 2023-03-23T16:00:34.585700+00:00 slesnode multipathd[3993]: DATA-LUN-01: remaining active paths: 1
 2023-03-23T16:00:34.585837+00:00 slesnode multipathd[3993]: sdu: mark as failed
 2023-03-23T16:00:34.588549+00:00 slesnode kernel: [359543.159096] device-mapper: multipath: Failing path 65:64.

- udev messages about stuck workers:
 
 2023-03-23T16:01:35.242335+00:00 slesnode systemd-udevd[1082]: seq 14453 '/devices/virtual/block/dm-32' is taking a long time
 2023-03-23T16:03:35.242320+00:00 slesnode systemd-udevd[1082]: seq 14453 '/devices/virtual/block/dm-32' killed
 2023-03-23T16:03:35.242640+00:00 slesnode systemd-udevd[1082]: Worker [49397] terminated by signal 9 (Killed)
 2023-03-23T16:03:35.242802+00:00 slesnode systemd-udevd[1082]: Worker [49397] failed while handling '/devices/virtual/block/dm-32'
 2023-03-23T16:03:35.244528+00:00 slesnode kernel: [359723.817217] print_req_error: I/O error, dev dm-32, sector 267642024
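
To check whether a host shows these symptoms, the system log can be searched for the message fragments quoted above. This is a minimal sketch, not a SUSE-provided tool: the function name count_symptoms is hypothetical, and the default path /var/log/messages is an assumption (on systems that log only to the journal, save journalctl output to a file and pass that file instead).

```shell
#!/bin/bash
# Hypothetical helper: count the symptom messages quoted in this article.
# The default log path is an assumption; pass another file as $1.
count_symptoms() {
    local log="${1:-/var/log/messages}"
    local tur failing udev
    # multipathd: "<map>: <dev> - tur checker timed out"
    tur=$(grep -c 'tur checker timed out' "$log")
    # kernel: "device-mapper: multipath: Failing path <maj:min>."
    failing=$(grep -c 'multipath: Failing path' "$log")
    # systemd-udevd: "seq <n> '<devpath>' is taking a long time"
    udev=$(grep -c 'is taking a long time' "$log")
    printf 'tur_timeouts=%s failing_paths=%s slow_udev=%s\n' \
        "$tur" "$failing" "$udev"
}
```

Non-zero counts alone do not prove this particular issue, since paths can fail for many reasons; they only indicate that the symptoms described here are present.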

Resolution

Workaround (takes effect immediately, no reboot required). As root, run:

echo 0 > /sys/module/qla2xxx/parameters/ql2xenforce_iocb_limit
NOTE: This setting does not persist across reboots.
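
The workaround can be wrapped so that it is applied only when the qla2xxx parameter actually exists. This is a hedged sketch: the function name apply_workaround and the PARAM override (there only so the function can be exercised against a scratch file) are hypothetical; the sysfs path is the one given above. Like the plain command, the change is lost on reboot.

```shell
#!/bin/bash
# Hypothetical helper: apply the runtime workaround from this article only
# if the qla2xxx parameter file exists. PARAM can be overridden for testing.
PARAM="${PARAM:-/sys/module/qla2xxx/parameters/ql2xenforce_iocb_limit}"

apply_workaround() {
    if [ ! -e "$PARAM" ]; then
        echo "parameter not found (is qla2xxx loaded?): $PARAM" >&2
        return 1
    fi
    echo "before: $(cat "$PARAM")"
    # Same write as the documented command; needs root, lost on reboot.
    echo 0 > "$PARAM" || return 1
    echo "after: $(cat "$PARAM")"
}
```

Printing the value before and after the write makes it easy to confirm in a terminal session that the setting was actually changed.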

Permanent fix (delivered as a kernel update):

Update the kernel to version 4.12.14-122.156 or later on SLES 12 SP5, or to 5.14.21-150400.24.60 or later on SLES 15 SP4.
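
To decide whether a host already runs a fixed kernel, the running release (from uname -r) can be compared with the versions listed above. A minimal sketch, assuming GNU sort -V orders these SUSE release strings correctly and that the flavour suffix (such as -default) starts with a letter; the function name is_fixed is hypothetical. Kernels from other release lines are reported as unknown rather than guessed.

```shell
#!/bin/bash
# Hypothetical helper: compare a kernel release (e.g. from `uname -r`) with
# the first fixed version listed in this article. Assumes GNU sort -V.
is_fixed() {
    # Strip a flavour suffix that starts with a letter, e.g. "-default".
    local running="${1%-[a-z]*}"
    local fixed
    case "$running" in
        4.12.14-122.*)       fixed="4.12.14-122.156" ;;       # SLES 12 SP5
        5.14.21-150400.24.*) fixed="5.14.21-150400.24.60" ;;  # SLES 15 SP4
        *) echo "unknown"; return 2 ;;
    esac
    # The lower version sorts first; if that is the fixed release, the
    # running kernel is at or above the fix.
    if [ "$(printf '%s\n%s\n' "$fixed" "$running" | sort -V | head -n 1)" = "$fixed" ]; then
        echo "fixed"
    else
        echo "before-fix"
    fi
}
```

Called as is_fixed "$(uname -r)", it prints fixed, before-fix or unknown. Note that before-fix only means the kernel is older than the fixed release; the versions listed as affected are the ones in the Environment section.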
 

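Cause

The qla2xxx driver in the affected kernel versions does not synchronize IOCBs in order. This leads to:

  • Failing paths to disks (LUNs).
  • kworker processes exhausting CPU time.
  • I/O errors on the disks.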

Additional Information

https://lists.suse.com/pipermail/sle-updates/2023-April/028842.html
https://lists.suse.com/pipermail/sle-security-updates/2023-April/014437.html

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000021056
  • Creation Date: 27-Apr-2023
  • Modified Date: 15-May-2023
    • SUSE Linux Enterprise Server

For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com
