System crash during a Fibre Channel LIP reset

This document (000019913) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 15 SP2

Situation

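A SLES 15 SP2 system may crash during a Fibre Channel LIP (Loop Initialization Primitive) reset. In the example below, the page fault occurs in the fnic driver, in fnic_exch_mgr_reset.
The kernel log may contain a stack trace similar to the following: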
[ 7037.935074] BUG: unable to handle page fault for address: 0000000100000023
[ 7037.935104] #PF: supervisor read access in kernel mode
[ 7037.935119] #PF: error_code(0x0000) - not-present page
[ 7037.935134] PGD 0 P4D 0  
[ 7037.935147] Oops: 0000 [#1] SMP NOPTI
[ 7037.935161] CPU: 11 PID: 6564 Comm: kworker/u40:2 Not tainted 5.3.18-24.52-default #1 SLE15-SP2
[ 7037.935184] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.0.4m.0.0604200735 06/04/2020
[ 7037.935219] Workqueue: fnic_event_wq fnic_handle_frame [fnic]
[ 7037.935245] RIP: e030:fnic_exch_mgr_reset+0x140/0x650 [fnic]
[ 7037.935265] Code: c0 48 89 c5 0f 84 17 02 00 00 48 05 20 01 00 00 0f 84 0b 02 00 00 4c 8b bd 58 02 00 00 4d 85 ff 0f 84 fb 01 00 00 8b 4c 24 18 <41> 3b 4f 24 0f 85 ed 01 00 00 8b 95 80 02 00 00 80 e6 10 74 0f 8b
[ 7037.935309] RSP: e02b:ffffc90040d87d38 EFLAGS: 00010006
[ 7037.935325] RAX: ffff888259e14320 RBX: ffff88825d4c87c8 RCX: 0000000000a60b03
[ 7037.935345] RDX: ffff88825d5ad380 RSI: 0000000000000055 RDI: ffff88825d5acd80
[ 7037.935365] RBP: ffff888259e14200 R08: 000000000000000b R09: 8080808080808080
[ 7037.935385] R10: 0000000000000010 R11: fefefefefefefeff R12: 0000000000000055
[ 7037.935405] R13: ffff88825d4c925c R14: 0000000000000215 R15: 00000000ffffffff
[ 7037.935448] FS:  0000000000000000(0000) GS:ffff88826d4c0000(0000) knlGS:0000000000000000
[ 7037.935469] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7037.935486] CR2: 0000000100000023 CR3: 000000000240a000 CR4: 0000000000040660
[ 7037.935519] Call Trace:
[ 7037.935541]  ? __switch_to_asm+0x40/0x70
[ 7037.935560]  ? lock_timer_base+0x67/0x80
[ 7037.935585]  fc_rport_logoff+0x49/0xd0 [libfc]
[ 7037.935609]  fc_disc_gpn_id_resp+0x1b0/0x220 [libfc]
[ 7037.935630]  ? fc_disc_gpn_ft_parse+0x2a0/0x2a0 [libfc]
[ 7037.935651]  fc_invoke_resp+0x6a/0xc0 [libfc]
[ 7037.935674]  fc_exch_recv+0x4eb/0x590 [libfc]
[ 7037.935694]  fnic_handle_frame+0x5b/0xd0 [fnic]
[ 7037.935713]  ? __schedule+0x305/0x750
[ 7037.935731]  process_one_work+0x1f4/0x3e0
[ 7037.935747]  worker_thread+0x2d/0x3e0
[ 7037.935762]  ? process_one_work+0x3e0/0x3e0
[ 7037.935776]  kthread+0x10d/0x130
[ 7037.935790]  ? kthread_park+0xa0/0xa0
[ 7037.935803]  ret_from_fork+0x35/0x40
[ 7037.935817] Modules linked in: af_packet 8021q garp mrp nf_log_ipv4 nf_log_common xt_tcpudp xt_LOG ipt_REJECT xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle bridge stp llc iptable_raw bonding iptable_security iscsi_ibft iscsi_boot_sysfs ebtable_filter ebtables ip6_tables rfkill iptable_filter ip_tables x_tables bpfilter xen_acpi_processor xen_pciback xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xfs intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit libnvdimm intel_powerclamp mgag200 crc32_pclmul drm_vram_helper i2c_algo_bit ttm ghash_clmulni_intel ib_core aesni_intel drm_kms_helper aes_x86_64 crypto_simd cryptd pcspkr ipmi_ssif glue_helper drm syscopyarea mei_me sysfillrect lpc_ich sysimgblt enic joydev fb_sys_fops mei mfd_core ioatdma dca ipmi_si ipmi_devintf ipmi_msghandler button xenfs xen_privcmd btrfs libcrc32c xor hid_generic usbhid raid6_pq sd_mod crc32c_intel fnic libfcoe
[ 7037.935882]  xhci_pci libfc xhci_hcd scsi_transport_fc ahci libahci usbcore megaraid_sas libata wmi sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod
[ 7037.936122] Supported: Yes
[ 7037.936132] CR2: 0000000100000023
[ 7037.936157] ---[ end trace f01cd85bd933972b ]---
[ 7037.946552] RIP: e030:fnic_exch_mgr_reset+0x140/0x650 [fnic]
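
To compare a saved kernel log against the trace above, a small script along the following lines may help. This is only a sketch: the function name matches_fnic_lip_crash is hypothetical, and it assumes the crash is recognizable by the page fault message followed by a RIP line pointing into fnic_exch_mgr_reset in the fnic module, as in the example trace. Offsets and addresses can differ between kernel builds, so they are not matched. A match suggests the log resembles this issue; it does not confirm the root cause.

```python
import re

# Signature strings taken from the example trace in this document.
# Only the message text and symbol name are matched; offsets such as
# +0x140/0x650 may differ on other kernel builds (assumption).
PAGE_FAULT = re.compile(r"BUG: unable to handle page fault")
RIP_FNIC = re.compile(
    r"RIP: \S*fnic_exch_mgr_reset\+0x[0-9a-f]+/0x[0-9a-f]+ \[fnic\]"
)


def matches_fnic_lip_crash(log_text: str) -> bool:
    """Return True if a page fault is followed by a RIP in fnic_exch_mgr_reset."""
    saw_fault = False
    for line in log_text.splitlines():
        if PAGE_FAULT.search(line):
            saw_fault = True
        elif saw_fault and RIP_FNIC.search(line):
            return True
    return False


if __name__ == "__main__":
    # Two lines copied from the example trace, used as a self-check.
    sample = (
        "[ 7037.935074] BUG: unable to handle page fault for address: "
        "0000000100000023\n"
        "[ 7037.935245] RIP: e030:fnic_exch_mgr_reset+0x140/0x650 [fnic]\n"
    )
    print(matches_fnic_lip_crash(sample))
```

The function takes the log as a string, so it can be fed the contents of any saved kernel log or crash dump message buffer. Including the matching lines in the support case makes it easier to confirm that the crash is the one described here.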

Resolution

Engineering has identified a fix, but it has not yet been included in a maintenance kernel.
If you encounter this crash, please open a support case and request a Program Temporary Fix (PTF).

Status

Reported to Engineering

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000019913
  • Creation Date: 16-Mar-2021
  • Modified Date: 16-Mar-2021
    • SUSE Linux Enterprise Server
