Updating to kernel 4.4.175-94.79 leads to a panic on HPE Superdome Flex or MC990x/UV300 systems

This document (7023772) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 12 Service Pack 3 (SLES 12 SP3)

Situation

Note: this issue has been observed on HPE Superdome Flex and MC990x/UV300 systems, but it can occur on any system with a UV BIOS.

Updating to the SLES 12 SP3 kernel 4.4.175-94.79.1 leads to a panic during boot with the following stack trace:

[    0.000000] Linux version 4.4.175-94.79-default (geeko@buildhost) (gcc version 4.8.5 (SUSE Linux) ) #1 SMP Thu Feb 21 16:03:55 UTC 2019 (047a6d3)
[...]
[    1.391265] UV: N:3 M:0 m_shift:0 n_lshift:0
[    1.396033] UV: gpa_mask/shift:0x3fffffffffff/46 pnode_mask:0x7 apic_pns:6
[    1.403710] UV: mmr_base/shift:0xfffa0000000/26 gru_base/shift:0xffee0000000/26
[    1.411871] UV: gnode_upper:0x0 gnode_extra:0x0
[    1.416928] UV: Found 8 hubs, 8 nodes, 448 CPUs
[    1.421981] BUG: unable to handle kernel NULL pointer dereference at           (null)
[    1.430744] IP: [<ffffffff8161e7ee>] __down_common+0x4c/0xf9
[    1.437082] PGD 0
[    1.439344] Oops: 0002 [#1] SMP
[    1.442979] Modules linked in:
[    1.446401] Supported: Yes
[    1.449423] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.175-94.79-default #1
[    1.457486] Hardware name: HPE Superdome Flex/Superdome Flex, BIOS Bundle:3.0.444-20190306-072130-jenkins SFW:IP147.007.000.168.000.1903012301 03/01/2019
[    1.472813] task: ffff8801d154c040 ti: ffff8801d1550000 task.ti: ffff8801d1550000
[    1.481167] RIP: 0010:[<ffffffff8161e7ee>]  [<ffffffff8161e7ee>] __down_common+0x4c/0xf9
[    1.490214] RSP: 0000:ffff8801d1553c48  EFLAGS: 00010096
[    1.496144] RAX: 0000000000000000 RBX: 7fffffffffffffff RCX: ffffffff82416d9c
[    1.504110] RDX: ffff8801d1553c50 RSI: 0000000000000001 RDI: ffffffff82416d94
[    1.512077] RBP: ffff8801d1553c98 R08: 0000000000000000 R09: 0000000000000000
[    1.520042] R10: 0000000000000000 R11: 00000000000003c7 R12: ffffffff82416d94
[    1.528010] R13: 0000000000000001 R14: 0000000000000001 R15: ffff8801d154c040
[    1.535975] FS:  0000000000000000(0000) GS:ffff88b787400000(0000) knlGS:0000000000000000
[    1.545008] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.551422] CR2: 0000000000000000 CR3: 0000000001e0c000 CR4: 0000000000760670
[    1.559389] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    1.567355] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    1.575321] PKRU: 00000000
[    1.578334] Stack:
[    1.580579]  0000000001000000 ffffffff82416d9c 0000000000000000 0000000000000000
[    1.588892]  0000000000000000 ffffffff82416d94 ffff8801d1553d18 ffff8801d1553d20
[    1.597190]  0000000000000000 fffffffffffffffc 0000000000000000 ffffffff810ccae2
[    1.605495] Call Trace:
[    1.608227]  [<ffffffff810ccae2>] down_interruptible+0x42/0x50
[    1.614744]  [<ffffffff81079849>] uv_bios_call_irqsave+0x39/0xd0
[    1.621452]  [<ffffffff8107991b>] uv_bios_get_sn_info+0x3b/0xa0
[    1.628062]  [<ffffffff81fa98a3>] uv_system_init_hub+0x8e6/0x142d
[    1.634870]  [<ffffffff81fa38bc>] native_smp_prepare_cpus+0x2af/0x309
[    1.642068]  [<ffffffff81f8f195>] kernel_init_freeable+0xc3/0x226
[    1.648875]  [<ffffffff8161414a>] kernel_init+0xa/0xe0
[    1.654615]  [<ffffffff816210b5>] ret_from_fork+0x55/0x80
[    1.662876] DWARF2 unwinder stuck at ret_from_fork+0x55/0x80
[    1.669195]
[    1.670850] Leftover inexact backtrace:
[    1.670850]
[    1.676783]  [<ffffffff81614140>] ? rest_init+0x80/0x80
[    1.682617] Code: e6 81 00 00 00 49 89 fc 53 48 89 d3 48 83 e4 f0 48 83 ec 28 48 8b 47 10 48 8d 54 24 08 48 89 4c 24 08 48 89 44 24 10 48 89 57 10 <48> 89 10 4c 89 e8 4c 89 7c 24 18 83 e0 01 c6 44 24 20 00 48 89
[    1.704505] RIP  [<ffffffff8161e7ee>] __down_common+0x4c/0xf9
[    1.710939]  RSP <ffff8801d1553c48>
[    1.714831] CR2: 0000000000000000
[    1.718536] ---[ end trace 635cd5391d48baa3 ]---
[    1.723698] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[    1.723698]
[    1.733888] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
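
To compare a captured console log against the trace above, a minimal sketch along the following lines may help. It is an illustration only: the function name matches_uv_panic_signature is hypothetical, and the three patterns are simply strings copied from the trace in this document (the NULL pointer dereference message, the faulting function __down_common, and the caller uv_bios_call_irqsave). A match suggests the same failure but does not prove it.

```python
import re

# Markers copied from the trace in this document. The function name and the
# choice of these three markers are assumptions made for illustration.
OOPS_RE = re.compile(r"BUG: unable to handle kernel NULL pointer dereference")
FAULT_RE = re.compile(r"\bIP: \[<[0-9a-f]+>\] __down_common\+")
CALLER_RE = re.compile(r"\buv_bios_call_irqsave\+")


def matches_uv_panic_signature(log_text: str) -> bool:
    """Return True only if all three markers appear somewhere in the log."""
    return all(p.search(log_text) for p in (OOPS_RE, FAULT_RE, CALLER_RE))


# Three lines taken verbatim from the trace above.
sample = """\
[    1.421981] BUG: unable to handle kernel NULL pointer dereference at           (null)
[    1.430744] IP: [<ffffffff8161e7ee>] __down_common+0x4c/0xf9
[    1.614744]  [<ffffffff81079849>] uv_bios_call_irqsave+0x39/0xd0
"""
print(matches_uv_panic_signature(sample))
```

The check is deliberately strict: a log that shows a NULL pointer dereference in some other function, or one without uv_bios_call_irqsave in the call trace, is not reported as a match.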

Resolution

A maintenance kernel update that contains a fix for this issue has been released:

        29 Mar 2019 - the Linux Kernel 801  -  version 4.4.176-94.88.1
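
As a rough aid for checking a system against the two kernel versions named in this document, the sketch below classifies a kernel release string such as the one reported by "uname -r". It rests on several assumptions: the helper names parse_release and classify are hypothetical, the result is only meaningful for SLES 12 SP3 kernels, and releases older than the fix other than 4.4.175-94.79 are reported as "unknown" because this document does not say whether they are affected.

```python
import re

AFFECTED = (4, 4, 175, 94, 79)  # release reported to panic in this document
FIXED = (4, 4, 176, 94, 88)     # first fixed release named in this document


def parse_release(release: str) -> tuple:
    """Turn '4.4.175-94.79-default' into (4, 4, 175, 94, 79)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def classify(release: str) -> str:
    """Classify a release as 'affected', 'fixed', 'unknown' or 'not applicable'."""
    version = parse_release(release)
    if version[:2] != (4, 4):
        # Assumption: only the 4.4 kernel series is covered here.
        return "not applicable"
    if version >= FIXED:
        return "fixed"
    if version == AFFECTED:
        return "affected"
    return "unknown"


for release in ("4.4.175-94.79-default", "4.4.176-94.88-default"):
    print(release, classify(release))
```

Only the first five numeric fields are compared, so a package version with a trailing rebuild counter (for example 4.4.176-94.88.1) and the matching "uname -r" string are treated as the same release.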


Cause


Additional Information


Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7023772
  • Creation Date: 18-Mar-2019
  • Modified Date: 03-Mar-2020
    • SUSE Linux Enterprise Server


For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback@suse.com
