Intel dual-port Gigabit NIC 82571EB stops Dell R730 server from booting on SLES 12 SP2 and later

This document (7023156) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 15
SUSE Linux Enterprise Server 12 Service Pack 3 (SLES 12 SP3)
SUSE Linux Enterprise Server 12 Service Pack 2 (SLES 12 SP2)


Situation

So far this has been observed on Dell R730 and Dell R430 servers equipped with a
dual-port Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06).
Starting with SLES 12 SP2, the system stops booting with:
> [    6.899738] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
> [    6.899739] {1}[Hardware Error]: event severity: fatal
> [    6.899740] {1}[Hardware Error]:  Error 0, type: fatal
> [    6.899741] {1}[Hardware Error]:   section_type: PCIe error
> [    6.899741] {1}[Hardware Error]:   port_type: 0, PCIe end point
> [    6.899742] {1}[Hardware Error]:   version: 1.16
> [    6.899743] {1}[Hardware Error]:   command: 0x0007, status: 0x0010
> [    6.899743] {1}[Hardware Error]:   device_id: 0000:04:00.1
> [    6.899744] {1}[Hardware Error]:   slot: 0
> [    6.899744] {1}[Hardware Error]:   secondary_bus: 0x00
> [    6.899745] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x105e
> [    6.899745] {1}[Hardware Error]:   class_code: 000002
> [    6.899746] Kernel panic - not syncing: Fatal hardware error!
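
To confirm that a panic on another host matches this report, the fields of the "[Hardware Error]" lines can be pulled out of a saved console log and compared with the values above. The following is a minimal sketch, not a SUSE tool: the function name parse_ghes_report and the key name pci_address are hypothetical, and it assumes the log uses the same "key: value" layout as the excerpt above.

```python
import re

# Payload that follows the "[Hardware Error]:" tag on each log line.
FIELD_RE = re.compile(r"\[Hardware Error\]:\s*(.+?)\s*$")
# PCI address in domain:bus:device.function form, e.g. 0000:04:00.1.
BDF_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def parse_ghes_report(log_text):
    """Collect key/value fields from "[Hardware Error]" lines of a kernel log."""
    fields = {}
    for line in log_text.splitlines():
        match = FIELD_RE.search(line)
        if not match:
            continue
        # One line may carry several comma-separated pairs.
        for part in match.group(1).split(","):
            key, sep, value = part.partition(":")
            if not sep:
                continue
            key, value = key.strip(), value.strip()
            # The log uses "device_id" both for the PCI address and for the
            # PCI device ID; store the address under a separate (hypothetical) key.
            if key == "device_id" and BDF_RE.match(value):
                key = "pci_address"
            fields[key] = value
    return fields


sample = """\
[    6.899739] {1}[Hardware Error]: event severity: fatal
[    6.899743] {1}[Hardware Error]:   device_id: 0000:04:00.1
[    6.899745] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x105e
"""
report = parse_ghes_report(sample)
print(report["pci_address"], report["vendor_id"], report["device_id"])
# prints: 0000:04:00.1 0x8086 0x105e
```

A report with vendor_id 0x8086 and device_id 0x105e, as in the log above, points at the same controller.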

The server's iDRAC shows:
    A fatal error was detected on a component at bus 4 device 0 function 0. 
    A bus fatal error was detected on a component at slot 5.
    A runtime critical stop occurred.

SLES 12 versions prior to SP2 report this condition but do not stop the boot process.

Resolution

Add the kernel boot parameter "ghes_disable=1".

iDRAC still reports a PCI error, but system and NIC appear to work regardless.
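
One way to make the boot parameter persistent is to append it to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub and then regenerate the boot configuration, for example with "grub2-mkconfig -o /boot/grub2/grub.cfg". The sketch below only shows the text edit and assumes a GRUB 2 setup with that variable present and double-quoted; the helper name add_kernel_param is hypothetical. The parameter string is taken verbatim from this document. The upstream kernel source exposes the same switch as the module parameter "ghes.disable", so it may be worth verifying the spelling against the kernel in use.

```python
import re

# Matches the double-quoted value of GRUB_CMDLINE_LINUX_DEFAULT.
CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)


def add_kernel_param(grub_defaults, param="ghes_disable=1"):
    """Return the text of /etc/default/grub with `param` appended to
    GRUB_CMDLINE_LINUX_DEFAULT, unless it is already present."""

    def patch(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    new_text, count = CMDLINE_RE.subn(patch, grub_defaults)
    if count == 0:
        # Assumption violated: the variable is missing or not double-quoted.
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT not found")
    return new_text
```

The function works on text only and is idempotent, so it can be run against a copy of the file and the result reviewed before anything is written back. For a one-off test, the same parameter can instead be typed at the end of the kernel line in the GRUB boot menu editor.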

Cause

The NIC and the Dell server firmware encounter issues during adapter initialization.
The firmware updates available at the time of publication for the NIC (version 1.5.85) and the Dell server (BIOS 2.7.1) do not resolve the issue.
Since the NIC is end-of-life (EOL) for these systems, a real fix is not expected. The only known workaround is to disable GHES (Generic Hardware Error Source) error reporting.
No negative side effects have been noted. However, customers should perform their own testing and be aware that this will also suppress errors reported for other adapters in the system.
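
Because the workaround affects error reporting for the whole system, it can be useful to check whether a given host is actually running with it. A minimal sketch, assuming a Linux host where the running kernel's command line is readable from /proc/cmdline; both function names are hypothetical, and the default parameter string is the one given in this document.

```python
def has_kernel_param(cmdline, param="ghes_disable=1"):
    """Return True if `param` appears as a whole word on the kernel command line."""
    return param in cmdline.split()


def running_kernel_has_param(param="ghes_disable=1", path="/proc/cmdline"):
    """Check the command line the running kernel was booted with."""
    with open(path) as handle:
        return has_kernel_param(handle.read(), param)
```

Calling running_kernel_has_param() after a reboot reports whether the parameter reached the kernel. It does not show whether the kernel acted on it.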

Additional Information

Only the dual-port NIC causes these issues; the quad-port NIC from this Intel series does not stop the system from booting.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7023156
  • Creation Date: 04-Jul-2018
  • Modified Date: 03-Mar-2020
    • SUSE Linux Enterprise Server
