mcelog not working with AMD processor family 16 and above on SLES11 SP3

This document (7013006) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 11 Service Pack 3

Situation

On SLES, mcelog is used to track hardware errors.
AMD processor families 16 and newer do not support mcelog.
On SLES11 SP3, running mcelog on these newer AMD CPUs leaves /var/log/mcelog empty, even when hardware errors occur.
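
To find out whether a given machine falls into this group, the vendor and family fields of /proc/cpuinfo can be inspected. The following is a minimal sketch, not part of the original document: the function name needs_edac_mce_amd is hypothetical, and it assumes that the family number 16 in this document is the decimal value reported in the "cpu family" field.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) if the first CPU listed in a cpuinfo-style
# file is an AMD processor of family 16 or newer.
# Assumption: the threshold 16 is compared against the decimal "cpu family"
# value from /proc/cpuinfo. The function name is hypothetical.
needs_edac_mce_amd() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Take the value after the colon and strip spaces and tabs.
    vendor=$(awk -F: '/^vendor_id/ {gsub(/[ \t]/, "", $2); print $2; exit}' "$cpuinfo")
    family=$(awk -F: '/^cpu family/ {gsub(/[ \t]/, "", $2); print $2; exit}' "$cpuinfo")
    [ "$vendor" = "AuthenticAMD" ] && [ "${family:-0}" -ge 16 ]
}
```

Called without an argument, for example as `needs_edac_mce_amd && echo "use edac_mce_amd"`, the function reads /proc/cpuinfo of the running system.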

Resolution

For these CPUs, load the kernel module edac_mce_amd instead of using mcelog.
Loading the module is still triggered by the mcelog start script:

linux:~ # /etc/init.d/mcelog start
Starting mcelog... AMD CPU detected, loading edac_mce_amd              done
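
Whether the module was actually loaded can be verified afterwards. The sketch below is an illustration only: the helper name module_loaded is hypothetical, and it assumes the usual /proc/modules layout, in which the module name is the first field of each line.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) if the named kernel module is listed.
# $1 = module name
# $2 = optional file in /proc/modules format (default: /proc/modules)
# The function name is hypothetical.
module_loaded() {
    awk -v m="$1" '$1 == m {found = 1} END {exit !found}' "${2:-/proc/modules}"
}

# Example use (modprobe requires root privileges):
#   module_loaded edac_mce_amd || modprobe edac_mce_amd
```

The same information is shown by `lsmod | grep edac_mce_amd`.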

Update:
A patch that fixes this issue was released on October 24, 2013. Update the system to apply the patch. The specific patch that addresses this issue can be found at https://download.novell.com/Download?buildid=Hp-QDHVE-oM~

Cause

AMD processor families 16 and newer do not support mcelog.
Intel processors are not affected and can still be monitored with mcelog.

Additional Information

If a hardware error is found, edac_mce_amd does not log to /var/log/mcelog but to /var/log/messages instead.

The keyword for a failure is: [Hardware Error]

Example of a failure:

[32683.598837] [Hardware Error]: MC4 Error (node 0): DRAM ECC error detected on the NB.
[32683.615771] [Hardware Error]: Error Status: Corrected error, no action required.
[32683.615780] [Hardware Error]: CPU:0 (15:2:0) MC4_STATUS[-|CE|MiscV|-|AddrV|-|-|CECC]: 0x9c4a400053080a13
[32683.615783] [Hardware Error]: MC4_ADDR: 0x000000032c4aa220
[32683.615790] [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
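
Entries like these can be picked out of the log by searching for the keyword. The following is a minimal sketch under stated assumptions: the function names hw_errors and corrected_count are hypothetical, and the phrase "Corrected error" is taken from the example above, so other error types may be worded differently.

```shell
#!/bin/sh
# Sketch: list all lines carrying the [Hardware Error] keyword.
# $1 = optional log file (default: /var/log/messages)
# -F makes grep treat the brackets literally instead of as a character class.
# Function names are hypothetical.
hw_errors() {
    grep -F '[Hardware Error]' "${1:-/var/log/messages}"
}

# Sketch: count the hardware error lines that report a corrected error.
# The search phrase is taken from the example output in this document.
corrected_count() {
    hw_errors "$1" | grep -c 'Corrected error'
}
```

Reading /var/log/messages usually requires root privileges.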


Note: The mcelog version on the SP3 DVD (mcelog-1.0.2013.01.18-0.11.9) contains a cosmetic bug and prints an error even on processor family 15, which still works with mcelog. A fixed package is available but, as of early August 2013, not yet released.
If the update channels offer a higher mcelog version than the one mentioned above, make sure to install the latest mcelog update.

Disclaimer

This Support Knowledgebase provides a valuable tool for NetIQ/Novell/SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID:7013006
  • Creation Date:08-AUG-13
  • Modified Date:28-MAR-14
    • SUSE Linux Enterprise Server
