System crash or unexpected reboot - What information is needed by Customer Support for a root cause analysis?

This document (7010249) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 15
SUSE Linux Enterprise Server 12
SUSE Linux Enterprise Server 11
SUSE Linux Enterprise Desktop 12
SUSE Linux Enterprise Desktop 11
SUSE Linux Enterprise Desktop 10
SUSE Linux Enterprise High Availability Extension
SUSE Linux Enterprise Real Time Extension
SLES Expanded Support Platform

Situation

The system crashed or rebooted unexpectedly, and a service request is about to be opened with Customer Care to identify the root cause. This article lists the questions that come up with every system crash. Providing as many details and as much system information as possible helps to identify the cause.

Resolution

When opening a request about a system crash, please provide answers to the following questions:
  1. When did the crash occur?
    Please provide the exact time and date.
     
  2. What is the system's main task?
     
  3. Was this a one-time crash, or did the system encounter this issue several times?
    If the system crashed several times, please provide all known occurrences.
     
  4. At the time the system crashed, were any particular log entries noticed?
     
  5. If no entries can be found in /var/log/messages, were any entries written to the logs of the hardware management board?
     
  6. What was the situation on the system before it crashed?
    Please report any observations, such as an increase in CPU or RAM usage or high I/O wait.
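
The questions about crash time and log entries can often be answered from the system itself. The following is a minimal sketch, not an official SUSE procedure. The helper name log_window and the example timestamp are assumptions made for illustration; the timestamp prefix must match the format used in the log file on the affected system.

```shell
#!/bin/sh
# Print all lines of a log file that contain the given timestamp prefix,
# e.g. "Mar  5 14:" for everything logged between 14:00 and 14:59.
# log_window is a hypothetical helper name, not part of any SUSE tool.
log_window() {
    logfile=$1
    prefix=$2
    # -F: treat the prefix as a fixed string, not as a regular expression
    grep -F -- "$prefix" "$logfile"
}

# Example usage (run as root; adjust the timestamp to the crash time):
#   last -x reboot shutdown | head -n 20    # recent boots and shutdowns
#   log_window /var/log/messages "Mar  5 14:"
```

Comparing the boot entries from last -x with the shutdown entries shows whether a boot was preceded by a regular shutdown, which helps to narrow down the time of the crash.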
     

What kind of system data is needed by Customer Care?

SUSE Customer Support uses a tool called supportutils for troubleshooting. To create a system report, please run the following command as root:

supportconfig -l

This will collect all relevant system data (even older, already rotated messages files) and create a compressed file in /var/log with the following file name:

nts_$HOSTNAME_$DATE_$TIME.tbz

Please always run the most recent version of supportutils for better results and attach this file to the service request. For details on how to upload it, refer to TID 000019214 Supportconfig Self Service via SCC/FTP.
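
To find the archive that was just created, a small helper can list the newest matching file. This is a sketch under assumptions: latest_supportconfig is a hypothetical name, and the nts_*.tbz pattern is taken from the file name given above; other supportutils versions may use a different prefix or suffix.

```shell
#!/bin/sh
# Print the most recently modified supportconfig archive in a directory.
# The nts_*.tbz pattern follows the file name given in this article;
# other supportutils versions may name the archive differently.
latest_supportconfig() {
    dir=${1:-/var/log}
    # ls -t sorts by modification time, newest first
    ls -t "$dir"/nts_*.tbz 2>/dev/null | head -n 1
}

# Example usage (run as root):
#   supportconfig -l
#   latest_supportconfig /var/log
```

If the helper prints nothing, no archive matching the pattern exists in the given directory.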

For SUSE Expanded Support based systems, please provide a sosreport.

If the crash happens in a clustered environment, please provide a system report for all involved nodes.
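
In a cluster, the same report is needed from every node. As a minimal sketch, assuming root SSH access to each node and using the placeholder host names node1 and node2, the following prints the commands that would collect the reports. It does not execute them, so the list can be reviewed first.

```shell
#!/bin/sh
# Print one ssh command per cluster node; nothing is executed.
# Node names are passed as arguments; node1 and node2 are placeholders.
supportconfig_commands() {
    for node in "$@"; do
        printf 'ssh root@%s supportconfig -l\n' "$node"
    done
}

supportconfig_commands node1 node2
```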
 

Steps to trace system reboots

In certain situations, no crash messages can be found in /var/log/messages, especially when the system management board reset the hardware. Please also check whether /var/log/mcelog contains any reports. If it does, start with a hardware check and update all hardware components to the most recent BIOS / firmware level. If the system has crashed repeatedly without leaving evidence, connect a second system via a serial connection as outlined in TID 000016233 Configuring a Remote Serial Console for SLES.
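
The mcelog check can be scripted. The following is a minimal sketch; check_mcelog is a hypothetical helper name, and the default path is the one named above.

```shell
#!/bin/sh
# Report whether a machine check log exists and contains data.
# check_mcelog is a hypothetical helper name; the default path is the
# one named in this article.
check_mcelog() {
    log=${1:-/var/log/mcelog}
    # -s is true only for an existing file larger than zero bytes
    if [ -s "$log" ]; then
        echo "entries found: $log"
        ls -l "$log"    # the listing shows the last modification time
    else
        echo "no entries: $log"
    fi
}

check_mcelog
```

A modification time close to the time of the crash is a strong hint that the hardware should be checked first.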

Kernel Core Dump capture

If a system crashes, kdump can capture a kernel core dump. Its configuration is explained in TID 000016171 Configure crashkernel memory for kernel core dump analysis. A best practices document about providing kernel core dumps to Customer Care is available at TID 000017820 Best practice for providing kernel core dumps to support incidents.

For SLES Expanded Support based systems, please consult the corresponding online documentation for RHEL5 or RHEL6 on configuring kdump.

Please note: a kernel core dump must be written completely to the dump device. To ensure this is the case, set KDUMP_IMMEDIATE_REBOOT to "yes" in /etc/sysconfig/kdump and wait for the system to reboot itself. Cores can be very large, so this may take a while. Forcing a reboot manually could interrupt the writing and result in an incomplete core. If the dump is incomplete for any reason, an analysis will not be possible.
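
Before relying on kdump, it can help to confirm the two settings mentioned in this section. The following is a sketch, not a complete kdump validation: the function names are hypothetical, and the check for crashkernel= on the kernel command line assumes that the reservation was configured as a boot parameter, as described in the TID referenced above.

```shell
#!/bin/sh
# Print the value of KDUMP_IMMEDIATE_REBOOT from a sysconfig-style file.
# Function names in this sketch are hypothetical.
kdump_immediate_reboot() {
    conf=${1:-/etc/sysconfig/kdump}
    # Match an uncommented assignment; strip the name and optional quotes
    sed -n 's/^KDUMP_IMMEDIATE_REBOOT="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$conf"
}

# Succeed if a kernel command line contains a crashkernel= parameter.
has_crashkernel() {
    grep -q 'crashkernel=' "${1:-/proc/cmdline}"
}

# Example usage:
#   kdump_immediate_reboot            # expected to print: yes
#   has_crashkernel && echo "crash kernel memory is reserved"
```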


Additional information

sysstat is a tool which collects system data (e.g. CPU, RAM, I/O usage) at regular intervals. Its output is a valuable source of information when troubleshooting crash situations. Please consider installing the package sysstat and enabling its service by using
 
chkconfig boot.sysstat on
/etc/init.d/boot.sysstat start

If this service is activated before the system crashes, supportconfig and sosreport will include its output in the system report.
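
The chkconfig and init script commands shown above apply to SysV init based releases. On systemd based releases the service is normally managed with systemctl instead. The following sketch prints the commands that fit the running system without executing them. It assumes that the systemd unit is called sysstat and that the presence of /run/systemd/system indicates a system booted with systemd; sysstat_enable_commands is a hypothetical helper name.

```shell
#!/bin/sh
# Print (do not run) the commands that enable sysstat data collection.
# Assumption: on systemd hosts the unit is named sysstat.
sysstat_enable_commands() {
    # /run/systemd/system exists only when systemd is the init system;
    # an alternative directory can be passed for testing
    if [ -d "${1:-/run/systemd/system}" ]; then
        echo "systemctl enable sysstat"
        echo "systemctl start sysstat"
    else
        echo "chkconfig boot.sysstat on"
        echo "/etc/init.d/boot.sysstat start"
    fi
}

sysstat_enable_commands
```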
 

Additional Information

General troubleshooting recommendations:
 
  1. If the system crashes without log entries, contact the hardware vendor
  2. Apply available BIOS / firmware related updates
  3. Check the timestamp of /var/log/mcelog. If entries are found in this log file, please contact the hardware vendor for a hardware check
  4. Apply available online updates
  5. Configure a serial console
  6. Configure kdump
  7. Open a service request
  8. Provide the core dump upon request

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7010249
  • Creation Date: 05-Mar-2012
  • Modified Date: 23-Feb-2021
    • SUSE Linux Enterprise Desktop
    • SUSE Linux Enterprise High Availability Extension
    • SUSE Linux Enterprise Server
    • SUSE Linux Enterprise Real Time Extension


For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com
