System crash or unexpected reboot - What information is needed by Customer Support for a root cause analysis?
This document (7010249) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 11
SUSE Linux Enterprise Desktop 11
SUSE Linux Enterprise Desktop 10
SLES Expanded Support Platform
Situation
Resolution
- When did the crash occur?
Please provide the exact time and date.
- What is the system's main task?
- Was this a one time crash or did the system encounter this issue several times?
In case the system crashed several times please provide all known occurrences.
- At the time the system crashed, were any particular log entries noticed?
- In case no entries can be found in /var/log/messages, were any entries written to the logs of the hardware management board?
- What was the situation on the system before it crashed?
Please report any observations, for example an increase in CPU or RAM usage, or high I/O wait.
What kind of system data is needed by Customer Care?
SUSE Customer Support uses a tool called supportutils for troubleshooting. To create a system report, please run the following command as root:
supportconfig -l
This will collect all relevant system data (even older, already rotated messages files) and create a compressed file in /var/log with the following file name:
nts_$HOSTNAME_$DATE_$TIME.tbz
Please always run the most recent version of supportutils for best results, and attach this file to the service request. For details on how to upload it, refer to TID 000019214 Supportconfig Self Service via SCC/FTP.
For SUSE Expanded Support based systems please provide a sosreport.
In case the crash happens in a clustered environment please provide a system report for all involved nodes.
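After supportconfig has finished, the archive still has to be located before it can be attached to the service request. The following is a minimal sketch, not part of supportutils: the function name latest_supportconfig is hypothetical, and it only assumes the nts_*.tbz naming and the /var/log location described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of supportutils): print the newest
# supportconfig archive in a directory. Defaults to /var/log, where
# supportconfig writes nts_$HOSTNAME_$DATE_$TIME.tbz.
latest_supportconfig() {
    dir="${1:-/var/log}"
    # ls -t sorts by modification time, newest first; errors are
    # suppressed so that nothing is printed if no archive exists.
    ls -t "$dir"/nts_*.tbz 2>/dev/null | head -n 1
}
```

Typical use, as root, would be to run supportconfig -l first and then call latest_supportconfig to get the path of the file to upload. On cluster nodes the same steps would be repeated on every involved node.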
Steps to trace system reboots
In certain situations no crash messages can be found in /var/log/messages (especially when the system management board reset the hardware). Please also check whether /var/log/mcelog contains any reports. If it does, start with a hardware check and update all hardware components to the most recent BIOS / firmware level. If the system crashes repeatedly without leaving evidence, connect a second system via a serial connection as outlined in TID 000016233 Configuring a Remote Serial Console for SLES.
Kernel Core Dump capture
If a system crashes, a kernel core dump can be captured using kdump. Its configuration is explained in TID 000016171 Configure crashkernel memory for kernel core dump analysis. A best practices document about providing kernel core dumps to Customer Care is available at TID 000017820 Best practice for providing kernel core dumps to support incidents.
For SLES Expanded Support based systems, please consult the corresponding online documentation for RHEL5 or RHEL6 on configuring kdump.
Please note: kernel core dumps must have been written completely to the dump device. To ensure this is the case, set KDUMP_IMMEDIATE_REBOOT to "yes" in /etc/sysconfig/kdump and wait for the system to reboot itself. Note that cores can be very large, so this may take a while. Forcing a reboot manually could interrupt the writing and result in an incomplete core. If the dump is incomplete for whatever reason an analysis will not be possible.
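Before relying on kdump, it can help to confirm that the setting mentioned above is actually in place. The following is a minimal sketch under the assumption that /etc/sysconfig/kdump consists of plain shell variable assignments, as sysconfig files usually do. The function name kdump_immediate_reboot is hypothetical and not part of the kdump package.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KDUMP_IMMEDIATE_REBOOT from a
# sysconfig-style file (default: /etc/sysconfig/kdump).
# Prints "unset" if the variable is not defined in the file.
kdump_immediate_reboot() {
    file="${1:-/etc/sysconfig/kdump}"
    # Fail early if the file cannot be read.
    [ -r "$file" ] || return 1
    (
        # Source the file in a subshell so that the current shell's
        # environment is not modified.
        unset KDUMP_IMMEDIATE_REBOOT
        . "$file"
        printf '%s\n' "${KDUMP_IMMEDIATE_REBOOT:-unset}"
    )
}
```

If the function prints anything other than yes, the value in /etc/sysconfig/kdump should be adjusted as described above before the next crash is expected.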
Additional information
sysstat is a tool which collects system data (e.g. CPU, RAM and I/O usage) at regular intervals. Its output is a valuable source of information when troubleshooting crash situations. Please consider installing the package sysstat and enabling its service by running
/etc/init.d/boot.sysstat start
If this service is activated before the system crashes, supportconfig and sosreport will include its output in the system report.
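The collected data can also be reviewed locally with sar for the day of the crash. The following is a minimal sketch that assumes the daily data files are stored as /var/log/sa/saDD, where DD is the day of the month; the location may differ on other installations. The function name sa_file_for_date is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a date in the form YYYY-MM-DD to the sysstat
# daily data file for that day of the month.
# The default directory /var/log/sa is an assumption; pass another
# directory as the second argument if the data is stored elsewhere.
sa_file_for_date() {
    date_str="$1"
    dir="${2:-/var/log/sa}"
    # Strip everything up to the last "-" to keep only the day (DD).
    day="${date_str##*-}"
    printf '%s/sa%s\n' "$dir" "$day"
}
```

For a crash on 2012-03-05, for example, CPU usage could then be displayed with sar -u -f "$(sa_file_for_date 2012-03-05)". The sar options -r and -b report memory and I/O statistics in the same way.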
Additional Information
- If the system crashes without log entries contact the hardware vendor
- Apply available BIOS / firmware related updates
- Check the timestamp of /var/log/mcelog. If entries are found in this log file, please contact the hardware vendor for a hardware check
- Apply available online updates
- Configure serial console
- Configure kdump
- Open a service request
- Provide the core dump upon request
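The mcelog check from the list above can be sketched as a small shell function. This is a minimal illustration, not a SUSE tool: the function name mcelog_status is hypothetical, and it only tests whether the file exists and contains data.

```shell
#!/bin/sh
# Hypothetical helper: report whether a machine check log contains data.
# Prints "missing" if the file does not exist, "empty" if it exists but
# has no content, and "entries" if it contains data.
mcelog_status() {
    file="${1:-/var/log/mcelog}"
    if [ ! -e "$file" ]; then
        echo "missing"
    elif [ -s "$file" ]; then
        echo "entries"
    else
        echo "empty"
    fi
}
```

If the function prints entries, ls -l /var/log/mcelog shows when the file was last written, which can be compared with the time of the crash before contacting the hardware vendor.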
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7010249
- Creation Date: 05-Mar-2012
- Modified Date: 23-Feb-2021
- SUSE Linux Enterprise Desktop
- SUSE Linux Enterprise High Availability Extension
- SUSE Linux Enterprise Server
- SUSE Linux Enterprise Real Time Extension
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com