SLES server running on VMware ESX is hung and unable to get a kernel core for debugging

This document (7008844) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 10
SUSE Linux Enterprise Server 11
VMware ESX

Situation

The server is hung and does not respond to the keyboard, ssh, or the magic SysRq keys.
The nmi_watchdog kernel boot parameter does not help either: the kernel NMI watchdog driver does not find the same watchdog timer sources in a VMware ESX guest that it finds on a physical server, so the watchdog timer reports an error when it initializes in an ESX guest.

Resolution

To troubleshoot this type of issue, a kernel core is needed, and an NMI (non-maskable interrupt) must be sent to the server from the VMware ESX host server.

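Set the SLES server up to gather a kernel core. See TID 3374462, "Configure kernel core dump capture".
After the server has been rebooted to activate the crashkernel, enter the following so that the kernel panics (and therefore dumps a core) when it receives an unknown NMI: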
  sysctl -w kernel.unknown_nmi_panic=1
Example:
  server:~ # sysctl -w kernel.unknown_nmi_panic=1
  kernel.unknown_nmi_panic = 1
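
Before relying on the NMI, it can help to confirm on the guest that both pieces are in place. The following is a minimal sketch and not part of the original procedure: nmi_dump_ready is a hypothetical helper name, and it assumes the running kernel exposes /sys/kernel/kexec_crash_loaded (older kernels may not, in which case that check is reported as inconclusive rather than failed).

```shell
# Hypothetical helper: report whether the guest looks ready for an
# NMI-triggered dump. The file paths are parameters so the function can
# be exercised against test files; the defaults are the real locations.
nmi_dump_ready() {
    nmi_file=${1:-/proc/sys/kernel/unknown_nmi_panic}
    kexec_file=${2:-/sys/kernel/kexec_crash_loaded}
    status=0

    # The sysctl set above must read back as 1.
    if [ "$(cat "$nmi_file" 2>/dev/null)" = "1" ]; then
        echo "unknown_nmi_panic: enabled"
    else
        echo "unknown_nmi_panic: NOT enabled"
        status=1
    fi

    # Assumption: this sysfs file exists on the running kernel.
    if [ ! -r "$kexec_file" ]; then
        echo "crash kernel: cannot tell (no $kexec_file)"
    elif [ "$(cat "$kexec_file")" = "1" ]; then
        echo "crash kernel: loaded"
    else
        echo "crash kernel: NOT loaded"
        status=1
    fi
    return $status
}
```

Calling nmi_dump_ready with no arguments checks the live system. Note that sysctl -w only changes the running kernel; to keep the setting across reboots, a line such as "kernel.unknown_nmi_panic = 1" would also need to be added to /etc/sysctl.conf.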

On the VMware ESX host machine, run "vm-support -x". This lists the VMware ID (WID) of each virtual machine.
Example:
  /sbin # vm-support -x
  VMware ESX Support Script 1.33
  Available worlds to debug:
  wid=9720177     11-sles10sp3-oes2sp3
  wid=13820675    sles11sp1-server   <--used in example
  wid=9840489     10-sles10sp3-oes2sp3
  wid=3930680     05-sles11-x86_64
 
On ESXi version 5.5.0 and newer, the -x switch is no longer available. To find the WID for the VM guest, enter 'ps -g | grep -i <vm name> | grep -i vmm0'. The number at the beginning of that line is the guest's WID.
Example:
 
      # ps -g | grep -i sles11sp1-server | grep -i vmm0
         1382067      vmm0:sles11sp1-server 
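
The lookup above can be wrapped in a small function for the ESXi shell. This is a sketch only: wid_from_ps is a hypothetical name, and it assumes the WID is the first whitespace-separated field of the matching vmm0 line, as in the example output.

```shell
# Hypothetical helper: read "ps -g" output on stdin and print the WID of
# the vmm0 world that belongs to the VM named in $1.
# The name is matched case-insensitively as a fixed string.
wid_from_ps() {
    grep -i -F -- "$1" | grep -i 'vmm0' | awk '{ print $1; exit }'
}
```

On the host it would be used as "ps -g | wid_from_ps sles11sp1-server". If several VM names contain the same string, only the first match is printed, so a more specific name is safer.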


Then run "/usr/lib/vmware/bin/vmdumper 13820675 nmi", replacing 13820675 with your VMware ID.
Example:
  /sbin # /usr/lib/vmware/bin/vmdumper 13820675 nmi
  Sending NMI to guest...
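
Because the NMI crashes whichever guest owns the WID, a small guard around the command may be worthwhile. The wrapper below is an illustrative sketch: send_nmi and the VMDUMPER override variable are hypothetical names introduced here, and only the vmdumper path and its "nmi" argument come from the steps above.

```shell
# Hypothetical wrapper: refuse to call vmdumper unless the WID is a
# non-empty string of digits. VMDUMPER may be set to another command
# (for example "echo") for a dry run; the default is the path used above.
send_nmi() {
    wid=$1
    case $wid in
        ''|*[!0-9]*)
            echo "send_nmi: '$wid' is not a numeric WID" >&2
            return 2
            ;;
    esac
    "${VMDUMPER:-/usr/lib/vmware/bin/vmdumper}" "$wid" nmi
}
```

Running it first with VMDUMPER=echo prints the arguments that would be passed, which gives a chance to confirm the WID before sending the real NMI.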

A core will now be generated on the virtual machine for debugging. By default, the virtual machine reboots when the dump is finished. The core is in /var/crash on SLES 11 and in /var/log/dump on SLES 10.
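
Once the guest is back up, the newest dump can be located with something like the following. This is a rough sketch: latest_dump is a hypothetical helper, and it simply prints the most recently modified entry in the first of the given directories that has any content, defaulting to the two locations named above.

```shell
# Hypothetical helper: print the most recently modified entry in the
# first listed directory that is not empty. With no arguments it checks
# the SLES 11 and SLES 10 default dump locations, in that order.
latest_dump() {
    [ $# -gt 0 ] || set -- /var/crash /var/log/dump
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        newest=$(ls -1t "$dir" 2>/dev/null | head -n 1)
        if [ -n "$newest" ]; then
            echo "$dir/$newest"
            return 0
        fi
    done
    echo "no dump found" >&2
    return 1
}
```

If the dump location was changed in the kdump configuration, pass that directory as the argument instead.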

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7008844
  • Creation Date: 17-Jun-2011
  • Modified Date: 03-Mar-2020
    • SUSE Linux Enterprise Server
