SUSE Conversations


Installing and Configuring KDump on SLES 10 for Kernel Crash Analysis

By: mfaris01

May 25, 2010 11:14 am


Although Linux continues to prove itself one of the most stable platforms ever, there are times when something goes awry and bad things happen. A poorly written driver or application does something the kernel can’t handle, and a crash or panic occurs. As with other Unix-like operating systems, a simple reboot isn’t an acceptable remedy or resolution.

This is where kernel crash dumps can help you out, on SLES and, for that matter, RHEL. Dump files can be sent to Novell or Red Hat for further research and resolution.

In a production environment, time is critical: you need to find the root cause, resolve it and get your systems back online and fully functional. This article is intended to help you understand, troubleshoot and resolve issues surrounding kernel crashes.

General

There are a couple of utilities that can capture kernel crash dumps. We will focus on kdump in this article.

LKCD is another utility included in the SLES 10 distribution, but it has more limitations than kdump. Its latest version was released in 2006, its 64-bit support is very limited, and it lacks support for a number of driver modules. Kdump is more recent and scales better to fit a diverse environment.

The kdump mechanism consists of two main components, kexec and kdump.

Kexec borrows the image-overlay philosophy of the UNIX exec system call: it boots a new Linux kernel image on top of a running Linux kernel image, without going through the BIOS. Kexec has other uses, including fast reboots, but here we’ll discuss its main use, kdump.

Capturing a dump after a kernel crash is inherently unreliable, since the kernel code that accesses the dump device may itself be in an unstable state. Kdump gets around this problem by using kexec to boot into a healthy kernel and collecting the dump from there.

Prerequisites

You do not have to complete this section if you want to use the kernel-kdump package available through YaST (search for “kdump” in Software Management). That package is not an option if you are running a VMI kernel or a more recent kernel; in that case you need to build your own, as described below.

Since kdump requires its own kernel, built from the same sources as your existing one, we’ll need to run “make menuconfig” to change a few kernel configuration options. Luckily, SLES 10 ships with a kernel that is 99% kdump-ready. We just need to tweak a couple of things and build our new kdump crash kernel.

NOTE: Modifying the kernel is not recommended if you are inexperienced or new to Linux. The changes we make here are minor, as most of the options we need are already set by default. Use “make menuconfig” at your own risk. Remember that it only modifies the configuration; you must still build the kernel manually afterward.

Change to the kernel source directory for your kernel version (/usr/src/linux-2.6.xxx).

Enter the command “make menuconfig”. If you aren’t missing any libraries, such as ncurses-devel-5.5-18.11, you should see the kernel configuration menu.

Scroll down to “Processor type and features” and hit Enter

At the bottom, highlight Kernel crash dumps (EXPERIMENTAL) and press the spacebar to select it. Another option, the physical address where the kernel is loaded, will appear below it. Leave that at its default. Press ESC twice to go back.

Scroll to General Setup and press Enter.

Select the top item Local Version and press Enter.

Change the value from “-default”, “-bigsmp” or whatever it currently is to “-kdump” and select OK.

Press ESC twice to go back to the main menu, then ESC once more to exit.

Select Yes to save the new configuration.
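
Before compiling, you can confirm that menuconfig actually saved the options. The sketch below is only an illustration: check_kdump_config is a hypothetical helper, and it assumes the menu entries map to the usual 2.6.16-era symbols CONFIG_KEXEC (which the SLES configuration should already have enabled), CONFIG_CRASH_DUMP and CONFIG_LOCALVERSION in the .config file.

```bash
#!/bin/bash
# Hypothetical helper: check that a kernel .config has the options kdump needs.
# Assumes the usual symbol names; verify them against your own kernel tree.
check_kdump_config() {
    local cfg=${1:-.config}
    local rc=0 opt
    # kexec support and crash dump support must both be built in (=y)
    for opt in CONFIG_KEXEC CONFIG_CRASH_DUMP; do
        if ! grep -q "^${opt}=y\$" "$cfg"; then
            echo "missing: ${opt}=y" >&2
            rc=1
        fi
    done
    # The "Local version" from General Setup is stored as CONFIG_LOCALVERSION
    if ! grep -q '^CONFIG_LOCALVERSION="-kdump"$' "$cfg"; then
        echo 'missing: CONFIG_LOCALVERSION="-kdump"' >&2
        rc=1
    fi
    return $rc
}
```

Run it from the source directory, for example check_kdump_config .config && echo "ready to build". Any missing option is reported on stderr.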

Compile your new kernel. Kernel compiling is beyond the scope of this article. Please refer to your documentation or the many resources on the web for more information.

Kdump doesn’t support compressed kernel images, so we’ll use the uncompressed image vmlinux-kdump instead of the compressed vmlinuz-kdump.
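
If you aren’t sure which image you have, the difference is easy to check: an uncompressed vmlinux is an ELF file, while a compressed vmlinuz (bzImage) is not. A minimal sketch; is_elf_image is a hypothetical helper name, and the file command will tell you the same thing.

```bash
#!/bin/bash
# Hypothetical helper: succeed if a file starts with the ELF magic bytes
# 0x7f 'E' 'L' 'F'. An uncompressed vmlinux has them, a vmlinuz does not.
is_elf_image() {
    [ -f "$1" ] || return 1
    # Dump the first 4 bytes as hex and compare with 7f 45 4c 46
    [ "$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')" = "7f454c46" ]
}
```

For example, is_elf_image /boot/vmlinux-kdump && echo "uncompressed ELF image" (adjust the path to whatever your build produced).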

Installation

The packages we’ll need for SLES 10 are:

kexec-tools
kdump
yast2-kdump (optional; a YaST module that configures kdump for you X server folks)

You can install them through YaST.
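
To see which of these are still missing, you can compare them against the installed RPMs. A sketch under the assumption that you feed it package names one per line, for example from rpm -qa --qf '%{NAME}\n'; missing_kdump_packages is a hypothetical helper, and any extra arguments (such as yast2-kdump) are checked as well.

```bash
#!/bin/bash
# Hypothetical helper: read installed package names (one per line) on stdin
# and print each required kdump package that is not among them.
missing_kdump_packages() {
    local installed pkg
    installed=$(cat)
    for pkg in kexec-tools kdump "$@"; do
        # -x: the whole line must equal the package name
        if ! printf '%s\n' "$installed" | grep -qx -- "$pkg"; then
            echo "$pkg"
        fi
    done
}
```

Usage: rpm -qa --qf '%{NAME}\n' | missing_kdump_packages yast2-kdump. Anything it prints can then be installed through YaST.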

Configuration

Kdump

Now that our kernel is “kdump enabled”, we need to configure where the dump is written. We’ll set up both a local dump and a network dump; over the network, kdump can write to FTP, NFS, CIFS and SSH targets.

Configure Local Dump

Edit the file /etc/sysconfig/kdump and make the following changes.

If you are using SMP, modify the following lines:

KDUMP_COMMANDLINE_APPEND="maxcpus=1 "
KDUMP_OPTIONS="--args-linux " 
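
If you’re not sure whether the box is SMP, counting the processor entries in /proc/cpuinfo is a quick check. A minimal sketch; suggest_smp_append is a hypothetical helper, and the cpuinfo path is a parameter only so it can be tried on a copy. Note that it counts logical CPUs, so hyperthreaded siblings count too.

```bash
#!/bin/bash
# Hypothetical helper: count the CPUs listed in a cpuinfo file and print the
# KDUMP_COMMANDLINE_APPEND line suggested above when there is more than one.
suggest_smp_append() {
    local ncpu
    # grep -c prints the number of matching lines (0 if none)
    ncpu=$(grep -c '^processor' "${1:-/proc/cpuinfo}")
    if [ "$ncpu" -gt 1 ]; then
        echo 'KDUMP_COMMANDLINE_APPEND="maxcpus=1 "'
    fi
}
```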

For dumps saved to a local device, it is recommended to have the crash kernel boot into runlevel 1. Network dumps need a different runlevel (see below), and runlevel 5 is not recommended unless you have reserved enough memory for the crash kernel.

KDUMP_RUNLEVEL="1" 

Once the dump is complete, we want the server to reboot back into the normal kernel. This only works if the next parameter, KDUMP_DUMPDEV, is empty.

KDUMP_IMMEDIATE_REBOOT="yes"

KDUMP_DUMPDEV specifies the device the dump is written to. Using it can cause issues with the underlying filesystem, so we leave it empty and use KDUMP_SAVEDIR to set where the dump file goes.

KDUMP_DUMPDEV=""

Set KDUMP_SAVEDIR to a location with enough disk space to hold the dump file. Besides a local path, you can use NFS shares and the other targets mentioned above.

KDUMP_SAVEDIR="file:///tmp/kerneldump"

This directive sets the amount of disk space, in megabytes, that kdump keeps free on the dump target. The default is 64.

KDUMP_FREE_DISK_SIZE="128"

Altogether, the settings look like this for our setup:

KDUMP_COMMANDLINE_APPEND="maxcpus=1 "
KDUMP_OPTIONS="--args-linux "
KDUMP_RUNLEVEL="1"
KDUMP_IMMEDIATE_REBOOT="yes"
KDUMP_DUMPDEV=""
KDUMP_SAVEDIR="file:///tmp/kerneldump"
KDUMP_FREE_DISK_SIZE="128"

NOTE: Local and network kdump configurations cannot coexist. To switch to a network dump, change the following two parameters.

Configure Network Dump

KDUMP_RUNLEVEL="3"

By default, runlevel 3 is the lowest runlevel in which network devices are brought up.

KDUMP_SAVEDIR="nfs://mynfs_server/kerneldump"

You can identify the server by DNS name or IP address. I prefer an IP address, because DNS is just one more thing that might be failing when the box crashes.
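
Before saving, you may want to sanity-check the combination of settings discussed above. This is a sketch only: check_kdump_sysconfig is a hypothetical helper that encodes this article’s rules (reboot enabled, KDUMP_DUMPDEV empty, network targets at runlevel 3 or higher), not anything shipped with the kdump package.

```bash
#!/bin/bash
# Hypothetical helper: check the kdump settings described in this article.
# The file is sourced in a subshell so its variables don't leak into your shell.
check_kdump_sysconfig() {
    (
        . "${1:-/etc/sysconfig/kdump}" || exit 1
        rc=0
        [ "$KDUMP_IMMEDIATE_REBOOT" = "yes" ] ||
            { echo 'KDUMP_IMMEDIATE_REBOOT is not "yes"' >&2; rc=1; }
        [ -z "$KDUMP_DUMPDEV" ] ||
            { echo "KDUMP_DUMPDEV should be empty when using KDUMP_SAVEDIR" >&2; rc=1; }
        case $KDUMP_SAVEDIR in
            file://*) ;;  # local dump: runlevel 1 is fine
            nfs://*|ftp://*|cifs://*|ssh://*)
                # network targets need network devices, i.e. runlevel 3 or higher
                [ "${KDUMP_RUNLEVEL:-0}" -ge 3 ] ||
                    { echo "network KDUMP_SAVEDIR needs KDUMP_RUNLEVEL 3 or higher" >&2; rc=1; } ;;
            *)  echo "unrecognized KDUMP_SAVEDIR: $KDUMP_SAVEDIR" >&2; rc=1 ;;
        esac
        exit $rc
    )
}
```

Usage: check_kdump_sysconfig /etc/sysconfig/kdump && echo "kdump settings look sane".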

Save the file and we’ll configure GRUB.

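For kdump to function properly, we need to add a parameter to the kernel command line that reserves a certain amount of RAM for the crash kernel. The default is 64MB, and some recommend 128MB. The parameter looks like this:

crashkernel=128M@16M 

The 128M is the amount of memory reserved for the crash kernel, and @16M is the physical address at which the reserved region starts. It matches the address the kdump kernel is loaded at, which is the option that appeared when you enabled Kernel crash dumps (0x1000000, or 16 MB, by default).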
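
The arithmetic behind SIZE@OFFSET is easy to check by hand. A minimal sketch with hypothetical helpers to_bytes and describe_crashkernel; it handles only the simple SIZE@OFFSET form used here, with K, M or G suffixes.

```bash
#!/bin/bash
# Hypothetical helper: convert a size such as 128M or 16M to bytes.
to_bytes() {
    local num unit
    num=${1%[KkMmGg]}   # strip a trailing unit letter, if any
    unit=${1#"$num"}    # whatever was stripped is the unit
    case $unit in
        K|k) echo $(( num * 1024 )) ;;
        M|m) echo $(( num * 1024 * 1024 )) ;;
        G|g) echo $(( num * 1024 * 1024 * 1024 )) ;;
        *)   echo $(( num )) ;;
    esac
}

# Hypothetical helper: split a crashkernel value of the form SIZE@OFFSET.
describe_crashkernel() {
    local size offset
    size=${1%@*}     # part before the @
    offset=${1#*@}   # part after the @
    printf 'reserve %s bytes at physical address 0x%x\n' \
        "$(to_bytes "$size")" "$(to_bytes "$offset")"
}
```

describe_crashkernel 128M@16M prints "reserve 134217728 bytes at physical address 0x1000000", which is the 16 MB load address mentioned above.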

Modify /boot/grub/menu.lst and append this parameter to the kernel directive.

       kernel /boot/vmlinuz-2.6.16.60-0.39.3-vmi root=/dev/sda2 vga=0x32b resume=/dev/sda1 splash=silent showopts clock=pit crashkernel=128M@16M
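
If you have several boot entries, the edit can be previewed with a script before touching the real file. A sketch only: add_crashkernel is a hypothetical helper that appends the parameter to every GRUB legacy kernel line that doesn’t already have one, so review the result, since you may only want it on your normal boot entry.

```bash
#!/bin/bash
# Hypothetical helper: print a copy of a GRUB legacy menu.lst with
# crashkernel=<value> appended to each "kernel" line that lacks it.
add_crashkernel() {
    local file=$1 value=${2:-128M@16M}
    awk -v param="crashkernel=$value" '
        # lines whose first field is the kernel directive
        $1 == "kernel" && $0 !~ /crashkernel=/ { print $0 " " param; next }
        { print }
    ' "$file"
}
```

To preview the change without writing anything: add_crashkernel /boot/grub/menu.lst | diff /boot/grub/menu.lst -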

Save the file and exit.

Now we need to “turn it on” so it will start on boot.

chkconfig kdump on

and then we’ll start it.

/etc/init.d/kdump start

If you get an error, check the entry in /boot/grub/menu.lst and make sure your syntax is correct.

At this point, reboot so that the normal kernel comes up with the crashkernel parameter in effect and the kdump service can load the crash kernel into the reserved memory.
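
After the reboot, you can confirm that the memory was actually reserved: the parameter should be on the kernel command line, and the kernel lists the reserved region as “Crash kernel” in /proc/iomem. A minimal sketch; verify_crash_reservation is a hypothetical helper, and its path arguments exist only so it can be tried on copies of those files.

```bash
#!/bin/bash
# Hypothetical helper: confirm crashkernel= was passed and the region reserved.
verify_crash_reservation() {
    local cmdline=${1:-/proc/cmdline} iomem=${2:-/proc/iomem}
    if ! grep -q 'crashkernel=' "$cmdline"; then
        echo "crashkernel= not found on the kernel command line" >&2
        return 1
    fi
    # The reserved range shows up as a "Crash kernel" entry
    if ! grep -q 'Crash kernel' "$iomem"; then
        echo "no Crash kernel region in $iomem" >&2
        return 1
    fi
    return 0
}
```

Run it as root (verify_crash_reservation with no arguments) before trying the crash test.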

Crash Testing

Now let’s simulate a crash. We can do this by turning on System Request (sysrq) functionality and then triggering a panic.

Enable sysrq:

echo 1 > /proc/sys/kernel/sysrq

Now, send it into panic.

echo c > /proc/sysrq-trigger

The console should start booting the crash kernel. After a few moments you will see the dump file being created. It should look something like this:

Setting flag_elf64 to true
[ 67% ]

You can check the progress of the dump file by listing the files in the KDUMP_SAVEDIR path.

# ll /tmp/kerneldump/

-rw------- 1 root  root  392366080 May 16 11:05 vmcore

When it’s done, it will reboot (because we told it to) and return to the normal kernel.

If this had been an actual emergency, Novell Support would have told you where to send the file for further analysis. Just kidding.
Analyzers such as GDB can give you a more human-readable view of the dump file, and perhaps you can find your problem without further support.

Conclusion

This process is a bit complicated, and it does require some kernel knowledge to configure. As stated before, SLES 10.x provides a kdump-enabled kernel that you can simply install. If you looked closely at the configuration presented here, you noticed we’re using a VMI kernel, so we had to make the changes manually. Kdump has much more functionality than was presented here. If you want to use it as your crash dump tool, get it working the way you like on a test system first, then port it to production.

Enjoy!

Categories: Enterprise Linux, SUSE Linux Enterprise Server, Technical Solutions

Disclaimer: As with everything else at SUSE Conversations, this content is definitely not supported by SUSE (so don't even think of calling Support if you try something and it blows up).  It was contributed by a community member and is published "as is." It seems to have worked for at least one person, and might work for you. But please be sure to test, test, test before you do anything drastic with it.
