SUSE Conversations


Problems with /boot mounting or a split kernel after patching



By: rothweiler

October 26, 2010 2:14 pm


Beginning with SLES10 SP3, many of us started to experience problems getting to /boot on a running system. This can be a major problem if you apply a kernel patch while /boot is not mounted: the patch engine will update libraries and kernel files in /, so that when you reboot, the kernel in /boot is read but is then mismatched to the libraries in /, causing what is often referred to as a split kernel.

The symptoms of a split kernel can be:

  • Slow boot
  • System freezing after boot completion
  • Logon not completing logon processes
  • Modules failing to load during boot – ACPI is the most common

If you don’t have a split kernel, go ahead and skip forward to the section headed “Mounting /boot when /boot won’t mount”.

Repairing a split kernel (we will check if you have a split kernel first)

Boot to a rescue DVD/CD.

At the login prompt login as root.

First – mount your /boot to /media/floppy with the command:

mount /dev/cciss/c0d0p1 /media/floppy

#notes - c0 means controller zero, d0 means disk zero, p1 means partition one – if your /boot is on another disk or partition you would need to substitute those parameters here. Device names are case sensitive, so cciss must be lower case. I have read some postings of people having this same problem with standard hda and sda devices but have not experienced it myself – however, the same substitution would apply for those devices, but as /dev/hda1 or /dev/sda1, as they are not CCISS devices.

Now that we have the real /boot mounted we want to peek into your grub menu to find the path to your / mount point:

cat /media/floppy/grub/menu.lst

Look for the “root=” parameter on the kernel line. I put the rest of my filesystem in LVM so mine is “root=/dev/system/slash”, where system is the VG name and slash is the LV name for /. We take the value we find after root= and mount it to /media/cdrom like:

mount /dev/system/slash /media/cdrom
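If the menu is long, the root= value can be pulled out instead of read by eye. This is only a sketch: grub_root_devices is a made-up helper name, and it assumes GNU grep (for -o) and a GRUB legacy menu.lst laid out like the one shown later in this post.

```shell
# Print each distinct root= value found on the kernel lines of a GRUB menu file.
# grub_root_devices is a hypothetical helper name, not a SUSE tool.
grub_root_devices() {
    grep -E '^[[:space:]]*kernel[[:space:]]' "$1" \
        | grep -oE '[[:space:]]root=[^[:space:]]+' \
        | sed -E 's/^[[:space:]]root=//' \
        | sort -u
}
```

From the rescue system it would be called as grub_root_devices /media/floppy/grub/menu.lst. If it prints more than one device, the menu entries disagree about where / lives and you should check them by hand.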

With both mounted we need to make comparisons:

ls /media/floppy <this is mounted directly to /boot.

ls /media/cdrom/boot <this is mounted to / so boot is a sub-directory intended to be a mount point.

If the second is empty or matches the files in /media/floppy then you do not have a split kernel. If they are mismatched we’ll recover from that problem with these steps.
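The comparison can also be scripted. The sketch below is an illustration, not part of the original procedure: compare_boot_dirs and list_names are hypothetical names, it compares file names only (not contents), and it ignores the lost+found directory that an ext3 /boot partition normally carries.

```shell
# List the entries of a directory, ignoring lost+found.
list_names() {
    ls -A "$1" | grep -vxF 'lost+found'
}

# Report whether the boot directory on the root filesystem is empty,
# has the same file names as the real /boot partition, or differs from it.
compare_boot_dirs() {
    real_boot=$1      # e.g. /media/floppy
    root_boot=$2      # e.g. /media/cdrom/boot
    if [ -z "$(list_names "$root_boot")" ]; then
        echo "empty: no split kernel"
    elif [ "$(list_names "$real_boot")" = "$(list_names "$root_boot")" ]; then
        echo "match: no split kernel"
    else
        echo "mismatch: possible split kernel"
    fi
}
```

With the mounts used above, the call would be compare_boot_dirs /media/floppy /media/cdrom/boot. A mismatch only says the file names differ, so still look at both listings before moving anything.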

Make a backup directory of the new files in case an accident happens:

mkdir /media/cdrom/boot.new

cp /media/cdrom/boot/* /media/cdrom/boot.new

Make a backup of the current files in the real /boot in case something goes wrong:

mkdir /media/floppy/boot.old

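cp /media/floppy/grub/menu.lst /media/floppy/boot.old <cp, do not mv this file. Do this before the mv below, while the grub directory is still in place.

mv /media/floppy/* /media/floppy/boot.old <We’re moving these because the new files will be different versions and we don’t normally want different versions of the kernel sitting directly in /boot. mv will complain that it cannot move boot.old into itself; that is expected. The grub directory is moved as well, so if /media/cdrom/boot does not contain a grub directory, move it back with mv /media/floppy/boot.old/grub /media/floppy

Put the correct (new) files in place:

mv /media/cdrom/boot/* /media/floppy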

Make sure you have softlinks for your initrd and vmlinuz files:

ls /media/floppy <Look for files with the short names initrd and vmlinuz.

If the short name versions (softlinks to the long name versions) do not exist use these link commands:

ln -s vmlinuz-{your version of kernel here} /media/floppy/vmlinuz

ln -s initrd-{your version of kernel here} /media/floppy/initrd
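Both links can be created in one pass. This is a sketch under assumptions: ensure_boot_links is a hypothetical helper, the version string is whatever follows vmlinuz- in your file names, and it uses ln -s with a relative target so the links still resolve once the partition is mounted at /boot again.

```shell
# Create relative vmlinuz and initrd symlinks in a mounted /boot if they are missing.
# ensure_boot_links is a hypothetical helper name; both arguments come from the caller.
ensure_boot_links() {
    boot_dir=$1       # e.g. /media/floppy
    version=$2        # e.g. 2.6.16.60-0.54.5-bigsmp
    for name in vmlinuz initrd; do
        if [ ! -e "$boot_dir/$name-$version" ]; then
            echo "missing $name-$version" >&2
            return 1
        fi
        # Leave an existing file or link alone, even a dangling link.
        if [ ! -e "$boot_dir/$name" ] && [ ! -L "$boot_dir/$name" ]; then
            ln -s "$name-$version" "$boot_dir/$name"
        fi
    done
}
```

For the kernel shown in this post the call would be ensure_boot_links /media/floppy 2.6.16.60-0.54.5-bigsmp. It does not repoint an existing link, so check with ls -l that any short names already present point at the new version.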

Update the GRUB menu – make a note of the kernel version of the files you just moved such as “vmlinux-2.6.16.60-0.54.5-bigsmp.gz” You’ll need the numbers inside of the GRUB menu:

vi /media/floppy/grub/menu.lst – mine looks like this:

# Modified by YaST2. Last modification on Wed Jun 23 08:08:38 UTC 2010
default 0
timeout 8
##YaST - generic_mbr
gfxmenu (hd0,0)/message
##YaST - activate

###Don't change this comment - YaST2 identifier: Original name: linux###
title SUSE Linux Enterprise Server 10 SP3
    root (hd0,0)
    kernel /vmlinuz-2.6.16.60-0.54.5-bigsmp root=/dev/system/slash vga=0x317 resume=/dev/cciss/c0d0p2 splash=silent  showopts
    initrd /initrd-2.6.16.60-0.54.5-bigsmp

###Don't change this comment - YaST2 identifier: Original name: failsafe###
title Failsafe -- SUSE Linux Enterprise Server 10 SP3
    root (hd0,0)
    kernel /vmlinuz-2.6.16.60-0.54.5-bigsmp root=/dev/system/slash vga=normal showopts ide=nodma apm=off acpi=off noresume nosmp noapic maxcpus=0 edd=off 3
    initrd /initrd-2.6.16.60-0.54.5-bigsmp

Edit the version numbers on every line to match the kernel files you just put in place. Use caution here as a mistyped version will cause a grub failure and you’ll be back here in rescue mode to fix it.
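Before rebooting, it is cheap to confirm that every kernel and initrd file named in the menu actually exists. The sketch below assumes /boot is its own partition, as in this post, so the paths in menu.lst are relative to the partition root. check_menu_files is a hypothetical name.

```shell
# Print kernel and initrd paths named in menu.lst that do not exist under boot_dir.
# Prints nothing and returns 0 when every referenced file is present.
check_menu_files() {
    menu=$1           # e.g. /media/floppy/grub/menu.lst
    boot_dir=$2       # e.g. /media/floppy
    missing=0
    for path in $(awk '$1 == "kernel" || $1 == "initrd" {print $2}' "$menu"); do
        if [ ! -e "$boot_dir$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return $missing
}
```

From the rescue system: check_menu_files /media/floppy/grub/menu.lst /media/floppy. Any line it prints points at a mistyped or missing version.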

Once complete exit and save forcefully (:wq!) as menu.lst is usually read only.

Reboot and you should be back to normal operating status (mostly) :-)

Mounting /boot when /boot won’t mount

If your system is running normally but you cannot see any files in /boot or you just repaired a split kernel we need to fix the mount problem. This is most often caused by multipath trying to control the device where /boot resides.

Let’s check:

ls /boot

Is the directory empty? If so try to mount /boot

mount /boot

Do you get an error like “already mounted” or “/boot busy”? If so, let’s check whether it really is mounted:

umount /boot <If umount reports that /boot is not mounted, continue with the steps below. If you do not get that error, this is not the solution you are looking for and you should search for other solutions.
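The same question can be answered without unmounting anything by reading the kernel's mount table. A sketch, assuming /proc/mounts is available; is_mounted is a hypothetical helper and its second argument exists only so the function can be tried against a sample file.

```shell
# Return 0 if mount_point appears as a mount target in the mounts table.
# The table defaults to /proc/mounts; is_mounted is a hypothetical helper name.
is_mounted() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mount_point" '$2 == mp {found=1} END {exit !found}' "$mounts_file"
}
```

Used as: if is_mounted /boot; then echo "/boot is mounted"; else echo "/boot is not mounted"; fi. If /boot is not mounted but mount /boot still reports the device as busy, something else is holding the device.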

Next steps – first, check where /boot is supposed to be by looking in fstab:

	/dev/system/slash / ext3 defaults 1 1
	/dev/cciss/c0d0p1 /boot ext3 acl,user_xattr 1 2
	/dev/system/tmp /tmp ext3 defaults 1 2
	/dev/system/usr-novell /usr/novell ext3 defaults 1 2
	/dev/system/var /var ext3 defaults 1 2
	/dev/cciss/c0d0p2 swap swap defaults 0 0
	proc /proc proc defaults 0 0
	sysfs /sys sysfs noauto 0 0
	debugfs /sys/kernel/debug debugfs noauto 0 0
	usbfs /proc/bus/usb usbfs noauto 0 0
	devpts /dev/pts devpts mode=0620,gid=5 0 0
	DIST /media/nss/DIST nssvol noauto,rw,name=DIST,norename 0 0
	VOL1 /media/nss/VOL1 nssvol noauto,rw,name=VOL1,norename 0 0
	VOL2 /media/nss/VOL2 nssvol noauto,rw,name=VOL2,norename 0 0

This shows that /boot is on my first disk’s first partition – remember the disk and partition number for the next steps.
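On a long fstab the device for a given mount point can be picked out with awk. A minimal sketch, with fstab_device_for as a hypothetical name; the second argument defaults to /etc/fstab.

```shell
# Print the device that fstab lists for a mount point, skipping comment lines.
# fstab_device_for is a hypothetical helper name.
fstab_device_for() {
    awk -v mp="$1" '$1 !~ /^#/ && $2 == mp {print $1}' "${2:-/etc/fstab}"
}
```

On the system shown above, fstab_device_for /boot would print /dev/cciss/c0d0p1.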

Let’s mount /boot by device ID:
mount /dev/disk/by-id/ then press tab 3 times. This will show all possible completions such as:

server1:~ # mount /dev/disk/by-id/cciss-3600508b1001030393146353436306
cciss-3600508b1001030393146353436306400
cciss-3600508b1001030393146353436306400-part1
cciss-3600508b1001030393146353436306400-part2
cciss-3600508b1001030393146353436306400-part3
cciss-3600508b10010303931463534363064001
cciss-3600508b10010303931463534363064002
cciss-3600508b10010303931463534363064003
cciss-3600508b1001030393146353436306500
cciss-3600508b1001030393146353436306500-part1
cciss-3600508b10010303931463534363065001
cciss-3600508b1001030393146353436306600
cciss-3600508b1001030393146353436306600-part1
cciss-3600508b10010303931463534363066001
server1:~ # mount /dev/disk/by-id/cciss-3600508b1001030393146353436306

For my system (and I am guessing most systems) we are looking for the first device with a -part1 and should complete the line entered to this point with the correct device ID – in this case it is the one ending with 400-part1 then add the target mount point of /boot for a complete command looking like:

mount /dev/disk/by-id/cciss-3600508b1001030393146353436306400-part1 /boot
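If you would rather not rely on tab completion, the candidate can be listed with a short function. This sketch encodes the same guess made above, that the first entry ending in -part1 is the /boot partition. That is a heuristic, not a rule, so confirm the result against your fstab before mounting. first_part1 is a hypothetical name.

```shell
# Print the first entry (in byte order) ending in -part1 in a by-id directory.
# first_part1 is a hypothetical helper; the directory defaults to /dev/disk/by-id.
first_part1() {
    ls "${1:-/dev/disk/by-id}" | grep -- '-part1$' | LC_ALL=C sort | head -n 1
}
```

A possible use, once the name has been checked: mount "/dev/disk/by-id/$(first_part1)" /boot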

After the mount command is executed verify /boot mounted by checking for contents again.

ls /boot

Once boot is confirmed mounted we need to make sure the device is blacklisted from multipath.

Create (or edit) /etc/multipath.conf, but first let’s check for SAN devices that we do not want to blacklist (if the server is not SAN connected this is not needed):

multipath -ll <Ensure any SAN devices listed do not match any of the patterns in your multipath blacklist. Then create your multipath.conf:

vi /etc/multipath.conf

It needs to include a section for blacklisting such as:

	blacklist {
		devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
		devnode "^hd[a-z]"
		devnode "^cciss!c[0-9]d[0-9]*"
	}

This should cover most normal non-SAN devices.
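To see which device names these patterns would catch, they can be tried with grep -E. Treat this as an approximation: it assumes multipath matches devnode patterns as extended regular expressions against kernel device names (where the CCISS disk appears as cciss!c0d0), and is_blacklisted is a hypothetical helper, not a multipath command.

```shell
# Return 0 if a kernel device name matches any of the blacklist patterns above.
# is_blacklisted is a hypothetical helper; grep -E stands in for multipath's matching.
is_blacklisted() {
    printf '%s\n' "$1" | grep -Eq \
        -e '^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*' \
        -e '^hd[a-z]' \
        -e '^cciss!c[0-9]d[0-9]*'
}
```

For example, is_blacklisted 'cciss!c0d0' succeeds while is_blacklisted sda does not, so SAN LUNs that show up as sd devices are left for multipath to manage. Keep the single quotes around names containing ! when typing this at an interactive bash prompt.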

Exit and save

:wq

and now rebuild initrd

mkinitrd – this will generate a new /boot/initrd-{kernel version and platform} including the multipath blacklists to prevent multipath from trying to manage the local CCISS devices.

Once these steps are completed you should be able to reboot and see the boot files in /boot without taking extra steps. This is pretty important when patching your kernel or you will end up with the split kernel problem outlined above (but you now know how to fix that too!)

 

Categories: Enterprise Linux, SUSE Linux Enterprise Server, Technical Solutions

Disclaimer: As with everything else at SUSE Conversations, this content is definitely not supported by SUSE (so don't even think of calling Support if you try something and it blows up).  It was contributed by a community member and is published "as is." It seems to have worked for at least one person, and might work for you. But please be sure to test, test, test before you do anything drastic with it.

1 Comment

  1. By:rothweiler

    In TID 7005808, the multipath.conf is discussed but it is not always read until after the /boot device is grabbed by multipath – a little too late!
    TID 7005808 also does not help in recovering from a split kernel leaving you (possibly) with the cause of the broken server after a kernel patch but not knowing how to repair the problem.

    This problem is not unique to SuSE so other distributions can use the same or similar steps to what is outlined here.
