Troubleshooting the SLES10 Boot Process
Overview
When a server fails to boot, a critical situation is at hand. The purpose of this document is to provide a quick reference guide to narrow down the cause of a failed boot and get the server back up as quickly as possible. It is based on SUSE Linux Enterprise Server 10 (SLES10).
Troubleshooting Procedure
- The primary troubleshooting objective is to narrow down where in the boot process the failure occurred.
- The boot process is summarized below. For more details, refer to the Troubleshooting Table below.
- Look at the failed server’s screen for the last on-screen landmark that matches the troubleshooting table’s "On-Screen Landmarks".
- Once you determine how far in the boot process the failure occurred, look at the troubleshooting table’s associated files and troubleshooting/potential fixes.
- The two most identifiable on-screen landmarks are:
- The grub boot menu screen (Troubleshooting Table, Line 3)
- Seeing the word "done" scrolling across the screen (Troubleshooting Table, Lines 8 and 11)
- Boot Installed System (BIS), run level 1 and chroot Installed System (CIS) all serve the same purpose: getting the server into a working maintenance state so that further problem resolution can be completed.
- Boot Installed System (BIS) Procedure
- If this procedure works, then the problem is most likely on lines 1-6 of the troubleshooting table.
- Boot from CD1
- Select "Installation"
- Select your Language
- Accept the License Agreement
- Click "Other"
- Select "Boot Installed System"
- Click "OK"
- Boot to Run Level 1
- Run level 1 is very similar to chroot installed system (CIS), but the environment is set up for you during boot. You also have access to yast and the proc filesystem, so run level 1 is preferred over CIS.
- Append "init 1" to the boot options line of the default boot kernel (i.e., SUSE Linux Enterprise Server 10)
- Type root’s password
- If you need network access, use yast to configure it
- chroot Installed System (CIS) Procedure
- Used mostly for failures on lines 7-14 of the troubleshooting table.
- Boot from CD1
- Select "Rescue System" and log in as root at the "Rescue login:" prompt
- Your first goal is to find and mount the root "/" partition, so that you can read /etc/fstab
- Run cat /proc/partitions to find the disk devices the OS sees
- For each device, display the partition table
ls-boot:~ # parted -s /dev/sda print
Disk geometry for /dev/sda: 0kB - 2147MB
Disk label type: msdos
Number  Start   End     Size    Type      File system  Flags
1       32kB    214MB   214MB   primary   ext2         boot, type=83
2       214MB   535MB   321MB   primary   linux-swap   type=82
3       535MB   2147MB  1612MB  extended               lba, type=0f
5       535MB   1012MB  477MB   logical   reiserfs     type=83
6       1012MB  1596MB  584MB   logical   reiserfs     type=83
7       1596MB  2147MB  551MB   logical   reiserfs     type=83
- You can ignore type 82 swap and type 0f extended partitions
- To find the root partition, you may need to try each remaining partition in turn. For example:
- mount /dev/sda1 /mnt
- ls -l /mnt
- If the /mnt directory listing shows the etc and root directories, then it's the root partition
- If it is not, run umount /mnt and repeat these steps for each device until you find root. In this example, the root device is /dev/sda6
- mount /dev/sda6 /mnt
Boot process summary: BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login
yast network setup (run level 1): yast lan > Next > Edit > Next > Next/Finish
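The search for the root partition described in the steps above can be sketched as a small loop. This is only an illustration under stated assumptions: the function names `looks_like_root` and `find_root` are hypothetical, the check mirrors the manual test (the partition contains etc and root directories), and each candidate is mounted read-only so that nothing is modified while probing.

```shell
# Sketch only: helper names are hypothetical, not part of SLES or the rescue system.

# Succeed if directory $1 looks like a root file system, using the same
# test as the manual steps: it contains the etc and root directories.
looks_like_root() {
    [ -d "$1/etc" ] && [ -d "$1/root" ]
}

# Try each partition given as an argument: mount it read-only on /mnt,
# test it, and unmount it again. Prints the first partition that matches.
find_root() {
    for dev in "$@"; do
        mount -o ro "$dev" /mnt 2>/dev/null || continue
        if looks_like_root /mnt; then
            umount /mnt
            echo "$dev"
            return 0
        fi
        umount /mnt
    done
    return 1
}
```

In the rescue system this might be called as `find_root /dev/sda1 /dev/sda5 /dev/sda6 /dev/sda7`, leaving out the swap and extended partitions. The partition it prints still has to be mounted on /mnt afterwards.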
- Mount all additional file systems relative to /mnt
- Run cat /mnt/etc/fstab
Rescue# cat /mnt/etc/fstab
/dev/sda6  /                  reiserfs  acl,user_xattr     1 1
/dev/sda1  /boot              ext2      acl,user_xattr     1 2
/dev/sda7  /usr               reiserfs  acl,user_xattr     1 2
/dev/sda5  /var               reiserfs  acl,user_xattr     1 2
/dev/sda2  swap               swap      defaults           0 0
proc       /proc              proc      defaults           0 0
sysfs      /sys               sysfs     noauto             0 0
debugfs    /sys/kernel/debug  debugfs   noauto             0 0
devpts     /dev/pts           devpts    mode=0620,gid=5    0 0
/dev/fd0   /media/floppy      auto      noauto,user,sync   0 0
- This shows the system devices and their mount points.
- Mount each additional file system under /mnt, for example:
mount /dev/sda1 /mnt/boot
mount /dev/sda5 /mnt/var
mount /dev/sda7 /mnt/usr
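When the installed system has many file systems, the mount commands can be derived from the fstab rather than typed by hand. The following is a minimal sketch, assuming a hypothetical helper named `fstab_mount_cmds`. It only prints the commands so you can review them before running anything. It skips the root entry (already mounted), swap, pseudo file systems and `noauto` entries.

```shell
# Sketch only: fstab_mount_cmds is a hypothetical helper name.
# Print (do not run) the mount commands for the installed system's
# additional file systems, relative to /mnt.
# $1 = path to the fstab to read, for example /mnt/etc/fstab
fstab_mount_cmds() {
    awk '
        /^[ \t]*#/                  { next }  # skip comment lines
        $1 !~ /^\/dev\//            { next }  # keep only real block devices
        $2 == "/" || $3 == "swap"   { next }  # root is already mounted; skip swap
        ("," $4 ",") ~ /,noauto,/   { next }  # skip noauto entries such as the floppy
        { print "mount " $1 " /mnt" $2 }
    ' "$1"
}
```

Running `fstab_mount_cmds /mnt/etc/fstab` against the fstab shown above prints the mount commands for /boot, /usr and /var in fstab order. Entries are not reordered, so nested mount points rely on the fstab already listing parents first.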
- Rebind proc, sysfs and dev from the rescue system into the installed system
mount --rbind /proc /mnt/proc
mount --rbind /sys /mnt/sys
mount --rbind /dev /mnt/dev
- chroot to the mounted installed system. The chroot command remaps /mnt as root "/".
chroot /mnt
- If this command fails, then you need to confirm that /mnt/bin/bash and glibc on the installed system are valid.
- To return to the rescue system, type exit.
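If `chroot /mnt` fails, a quick check from the rescue system can show whether the shell or the C library is missing. This is a rough sketch, not a full validation: `check_chroot_target` is a hypothetical helper, the `libc.so.6` file name is the usual glibc library name, and the `lib64` location is an assumption for 64-bit installations. It confirms only that the files exist, not that their contents are intact.

```shell
# Sketch only: check_chroot_target is a hypothetical helper name.
# $1 = mount point of the installed system (default /mnt)
check_chroot_target() {
    root=${1:-/mnt}
    status=0
    if [ ! -x "$root/bin/bash" ]; then
        echo "missing or not executable: $root/bin/bash"
        status=1
    fi
    # glibc's C library; -e follows symlinks, so a dangling link counts as missing
    if [ ! -e "$root/lib/libc.so.6" ] && [ ! -e "$root/lib64/libc.so.6" ]; then
        echo "missing: libc.so.6 under $root/lib or $root/lib64"
        status=1
    fi
    return $status
}
```

Call it as `check_chroot_target /mnt` before retrying the chroot. If the rescue system provides rpm, `rpm --root /mnt -V bash glibc` is one way to check the installed packages more thoroughly.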
Troubleshooting Table
BIS = Boot Installed System Procedure
CIS = chroot Installed System Procedure
Line | Boot Process | Associated File(s) | On-Screen Landmarks | Troubleshooting / Potential Fixes
1 | BIOS | N/A | BIOS messages | Update the firmware; make sure a disk device is marked bootable
2 | MBR | /boot/grub/stage1 | GRUB loading stage2… | BIS; grub-install /dev/<disk> or lilo -v
3 | GRUB | /boot/grub/stage2, /boot/grub/menu.lst | GRUB menu or grub> prompt | BIS; grub-install /dev/<disk> or lilo -v; check /boot/grub/menu.lst
4 | kernel | /boot/vmlinuz | Hardware info scrolling; RAMDISK driver initialized: | BIS; reinstall kernel rpm
5 | initrd | /boot/initrd, /etc/sysconfig/kernel | RAMDISK: <relevant message> | BIS; mkdir -p /tmp/ramdisk; cd /tmp/ramdisk; zcat /boot/initrd | cpio -ivd; mkinitrd; lilo -v
6 | ramdisk:init | /init in /boot/initrd, /etc/sysconfig/kernel | Starting udevd; Creating devices; Loading <module_name> (there is a "Loading" statement for each module defined in the /etc/sysconfig/kernel INITRD_MODULES variable) | BIS; mkinitrd creates the ramdisk:init file
7 | sbin:init | /sbin/init, /etc/inittab | INIT: version 2.85 booting | init 1, then CIS; use boot options init=/bin/bash or init=/bin/sash to bypass running /sbin/init
8 | sbin:init:boot | /bin/bash, /etc/init.d/boot, /etc/init.d/boot.d/* | System Boot Control: Running /etc/init.d/boot; each service shows: done, failed or skipped; System Boot Control: The system has been setup | init s or init 1 starts the minimum services; CIS starts no services; to step through or stop the boot process from this point on, edit /etc/sysconfig/boot and change to: PROMPT_FOR_CONFIRM="yes" RUN_PARALLEL="no" FLOW_CONTROL="yes" (Ctrl-S stops, Ctrl-Q resumes)
9 | sbin:init:boot | /etc/init.d/boot.local | System Boot Control: Running /etc/init.d/boot.local | init 1, then CIS
10 | sbin:init | /etc/inittab | INIT: Entering runlevel: 3 | init 1, then CIS
11 | sbin:init:rc | /bin/bash, /etc/init.d/rc, /etc/init.d/rc?.d/* | Master Resource Control: previous runlevel: N, switching to runlevel: 3; each service shows: done, failed or skipped; Master Resource Control: runlevel 3 has been reached; Skipped services in runlevel 3: | init s or init 1, then CIS
12 | sbin:init | /etc/inittab | N/A | init 1, then CIS; init uses /etc/inittab to know how to run the login programs
13 | sbin:init:mingetty | /etc/issue, /sbin/mingetty | Welcome to SUSE LINUX… login: | init 1 bypasses mingetty; CIS
14 | sbin:init:X | | Graphical login screen | init 1 bypasses X login; CIS
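The /etc/sysconfig/boot change described on line 8 of the table can also be scripted, which helps when working from a rescue shell without a comfortable editor. The sketch below is one possible approach, assuming hypothetical helper names `set_sysconfig_var` and `enable_boot_stepping`. It keeps a backup copy with an `.orig` suffix so the change is easy to revert once the server boots again.

```shell
# Sketch only: both helpers are hypothetical, not SUSE tools.

# Set NAME="value" in a sysconfig-style file, replacing an existing
# assignment or appending one if the variable is not present.
# Assumes the value contains no "|" or "&" characters.
# $1 = file, $2 = variable name, $3 = value
set_sysconfig_var() {
    file=$1 name=$2 value=$3
    if grep -q "^$name=" "$file"; then
        # Write through a temporary file, then copy the content back so
        # the original file keeps its ownership and permissions.
        sed "s|^$name=.*|$name=\"$value\"|" "$file" > "$file.new" &&
            cat "$file.new" > "$file" &&
            rm -f "$file.new"
    else
        echo "$name=\"$value\"" >> "$file"
    fi
}

# Apply the three settings from line 8 of the table to the file given as $1.
enable_boot_stepping() {
    cp -p "$1" "$1.orig"    # backup for easy revert
    set_sysconfig_var "$1" PROMPT_FOR_CONFIRM yes
    set_sysconfig_var "$1" RUN_PARALLEL no
    set_sysconfig_var "$1" FLOW_CONTROL yes
}
```

From the rescue system, before the chroot, the file to pass would be /mnt/etc/sysconfig/boot. Inside the chroot or in run level 1 it is /etc/sysconfig/boot.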
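If you don’t know what to do next, and BIS or CIS works, you can always run
rpm -Vf </path/to/file>
for each file listed in the "Associated File(s)" column. This verifies the package that owns the file against the RPM database and reports any files in that package that have changed.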
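Checking several files from the table one at a time gets tedious, so the verification can be wrapped in a loop. This is a small sketch under stated assumptions: `verify_owning_packages` and the `RPM` override variable are hypothetical names introduced here, and the only rpm call used is the `rpm -Vf` form given above.

```shell
# Sketch only: verify_owning_packages and the RPM variable are hypothetical.
# Run "rpm -Vf" on each existing file given as an argument.
# RPM can be overridden (for example RPM=echo) to preview the commands.
verify_owning_packages() {
    for f in "$@"; do
        if [ -e "$f" ]; then
            echo "== $f"
            ${RPM:-rpm} -Vf "$f"
        else
            echo "== $f: not found"
        fi
    done
}
```

For a failure on line 7 of the table, for example, `verify_owning_packages /sbin/init /etc/inittab` checks both associated files. A file reported as not found is itself a useful clue.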