SUSE Conversations


SLES 10 – Stop The Process of Mapping SAN Attached Storage LUNs First Over Local Storage



By: kryptikos

August 27, 2008 4:23 pm


It never seems to fail. Business is being held up because of something on “your” end, and your boss, his boss, and the bean counters are all looking at you to fix the problem. You have a server…a SuSE Linux Enterprise Server, of course…that is having a bout of quirkiness and just not acting as expected. Production is screaming that they cannot do their jobs because the server cannot be reached. Doing your due diligence, you quickly discover that there is a storage and partition mounting issue going on.

Problem: Your SAN-attached SuSE box consistently scans the SAN SCSI storage at boot and assigns device names to the LUNs before your local storage. Your partitions therefore do not mount correctly, causing the server to drop into maintenance mode and complain about fsck failures and wrong superblocks.

Issue: Starting with SLES 10, Novell changed how modules are initialized during boot. SLES loads the modules in parallel: it does not stair-step, waiting for each driver to load and initialize, but rather kicks them off concurrently. The fibre channel modules and controllers come up before the local disk controller, which causes the mapping to go awry.

The Nitty Gritty of Correction:

From a direct connection, HP Integrated Lights-Out (iLO), Dell Remote Access Controller (DRAC), or whatever you may have…gain access to the server and obtain a console.

Often your first indication of trouble is you will find the server sitting in maintenance mode due to the failure (see figure 1).

Figure 1


Log in as root. We must parse the output of dmesg. Locate the area where the kernel is scanning the SCSI storage: you will see the devices /dev/sda through /dev/sdX being assigned via the device mapper to your SAN storage disks (see figure 2).

Figure 2. White arrows indicate points of interest.

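Rather than scrolling through dmesg by eye, a quick grep can isolate the assignment lines. This is only a sketch: the exact message wording varies by kernel version, so adjust the pattern to what your kernel actually logs, and note that the sample input below is made up for illustration.

```shell
# Minimal sketch: pull the "Attached scsi disk" lines out of a
# dmesg-style stream so the sda..sdX assignments are easy to read.
# The message text is an assumption -- match it to your kernel's logs.
scan_sd_attach() {
    grep -E 'Attached scsi disk sd[a-z]+'
}

# On a live box:  dmesg | scan_sd_attach
# Illustrative sample (not real output):
printf '%s\n' \
    'sd 2:0:0:0: Attached scsi disk sda' \
    'lpfc 0000:0a:00.0: link up' \
    'sd 2:0:0:1: Attached scsi disk sdb' | scan_sd_attach
```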

Generally you will see the fibre channel SCSI module fire up just before these lines. For instance, my server uses Emulex Fibre Channel cards, which use the lpfc module. If you are not sure what type of card you have, or which driver the kernel is using, you can find out by issuing the command hwinfo --storage, which will output something similar to what is seen in figure 3.

Figure 3. White arrow indicates module driver.


Now that we have found the SAN, let’s locate the section where the local storage disk controller loads (in my case megaraid_mbox…a Dell-branded LSI Logic RAID controller (PERC4)…again, if you are not sure, look at the output from hwinfo). Look for the device assignment and mapping for your local storage. I had several boxes with this issue, and depending on the setup I found it mapped as /dev/sdaa, /dev/sdw, and /dev/sdg; yours will vary as well. However, you will see the partition mapping as you normally would if it had correctly assigned itself to the first disk (i.e. /dev/sdX1, /dev/sdX2, etc.). See figure 4.

Figure 4.


Normally, at this point you would modify /etc/sysconfig/kernel, rebuild the initrd, and be off to the races. However, in this case our mappings are still messed up. We need to mount /boot correctly to get to the point where we can successfully adjust our INITRD. Edit the /etc/fstab file to reflect where your local storage was actually mapped. In this case my local disk was mapped to /dev/sdg, with /dev/sdg2 being my /boot partition and /dev/sdg3 my swap partition.

/etc/fstab

/dev/vgroot/lvroot     /                    ext3      defaults              1 1
/dev/vgroot/lvapps     /apps                ext3      defaults              1 2
/dev/vgroot/lvbest1    /apps/opt/best1      ext3      defaults              1 2
/dev/vgroot/lvlogs     /apps/opt/logs       ext3      defaults              1 2
/dev/vgroot/lvappstmp  /apps/tmp            ext3      defaults              1 2
/dev/sda2              /boot                ext3      acl,user_xattr        1 2
/dev/vgroot/lvhome     /home                ext3      defaults              1 2
/dev/vgroot/lvtmp      /tmp                 ext3      defaults              1 2
/dev/vgroot/lvvar      /var                 ext3      defaults              1 2
/dev/sda3              swap                 swap      defaults              0 0
/dev/appvg/oracle      /VZ/opt/oracle       ext3      defaults              1 2
/dev/appvg/oradata1    /VZ/oradata1         ext3      defaults              1 2
/dev/appvg/oradata2    /VZ/oradata2         ext3      defaults              1 2
/dev/appvg/oradata3    /VZ/oradata3         ext3      defaults              1 2
/dev/appvg/oradata4    /VZ/oradata4         ext3      defaults              1 2
/dev/appvg/oraarch     /VZ/oraarch          ext3      defaults              1 2
proc                   /proc                proc      defaults              0 0
sysfs                  /sys                 sysfs     noauto                0 0
debugfs                /sys/kernel/debug    debugfs   noauto                0 0
usbfs                  /proc/bus/usb        usbfs     noauto                0 0
devpts                 /dev/pts             devpts    mode=0620,gid=5       0 0
/dev/fd0               /media/floppy        auto      noauto,user,sync      0 0

Change to

/dev/vgroot/lvroot     /                    ext3      defaults              1 1
/dev/vgroot/lvapps     /apps                ext3      defaults              1 2
/dev/vgroot/lvbest1    /apps/opt/best1      ext3      defaults              1 2
/dev/vgroot/lvlogs     /apps/opt/logs       ext3      defaults              1 2
/dev/vgroot/lvappstmp  /apps/tmp            ext3      defaults              1 2
/dev/sdg2              /boot                ext3      acl,user_xattr        1 2
/dev/vgroot/lvhome     /home                ext3      defaults              1 2
/dev/vgroot/lvtmp      /tmp                 ext3      defaults              1 2
/dev/vgroot/lvvar      /var                 ext3      defaults              1 2
/dev/sdg3              swap                 swap      defaults              0 0
/dev/appvg/oracle      /VZ/opt/oracle       ext3      defaults              1 2
/dev/appvg/oradata1    /VZ/oradata1         ext3      defaults              1 2
/dev/appvg/oradata2    /VZ/oradata2         ext3      defaults              1 2
/dev/appvg/oradata3    /VZ/oradata3         ext3      defaults              1 2
/dev/appvg/oradata4    /VZ/oradata4         ext3      defaults              1 2
/dev/appvg/oraarch     /VZ/oraarch          ext3      defaults              1 2
proc                   /proc                proc      defaults              0 0
sysfs                  /sys                 sysfs     noauto                0 0
debugfs                /sys/kernel/debug    debugfs   noauto                0 0
usbfs                  /proc/bus/usb        usbfs     noauto                0 0
devpts                 /dev/pts             devpts    mode=0620,gid=5       0 0
/dev/fd0               /media/floppy        auto      noauto,user,sync      0 0
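If you prefer, the two device swaps can be scripted instead of edited by hand. A sketch, assuming the local disk landed on /dev/sdg as in my example; substitute your own device name and always dry-run and keep a backup:

```shell
# Rewrite the two local-storage entries in an fstab stream:
# /dev/sda2 -> /dev/sdg2 (/boot), /dev/sda3 -> /dev/sdg3 (swap).
# The sdg target is this box's value -- change it to yours.
remap_fstab() {
    sed -e 's|^/dev/sda2\([[:space:]]\)|/dev/sdg2\1|' \
        -e 's|^/dev/sda3\([[:space:]]\)|/dev/sdg3\1|'
}

# Dry run first:   remap_fstab < /etc/fstab | less
# Then commit:     cp /etc/fstab /etc/fstab.bak &&
#                  remap_fstab < /etc/fstab.bak > /etc/fstab
```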

Once you have edited the fstab, mount the filesystems fully: mount -a.
This makes your /boot partition accessible, which is necessary to rebuild the initial ramdisk (INITRD).
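It is worth confirming that /boot really did mount before rebuilding anything. A sketch that reads a mount table on stdin, so you can point it at /proc/mounts on the live system:

```shell
# Exit 0 if a /boot mount point appears in the mount table on stdin.
boot_mounted() {
    grep -q '[[:space:]]/boot[[:space:]]'
}

# Live check before touching the initrd:
#   boot_mounted < /proc/mounts && echo '/boot is mounted'
```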

Next up is to edit the loading order of drivers in INITRD_MODULES, in the file /etc/sysconfig/kernel. Although SLES loads modules in parallel at boot, I still want my local storage to kick off first, even if it does not finish loading before the next module starts. Change the order so that your local storage module loads before the fibre channel module. In my case the Emulex cards use the module “lpfc”, and the local storage module is megaraid_mbox.

/etc/sysconfig/kernel

# This variable contains the list of modules to be added to the initial
# ramdisk by calling the script "mkinitrd"
# (like drivers for scsi-controllers, for lvm or reiserfs)
#
INITRD_MODULES="piix lpfc megaraid_mbox siimage processor thermal fan jbd ext3 dm_mod edd"     

Change it to be

INITRD_MODULES="piix megaraid_mbox siimage processor thermal fan jbd ext3 dm_mod edd lpfc"
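The reorder can also be done with sed rather than an editor. A sketch, assuming lpfc appears exactly once in INITRD_MODULES and is not the first module in the quotes; preview the result before writing anything back:

```shell
# Move lpfc from wherever it sits to the end of INITRD_MODULES,
# so megaraid_mbox (the local controller) initializes first.
# Assumes " lpfc" occurs once and is not the first entry.
reorder_initrd_modules() {
    sed '/^INITRD_MODULES=/{ s/ lpfc//; s/"$/ lpfc"/; }'
}

# Preview:
#   reorder_initrd_modules < /etc/sysconfig/kernel | grep INITRD_MODULES
```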
  

We are almost done with our modifications. The last thing we want to control is the load order of the host controllers, which lives in the last block of /etc/modprobe.conf. Simply add the local storage controller before the fibre channel controllers.

# ata_piix can't handle ICH6 in AHCI mode
install ata_piix /sbin/modprobe ahci 2>&1 |:; /sbin/modprobe --ignore-install ata_piix

# end of x86_64 part for modprobe.conf

# please keep this at the end and add local modifications to modules.conf.local
include /etc/modprobe.d
include /etc/modprobe.conf.local
options scsi_mod max_luns=256
alias scsi_hostadapter megaraid_mbox    <--- add your local storage here
alias scsi_hostadapter1 lpfc
alias scsi_hostadapter2 lpfc
alias scsi_hostadapter3 lpfc
alias scsi_hostadapter4 lpfc  
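A quick way to verify that the alias order took: print the module behind the first scsi_hostadapter alias, which should be your local storage driver. A sketch reading the config on stdin:

```shell
# Print the module name on the first "alias scsi_hostadapter*" line.
first_hostadapter() {
    awk '/^alias scsi_hostadapter[0-9]* /{ print $3; exit }'
}

# Should print your local storage module (megaraid_mbox in my case):
#   first_hostadapter < /etc/modprobe.conf
```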
               

A few last cleanup steps…change your /etc/fstab back to reflect the original partition setup (once the load order is fixed, the local disk will claim the first device names again). Once you have completed that, issue the command: mkinitrd. This rebuilds the initial ramdisk images that accompany your vmlinuz kernels. It should look a little something like figure 5. Notice that the module drivers are in the new order you specified in INITRD_MODULES.

Figure 5. Note the white dots where the module positions have changed.


That’s it. You can reboot, and your SLES box should load the local storage and assign the partitions correctly. Once it has completed booting, you can log in and confirm your partition assignments by issuing the command less /proc/partitions at the root command line. You should see the correct sizes assigned to your local disks and SAN LUNs.
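To make that confirmation easier to eyeball, the /proc/partitions table can be reduced to name-and-size pairs. A sketch assuming the standard major/minor/#blocks/name layout of that file:

```shell
# Print "name  size-in-KiB" for each row of a /proc/partitions-style
# table on stdin, skipping the header line and the blank line after it.
list_parts() {
    awk 'NR > 2 && NF == 4 { printf "%-8s %12s KiB\n", $4, $3 }'
}

# Live:  list_parts < /proc/partitions
```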


Categories: SUSE Linux Enterprise Server, Technical Solutions

Disclaimer: As with everything else at SUSE Conversations, this content is definitely not supported by SUSE (so don't even think of calling Support if you try something and it blows up).  It was contributed by a community member and is published "as is." It seems to have worked for at least one person, and might work for you. But please be sure to test, test, test before you do anything drastic with it.

2 Comments

  1. By:Kennon

I can’t say I’ve ever had that problem and I have a lot of SAN attached disk on my SLES machines. The problem I did run into recently that is similar though was on a couple of my boxes the fstab had some wonky device names in it. They were mounting drives by these long device IDs and for some reason that somehow changed and my blades booted into maintenance mode. I had to edit the fstab to the actual /dev/sda2..3..etc…to fix it. What is weird is I went looking around the other 30 or so similar servers and only one other one was using this convention in the fstab, the rest were already set to the actual device name. The only thing I could come up with on this was I am pretty sure those two servers were both originally SLES10 SP0 and then upgraded to the latest SPs while the rest were installed as either SP1 or more recently SP2 out of the box…weird. Thanks for the cool solution. Hope I never need it :)

  2. By:Anonymous

    My colleague and I had a SLES10 build with QLOGIC HBA, booting with the fibres connected did not allow the boot to complete at all – not even to single user mode. Disconnecting the fibres did, but of course this is not the way to do it! Ran through these pointers, lo-and-behold issue solved – thanks this was a real help!

    PS – Not too much stress though as was a development box!
