SUSE Conversations


Rollback SLES Online Updates with LVM Snapshots



By: jrecord

August 22, 2008 6:50 pm

Overview
Configuration
Prepare to Update
Update the System
Accept the Update
Reject the Update
Rollback from Disaster
Conclusion

Overview

This document is offered as an alternative to Rollback SLES Online Updates with Software RAID1. If your system disk is configured with Logical Volume Management (LVM), you can use this solution to roll back a failed online update. You back up your boot directory, create an LVM snapshot of each system volume, and then update as usual. If you don’t like the update and want to return to your previous state, you boot the server from the snapshot volumes and restore the older data from the snapshots to the original LVM volumes. This article details that rollback procedure using LVM. The concept should be valid for EVMS as well, but an EVMS solution is outside the scope of this article.

A word of caution: the one thing this procedure does not roll back is application metadata changes. If an application updates its metadata format as part of the update and you then roll back the update, the older application will most likely not cope well with the new metadata format. Novell Storage Services (NSS) on Open Enterprise Server (OES) for Linux could experience something like this. I have not seen this happen with NSS, but the potential is there.

Configuration

I have installed SUSE Linux Enterprise Server 10 with Service Pack 2 (SLES) on an i386 machine with a minimal install and 512 MB of RAM. There is a /boot partition that is not LVM, and / and /home are LVM volumes. I always hate it when examples show only the simplest way to do something and leave you to figure out the more complicated cases that match your own system. So for this article I have included LVM devices for both / and /home to show how to roll back multiple volumes.

Figure 1 – Configuration Details

Prepare to Update

Before you can proceed with a system update, you need to create snapshots of the system volumes in case you want to roll back the updates. The procedure to prepare for an update is:

  1. Perform a complete system backup to tape. No rollback procedure of any kind replaces a good backup solution.
  2. Back up the /boot directory. Because the boot partition cannot be an LVM volume due to boot loader limitations, no LVM snapshot is available for it, and it must be backed up separately.

Figure 2 – Boot Partition Backup
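
The boot backup in step 2 might look like the following sketch. The archive name bootback.tbz comes from the restore steps later in this article; the /root location, the second copy kept on /boot, and the run helper are assumptions made for illustration. By default the script only prints the commands it would run; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: archive /boot before an online update.
# Assumption: the archive is written to /root first, then copied to /boot.
ARCHIVE="${ARCHIVE:-/root/bootback.tbz}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

backup_boot() {
    # -C /boot stores paths relative to /boot, so the archive can later be
    # extracted into any mount point. An older copy of the archive is skipped.
    run tar -C /boot --exclude=./bootback.tbz -cjf "$ARCHIVE" .
    # Keep a second copy on the boot partition itself.
    run cp "$ARCHIVE" /boot/bootback.tbz
}

backup_boot
```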

  3. Add a disk if necessary to extend the volume group. If in doubt, add a disk of the same size as your disk(s) currently in the volume group.

Figure 3 – Disk Added to Extend Volume Group

  4. Extend the volume group with sufficient space for the upgrade.

WARNING: If you don’t have sufficient space, you will get an error creating the snapshots.

Insufficient free extents (2) in volume group

Figure 4 – Extending sys Volume Group

Figure 5 – Volume Group Extended
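
Extending the volume group could be sketched as follows. The volume group name sys is the one used in this article; the device /dev/sdb is a hypothetical name for the newly added disk, and the run helper is an illustrative dry-run wrapper that prints commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: add a new disk to the "sys" volume group.
# Assumption: /dev/sdb is the newly added, empty disk.
NEW_DISK="${NEW_DISK:-/dev/sdb}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

extend_vg() {
    run pvcreate "$NEW_DISK"       # initialize the disk as a physical volume
    run vgextend sys "$NEW_DISK"   # add it to the volume group
    run vgdisplay sys              # check the free extents now available
}

extend_vg
```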

  5. Create an original and rollback copy of the /etc/fstab file.

Figure 6 – Copies of fstab
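
One way to produce the two fstab copies is sketched below. It assumes that fstab refers to the volumes as /dev/sys/root and /dev/sys/home and that the snapshots will be named ssroot and sshome, as elsewhere in this article; if your fstab uses other device names (for example /dev/mapper paths), the sed expressions need adjusting. The function names are made up for this example.

```shell
#!/bin/sh
# Sketch: keep two copies of fstab, one unchanged and one for the snapshots.
# Assumption: fstab names the volumes as /dev/sys/root and /dev/sys/home.

# Filter: rewrite origin device names to their snapshot counterparts.
to_snapshot_fstab() {
    sed -e 's#^/dev/sys/root\([[:space:]]\)#/dev/sys/ssroot\1#' \
        -e 's#^/dev/sys/home\([[:space:]]\)#/dev/sys/sshome\1#'
}

# Usage: prepare_fstab_copies /etc/fstab
prepare_fstab_copies() {
    cp "$1" "$1.original"
    to_snapshot_fstab < "$1" > "$1.rollback"
}
```

Running prepare_fstab_copies /etc/fstab as root would leave /etc/fstab untouched and create /etc/fstab.original and /etc/fstab.rollback next to it. Doing this before the snapshots are taken means both files are also present inside the root snapshot.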

  6. Create a snapshot volume for each volume that may be affected by the update.

Figure 7 – Creating Snapshot Volumes

Figure 8 – Snapshot Volumes Created
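
Creating the snapshots might look like this sketch. The snapshot names ssroot and sshome match the ones used later in this article; the sizes are placeholders, not recommendations, and should be chosen to hold every block the update and a later restore will change. The run helper is an illustrative dry-run wrapper.

```shell
#!/bin/sh
# Sketch: snapshot the root and home volumes in the "sys" volume group.
# Assumption: both sizes are placeholders; size them for your own system.
ROOT_SNAP_SIZE="${ROOT_SNAP_SIZE:-4G}"
HOME_SNAP_SIZE="${HOME_SNAP_SIZE:-2G}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

create_snapshots() {
    # -s creates a snapshot of the origin volume given as the last argument.
    run lvcreate -s -L "$ROOT_SNAP_SIZE" -n ssroot /dev/sys/root
    run lvcreate -s -L "$HOME_SNAP_SIZE" -n sshome /dev/sys/home
    run lvs sys   # list the volumes to confirm the snapshots exist
}

create_snapshots
```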

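  7. Mount the system/root snapshot volume on /mnt.
  8. Use the rollback file /mnt/etc/fstab.rollback as the snapshot’s fstab, so that the snapshot references the LVM snapshot devices for all mounted file systems.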

Figure 9 – Prepare Rollback fstab
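
The last two preparation steps could be sketched as below. This assumes the root snapshot already contains etc/fstab.rollback (that is, the copy was created before the snapshot was taken); the run helper is an illustrative dry-run wrapper.

```shell
#!/bin/sh
# Sketch: make the root snapshot boot with the rollback fstab.
# Assumption: /etc/fstab.rollback existed when the snapshot was created.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

activate_rollback_fstab() {
    run mount /dev/sys/ssroot /mnt
    # Only the snapshot's fstab changes; the origin volume keeps its own.
    run cp /mnt/etc/fstab.rollback /mnt/etc/fstab
    run umount /mnt
}

activate_rollback_fstab
```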

Update the System

Once you have prepared the LVM volumes with snapshots of the current system, you can proceed with a normal online update. When the update finishes, reboot the server to activate the new kernel and the other updated components.

Figure 10 – Start Online Update

Figure 11 – Online Update in Progress

Figure 12 – Updated GRUB Menu

Accept the Update

Figure 13 – Updated System

Usually the update goes smoothly, and you want to accept the update. The only thing needed to accept the update is to remove the snapshots with lvremove.

Figure 14 – Remove Snapshots
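
Accepting the update comes down to something like the following sketch, using the snapshot names from this article. lvremove asks for confirmation before removing an active volume. The run helper is an illustrative dry-run wrapper.

```shell
#!/bin/sh
# Sketch: accept the update by dropping both snapshots.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

accept_update() {
    run lvremove /dev/sys/ssroot /dev/sys/sshome
    run lvs sys   # only the origin volumes should remain
}

accept_update
```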

Figure 15 – Updated System without Snapshots

Reject the Update

If for any reason you do not want the update on your server, you can roll back the changes. The typical reason for a rollback is unexpected behavior with third-party applications or drivers. The procedure is:

  1. Restore /boot
  2. Reboot using the snapshots. The root snapshot must use an fstab that references the LVM snapshot devices for all mounted file systems (see Figure 9 – Prepare Rollback fstab in the Prepare to Update procedure above).

Figure 16 – Restore Boot Partition

  3. Change to root=/dev/sys/ssroot on the GRUB boot options line.

Figure 17 – Booting to Snapshots

Figure 18 – Running System on Snapshots

  4. Reformat the LVM origin volume devices

WARNING: If the snapshots are too small, then you may get errors similar to this one when running pvscan or reading the snapshot volume.

/dev/dm-3: read failed after 0 of 4096 at 10736288: Input/output error
/dev/dm-3: read failed after 0 of 4096 at 0: Input/output error

Running out of space on the LVM snapshot device is most likely to occur after you reformat your devices, because every block that is rewritten on an origin volume is first copied into its snapshot. This is bad timing and could irreparably break the server, requiring a complete reinstall of the operating system. So make sure you have enough space when you create your snapshots.
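
A check along the following lines, run before reformatting, can show how full each snapshot already is. This is only a sketch: the 50 percent default limit is an arbitrary assumption, the function names are made up for this example, and snap_percent is assumed to be available as an lvs report field on your LVM version (plain lvs output shows the same figure in its snapshot usage column).

```shell
#!/bin/sh
# Sketch: refuse to start the restore when a snapshot is nearly full.
# Assumption: the 50 percent default limit is arbitrary; pick your own.

# Succeeds when the fill percentage ($1) is below the limit ($2).
snapshot_has_room() {
    awk -v used="$1" -v limit="$2" 'BEGIN { exit !(used + 0 < limit + 0) }'
}

# Usage (as root): check_snapshot /dev/sys/ssroot 50
check_snapshot() {
    # snap_percent reports how full an active snapshot is.
    used="$(lvs --noheadings -o snap_percent "$1" | tr -d ' ')"
    if snapshot_has_room "$used" "${2:-50}"; then
        echo "$1: ${used}% used, OK"
    else
        echo "$1: ${used}% used, too full to restore safely" >&2
        return 1
    fi
}
```

Keep in mind that the restore itself adds to this figure, so a snapshot that looks comfortable before the reformat can still fill up during the copy.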

Figure 19 – Reformatting File Systems

  5. Restore /home
    1. Reformat /dev/sys/home
    2. Mount /dev/sys/home to /mnt
    3. Copy /home/* to /mnt
    4. Unmount /mnt

Figure 20 – Restoring Home
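
With the system running on the snapshots, /home is the mounted sshome snapshot, so the home restore can be sketched as follows. The file system type is an assumption (ext3 here); use whatever the origin volume was formatted with. The run helper is an illustrative dry-run wrapper.

```shell
#!/bin/sh
# Sketch: rebuild the origin home volume from the mounted snapshot.
# Assumption: the volumes use ext3; set FSTYPE to match your system.
FSTYPE="${FSTYPE:-ext3}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

restore_home() {
    run mkfs -t "$FSTYPE" /dev/sys/home
    run mount /dev/sys/home /mnt
    # "/home/." copies hidden files too; -a preserves owners, modes and times.
    run cp -a /home/. /mnt/
    run umount /mnt
}

restore_home
```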

  6. Restore / (root)
    1. Reformat /dev/sys/root
    2. Mount /dev/sys/root to /mnt
    3. Create mount point and used directories in /mnt
    4. Copy the snapshot root file system to the origin root file system on /mnt
    5. Validate the root restore operation

Figure 21 – Restoring Root

  7. Restore the /mnt/etc/fstab.original file on the mounted origin root volume

Figure 22 – Restoring fstab
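
The root restore and the fstab step could be sketched as below. Note that this uses cp -ax as a shortcut instead of creating the mount point directories by hand: the -x option keeps cp on the root file system, so directories that are mount points are recreated empty rather than copied. The ext3 file system type is an assumption, and the run helper is an illustrative dry-run wrapper.

```shell
#!/bin/sh
# Sketch: rebuild the origin root volume while running on the snapshots.
# Assumption: the volumes use ext3; set FSTYPE to match your system.
FSTYPE="${FSTYPE:-ext3}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

restore_root() {
    run mkfs -t "$FSTYPE" /dev/sys/root
    run mount /dev/sys/root /mnt
    # -x stays on the root file system: mount points such as /proc, /sys,
    # /home, /boot and /mnt are created as empty directories only.
    run cp -ax /. /mnt/
    # Rough check that the copy is about the same size as the source.
    run df -h / /mnt
    # Put back the fstab that references the origin volumes.
    run cp /mnt/etc/fstab.original /mnt/etc/fstab
    run umount /mnt
}

restore_root
```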

  8. Reboot normally

Figure 23 – Rebooting without Snapshots

Figure 24 – GRUB Menu Rolled Back

  9. The system has been rolled back, and is using the original restored LVM volumes.

Figure 25 – System Rolled Back

  10. Remove the LVM snapshot volumes.

Figure 26 – Removing Snapshots

At this point you might be wondering, “Why don’t I just use the snapshot volumes and delete the original volumes?” LVM binds its snapshot volumes to the original volumes. The snapshots are just deltas of the data, so the original volume must be present as a reference point. If you attempt to delete the original volumes, you will get the following error:

rblvm:~ # lvremove /dev/sys/root
  Can't remove logical volume "root" under snapshot

rblvm:~ # lvremove /dev/sys/home
  Can't remove logical volume "home" under snapshot

Figure 27 – System Rollback Complete

  11. Roll back the boot loader
    1. Just run grub-install <install_device> or lilo -v to reinstall the original boot loader.

Rollback from Disaster

The procedure works well when the update finishes and the reboot succeeds. But what if the server will not even boot after an update? How do you roll back the changes then?

There is an easy way and a more difficult way to do this. The easy way is to use Boot Installed System to boot from the snapshot device and follow the procedure above to restore the system to its state prior to the update. The more difficult way is more flexible and generally works even when Boot Installed System does not. It involves booting into rescue mode and restoring the file systems.

Procedure using Boot Installed System

The procedure using Boot Installed System is the easiest way to roll back a system that has failed to boot or experienced some other catastrophic failure. Test case one is an example of a system ramdisk that did not update properly.

Figure 28 – Test Case #1: Ramdisk Update Failure

The steps are basically the same as “Reject the Update” above, except that you skip the initial reboot into the snapshot images: you are already in a failed state and must boot to the snapshot images anyway. As a result, you restore /boot after booting to the snapshots rather than before. I have summarized the steps below.

  1. Boot from CD1
  2. Select Installation
  3. Choose your language and keyboard language
  4. Select Other
  5. Select Boot Installed System

Figure 29 – Boot Installed System

  6. Choose /dev/sys/ssroot from which to boot.

NOTE: If /dev/sys/root was damaged or destroyed during the update process, then Boot Installed System will automatically pick /dev/sys/ssroot and boot from it.

Figure 30 – Select Root Snapshot

For details regarding the remaining steps, refer to Reject the Update above.

  1. Restore /boot
  2. Restore /home
  3. Restore / (root)
  4. Restore the /mnt/etc/fstab.original file
  5. Reboot normally
  6. The system has been rolled back
  7. Remove the LVM snapshot volumes
  8. Rollback the boot loader

Procedure using Rescue Mode

The rescue mode procedure is a bit more complicated, but not too bad. The init command itself is missing or damaged in test case two, and the server fails to boot after the update.

Figure 31 – Test Case #2: init Binary Update Failure

This is a case where Boot Installed System will fail because the init binary is missing on the installed system. You must follow the rescue mode procedure below to roll back the updates.

  1. Boot from CD1
  2. Select Rescue System
  3. Type “root” at the Rescue login: prompt
  4. The system should detect and activate all LVM volumes.
    1. If you need to manually activate LVM volumes, first run vgscan
    2. Next activate all detected volumes with vgchange -ay
    3. Run lvs to see all active volumes

Figure 32 – Booting to Rescue Mode
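
If the rescue system does not activate the volumes on its own, the manual activation described in step 4 might look like this sketch. The run helper is an illustrative dry-run wrapper that prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: find and activate all LVM volumes from the rescue system.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

activate_lvm() {
    run vgscan          # scan all disks for volume groups
    run vgchange -ay    # activate every logical volume that was found
    run lvs             # origins and snapshots should both be listed
}

activate_lvm
```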

  5. Restore /boot
    1. Mount the boot device (/dev/sda1) to /mnt
    2. Recursively delete all files on the mounted boot partition, except bootback.tbz.
    3. Extract bootback.tbz to restore the boot file system

Figure 33 – Restoring Boot

    4. If the boot file system is unrecoverable, you will need to reformat it and restore it from the copy of bootback.tbz you saved on the root file system, or from your tape backup.
    5. Unmount /mnt

Figure 34 – Boot Restored
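
The boot restore in rescue mode could be sketched as follows. It assumes that bootback.tbz sits in the top directory of the boot partition and was created with paths relative to /boot, so that it can be extracted directly into /mnt; if your archive stores paths differently, adjust the tar command. The run helper is an illustrative dry-run wrapper.

```shell
#!/bin/sh
# Sketch: restore the boot partition from bootback.tbz in rescue mode.
# Assumption: the archive was created relative to /boot (tar -C /boot ...).
BOOT_DEV="${BOOT_DEV:-/dev/sda1}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

restore_boot() {
    run mount "$BOOT_DEV" /mnt
    # Remove everything at the top level except the backup archive.
    run find /mnt -mindepth 1 -maxdepth 1 ! -name bootback.tbz -exec rm -rf {} +
    run tar -C /mnt -xjf /mnt/bootback.tbz
    run umount /mnt
}

restore_boot
```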

  6. Restore /home
    1. Reformat the LVM origin home device (/dev/sys/home)
    2. Mount the LVM snapshot device (/dev/sys/sshome) on /media
    3. Mount the reformatted LVM origin device (/dev/sys/home) on /mnt
    4. Recursively copy all files from the /media source to the /mnt destination
    5. Validate the copy
    6. Unmount /media and /mnt

Figure 35 – Restoring Home

  7. Restore / (root)
    1. Reformat /dev/sys/root
    2. Mount /dev/sys/ssroot to /media
    3. Mount /dev/sys/root to /mnt
    4. Create an empty directory on /mnt for proc, sys and any other mount points you have for your system
    5. Copy all files from all other /media directories to the /mnt destination
    6. Validate the copy
    7. Restore the /mnt/etc/fstab.original file on the mounted origin root volume
    8. Unmount /media and /mnt

Figure 36 – Restoring Root
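
In rescue mode the /home and / restores follow the same pattern, which the sketch below wraps in one hypothetical helper, restore_volume. Because nothing is mounted beneath the snapshot in rescue mode, a single recursive copy also carries over the empty proc and sys directories, so they are not created separately here. The ext3 file system type and the use of diff for validation are assumptions; the run helper is an illustrative dry-run wrapper.

```shell
#!/bin/sh
# Sketch: restore both origin volumes from their snapshots in rescue mode.
# Assumption: the volumes use ext3; set FSTYPE to match your system.
FSTYPE="${FSTYPE:-ext3}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

# Usage: restore_volume ORIGIN SNAPSHOT
restore_volume() {
    run mkfs -t "$FSTYPE" "$1"
    run mount "$2" /media      # snapshot: the source of the copy
    run mount "$1" /mnt        # freshly formatted origin: the destination
    run cp -a /media/. /mnt/
    # One possible validation; lost+found and special files may be reported.
    run diff -rq /media /mnt
}

rescue_restore() {
    restore_volume /dev/sys/home /dev/sys/sshome
    run umount /media
    run umount /mnt

    restore_volume /dev/sys/root /dev/sys/ssroot
    # Put back the fstab that references the origin volumes.
    run cp /mnt/etc/fstab.original /mnt/etc/fstab
    run umount /media
    run umount /mnt
}

rescue_restore
```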

  8. Reboot normally
  9. The system has been rolled back.
  10. Remove the LVM snapshot volumes.
  11. Reinstall the boot loader

Figure 37 – Deleting Snapshots and Restoring Bootloader

Conclusion

All updates should be tested prior to applying them to production servers. However, there are times when the production server still behaves unexpectedly. When this happens, it’s nice to have a method that lets you roll back all the changes and get the production server back online as quickly as possible. You should practice this procedure in a non-production test environment so that you are proficient if and when you need to do the same on a production server.

Categories: SUSE Linux Enterprise Server, Technical Solutions

Disclaimer: As with everything else at SUSE Conversations, this content is definitely not supported by SUSE (so don't even think of calling Support if you try something and it blows up).  It was contributed by a community member and is published "as is." It seems to have worked for at least one person, and might work for you. But please be sure to test, test, test before you do anything drastic with it.

1 Comment

  1. By:amit_rj27

    Very nice step by step info.Keep ‘em coming :)
