Rollback SLES Online Updates with LVM Snapshots
Overview
Configuration
Prepare to Update
Update the System
Accept the Update
Reject the Update
Rollback from Disaster
Conclusion
Overview
This document is offered as an alternative solution to Rollback SLES Online Updates with Software RAID1. If your system disk is configured with Logical Volume Management (LVM), you can use this solution to roll back a failed online update. You back up your boot directory, create an LVM snapshot of each system volume, and then update as usual. If you don’t like the update and want to return to your previous state, you boot the server from the snapshot volumes and restore the older data from the snapshots to the original LVM volumes. This article details that rollback procedure using LVM. The concept should be valid for EVMS as well, but an EVMS solution is outside the scope of this article.
A word of caution: the one thing this procedure does not roll back is application metadata changes. If an application changes its metadata format as part of the update and you then roll back the update, the older application will most likely not cope well with the new metadata format. Novell Storage Services (NSS) on Open Enterprise Server (OES) for Linux could experience something like this. I have not seen this happen with NSS, but the potential is there.
Configuration
I have installed SUSE Linux Enterprise Server (SLES) 10 with Service Pack 2 on an i386 machine with a minimal install and 512 MB of RAM. There is a /boot partition that is not LVM, and / and /home are LVM volumes. I always hate it when examples show only the simplest way to do something and leave you to figure out the more complicated cases that match your own system. So for this article I have used LVM devices for both / and /home to show how to roll back multiple volumes.
Figure 1 – Configuration Details
Prepare to Update
Before you proceed with a system update, you need to create snapshots of the system volumes in case you want to roll back the updates. The procedure to prepare for an update is:
- Perform a complete system backup to tape. No rollback procedure of any kind replaces a good backup solution.
- Back up the /boot directory. Because boot loader limitations prevent the boot partition from being an LVM volume, no LVM snapshot is available for it and it must be backed up separately.
Figure 2 – Boot Partition Backup
- Add a disk if necessary to extend the volume group. If in doubt, add a disk of the same size as your disk(s) currently in the volume group.
Figure 3 – Disk Added to Extend Volume Group
- Extend the volume group with sufficient space for the upgrade.
WARNING: If the volume group does not have sufficient free space, creating the snapshots fails with an error like this one:
Insufficient free extents (2) in volume group
Figure 4 – Extending sys Volume Group
Figure 5 – Volume Group Extended
- Create an original and rollback copy of the /etc/fstab file.
Figure 6 – Copies of fstab
- Create a snapshot volume for each volume that may be affected by the update.
Figure 7 – Creating Snapshot Volumes
Figure 8 – Snapshot Volumes Created
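- Mount the root snapshot volume (/dev/sys/ssroot) on /mnt
- In the mounted snapshot, use the rollback file /mnt/etc/fstab.rollback as the snapshot’s /mnt/etc/fstab, so that a system booted from the snapshot mounts the snapshot devices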
Figure 9 – Prepare Rollback fstab
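Since the figures are not reproduced here, the preparation steps above can be sketched as a short shell script. This is only a sketch under stated assumptions, not the exact commands from the figures: the new disk /dev/sdb, the 2G snapshot size, the /root/bootback.tbz location, and the assumption that /etc/fstab names the volumes as /dev/sys/root and /dev/sys/home are all illustrative. The script prints the commands by default and only runs them when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the "Prepare to Update" steps. Dry run by default: each command
# is printed instead of executed. Set DRY_RUN=0 to run for real (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

NEW_DISK=/dev/sdb   # hypothetical disk added to extend the volume group
SNAP_SIZE=2G        # hypothetical snapshot size; size it for your update

prepare_update() {
    # Back up /boot, then keep a second copy on the boot partition itself.
    run tar -cjf /root/bootback.tbz -C /boot . &&
    run cp /root/bootback.tbz /boot/bootback.tbz &&

    # Extend the sys volume group with the new disk.
    run pvcreate "$NEW_DISK" &&
    run vgextend sys "$NEW_DISK" &&

    # Keep an original fstab and a rollback fstab that names the snapshots.
    # Assumes fstab refers to the volumes as /dev/sys/root and /dev/sys/home.
    run cp /etc/fstab /etc/fstab.original &&
    run cp /etc/fstab /etc/fstab.rollback &&
    run sed -i -e 's#/dev/sys/root#/dev/sys/ssroot#' \
               -e 's#/dev/sys/home#/dev/sys/sshome#' /etc/fstab.rollback &&

    # Snapshot each volume the update may touch.
    run lvcreate -s -L "$SNAP_SIZE" -n ssroot /dev/sys/root &&
    run lvcreate -s -L "$SNAP_SIZE" -n sshome /dev/sys/home &&

    # Make the root snapshot boot with the rollback fstab.
    run mount /dev/sys/ssroot /mnt &&
    run cp /mnt/etc/fstab.rollback /mnt/etc/fstab &&
    run umount /mnt
}

prepare_update
```

The fstab copies are made before the snapshots are created so that both files also exist inside the root snapshot, which is where the later restore steps expect to find them.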
Update the System
Once you have prepared the LVM volumes with snapshots of the current system, you can proceed with a normal online update. When the update finishes, reboot the server to activate the new kernel and the other updated components.
Figure 10 – Start Online Update
Figure 11 – Online Update in Progress
Figure 12 – Updated GRUB Menu
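Every block the update changes on an origin volume is first copied into its snapshot, so the snapshots fill up as the update runs. A rough way to keep an eye on this is to read the snapshot usage that lvs reports. The sketch below is an illustration rather than part of the original procedure; the function names and the 80 percent limit are hypothetical.

```shell
#!/bin/sh
# Reads "name percent" pairs on stdin and prints the names of snapshots
# whose usage is at or above the given limit (in percent).
snapshots_over_limit() {
    awk -v limit="$1" 'NF == 2 && $2 + 0 >= limit + 0 { print $1 }'
}

# Queries the sys volume group. Origin volumes have an empty snap_percent
# field, so their lines have one column and are skipped by NF == 2.
check_sys_snapshots() {
    LC_ALL=C lvs --noheadings -o lv_name,snap_percent sys |
        snapshots_over_limit "${1:-80}"
}
```

Running check_sys_snapshots (as root) after the update prints any snapshot that is at least 80 percent full. Such a snapshot could be grown with lvextend before you rely on it for a rollback.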
Accept the Update
Figure 13 – Updated System
Usually the update goes smoothly, and you want to accept the update. The only thing needed to accept the update is to remove the snapshots with lvremove.
Figure 14 – Remove Snapshots
Figure 15 – Updated System without Snapshots
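As a minimal sketch of this step, assuming the snapshot names ssroot and sshome used elsewhere in this article, the snapshots can be removed in one lvremove call. The commands are printed by default and only executed when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: print each command. Set DRY_RUN=0 to execute (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

accept_update() {
    # -f skips the confirmation prompt; drop it to be asked per volume.
    run lvremove -f /dev/sys/ssroot /dev/sys/sshome &&
    # List what is left in the sys volume group.
    run lvs sys
}

accept_update
```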
Reject the Update
If for any reason you do not want the update on your server, you can roll back the changes. The typical reason for a rollback is unexpected behavior with third-party applications or drivers. The procedure is:
- Restore /boot
- Reboot, using the snapshots. The root snapshot must use an fstab that references the LVM snapshot devices for all mounted file systems (see Figure 9 – Prepare Rollback fstab in the Prepare to Update procedure above).
Figure 16 – Restore Boot Partition
- Change to root=/dev/sys/ssroot on the GRUB boot options line.
Figure 17 – Booting to Snapshots
Figure 18 – Running System on Snapshots
- Reformat the LVM origin volume devices
WARNING: If the snapshots are too small, you may get errors similar to these when running pvscan or reading the snapshot volume:
/dev/dm-3: read failed after 0 of 4096 at 10736288: Input/output error
/dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
An LVM snapshot is most likely to run out of space right after you reformat the origin device. That is the worst possible time, and it could irreparably break the server, requiring a complete reinstall of the operating system. So make sure you allocate enough space when you create your snapshots.
Figure 19 – Reformatting File Systems
- Restore /home
- Reformat /dev/sys/home
- Mount /dev/sys/home to /mnt
- Copy /home/* to /mnt
- Unmount /mnt
Figure 20 – Restoring Home
- Restore / (root)
- Reformat /dev/sys/root
- Mount /dev/sys/root to /mnt
- Create mount point and used directories in /mnt
- Copy the snapshot root file system to the origin root file system on /mnt
- Validate the root restore operation
Figure 21 – Restoring Root
- Restore the /mnt/etc/fstab.original file on the mounted origin root volume
Figure 22 – Restoring fstab
- Reboot normally
Figure 23 – Rebooting without Snapshots
Figure 24 – GRUB Menu Rolled Back
- The system has been rolled back, and is using the original restored LVM volumes.
Figure 25 – System Rolled Back
- Remove the LVM snapshot volumes.
Figure 26 – Removing Snapshots
At this point you might be wondering, “Why don’t I just use the snapshot volumes and delete the original volumes?” LVM binds its snapshot volumes to the original volumes. The snapshots hold only deltas of the data, so the original volume must be present as a reference point. If you attempt to delete the original volumes, you will get the following error:
rblvm:~ # lvremove /dev/sys/root
Can't remove logical volume "root" under snapshot
rblvm:~ # lvremove /dev/sys/home
Can't remove logical volume "home" under snapshot
Figure 27 – System Rollback Complete
- Roll back the boot loader
- Run grub-install <install_device> or lilo -v to reinstall the original boot loader.
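The restore steps that run while the system is booted on the snapshots can be sketched as follows. This is a sketch under assumptions, not the commands from the figures. The ext3 file system type is a placeholder for whatever your volumes originally used. Instead of creating the mount-point directories by hand, the sketch swaps in a bind mount of / on /media, which exposes only the root file system itself, with its mount-point directories empty. Commands are printed by default and only executed when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: print each command. Set DRY_RUN=0 to execute (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

FSTYPE=ext3   # assumption: use the file system type your volumes had

restore_from_running_snapshots() {
    # /home: reformat the origin, then copy from the mounted snapshot.
    run mkfs -t "$FSTYPE" /dev/sys/home &&
    run mount /dev/sys/home /mnt &&
    run cp -a /home/. /mnt/ &&
    run umount /mnt &&

    # / (root): reformat the origin and mount it.
    run mkfs -t "$FSTYPE" /dev/sys/root &&
    run mount /dev/sys/root /mnt &&

    # A plain bind mount does not carry submounts such as /proc, /sys,
    # /home, or /mnt, so the copy cannot recurse into the target.
    run mount --bind / /media &&
    run cp -a /media/. /mnt/ &&
    run umount /media &&

    # Rough validation: compare the space used by source and target.
    run df / /mnt &&

    # Put the original fstab back on the restored origin volume.
    run cp /mnt/etc/fstab.original /mnt/etc/fstab &&
    run umount /mnt
}

restore_from_running_snapshots
```

Reformatting the origin volumes is the point at which the snapshots grow fastest, so it is worth confirming that they have free space before setting DRY_RUN=0.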
Rollback from Disaster
The procedure works great when the update finishes and the reboot is successful. But what if the server will not even boot after an update? How do you roll back the changes then?
There is an easy way and a more difficult way to do this. The easy way is to use Boot Installed System to boot from the snapshot device, and then follow the procedure above to restore the system to its state before the update. The more difficult way is more flexible and generally works when Boot Installed System does not. It involves booting into rescue mode and restoring the file systems.
Procedure using Boot Installed System
The procedure using Boot Installed System is the easiest way to roll back a system that has failed to boot or experienced some other catastrophic failure. Test case one is an example of a system ramdisk that was not updated properly.
Figure 28 – Test Case #1: Ramdisk Update Failure
The steps are basically the same as in “Reject the Update” above, except that you do not restore /boot and then reboot into the snapshot images. The system is already in a failed state, so you must boot to the snapshot images first. That makes restoring /boot the second step instead of the first. I have summarized the steps below.
- Boot from CD1
- Select Installation
- Choose your language and keyboard language
- Select Other
- Select Boot Installed System
Figure 29 – Boot Installed System
- Choose /dev/sys/ssroot as the device from which to boot.
NOTE: If /dev/sys/root was damaged or destroyed during the update process, then Boot Installed System will automatically pick /dev/sys/ssroot and boot from it.
Figure 30 – Select Root Snapshot
For details regarding the remaining steps, refer to Reject the Update above.
- Restore /boot
- Restore /home
- Restore / (root)
- Restore the /mnt/etc/fstab.original file
- Reboot normally
- The system has been rolled back
- Remove the LVM snapshot volumes
- Rollback the boot loader
Procedure using Rescue Mode
The rescue mode procedure is a bit more complicated, but not too bad. The init command itself is missing or damaged in test case two, and the server fails to boot after the update.
Figure 31 – Test Case #2: init Binary Update Failure
This is a case where Boot Installed System will fail because the init binary is missing on the installed system. You must follow the rescue mode procedure below to roll back the updates.
- Boot from CD1
- Select Rescue System
- Type “root” at the “Rescue login:” prompt
- The system should detect and activate all LVM volumes.
- If you need to manually activate LVM volumes, first run vgscan
- Next activate all detected volumes with vgchange -ay
- Run lvs to see all active volumes
Figure 32 – Booting to Rescue Mode
- Restore /boot
- Mount the boot device (/dev/sda1) to /mnt
- Recursively delete all files on the mounted boot partition, except bootback.tbz.
- Extract bootback.tbz to restore the boot file system
Figure 33 – Restoring Boot
- If the boot file system is unrecoverable, you will need to reformat it and restore it from the copy of bootback.tbz on the root file system or from your tape backup.
- Unmount /mnt
Figure 34 – Boot Restored
- Restore /home
- Reformat the LVM origin home device (/dev/sys/home)
- Mount the LVM snapshot device (/dev/sys/sshome) on /media
- Mount the reformatted LVM origin device (/dev/sys/home) on /mnt
- Recursively copy all files from the /media source to the /mnt destination
- Validate the copy
- Unmount /media and /mnt
Figure 35 – Restoring Home
- Restore / (root)
- Reformat /dev/sys/root
- Mount /dev/sys/ssroot to /media
- Mount /dev/sys/root to /mnt
- Create an empty directory on /mnt for proc, sys and any other mount points you have for your system
- Copy all files from all other /media directories to the /mnt destination
- Validate the copy
- Restore the /mnt/etc/fstab.original file on the mounted origin root volume
- Unmount /media and /mnt
Figure 36 – Restoring Root
- Reboot normally
- The system has been rolled back.
- Remove the LVM snapshot volumes.
- Reinstall the boot loader
Figure 37 – Deleting Snapshots and Restoring Bootloader
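The rescue mode steps above, up to the reboot, can be sketched as a script run from the rescue shell. As before, this is a sketch under assumptions rather than a copy of the figures. The ext3 file system type is a placeholder, and the tar extraction assumes bootback.tbz was created relative to /boot (for example with tar -cjf ... -C /boot .). Commands are printed by default and only executed when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: print each command. Set DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

FSTYPE=ext3          # assumption: use the file system type your volumes had
BOOT_DEV=/dev/sda1   # boot device used in this article

restore_boot() {
    run mount "$BOOT_DEV" /mnt &&
    # Delete everything on the boot partition except the backup archive.
    run find /mnt -mindepth 1 -maxdepth 1 ! -name bootback.tbz \
        -exec rm -rf {} \; &&
    run tar -xjf /mnt/bootback.tbz -C /mnt &&
    run umount /mnt
}

# $1 = origin volume name, $2 = snapshot volume name (both in group sys)
copy_snapshot() {
    run mkfs -t "$FSTYPE" "/dev/sys/$1" &&
    run mount "/dev/sys/$2" /media &&
    run mount "/dev/sys/$1" /mnt &&
    run cp -a /media/. /mnt/ &&
    # Rough validation: compare the space used by source and target.
    run df /media /mnt
}

unmount_both() {
    run umount /media &&
    run umount /mnt
}

rescue_restore() {
    # Activate LVM by hand in case the rescue system did not.
    run vgscan && run vgchange -ay && run lvs &&
    restore_boot &&
    copy_snapshot home sshome && unmount_both &&
    copy_snapshot root ssroot &&
    # Make sure the mount points exist on the restored root.
    run mkdir -p /mnt/proc /mnt/sys &&
    run cp /mnt/etc/fstab.original /mnt/etc/fstab &&
    unmount_both
}

rescue_restore
```

After the normal reboot, removing the snapshots and reinstalling the boot loader remain as separate manual steps.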
Conclusion
All updates should be tested before they are applied to production servers. However, there are times when even the production server behaves unexpectedly. When this happens, it’s nice to have a method that lets you roll back all the changes and get the production server back online as quickly as possible. You should practice this procedure in a non-production test environment so you are proficient if and when you need to do the same on a production server.