How to make the best use of SUSE Linux Enterprise Micro self-healing capabilities
SUSE Linux Enterprise Micro (SLE Micro) is designed to keep deployments healthy and operational so they can keep running your workloads.
How does it work?
Unlike traditional Linux distributions, SLE Micro is a very small Linux operating system (OS) focused on running your workloads, either as containers or as virtual machines (VMs).
On a traditional Linux distribution, applying maintenance or security updates can replace parts of the OS (a library, for instance) while other parts are still using them, which can cause instability.
On SLE Micro, changes to the OS core are applied atomically, as a single unit called a transaction. A transaction can change the OS configuration or the software stack. It is performed on a new snapshot of the root filesystem, becomes active only after a reboot, and is managed by the transactional-update tool. To protect the integrity of SLE Micro across transactions, the OS core is read-only and cannot be modified (even by root) while it is running. SLE Micro can also monitor system health for each transaction.
If an RPM package installation (or removal, or upgrade) fails:
- On a traditional Linux distribution, the system can be left in an undefined state, for example with left-over files or processes still running.
- On SLE Micro, the failure is detected by transactional-update. The snapshot is discarded and the running system stays intact.
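The same behaviour can be relied on from a script. The sketch below is a minimal example, assuming it runs as root on SLE Micro; the function name apply_package_change is hypothetical, and it only uses transactional-update pkg install and its exit status.

```shell
#!/bin/bash
# Hypothetical helper: install packages as one transaction in a new snapshot.
# The running root filesystem is not modified either way.
apply_package_change() {
    if transactional-update pkg install "$@"; then
        echo "Transaction succeeded: reboot to activate the new snapshot."
    else
        echo "Transaction failed: snapshot discarded, running system unchanged." >&2
        return 1
    fi
}

# Example (as root): apply_package_change vim && reboot
```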
Because each OS change is stored in a snapshot, you can go back in time and roll back to a previous known working state.
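A minimal rollback sketch, assuming you have already picked a known good snapshot number from snapper list. transactional-update rollback sets that snapshot as the default root filesystem, so it takes effect after the next reboot. The helper name rollback_to_snapshot and the number 42 in the example are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: make snapshot $1 the default root filesystem again.
rollback_to_snapshot() {
    local snapshot="$1"
    # Only accept a plain snapshot number, as printed by "snapper list".
    case "$snapshot" in
        ''|*[!0-9]*)
            echo "usage: rollback_to_snapshot <snapshot-number>" >&2
            return 2 ;;
    esac
    transactional-update rollback "$snapshot" || return 1
    echo "Snapshot $snapshot becomes active after the next reboot."
}

# Example (as root; 42 is a placeholder number):
#   snapper list
#   rollback_to_snapshot 42 && reboot
```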
When preparing a deployment, the default setup is sometimes not enough, and you may need additional checks that are specific to your use case.
With the SLE Micro built-in health-checker, you can add more tests that run at system startup and increase system reliability.
By default, health-checker performs some basic tests on the system at boot time. If those tests fail, health-checker will:
- try to restart the failing services, if the system has already booted successfully with the current snapshot,
- roll back to the last known working state if this is the first boot of the snapshot, or if restarting the failing services did not work.
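To make these two rules explicit, here is a simplified model written as a shell function. It is not health-checker's code, and the function and argument names are hypothetical; it only encodes the rules listed above.

```shell
#!/bin/bash
# Simplified model of the recovery decision described above (not the real
# health-checker implementation).
#   $1: "yes" if the current snapshot has already booted successfully
#   $2: "yes" if restarting the failing services fixed them
recovery_action() {
    local booted_before="$1" restart_fixed="$2"
    if [ "$booted_before" != yes ]; then
        # First boot of this snapshot: go back to the last known working state.
        echo rollback
    elif [ "$restart_fixed" = yes ]; then
        echo continue
    else
        # Restarting did not help: roll back as well.
        echo rollback
    fi
}
```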
The health-checker tool can be extended with additional plugins, written in your favorite programming or scripting language.
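Assuming plugins follow the convention of the sshd example below (an executable that accepts check or stop and reports failure through a non-zero exit status), a runner could look like the sketch below. It is a simplified illustration of that contract, not health-checker's actual implementation, and run_plugin_checks is a hypothetical name.

```shell
#!/bin/bash
# Simplified illustration of a plugin runner (not health-checker's code).
# Calls every executable file in directory $1 with "check" and prints the
# name of each plugin that exits with a non-zero status.
run_plugin_checks() {
    local dir="$1" plugin failed=0
    for plugin in "$dir"/*; do
        # Skip the unexpanded glob and non-executable files.
        [ -f "$plugin" ] && [ -x "$plugin" ] || continue
        if ! "$plugin" check; then
            echo "FAILED: $(basename "$plugin")"
            failed=1
        fi
    done
    return "$failed"
}
```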
Customizing health-checker
As an example, let’s create a plugin that verifies the sshd service starts properly (code available at https://github.com/fcrozat/SUSECON-demos/blob/main/VM/deploy-container/sshd.sh):
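#!/bin/bash
# health-checker plugin: fail the check if sshd did not start properly.
run_checks() {
    systemctl is-failed -q sshd
    # Status 0 means sshd failed; anything but 1 ("not failed") fails the check.
    test $? -ne 1 && exit 1
}
stop_services() {
    systemctl stop sshd
}
case "$1" in
    check) run_checks ;;
    stop) stop_services ;;
    *) echo "Usage: $0 {check|stop}"
       exit 1 ;;
esac
exit 0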
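Before copying the plugin into place, you can try its check logic without touching the real sshd service. The sketch below is one way to do it, assuming a bash environment: the hypothetical helper check_plugin_with_stub puts a stub systemctl, which always exits with a status you choose, at the front of PATH and then runs the plugin with check.

```shell
#!/bin/bash
# Hypothetical test helper: run a plugin's "check" action against a stub
# systemctl that always exits with a chosen status.
#   $1: path to the plugin script
#   $2: exit status for the stub (for is-failed: 0 = failed, 1 = not failed)
check_plugin_with_stub() {
    local plugin="$1" stub_rc="$2" stub_dir rc=0
    stub_dir=$(mktemp -d)
    printf '#!/bin/sh\nexit %d\n' "$stub_rc" > "$stub_dir/systemctl"
    chmod +x "$stub_dir/systemctl"
    # The stub shadows the real systemctl for this one invocation only.
    PATH="$stub_dir:$PATH" "$plugin" check || rc=$?
    rm -rf "$stub_dir"
    return "$rc"
}

# Example:
#   check_plugin_with_stub ./sshd.sh 1; echo $?
#   check_plugin_with_stub ./sshd.sh 0; echo $?
```

With a stub status of 1 ("not failed") the plugin should exit with 0; any other stub status should make it exit with 1.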
Copy this file to /usr/local/libexec/health-checker/ with executable permissions.
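One way to do this is the install command from GNU coreutils, which copies the file and sets its mode in one step. The wrapper function install_plugin and the PLUGIN_DIR override are hypothetical conveniences, so the step can be tried in a scratch directory first.

```shell
#!/bin/bash
# Hypothetical wrapper: copy a plugin into the health-checker plugin
# directory with mode 0755 (executable by everyone, writable by the owner).
# PLUGIN_DIR can override the target directory, e.g. for a dry run.
install_plugin() {
    local src="$1"
    local dir="${PLUGIN_DIR:-/usr/local/libexec/health-checker}"
    # -D creates missing leading directories of the destination.
    install -D -m 0755 "$src" "$dir/$(basename "$src")"
}

# Example (as root): install_plugin sshd.sh
```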
Reboot the system to ensure this new test is not causing any regression.
You can now test that it works as designed: uninstall openssh-server with transactional-update pkg rm openssh-server and reboot. Watch the system closely during boot and you will notice that it boots twice. The first boot activates the new snapshot (with openssh-server removed), but health-checker detects that sshd is not starting properly and rolls back to the previous working state; the second reboot then activates the older snapshot. You can check this using journalctl -u health-checker.service.
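After the second boot, you can also confirm the rollback from a script. A minimal sketch, assuming the demo above: the helper verify_rollback is hypothetical and only combines rpm -q and systemctl is-active with their --quiet options.

```shell
#!/bin/bash
# Hypothetical post-boot check for the demo above: after the automatic
# rollback, openssh-server should be installed again and sshd active.
verify_rollback() {
    if rpm -q --quiet openssh-server && systemctl is-active --quiet sshd; then
        echo "Rollback confirmed: openssh-server installed, sshd active."
    else
        echo "Rollback not confirmed: check journalctl -u health-checker.service." >&2
        return 1
    fi
}

# Example, after the second boot:
#   verify_rollback
#   journalctl -u health-checker.service
```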
You are now ready to create additional tests for your deployments.
In our next blog post, we will look at how to easily deploy, update and roll back containers on SLE Micro.
Jan 30th, 2023