Managing configuration drift with Salt and Snapper
Many configuration management tools originate in the DevOps space and become immensely popular and while they do manage configuration, they are tailored towards deployment of new servers using this configuration and not towards auditing of existing servers.
For example, let’s imagine a server with the following state:
/etc/motd: file.managed: - source: salt://common/motd
If we apply this state (in `test` mode) on a non-compliant server:
$ salt minion1 state.apply test=True minion1: ---------- ID: /etc/motd Function: file.managed Result: None Comment: The file /etc/motd is set to be changed Started: 10:06:05.021643 Duration: 30.339 ms Changes: ---------- diff: --- +++ @@ -1 +1 @@ -Have a lot of fun... +This is my managed motd Summary for minion1 ------------ Succeeded: 1 (unchanged=1, changed=1) Failed: 0 ------------ Total states run: 1
Salt is able to tell us that there is a file that deviates from the configuration. And we can easily fix it by just removing `test=True`.
Now, let’s say an intruder adds a malicious entry to `/etc/hosts`:
If we re run our state in test mode:
$ salt minion1 state.apply test=True minion1: ---------- ID: /etc/motd Function: file.managed Result: None Comment: The file /etc/motd is set to be changed Started: 10:12:11.518105 Duration: 29.479 ms Changes: ---------- diff: --- +++ @@ -1 +1 @@ -Have a lot of fun... +This is my managed motd Summary for minion1 ------------ Succeeded: 1 (unchanged=1, changed=1) Failed: 0 ------------ Total states run: 1
As expected, it did not find anything, because this rule is not in the configuration.
Creating new systems vs auditing existing systems
This model works fine in the DevOps world where the culture is to take a random Linux image from the internet and use it as a base to deploy systems from scratch. As long as all tests pass, replacing the underlying image is not a problem. Only what is explicitly defined is evaluated against the configuration and defined as a drift.
When meeting enterprise customers who are starting to use configuration management to improve the control on their infrastructure, it turns out their expectations where different. “If I use Salt, will it tell me when somebody makes a change to the system?”. “Ugh.. no… well depends…”.
That was the point that I started to think about – how could we use the state system to do more generic auditing and how do you do it without ruining the experience of working with states? Then it all clicked – implicit state can be done explicitly by using another state. When the customer said “any change”, they were in reality saying “any change against my defined configuration” plus “any change since my last working configuration”.
So, we needed a way to manage “last working configuration” and turns out SUSE is where Snapper originated and Snapper is nowadays available with most Linux distributions.
Snapper is a set of tools over snapshots (mostly btrfs, but also works on others like ext4 if you have the required kernel/tool patches). Think of it of what docker did to containers, snapper does to snapshots. It adds the required workflows, terminology and tools to make them usable.
It also turns out that my system already has some snapshots, because just like I can manually take one, tools like YaST and zypper take snapshots before and after doing operations. You can even select previous snapshots from the bootloader and boot into the previous working system.
What if I could describe a state in Salt that said: “Nothing deviates from this snapshots, except….”.
Let’s do it
So during this year Department workshop I paired with Pablo and our project had the following steps:
- Complete the Salt execution module to expose the basic snapper operations you can do from the command line. Example:
salt minion1 snapper.create_snapshot
- Create a generic way for sysadmins to do Salt operations which can be reverted. We implemented this as a meta-call (a call taking another call as a parameter)
. So you can do something like:
$ salt minion2 snapper.run function=file.append args='["/etc/motd", "some text"]' minion2: Wrote 1 lines to "/etc/motd"
This will generate a snapshot before running the command, run the command and then take a snapshot afterwards, also adding metadata about the Salt job that did the change:
... pre | 21 | | Thu Jun 9 10:34:36 2016 | root | number | salt job 20160609103437556668 | salt_jid=20160609103437556668 post | 22 | 21 | Thu Jun 9 10:34:37 2016 | root | number | salt job 20160609103437556668 | salt_jid=20160609103437556668
Because in Salt, state is implemented as a method `state.apply` or `state.highstate`, calling `snapper.run function=state.apply` means you can rollback a failed `state.apply`.
And of course we not only exposed `snapper.diff` which takes the snapshot number but also a `snapper.diff_jid` which tells you what a Salt job changed:
$ salt minion2 snapper.diff_jid 20160609103437556668 minion2: ---------- /etc/motd: --- /.snapshots/21/snapshot/etc/motd +++ /.snapshots/22/snapshot/etc/motd @@ -1 +1,2 @@ Have a lot of fun... +some text
Additionally, you get `snapper.undo_jid` which you can guess what it does: it undoes the changes done by a specific salt job (which of course could be a “`state.apply“` run).
- And finally, allowing a system administrator to use snapshots as a baseline to apply state. Lets take the original example with the malicious user modifying `/etc/hosts’, we will add a snapper state rule:
my_baseline: snapper.baseline_snapshot: - number: 20 - ignore: - /var/log - /var/cache /etc/motd: file.managed: - source: salt://common/motd
Now we apply the state in test mode again:
$ salt minion1 state.apply test=True minion1: ---------- ID: my_baseline Function: snapper.baseline_snapshot Result: None Comment: 1 files changes are set to be undone Started: 12:20:24.899848 Duration: 1051.996 ms Changes: ---------- files: ---------- /etc/hosts: ---------- actions: - modified comment: text file diff: --- /etc/hosts +++ /.snapshots/21/snapshot/etc/hosts @@ -22,5 +22,3 @@ ff02::3 ipv6-allhosts -192.168.1.34 www.google.com - ---------- ID: /etc/motd Function: file.managed Result: None Comment: The file /etc/motd is set to be changed Started: 12:20:25.953348 Duration: 20.425 ms Changes: ---------- diff: --- +++ @@ -1 +1 @@ -Have a lot of fun... +This is my managed motd Summary for minion1 ------------ Succeeded: 2 (unchanged=2, changed=2) Failed: 0 ------------ Total states run: 2
Exactly what we expect!.
So with this you can use your configuration management to manage your state against a defined state and on top of that we give you the tooling to inspect and rollback configuration changes.
We will continue adding the missing pieces to give the administrators full overview and control over their running systems.
This content was originally published in the author’s personal blog
The team working on Salt at SUSE is hiring