Case files of a TSE: A CoW ate my filesystem

It’s not a bug, it’s a feature!


This article is part of a series that showcases the kind of work SUSE Support does and how we help customers resolve issues they encounter when running SUSE products. The selected cases are based on real ones. However, all details are fully anonymized and stripped of identifying marks.
Part of our job in technical support is to help customers who believe the software is malfunctioning but are not aware of its usage, features and limitations. I know this just sounds like a fancy take on the classic catchphrase “it’s not a bug, it’s a feature!”
Here is an excerpt from the Jargon Lexicon archive about that phrase if you are unfamiliar:
If you want to read other humorous Linux jargon definitions, the entire jargon archive is here:
More often than not, when customers come in complaining of a defect or a missing feature, the issue is usage. The chosen utility is not appropriate for the task. Or what the customer wishes to achieve can actually be done by the utility, by specifying the correct command line option. Or there is a complex interaction between several utilities that is documented but not well known.

What’s taking up all this space?


A good example is a customer who wrote in complaining of a full btrfs filesystem when the data that they were using there should take up about 25% of the space!
First things first, we must know a little about btrfs. Btrfs is now the default root filesystem type for SLES. It’s not mandatory; ext4 is supported too. Information on btrfs is available on the btrfs wiki, in the SUSE product documentation and, indeed, in the btrfs man pages:
Btrfs (usually pronounced butter-ef-es) is a copy on write (CoW) filesystem for Linux aimed at implementing advanced features while also focusing on fault tolerance, repair and easy administration. 

What is that CoW, and why is it eating my space?


Essentially, rather than overwriting old data in place when a file is changed, btrfs writes the new data to a new location and then points the file to it. That old data? It stays on disk for as long as something still references it, most conveniently in the form of “snapshots”.
We’ve all performed an update that broke our custom program or made a configuration change that we deeply regret. With btrfs we can very simply revert the system to an earlier snapshot, provided we have made one. It’s the time machine utility that system administrators have been hoping for all these years! Btrfs has definitely gotten many users (including me) out of tight jams.
Unfortunately, all these new advantages also take some learning and getting used to. Btrfs is not like ext4 or xfs. Back to the customer in question: they couldn’t figure out why so much space was being taken by their data. It should have been 5GB, nowhere near the 20GB disk capacity!
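To make the space accounting concrete, here is a minimal toy model in Python. It is only a sketch: the class, the file name and the block counts are all made up for illustration, whole files are rewritten at once, and real btrfs tracks extents and reference counts in a far more sophisticated way. It shows how a file that only ever holds 5 blocks of live data can pin 20 blocks on disk once a few snapshots reference the older versions.

```python
class ToyCowVolume:
    """Toy copy-on-write volume (illustrative only, not how btrfs is implemented)."""

    def __init__(self):
        self._next_block = 0
        self.live = {}        # file name -> list of block ids in the live tree
        self.snapshots = []   # frozen copies of the live file -> blocks mapping

    def _allocate(self, count):
        # New data always lands in fresh blocks; nothing is overwritten in place.
        blocks = list(range(self._next_block, self._next_block + count))
        self._next_block += count
        return blocks

    def write(self, name, size_blocks):
        """Rewrite a whole file: the live tree now points at newly allocated blocks."""
        self.live[name] = self._allocate(size_blocks)

    def snapshot(self):
        """Freeze the current mapping so the old blocks stay referenced."""
        self.snapshots.append({name: list(blocks) for name, blocks in self.live.items()})

    def live_blocks(self):
        """What adding up the files in the live tree would report."""
        return sum(len(blocks) for blocks in self.live.values())

    def allocated_blocks(self):
        """Blocks still referenced by the live tree or by any snapshot."""
        referenced = set()
        for tree in [self.live] + self.snapshots:
            for blocks in tree.values():
                referenced.update(blocks)
        return len(referenced)


vol = ToyCowVolume()
vol.write("db.dat", 5)
for _ in range(3):
    vol.snapshot()
    vol.write("db.dat", 5)   # the file is completely rewritten after each snapshot

print(vol.live_blocks())       # 5
print(vol.allocated_blocks())  # 20
```

The numbers are chosen to echo the 5GB versus 20GB gap above, not to reconstruct the customer's system. The point is only that blocks referenced by a snapshot cannot be freed, however small the live data looks.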

Right tools for the job


Long story short, the customer was using the standard “df” and “du” utilities. These don’t give the whole picture because they don’t understand CoW filesystems, how they interact with various RAID levels or the space taken up by snapshots. So, instead we should use the “btrfs fi show” or “btrfs fi usage” commands for btrfs filesystems. I give a quick rundown and a sample of btrfs fi show output in this TID I wrote about what to do when you encounter “ENOSPC” errors during balancing of a btrfs filesystem.
Once I had explained the outlines of btrfs and its utilities, the customer was more understanding.
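The difference between the generic tools and the btrfs-aware ones can be sketched in Python. The helper names below are hypothetical. The first two functions approximate what df and du work from: the statvfs totals for the filesystem, and per-file sizes in the live directory tree. Neither interface has any notion of data versus metadata allocation or RAID profiles. The last two simply shell out to “btrfs filesystem usage” (“fi” is the short form of “filesystem”), assuming the btrfs tool is installed and the path is on a mounted btrfs filesystem.

```python
import os
import shutil
import stat
import subprocess


def df_style_used_bytes(path):
    """Used bytes computed from statvfs, the kind of totals df works from."""
    st = os.statvfs(path)
    return (st.f_blocks - st.f_bfree) * st.f_frsize


def du_style_apparent_bytes(root):
    """Sum of apparent sizes of regular files under root.

    Roughly what `du --apparent-size` reports, counting files only and
    ignoring hard-link de-duplication.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            info = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
    return total


def btrfs_usage_command(mountpoint):
    """Argument list for the btrfs-aware view of the same filesystem."""
    return ["btrfs", "filesystem", "usage", mountpoint]


def btrfs_usage(mountpoint):
    """Return the text output of `btrfs filesystem usage`, or None if unavailable.

    None is returned when the btrfs tool is not installed or the command
    fails (for example, because the path is not on btrfs).
    """
    if shutil.which("btrfs") is None:
        return None
    result = subprocess.run(
        btrfs_usage_command(mountpoint),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout if result.returncode == 0 else None
```

Calling du_style_apparent_bytes on a data directory and df_style_used_bytes on its mount point gives the two numbers that looked contradictory in this case. The output of btrfs_usage for the same mount point is where the allocation breakdown actually shows up. Depending on the system, the btrfs command may need root privileges to report everything.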

The actual problem


We didn’t just help the customer select the correct utility to display their issue; along the way we also found and shared the reason the space was filling up. The underlying issue was that the customer had been running a database on /var/cache. Now, btrfs isn’t really beneficial for databases: their files change quite frequently, and because of CoW those changes fragment the files unnecessarily, fill up metadata and drag down performance. I’d advise against using btrfs for a database. If you are going to be running a small database on a btrfs filesystem, make sure to use the NODATACOW option when creating it and don’t use snapshots for the filesystem. The relevant product documentation is below:
I advised the customer on those limitations. I also suggested that they set a quota so that their root filesystem doesn’t fill up unexpectedly and cause problems like the one seen in this TID. Here is a link to the documentation on how to set up quotas.
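As an illustration of the no-CoW advice above, here is a small Python sketch. It uses the “C” (No_COW) file attribute set with chattr +C on a directory, so that files created in it afterwards inherit the attribute. This is a different mechanism from a filesystem-wide option and is my assumption for the example, not necessarily what was done in this case. The function name is hypothetical, and the attribute should be set while the directory is still empty.

```python
import os
import shutil
import subprocess


def make_nocow_dir(path):
    """Create `path` and try to set the No_COW attribute (chattr +C) on it.

    Returns True if chattr succeeded, False if chattr is not installed or
    the filesystem rejects the flag (for example, when it is not btrfs).
    New files created in the directory afterwards inherit the attribute;
    files that already contain data are not converted.
    """
    os.makedirs(path, exist_ok=True)
    if shutil.which("chattr") is None:
        return False
    result = subprocess.run(
        ["chattr", "+C", path],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0
```

A call such as make_nocow_dir("/srv/smalldb") (a made-up path) would be done once, before the database creates its files there. Whether the attribute took effect can be checked with lsattr -d on the directory.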
More and more distributions are adopting btrfs and including it as a default. We pride ourselves at SUSE on being ahead of the curve, having supported Btrfs as the default root filesystem since SLES 12. Our expertise in this filesystem and extensive contributions to its codebase enable our customers to already be in the new age of Linux filesystems. Thank you for finishing this article. As a treat, here is a picture of a silly cow giving some bad advice:
 ______________________________________  
< Only ~CoW~ards take regular backups! > 
 --------------------------------------  
     \ 
      \ 
        ,__, |    |  
        (oo)\|    |___ 
        (__)\|    |   )\_ 
             |    |_w |  \ 
             |    |  ||   *
made with cowsay, which is available as a package in openSUSE :)

Anthony Stalker

I’m a Frontline Technical Support Engineer working in the EMEA region.

If you open a Support Case with SUSE for a technical problem, maybe I will be the person who helps you get to a solution.