Block Storage, Object Storage, and File Systems: What They Mean for Containers


One of the things that often surprises administrators when they first
begin working with Docker containers is that containers natively use
non-persistent storage: when a container is removed, so too is the
container's storage. Of course, containerized applications would be of
very limited use if there were no way of storing data persistently.
Fortunately, there are ways to implement persistent storage in a
containerized environment. Although a container's own native storage is
non-persistent, a container can be connected to storage that is external
to the container. This allows persistent data to be stored, since this
external storage is not removed when the container is removed.

The first step in deciding how to implement persistent storage for your
containers is to determine the type of underlying storage system that
you will use. In this regard, there are three main options generally
available: file system storage, block storage, and object storage.
Below, I explain the differences between each type of storage and what
they mean when it comes to setting up storage for a containerized
environment.
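
To make the distinction concrete, here is a minimal Python sketch. It is
a toy model, not how Docker is implemented: a container's file system is
represented as a collections.ChainMap whose first map is a per-container
writable layer stacked over read-only image layers (loosely mimicking a
union file system), and paths under a hypothetical /data/ mount point are
redirected to an external store that outlives the container. The class
name ToyContainer and all paths are assumptions made for illustration.

```python
from collections import ChainMap


class ToyContainer:
    """Hypothetical model of a container's storage (not Docker's design).

    Reads fall through a stack of layers (writable layer first, then the
    read-only image layers). Paths under `mount_point` are routed to an
    external volume instead of the layers.
    """

    def __init__(self, image_layers, volume, mount_point="/data/"):
        self.writable = {}                           # per-container layer
        self.fs = ChainMap(self.writable, *image_layers)
        self.volume = volume                         # external storage
        self.mount_point = mount_point

    def write(self, path, content):
        if path.startswith(self.mount_point):
            self.volume[path] = content              # bypasses the layers
        else:
            self.fs[path] = content                  # ChainMap writes only to the first map

    def read(self, path):
        if path.startswith(self.mount_point):
            return self.volume[path]
        return self.fs[path]

    def remove(self):
        # Removing the container discards its writable layer.
        self.writable.clear()


image = [{"/etc/app.conf": "defaults"}]   # read-only image layer
host_dir = {}                              # stands in for external storage

c1 = ToyContainer(image, host_dir)
c1.write("/tmp/cache.txt", "scratch")
c1.write("/data/orders.db", "order #1")
c1.remove()

c2 = ToyContainer(image, host_dir)
print("/tmp/cache.txt" in c2.fs)    # False: lost with c1's writable layer
print(c2.read("/data/orders.db"))   # order #1: survived in external storage
```

Because writes outside the mount point land in the writable layer, the
image layers themselves are never modified, which is why a fresh
container starts from the same image contents every time.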

File System Storage

File system storage has been around for decades and stores data as
files. Each file is referenced by a filename and typically has
attributes associated with it. Commonly used examples include local file
systems such as NTFS and network file systems such as NFS. When it comes
to configuring a container to store data persistently, file system
storage is one of the simplest options to implement.

Probably the best-known example of file system storage (as it relates
to containers) is host-based persistence. The idea behind host-based
persistence is simple. Containers reside on a host server, and this host
server has its own operating system and its own file system. Containers
can be configured to store persistent data within a dedicated folder on
the host server's file system. Docker containers normally use a union
file system to assemble the container layers into a cohesive file
structure. Host-based persistence bypasses the union file system for the
data that needs to be stored persistently, and stores that data using
the same file system that is in use on the host.

The primary problem with simple host persistence is that it seriously
undermines container portability. When host persistence is used, a
resource the container depends on (the persistent storage) resides
outside of the container, on the host server's native file system, so
the container cannot simply be moved to another host and still find its
data. To get around this problem, other flavors of host persistence have
been created. For example, multi-host persistence uses a distributed
file system to replicate persistent storage across multiple host
servers.

The takeaway: File system storage is probably the most awkward match for
containers, because file systems were not originally designed with
portability in mind. As I've noted, however, there are ways to implement
container-friendly file storage; this is usually done by distributing a
file system across multiple servers.
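
The multi-host approach can be sketched in a few lines of Python. This is
a hypothetical model that assumes the simplest possible replication
scheme (every write is copied synchronously to every host); real
distributed file systems also have to deal with network partitions,
consistency, and replica placement, none of which is modeled here.
Dictionaries stand in for each host's local disk, and the class and host
names are made up.

```python
class ReplicatedVolume:
    """Hypothetical multi-host volume: each write goes to every host's replica."""

    def __init__(self, hosts):
        # One dict per host stands in for that host's local disk.
        self.replicas = {host: {} for host in hosts}

    def write(self, path, content):
        # Synchronous full replication: the write is copied to all hosts.
        for disk in self.replicas.values():
            disk[path] = content

    def read(self, host, path):
        # A container on any participating host reads its local replica.
        return self.replicas[host][path]

    def fail(self, host):
        # Losing a host loses only that host's copy.
        del self.replicas[host]


volume = ReplicatedVolume(["host-a", "host-b", "host-c"])
volume.write("/data/orders.db", "order #1")        # container runs on host-a

volume.fail("host-a")                               # host-a goes down...
print(volume.read("host-b", "/data/orders.db"))    # ...rescheduled on host-b: order #1
```

Full synchronous replication keeps the sketch short, but it makes every
write as slow as the slowest host, which is one reason real systems use
more selective schemes.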

Block Storage

Block storage is another storage option for containers. Whereas file
system storage organizes data into files (typically arranged in a
hierarchy of folders), block storage stores data in fixed-size chunks
called blocks. A block is identified only by its address. A block has no
filename, nor does it have any metadata of its own. Blocks only become
meaningful when they are combined with other blocks to form a complete
piece of data.
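
The following Python sketch illustrates what it means for a block to be
identified only by its address. The BlockDevice class, the store and load
helpers, and the 4-byte block size are illustrative assumptions (real
devices commonly use 512-byte or 4 KiB blocks). Note that the device
itself records nothing about which blocks belong together; that knowledge
lives in whatever layer sits on top, such as a file system or database.

```python
BLOCK_SIZE = 4  # tiny, for illustration only


class BlockDevice:
    """A raw device: numbered, fixed-size blocks with no names or metadata."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks

    def write_block(self, address, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        self.blocks[address] = data

    def read_block(self, address):
        return self.blocks[address]


def store(device, data, addresses):
    """Split data into blocks and write them at the given addresses.

    The device keeps no record of which blocks belong together; the caller
    (a file system or database) must remember `addresses` itself.
    """
    capacity = len(addresses) * BLOCK_SIZE
    if len(data) > capacity:
        raise ValueError("not enough blocks for the data")
    padded = data.ljust(capacity, b"\0")
    for i, address in enumerate(addresses):
        device.write_block(address, padded[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])


def load(device, addresses, length):
    # Only reassembling the right blocks in the right order yields the data.
    return b"".join(device.read_block(a) for a in addresses)[:length]


dev = BlockDevice(num_blocks=16)
message = b"hello, block storage"
extent = [7, 2, 11, 5, 9]                 # blocks need not be contiguous
store(dev, message, extent)
print(dev.read_block(2))                  # b'o, b': meaningless on its own
print(load(dev, extent, len(message)))    # b'hello, block storage'
```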
Block storage is commonly used for database applications because of its
performance. Block storage systems also commonly provide snapshot
capabilities, which allow a volume to be rolled back to a specific point
in time without having to restore a backup.

In the case of containers, block storage is sometimes implemented in the
form of container-defined storage. Container-defined storage is a form
of software-defined storage that is specifically intended for use in a
containerized environment, and it is often implemented inside of a
dedicated storage container. Rancher Labs, for example, has introduced
its own distributed block storage project, called Project Longhorn.

The basic idea behind Longhorn is relatively simple. A storage system
can contain numerous block storage volumes, and each of these volumes
can be mounted by only a single host at a time. That being the case,
Longhorn seeks to partition block storage controllers into large numbers
of smaller block storage controllers, each of which can be mapped to a
different block storage volume. If all of these block storage volumes
reside within a common pool of physical disks, then the Longhorn
approach allows an orchestration engine to create block storage volumes
on demand. For example, a block storage volume could conceivably be
created automatically at the same time that a container is created.
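
The on-demand provisioning idea can be sketched as follows. This is a
hypothetical Python model of the concept described above (one small
controller per volume, volumes carved from a shared pool, single-host
attachment), not Longhorn's actual code or API; StoragePool,
VolumeController, and start_container are invented names.

```python
import itertools


class VolumeController:
    """One small controller per volume (hypothetical model)."""

    def __init__(self, volume_id, size_gb):
        self.volume_id = volume_id
        self.size_gb = size_gb
        self.attached_host = None

    def attach(self, host):
        # A block volume can be mounted by only one host at a time.
        if self.attached_host not in (None, host):
            raise RuntimeError(f"{self.volume_id} is already attached to {self.attached_host}")
        self.attached_host = host


class StoragePool:
    """A common pool of physical capacity that volumes are carved from."""

    def __init__(self, capacity_gb):
        self.free_gb = capacity_gb
        self.controllers = {}
        self._ids = itertools.count(1)

    def create_volume(self, size_gb):
        if size_gb > self.free_gb:
            raise RuntimeError("not enough free capacity in the pool")
        self.free_gb -= size_gb
        controller = VolumeController(f"vol-{next(self._ids)}", size_gb)
        self.controllers[controller.volume_id] = controller
        return controller


def start_container(pool, host, size_gb):
    """What an orchestrator might do: provision a volume alongside the container."""
    volume = pool.create_volume(size_gb)
    volume.attach(host)
    return volume


pool = StoragePool(capacity_gb=100)
db = start_container(pool, host="node-1", size_gb=20)
web = start_container(pool, host="node-2", size_gb=5)
print(db.volume_id, web.volume_id, pool.free_gb)   # vol-1 vol-2 75
```

Giving each volume its own controller means a controller failure affects
only one volume, and the orchestrator never has to coordinate access to
a shared controller.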
The takeaway: Block storage is more flexible than file system storage,
which makes it easier to adapt for container environments. The main
challenge is making sure that block storage data is available across an
environment composed of multiple hosts, which can be addressed through
distributed storage.
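
The snapshot capability mentioned earlier in this section is often built
on copy-on-write or redirect-on-write techniques. The Python sketch below
assumes a redirect-on-write design: blocks are never overwritten in
place, so a snapshot only needs to copy the table that maps logical
blocks to physical ones, and a rollback simply restores that table. It is
a toy with invented names, and for brevity it never reclaims physical
blocks that are no longer referenced.

```python
class SnapshottingVolume:
    """Toy redirect-on-write volume.

    `table` maps logical block numbers to physical block IDs. Physical
    blocks are never overwritten, so a snapshot only copies the table.
    Unreferenced physical blocks are never garbage-collected here.
    """

    def __init__(self):
        self.physical = {}     # physical block ID -> data (append-only)
        self.table = {}        # logical block number -> physical block ID
        self.snapshots = {}    # snapshot name -> frozen copy of the table
        self._next_id = 0

    def write(self, lba, data):
        self.physical[self._next_id] = data   # always write to a fresh block
        self.table[lba] = self._next_id
        self._next_id += 1

    def read(self, lba):
        return self.physical[self.table[lba]]

    def snapshot(self, name):
        self.snapshots[name] = dict(self.table)   # copies pointers, not data

    def rollback(self, name):
        self.table = dict(self.snapshots[name])


vol = SnapshottingVolume()
vol.write(0, b"balance=100")
vol.snapshot("before-upgrade")
vol.write(0, b"balance=???")      # e.g. a bad schema migration
vol.rollback("before-upgrade")
print(vol.read(0))                 # b'balance=100'
```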

Object Storage

Object storage works differently from file system storage or block
storage. Rather than being referenced by a block address or a filename,
data is stored as an object and is referenced by an object ID. The
advantages of object storage are that it is massively scalable and
allows a high degree of flexibility with regard to associating
attributes (metadata) with objects. The disadvantage of object storage
is that it generally does not perform as well as block storage.

Because object storage is designed primarily for scalability, it is a
popular choice for public cloud providers. Docker containers can be
linked to object storage on Amazon Web Services or Microsoft Azure, but
doing so requires that the containerized application be specifically
designed to take advantage of object storage. Whereas a typical
application might be designed to access data through file system or
SCSI calls, object storage requires HTTP-based REST calls such as GET or
PUT. As such, object storage should generally be reserved for
applications that need massively scaled storage, or storage that needs
to span geographic boundaries.

The takeaway: Object storage can be more complex to implement because
it relies on REST calls, but the scalability that it provides makes it a
good choice for container environments where massive scalability is a
priority.
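
The REST-style access pattern described above can be sketched with
nothing but Python's standard library. The sketch runs a toy object store
in-process: objects live in a flat namespace keyed by an object ID,
arbitrary attributes travel as X-Meta- headers, and the client uses HTTP
PUT and GET rather than file system calls. The URL layout, the X-Meta-
header convention, and all function names are assumptions made for
illustration; cloud object stores define their own, provider-specific
APIs and headers.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

OBJECTS = {}   # object ID -> (data, metadata); flat namespace, no folders


def meta_from(headers):
    # Collect X-Meta-* headers (any capitalization) as object attributes.
    return {k[len("X-Meta-"):].lower(): v
            for k, v in headers.items() if k.lower().startswith("x-meta-")}


class ObjectHandler(BaseHTTPRequestHandler):
    """Hypothetical toy store: PUT /<object-id> stores, GET /<object-id> fetches."""

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        OBJECTS[self.path.lstrip("/")] = (self.rfile.read(length), meta_from(self.headers))
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        entry = OBJECTS.get(self.path.lstrip("/"))
        if entry is None:
            self.send_error(404)
            return
        data, metadata = entry
        self.send_response(200)
        for key, value in metadata.items():
            self.send_header(f"X-Meta-{key}", value)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass   # keep the demo quiet


def put_object(base_url, object_id, data, **metadata):
    headers = {f"X-Meta-{k}": v for k, v in metadata.items()}
    req = urllib.request.Request(f"{base_url}/{object_id}", data=data,
                                 headers=headers, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def get_object(base_url, object_id):
    with urllib.request.urlopen(f"{base_url}/{object_id}") as resp:
        return resp.read(), meta_from(resp.headers)


server = ThreadingHTTPServer(("127.0.0.1", 0), ObjectHandler)   # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

print(put_object(base, "invoice-0001", b"total=42", owner="billing"))   # 201
print(get_object(base, "invoice-0001"))   # (b'total=42', {'owner': 'billing'})
server.shutdown()
server.server_close()
```

The application never opens a file or issues a block-level read; every
interaction is an HTTP request, which is exactly why existing
applications usually need to be rewritten to use object storage.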

Brien Posey is a freelance technology author and 15-time Microsoft MVP.
Prior to going freelance, Posey was a CIO for a national chain of
hospitals and healthcare facilities. He also served as Lead Network
Engineer for the United States Department of Defense at Fort Knox. In
addition to his continued work in IT, Posey is in his third year of
training as a commercial scientist-astronaut candidate.
