Software containerization is unlikely to be at the top of the average storage administrator’s list of concerns. First and foremost, storage professionals are concerned with every enterprise’s most pressing problem: managing the ‘explosive’ growth in data volumes within the constraints of a limited IT budget.
However, there are compelling reasons for storage professionals to think about containerization now: its rapid adoption will affect the roadmaps of your existing storage providers and drive changes to your own organization’s data storage strategy.
Let’s take a moment to get a basic understanding of what containerization is all about, look a little deeper at its implications for your storage strategy, and discuss how open source software-defined storage can help you meet your data storage challenges.
Software containers 101 for storage professionals: the latest in a long line of abstractions
Containerization is the latest answer to the age-old development problem of getting software to run properly when it’s moved from one computing environment to another. That could be from the developer’s own laptop to a testing environment, from testing to production, from physical to virtual machines, or from on-premises to the cloud. For example, suppose an application is built with Python 2.7 on a laptop, tested with Python 3.0 on Linux VMs in AWS, and run in production on-premises on a different version of Linux. That one application lifecycle contains multiple configuration mismatches.
Differences in the underlying OS, in compute, and in how storage resources are accessed have all kinds of unwelcome effects, lengthening development cycles and making for a poor DevOps experience.
Containers package up services or applications, together with the libraries and configuration they depend on, so they can be abstracted from the underlying environment in which they run. Just as with server or storage virtualization, the principle is abstraction: the software is decoupled from the infrastructure beneath it. Containerization enables applications to be moved ‘seamlessly’ across different hardware and software environments.
As with many of the latest trends in IT, container adoption is being driven by the public cloud ‘hyperscalers’, and Google, in particular, has been a huge container champion. Massive public cloud providers need to drive down the cost of hardware and software, and their default approach to cost reduction is to use ‘white box’ commodity hardware and open source software. The open source Docker project is behind much of the noise around containerization, and the dominant platform for deploying, managing and scaling containerized applications is the open source Kubernetes project.
The hyperscalers are driving software containerization with the promise of cheaper, faster application development. But what are the implications for data storage?
Storage on Containers is Hard.
Containers are designed around the principle of statelessness and are ephemeral by design: Google starts and destroys a staggering 2 billion containers a week. Because a stateless container holds no data that needs to be saved or migrated, it has no need for persistent disk storage.
However, most real-world production applications sit on top of a database, and a database’s state must be preserved. This means the application needs access to persistent physical storage. Inevitably, at some point in the application development cycle, developers will need access to physical storage resources, and either they will come to the storage team asking for help or they’ll try to do it all themselves in the cloud. Either way, it’s essential that storage professionals consider the storage requirements of developers working with containers. I’m deliberately keeping this blog post at a high level, but if you would like a deeper dive into containers and persistent storage, I recommend this talk from Google’s Saad Ali.
Kubernetes communicates with storage through control plane interfaces called volume plugins. These plugins abstract the storage and make it portable along with the container. As a result, all the major storage players are building API-level connectivity to container volume plugins. Ironically, proprietary vendors are now finding their roadmaps driven by open source technologies: market demand is forcing them to develop to, and align with, open source. How does this approach stand up over the long term for proprietary storage providers? Well, in our view, it doesn’t.
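To make the volume-plugin idea concrete for storage teams, here is a minimal sketch, assuming a cluster where an administrator has already defined a StorageClass backed by some volume plugin. It uses only Python’s standard library to emit two standard Kubernetes core/v1 objects as JSON: a PersistentVolumeClaim that requests capacity, and a Pod that mounts that claim. The StorageClass name `fast-ssd`, the claim and pod names, the image `registry.example.com/orders-db:1.0` and the mount path `/data` are all hypothetical placeholders, not values from any real deployment.

```python
import json

# Hypothetical placeholders: "fast-ssd" stands in for whatever StorageClass
# your storage team has defined; the claim, pod and image names are illustrative.
STORAGE_CLASS = "fast-ssd"
CLAIM_NAME = "orders-db-data"
IMAGE = "registry.example.com/orders-db:1.0"


def persistent_volume_claim(name: str, size: str, storage_class: str) -> dict:
    """Build a PersistentVolumeClaim manifest (core/v1 API)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume can be mounted read-write by a single node.
            "accessModes": ["ReadWriteOnce"],
            # The StorageClass selects which provisioner/volume plugin is used.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def database_pod(name: str, claim_name: str, image: str = IMAGE) -> dict:
    """Build a Pod manifest whose container mounts the claim at /data."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "db",
                    "image": image,
                    "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                }
            ],
            # The pod refers to storage only by claim name; it never names
            # an array, LUN or bucket.
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
            ],
        },
    }


if __name__ == "__main__":
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            persistent_volume_claim(CLAIM_NAME, "10Gi", STORAGE_CLASS),
            database_pod("orders-db", CLAIM_NAME),
        ],
    }
    # kubectl accepts JSON as well as YAML, so this output can be saved to a
    # file and applied with `kubectl apply -f <file>`.
    print(json.dumps(manifests, indent=2))
```

The design point for storage professionals is the separation of concerns: the developer’s Pod only names a claim, while the StorageClass that the storage team configures decides which plugin provisions the volume and where the data actually lives.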
The Open Source and Proprietary Ethos Mismatch: Something’s Gotta Give.
How is it that huge teams of highly skilled engineers freely give of their own time and expertise in exchange for nothing more than the use of the technology which they’ve helped develop?
The answer lies in the open source ethos: how open source contributors think, work and behave, driven by deeply held personal ethical convictions. Developers on open source projects see themselves as underdogs pitted against proprietary giants, on a mission to advance mankind through superior engineering and to make knowledge public rather than private property. They are genuinely trying to make the world a better place.
Engineers working on open source projects such as Linux, Ceph, Docker, Kubernetes, Apache and any number of others collaborate on GitHub. They naturally reuse code and building blocks from other open source projects and engineer for open source connectivity. The hyperscalers, with their dedication to low-cost commodity hardware and open source software for innovation and cost control, further accelerate the adoption of open source technology. Open source projects have a natural tendency to converge, to join up and to work together, because the people creating the technology have a deep-seated desire and commitment to make it happen.
Against this backdrop of ever-growing open source developer communities, isolated, commercially operated silos with limited development capacity have little chance of long-term success. That is why it is clear: the future is open source.