Challenges and Solutions with Cloud Native Persistent Storage

Wednesday, 18 January, 2023

Persistent storage is essential for any account-driven website. However, in Kubernetes, most resources are ephemeral and unsuitable for keeping data long-term. Regular storage is tied to the container and has a finite life span. Persistent storage has to be separately provisioned and managed.

Making permanent storage work with temporary resources brings challenges that you need to solve if you want to get the most out of your Kubernetes deployments.

In this article, you’ll learn about what’s involved in setting up persistent storage in a cloud native environment. You’ll also see how tools like Longhorn and Rancher can enhance your capabilities, letting you take full control of your resources.

Persistent storage in Kubernetes: challenges and solutions

Kubernetes has become the go-to solution for containers, allowing you to easily deploy scalable sites with a high degree of fault tolerance. In addition, there are many tools to help enhance Kubernetes, including Longhorn and Rancher.

Longhorn is a lightweight block storage system that you can use to provide persistent storage to Kubernetes clusters. Rancher is a container management tool that helps you with the challenges that come with running multiple containers.

You can use Rancher and Longhorn together with Kubernetes to take advantage of both of their feature sets. This gives you reliable persistent storage and better container management tools.

How Kubernetes handles persistent storage

In Kubernetes, files only last as long as the container, and they’re lost if the container crashes. That’s a problem when you need to store data long-term. You can’t afford to lose everything when the container disappears.

Persistent Volumes are the solution to these issues. You can provision them separately from the containers they use and then attach them to containers using a PersistentVolumeClaim, which allows applications to access the storage:

Diagram showing the relationship between a container application, its own storage and persistent storage (courtesy of James Konik)
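As a minimal sketch (the names, class and size are illustrative), a PersistentVolumeClaim and a Pod that mounts it look like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: nginx:latest
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html   # where the claim is mounted inside the container
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: app-data

Because the claim is a separate object, the data it points to outlives any individual Pod that mounts it.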

However, managing how these volumes interact with containers and setting them up to provide the combination of security, performance and scalability you need bring further issues.

Next, you’ll take a look at those issues and how you can solve them.

Security

With storage, security is always a key concern. It’s especially important with persistent storage, which is used for user data and other critical information. You need to make sure the data is only available to those that need to see it and that there’s no other way to access it.

There are a few things you can do to improve security:

Use RBAC to limit access to storage resources

Role-based access control (RBAC) lets you manage permissions easily, granting users permissions according to their role. With it, you can specify exactly who can access storage resources.

Kubernetes provides RBAC management and allows you to assign both Roles, which apply to a specific namespace, and ClusterRoles, which are not namespaced and can be used to give permissions on a cluster-wide basis.
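For instance, a hedged sketch of a namespaced Role that only allows reading PersistentVolumeClaims, bound to a single user (the namespace and user name are hypothetical), looks like this:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pvc-reader
  namespace: team-a                # hypothetical namespace
rules:
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["get", "list", "watch"]  # read-only access to claims
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pvc-reader-binding
  namespace: team-a
subjects:
- kind: User
  name: jane                       # hypothetical user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pvc-reader
  apiGroup: rbac.authorization.k8s.io

Swapping Role for ClusterRole (and RoleBinding for ClusterRoleBinding) grants the same permissions cluster-wide.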

Tools like Rancher also include RBAC support. Rancher’s system is built on top of Kubernetes RBAC, which it uses for enforcement.

With RBAC in place, not only can you control who accesses what, but you can change it easily, too. That’s particularly useful for enterprise software managers who need to manage hundreds of accounts at once. RBAC allows them to control access to your storage layer, defining what is allowed and changing those rules quickly on a role-by-role level.

Use namespaces

Namespaces in Kubernetes allow you to create groups of resources. You can then set up different access control rules and apply them independently to each namespace, giving you extra security.

If you have multiple teams, it’s a good way to stop them from getting in each other’s way. It also keeps each team’s resources private to its own namespace.

Namespaces do provide a layer of basic security, compartmentalizing teams and preventing users from accessing what you don’t want them to.

However, from a security perspective, namespaces do have limitations. For example, they don’t actually isolate all the shared resources that the namespaced resources use. That means if an attacker gains escalated privileges, they can access resources in other namespaces served by the same node.

Scalability and performance

Delivering your content quickly provides a better user experience, and maintaining that quality as your traffic increases and decreases adds an additional challenge. There are several techniques to help your apps cope:

Use storage classes for added control

Kubernetes storage classes let you define how your storage is used, and there are various settings you can change. For example, you can choose to make classes expandable. That way, you can get more space if you run out without having to provision a new volume.
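A hedged sketch of such a class (the provisioner is a placeholder for whichever CSI driver you actually use):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-storage
provisioner: example.csi.vendor.com   # hypothetical CSI driver
allowVolumeExpansion: true            # allows claims of this class to be resized later

With allowVolumeExpansion enabled, you can later raise the requested size on a claim and, provided the driver supports it, the volume grows in place instead of needing to be re-provisioned.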

Longhorn has its own storage classes to help you control when Persistent Volumes and their containers are created and matched.

Storage classes let you define the relationship between your storage and other resources, and they are an essential way to control your architecture.

Dynamically provision new persistent storage for workloads

It isn’t always clear how much storage a resource will need. Provisioning dynamically, based on that need, allows you to limit what you create to what is required.

You can have your storage wait until a container that uses it is created before it’s provisioned, which avoids the wasted overhead of creating storage that is never used.

Using Rancher with Longhorn’s storage classes lets you provision storage dynamically without having to rely on cloud services.
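A hedged sketch of that pattern: a class using the WaitForFirstConsumer binding mode (shown here with Longhorn’s CSI provisioner) and a claim that is only provisioned once a Pod that uses it is scheduled:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-on-demand        # hypothetical class name
provisioner: driver.longhorn.io
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: on-demand-data
spec:
  storageClassName: longhorn-on-demand
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi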

Optimize storage based on use

Persistent storage volumes have various properties. Their size is an obvious one, but latency and CPU resources also matter.

When creating persistent storage, make sure that the parameters used reflect what you need to use it for. A service that needs to respond quickly, such as a login service, can be optimized for speed.

Using different storage classes for different purposes is easier when using a provider like Longhorn. Longhorn storage classes can specify different disk technologies, such as NVMe, SSD or rotational drives, and these can be linked to specific nodes, allowing you to match storage to your requirements closely.
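For example, a hedged sketch of a class aimed at fast storage (the disk and node tags are hypothetical and must match tags you’ve defined in Longhorn):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-ssd
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "2880"
  diskSelector: "ssd"              # hypothetical disk tag
  nodeSelector: "fast-storage"     # hypothetical node tag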

Stability

Building a stable product means getting the infrastructure right and aggressively looking for errors. That way, your product quality will be as high as possible.

Maximize availability

Outages cost time and money, so avoiding them is an obvious goal.

When they do occur, planning for them is essential. With cloud storage, you can automate reprovisioning of failed volumes to minimize user disruption.

To prevent data loss, you must ensure dynamically provisioned volumes aren’t automatically deleted when a resource is done with them. Kubernetes provides storage object in use protection, so volumes that are still in use aren’t immediately removed.

You can control the behavior of storage volumes by setting the reclaim policy. Picking the retain option lets you manually choose what to do with the data and prevents it from being deleted automatically.
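A hedged sketch of both approaches, setting the policy on a storage class so new volumes default to it, or patching an existing volume (the names are hypothetical):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retained-storage
provisioner: example.csi.vendor.com   # hypothetical CSI driver
reclaimPolicy: Retain                 # keep the volume and its data after the claim is deleted

$ kubectl patch pv example-pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'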

Monitor metrics

As well as challenges, working with cloud volumes also offers advantages. Cloud providers typically include many strong options for monitoring volumes, facilitating a high level of observability.

Rancher makes it easier to monitor Kubernetes clusters. Its built-in Grafana dashboards let you view data for all your resources.

Rancher collects memory and CPU data by default, and you can break this data down by workload using PromQL queries.

For example, if you wanted to know how much data was being read from disk by a workload, you’d use the following PromQL from Rancher’s documentation:


sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)

Longhorn also offers a detailed selection of metrics for monitoring nodes, volumes, and instances. You can also check on the resource usage of your manager, along with the size and status of backups.
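As a hedged sketch (metric names can vary between Longhorn versions), a PromQL query along these lines could show how full each Longhorn volume is:

topk(10, longhorn_volume_actual_size_bytes / longhorn_volume_capacity_bytes)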

The observability these metrics provide has several uses. You should log any detected errors in as much detail as possible, enabling you to identify and solve problems. You should also monitor performance, perhaps setting alerts if it drops below any particular threshold. The same goes for error logging, which can help you spot issues and resolve them before they become too serious.

Get the infrastructure right for large products

For enterprise-grade products that require fast, reliable distributed block storage, Longhorn is ideal. It provides a highly resilient storage infrastructure. It has features like application-aware snapshots and backups as well as remote replication, meaning you can protect your data at scale.

Longhorn lets you provision storage on the major cloud providers, with built-in support for Azure, Google Cloud Platform (GCP) and Amazon Web Services (AWS).

Longhorn also lets you spread your storage over multiple availability zones (AZs). However, keep in mind that there can be latency issues if volume replicas reside in different regions.

Conclusion

Managing persistent storage is a key challenge when setting up Kubernetes applications. Because Persistent Volumes work differently from regular containers, you need to think carefully about how they interact; how you set things up impacts your application performance, security and scalability.

With the right software, these issues become much easier to handle. With help from tools like Longhorn and Rancher, you can solve many of the problems discussed here. That way, your applications benefit from Kubernetes while letting you keep a permanent data store your other containers can interact with.

SUSE is an open source software company responsible for leading cloud solutions like Rancher and Longhorn. Longhorn is an easy, fast and reliable Cloud native distributed storage platform. Rancher lets you manage your Kubernetes clusters to ensure consistency and security. Together, these and other products are perfect for delivering business-critical solutions.

Scanning Secrets in Environment Variables with Kubewarden

Monday, 24 October, 2022

We are thrilled to announce you can now scan your environment variables for secrets with the new env-variable-secrets-scanner-policy in Kubewarden! This policy rejects a Pod, or workload resources such as Deployments, ReplicaSets, DaemonSets, ReplicationControllers, Jobs and CronJobs, if a secret is found in an environment variable within a container, init container or ephemeral container. Secrets that are leaked in plain text or base64-encoded variables are detected. Kubewarden is a policy engine for Kubernetes. Its mission is to simplify the adoption of policy-as-code.

This policy uses Rusty Hog, an open source secret scanner from New Relic. The policy looks for leaked secrets such as RSA private keys, SSH private keys and API tokens for services like Slack, Facebook, AWS, Google and New Relic.

This is a perfect example of the real power of Kubewarden and WebAssembly! We didn’t have to write all the complex code and regular expressions for scanning secrets. Instead, we used an existing open source library that already does this job. We can do this because Kubewarden policies are delivered as WebAssembly binaries.

Have an idea for a new Kubewarden policy? You don’t need to write all the code from scratch! You can use your favorite libraries in any of the supported programming languages, as long as they can be compiled to WebAssembly.

Let’s see it in action!

For this example, a Kubernetes cluster with Kubewarden already installed is required. The installation process is described in the quick start guide.

Let’s create a ClusterAdmissionPolicy that will scan all pods for secrets in their environment variables:

kubectl apply -f - <<EOF
apiVersion: policies.kubewarden.io/v1
kind: ClusterAdmissionPolicy
metadata:
  name: env-variable-secrets
spec:
  module: ghcr.io/kubewarden/policies/env-variable-secrets-scanner:v0.1.2
  mutating: false
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    resources: ["pods", "deployments", "replicasets", "daemonsets", "replicationcontrollers", "jobs", "cronjobs"]
    operations:
    - CREATE
    - UPDATE
EOF

Verify that we are not allowed to create a Pod with an RSA private key:

kubectl apply -f - <<EOF                                                                  
apiVersion: v1     
kind: Pod
metadata:
  name: secret
spec:
  containers:
    - name: nginx
      image: nginx:latest
      env:
        - name: rsa
          value: "-----BEGIN RSA PRIVATE KEY-----\nMIICWwIBAAKBgHnGVTJSU+8m8JHzJ4j1/oJxc/FwZakIIhCpIzDL3sccOjyAKO37\nVCVwKCXz871Uo+LBWhFoMVnJCEoPgZVJFPa+Om3693gdachdQpGXuMp6fmU8KHG5\nMfRxoc0tcFhLshg7luhUqu37hAp82pIySp+CnwrOPeHcpHgTbwkk+dufAgMBAAEC\ngYBXdoM0rHsKlx5MxadMsNqHGDOdYwwxVt0YuFLFNnig6/5L/ATpwQ1UAnVjpQ8Y\nmlVHhXZKcFqZ0VE52F9LOP1rnWUfAu90ainLC62X/aKvC1HtOMY5zf8p+Xq4WTeG\nmP4KxJakEZmk8GNaWvwp/bn480jxi9AkCglJzkDKMUt0MQJBAPFMBBxD0D5Um07v\nnffYrU2gKpjcTIZJEEcvbHZV3TRXb4sI4WznOk3WqW/VUo9N83T4BAeKp7QY5P5M\ntVbznhcCQQCBMeS2C7ctfWI8xYXZyCtp2ecFaaQeO3zCIuCcCqv+AyMQwX6GnzNW\nnVvAeDAcLkjhEqg6QW5NehcfilJbj2u5AkEA5Mk5oH8f5OmdtHN36Tb14wM5QGSo\n3i5Kk+RAR9dT/LvmlAJgkzyOyJz/XHz8Ycn8S2yZjXkHV7i+7utWiVJGEwJAOhXN\nh0+DHs+lkD8aK80EP8X5SQSzBeim8b2ukFl39G9Cn7DvCuWetk1vR/yBXNouaAr0\nWaS7S9gdd0/AMWws+QJAGjYTz7Ab9tLGT7zCTSHPzwk8m+gm4wMfChN4yAyr1kac\nTLzJZaNLjNmAfUu5azZTJ2LG9HR0B7jUyQm4aJ68hA==\n-----END RSA PRIVATE KEY-----"
EOF

This will produce the following output:

Error from server: error when creating "STDIN": admission webhook "clusterwide-env-variable-secrets.kubewarden.admission" denied
the request: The following secrets were found in environment variables -> container: nginx, key: rsa, reason: RSA private key. 

Check it out and let us know if you have any questions! Stay tuned for more blogs on new Kubewarden policies!

Meet Epinio: The Application Development Engine for Kubernetes

Tuesday, 4 October, 2022

Epinio is a Kubernetes-powered application development engine. Adding Epinio to your cluster creates your own platform-as-a-service (PaaS) solution in which you can deploy apps without setting up infrastructure yourself.

Epinio abstracts away the complexity of Kubernetes so you can get back to writing code. Apps are launched by pushing their source directly to the platform, eliminating complex CD pipelines and Kubernetes YAML files. You move directly to a live instance of your system that’s accessible at a URL.

This tutorial will show you how to install Epinio and deploy a simple application.

Prerequisites

You’ll need an existing Kubernetes cluster to use Epinio. You can start a local cluster with a tool like K3s, minikube or Rancher Desktop, or use a managed service such as Azure Kubernetes Service (AKS) or Google Kubernetes Engine (GKE).

You must have the following tools to follow along with this guide:

  • kubectl, the Kubernetes command-line client
  • Helm, the Kubernetes package manager

Install them if they’re missing from your system. You don’t need these to use Epinio, but they are required for the initial installation procedure.

The steps in this guide have been tested with K3s v1.24 (Kubernetes v1.24) and minikube v1.26 (Kubernetes v1.24) on a Linux host. Additional steps may be required to run Epinio in other environments.

What Is Epinio?

Epinio is an application platform that offers a simplified development experience by using Kubernetes to automatically build and deploy your apps. It’s like having your own PaaS solution that runs in a Kubernetes cluster you can control.

Using Epinio to run your apps lets you focus on the logic of your business functions instead of tediously configuring containers and Kubernetes objects. Epinio will automatically work out which programming languages you use, build an appropriate image with a Paketo Buildpack and launch your containers inside your Kubernetes cluster. You can optionally use your own image if you’ve already got one available.

Developer experience (DX) is a hot topic because good tools reduce stress, improve productivity and encourage engineers to concentrate on their strengths without being distracted by low-level components. A simpler app deployment experience frees up developers to work on impactful changes. It also promotes experimentation by allowing new app instances to be rapidly launched in staging and test environments.

Epinio Tames Developer Workflows

Epinio is purpose-built to enhance development workflows by handling deployment for you. It’s quick to set up, simple to use and suitable for all environments from your own laptop to your production cloud. New apps can be deployed by running a single command, removing the hours of work required if you were to construct container images and deployment pipelines from scratch.

While Epinio does a lot of work for you, it’s also flexible in how apps run. You’re not locked into the platform, unlike other PaaS solutions. Because Epinio runs within your own Kubernetes cluster, operators can interact directly with Kubernetes to monitor running apps, optimize cluster performance and act on problems. Epinio is a developer-oriented layer that imbues Kubernetes with greater ease of use.

The platform is compatible with most Kubernetes environments. It’s edge-friendly and capable of running with 2 vCPUs and 4 GB of RAM. Epinio currently supports Kubernetes versions 1.20 to 1.23 and is tested with K3s, k3d, minikube and Rancher Desktop.

How Does Epinio Work?

Epinio wraps several Kubernetes components in higher-level abstractions that allow you to push code straight to the platform. Your Epinio installation inspects your source, selects an appropriate buildpack and creates Kubernetes objects to deploy your app.

The deployment process is fully automated and handled entirely by Epinio. You don’t need to understand containers or Kubernetes to launch your app. Pushing up new code sets off a sequence of actions that allows you to access the project at a public URL.

Epinio first compresses your source and uploads the archive to a MinIO object storage server that runs in your cluster. It then “stages” your application by matching its components to a Paketo Buildpack. This process produces a container image that can be used with Kubernetes.

Once Epinio is installed in your cluster, you can interact with it using the CLI. Epinio also comes with a web UI for managing your applications.

Installing Epinio

Epinio is usually installed with its official Helm chart. This bundles everything needed to run the system, although there are still a few prerequisites.

Before deploying Epinio, you must have an ingress controller available in your cluster. NGINX and Traefik provide two popular options. Ingresses let you expose your applications using URLs instead of raw hostnames and ports. Epinio requires your apps to be deployed with a URL, so it won’t work without an ingress controller. New deployments automatically generate a URL, but you can manually assign one instead. Most popular single-node Kubernetes distributions such as K3s, minikube and Rancher Desktop come with one either built-in or as a bundled add-on.

You can manually install the Traefik ingress controller if you need to by running the following commands:

$ helm repo add traefik https://helm.traefik.io/traefik
$ helm repo update
$ helm install traefik --create-namespace --namespace traefik traefik/traefik

You can skip this step if you’re following along using minikube or K3s.

Preparing K3s

Epinio on K3s doesn’t have any special prerequisites. You’ll need to know your machine’s IP address, though—use it instead of 192.168.49.2 in the following examples.

Preparing minikube

Install the official minikube ingress add-on before you try to run Epinio:

$ minikube addons enable ingress

You should also double-check your minikube IP address with minikube ip:

$ minikube ip
192.168.49.2

Use this IP address instead of 192.168.49.2 in the following examples.

Installing Epinio on K3s or minikube

Epinio needs cert-manager so it can automatically acquire TLS certificates for your apps. You can install cert-manager using its own Helm chart:

$ helm repo add jetstack https://charts.jetstack.io
$ helm repo update
$ helm install cert-manager --create-namespace --namespace cert-manager jetstack/cert-manager --set installCRDs=true

All other components are included with Epinio’s Helm chart. Before you continue, set up a domain to use with Epinio. It needs to be a wildcard where all subdomains resolve back to the IP address of your ingress controller or load balancer. You can use a service such as sslip.io to set up a magic domain that fulfills this requirement while running Epinio locally. sslip.io runs a DNS service that resolves to the IP address given in the hostname used for the query. For instance, any request to *.192.168.49.2.sslip.io will resolve to 192.168.49.2.
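You can check the wildcard behavior with a quick DNS lookup; any subdomain works, and you should substitute your own IP address:

$ dig +short epinio.192.168.49.2.sslip.io
192.168.49.2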

Next, run the following commands to add Epinio to your cluster. Change the value of global.domain if you’ve set up a real domain name:

$ helm repo add epinio https://epinio.github.io/helm-charts
$ helm install epinio --create-namespace --namespace epinio epinio/epinio --set global.domain=192.168.49.2.sslip.io

You should get an output similar to the following. It provides information about the Helm chart deployment and some getting started instructions from Epinio.

NAME: epinio
LAST DEPLOYED: Fri Aug 19 17:56:37 2022
NAMESPACE: epinio
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
To interact with your Epinio installation download the latest epinio binary from https://github.com/epinio/epinio/releases/latest.

Login to the cluster with any of these:

    `epinio login -u admin https://epinio.192.168.49.2.sslip.io`
    `epinio login -u epinio https://epinio.192.168.49.2.sslip.io`

or go to the dashboard at: https://epinio.192.168.49.2.sslip.io

If you didn't specify a password, the default one is `password`.

For more information about Epinio, feel free to check out https://epinio.io/ and https://docs.epinio.io/.

Epinio is now installed and ready to use. If you hit a problem and Epinio doesn’t start, refer to the documentation to check any specific steps required for compatibility with your Kubernetes distribution.

Installing the CLI

Install the Epinio CLI from the project’s GitHub releases page. It’s available as a self-contained binary for Linux, Mac and Windows. Download the appropriate binary and move it into a location on your PATH:

$ wget https://github.com/epinio/epinio/releases/epinio-linux-x86_64
$ sudo mv epinio-linux-x86_64 /usr/local/bin/epinio
$ sudo chmod +x /usr/local/bin/epinio

Try running the epinio version command to check the installation:

$ epinio version
Epinio Version: v1.1.0
Go Version: go1.18.3

Next, you can connect the CLI to the Epinio installation running in your cluster.

Connecting the CLI to Epinio

Login instructions are shown in the Helm output displayed after you install Epinio. The Epinio API server is exposed at epinio.<global.domain>. The default user credentials are admin and password. Run the following command in your terminal to connect your CLI to Epinio, assuming you used 192.168.49.2.sslip.io as your global domain:

$ epinio login -u admin https://epinio.192.168.49.2.sslip.io

You’ll be prompted to trust the fake certificate generated by your Kubernetes ingress controller if you’re using a magic domain without setting up SSL. Press the Y key at the prompt to continue:

Logging in to Epinio in the CLI

You should see a green Login successful message that confirms the CLI is ready to use.

Accessing the Web UI

The Epinio web UI is accessed by visiting your global domain in your browser. The login credentials match the CLI, defaulting to admin and password. You’ll see a browser certificate warning and a prompt to continue when you’re using an untrusted SSL certificate.

Epinio web UI

Once logged in, you can view your deployed applications, interactively create a new one using a form and manage templates for quickly launching new app instances. The UI replicates most of the functionality available in the CLI.

Creating a Simple App

Now you’re ready to start your first Epinio app from a directory containing your source. You don’t have to create a container image or run any external tools.

You can use the following Node.js code if you need something simple to deploy. Save it to a file called index.js inside a new directory. It runs an Express web server that responds to incoming HTTP requests with a simple message:

const express = require('express')
const app = express()
const port = 8080;

app.get('/', (req, res) => {
  res.send('This application is served by Epinio!')
})

app.listen(port, () => {
  console.log(`Epinio application is listening on port ${port}`)
});

Next, use npm to install Express as a dependency in your project:

$ npm install express
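Depending on your npm version, this may not create a package.json for you. If your directory doesn’t have one, a minimal, hypothetical package.json like the following helps the buildpack identify the app, its dependencies and its start command:

{
  "name": "epinio-demo",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "express": "^4.18.0"
  }
}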

The Epinio CLI has a push command that deploys the contents of your working directory to your Kubernetes cluster. The only required argument is a name for your app.

$ epinio push -n epinio-demo

Press the Enter key at the prompt to confirm your deployment. Your terminal will fill with output as Epinio logs what’s happening behind the scenes. It first uploads your source to its internal MinIO object storage server, then acquires the right Paketo Buildpack to create your application’s container image. The final step adds the Kubernetes deployment, service and ingress resources to run the app.

Deploying an application with Epinio

Wait until the green App is online message appears in your terminal, then visit the displayed URL in your browser to see your live application:

App is online

If everything has worked correctly, you’ll see This application is served by Epinio! when using the source code provided above.

Application running in Epinio

Managing Deployed Apps

App updates are deployed by repeating the epinio push command:

$ epinio push -n epinio-demo

You can retrieve a list of deployed apps with the Epinio CLI:

$ epinio app list
Namespace: workspace

✔️  Epinio Applications:
|        NAME         |            CREATED            | STATUS |                     ROUTES                     | CONFIGURATIONS | STATUS DETAILS |
|---------------------|-------------------------------|--------|------------------------------------------------|----------------|----------------|
| epinio-demo         | 2022-08-23 19:26:38 +0100 BST | 1/1    | epinio-demo-a279f.192.168.49.2.sslip.io         |                |                |

The app logs command provides access to the logs written by your app’s standard output and error streams:

$ epinio app logs epinio-demo

🚢  Streaming application logs
Namespace: workspace
Application: epinio-demo
🕞  [repinio-demo-057d58004dbf05e7fb7516a0c911017766184db8-6d9fflt2w] repinio-demo-057d58004dbf05e7fb7516a0c911017766184db8 Epinio application is listening on port 8080

Scale your application with more instances using the app update command:

$ epinio app update epinio-demo --instances 3

You can delete an app with app delete. This will completely remove the deployment from your cluster, rendering it inaccessible. Epinio won’t touch the local source code on your machine.

$ epinio app delete epinio-demo

You can perform all these operations within the web UI as well.

Conclusion

Epinio makes application development in Kubernetes simple because you can go from code to a live URL in one step. Running a single command gives you a live deployment that runs in your own Kubernetes cluster. It lets developers launch applications without surmounting the Kubernetes learning curve, while operators can continue using their familiar management tools and processes.

Epinio can be used anywhere you’re working, whether on your own workstation or as a production environment in the cloud. Local setup is quick and easy with zero configuration, letting you concentrate on your code. The platform uses Paketo Buildpacks to discover your source, so it’s language and framework-agnostic.

Epinio is one of the many offerings from SUSE, which provides open source technologies for Linux, cloud computing and containers. Epinio is SUSE’s solution to support developers building apps on Kubernetes, sitting alongside products like Rancher Desktop that simplify Kubernetes cluster setup. Install and try Epinio in under five minutes so you can push app deployments straight from your source.

Understanding Hyperconverged Infrastructure at the Edge from Adoption to Acceleration

Thursday, 29 September, 2022

You may be tired of the regular three-tiered infrastructure and the management issues it can bring in distributed systems and maintenance. Or perhaps you’ve looked at your infrastructure and realized that you need to move away from its current configuration. If that’s the case, hyperconverged infrastructure (HCI) may be a good solution because it removes a lot of management overhead, acting like a hypervisor that can handle networking and storage.

There are some key principles behind HCI that bring to light the advantages it has. Particularly, it can help simplify the deployment of new nodes and new applications. Because everything inside your infrastructure runs on normal x86 servers, adding nodes is as simple as spinning up a server and joining it to your HCI cluster. From here, applications can easily move around on the nodes as needed to optimize performance.

Once you’ve gotten your nodes deployed and added to your cluster, everything inside an HCI can be managed by policies, making it possible for you to strictly define the behavior of your infrastructure. This is one of the key benefits of HCI — it uses a single management interface. You don’t need to configure your networking in one place, your storage in another, and your compute in a third place; everything can be managed cohesively.

This cohesive management is possible because an HCI relies heavily on virtualization, making it feasible to converge the typical three tiers (compute, networking and storage) into a single plane, offering you flexibility.

While HCI might be overkill for simple projects, it’s becoming a best practice for various enterprise use cases. In this article, you’ll see some of the main use cases for implementing HCI in your organization. We’ll also introduce Harvester as a modern way to get started more easily.

While reading through these use cases, remember that the use of HCI is not limited to them. To benefit most from this article, think about what principles of HCI make the use cases possible, and perhaps, you’ll be able to come up with additional use cases for yourself.

Why you need a hyperconverged infrastructure

There are many use cases when it comes to HCI, and most of them are based on the fact that HCI is highly scalable and, more importantly, it’s easy to scale HCI. The concept started getting momentum back in 2009, but it wasn’t until 2014 that it started gaining traction in the community at large. HCI is a proven and mature technology that, in its essence, has worked the same way for many years.

The past few decades have seen virtualization become the preferred method for users to optimize their resource usage and manage their infrastructure costs. However, introducing new technology, such as containers, has required operators to shift their existing virtualized-focused infrastructure to integrate with these modern cloud-based solutions, bringing new challenges for IT operators to tackle.

Managing virtualized resources (and specifically VMs) can be quite challenging. This is where HCI can help. By automating and simplifying the management of virtual resources, HCI makes it easy for developers and team leads to leverage virtualization to the fullest and reduce the time to market their product, a crucial factor in determining the success of a project.

Following are some of the most popular ways to use HCI currently:

Edge computing

Edge computing is the principle of running workloads outside the primary data centers of a company. While there’s no single reason for wanting to use edge computing, the most popular reason is to decrease customer latency.

In edge computing, you don’t always need an extensive fleet of servers, and the amount of capacity you need will likely change based on the location. You’ll need more servers to serve New York City, with a population of 8.3 million, than you’d need to cover the entire country of Denmark, with a population of 5.8 million. One of the most significant benefits of HCI is that it scales extremely well in both directions. You’d typically want multiple nodes for reasons like backup, redundancy and high availability, but theoretically, it’s possible to scale down to a single node.

Given that HCI runs on normal hardware, it’s also possible for you to optimize your nodes for the workload you need. If your edge computing use case is to provide a cache for users, then you’d likely need more storage. However, if you’re implementing edge workers that need to execute small scripts, you’re more likely to need processing power and memory. With HCI, you can adapt the implementation to your needs.

Migrating to a Hybrid Cloud Model

Over the past decade, the cloud has gotten more and more popular. Many companies move to the cloud and later realize their applications are better suited to run on-premises. You will also find companies that no longer want to run things in their data centers and instead want to move them to the cloud. In both these cases, HCI can be helpful.

If you want to leverage the cloud, HCI can provide a similar user experience on-premise. HCI is sometimes described as a “cloud in a box” because it can offer similar services one would expect in a public cloud. Examples of this include a consistent API for allocating compute resources dynamically, load balancers and storage services. Having a similar platform is a good foundation for being able to move applications between the public cloud and on-premise. You can even take advantage of tools like Rancher that can manage cloud infrastructure and on-prem HCI from a single pane of glass.

Modernization strategy

Many organizations view HCI as an enabler in their modernization processes. However, modernization is quite different from migration.

Modernization focuses on redesigning existing systems and architecture to make the most efficient use of the new environment and its offerings. With its particular focus on simplifying the complex management of data, orchestration and workflows, HCI is perfect for modernization.

HCI enables you to consolidate your complex server architecture with all its storage, compute and network resources into smaller, easy-to-manage nodes. You can easily transform a node from a storage-first resource to a compute-first resource, allowing you to design your infrastructure how you want it while retaining simplicity.

Modern HCI solutions like Harvester can help you to run your virtualized and containerized workloads side by side, simplifying the operational and management components of infrastructure management while also providing the capabilities to manage workloads across distributed environments. Regarding automation, Harvester provides a unique approach by using cloud native APIs. This allows the user to automate using the same tools they would use to manage cloud native applications. Not switching between two “toolboxes” can increase product development velocity and decrease the overhead of managing complex systems. That means users of this approach get their product to market sooner and with less cost.
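For illustration only, a stripped-down KubeVirt-style VirtualMachine manifest (the kind of declarative resource Harvester builds on; the name and sizing here are hypothetical) shows how a VM can be declared and applied with kubectl like any other Kubernetes object:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: demo-vm                     # hypothetical VM name
spec:
  running: true
  template:
    spec:
      domain:
        resources:
          requests:
            memory: 2Gi
            cpu: "1"
        devices:
          disks:
          - name: rootdisk
            disk:
              bus: virtio
      volumes:
      - name: rootdisk
        containerDisk:
          image: quay.io/kubevirt/cirros-container-disk-demo   # small demo image

Because the VM is just another declarative object, the same Git workflows, CI pipelines and kubectl tooling used for containers apply to it.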

Virtual Desktop Infrastructure (VDI)

Many organizations maintain fleets of virtual desktops that enable their employees to work remotely while maintaining standards of security and performance. Virtual desktops are desktop environments that are not limited to the hardware they’re hosted in; they can be accessed remotely via the use of software. Organizations prefer them over hardware since they’re easy to provision, scale, and destroy on demand.

Since compute and storage are two strongly connected and important resources in virtual desktops, HCI can easily manage virtual desktops. HCI’s enhanced reliability provides VDI with increased fault tolerance and efficient capacity consumption. HCI also helps cut down costs for VDI as there is no need for separate storage arrays, dedicated storage networks, and related hardware.

Remote office/Branch office

A remote office/branch office (ROBO) is one of the best reasons for using HCI. In case you’re not familiar, it’s typical for big enterprises to have a headquarters where they host their data and internal applications. Then the ROBOs will either have a direct connection to the headquarters to access the data and applications or have a replica in their own location. In both cases, you will introduce more management and maintenance and other factors, such as latency.

With HCI, you can spin up a few servers in the ROBOs and add them to an HCI cluster. Now, you’re managing all your infrastructure, even the infrastructure in remote locations, through a single interface. Not only can this result in a better experience for the employees, but depending on how much customer interaction they have, it can result in a better customer experience.

In addition, with HCI, you’re likely to lower your total cost of ownership. While you would typically have to put up an entire rack of hardware in a ROBO, you can now accomplish the same with just a few servers.

Conclusion

After reading this article, you now know more about how HCI can be used to support a variety of use cases, and hopefully, you’ve come up with a few use cases yourself. This is just the beginning of how HCI can be used. Over the next decade or two, HCI will continue to play an important role in any infrastructure strategy, as it can be used in both on-premises data centers and the public cloud. The fact that it uses commodity x86 systems to run makes it suitable for many different use cases.

If you’re ready to start using HCI for yourself, take a look at Harvester. Harvester is a solution developed by SUSE, built for bare metal servers. It uses enterprise-grade technologies, such as Kubernetes, KubeVirt and Longhorn.

What’s Next:

Want to learn more about how Harvester and Rancher are helping enterprises modernize their stack with speed? Sign up here to join our Global Online Meetup: Harvester on October 26th, 2022, at 11 AM EST.

Cloud Modernization Best Practices

Monday, 8 August, 2022

Cloud services have revolutionized the technical industry, and services and tools of all kinds have been created to help organizations migrate to the cloud and become more scalable in the process. This migration is often referred to as cloud modernization.

To successfully implement cloud modernization, you must adapt your existing processes for future feature releases. This could mean adjusting your continuous integration, continuous delivery (CI/CD) pipeline and its technical implementations, updating or redesigning your release approval process (eg from manual approvals to automated approvals), or making other changes to your software development lifecycle.

In this article, you’ll learn some best practices and tips for successfully modernizing your cloud deployments.

Best practices for cloud modernization

The following are a few best practices that you should consider when modernizing your cloud deployments.

Split your app into microservices where possible

Most existing applications deployed on-premises were developed and deployed with a monolithic architecture in mind. In this context, monolithic architecture means that the application is single-tiered and has no modularity. This makes it hard to bring new versions into a production environment because any change in the code can influence every part of the application. Often, this leads to a lot of additional and, at times, manual testing.

Monolithic applications often do not scale horizontally and can cause various problems, including complex development, tight coupling, slow application starts due to application size, and reduced reliability.

To address the challenges that a monolithic architecture presents, you should consider splitting your monolith into microservices. This means that your application is split into different, loosely coupled services that each serve a single purpose.

All of these services are independent solutions, but they are meant to work together to contribute to a larger system at scale. This increases reliability, as one failing service does not take down the whole application with it. You also gain the freedom to scale each component of your application without affecting other components. On the development side, since each component is independent, you can split the development of your app among your team and work on multiple components in parallel to ensure faster delivery.

For example, the Lyft engineering team managed to quickly grow from a handful of different services to hundreds of services while keeping their developer productivity up. As part of this process, they included automated acceptance testing as part of their pipeline to production.

Isolate apps away from the underlying infrastructure

Engineers built scripts or pieces of code agnostic to the infrastructure they were deployed on in many older applications and workloads. This means they wrote scripts that referenced specific folders or required predefined libraries to be available in the environment in which the scripts were executed. Often, this was due to required configurations on the hardware infrastructure or the operating system or due to dependency on certain packages that were required by the application.

Most cloud providers refer to this as a shared responsibility model. In this model, the cloud provider or service provider takes responsibility for the parts of the services being used, and the service user takes responsibility for protecting and securing the data for any services or infrastructure they use. The interaction between the services or applications deployed on the infrastructure is well-defined through APIs or integration points. This means that the more you move away from managing and relying on the underlying infrastructure, the easier it becomes for you to replace it later. For instance, if required, you only need to adjust the APIs or integration points that connect your application to the underlying infrastructure.

To isolate your apps, you can containerize them, which bakes your application into a repeatable and reproducible container. To further separate your apps from the underlying infrastructure, you can move toward serverless-first development, which includes a serverless architecture. You will be required to re-architect your existing applications to be able to execute on AWS Lambda or Azure Functions or adopt other serverless technologies or services.

While going serverless is recommended in some cases, such as simple CRUD operations or applications with high scaling demands, it’s not a requirement for successful cloud modernization.
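Returning to the containerization route, a minimal sketch of a Dockerfile for a hypothetical Node.js service listening on port 8080 might look like this:

FROM node:18-alpine
WORKDIR /app
# Install production dependencies first so this layer is cached between builds
COPY package*.json ./
RUN npm ci --omit=dev
# Copy the application source
COPY . .
EXPOSE 8080
CMD ["node", "index.js"]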

Pay attention to your app security

As you begin to incorporate cloud modernization, you’ll need to ensure that any deliverables you ship to your clients are secure and follow a shift-left process. This process lets you quickly provide feedback to your developers by incorporating security checks and guardrails early in your development lifecycle (eg running static code analysis directly after a commit to a feature branch). And to keep things secure at all times during the development cycle, it’s best to set up continuous runtime checks for your workloads. This will ensure that you actively catch future issues in your infrastructure and workloads.

Quickly delivering features, functionality, or bug fixes to customers gives you and your organization more responsibility in ensuring automated verifications in each stage of the software development lifecycle (SDLC). This means that in each stage of the delivery chain, you will need to ensure that the delivered application and customer experience are secure; otherwise, you could expose your organization to data breaches that can cause reputational risk.

Making your deliverables secure includes ensuring that any personally identifiable information is encrypted in transit and at rest. However, it also requires that you ensure your application does not have open security risks. This can be achieved by running static code analysis tools like SonarQube or Checkmarx.
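For example, a hedged sketch of invoking the SonarQube scanner from a CI job (the project key, server URL and token are placeholders):

$ sonar-scanner \
    -Dsonar.projectKey=my-app \
    -Dsonar.sources=. \
    -Dsonar.host.url=https://sonarqube.example.com \
    -Dsonar.login=$SONAR_TOKEN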

In this blog post, you can read more about the importance of application security in your cloud modernization journey.

Use infrastructure as code and configuration as code

Infrastructure as code (IaC) is an important part of your cloud modernization journey. For instance, if you want to be able to provision infrastructure (ie required hardware, network and databases) in a repeatable way, using IaC will empower you to apply existing software development practices (such as pull requests and code reviews) to change the infrastructure. Using IaC also helps you to have immutable infrastructure that prevents accidentally introducing risk while making changes to existing infrastructure.

Configuration drift is a prominent issue with making ad hoc changes to an infrastructure. If you make any manual changes to your infrastructure and forget to update the configuration, you might end up with an infrastructure that doesn’t match its own configuration. Using IaC enforces that you make changes to the infrastructure only by updating the configuration code, which helps maintain consistency and a reliable record of changes.

All the major cloud providers have their own definition language for IaC, such as AWS CloudFormation, Google Cloud Deployment Manager and Azure Resource Manager (ARM) templates.
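As a small, hedged illustration of the idea, a CloudFormation template describing a single versioned S3 bucket (the bucket name is hypothetical and must be globally unique) looks like this:

AWSTemplateFormatVersion: "2010-09-09"
Description: Minimal example of a resource managed as code
Resources:
  AppDataBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: example-app-data-bucket
      VersioningConfiguration:
        Status: Enabled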

Ensuring that you can deploy and redeploy your application or workload in a repeatable manner will empower your teams further because you can deploy the infrastructure in additional regions or target markets without changing your application. If you don’t want to use any of the major cloud providers’ offerings to avoid vendor lock-in, other IaC alternatives include Terraform and Pulumi. These tools offer capabilities to deploy infrastructure into different cloud providers from a single codebase.

Another way of writing IaC is the AWS Cloud Development Kit (CDK), which has unique capabilities that make it a good choice for writing IaC while driving cultural change within your organization. For instance, AWS CDK lets you write automated unit tests for your IaC. From a cultural perspective, this allows developers to write IaC in their preferred programming language. This means that developers can be part of a DevOps team without needing to learn a new language. The same approach also extends beyond AWS infrastructure: cdk8s applies it to Kubernetes, and the Cloud Development Kit for Terraform (CDKTF) applies it to Terraform.

After adopting IaC, it’s also recommended to manage all your configuration as code (CaC). When you use CaC, you can put the same guardrails (ie pull requests) around configuration changes that you require for any code change in a production environment.

Pay attention to resource usage

It’s common for new entrants to the cloud to miss out on tracking their resource consumption while they’re in the process of migrating. Some organizations start with far more resources than they need (often around 20 percent extra), while others forget to set up restricted access to avoid overuse. This is why tracking the resource usage of your new cloud infrastructure from day one is very important.

There are a couple of things you can do about this. The first and a very high-level solution is to set budget alerts so that you’re notified when your resources start to cost more than they are supposed to in a fixed time period. The next step is to go a level down and set up cost consolidation of each resource being used in the cloud. This will help you understand which resource is responsible for the overuse of your budget.

The final and very effective solution is to track and audit the usage of all resources in your cloud. This will give you a direct answer as to why a certain resource overshot its expected budget and might even point you towards the root cause and probable solutions for the issue.

Culture and process recommendations for cloud modernization

How cloud modernization impacts your organization’s culture and processes often goes unnoticed. If you really want to implement cloud modernization, you need to change every engineer in your organization’s mindset drastically.

Modernize SDLC processes

Oftentimes, organizations with a more traditional, non-cloud delivery model follow a checklist-based approach for their SDLC. During your cloud modernization journey, existing SDLC processes will need to be enhanced to be able to cope with the faster delivery of new application versions to the production environment. Verifications that are manual today will need to be automated to ensure faster response times. In addition, client feedback needs to flow faster through the organization to be quickly incorporated into software deliverables. Different tools, such as SecureStack and SUSE Manager, can help automate and improve efficiency in your SDLC, as they take away the burden of manually managing rules and policies.

Drive cultural change toward blameless conversations

As your cloud journey continues to evolve and you need to deliver new features faster or quickly fix bugs as they arise, this higher change frequency and higher usage of applications will lead to more incidents and cause disruptions. To avoid attrition and arguments within the DevOps team, it’s important to create a culture of blameless communication. Blameless conversations are the foundation of a healthy DevOps culture.

One way you can do this is by running blameless post-mortems. A blameless post-mortem is usually set up after a negative experience within an organization. In the post-mortem, which is usually run as a meeting, everyone explains his or her view on what happened in a non-accusing, objective way. If you facilitate a blameless post-mortem, you need to emphasize that there is no intention of blaming or attacking anyone during the discussion.

Track key performance metrics

Google’s annual State of DevOps report uses four key metrics to measure DevOps performance: deploy frequency, lead time for changes, time to restore service, and change fail rate. While this article doesn’t focus specifically on DevOps, tracking these four metrics is also beneficial for your cloud modernization journey because it allows you to compare yourself with other industry leaders. Any improvement of key performance indicators (KPIs) will motivate your teams and ensure you reach your goals.

One of the key things you can measure is the duration of your modernization project. The project’s duration will directly impact the project’s cost, which is another important metric to pay attention to in your cloud modernization journey.

Ultimately, different companies will prioritize different KPIs depending on their goals. The most important thing is to pick metrics that are meaningful to you. For instance, a software-as-a-service (SaaS) business hosting a rapidly growing consumer website will need to track the time it takes to deliver a new feature (from commit to production). However, that metric is far less relevant for a traditional bank that only updates its software once a year.

You should review your chosen metrics regularly. Are they still in line with your current goals? If not, it’s time to adapt.

Conclusion

Migrating your company to the cloud requires changing the entirety of your applications or workloads. But it doesn’t stop there. In order to effectively implement cloud modernization, you need to adjust your existing operations, software delivery process, and organizational culture.

In this roundup, you learned about some best practices that can help you in your cloud modernization journey. By isolating your applications from the underlying infrastructure, you gain flexibility and the ability to shift your workloads easily between different cloud providers. You also learned how implementing a modern SDLC process can help your organization protect your customer’s data and avoid reputational loss by security breaches.

SUSE supports enterprises of all sizes on their cloud modernization journey through their Premium Technical Advisory Services. If you’re looking to restructure your existing solutions and accelerate your business, SUSE’s cloud native transformation approach can help you avoid common pitfalls and accelerate your business transformation.

Learn more in the SUSE & Rancher Community. We offer free classes on Kubernetes, Rancher, and more to support you on your cloud native learning path.

What is GitOps?

Monday, 30 May, 2022

If you are new to the term ‘GitOps,’ it can be quite challenging to imagine how the two models, Git and Ops, come together to function as a single framework. Git is a source code management tool introduced in 2005 that has become the go-to standard for many software development teams. On the other hand, Ops is a term typically used to describe the functions and practices that fall under the purview of IT operations teams and the more modern DevOps philosophies and methods. GitOps is a paradigm that Alexis Richardson from the Weaveworks team coined to describe the deployment of immutable infrastructure with Git as the single source of truth.

In this article, I will cover GitOps as a deployment pattern and its components, benefits and challenges.

What is GitOps?

GitOps requires you to describe and observe systems with declarative configurations that will form the basis of continuous integration, delivery and deployment of your infrastructure. The desired state of the infrastructure or application is stored as code, then associated platforms like Kubernetes (K8s) reconcile the differences and update the infrastructure or application state. Kubernetes is the choice ecosystem for GitOps providers and practitioners because of this declarative requirement. Fleet, Flux CD, Argo CD and Jenkins X are examples of GitOps tools or operators.
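As a hedged sketch of how one of these operators is pointed at a repository, a Fleet GitRepo resource might look like this (the repository URL and path are hypothetical):

apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: demo-app
  namespace: fleet-default
spec:
  repo: https://github.com/example/demo-app-config   # hypothetical repository
  branch: main
  paths:
  - manifests
  targets:
  - clusterSelector: {}   # deploy to all registered clusters

Once this object exists, the operator keeps the clusters in sync with whatever is committed to the repository.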

Infrastructure as Code

GitOps builds on DevOps practices surrounding version control, code review collaboration and CI/CD. These practices extend to the automation of infrastructure and application deployments, defined using Infrastructure as Code (IaC) techniques. The main idea behind IaC is to enable writing and executing code to define, deploy, update and destroy infrastructure. IaC presents a different way of thinking and treating all aspects of operations as software, even those that represent hardware.

There are five broad categories of tools used to configure and orchestrate infrastructure and application stacks:

  • Ad hoc scripts: The most straightforward approach to automating anything is to write an ad hoc script. Take any manual task and break it down into discrete steps. Use scripting languages like Bash, Ruby and Python to define each step in code and execute that script on your server.
  • Configuration management tools: Chef, Puppet, Ansible and SaltStack are all configuration management tools designed to install and configure software on existing, long-lived servers.
  • Server templating tools: An alternative to configuration management that’s growing in popularity is server templating tools such as Docker, Packer and Vagrant. Instead of launching and then configuring servers, the idea behind server templating is to create an image of a server that captures a fully self-contained “snapshot” of the operating system (OS), the software, the files and all other relevant dependencies.
  • Orchestration tools: Kubernetes is an example of an orchestration tool. Kubernetes allows you to define how to manage containers as code. You first deploy the Kubernetes cluster, a group of servers that Kubernetes will manage and use to run your Docker containers. Most major cloud providers have native support for deploying managed Kubernetes clusters, such as Amazon Elastic Container Service for Kubernetes (Amazon EKS), Google Kubernetes Engine (GKE) and Azure Kubernetes Service (AKS).
  • IaC provisioning tools: Whereas configuration management, server templating and orchestration tools define the code that runs on each server or container, IaC provisioning tools such as Terraform, AWS CloudFormation and OpenStack Heat define infrastructure configuration across public clouds and data centers. You use such tools to create servers, databases, caches, load balancers, queues, monitoring, subnet configurations, firewall settings, routing rules and Secure Sockets Layer (SSL) certificates.

Of the tools listed above, Kubernetes is the one most associated with GitOps because of its declarative approach to defining infrastructure.

Immutable Infrastructure

The term immutable infrastructure, also commonly known as immutable architecture, is a bit misleading. The concept does not mean that infrastructure never changes, but rather, once something is instantiated, it should never change. Instead, it should be replaced by another instance to ensure predictable behavior. Following this approach enables discrete versioning in an architecture. With discrete versioning, there is less risk and complexity because the infrastructure states are tested and have a greater degree of predictability. This is one of the main goals an immutable architecture tries to accomplish.

The GitOps model supports immutable architectures because all the infrastructure is declared as source code in a versioning system. In the context of Kubernetes, this approach allows software teams to produce more reliable cluster configurations that can be tested and versioned from a single source of truth in a Git repository.

Immutable vs. Mutable Architecture

The Declarative Deployment Model

Regarding automating deployments, there are two main DevOps approaches to consider: declarative and imperative. In an imperative (or procedural) approach, a team defines the main goal as a step-by-step process. These steps include instructions such as software installation, configuration, creation of resources and so on, and they are then executed in an automated way. The state of the environment results from the operations defined by the responsible DevOps team. This paradigm may work well for small workloads, but it doesn’t scale well and can introduce failures in large software environments.

In contrast, a declarative approach eliminates the need to define steps for the desired outcome. Instead, the final desired state is what is declared or defined. The relevant team will specify the number of Pods deployed for an application, how the Pods will be deployed, how they will scale, etc. The steps to achieve these goals don’t have to be defined. With the declarative approach, a lot of time is saved, and the complex steps are abstracted away. The focus shifts from the ‘how’ to the ‘what.’

Most cloud infrastructures that existed before Kubernetes was released provided a procedural approach for automating deployment activities, for example through configuration management tools such as Ansible, Chef and Puppet. Kubernetes, on the other hand, uses a declarative approach to describe what the desired state of the system should be, so GitOps and K8s fit together naturally: common Git operations control the deployment of declarative Kubernetes manifest files.
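
To make the contrast concrete, here is a minimal sketch; the deployment name and image are hypothetical. The imperative approach scripts each step, while the declarative approach applies version-controlled manifests that record only the desired state and lets the cluster reconcile toward it.

# Imperative: every step is executed explicitly, in order
$ kubectl create deployment web --image=nginx:1.21
$ kubectl scale deployment web --replicas=3
$ kubectl expose deployment web --port=80

# Declarative: the desired state lives in version-controlled manifests,
# and the cluster reconciles toward whatever those manifests declare
$ kubectl apply -f manifests/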

GitOps CI/CD Sequence Example

Below is a high-level sequence demonstrating what a GitOps CI/CD workflow looks like when deploying a containerized application to a Kubernetes environment:

Pull Request & Code Review

A CI/CD pipeline typically begins with a software developer creating a pull request, which will then be peer-reviewed to ensure it meets the agreed-upon standards. This collaborative effort is used to maintain good coding practices by the team and act as a first quality gate for the desired infrastructure deployment.

Build, Test and Push Docker Container Images

The Continuous Integration (CI) stage will automatically be triggered if the pipeline is configured to initiate on source changes. This usually requires setting the pipeline to poll for changes in the relevant branch of the repository. Once the source has been pulled, the sequence proceeds as follows (a command-level sketch follows the list):

  • Build Image for Application Tests: In this step, the relevant commands will be run to build and tag the Docker image. The image built in this step will be based on a Dockerfile with an execution command to run unit tests.

  • Build Production Image: Assuming the application unit tests passed, the final Docker image can be built and tagged with the new version using the production-grade Dockerfile with an execution command to start the application.

  • Push Production Container Image to Registry: Lastly, the Docker image will be pushed to the relevant Docker registry (i.e., Docker Hub) for Kubernetes to orchestrate the eventual deployment of the application.
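
The three steps above might look roughly like the following in practice; the image names, tags and Dockerfile paths are hypothetical, and the exact commands depend on your CI system.

# 1. Build and run the test image (Dockerfile.test runs the unit tests)
$ docker build -f Dockerfile.test -t registry.example.com/my-app:test .
$ docker run --rm registry.example.com/my-app:test

# 2. Build and tag the production image once the tests pass
$ docker build -f Dockerfile -t registry.example.com/my-app:1.2.3 .

# 3. Push the production image to the registry for Kubernetes to pull later
$ docker push registry.example.com/my-app:1.2.3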

Clone Config Repository, Update Manifests and Push To Config Repository

Once the image has been successfully built and pushed to the appropriate registry, the application manifest files must be updated with the new Docker image tag.

  • Clone Config Repository: In this step, the repository with the K8s resource definitions will be cloned. This will usually be a repository with Helm charts for the application resources and configurations.

  • Update Manifests: Once the repository has been cloned, a configuration management tool like Kustomize can update the manifests with the new Docker image tag (see the sketch after this list).

  • Push to Config Repository: Lastly, these changes can be committed and pushed to the remote config repository with the new updates.
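
Put together, this stage of the pipeline might run something like the sketch below; the repository URL, overlay path and image tag are hypothetical, and the same effect can be achieved with Helm values or plain text substitution.

$ git clone https://github.com/example-org/app-config.git
$ cd app-config/overlays/production

# Point the manifests at the newly built image tag
$ kustomize edit set image registry.example.com/my-app=registry.example.com/my-app:1.2.3

# Commit and push so the GitOps operator can pick up the change
$ git commit -am "Deploy my-app 1.2.3"
$ git push origin main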

GitOps Continuous Delivery Process

The Continuous Delivery (CD) process follows when the CI completes the config repository updates. As stated earlier, the GitOps framework and its stages are triggered by changes to the manifests in the Git repository. This is where the aforementioned GitOps tools come in. In this case, I will use Fleet as an example.

Fleet is essentially a set of K8s custom resource definitions (CRDs) and custom controllers that manage GitOps for a single or multiple Kubernetes clusters. Fleet consists of the following core components in its architecture:

  • Fleet Manager: This is the central component that governs the deployments of K8s resources from the Git repository. When deploying resources to multiple clusters, this component will reside on a dedicated K8s cluster. In a single cluster setup, the Fleet manager will run on the same cluster being managed by GitOps.
  • Fleet Controller: The Fleet controllers run on the Fleet manager cluster and perform the GitOps actions.
  • Fleet Agent: Each downstream cluster being managed by Fleet runs an agent that communicates with the Fleet manager.
  • GitRepo: Git repositories being watched by Fleet are represented by the type GitRepo.
  • Bundle: When the relevant Git repository is pulled, the sourced configuration files produce a unit referred to as a Bundle. Bundles are the deployment units used in Fleet.
  • Bundle Deployment: A BundleDeployment represents the state of the deployed Bundle on a cluster with its specific customizations.

With these components in place, the continuous delivery sequence proceeds as follows (an example GitRepo definition follows the list):

  • Scan Config Repository: Based on polling configurations, Fleet detects the changes in the config repository before performing a Git pull (or scan) to fetch the latest manifests.

  • Discover Manifests: Fleet determines any differences between the manifests in the Kubernetes cluster versus the latest manifests in the repository. The discovered manifests or Helm charts will be used to produce a Bundle.

  • Helm Release: When the Fleet operator detects the differences, it will convert the new manifests into Helm charts (regardless of the source) and perform a Helm release to the downstream clusters.
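
For illustration, a minimal GitRepo definition watched by Fleet might look like the sketch below. The repository URL, branch and paths are hypothetical, and the available fields can vary between Fleet versions, so check the Fleet documentation for your release.

apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: my-app-config
  namespace: fleet-default
spec:
  repo: https://github.com/example-org/app-config
  branch: main
  paths:
    - overlays/production
  # Deploy the resulting Bundles to every downstream cluster
  targets:
    - clusterSelector: {}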

The Benefits of GitOps

Infrastructure as Code

Infrastructure as Code is one of the main components of GitOps. Using IaC to automate the process of creating infrastructure in cloud environments has the following advantages:

  • Reliable outcomes: When the correct process of creating infrastructure is saved in the form of code, software teams can have a reliable outcome whenever the same version of code is run.

  • Repeatable outcomes: Manually creating infrastructure is time-consuming, inefficient, and prone to human error. Using IaC enables a reliable outcome and makes the process of deploying infrastructure easily repeatable across environments.

  • Infrastructure Documentation: By defining the resources to be created using IaC, the code is a form of documentation for the environment’s infrastructure.

Code Reviews

The quality gate that code reviews bring to software teams can be translated to DevOps practices with infrastructure. For instance, changes to a Kubernetes cluster through the use of manifests or Helm charts would go through a review and approval process to meet certain criteria before deployment.

Declarative Paradigm

The declarative approach to programming in GitOps simplifies the process of creating the desired state for infrastructure. It produces a more predictable and reliable outcome in contrast to defining each step of the desired state procedurally.

Better Observability

Observability is an important element when describing the running state of a system and triggering alerts and notifications whenever unexpected behavioral changes occur. On this basis, any deployed environment should be observed by DevOps engineers. With GitOps, engineers can more easily verify if the running state matches that of the desired state in the source code repository.

The Challenges with GitOps

Collaboration Requirements

Following a GitOps pattern requires a culture shift within teams. For individuals who are used to making quick manual changes on an ad hoc basis, this transition will be disruptive. In practice, teams should not be able to log in to a Kubernetes cluster and modify resource definitions to initiate a change in the cluster state. Instead, desired changes to the cluster should be pushed to the appropriate source code repository. These changes to the infrastructure go through a collaborative approval process before being merged. Once merged, the changes are deployed. This workflow introduces a “change by committee” to any infrastructure change, which is more time-consuming for teams, even if it is better practice.

GitOps Tooling Limitations

Today, GitOps tooling such as Fleet, FluxCD, ArgoCD and Jenkins X focuses on the Kubernetes ecosystem. This means that adopting GitOps practices with infrastructure platforms outside of Kubernetes will likely require additional work from DevOps teams. In-house tools may have to be developed to support the usage of this framework, which is less appealing for software teams because of the time it will take away from other core duties.

Declarative Infrastructure Limitations

As highlighted above, embracing GitOps requires a declarative model for deploying the infrastructure. However, there may be use cases where the declared state cannot capture every infrastructure requirement. For example, in Kubernetes you can declare a fixed number of replicas, but if a scaling event driven by CPU or memory usage changes that number at runtime, the running state deviates from what is declared in Git. Also, declarative configurations can be harder to debug and understand when the results are unexpected because the underlying steps are abstracted away.
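
A common Kubernetes example of this deviation is pairing a Deployment with a HorizontalPodAutoscaler, as in the hypothetical sketch below: Git declares one replica count, while the autoscaler adjusts the live count.

# The Deployment manifest in Git declares a fixed replica count (e.g. replicas: 3),
# but this autoscaler may scale the live Deployment anywhere between 3 and 10,
# so the running state can legitimately drift from what Git declares.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80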

No Universal Best Practices

Probably the most glaring issue with GitOps can be attributed to its novelty. At this point, there are no universal best practices that teams can follow when implementing this pattern. As a result, teams will have to implement a GitOps strategy based on their specific requirements and figure out what works best.

Conclusion

GitOps may be in its infancy, but the pattern extends the good old benefits of discrete and immutable versioning from software applications to infrastructure. It introduces automation, reliability, and predictability to the underlying infrastructure deployed to cloud environments.

What’s Next?

Get hands-on with GitOps. Join our free Accelerate Dev Workflows class. Week three is all about Continuous Deployment and GitOps. You can catch it on demand.


Secure Supply Chain: Securing Kubewarden Policies

Wednesday, 4 May, 2022

With recent releases, the Kubewarden stack supports verifying the integrity and authenticity of content using the Sigstore project.

In this post, we focus on Kubewarden Policies and how to create a Secure Supply Chain for them.

Sigstore?

Since a full Sigstore deep dive is not within the scope of this post, we recommend checking out their nice docs.

In short, Sigstore provides an automatable workflow that matches the distributed open source development model. The workflow specifies how to digitally sign and verify artifacts, which in our case are Kubewarden Policies. It also provides a transparency log to monitor such signatures. The workflow allows you to sign artifacts with traditional public-private key pairs or in keyless mode.

In the keyless mode, signatures are created with short-lived certs using an OpenID Connect (OIDC) service as identity provider. Those short-lived certs are issued by Sigstore’s PKI infrastructure, Fulcio.

Fulcio acts as a Registration Authority, authenticating that you are who you say you are by using an OIDC service (SSO via your own Okta instance, GitHub, Google, etc). Once authenticated, Fulcio acts as a Certificate Authority, issuing the short-lived certificate that you will use to sign artifacts.

These short-lived certificates include the identity information obtained from the OIDC service inside the certificate extension attributes. The private key associated with the certificate is then used to sign the object, while the certificate itself carries a public key that can be used to verify the signatures produced by the private key.

The certificates issued by Fulcio are deliberately short-lived. This is an interesting property that we will discuss shortly.

Once the artifact is signed, the proof of signature is then sent to an append-only transparency log, Rekor, that allows monitoring of such signatures and protects against timing attacks. The proof of signature is signed by Rekor and this information is stored inside of the signature itself.

By using the timestamp found inside of the proof of signature, the verifier can ensure that the signing action has been performed during the limited lifetime of the certificate.

Because of this, the private key associated with the certificate doesn’t need to be stored safely; it can be discarded at the end of the signature process. An attacker could even reuse the private key, but the signature would not be considered valid if it was used outside the limited lifetime of the certificate.

Nobody – developers, project leads or sponsors – needs to have access to keys, and Sigstore never obtains your private key. Hence the term keyless. Additionally, you don’t need expensive infrastructure for creating and validating signatures.

Since there’s no need for key secrets and the like in Keyless mode, it is easily automated inside CIs and implemented and monitored in the open. This is one of the reasons that makes it so interesting.

Building a Rust Sigstore stack

The policy server and libraries within the Kubewarden stack are responsible for instantiating and running policies. They are written in Rust, so we needed a good Rust implementation of the Sigstore features. Since there wasn’t one available, we are glad to announce that we have created a new crate, sigstore-rs, under the Sigstore org. This was done in an upstream-first manner, and we’re happy to report that it is now taking on a life of its own.

Securing Kubewarden Policies

As you may already know, Kubewarden Policies are small wasm-compiled binaries (~1 to ~6 MB) that are distributed via container registries as OCI artifacts. Let us see how Kubewarden protects policies against Secure Supply Chain attacks by signing and verifying them before they run.

Signing your Kubewarden Policy

Signing a policy is done in the same way as signing a container image: the signature is added as a new layer of a dedicated signature object managed by Sigstore. In the Sigstore workflow, you can sign with a public-private key pair or in keyless mode. Both options also let you add key=value annotations to the signatures.

The Public-Private key pair signing is straightforward, using sigstore/cosign:

$ COSIGN_PASSWORD=yourpass cosign generate-key-pair

Private key written to cosign.key
Public key written to cosign.pub

$ COSIGN_PASSWORD=yourpass cosign sign \
  --key cosign.key --annotations blog=yes \
  ghcr.io/kubewarden/policies/user-group-psp:v0.2.0

Pushing signature to: ghcr.io/kubewarden/policies/user-group-psp

The Keyless mode is more interesting:

$ COSIGN_EXPERIMENTAL=1 cosign sign \
  --annotations blog=yes \
  ghcr.io/kubewarden/policies/user-group-psp:v0.2.0

Generating ephemeral keys...
Retrieving signed certificate...
Your browser will now be opened to:
https://oauth2.sigstore.dev/auth/auth?access_type=online&client_id=sigstore&code_challenge=(...)
Successfully verified SCT...
tlog entry created with index: (...)
Pushing signature to: ghcr.io/kubewarden/policies/user-group-psp

What happened? cosign prompted us to authenticate with an OpenID Connect provider in the browser, then instructed Fulcio to generate an ephemeral private key and an x509 certificate with the associated public key.

If this were to happen in a CI system, the CI would provide the OIDC identity token in its environment. cosign can detect some automated environments and produce an identity token itself. Currently, that covers GitHub and Google Cloud, but one can always pass the token explicitly with a flag.

We shall now detail how it works for policies built by the Kubewarden team in GitHub Actions. First, we call cosign and sign the policy in keyless mode. The certificate issued by Fulcio includes the following details about the identity of the signer inside its x509v3 extensions:

  • An issuer, telling you who certified the image:
    https://token.actions.githubusercontent.com
    
  • A subject related to the specific workflow and worker, for example:
    https://github.com/kubewarden/policy-secure-pod-images/.github/workflows/release.yml@refs/heads/main
    

If you are curious, and want to see the contents of one of the certificates issued by Fulcio, install the crane cli tool, jq and openssl and execute the following command:

crane manifest \
  $(cosign triangulate ghcr.io/kubewarden/policies/pod-privileged:v0.1.10) | \
  jq -r '.layers[0].annotations."dev.sigstore.cosign/certificate"' | \
  openssl x509 -noout -text -in -

The end result is the same: a signature is added as a new image layer of a special OCI object that is created and managed by Sigstore. You can view those signatures as added layers, tagged sha256-<sha>.sig, in the repo.

Even better, you can use tools like crane or the Kubewarden CLI tool, kwctl, to perform the same action, as demonstrated below:

kwctl pull <policy_url>; kwctl inspect <policy_url>

If you want to verify policies locally, you now can use kwctl verify:

$ kwctl verify --github-owner kubewarden registry://ghcr.io/kubewarden/policies/pod-privileged:v0.1.10
$ echo $?
0

When testing policies locally with kwctl pull or kwctl run, you can also enable signature verification by using any verification-related flag. For example:

$ kwctl pull --github-owner kubewarden registry://ghcr.io/kubewarden/policies/pod-privileged:v0.1.10
$ echo $?
0

All the policies from the Kubewarden team are signed in keyless mode by the workers of the GitHub Actions CI job. We don’t leave certificates around, and the signatures are verifiable by third parties.

Enforcing signature verification for instantiated Kubewarden policies

You can now configure PolicyServers to enforce that all policies being run must be signed. When deploying Kubewarden via Helm charts, you can do so for the default PolicyServer installed by the kubewarden-defaults chart.

For this, the PolicyServers have a new spec.VerificationConfig field. Here, you can put the name of a ConfigMap containing a “verification config” that specifies the required signatures.
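
As a rough sketch of how this fits together, a PolicyServer referencing a verification ConfigMap might resemble the following; the resource name and image are hypothetical, and the exact API version and field names should be checked against the Kubewarden documentation for your release.

apiVersion: policies.kubewarden.io/v1
kind: PolicyServer
metadata:
  name: my-policy-server
spec:
  image: ghcr.io/kubewarden/policy-server:latest
  replicas: 1
  # Name of the ConfigMap in the kubewarden namespace holding the verification config
  verificationConfig: my-verification-config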

You can obtain a default verification config for policies from the Kubewarden team with:

$ kwctl scaffold verification-config
# Default Kubewarden verification config
#
# With this config, the only valid policies are those signed by Kubewarden
# infrastructure.
#
# This config can be saved to its default location (for this OS) with:
#   kwctl scaffold verification-config > /home/youruser/.config/kubewarden/verification-config.yml
#
# Providing a config in the default location enables Sigstore verification.
# See https://docs.kubewarden.io for more Sigstore verification options.
---
apiVersion: v1
allOf:
  - kind: githubAction
    owner: kubewarden
    repo: ~
    annotations: ~
anyOf: ~

The verification config format has several niceties; see its reference docs. For example, kind: githubAction with owner and repo, instead of blindly checking the issuer and subject strings. Or anyOf a list of signatures, with anyOf.atLeast a number of them: this allows accepting at least a specific number of signatures and makes migrating between signatures in your cluster easy. It’s the little things.

If you want support for other CIs (such as GitLab, Jenkins, etc) drop us a note on Slack or file a GitHub issue!

Once you have crafted your verification config, create your ConfigMap:

$ kubectl create configmap my-verification-config \
  --from-file=verification-config=./my-verification-config.yml \
  --namespace=kubewarden

And pass it to your PolicyServers in spec.VerificationConfig or, if you are using the default PolicyServer from the kubewarden-defaults chart, set it there, for example:

$ helm upgrade --set policyServer.verificationConfig=my-verification-config \
  --wait -n kubewarden kubewarden-defaults ./kubewarden-defaults

Recap

Using cosign sign, policy authors can sign their policies. All the policies owned by the Kubewarden team have already been signed in this way.

With kwctl verify, operators can verify them, and with kwctl inspect (and other tools such as crane manifest), operators can inspect the signatures. We can keep using kwctl pull and kwctl run to test policies locally as in the past, plus now verify their signatures too. Once we are satisfied, we can deploy Kubewarden PolicyServers so they enforce those signatures. If we want, the same verification config format can be used for kwctl and the cluster stack.

This way we are sure that the policies come from their stated authors, and have not been tampered with. Phew!

We, the Kubewarden team, are curious about how you approach this. What workflows are you interested in? What challenges do you have? Drop us a word in our Slack channel or file a GitHub issue!

There are more things to secure in the chain, and we’re excited about what lies ahead. Stay tuned for more blog entries on how to secure your supply chain with Kubewarden!

Stupid Simple Kubernetes: Get Started with Kubernetes

Monday, 18 April, 2022

In the era of Microservices, Cloud Computing and Serverless architecture, it’s useful to understand Kubernetes and learn how to use it. However, the official Kubernetes documentation can be hard to decipher, especially for newcomers. In this blog series, I will present a simplified view of Kubernetes and give examples of how to use it for deploying microservices using different cloud providers, including Azure, Amazon, Google Cloud and even IBM.

In this first article, we’ll talk about the most important concepts used in Kubernetes. Later in the series, we’ll learn how to write configuration files, use Helm as a package manager, create a cloud infrastructure, easily orchestrate our services using Kubernetes and create a CI/CD pipeline to automate the whole workflow. With this information, you can spin up any kind of project and create a solid infrastructure/architecture.

First, I’d like to mention that using containers has multiple benefits, from increased deployment velocity to delivery consistency with a greater horizontal scale. Even so, you should not use containers for everything because just putting any part of your application in a container comes with overhead, like maintaining a container orchestration layer. So, don’t jump to conclusions. Instead, create a cost/benefit analysis at the start of the project.

Now, let’s start our journey in the world of Kubernetes.

Kubernetes Hardware Structure

Nodes

Nodes are worker machines in Kubernetes, which can be any device that has CPU and RAM. For example, a node can be anything, from a smartwatch, smartphone, or laptop to a Raspberry Pi. When we work with cloud providers, a node is a virtual machine (VM). So, a node is an abstraction over a single device.

As you will see in the next articles, the beauty of this abstraction is that we don’t need to know the underlying hardware structure. We will just use nodes; this way, our infrastructure is platform independent.

Cluster

A cluster is a group of nodes. When you deploy programs onto the cluster, it automatically handles the distribution of work to the individual nodes. If more resources are required (for example, we need more memory), new nodes can be added to the cluster, and the work will be redistributed automatically.

We run our code on a cluster, and we shouldn’t care about which node. The distribution of the work is automatic.

Persistent Volumes

Because our code can be relocated from one node to another (for example, a node doesn’t have enough memory, so the work is rescheduled on a different node with enough memory), data saved on a node is volatile. But there are cases when we want to save our data persistently. In this case, we should use Persistent Volumes. A persistent volume is like an external hard drive; you can plug it in and save your data on it.

Google developed Kubernetes as a platform for stateless applications with persistent data stored elsewhere. As the project matured, many organizations wanted to leverage it for their stateful applications, so the developers added persistent volume management. Much like the early days of virtualization, database servers are not typically the first group of servers to move into this new architecture. That’s because the database is the core of many applications and may contain valuable information, so on-premises database systems still largely run in VMs or physical servers.

So, the question is, when should we use Persistent Volumes? To answer that question, first, we should understand the different types of database applications.

We can classify the data management solutions into two classes:

  1. Vertically scalable — includes traditional RDBMS solutions such as MySQL, PostgreSQL and SQL Server
  2. Horizontally scalable — includes “NoSQL” solutions such as ElasticSearch or Hadoop-based solutions

Vertically scalable solutions like MySQL, Postgres and Microsoft SQL Server should not go in containers. These database platforms require high I/O, shared disks, block storage, etc., and do not (by design) gracefully handle the loss of a node in a cluster, which often happens in a container-based ecosystem.

For horizontally scalable applications (Elastic, Cassandra, Kafka, etc.), use containers. They can withstand the loss of a node in the database cluster, and the database application can independently rebalance.

Usually, you can and should containerize distributed databases that use redundant storage techniques and can withstand a node’s loss in the database cluster (ElasticSearch is a good example).

Kubernetes Software Components

Container

One of the goals of modern software development is to keep applications on the same host or cluster isolated. Virtual machines are one solution to this problem, but virtual machines require their own OS, so they are typically gigabytes in size.

Containers, by contrast, isolate application execution environments from one another but share the underlying OS kernel. So, a container is like a box where we store everything needed to run an application: code, runtime, system tools, system libraries, settings, etc. They’re typically measured in megabytes, use far fewer resources than VMs and start up almost immediately.

Pods

A pod is a group of containers. In Kubernetes, the smallest unit of work is a pod. A pod can contain multiple containers, but usually we use one container per pod because the replication unit in Kubernetes is the pod. If we want to scale containers independently, we put each one in its own pod.

Deployments

The primary role of a deployment is to provide declarative updates to both the pod and the ReplicaSet (a set in which the same pod is replicated multiple times). Using a deployment, we can specify how many replicas of the same pod should be running at any time. The deployment is like a manager for the pods: it automatically spins up the requested number of pods, monitors them and recreates them in case of failure. Deployments are helpful because you don’t have to create and manage each pod separately.

We usually use deployments for stateless applications. However, you can save the state of a deployment by attaching a Persistent Volume to it, making it stateful.
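
For illustration, a minimal Deployment manifest might look like the sketch below; the names, labels and image are hypothetical.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3                 # keep three identical pods running at all times
  selector:
    matchLabels:
      app: my-app
  template:                   # the pod template the deployment manages
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: myregistry/my-app:1.0.0
          ports:
            - containerPort: 8080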

Stateful Sets

A StatefulSet is a newer Kubernetes resource used to manage stateful applications. It manages the deployment and scaling of a set of pods and guarantees the ordering and uniqueness of those pods. It is similar to a deployment; the difference is that a deployment creates a set of pods with random names, and the order of the pods is not important, while a StatefulSet creates pods with a unique, ordered naming convention. So, if you create three replicas of a pod called example, the StatefulSet will create pods named example-0, example-1 and example-2. In this case, the most important benefit is that you can rely on the names of the pods.
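
A minimal StatefulSet sketch (names and image are hypothetical) looks much like a Deployment, but it also references a headless Service and yields the ordered pod names described above.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example
spec:
  serviceName: example        # headless service giving each pod a stable network identity
  replicas: 3                 # creates example-0, example-1 and example-2, in order
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: example
          image: myregistry/example:1.0.0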

DaemonSets

A DaemonSet ensures that a copy of the pod runs on every node of the cluster. If a node is added to or removed from the cluster, the DaemonSet automatically adds or deletes the pod. This is useful for monitoring and logging because it covers every node without you having to manage each one manually.

Services

While a deployment is responsible for keeping a set of pods running, a service is responsible for enabling network access to them. Services provide standardized features across the cluster: load balancing, service discovery between applications and zero-downtime application deployments. Each service has a unique IP address and a DNS hostname. Applications that consume a service can be configured to use either the IP address or the hostname, and the traffic will be load balanced to the correct pods. In the External Traffic section, we will learn more about the service types and how we can communicate between our internal services and the external world.
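
A minimal Service manifest selecting the pods from the earlier Deployment sketch might look like this; the names and ports are hypothetical.

apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app               # route traffic to pods carrying this label
  ports:
    - port: 80                # port exposed by the service
      targetPort: 8080        # port the container listens on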

ConfigMaps

If you want to deploy to multiple environments, like staging, dev and prod, it’s a bad practice to bake the configs into the application because of environmental differences. Ideally, you’ll want to separate configurations to match the deploy environment. This is where ConfigMap comes into play. ConfigMaps allow you to decouple configuration artifacts from image content to keep containerized applications portable.
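
As a small sketch (names and values are hypothetical), a ConfigMap can hold per-environment settings that a pod then consumes, for example as environment variables.

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-config
data:
  DATABASE_HOST: db.staging.internal
  LOG_LEVEL: debug

# In the container spec, every key can be loaded as an environment variable with:
#   envFrom:
#     - configMapRef:
#         name: my-app-config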

External Traffic

Now that you’ve got the services running in your cluster, how do you get external traffic into your cluster? There are three different service types for handling external traffic: ClusterIPNodePort and LoadBalancer. The 4th solution is to add another layer of abstraction, called Ingress Controller.

ClusterIP

ClusterIP is the default service type in Kubernetes and lets you communicate with other services inside your cluster. While ClusterIP is not meant for external access, with a little hack using a proxy, external traffic can hit our service. Don’t use this solution in production, but only for debugging. Services declared as ClusterIP should NOT be directly visible from the outside.

NodePort

As we saw in the first part of this article, pods are running on nodes. Nodes can be different devices, like laptops or virtual machines (when working in the cloud). Each node has a fixed IP address. By declaring a service as NodePort, the service will expose the node’s IP address so that you can access it from the outside. You can use NodePort in production, but for large applications, where you have many services, manually managing all the different IP addresses can be cumbersome.

LoadBalancer

Declaring a service of type LoadBalancer exposes it externally using a cloud provider’s load balancer. How the external load balancer routes traffic to the Service pods depends on the cluster provider. With this solution, you don’t have to manage all the IP addresses of every node of the cluster, but you will have one load balancer per service. The downside is that every service has a separate load balancer and you will be billed per load balancer instance.

This solution is good for production, but it can be a little bit expensive. Let’s look at a less expensive solution.

Ingress

Ingress is not a service but an API object that manages external access to a cluster’s services. It acts as a reverse proxy and single entry point to your cluster, routing requests to different services. I usually use the NGINX Ingress Controller, which acts as a reverse proxy and also handles SSL termination. The best production-ready way to expose the ingress is to use a load balancer.

With this solution, you can expose any number of services using a single load balancer, so you can keep your bills as low as possible.
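
For example, a single Ingress, served by one load balancer in front of the ingress controller, can route to several services based on path; the hostname and service names below are hypothetical.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress
spec:
  ingressClassName: nginx       # assumes the NGINX Ingress Controller is installed
  rules:
    - host: example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 80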

Next Steps

In this article, we learned about the basic concepts used in Kubernetes and its hardware structure. We also discussed the different software components, including Pods, Deployments, StatefulSets and Services, and saw how to communicate between services and with the outside world.

In the next article, we’ll set up a cluster on Azure and create an infrastructure with a LoadBalancer, an Ingress Controller and two Services, and use two Deployments to spin up three Pods per Service.

There is another ongoing “Stupid Simple AI” series. Find the first two articles here: SVM and Kernel SVM and KNN in Python.

Want to Learn More from our Stupid Simple Series?

Read our eBook: Stupid Simple Kubernetes. Download it here!

The History of Cloud Native

Wednesday, 13 April, 2022

Cloud native is a term that’s been around for many years but really started gaining traction in 2015 and 2016. This could be attributed to the rise of Docker, which was released a few years prior. Still, many organizations started becoming more aware of the benefits of running their workloads in the cloud. Whether because of cost savings or ease of operations, companies were increasingly looking into whether they should be getting on this “cloud native” trend.

Since then, it’s only been growing in popularity. In this article, you’ll get a brief history of what cloud native means—from running applications directly on hosts before moving to a hybrid approach to how we’re now seeing companies born in the cloud. We’ll also cover the “cloud native” term itself, as the definition is something often discussed.

Starting with data centers

In the beginning, people were hosting their applications using their own servers in their own data centers. That might be a bit of an exaggeration in some cases, but what I mean is that specific servers were used for specific applications.

For a long time, running an application meant using an entire host. Today, we’re used to virtualization being the basis for pretty much any workload. If you’re running Windows Subsystem for Linux 2, even your Windows installation is virtualized; this hasn’t always been the case. Although the principle of virtualization has been around since the 60s, it didn’t start taking off in servers until the mid-2000s.

Launching a new application meant you had to buy a new server or even an entirely new rack for it. In the early 2000s, this started changing as virtualization became more and more popular. Now it was possible to spin up applications without buying new hardware.

Applications were still running on-premises, also commonly referred to as “on-prem.” That made it hard to scale applications, and it also meant that you couldn’t pay for resources as you used them. You had to buy resources upfront, which required a large investment in advance.

That was one of the big benefits companies saw when cloud computing became a possibility. Now you could pay only for the resources you were using, rather than having to deposit in advance—something very attractive to many companies.

Moving to hybrid

At this point, we’re still far from cloud native being a term commonly used by close to everyone working with application infrastructure. Although the term was being thrown around from the beginning of AWS launching its first beta service (SQS) in 2004 and making it generally available in 2006, companies were still exploring this new trend.

To start with, cloud computing also mostly meant a replica of what you were running on-prem. Most of the advantages came from buying only the resources you needed and scaling your applications. Within the first year of AWS being live, they launched four important services: SQS, EC2, S3 and SimpleDB.

Elastic Compute Cloud (EC2) was, and still is, primarily a direct replica of the traditional Virtual Machine. It allows engineers to perform what’s known as a “lift-and-shift” maneuver. As the name suggests, you lift your existing infrastructure from your data center and shift it to the cloud. This was the case with Simple Storage Service (S3) and SimpleDB, a database platform. At the time, companies could choose between running their applications on-prem or in the cloud, but the advantages weren’t as clear as they are today.

That isn’t to say that the advantages were negligible. Only paying for resources you use and not having to manage underlying infrastructure yourself are attractive qualities. This led to many shifting their workload to the cloud or launching new applications in the cloud directly, arguably the first instances of “cloud native.”

Many companies were now dipping their toes into this hybrid approach of using both hardware on their own premises and cloud resources. Over time, AWS launched more services, making the question of whether to work in the cloud more nuanced. With the launch of Amazon CloudFront, a Content Delivery Network (CDN) service, AWS provided a service that was certainly possible to run yourself but much easier to run in the cloud. It was no longer just a question of whether the workload should run on-prem or in the cloud; it was a matter of whether the cloud could provide previously unavailable possibilities.

In 2008, Google launched the Google Cloud Platform (GCP), and in 2010 Microsoft launched Azure. With more services launching, the market was gaining competition. Over time, all three providers started providing services specialized to the cloud rather than replicas of what was possible on-prem. Nowadays, you can get services like serverless functions, platforms as a service and much more; this is one of the main reasons companies started looking more into being cloud native.

Being cloud native

Saying that a company is cloud native is tricky because the industry does not have a universal definition. Ask five different engineers what it means to be cloud native, and you’ll get five different answers. Generally, though, you can split the answers into two camps.

A big part of the community believes that being cloud native just means that you are running your workloads in the cloud, with none of them being on-prem. There’s also a small subsection of this group who will say that you can be partly cloud native, meaning that you have one full application running in the cloud and another application running on-prem. However, some argue that this is still a hybrid approach.

There’s another group of people who believe that to be cloud native, you have to be utilizing the cloud to its full potential. That means that you’re not just using simple services like EC2 and S3 but taking full advantage of what your cloud provider offers, like serverless functions.

Over time, as the cloud becomes more prominent and mature, a third option appears. Some believe that to be cloud native, your company has to be born in the cloud; this is something we see more and more. Companies that have never had a single server running on-prem have launched even their first applications in the cloud.

One of the only things everyone agrees on about cloud native is that cloud providers are now so prominent in the industry that anyone working with applications and application infrastructure has to think about them. Every new company has to consider whether they should build their applications using servers hosted on-prem or use services available from a cloud provider.

Even companies that have existed for quite a while are spending a lot of time considering whether it’s time to move their workloads to the cloud; this is where we see the problem of tackling cloud native at scale.

Tackling cloud native at scale

Getting your applications running in the cloud doesn’t have to be a major issue. You can follow the old lift-and-shift approach and move your applications directly to the cloud with the same infrastructure layout you used when running on-prem.

While that will work for most, it defeats some of the purposes of being in the cloud; after all, a couple of big perks of using the cloud are cost savings and resource optimization. One of the first approaches teams usually think about when they want to implement resource optimizations is converting their monolith applications to microservices; whether or not that is appropriate for your organization is an entirely different topic.

It can be tough to split an application into multiple pieces, especially if it’s something that’s been developed for a decade or more. However, the application itself is only one part of why scaling your cloud native journey can become troublesome. You also have to think about deploying and maintaining the new services you are launching.

Suddenly you have to think about scenarios where developers are deploying multiple times a day to many different services, not necessarily hosted on the same types of platforms. On your journey to being cloud native, you’ll likely start exploring paradigms like serverless functions and other specialized services by your cloud provider. Now you need to think about those as well.

My intent is not to scare anyone away from cloud native. These are just examples of what some organizations don’t think about, whether because of priorities or time, that come back to haunt them once they need to scale a certain application.

Popular ways of tackling cloud native at scale

Engineers worldwide are still trying to figure out the best way of being cloud native at scale, and it will likely be an ongoing problem for at least a few more years. However, we’re already seeing some solutions that could shape the future of cloud native.

From the beginning, virtualization has been the key to creating a good cloud environment. It’s mostly been a case of the cloud provider using virtualization and the customer using regular servers as if it were their own hardware. This is changing now that more companies integrate tools like Docker and Kubernetes into their infrastructure.

Now, it’s not only a matter of knowing that your cloud provider uses virtualization under the hood. Developers have to understand how to use virtualization efficiently. Whether it’s with Docker and Kubernetes or something else entirely, it’s a safe bet to say that virtualization is a key concept that will continue to play a major role when tackling cloud native.

Conclusion

In less than two decades, we’ve gone from people buying new servers for each new application launch to considering how applications can be split and scaled individually.

Cloud native is an exciting territory that provides value for many companies, whether they’re born in the cloud or on their way to embracing the idea. It’s an entirely different paradigm from what was common 20 years ago and allows for many new possibilities. It’s thrilling to see what companies have made possible with the cloud, and I’ll be closely watching as companies develop new ideas to scale their cloud workloads.

Let’s continue the conversation! Join the SUSE & Rancher Community where you can further your Kubernetes knowledge and share your experience.