Stupid Simple Open Source

Thursday, 26 August, 2021

Even if we don’t realize it, almost all of us have used open source software. When we buy a new Android phone, we read its specs and usually focus on the hardware capabilities: the CPU, RAM, camera, etc. But the brains of these devices are their operating systems, which are open source software. The Android operating system powers more than 70 percent of mobile phones, demonstrating the prowess of open source software.

Before the free software movement, the first personal computers were hard to maintain and expensive; this wasn’t because of the hardware but the software. You could be the best programmer in the world, but without collaboration and knowledge sharing, your software would still likely have issues: bugs, usability problems, design flaws, performance problems, etc. What’s more, maintaining such products costs time and money. Before open source software appeared, big companies believed they had to protect their intellectual property, so they kept their source code secret. They did not realize that letting people inspect their source code and fix bugs would improve their software. Collaboration leads to great success.

What is Open Source Software?

Simply put, open source software has public source code, which can be seen, inspected, modified, improved or even sold by anyone. In contrast, non-open source, proprietary software has code that can be seen, modified and maintained only by a limited number of people: a person, a team or an organization.

In both cases, the user must accept the licensing agreements. To use proprietary software, users must promise (typically by accepting a license agreement displayed the first time they run it) that they will not do anything with the software that its developers/owners have not explicitly authorized. Examples of proprietary software are the Windows operating system and Microsoft Office.

Users must accept the terms of a license when using open source software, just as they do when using proprietary software, but these terms are very different. Basically, you can do whatever you want as long as you include the original copyright and license notice in any copy of the software/source. Furthermore, these licenses usually state that the original creator cannot be liable for any harm or damage that the open source code may cause. This protects the creator of the open source code. Good examples of open source software are the Linux operating system, the Android operating system, LibreOffice and Kubernetes.

The Beginning of Open Source

Initially, software was developed by companies in-house. The creators controlled this software, with no right for the user to modify it, fix it or even inspect it. This also made collaboration between programmers difficult as knowledge sharing was near impossible.

In 1971, Richard Stallman joined the MIT Artificial Intelligence Lab. He noticed that most MIT developers were joining private corporations, which were not sharing knowledge with the outside world. He realized that this secrecy and lack of collaboration would widen the gap between users and technical developers. According to Stallman, “software is meant to be free but in terms of accessibility and not price.” To fight against privatization, Stallman launched the GNU Project and then founded the Free Software Foundation (FSF). Many developers started using GNU in response to these initiatives, and many even fixed bugs they detected.

Stallman’s initiative was a success. Because he pushed against privatized software, more open source projects followed. The next big steps in open source software were the releases of Mozilla and the Linux operating system. Companies had begun to realize that open source might be the next big thing.

The Rise of Open Source

After the GNU, Mozilla and Linux open source projects, more developers started to follow the open source movement. As the next big step in the history of open source, David Heinemeier Hansson introduced Ruby on Rails. This web application framework soon became one of the world’s most prominent web development tools. Popular platforms like Twitter would go on to use Ruby on Rails to build their sites. When Sun Microsystems bought MySQL for $1 billion in 2008, it showed that open source could also be a real business, not just a beautiful idea.

Nowadays, big companies like IBM, Microsoft and Google embrace open source. So, why do these big companies give away their closely guarded source code? They realized the power of collaboration and knowledge sharing. They hoped that outside developers would improve the software as they adapted it to their needs. They realized that it is impossible to hire all the great developers of the world and that many developers out there could contribute positively to their products. It worked. Hundreds of outside contributors collaborated on TensorFlow, one of Google’s most successful AI tools. Another success story is Microsoft’s open source .NET Core.

Why Would I Work on Open Source Projects?

Just think about it: how many times have open source solutions (libraries, frameworks, etc.) helped you in your daily job? How often did you finish your tasks earlier because you’d found a great, free open source tool that worked for you?

The most important reason to participate in the open source community is to help others and to give something back to the community. Open source has helped us a lot, shaping our world in unprecedented ways. We may not realize it, but many of the products we use today are the result of open source.

In the modern world, collaboration and knowledge sharing are a must. Nowadays, inventions are rarely created by a single individual. Increasingly, they are made through collaboration with people from all around the world. Without the free and open source software movement, our world would be completely different. We’d live with isolated knowledge and isolated people, lots of small bubble worlds instead of one big, collaborative and helpful community (think about what you would do without Stack Overflow).

Another reason to participate is to gain real-world experience and technical upskilling. In the open source community, you can find all kinds of challenges that aren’t present in a single company or project. You can also earn recognition through problem-solving and helping developers with similar issues.

Finding Open Source Projects

If you would like to start contributing to the open source community, here are some places where you can find great projects:

CodeTriage: a website where you can find popular open source projects based on your programming language preferences, such as K8s, TensorFlow, Pandas, Scikit-Learn, Elasticsearch, etc.

awesome-for-beginners: a collection of Git repositories with beginner-friendly projects.

Open Source Friday: a movement to encourage people, companies and maintainers to contribute a few hours to open source software every Friday.

For more information about how to start contributing to open source projects, visit the newbie open source Git repository.

Conclusion

In the first part of this article, we briefly introduced open source. We described the main differences between open source and proprietary software and presented a brief history of the open source and free software movement.

In the second part, we presented the benefits of working on open source projects. In the last part, we gave instructions on how to start contributing to the open source community and how to find relevant projects.


Deploying and Serving a Web Application on Kubernetes with Docker, K3s and Knative

Monday, 14 June, 2021

In this article, we’ll take a working TODO application written in Flask and JavaScript, backed by a MongoDB database, and learn how to deploy it onto Kubernetes. This post is geared toward beginners: if you do not have access to a Kubernetes cluster, fear not!

We’ll use K3s, a lightweight Kubernetes distribution that is excellent for getting started quickly. But first, let’s talk about what we want to achieve.

First, I’ll introduce the example application. This is kept intentionally simple, but it illustrates a common use case. Then we’ll go through the process of containerizing the application. Before we move on, I’ll talk about how we can use containers to ease our development, especially if we work in a team and want to ease developer ramp-up time or when we are working in a fresh environment.

Once we have containerized the applications, the next step is deploying them onto Kubernetes. While we can create Services, Ingresses and Gateways manually, we can use Knative to stand up our application in no time at all.

Setting Up the App

We will work with a simple TODO application that demonstrates a front end, REST API back end and MongoDB working in concert. Credits go to Prashant Shahi for coming up with the example application. I have made some minor changes purely for pedagogical purposes.

First, git clone the repository:

git clone https://github.com/benjamintanweihao/Flask-MongoDB-K3s-KNative-TodoApp

Next, let’s inspect the directory to get the lay of the land:

% cd Flask-MongoDB-K3s-KNative-TodoApp
% tree

The folder structure is a typical Flask application. The entry point is app.py, which also contains the REST APIs. The templates folder consists of the files that would be rendered as HTML.

├── app.py
├── requirements.txt
├── static
│   ├── assets
│   │   ├── style.css
│   │   ├── twemoji.js
│   │   └── twemoji.min.js
└── templates
    ├── index.html
    └── update.html

Open app.py and we can see all the major pieces:

import os

from flask import Flask, render_template
from pymongo import MongoClient

# Read the MongoDB connection settings from the environment (with sensible defaults)
mongodb_host = os.environ.get('MONGO_HOST', 'localhost')
mongodb_port = int(os.environ.get('MONGO_PORT', '27017'))
client = MongoClient(mongodb_host, mongodb_port)
db = client.camp2016
todos = db.todo

app = Flask(__name__)
title = "TODO with Flask"
heading = "tOdO Reminder"

@app.route("/list")
def lists():
    # Display all the tasks
    todos_l = todos.find()
    a1 = "active"
    return render_template('index.html', a1=a1, todos=todos_l, t=title, h=heading)

if __name__ == "__main__":
    env = os.environ.get('APP_ENV', 'development')
    port = int(os.environ.get('PORT', 5000))
    debug = False if env == 'production' else True
    app.run(host='0.0.0.0', port=port, debug=debug)

From the above code snippet, you can see that the application requires MongoDB as the database. With the lists() method, you can then see an example of how a route is defined (i.e. @app.route("/list")), how data is fetched from MongoDB, and finally, how the template is rendered.
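To make the pattern concrete, here is what a hypothetical route for adding a new task could look like; the route name, form field and redirect target below are illustrative assumptions, not code taken from the repository:

from flask import redirect, request

@app.route("/add", methods=['POST'])
def add():
    # Read the task name submitted by the form and insert it into MongoDB
    name = request.values.get("name")
    todos.insert_one({"name": name, "done": "no"})
    return redirect("/list")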

Another thing to notice here is the use of environment variables for MONGO_HOST and MONGO_PORT, plus the Flask-related environment variables. The most important is debug. When set to True, the Flask server automatically reloads when it detects any changes. This is especially handy during development and is something we’ll exploit.
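For example, based on the snippet above, you could run the app locally in production mode (i.e. with debug turned off) by setting the relevant variables before starting it:

% APP_ENV=production PORT=5000 python app.py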

Developing with Docker Containers

When working on applications, I used to spend a lot of time setting up my environment and installing all the dependencies; only then could I get up and running and start adding new features. And even that describes the ideal scenario, right?

How often have you gone back to an application that you developed (say six months ago), only to find out that you are slowly descending into dependency hell? Dependencies are often a moving target; unless you lock things down, your application might not work properly. One way to get around this is to package all the dependencies into Docker containers.
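A first line of defense, even before Docker, is to pin exact versions in requirements.txt; the version numbers below are placeholders for illustration, not the ones this project uses:

Flask==1.1.2
pymongo==3.11.4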

Another nice thing that Docker brings is automation. That means no more copying and pasting commands and setting up things like databases.

Dockerizing the Flask Application

Here’s the Dockerfile:

FROM alpine:3.7
COPY . /app
WORKDIR /app

RUN apk add --no-cache bash git nginx uwsgi uwsgi-python py2-pip \
    && pip2 install --upgrade pip \
    && pip2 install -r requirements.txt \
    && rm -rf /var/cache/apk/*

EXPOSE 5000
ENTRYPOINT ["python"]
# Run app.py via the python entrypoint above when the container starts
CMD ["app.py"]

We start with a minimal (in terms of size and functionality) base image. Then, the application’s contents are copied into the container’s /app directory. Next, we execute a series of commands to install Python, the Nginx web server and all the Flask application’s requirements. These are exactly the steps needed to set up the application on a fresh system.

You can build the Docker container like so:

% docker build -t <yourusername>/todo-app .

You should see something like this:

# ...
Successfully built c650af8b7942
Successfully tagged benjamintanweihao/todo-app:latest

What about MongoDB?

Should you go through the same process of creating a Dockerfile for MongoDB? The good news is that, more often than not, someone else has already done it. In our case: https://hub.docker.com/_/mongo. However, now you have two containers, with the Flask container depending on the MongoDB one.

One way is to start the MongoDB container manually first, followed by the Flask one. However, let’s say you want to add caching and decide to bring in a Redis container. Then the process of starting each container by hand gets old fast. The solution is Docker Compose, a tool that lets you define and run multiple Docker containers together, which is exactly what we need here.

Docker Compose

Here’s the Docker compose file, docker-compose.yaml:

services:
  flaskapp:
    build: .
    image: benjamintanweihao/todo-app:latest
    ports:
      - 5000:5000
    container_name: flask-app
    environment:
      - MONGO_HOST=mongo
      - MONGO_PORT=27017
    networks:
      - todo-net
    depends_on:
      - mongo
    volumes:
      - .:/app # <--- 
  mongo:
    image: mvertes/alpine-mongo
    ports:
      - 27017:27017
    networks:
      - todo-net

networks:
  todo-net:
    driver: bridge

Even if you’re unfamiliar with Docker Compose, the YAML file presented here isn’t complicated. Let’s go through the important bits.

At the highest level, this file defines services (composed of flaskapp and mongo) and networks, specifying a bridged network. This creates a network so that the containers defined in services can communicate with each other.

Each service defines its image, along with the port mappings and the network defined earlier. Environment variables have also been defined for flaskapp (look at app.py to see that they are indeed the same ones).

I want to call your attention to the volumes specified in flaskapp. What we are doing here is mapping the current directory of the host (which should be the project directory containing app.py) to the /app directory of the container. Why are we doing this? Recall that in the Dockerfile, we copied the app into the /app directory like so:

COPY . /app

Now imagine that you want to make a change to the app. You wouldn’t be able to easily change app.py in the container. By mapping over the local directory, you are essentially overwriting the app.py in the container with the local copy in your directory. So assuming that the Flask application is in debug mode (it is if you have not changed anything at this point), when you launch the containers and make a change, the rendered output reflects the change.

However, it is important to realize that the app.py baked into the image is still the old version; you will still need to remember to build a new image when you want to ship the change. (Hopefully, you have CI/CD set up to do this automatically!)
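For instance, once you are happy with the change, you could rebuild the image (and push it, if needed) with something like the following; the service name comes from the compose file above and the image name placeholder matches the one we used earlier:

% docker-compose build flaskapp
% docker push <yourusername>/todo-app:latest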

Enough talk; let’s see this in action. Run the following command:

docker-compose up

This is what you should see:

Creating network "flask-mongodb-k3s-knative-todoapp_my-net" with driver "bridge"
Creating flask-mongodb-k3s-knative-todoapp_mongo_1 ... done
Creating flask-app                                 ... done
Attaching to flask-mongodb-k3s-knative-todoapp_mongo_1, flask-app
# ... more output truncated
flask-app   |  * Serving Flask app "app" (lazy loading)
flask-app   |  * Environment: production
flask-app   |    WARNING: Do not use the development server in a production environment.
flask-app   |    Use a production WSGI server instead.
flask-app   |  * Debug mode: on
flask-app   |  * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
flask-app   |  * Restarting with stat
mongo_1     | 2021-05-15T15:41:37.993+0000 I NETWORK  [listener] connection accepted from 172.23.0.1:48844 #2 (2 connections now open)
mongo_1     | 2021-05-15T15:41:37.993+0000 I NETWORK  [conn2] received client metadata from 172.23.0.1:48844 conn2: { driver: { name: "PyMongo", version: "3.11.4" }, os: { type: "Linux", name: "", architecture: "x86_64", version: "5.8.0-53-generic" }, platform: "CPython 2.7.15.final.0" }
flask-app   |  * Debugger is active!
flask-app   |  * Debugger PIN: 183-021-098

Now head to http://localhost:5000 in your browser and you should see the TODO application.

If you do, congratulations! Flask and Mongo are working properly together. Feel free to play around with the application to get a feel for it.

Now let’s make a tiny change to app.py, in the heading of the application:

index d322672..1c447ba 100644
--- a/app.py
+++ b/app.py
-heading = "tOdO Reminder"
+heading = "TODO Reminder!!!!!"

Save the file and reload the page in your browser; thanks to debug-mode auto-reloading and the volume mapping, the new heading shows up without rebuilding anything.

Once you are done, you can issue the following command:

docker-compose down

Getting the Application onto Kubernetes

Now comes the fun part. Up to this point, we have containerized our application and its supporting services (just MongoDB for now). How can we start to deploy our application onto Kubernetes?

Before that, let’s install Kubernetes. For this, I’m picking K3s because it’s lightweight and super easy to get up and running.

% curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --no-deploy=traefik"  sh -s -

In a few moments, you will have Kubernetes installed:

[INFO]  Finding release for channel stable
[INFO]  Using v1.20.6+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.20.6+k3s1/sha256sum-amd64.txt
# truncated ...
[INFO]  systemd: Starting k3s

Verify that K3s has been set up properly:

% kubectl get no
NAME      STATUS   ROLES                  AGE     VERSION
artemis   Ready    control-plane,master   2m53s   v1.20.6+k3s1

MongoDB

There are multiple ways of deploying MongoDB onto the cluster: you could use the MongoDB image we ran earlier, a MongoDB operator or Helm. Here, we’ll use the Bitnami Helm chart.
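If the Bitnami chart repository isn’t configured on your machine yet, add it and refresh the local chart cache first:

% helm repo add bitnami https://charts.bitnami.com/bitnami
% helm repo update

With the repository in place, install the chart (the text following the command is the chart’s output):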

helm install mongodb-release bitnami/mongodb --set architecture=standalone --set auth.enabled=false
** Please be patient while the chart is being deployed **

MongoDB(R) can be accessed on the following DNS name(s) and ports from within your cluster:

    mongodb-release.default.svc.cluster.local

To connect to your database, create a MongoDB(R) client container:

    kubectl run --namespace default mongodb-release-client --rm --tty -i --restart='Never' --env="MONGODB_ROOT_PASSWORD=$MONGODB_ROOT_PASSWORD" --image docker.io/bitnami/mongodb:4.4.6-debian-10-r0 --command -- bash

Then, run the following command:
    mongo admin --host "mongodb-release"

To connect to your database from outside the cluster execute the following commands:

    kubectl port-forward --namespace default svc/mongodb-release 27017:27017 &
    mongo --host 127.0.0.1

Install Knative and Istio

In this post, we will be using Knative. Knative builds on Kubernetes, making it easy for developers to deploy and run applications without knowing many of the gnarly details of Kubernetes.

Knative is made up of two parts: Serving and Eventing. In this section, we will deal with the Serving portion. With Knative Serving, you can create scalable, secure, and stateless services in a matter of seconds, and that is what we will do with our TODO app! Before that, let’s install Knative:

The following instructions were based on: https://knative.dev/docs/install/install-serving-with-yaml/:

kubectl apply -f https://github.com/knative/serving/releases/download/v0.22.0/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/v0.22.0/serving-core.yaml
kubectl apply -f https://github.com/knative/net-istio/releases/download/v0.22.0/istio.yaml
kubectl apply -f https://github.com/knative/net-istio/releases/download/v0.22.0/net-istio.yaml

This sets up Knative and Istio. You might be wondering why we need Istio. The reason is that Knative requires an Ingress controller to perform things like traffic splitting (for example, version 1 and version 2 of the TODO app running concurrently) and automatic HTTP request retries.
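As a taste of what that enables, once a service has more than one revision you could split traffic between revisions with a single kn command; the revision name below is illustrative:

% kn service update todo-app --traffic todo-app-00001=50 --traffic @latest=50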

Are there alternatives to Istio? At this point, I am only aware of one: Gloo. Traefik is not currently supported, which is why we disabled it when installing K3s. Since Istio is the default and the best-supported option, we’ll go with it.

Now wait till all the knative-serving pods are running:

kubectl get pods --namespace knative-serving -w
NAME                                READY   STATUS    RESTARTS   AGE
controller-57956677cf-2rqqd         1/1     Running   0          3m39s
webhook-ff79fddb7-mkcrv             1/1     Running   0          3m39s
autoscaler-75895c6c95-2vv5b         1/1     Running   0          3m39s
activator-799bbf59dc-t6v8k          1/1     Running   0          3m39s
istio-webhook-5f876d5c85-2hnvc      1/1     Running   0          44s
networking-istio-6bbc6b9664-shtd2   1/1     Running   0          44s

Setting up a Custom Domain

By default, Knative Serving uses example.com as its domain. If you have set up K3s as per the instructions above, you should have a load balancer installed. This means that, with a little setup, you can create a custom domain using a “magic” DNS service like sslip.io.

sslip.io is a service that, when queried with a hostname containing an embedded IP address, returns that IP address. For example, a URL such as 192.168.0.1.sslip.io will point to 192.168.0.1. This is excellent for experimenting, since you don’t have to buy your own domain name.
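You can verify this behavior with any DNS lookup tool, for example:

% dig +short 192.168.0.1.sslip.io
192.168.0.1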

Go ahead and apply the following manifest:

kubectl apply -f https://storage.googleapis.com/knative-nightly/serving/latest/serving-default-domain.yaml

If you open serving-default-domain.yaml, you will notice the following in the spec:

# other parts truncated
spec:
  serviceAccountName: controller
  containers:
    - name: default-domain
      image: ko://knative.dev/serving/cmd/default-domain
      args: ["-magic-dns=sslip.io"]

This enables the “magic” DNS that you will use in the next step.

Testing that Everything Works

Download the kn binary. You can find the links here: https://knative.dev/development/client/install-kn/. Be sure to rename the binary to kn and place it somewhere in your $PATH. Once you have that sorted out, go ahead and create the sample Hello World service. I have already pushed the benjamintanweihao/helloworld-python image to Docker Hub:

% kn service create helloworld-python --image=docker.io/benjamintanweihao/helloworld-python --env TARGET="Python Sample v1"

This results in the following output:

Creating service 'helloworld-python' in namespace 'default':

  0.037s The Route is still working to reflect the latest desired specification.
  0.099s Configuration "helloworld-python" is waiting for a Revision to become ready.
 29.277s ...
 29.314s Ingress has not yet been reconciled.
 29.446s Waiting for load balancer to be ready
 29.605s Ready to serve.

Service 'helloworld-python' created to latest revision 'helloworld-python-00001' is available at URL:
http://helloworld-python.default.192.168.86.26.sslip.io

To list all the deployed Knative services in all namespaces, you can do:

% kn service list -A

With kubectl, this becomes:

% kubectl get ksvc -A

To delete the service, it is as simple as:

kn service delete helloworld-python # or kubectl delete ksvc helloworld-python

If you haven’t done so already, ensure the todo-app image has been pushed to DockerHub. (If you are unfamiliar with pushing images to DockerHub, the DockerHub Quickstart is a great place to start.) Remember to replace {username} with your DockerHub ID:

% docker push {username}/todo-app:latest

Once the image has been pushed, you can then use the kn command to create the TODO service. Remember to replace {username} with your DockerHub ID:

kn service create todo-app --image=docker.io/{username}/todo-app --env MONGO_HOST="mongodb-release.default.svc.cluster.local" 

If everything went well, you will see this:

Creating service 'todo-app' in namespace 'default':

  0.022s The Route is still working to reflect the latest desired specification.
  0.085s Configuration "todo-app" is waiting for a Revision to become ready.
  4.586s ...
  4.608s Ingress has not yet been reconciled.
  4.675s Waiting for load balancer to be ready
  4.974s Ready to serve.

Service 'todo-app' created to latest revision 'todo-app-00001' is available at URL:
http://todo-app.default.192.168.86.26.sslip.io

Now head over to http://todo-app.default.192.168.86.26.sslip.io (or whatever has been printed on the last line of the previous output) and you should see the application! Now take a step back and appreciate what Knative has done for you: it has spun up a service in a single command and given you a URL at which you can access it.
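If you need that URL again later, kn can print it for you (assuming the service is named todo-app, as above):

% kn service describe todo-app -o url
http://todo-app.default.192.168.86.26.sslip.io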

I’ve barely scratched the surface with Knative, but I hope this motivates you to learn more about it! When I started looking at Knative, I didn’t quite understand what it did. Hopefully, the example sheds some light on the awesomeness of Knative and its convenience.

Conclusion

In this article, we took a whirlwind tour: starting from a web application built in Python and backed by MongoDB, we learned how to:

  1. Containerize the TODO application using Docker
  2. Use Docker to alleviate dependency hell
  3. Use Docker for development
  4. Use Docker Compose to package multiple containers
  5. Install K3s
  6. Install Knative (Serving) and Istio
  7. Use Helm to deploy MongoDB
  8. Use Knative to deploy the TODO application in a single line

While migrating an application to Kubernetes is certainly not a trivial task, containerizing your application usually gets you halfway there. Of course, there are still many things that weren’t covered, such as security and scaling.

K3s is an excellent platform to test and run Kubernetes workloads and is especially useful when running on a laptop/desktop.

Indeed, one of the highlights of Knative is that it lets you “stand up a scalable, secure, stateless service in seconds,” and as you can see, it delivers on that promise.

I will cover more about Knative and go deeper into its core features in a future article. I hope you can take what you have read here and adapt it to your applications!


A Developer’s Introduction to Buildpacks

Monday, 14 June, 2021

Compiling software is not glamorous, but it is critical to every developer’s workflow. The process of compiling software has evolved over the decades, moving from developers building artifacts locally, to centralized build servers, to multistage Docker images — and now, to a relatively new process called buildpacks.

In this post, we’ll look at the history of compiling software and see how buildpacks have evolved to provide developers with an opinionated and convenient process for building their source code.

The History of Compiling Software

Not so long ago, it was common for a developer on a team to perform a release by compiling code on their local workstation, generating deployable artifacts and deploying them onto a server using tools like FTP or RDP. This process had some obvious limitations: it was not reproducible, it was entirely manual, and it depended on an individual’s workstation being configured with all the platforms and scripts required to complete a build.

Continuous Integration Servers

The next logical step was the build server, often called a Continuous Integration (CI) server. A CI server provides a source of truth where the process of testing and building code is centrally defined and managed. Traditionally, the build process was defined as a series of steps configured through a web console; more recently, those steps are defined as code and checked in alongside the application source code. A CI server allows code changes to be continuously built by automating testing and building code as a shared Source Control Management (SCM) system receives new code commits.

However, CI servers still require a great deal of specialized configuration. You need to install Software Development Kits (SDKs) for the various languages used to write applications, as well as project and package management tools. Nowadays, languages release new versions multiple times a year, which burdens CI servers with maintaining multiple SDKs side by side to accommodate software projects that upgrade to new language versions at different rates. In addition, the CI build steps require a particular CI server to execute them. So even if the step configuration has been expressed as code and checked in alongside the application code, only that particular CI server can execute those steps, and they are not generally reproducible on a developer’s local machine.

The Docker Solution

These limitations are neatly solved by using Docker. Each Docker image provides an isolated file system, allowing entire build chains to exist within their own independent Docker images. This means even software with no native support for installing and running multiple versions can exist side by side in separate Docker images. And by using multistage Docker builds, it is possible to define the steps and environment required to build application code with instructions that any server or workstation with Docker installed can execute.
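As a rough sketch of what a multistage build looks like for a Java project (the base-image tags and paths here are illustrative, not taken from any particular project):

# Stage 1: build the application with Maven and a full JDK
FROM maven:3-openjdk-11 AS build
WORKDIR /src
COPY . .
RUN mvn -q package -DskipTests

# Stage 2: copy only the built artifact into a slim runtime image
FROM eclipse-temurin:11-jre
COPY --from=build /src/target/app.jar /app/app.jar
ENTRYPOINT ["java", "-jar", "/app/app.jar"]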

However, the isolated nature of Docker images can present a challenge when building software. Today, most applications rely on dozens, if not hundreds, of external dependencies that must be downloaded from the internet. A naive implementation of a Docker build script will result in these dependencies being downloaded with every build. It is not uncommon for an application to download hundreds of megabytes worth of dependencies, so compiling applications using Docker can be extremely inefficient without care.

Still, Docker is a compelling platform for several reasons: it can precisely define the environment where an application is built, it allows multiple such definitions to painlessly coexist on a single machine, and it allows builds to be performed anywhere Docker is installed. The only thing missing is the ability to abstract away the highly specialized processes required to efficiently build software using Docker.

Introducing Buildpacks

This is where buildpacks come in. A buildpack implements an opinionated build process within a Docker container that takes care of all the fiddly aspects of managing SDKs, caching dependencies and reducing build times to create an executable Docker image from the supplied source code. This means developers can write their code as they always have (with no Docker build scripts), compile that code with a buildpack and have their compiled application embedded into an executable Docker image.

Buildpacks are:

  • Convenient: Their opinionated workflows require no special knowledge to use and are trivial to execute.
  • Efficient: They are specially designed to leverage Docker’s functionality, ensuring builds are as fast as possible.
  • Repeatable: Only one application needs to be installed alongside Docker to build projects in any number of languages.
  • Flexible: An open specification allows anyone to define their own build process.

Despite these benefits, buildpacks are not a complete replacement for your CI system. For a start, buildpacks only generate Docker images. If you deploy applications to a web or application server, buildpacks won’t generate the traditional artifacts you need. And you will likely still execute buildpacks on a CI server to retain the benefits of a centralized source of truth.

To demonstrate just how powerful buildpacks are, let’s take a typical Java application with no Docker build configurations and create an executable Docker image.

Building a Sample Application

Petclinic is a sample Java Spring web application that has been lovingly maintained over the years to demonstrate the Spring platform. It represents the kind of code base you would find in many engineering departments. Although the git repository contains a docker-compose.yml file, this is only to run a MySQL database instance. Neither the application source code nor the build scripts provide any facility to create Docker images.

Clone the git repository with the command:

git clone https://github.com/spring-projects/spring-petclinic.git

To use buildpacks, ensure you have Docker installed.

To build the sample application with a buildpack, we’ll need to install what is known as a platform, which in our case is a CLI tool called pack. You can find installation instructions for pack here, with packages available for most major operating systems.

When you first run pack, you will be prompted to configure a default builder. We’ll cover terminology like builder in subsequent posts. For now, we just need to understand that a builder contains the buildpacks that compile our code and that companies like Heroku and Google, and groups like Paketo, provide several builders we can use.

Here is the output of pack asking us to define a default builder:

Please select a default builder with:

pack config default-builder <builder-image>

Suggested builders:
Google: gcr.io/buildpacks/builder:v1 Ubuntu 18 base image with buildpacks for .NET, Go, Java, Node.js, and Python
Heroku: heroku/buildpacks:18 Base builder for Heroku-18 stack, based on ubuntu:18.04 base image
Heroku: heroku/buildpacks:20 Base builder for Heroku-20 stack, based on ubuntu:20.04 base image
Paketo Buildpacks: paketobuildpacks/builder:base Ubuntu bionic base image with buildpacks for Java, .NET Core, NodeJS, Go, Ruby, NGINX and Procfile
Paketo Buildpacks: paketobuildpacks/builder:full Ubuntu bionic base image with buildpacks for Java, .NET Core, NodeJS, Go, PHP, Ruby, Apache HTTPD, NGINX and Procfile
Paketo Buildpacks: paketobuildpacks/builder:tiny Tiny base image (bionic build image, distroless-like run image) with buildpacks for Java Native Image and Go

Tip: Learn more about a specific builder with:
pack builder inspect <builder-image>

We’ll make use of the Heroku Ubuntu 20.04 builder, which we configure with the command:

pack config default-builder heroku/buildpacks:20

Then, in the spring-petclinic directory, run the command:

pack build myimage

It is important to note that we do not need to have the Java Development Kit (JDK) or Maven installed for pack to build our source code. We also don’t need to tell pack that we are trying to build Java code. The Heroku builder (or any other builder you may be using) conveniently takes care of all of this for us.

The first time the build runs, all of the application dependencies are downloaded. And there are a lot! It took around 30 minutes to complete the downloads on my home internet connection. Once these downloads are complete, the application is compiled and a Docker image called myimage is created. We can verify this by running the command:

docker image ls --filter reference=myimage

To run the Docker image, run the command:

docker run -p 8080:8080 myimage

The resulting web application is exposed on port 8080, so we can access it via the URL http://localhost:8080, where you should see the Petclinic welcome page.

It’s worth taking a moment to appreciate what we just achieved here. With a single command, we compiled source code that had no Docker configuration into an executable Docker image. We never needed to install any Java tooling or configure any Java settings.

To verify that the application dependencies were cached, run:

pack build myimage2

Notice this time that the build process completes much faster as all the downloads from the previous build are reused. This demonstrates how buildpacks provide an efficient build process.

The process we just ran through here is also easily repeated on any machine with Docker and the pack CLI installed. It would take very little to recreate this process in a CI server, meaning builds on a centralized build server and local developer’s machines work the same way.
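A CI job would essentially run the same commands we ran locally; here is a minimal sketch (the image name and registry are placeholders):

pack config default-builder heroku/buildpacks:20
pack build registry.example.com/myteam/petclinic:latest
docker push registry.example.com/myteam/petclinic:latest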

Conclusion

The evolution of building software has seen engineering teams go from building on their local machines to building via a CI server to multistage Docker builds. Buildpacks take the best ideas from all of these practices to provide a build process that is identical whether run on a developer’s local machine or a CI server. They take advantage of the isolation and reproducibility of Docker without the overhead of forcing every developer to craft a best-practice Docker build script.

This post demonstrated how you can use publicly available buildpacks to quickly compile a traditional Java application into an executable Docker image.

Want to customize your build experience? Stay tuned for the next blog post, where we create a simple buildpack of our own to compile a Java application with Maven.


Stupid Simple Kubernetes: Everything You Need to Know to Start Using Kubernetes Part 4

Monday, 24 May, 2021

In the era of Microservices, Cloud Computing and Serverless architecture, it’s useful to understand Kubernetes and learn how to use it. However, the official Kubernetes documentation can be hard to decipher, especially for newcomers. In this blog series, I will present a simplified view of Kubernetes and give examples of how to use it for deploying microservices using different cloud providers, including Azure, Amazon, Google Cloud and even IBM.

In this first article, we’ll talk about the most important concepts used in Kubernetes. Later in the series, we’ll learn how to write configuration files, use Helm as a package manager, create a cloud infrastructure, easily orchestrate our services using Kubernetes and create a CI/CD pipeline to automate the whole workflow. With this information, you can spin up any kind of project and create a solid infrastructure/architecture.

First, I’d like to mention that containers have multiple benefits, from increased deployment velocity to delivery consistency with a greater horizontal scale. Even so, you should not use containers for everything because just putting any part of your application in a container comes with overhead, like maintaining a container orchestration layer. So, don’t jump to conclusions. Instead, create a cost/benefit analysis at the start of the project.

Now, let’s start our journey in the world of Kubernetes.

Kubernetes Hardware Structure

Nodes

Nodes are worker machines in Kubernetes, which can be any device with CPU and RAM. For example, a node can be anything, from a smartwatch, smartphone, or laptop to a Raspberry Pi. A node is a virtual machine (VM) when we work with cloud providers. So, a node is an abstraction over a single device.

As you will see in the next articles, the beauty of this abstraction is that we don’t need to know the underlying hardware structure. We will use nodes; this way, our infrastructure is platform-independent.

Cluster

A cluster is a group of nodes. When you deploy programs onto the cluster, it automatically handles the distribution of work to the individual nodes. If more resources are required (for example, we need more memory), new nodes can be added to the cluster, and the work will be redistributed automatically.

We run our code on a cluster and shouldn’t care about which node. The distribution of the work is automatic.

Persistent Volumes

Because our code can be relocated from one node to another (for example, a node doesn’t have enough memory, so the work is rescheduled on a different node with enough memory), data saved on a node is volatile. But there are cases when we want to save our data persistently. In this case, we should use Persistent Volumes. A persistent volume is like an external hard drive; you can plug it in and save your data on it.
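As a minimal sketch of the idea (the claim name and size below are illustrative assumptions, and the cluster is assumed to offer a default StorageClass), a workload asks for persistent storage through a PersistentVolumeClaim, which Kubernetes then binds to a Persistent Volume:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim          # hypothetical claim name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi          # illustrative size
EOF

A pod or deployment can then mount data-claim as a volume, and the data survives the pod being rescheduled to another node.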

Google developed Kubernetes as a platform for stateless applications with persistent data stored elsewhere. As the project matured, many organizations wanted to leverage it for their stateful applications, so the developers added persistent volume management. Much like the early days of virtualization, database servers are not typically the first group of servers to move into this new architecture. That’s because the database is the core of many applications and may contain valuable information, so on-premises database systems still largely run in VMs or physical servers.

So, the question is, when should we use Persistent Volumes? First, we should understand the different types of database applications to answer that question.

We can classify the data management solutions into two classes:

  1. Vertically scalable — includes traditional RDBMS solutions such as MySQL, PostgreSQL and SQL Server
  2. Horizontally scalable — includes “NoSQL” solutions such as ElasticSearch or Hadoop-based solutions

Vertically scalable solutions like MySQL, Postgres and Microsoft SQL should not go in containers. These database platforms require high I/O, shared disks, block storage, etc., and do not (by design) handle the loss of a node in a cluster gracefully, which often happens in a container-based ecosystem.

Use containers for horizontally scalable applications (Elastic, Cassandra, Kafka, etc.). They can withstand the loss of a node in the database cluster, and the database application can independently rebalance.

Usually, you can and should containerize distributed databases that use redundant storage techniques and can withstand a node’s loss in the database cluster (ElasticSearch is a good example).

Kubernetes Software Components

Container

One of the goals of modern software development is to keep applications on the same host or cluster isolated. Virtual machines are one solution to this problem, but they require their own OS, so they are typically gigabytes in size.

Containers, by contrast, isolate application execution environments from one another but share the underlying OS kernel. So, a container is like a box where we store everything needed to run an application: code, runtime, system tools, system libraries, settings, etc. They’re typically measured in megabytes, use far fewer resources than VMs and start up almost immediately.

Pods

A pod is a group of containers. In Kubernetes, the smallest unit of work is a pod. A pod can contain multiple containers, but usually we use one container per pod because the replication unit in Kubernetes is the pod: if we want to scale each container independently, we put one container in each pod.
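For a quick, hedged illustration (the pod name and the nginx image are arbitrary placeholders), a single-container pod can be created imperatively:

kubectl run hello-pod --image=nginx:alpine
kubectl get pods

In practice, pods are rarely created directly; they are usually managed by higher-level objects such as the deployments described next.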

Deployments

The primary role of a deployment is to provide declarative updates to both the pod and the ReplicaSet (a set in which the same pod is replicated multiple times). Using the deployment, we can specify how many replicas of the same pod should be running at any time. The deployment is like a manager for the pods; it automatically spins up the number of pods requested, monitors the pods and recreates them in case of failure. Deployments are helpful because you don’t have to create and manage each pod separately.

We usually use deployments for stateless applications. However, you can save the deployment’s state by attaching a Persistent Volume to it, making it stateful.
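As a quick sketch (the deployment name web and the nginx image are placeholders), a deployment that keeps three replicas of the same pod running can be created and scaled imperatively:

kubectl create deployment web --image=nginx:alpine
kubectl scale deployment web --replicas=3
kubectl get pods   # three web-* pods, re-created automatically if one fails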

Stateful Sets

A StatefulSet is a newer Kubernetes resource used to manage stateful applications. It manages the deployment and scaling of a set of pods and guarantees these pods’ ordering and uniqueness. It is similar to a deployment; the only difference is that the deployment creates a set of pods with random pod names where the order of the pods is not important, while the StatefulSet creates pods with a unique naming convention and order. So, if you want to create three replicas of a pod called example, the StatefulSet will create pods with the following names: example-0, example-1, example-2. In this case, the most important benefit is that you can rely on the name of the pods.
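Here is a minimal sketch of that example (all names are illustrative; a real StatefulSet usually also references a headless service and volume claim templates) showing the predictable pod names:

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example
spec:
  serviceName: example       # a matching headless service is assumed to exist
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example
        image: nginx:alpine  # placeholder image
EOF
kubectl get pods   # example-0, example-1, example-2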

DaemonSets

A DaemonSet ensures that a copy of the pod runs on every node in the cluster. If a node is added to or removed from the cluster, the DaemonSet automatically adds or deletes the pod. This is useful for monitoring and logging because you can cover every node without having to manage each one manually.
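As an illustrative sketch (the name is hypothetical and the image stands in for a real log shipper or monitoring agent), a DaemonSet differs from the StatefulSet above mainly in that it has no replica count; one pod is scheduled per node:

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent           # hypothetical name
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      containers:
      - name: agent
        image: nginx:alpine  # placeholder for a logging or monitoring agent image
EOF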

Services

While a deployment is responsible for keeping a set of pods running, a service is responsible for enabling network access to a set of pods. Services provide standardized features across the cluster: load balancing, service discovery between applications and zero-downtime application deployments. Each service has a unique IP address and a DNS hostname. Applications that consume a service can be manually configured to use either the IP address or the hostname, and the traffic will be load balanced to the correct pods. In the External Traffic section, we will learn more about the service types and how we can communicate between our internal services and the external world.
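Assuming the web deployment sketched earlier exists, exposing its pods behind a single stable address is one command (the ports are illustrative):

kubectl expose deployment web --port=80 --target-port=80
kubectl get service web   # shows the service's ClusterIP and port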

ConfigMaps

If you want to deploy to multiple environments, like staging, dev and prod, it’s a bad practice to bake the configs into the application because of environmental differences. Ideally, you’ll want to separate configurations to match the deploy environment. This is where ConfigMap comes into play. ConfigMaps allow you to decouple configuration artifacts from image content to keep containerized applications portable.
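A hedged example (the key names and values are made up): environment-specific settings can live in a ConfigMap and be injected into pods as environment variables or mounted files:

kubectl create configmap app-config --from-literal=DB_HOST=db.staging.local --from-literal=LOG_LEVEL=debug
kubectl get configmap app-config -o yaml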

External Traffic

Now that you’ve got the services running in your cluster, how do you get external traffic into it? There are three service types for handling external traffic: ClusterIP, NodePort and LoadBalancer. A fourth solution is to add another abstraction layer, called an Ingress Controller.

ClusterIP

ClusterIP is the default service type in Kubernetes and lets you communicate with other services inside your cluster. While ClusterIP is not meant for external access, with a little hack using a proxy, external traffic can hit our service. Don’t use this solution in production, but only for debugging. Services declared as ClusterIP should NOT be directly visible from the outside.
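The “little hack” usually means port-forwarding or kubectl proxy; for example, assuming the web service sketched earlier:

kubectl port-forward service/web 8080:80   # debugging only: tunnels localhost:8080 to the service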

NodePort

As we saw in the first part of this article, pods are running on nodes. Nodes can be different devices, like laptops or virtual machines (when working in the cloud). Each node has a fixed IP address. By declaring a service as NodePort, the service will expose the node’s IP address so that you can access it from the outside. You can use NodePort in production, but for large applications, where you have many services, manually managing all the different IP addresses can be cumbersome.
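As a sketch, exposing the earlier web deployment as a NodePort allocates a port (30000-32767 by default) on every node:

kubectl expose deployment web --type=NodePort --port=80 --name=web-nodeport
kubectl get service web-nodeport   # note the mapped node port, e.g. 80:3XXXX/TCP

The service is then reachable at http://<any-node-ip>:<node-port>.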

LoadBalancer

Declaring a service of type LoadBalancer exposes it externally using a cloud provider’s load balancer. How the external load balancer routes traffic to the Service pods depends on the cluster provider. With this solution, you don’t have to manage all the IP addresses of every cluster node, but you will have one load balancer per service. The downside is that every service has a separate load balancer and you will be billed per load balancer instance.
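On a managed cloud cluster (this assumes your provider provisions load balancers automatically), the same deployment can be exposed with:

kubectl expose deployment web --type=LoadBalancer --port=80 --name=web-lb
kubectl get service web-lb   # EXTERNAL-IP is filled in once the cloud load balancer is ready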

This solution is good for production but can be a bit expensive. Let’s look at a less expensive solution.

Ingress

Ingress is not a service but an API object that manages external access to a cluster’s services. It acts as a reverse proxy and single entry point to your cluster that routes requests to different services. I usually use the NGINX Ingress Controller, which acts as that reverse proxy and also handles SSL termination. The best production-ready way to expose the ingress is to use a load balancer.

With this solution, you can expose any number of services using a single load balancer to keep your bills as low as possible.
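A minimal sketch, assuming an NGINX Ingress Controller is already installed and that api-service and web-service exist (both names are hypothetical), routing two paths through one entry point:

kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress      # an ingressClassName or annotation may be needed, depending on the controller
spec:
  rules:
  - http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service    # hypothetical backend service
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service    # hypothetical backend service
            port:
              number: 80
EOF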

Next Steps

In this article, we learned about the basic concepts used in Kubernetes and its hardware structure. We also discussed the different software components, including Pods, Deployments, StatefulSets and Services, and saw how to communicate between services and with the outside world.

In the next article, we’ll set up a cluster on Azure and create an infrastructure with a LoadBalancer, an Ingress Controller and two Services, and use two Deployments to spin up three Pods per Service.

There is another ongoing “Stupid Simple AI” series. Find the first two articles here: SVM and Kernel SVM and KNN in Python.

Thank you for reading this article!

Beyond Docker: A Look at Alternatives to Container Management

Monday, 24 May, 2021

A deep dive into container stacks and the choices the ecosystem provides

Docker appeared in 2013 and popularized the idea of containers to the point that most people still equate the notion of a container to a “Docker container.”

Being first in its category, Docker set some standards that newcomers must adhere to. For example, there is a large repository of Docker system images. All of the alternatives had to use the same image format while trying, at the same time, to change one or more parts of the entire stack on which Docker was based.

In the meantime, new container standards appeared, and the container ecosystem grew in different directions. Now there are many ways to work with containers besides Docker.

In this blog post, I will

  • introduce chroot, cgroups and namespaces as the technical foundation of containers
  • define the software stack that Docker is based upon
  • state the standards that Docker and Kubernetes adhere to and then
  • describe alternative solutions which try to replace the original Docker containers with better and more secure components.

Software Stack for Containers

Linux features such as chroot calls, cgroups and namespaces help containers run in isolation from all other processes and thus guarantee safety during runtime.

Chroot

All Docker-like technologies have their roots in the root directory of a Unix-like operating system (OS). Under the root directory sit the root file system and all other directories.

On Linux, the root directory is both the base of the file system and the starting point for all other directories. This is dangerous in the long term, as any unwanted deletion in the root directory affects the entire OS. That’s why the chroot() system call exists. It creates additional root directories, such as one to run legacy software, another to contain a database, etc.

To the processes inside those environments, the chroot appears to be a true root directory, but in reality, it just prepends the new root’s path to any pathname starting with /. The real root directory still exists, and a process can still find ways to refer to locations beyond its designated root.
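A tiny sketch of the idea (it assumes a statically linked busybox binary is available at /bin/busybox):

mkdir -p /tmp/newroot/bin
cp /bin/busybox /tmp/newroot/bin/
sudo chroot /tmp/newroot /bin/busybox sh   # inside this shell, /tmp/newroot appears as /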

Linux cgroups

Control groups (cgroups) have been a feature of the Linux kernel since version 2.6.24 in 2008. A cgroup will limit, isolate and measure usage of system resources (memory, CPU, network and I/O) for several processes at once.

Let’s say we want to prevent our users from sending too many email messages from the server. We create a cgroup with a memory limit of 1GB and 50 percent of the CPU and add the application’s process ID to the group. The system will throttle the email-sending process when these limits are reached. It may even kill the process, depending on the hosting strategy.
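On a system using cgroups v2 (an assumption; the file names differ on cgroups v1), that email example looks roughly like this, where EMAIL_PID holds the process ID to throttle:

sudo mkdir /sys/fs/cgroup/email
echo "1G" | sudo tee /sys/fs/cgroup/email/memory.max        # memory limit
echo "50000 100000" | sudo tee /sys/fs/cgroup/email/cpu.max # roughly 50 percent of one CPU
echo $EMAIL_PID | sudo tee /sys/fs/cgroup/email/cgroup.procs # add the process to the group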

Namespaces

A Linux namespace is another useful abstraction layer. A namespace allows us to have many process hierarchies, each with its own nested “subtree.” A namespace can use a global resource and present it to its members as if it were their own resource.

Here’s an example. A Linux system starts with a process identifier (PID) of 1, and all other processes are contained in its tree. The PID namespace allows us to spawn a new tree with its own PID 1 process. There are now two PIDs with the value of 1. Each namespace can spawn its own namespaces, and the same process can have several PIDs attached to it.

A process in a child namespace will have no idea of the parent’s process existence, while the parent namespace will have access to the entire child namespace.

There are seven types of namespaces: cgroup, IPC, network, mount, PID, user and UTS.
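The PID example above can be reproduced with the unshare tool from util-linux:

sudo unshare --pid --fork --mount-proc bash
ps aux   # inside the new namespace, bash shows up as PID 1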

Network Namespace

Some resources are scarce. By convention, some ports have predefined roles and should not be used for anything else: port 80 only serves HTTP calls, port 443 only serves HTTPS calls and so on. In a shared hosting environment, two or more sites could want to listen to HTTP requests on port 80, but the one that first gets hold of it will not let any other app access that port. That first app would be visible on the internet, while all the others would be invisible.

The solution is to use network namespaces, with which inner processes will see different network interfaces.

In one network namespace, the same port can be open, while in another, it may be shut down. For this to work, we must adopt additional “virtual” network interfaces, which belong to several namespaces simultaneously. There also must be a router process somewhere in the middle, to connect requests coming to a physical device to the appropriate namespace and the process in it.
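A quick sketch with the iproute2 tools: a freshly created network namespace starts with only a loopback interface, so processes inside it see none of the host’s ports:

sudo ip netns add demo
sudo ip netns exec demo ip link list   # only lo is listed
sudo ip netns delete demo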

Complicated? Yes! That’s why Docker and similar tools are so popular. Let’s now introduce Docker and see how it compares to its alternatives.

Docker: Containers Everyone!

Before containers came to rule the world of cloud computing, virtual machines were quite popular. If you have a Windows machine but want to develop mobile apps for iOS, you can either buy a new Mac (an expensive but excellent solution) or install a macOS virtual machine on the Windows hardware (a cheap but slow and unreliable solution). VMs can also be clumsy: they often gobble up resources they do not need and are usually slow to start (up to a minute).

Enter containers.

Containers are standard units of software that have everything needed for the program to run: the operating system, databases, images, icons, software libraries, code and everything else. A container also runs in isolation from all other containers and even from the OS itself. Containers are lightweight compared to VMs, so they can start fast and are easily replaced.

To run isolated and protected, containers are based on chroot, cgroups and namespaces.

The image of a container is a template from which the application is formed on the actual machine. Creating as many containers as needed from a single image is possible. A text document called Dockerfile contains all the information needed to assemble an image.
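For instance (the file contents and tag are illustrative, and an index.html is assumed to exist in the current directory), a minimal Dockerfile that packages a static web page, and the command that turns it into an image:

cat > Dockerfile <<'EOF'
FROM nginx:alpine
COPY index.html /usr/share/nginx/html/
EOF
docker build -t mysite:1.0 .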

The true revolution that Docker brought was the creation of a registry of Docker images and the development of the Docker engine, with which those images run everywhere in the same manner. Being first and widely adopted, Docker’s image format became an implicit world standard that all eventual competitors had to pay attention to.

CRI and OCI

The Open Container Initiative (OCI) publishes specifications for images and containers. It was started in 2015 by Docker and was accepted by Microsoft, Facebook, Intel, VMWare, Oracle and many other industry giants.

The OCI also maintains a reference implementation of its runtime specification. It is called runc and deals directly with containers: it creates them, runs them and so on.

The Container Runtime Interface (CRI) is a Kubernetes API that defines how Kubernetes interacts with container runtimes. It is also standardized, so you can choose which CRI implementation to adopt.

Software Stack for Containers with CRI and OCI

The software stack that runs containers has Linux as its most basic part.

Note that containerd and CRI-O both adhere to the CRI and OCI specifications. For Kubernetes, it means that it can use either containerd or CRI-O without the user ever noticing the difference. It can also use any of the other alternatives that we are now going to mention – which was exactly the goal when software standards such as OCI and CRI were created and adopted.

Docker Software Stack

The software stack for Docker is

— docker-cli, Docker command line interface for developers

— containerd, originally written by Docker, later spun off as an independent project; it implements the CRI specification

— runc, which implements the OCI spec

— containers (using chroot, cgroups, namespaces, etc.)

The software stack for Kubernetes is almost the same; instead of containerd, Kubernetes uses CRI-O, a CRI implementation created by Red Hat/IBM and others.

containerd

containerd runs as a daemon on Linux and Windows. It loads images, executes them as containers, supervises low-level storage and takes care of the entire container runtime and lifecycle.

Containerd started as a part of Docker in 2014 and in 2017 became a part of Cloud Native Computing Foundation (CNCF). The CNCF is a vendor-neutral home for Kubernetes, Prometheus, Envoy, containerd, CRI-O, podman and other cloud-based software.

runc

runc is a reference implementation for the OCI specification. It creates and runs containers and the processes within them. It uses lower-level Linux features, such as cgroups and namespaces.

Alternatives to runc include kata-runtime and gVisor; CRI-O, described below, sits one level higher in the stack.

kata-runtime implements the OCI specification using hardware virtualization as individual lightweight VMs. Its runtime is compatible with OCI, CRI-O and containerd, so it works seamlessly with Docker and Kubernetes.

gVisor from Google creates containers that have their own kernel. It implements OCI through a program called runsc, which integrates with Docker and Kubernetes. A container with its own kernel is more secure than without, but it is not a panacea, and there is a penalty to pay in resource usage with that approach.

CRI-O, a container stack designed purely for Kubernetes, was the first implementation of the CRI standard. It pulls images from any container registry and serves as a lightweight alternative to using Docker.

Today it supports runc and Kata Containers as its container runtimes, but any other OCI-compatible runtime can also be plugged in (at least in theory).

It is a CNCF incubating project.

Podman

Podman is a daemonless Docker alternative. Its commands are intentionally as compatible with Docker as possible, so you can create an alias and keep typing “docker” instead of “podman” on the command line.

Podman aims to replace Docker, so sticking to the same set of commands makes sense. Podman tries to improve on two problems in Docker.

First, Docker always executes through an internal daemon, a single process running in the background. If it fails, the whole system fails.

Second, Docker runs as a background process with root privileges, so when you give access to a new user, you are actually giving access to the entire server.

Podman is a remote Linux client that runs containers directly from the operating system. You also can run them completely rootless.

It downloads images from DockerHub and runs them in exactly the same way as Docker, with exactly the same commands.

Podman runs the commands and the images as a user other than root, so it is more secure than Docker.
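Because the commands match, switching is often a one-liner (the image is an arbitrary example):

alias docker=podman
docker run --rm -d -p 8080:80 nginx:alpine   # actually runs podman: rootless, no daemon involved
podman ps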

On the other hand, many tools developed for Docker are not available on Podman, such as Portainer and Watchtower. Moving away from Docker means sacrificing your established workflow.

Podman has a directory structure similar to those of Buildah, Skopeo and CRI-O. Its pods are also very similar to Kubernetes pods.

Developed by Red Hat, Podman is a player to watch in this space.

Honorable Mention: LXC/LXD

Introduced in 2008, the LXC (LinuX Containers) stack was the first upstream-kernel container technology on Linux. The first version of Docker used LXC, but in later development Docker moved away from it, having implemented runc.

The goal of LXC is to run multiple isolated Linux virtual environments on a control host using a single Linux kernel. To that end, it uses cgroups functionality without needing to start any virtual machines; it also uses namespaces to completely isolate the application from the underlying system.

LXC aims to create system containers, almost like you would have in a virtual machine – but without the overhead that comes from trying to emulate the entire virtualized hardware.

LXC does not emulate hardware and packages but contains only the needed applications, so it executes almost at the bare metal speed. In contrast, virtual machines contain the entire OS, then emulate hardware such as hard drives, virtual processor and network interfaces.

So, LXC is small and fast while VMs are big and slow. On the other hand, LXC environments cannot be packaged into ready-made, quickly deployable machines and are difficult to manage through GUI management consoles. LXC requires high technical skills, and the result may be an optimized machine that is incompatible with other environments.

LXC vs Docker Approach

LXC is like a supercharged chroot on Linux and produces “small” servers that boot faster and need less RAM. Docker, however, offers much more:

  • Portable deployment across machines: the object that you create with one version of Docker can be transferred and installed onto any other Docker-enabled Linux host.
  • Versioning: Docker can track versions in a git-like manner – you can create new versions of a container, roll them back and so on.
  • Reusing components: With Docker, you can stack already created packages into new packages. If you want a LAMP environment, you can install its components once and then reuse them as an already pre-made LAMP image.
  • Docker image archive: hundreds of thousands of Docker images can be downloaded from dedicated sites, and it is very easy to upload a new image to one such repository.

Finally, LXC is geared toward system admins while Docker is more geared to developers. That’s why Docker is more popular.

LXD

LXD has a privileged daemon that exposes a REST API over a local UNIX socket and over the network (if enabled). You can access it through a command line tool, but it always communicates with REST API calls. It will always function the same whether the client is on your local machine or somewhere on a remote server.

LXD can scale from one local machine to several thousand remote machines. Like Docker, it is image-based, with images available for the more popular Linux distributions. Canonical, the company that owns Ubuntu, is financing the development of LXD, so it will always run on the latest versions of Ubuntu and other similar Linux operating systems.
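As a brief sketch of the LXD workflow (the image alias and container name are illustrative), launching a full system container takes only a few commands:

lxc launch ubuntu:20.04 mysystem
lxc exec mysystem -- bash   # a shell inside a complete Ubuntu userspace
lxc list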

LXD integrates seamlessly with OpenNebula and OpenStack standards.

Technically, LXD is written “on top” of LXC (both are using the same liblxc library and Go language to create containers) but the goal of LXD is to improve user experience compared to LXC.

Docker Forever or Not?

Docker boasts 11 million developers, 7 million applications and 13 billion monthly image downloads. To say that Docker is still the leader would be an understatement. However, this article shows that replacing one or more parts of the Docker software stack is possible, often without compatibility problems. Alternatives do exist, with security as the main goal compared to what Docker offers.

Introducing Kubewarden, an Open Source Policy Engine

Sunday, 16 May, 2021

Kubewarden

Security has always been a wide and complex topic. A recent survey from StackRox about the state of containers and Kubernetes security provides some interesting data on these topics. In this blog post, I’ll dive into some of the findings in that survey and introduce you to Kubewarden, an open source policy engine.

Security Measures and Skills are Lacking

A staggering 66 percent of the survey participants do not feel confident enough in the security measures they have in place. Their companies are investing energy and resources in the creation of DevSecOps roles to address this problem, but unfortunately, this task is not progressing as smoothly as planned.

Looking more into the survey, we discover that many companies are struggling with a shortage of skills. Not many professionals are proficient with both security and cloud native topics. Moreover, growing these competences takes quite some time because there’s a steep learning path to master both of them.

There are many other interesting data points in these survey results. However, I’d like to highlight one last metric: only 16 percent of the participants are implementing security policies as code.

This number surprised me because the trend of “Everything-as-Code” is nothing new – in fact, it’s quite the opposite. This well-established pattern is changing many parts of the IT industry and practices. As a matter of fact, there are already some projects in the Kubernetes ecosystem that are addressing the topic of policy-as-code.

I have advocated the use of some of these projects in the past, but I have to admit that I never spent too much time trying to create new policies from scratch. I embarked on this journey and, after some trial and error, I came to better understand the pain points of StackRox’s survey participants.

Challenges with Writing Security Policies

Creating policy-as-code with the current policy frameworks requires a significant investment in time. There’s a reasonable amount of documentation, but the majority of it focuses on simple policies. I personally found it difficult to write non-trivial ones.

Policy as code is just a matter of writing validation logic that processes some input data. During this learning journey, I ended up many times in situations where I knew exactly how the validation logic should be written using a regular programming language, but I found myself mentally stuck: I didn’t know how to translate all of that in the domain-specific language imposed by the policy framework.

Several times I found myself searching Google and Stack Overflow for how to write what I would consider trivial code using a regular programming language. That made me feel pretty frustrated; I wasn’t progressing as fast as I would have liked.

With this first-hand experience, I reached out to customers and people from the field. It turns out that they face the same challenges. They see the benefits of having policies as code, but they struggle in writing them.

Some companies are trying to address this topic by growing experts among their ranks. This takes time and, worst of all, it doesn’t scale. These companies have only a handful of people who know how to write and review policies, and all the “policy as code” work has to be delivered by them. As a result, these DevSecOps folks are under high stress and represent a bottleneck.

Other companies have outsourced the process of writing policies to external consultants. This approach has its own drawbacks. Writing these policies requires knowledge of internal processes and applications/infrastructure operational details. Explaining all of that takes time, and this has to be done every time something changes. Finally, these companies have a hard time reviewing the policies provided by the external consultants because, obviously, they lack the skills needed to understand them. Hence, they have to blindly trust these policies.

A Way Around the Security Policy Learning Curve

As I learned, the biggest obstacle for a policy author is the steep learning curve needed to write policies. It takes time to become comfortable with the coding paradigms that existing solutions impose – especially because these paradigms are different from what developers are used to.

Wouldn’t it be great to be able to reuse existing knowledge? If only there was a way to write policy as code using a programming language of your choice. If that was possible, suddenly teams who want to write policies as code would be able to tap into their existing skills and significantly reduce the barrier to entry.

These and similar questions led to the creation of the Kubewarden project.

Introducing Kubewarden: An Open Source Policy Engine for Kubernetes

Kubewarden is an open-source policy engine. It integrates with Kubernetes using the widely adopted Webhook Admission Control mechanism. The project provides a set of Kubernetes Custom Resources that simplify the process of enforcing policies on a cluster. So far, this sounds like existing solutions, right? Kubewarden differentiates itself in the way it creates, distributes and executes policies.

For starters, Kubewarden policies can be written in almost any programming language. This is possible because Kubewarden leverages the power of WebAssembly.

How WebAssembly Works

If you’re not familiar with WebAssembly, it’s a portable binary execution format. To put it in layman’s terms, WebAssembly is a compilation target for many programming languages. That means you can compile some source code and, instead of having a Linux/Windows/macOS executable or library, you end up having a so-called WebAssembly module. Then you can execute this binary artifact using a dedicated runtime on the platform and operating system of your choice.

This is summarized by the drawing below:

Diagram

WebAssembly started as a solution to expand the capabilities of modern web browsers. However, it’s gone beyond the browser, with lots of interesting use cases emerging.

Creating, Building and Running Kubewarden Security Policies

Kubewarden is one of these emerging use cases: its policies can be written in any programming language that can be compiled into a WebAssembly module.

Kubewarden policies are truly portable binary artifacts. Thanks to the power of WebAssembly, you can build your policy on a macOS host powered by an Apple Silicon chip, and then deploy the resulting artifact on top of a Kubernetes cluster made of x86_64 Linux nodes.

On top of being polyglot and portable, WebAssembly is also secure. WebAssembly was originally conceived as a way to enrich web applications. When browsers have to run WebAssembly modules downloaded from the web on an end-user computer, this can be a huge security risk. That’s why, by design, WebAssembly modules are executed inside of dedicated sandboxes. These sandboxes put tight constraints on the WebAssembly module execution by limiting access to the host memory, filesystem, devices and more. Even though this sounds similar to how containers operate, the level of isolation and the limits placed on WebAssembly programs is far more stringent.

Going back to Kubewarden, all our policies are WebAssembly modules that are loaded and run by our policy server. Each policy lives inside its own sandbox, with no access to the host environment. The policy server receives the AdmissionReview requests originated by Kubernetes and then uses the relevant policies to evaluate them.
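To make this concrete, here is a hedged sketch of what binding a policy looks like with Kubewarden’s ClusterAdmissionPolicy custom resource. The apiVersion, module reference and policy name below are assumptions based on the project’s early releases; check the Kubewarden documentation for the exact values:

kubectl apply -f - <<'EOF'
apiVersion: policies.kubewarden.io/v1alpha2    # assumed version, may differ in your release
kind: ClusterAdmissionPolicy
metadata:
  name: privileged-pods
spec:
  module: registry://ghcr.io/kubewarden/policies/pod-privileged:v0.1.9   # illustrative module reference
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    resources: ["pods"]
    operations: ["CREATE"]
  mutating: false
EOF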

Distributing Security Policies with Kubewarden

So far, we covered creating, building and running Kubewarden policies. But what about policy distribution? Kubewarden has two solutions to this problem.

The first approach consists of hosting the Kubewarden policy binary on a regular web server. This is a pretty straightforward approach that can work well in many circumstances. The other approach, which happens to be my favorite, leverages regular container registries as a way to store and distribute policies.

Kubewarden policies can be pushed and pulled to/from container registries as OCI artifacts. If you are not familiar with the concept of OCI artifact, there’s nothing too new to learn about it. This is just a way to increase the flexibility of regular container registries by allowing them to store other kinds of artifacts in addition to regular container images. The majority of container registries, like those from Amazon, Azure, Google and GitHub offer this ability out of the box. The same applies to many other self-hosted solutions.

Everybody has one or more container registries in place to serve the container images consumed by Kubernetes. These registries are also secured with custom-made access rules. Storing Kubewarden policies in them allows you to reuse this infrastructure, both from an operational and a security point of view. This is why it’s my favorite distribution mechanism for Kubewarden policies.

I hope this overview of Kubewarden sparked your interest. As a Kubernetes administrator, getting started with Kubewarden is just a Helm chart away. Visit our documentation and follow our quickstart guide. Finally, don’t forget to check out our Policy Hub, where you can find ready-to-use Kubewarden policies.

If you are an aspiring policy author, go to our documentation and follow one of the step-by-step tutorials that guides you through the process of creating your first policy. Right now, we have policies written with Rust, Go, AssemblyScript and Swift. The WebAssembly ecosystem is growing bigger day by day, with more languages supported or underway.

Whether you are a Kubernetes administrator or a policy author, we are looking forward to knowing what you achieved with Kubewarden and what we can do to further smooth your security journey in the cloud native world. Let us know in the comments below.

Finally, if you are planning to attend KubeCon Europe 2021, don’t miss these talks about Kubewarden:

Last but not least, I’ll be in the Rancher Virtual booth during the event. I’m looking forward to meeting you!

Rancher Recognized as a Leader in Latest Forrester Wave

Wednesday, 5 May, 2021

The enterprise Kubernetes management space has definitely become a lot more crowded over the past two years as traditional vendors and startups alike attempt to grab a slice of this massive market. The increasingly competitive vendor landscape makes Forrester’s recent recognition of Rancher Labs that much more meaningful.

This week, Forrester Research named Rancher Labs a “Leader” in The Forrester Wave™: Multicloud Container Development Platforms, Q3 2020 report. The report explains how cloud-native technologies like containers and Kubernetes are becoming the preferred tools to build new software experiences and modernize existing apps at scale and across clouds. It then highlights Rancher’s leadership among the eight significant enterprise container platform vendors evaluated, with Rancher one of only three providers recognized as a Leader, based on Forrester’s evaluation of 29 criteria measuring the strength of each vendor’s current offering, strategy and market presence.

Why Rancher is a Leader in Multi-Cloud Container Development

Some of the highlights from the report include:

  • Rancher is 100 percent open source, embraces and extends native public cloud container services, focuses on intuitive, simplified DevOps automation and specializes in helping companies operate Kubernetes at massive scale and in edge computing scenarios.
  • Rancher includes a comprehensive application catalog, centralized visibility and control of distributed clusters, service mesh innovations, built-in CI/CD pipelines, broad runtime and registry support, and very strong ‘Day 2’ cluster operations.
  • Reference customers praised Rancher for our “comprehensive application catalog coverage, broad public cloud infrastructure integrations, strong participation in the cloud-native open source community, excellent customer support and rock-solid stability combined with fast time to value.”

As you may recall, Rancher was previously named a leader in the Forrester Wave™: Enterprise Container Platform Software Suites, Q4 2018. To understand the foundation of our success and continued market leadership, you need to have a clear picture of how the Kubernetes market has evolved and keep in mind the adage, “Begin with the end in mind.” When Kubernetes first emerged out of Google, it was designed for managing large, complex, homogeneous clusters. It was also data center-centric and not designed for cloud infrastructure. The early value of Kubernetes was in a vendor’s certified Kubernetes distribution and a vendor’s differentiation was delivered through a vertically integrated, proprietary solution stack. Today, many vendors still see this as the correct approach to Kubernetes, but at Rancher, we believe this approach leads to higher cost, increased complexity and vendor lock-in.

Rancher’s No Lock-in Approach to Kubernetes

Rancher has always had a clear vision of enterprise Kubernetes management as a means to enabling a computing everywhere experience based upon an open approach to open source software. That’s why we designed Rancher from Day 1 to support heterogeneous infrastructure, including hybrid cloud, multi-cloud and edge. We also recognized that Kubernetes distributions would rapidly commoditize and, therefore, have focused our value on management simplicity and consistency across multiple, certified clusters from any vendor, at massive scale.

So, unlike competitors who still see Kubernetes as a means to lock customers into a monolithic, legacy software stack, we see the value of Kubernetes as a means to provide a consistent, compute everywhere platform across data center, cloud and edge. We are grateful to Forrester for acknowledging our unique approach to enterprise Kubernetes management in the most recent Forrester Wave, highlighting:

  • Rancher’s ability to simplify multicloud Kubernetes management at scale.
  • Rancher’s pervasive deployment across a broad range of enterprise and cloud-native companies, who are using it to run large-scale clusters across many public and on-premises platforms.
  • Rancher as an ideal solution for firms seeking a proven multi-cloud container management platform available across a wide variety of cloud platforms and edge environments.

Being recognized as a leader by Forrester Research is strong validation of our computing everywhere vision and our continued investment in innovation. As containerized applications continue to proliferate across on-premises, cloud and edge environments, Rancher’s “begin with the end in mind” approach and focus on openness, innovation and simplicity will continue to differentiate us from the competition.


Stupid Simple Kubernetes Part 1

Wednesday, 5 May, 2021

In the era of Microservices, Cloud Computing, and Serverless architecture, it is very useful to understand Kubernetes and learn how to use it. However, the official documentation of Kubernetes can be hard to decipher, especially for newcomers. In the following series of articles, I will try to present a simplified view of Kubernetes and give examples of how to use it for deploying microservices using different cloud providers like Azure, Amazon, Google Cloud, and even IBM.

In this first article of the series, we will talk about the most important concepts used in Kubernetes. In the following article, we will learn how to write configuration files, use Helm as a package manager, create a cloud infrastructure, and easily orchestrate our services using Kubernetes. In the last article, we will create a CI/CD pipeline to automate the whole workflow. You can spin up any project and create a solid infrastructure/architecture with this information.

Before starting, I would like to mention that containers have multiple benefits, from increased deployment velocity to the consistency of delivery with a greater horizontal scale. Even so, you should not use containers for everything because just putting any part of your application in a container comes with overhead like maintaining a container orchestration layer. So don’t jump to conclusions; instead, at the start of the project, please create a cost/benefit analysis.

Now, let’s start our journey in the world of Kubernetes!

Hardware

Nodes

Nodes are worker machines in Kubernetes, which can be any device with CPU and RAM. For example, a node can be anything, from smartwatches to smartphones, laptops, or even a Raspberry Pi. When we work with cloud providers, a node is a virtual machine. So a node is an abstraction over a single device.

As you will see in the next articles, the beauty of this abstraction is that we don’t need to know the underlying hardware structure; we will use nodes, so our infrastructure will be platform-independent.

Node

Cluster

A cluster is a group of nodes. When you deploy programs onto the cluster, it automatically handles the distribution of work to the individual nodes. If more resources are required (for example, we need more memory), new nodes can be added to the cluster, and the work will be redistributed automatically.

We run our code on a cluster, and we shouldn’t care about which node; the work distribution will be automatically handled.

Cluster

Persistent Volumes

Because our code can be relocated from one node to another (for example, a node doesn’t have enough memory, so the work is rescheduled on a different node with enough memory), data saved on a node is volatile. But there are cases when we want to save our data persistently. In this case, we should use Persistent Volumes. A persistent volume is like an external hard drive; you can plug it in and save your data on it.

Kubernetes was originally developed as a platform for stateless applications with persistent data stored elsewhere. As the project matured, many organizations also wanted to begin leveraging it for their stateful applications, so persistent volume management was added. Much like the early days of virtualization, database servers are not typically the first group of servers to move into this new architecture. The reason is that the database is the core of many applications and may contain valuable information so on-premises database systems still largely run in VMs or physical servers.

So the question is, when should we use Persistent Volumes? First, we should understand the different types of database applications to answer that question.

We can classify the data management solutions into two classes:

  1. Vertically scalable — includes traditional RDBMS solutions such as MySQL, PostgreSQL, and SQL Server.
  2. Horizontally scalable — includes “NoSQL” solutions such as ElasticSearch or Hadoop-based solutions.

Vertically scalable solutions like MySQL, Postgres, Microsoft SQL, etc., should not go in containers. These database platforms require high I/O, shared disks, block storage, etc., and were not designed to handle the loss of a node in a cluster gracefully, which often happens in a container-based ecosystem.

For horizontally scalable applications (Elastic, Cassandra, Kafka, etc.), containers should be used because they can withstand the loss of a node in the database cluster and the database application can independently re-balance.

Usually, you can and should containerize distributed databases that use redundant storage techniques and withstand the loss of a node in the database cluster (ElasticSearch is a really good example).

Software

Container

One of the goals of modern software development is to keep applications on the same host or cluster isolated from one another. One solution to this problem has been virtual machines. But virtual machines require their own OS, so they are typically gigabytes in size.

Containers, by contrast, isolate applications’ execution environments from one another but share the underlying OS kernel. So a container is like a box in which we store everything needed to run an application, like code, runtime, system tools, system libraries, and settings. They’re typically measured in megabytes, use far fewer resources than VMs, and start up almost immediately.

Pods

A pod is a group of containers. In Kubernetes, the smallest unit of work is a pod. A pod can contain multiple containers, but usually, we use one container per pod because the replication unit in Kubernetes is the pod. So if we want to scale each container independently, we add one container in a pod.

Deployments

The main role of a deployment is to provide declarative updates to both the pod and the ReplicaSet (a set in which the same pod is replicated multiple times). Using the deployment, we can specify how many replicas of the same pod should be running at any time. The deployment is like a manager for the pods: it will automatically spin up the number of pods requested, monitor the pods, and re-create them in case of failure. Deployments are really helpful because you don’t have to create and manage each pod separately.

Deployments are usually used for stateless applications. However, you can save the deployment state by attaching a Persistent Volume to it and making it stateful.

Stateful Sets

A StatefulSet is a newer Kubernetes resource used to manage stateful applications. It manages the deployment and scaling of pods and guarantees these pods’ ordering and uniqueness. It is similar to a Deployment; the only difference is that the Deployment creates a set of pods with random pod names where the order of the pods is not important, while the StatefulSet creates pods with a unique naming convention and order. So if you want to create three replicas of a pod called example, the StatefulSet will create pods with the following names: example-0, example-1, example-2. In this case, the most important benefit is that you can rely on the name of the pods.

Daemon Sets

A DaemonSet ensures that a copy of the pod runs on all the nodes of the cluster. If a node is added/removed from the cluster, the DaemonSet automatically adds/deletes the pod. This is useful for monitoring and logging because it guarantees that every node is monitored at all times without you having to manage the monitoring of each node manually.

Services

While deployment is responsible for keeping a set of pods running, the service enables network access to a set of pods. Services provide important features that are standardized across the cluster: load-balancing, service discovery between applications, and features to support zero-downtime application deployments. Each service has a unique IP address and a DNS hostname. Applications that consume a service can be manually configured to use either the IP address or the hostname, and the traffic will be load-balanced to the correct pods. In the External Traffic section, we will learn more about the service types and how we can use them to communicate between our internal services and with the external world.

ConfigMaps

If you want to deploy to multiple environments, like staging, dev, and prod, it’s a bad practice to bake the configs into the application because of environmental differences. Ideally, you’ll want to separate configurations to match the deploy environment. This is where ConfigMap comes into play. ConfigMaps allow you to decouple configuration artifacts from image content to keep containerized applications portable.

External Traffic

So you have all the services running in your cluster, but now the question is how to get external traffic into it. Three different service types can be used for handling external traffic: ClusterIP, NodePort, and LoadBalancer. A fourth solution is adding another abstraction layer called an Ingress Controller.

ClusterIP

This is the default service type in Kubernetes and allows you to communicate with other services inside your cluster. This is not meant for external access, but with a little hack, by using a proxy, external traffic can hit our service. Don’t use this solution in production, but only for debugging. Services declared as ClusterIP should NOT be directly visible from the outside.

NodePort

As we saw in the first part of this article, pods are running on nodes. Nodes can be different devices, like a laptop, or can be a virtual machine (when working in the cloud). Each node has a fixed IP address. By declaring a service as NodePort, the service will expose the node’s IP address so that you can access it from the outside. This can be used in production, but manually managing all the different IP addresses can be cumbersome for large applications where you have many services.

LoadBalancer

Declaring a service of type LoadBalancer exposes it externally using a cloud provider’s load balancer. How the traffic from that external load balancer is routed to the Service pods depends on the cluster provider. This is a really good solution: you don’t have to manage all the IP addresses of every node of the cluster, but you will have one load balancer per service. The downside is that every service will have a separate load balancer and you will be billed per load balancer instance.

This solution is good for production, but it can be a bit expensive. So let’s see a cheaper solution.

Ingress

Ingress is not a service but an API object that manages external access to the services in a cluster. It acts as a reverse proxy and single entry point to your cluster that routes requests to different services. I usually use the NGINX Ingress Controller, which takes on the role of reverse proxy while also handling SSL termination. The best production-ready solution is to expose the ingress by using a load balancer.

With this solution you can expose any number of services using a single load balancer, keeping your bills as low as possible.

Next Steps

In this article, we learned about the basic concepts used in Kubernetes, its hardware structure, and the different software components like Pods, Deployments, StatefulSets, and Services, and saw how to communicate between services and with the outside world.

In the next article, we will set up an Azure cluster and create an infrastructure with a LoadBalancer, an Ingress Controller, and two Services, and use two Deployments to spin up three Pods per Service.

If you want more “Stupid Simple” explanations, please follow me on Medium!

There is another ongoing “Stupid Simple AI” series. The first two articles can be found here: SVM and Kernel SVM and KNN in Python.