Debugging your Rancher Kubernetes Cluster the GenAI Way with k8sgpt, Ollama & Rancher Desktop

Thursday, 29 August, 2024

The advancements in GenAI technology are creating a significant impact across domains/sectors, and the Kubernetes ecosystem is no exception. Numerous interesting GenAI projects and products have emerged aimed at enhancing the efficiency of Kubernetes cluster creation and management. From simplifying application containerization for engineers to addressing complex Kubernetes-related queries or troubleshooting issues within a cluster, GenAI demonstrates immense potential.

One of the areas where DevOps engineers and beginner-level cluster operators often face challenges is in identifying, understanding, and resolving problems within a Kubernetes cluster. In this regard, k8sgpt, an open source GenAI project within the cloud native ecosystem, appears to offer a promising AI-driven solution.

Let us explore how to configure and utilize k8sgpt, open source LLMs via Ollama and Rancher Desktop to identify problems in a Rancher cluster and gain insights into resolving those problems the GenAI way.

 

Tools Setup

 

Install k8sgpt CLI

K8sGPT is a GenAI tool for scanning your Kubernetes clusters, diagnosing and triaging issues in simple English. Follow instructions at https://docs.k8sgpt.ai/getting-started/installation/ to download and install the k8sgpt CLI tool on your machine.
 

Install Ollama and download an open-source LLM

Ollama is a simple tool that gets you up and running with open source large language models (LLMs). You can download Ollama from https://ollama.com/ and install it on your machine. Launch Ollama. Run the command given below to pull an LLM. Note that the mistral model requires ~4GB of disk space.

$ ollama pull mistral

 

Install Rancher Desktop

Rancher Desktop is an open source application that provides all the essentials to work with containers and Kubernetes on the desktop. You can download Rancher Desktop from https://rancherdesktop.io/ and install it on your machine. Launch Rancher Desktop application with the default configuration. Run the command below to ensure the app is running fine.

$ kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy               
Metrics-server is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/https:metrics-server:https/proxy

 

Download the KubeConfig file of your Rancher Cluster

You can download the KubeConfig file of your Rancher cluster from the Rancher UI. If you are trying the setup provided in this blog post on your local cluster provided by Rancher Desktop, then you can skip this step.
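If you do download it, a quick way to confirm the file works is to point kubectl at it and list the nodes (the path below is just an illustrative placeholder):

$ export KUBECONFIG=/path/to/your/rancher/kubeconfig.yaml
$ kubectl get nodes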

 

Configure k8sgpt to use Ollama backend

You can use k8sgpt with a variety of backends such as OpenAI, Azure OpenAI, Google Vertex and many others. Run the command below to set up the local Ollama as the backend.

$ k8sgpt auth add --backend localai --model mistral --baseurl http://localhost:11434/v1
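
Optionally, you can confirm that the backend was registered by listing the configured providers:

$ k8sgpt auth list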

That’s all with the setup.

Analyze your cluster for issues

Before trying the k8sgpt analyze command, let’s first introduce a problem in the Kubernetes cluster so k8sgpt has something to report on. The sample deployment is from the GitHub repository robusta-dev/kubernetes-demos, where you can find many interesting examples to mimic common problems that occur in a Kubernetes environment.

$ kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/broken.yaml

Run the command below to get GenAI-generated insights about problems in your cluster. Note that if you are analyzing the local cluster provided by Rancher Desktop, you can skip the --kubeconfig flag.

$ k8sgpt analyze --explain --backend localai --with-doc --kubeconfig C:\\path\\to\\your\\rancher\\.kube\\config

100% |██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (1/1, 958 it/s)

AI Provider: localai




0: Pod default/payment-processing-worker-74754cf949-srpbk(Deployment/payment-processing-worker)

- Error: the last termination reason is Completed container=payment-processing-container pod=payment-processing-worker-74754cf949-srpbk

Error: The container 'payment-processing-container' in the pod 'payment-processing-worker-74754cf949-srpbk' has completed.

Solution:

Check if the completion of the container is intended or not, as it might be due to a successful processing task.
If it's unintended, verify if there are any misconfigurations in the deployment or pod specification.
Inspect the logs of the container for any error messages leading up to its completion.
If necessary, restart the container or pod to ensure proper functioning.
Monitor the application to prevent similar issues in the future and ensure smooth operation.

As shown in the k8sgpt analyze command output, k8sgpt identified and flagged the problematic pod and also provided some guidance on potential steps you can take to understand and resolve the problem.

Bonus: Setting up the k8sgpt operator

k8sgpt project also provides a Kubernetes operator that you can directly install in a Kubernetes cluster. The operator constantly watches for problems in the cluster and generates insights, which you can access by querying the operator’s custom resource (CR).

Follow the steps below to install and set up the operator. Please note that the instructions provided here are for the local Kubernetes cluster provided by Rancher Desktop. However, you can use the same instructions for a remote Kubernetes cluster. For a remote cluster, you just need to point the configuration to an Ollama instance that is accessible to your cluster.

Install the k8sgpt operator

Run the following commands:

helm repo add k8sgpt https://charts.k8sgpt.ai/
helm repo update
helm install release k8sgpt/k8sgpt-operator -n k8sgpt-operator-system --create-namespace

 

Configure k8sgpt to use the Ollama backend

Run the command below.

$ kubectl apply -n k8sgpt-operator-system -f - << EOF
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-ollama
spec:
  ai:
    enabled: true
    model: mistral
    backend: localai
    baseUrl: http://host.docker.internal:11434/v1
  noCache: false
  filters: ["Pod"]
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.8
EOF

That’s all with the operator setup. The operator will watch for problems in the cluster and generate analysis results that you can view using the command below. Depending on your machine’s horsepower, it takes some time for the operator to call the LLM and generate the insights.

$ kubectl get results -n k8sgpt-operator-system -o json | jq .
    {
      "apiVersion": "v1",
      "items": [
        {
          "apiVersion": "core.k8sgpt.ai/v1alpha1",
          "kind": "Result",
          "metadata": {
            "creationTimestamp": "2024-08-09T23:05:29Z",
            "generation": 1,
            "labels": {
              "k8sgpts.k8sgpt.ai/backend": "localai",
              "k8sgpts.k8sgpt.ai/name": "k8sgpt-ollama",
              "k8sgpts.k8sgpt.ai/namespace": "k8sgpt-operator-system"
            },
            "name": "defaultpaymentprocessingworker74754cf949srpbk",
            "namespace": "k8sgpt-operator-system",
            "resourceVersion": "11274",
            "uid": "e9967f21-e62c-461b-8b36-908c1f852420"
          },
          "spec": {
            "backend": "localai",
            "details": " Error: The container named `payment-processing-container` in the pod named `payment-processing-worker-74754cf949-srpbk` is restarting repeatedly and failed. This failure is temporary and will back off for 5 minutes before retrying.\n\nSolution:\n1. Check the logs of the container using the command `kubectl logs payment-processing-worker-74754cf949-srpbk --container payment-processing-container` to determine the cause of the failure.\n2. If the issue persists, update the container definition in the Deployment or ReplicaSet to increase the number of `restartPolicy: Always`. This will ensure that the container respawns if it encounters an error.\n3. Investigate any underlying issues causing the repeated failures, such as insufficient memory or resources, incorrect configurations, or errors with dependent services.\n4. If necessary, make corrections to the application code, configuration files, or environment variables, as they may be directly influencing the performance of the container.\n5. Once the issue is resolved, check that the container does not restart again in the subsequent periods by reviewing the logs and verifying that the application functions correctly.",
            "error": [
              {
                "text": "back-off 5m0s restarting failed container=payment-processing-container pod=payment-processing-worker-74754cf949-srpbk_default(45b88d2e-a270-4dff-830e-57532d984a87)"
              }
            ],
            "kind": "Pod",
            "name": "default/payment-processing-worker-74754cf949-srpbk",
            "parentObject": ""
          },
          "status": {
            "lifecycle": "historical"
          }
        }
      ],
      "kind": "List",
      "metadata": {
        "resourceVersion": ""
      }
    }

Yay! You have set up a GenAI companion to help yourself investigate problems in your Kubernetes cluster using fully open source tools.

Announcing the Rancher Kubernetes API

Friday, 5 January, 2024

It is our pleasure to introduce the first officially supported API with Rancher v2.8: the Rancher Kubernetes API, or RK-API for short.

Since the introduction of Rancher v2.0, a publicly supported API has been one of our most requested features. The Rancher APIs, which you may recognize as v3 (Norman) or v1 (Steve), have never been officially supported and can only be automated using our Terraform Provider.

The driving motivation behind our new RK-API is to let you manage Rancher like you would manage Kubernetes: by leveraging the Custom Resources already used by Rancher Manager and communicating directly with the Kubernetes API. Because the API is built on Custom Resources, any Kubernetes-compatible tooling, such as kubectl or GitOps engines like Fleet, makes automating Rancher easy and accessible.

With the release of Rancher v2.8.0, we are bringing the first phase of our supported CRDs: Projects, GlobalRoles, RoleTemplates, GlobalRoleBindings, ClusterRoleTemplateBindings and ProjectRoleTemplateBindings, with more on the way. Once upgraded, you can begin to interact immediately with these Custom Resources via the RK-API in any existing cluster.

Now, let’s see how it works.

Creating and modifying resources

Projects are a great example of how useful this new API can be. If I wanted to create a project in a managed cluster, instead of creating it in the dashboard, I could simply apply a project yaml directly to my Rancher’s management cluster.

First, you will need a Kubeconfig for the Rancher management cluster. The API’s quickstart guide explains how to make one.

Next, I will retrieve my downstream cluster’s ID. Note that this is the metadata.name of the cluster, not the displayName.

Then I can use that to create a project in that cluster, in this case with a randomly generated project name:

$ kubectl get clusters.management.cattle.io
NAME           AGE
c-m-bsg27dgn   13m
local          76m
$ kubectl create -f - <<EOF
apiVersion: management.cattle.io/v3
kind: Project
metadata:
  generateName: p-
  namespace: c-m-bsg27dgn
spec:
  clusterName: c-m-bsg27dgn
  displayName: project-from-api
EOF

Now we can see in the UI that a project exists:

To understand what each of these fields means and which are required, you can simply use `kubectl explain project` as you would with Kubernetes Resources.

$ kubectl explain projects.management.cattle.io
KIND:     Project
VERSION:  management.cattle.io/v3
DESCRIPTION:
     Project is a group of namespaces. Projects are used to create a
     multi-tenant environment within a Kubernetes cluster by managing namespace
     operations, such as role assignments or quotas, as a group.
FIELDS:
   apiVersion    <string>
     APIVersion defines the versioned schema of this representation of an.....

To modify or use the project, first look up the name of the project in the cluster namespace since it was created with an unpredictable ID using metadata generateName.

Deleting or editing the project is now simple; reference it under the cluster namespace:

$ kubectl --namespace c-m-bsg27dgn get projects
NAME           AGE
p-9x7lg        45s
$ kubectl --namespace c-m-bsg27dgn edit project p-9x7lg

One important note is that custom resources can live in the Rancher management cluster or the managed (downstream) cluster. Projects and namespaces are one such example. While the project resources exist in the management cluster, the namespace resources exist in the managed cluster. So, in order to create a namespace in your new project, you would need to switch your Kubeconfig to the managed cluster, then simply create the namespace with the proper annotation:

field.cattle.io/projectId: <cluster-id>:<project-id>

For example:

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: mynamespace
  annotations:
    field.cattle.io/projectId: c-m-bsg27dgn:p-9x7lg
EOF

Now we can see that the namespace exists:

Using the API via GitOps

One of the exciting applications of this new API is the ability to deploy resources with a modern infrastructure pipeline. In this example, we’ll use Rancher’s built-in Continuous Delivery system, also called Fleet. To start, you can create a git repo that contains the following in project/project.yml:

apiVersion: management.cattle.io/v3
kind: Project
metadata:
  name: project-test
  namespace: c-m-bsg27dgn
spec:
  clusterName: c-m-bsg27dgn
  displayName: project-test

Then, apply this in Rancher by creating a GitRepo, which will deploy the project to the management cluster:

$ kubectl apply -f - <<EOF
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: test
  namespace: fleet-local
spec:
  repo: "REPO_URL"
  branch: main
  paths:
  - project
EOF

Once applied, you’ll find your new project created in the managed cluster. Let’s take this one final step: adding a user to the project.

To do that, add a ProjectRoleTemplateBinding like the one below to the repository:

apiVersion: management.cattle.io/v3
kind: ProjectRoleTemplateBinding
metadata:
  name: add-proj-member
  namespace: project-test
projectName: c-m-bsg27dgn:project-test
roleTemplateName: project-member
userName: user-gvmvn

If we check the project, we’ll see it now has a project member added; success! You can see how the management of resources at scale is as easy as editing or copying a few files.

Summary

While only a limited set of CRDs is going live in 2.8.0, they are some of the most widely requested, and many more are currently being developed for upcoming releases. We believe this new API will be of great benefit to customers, and it represents a commitment to the stability and documentation of Rancher Manager and its features.

For more details, see the new documentation pages, which include a quick-start, an API reference and common workflows.

When to Use K3s and RKE2

Wednesday, 14 December, 2022

K3s and Rancher Kubernetes Engine (RKE2) are two Kubernetes distributions from the SUSE Rancher container platform. Either project can be used to run a production-ready cluster; however, they target different use cases and consequently possess unique characteristics.

This article will explain the similarities and differences between the projects. You’ll learn when it makes sense to use RKE2 instead of K3s and vice versa. Selecting the right choice is important because it affects the security and compliance of the containerized workloads you deploy.

K3s and RKE2

K3s provides a production-ready Kubernetes cluster from a single binary that weighs in at under 60MB. Because K3s is so lightweight, it’s a great option for running Kubernetes at the edge on IoT devices, low-power servers and your developer workstations.

Meanwhile, RKE2 is an alternative project that also runs a production-ready cluster. It offers similar simplicity to K3s while adding additional security and conformance layers, including Federal Information Processing Standard (FIPS) 140-2 compliance for use in the U.S. federal government and DISA STIG compliance.

RKE2 has evolved from the original RKE project. It’s also known as RKE Government, reflecting its suitability for the most demanding sectors. It’s not just government agencies that can benefit, though — the distribution is ideal for all organizations that prioritize security and compliance, so it continues to be primarily marketed as RKE2.

Similarities between K3s and RKE2

K3s and RKE2 are both lightweight Cloud Native Computing Foundation (CNCF)-certified Kubernetes distributions that Rancher fully supports. Although they diverge in their target use cases, the two platforms have several intentional similarities in how they’re launched and operated. Both can be deployed with Rancher Manager, and they each run your containers using the industry-standard containerd runtime.

Usability

K3s and RKE2 each offer good usability with a quick setup experience.

Starting a K3s cluster on a new host can be achieved with one command and around 30 seconds of waiting:

$ curl -sfL https://get.k3s.io | sudo sh -

The service is registered and started for you, so you can immediately run kubectl commands to interact with your cluster:

$ k3s kubectl get pods

RKE2 doesn’t fare much worse. A similarly straightforward installation script is available to download its binary:

$ curl -sfL https://get.rke2.io | sudo sh -

RKE2 doesn’t start its service by default. Run the following commands to enable and start RKE2 in server (control plane) mode:

$ sudo systemctl enable rke2-server.service
$ sudo systemctl start rke2-server.service

You can find the bundled kubectl binary at /var/lib/rancher/rke2/bin. It’s not added to your PATH by default; a kubeconfig file is deposited to /etc/rancher/rke2/rke2.yaml:

$ export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
$ /var/lib/rancher/rke2/bin/kubectl get pods

Ease of operation

In addition to their usability, K3s and RKE2 are simple to operate. You can upgrade clusters created with either project by repeating the installation script on each node:

# K3s
$ curl -sfL https://get.k3s.io | sh -

# RKE2
$ curl -sfL https://get.rke2.io | sh -

You should repeat any flags you supplied to the original installation command.

Automated upgrades are supported using Rancher’s system-upgrade-controller. After installing the controller, you can declaratively create Plan objects that describe how to migrate the cluster to a new version. Plan is a custom resource definition (CRD) provided by the controller.
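
As an illustration, a minimal Plan for upgrading K3s server nodes might look like the sketch below. It assumes the controller runs in the system-upgrade namespace with a system-upgrade service account, and the target version is a placeholder you would adjust:

# Hypothetical Plan for K3s server nodes; adjust namespace, service account and version to your setup
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-server-upgrade
  namespace: system-upgrade
spec:
  concurrency: 1
  serviceAccountName: system-upgrade
  nodeSelector:
    matchExpressions:
    - {key: node-role.kubernetes.io/control-plane, operator: Exists}
  upgrade:
    image: rancher/k3s-upgrade
  version: v1.26.10+k3s1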

Backing up and restoring data is another common Kubernetes challenge. K3s and RKE2 also mirror each other in this field. Snapshots are automatically written and retained for a configurable period. You can easily restore a cluster from a snapshot by running the following command:

# K3s
$ k3s server \
    --cluster-reset \
    --cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/etcd-old-<BACKUP_DATE>

# RKE2
$ rke2 server \
    --cluster-reset \
    --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/etcd-old-<BACKUP_DATE>

Deployment model

K3s and RKE2 share their single-binary deployment model. They bundle all their dependencies into one download, allowing you to deploy a functioning cluster with minimal Kubernetes experience.

The projects also support air-gapped environments to accommodate critical machines that are physically separated from your network. Air-gapped images are provided in the release artifacts. Once you’ve transferred the images to your machine, running the regular installation script will bootstrap your cluster.

High availability and multiple nodes

K3s and RKE2 are designed to run in production. K3s is often used in development, too, having gained a reputation as an ideal single-node cluster. It has robust multi-node management and is capable of supporting fleets of IoT devices.

Both projects can run the control plane with high availability, too. You can distribute replicas of control plane components over several server nodes and use external data stores instead of the embedded ones.

Differences between K3s and RKE2

K3s and RKE2 both offer single-binary Kubernetes, high availability and easy backups, with many commands interchangeable between the two. However, some key differences affect where and when they should be used. It’s these characteristics that justify the distributions existing as two independent projects.

RKE2 is closer to upstream Kubernetes

K3s is CNCF-certified, but it deviates from upstream Kubernetes in a few ways. It uses SQLite instead of etcd as its default data store, although an embedded etcd instance is available as an option in modern releases. K3s also bundles additional utilities, such as the Traefik Ingress controller.

RKE2 sticks closer to standard Kubernetes, promoting conformance as one of its main features. This gives you confidence that workloads developed for other distributions will run reliably in RKE2. It reduces the risk of inconvenient gotchas that can occur when K3s steps out of alignment with upstream Kubernetes. RKE2 automatically uses etcd for data storage and omits nonstandard components included in K3s.

RKE2 uses embedded etcd

The standard SQLite database in K3s is beneficial for compactness and can optimize performance in small clusters. In contrast, RKE2’s default use of etcd creates a more conformant experience, allowing you to integrate directly with other Kubernetes tools that expect an etcd data store.

While K3s can be configured with etcd, it’s an option you need to turn on. RKE2 is designed around it, which reduces the risk of misconfiguration and subpar performance.

K3s also supports MySQL and PostgreSQL as alternative storage solutions. These let you manage your Kubernetes data using your existing tooling for relational databases, such as backups and maintenance operations. RKE2 only works with etcd, offering no support for SQL-based storage.
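
For K3s, these storage options are selected with flags when installing the server; the sketch below shows the general shape (the PostgreSQL connection string is a placeholder):

# Embedded etcd instead of SQLite
$ curl -sfL https://get.k3s.io | sh -s - server --cluster-init

# External PostgreSQL datastore (placeholder credentials)
$ curl -sfL https://get.k3s.io | sh -s - server \
    --datastore-endpoint="postgres://username:password@hostname:5432/k3s"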

RKE2 is more security-minded

RKE2 has a much stronger focus on security. Whereas edge operation is K3s’s specialism, RKE2 prioritizes security as its greatest strength.

Hardened against the CIS benchmark

The distribution comes configured for compatibility with the CIS Kubernetes Hardening Benchmark v1.23 (v1.6 in RKE2 v1.25 and earlier). The defaults that RKE2 applies allow your clusters to reach the standard with minimal manual work.

You’re still responsible for tightening the OS-level controls on your nodes. This includes applying appropriate kernel parameters and ensuring the etcd data directory is correctly protected.

You can enforce that a safe configuration is required by starting RKE2 with the profile flag set to cis-1.23. RKE2 will then exit with a fatal error if the operating system hasn’t been suitably hardened.
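
For example, assuming the default configuration file location, the profile can be set before starting the service:

# /etc/rancher/rke2/config.yaml
profile: "cis-1.23"

$ sudo systemctl restart rke2-server.service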

Beyond configuring the OS, you must also set up suitable network policies (https://kubernetes.io/docs/concepts/services-networking/network-policies) and Pod Security admission rules to secure your cluster’s workloads. The Pod Security admission controller can be configured to use profiles that meet the CIS benchmark. This will prevent non-compliant Pods from being deployed to your cluster.
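
As a simple illustration, Pod Security admission can be enforced per namespace with labels (the namespace name here is a placeholder):

$ kubectl label namespace my-app \
    pod-security.kubernetes.io/enforce=restricted \
    pod-security.kubernetes.io/enforce-version=latest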

Regularly scanned for threats

The safety of the RKE2 distribution is maintained within its build pipeline. Components are regularly scanned for new common vulnerabilities and exposures (CVEs) using the Trivy container vulnerability tool. This provides confidence that RKE2 itself isn’t harboring threats that could let attackers into your environment.

FIPS 140-2 compliant

K3s lacks any formal security accreditations. RKE2 meets the FIPS 140-2 standard that the U.S. government uses to approve cryptographic modules.

The project’s Go code is compiled using FIPS-validated crypto modules instead of the versions in the Go standard library. Each of the distribution’s components, from the Kubernetes API server through to kubelet and the bundled kubectl binary, is compiled with the FIPS-compatible compiler.

The FIPS mandate means RKE2 can be deployed in government environments and other contexts that mandate verifiable cryptographic performance. The entire RKE2 stack is compliant when you use the built-in components, such as the containerd runtime and etcd data store.

When to use K3s

K3s should be your preferred solution when you’re seeking a performant Kubernetes distribution to run on the edge. It’s also a good choice for single-node development clusters as well as ephemeral environments used in your CI pipelines and other build tools.

This distribution makes the most sense in situations where your primary objective is deploying Kubernetes with all dependencies from a single binary. It’s lightweight, quick to start and easy to manage, so you can focus on writing and testing your application.

When to use RKE2

You should use RKE2 whenever security is critical, such as in government services and other highly regulated industries, including finance and healthcare. As previously stated, the complete RKE2 distribution is FIPS 140-2 compliant and comes hardened with the CIS Kubernetes Benchmark. It’s also the only DISA STIG-certified Kubernetes distribution, meaning it’s approved for use in the most stringent U.S. government environments, including the Department of Defense.

RKE2 is fully certified and tightly aligned with upstream Kubernetes. It omits the K3s components that aren’t standard Kubernetes or that are unstable alpha features. This increases the probability that your deployments will be interoperable across different environments. It also reduces the risk of nonconformance that can occur through oversight when you’re manually hardening Kubernetes clusters.

Near edge computing is another primary use case for RKE2 over K3s. RKE2 ships with support for multiple CNI networking plugins, including Cilium, Calico, and Multus. Multus allows pods to have multiple network interfaces attached, making it ideal for use cases such as telco distribution centers and factories with several different production facilities. In these situations, it’s critical to have robust networking support with different network adapters. K3s bundles Flannel as its built-in CNI plugin; you can install a different provider, but all configuration has to be performed manually. RKE2’s default distribution provides integrated options for common networking solutions.

Conclusion

K3s and RKE2 are two popular Kubernetes distributions that overlap each other in several ways. They both offer a simple deployment experience, frictionless long-term maintenance, and high performance and compatibility.

While designed for tiny and far-edge use cases, K3s is not limited to these scenarios. It’s also widely used for development, in labs, or in resource-constrained environments. However, K3s is not focused on security hardening out of the box, so you’ll need to secure and harden your clusters yourself.

RKE2 takes the usability ethos from K3s and applies it to a fully conformant distribution. It’s tailored for security, close alignment with upstream Kubernetes, and compliance in regulated environments such as government agencies. RKE2 is suitable for both data centers and near-edge situations as it offers built-in support for advanced networking plugins, including Multus.

Which you should choose depends on where your cluster will run and the workloads you’ll deploy. You should use RKE2 if you want a hardened distribution for security-critical workloads or if you need FIPS 140-2 compliance. It will help you establish and maintain a healthy security baseline. K3s remains an actively supported alternative for less sensitive situations and edge workloads. It’s a batteries-included project for when you want to focus on your apps instead of the environment that runs them.

Both distributions can be managed by Rancher and integrated with your DevOps toolbox. You can use solutions like Fleet to deploy applications at scale with GitOps strategies, then head to the Rancher dashboard to monitor your workloads.

Keeping Track of Kubernetes Deprecated Resources

Wednesday, 9 November, 2022

It’s a fact of life: as the Kubernetes API evolves, it’s periodically reorganized or upgraded. This means some Kubernetes resources can be deprecated and later removed. We need an easy way to keep track of those deprecations and removals. For that, we have just released the new deprecated-api-versions policy for Kubewarden, our efficient Kubernetes policy engine that runs policies compiled to Wasm. This policy checks for the usage of Kubernetes resources that have been deprecated or removed from the Kubernetes API.

A look at the deprecated-api-versions policy

This policy has two settings:

  1. kubernetes_version: The Kubernetes version against which to check for deprecated or removed resources. This setting is mandatory.
  2. deny_on_deprecation: If true, it will deny the operation on a resource that has been deprecated but not yet removed from the Kubernetes version specified by kubernetes_version. This setting is optional and is set to true by default.

As an example, extensions/v1beta1/Ingress was deprecated in Kubernetes 1.14.0, and removed in v1.22.0.

With the following policy settings, the policy accepts an extensions/v1beta1/Ingress in the cluster, yet the policy logs this result:

kubernetes_version: "1.19.0"
deny_on_deprecation: false

In contrast, with these other settings, the policy blocks the Ingress object:

kubernetes_version: "1.19.0"
deny_on_deprecation: true # (the default)

Don’t live in the past

Kubernetes deprecations evolve; we will update the policy as soon as there are new deprecations. The policy versioning scheme tells you up to what version of Kubernetes the policy knows about, e.g. 0.1.0-k8sv1.26.0 means that the policy knows about deprecations up to Kubernetes v1.26.0.

Back to the future

You are about to update your cluster’s Kubernetes version and wonder, will your workloads keep working? Will you be in trouble because of deprecated or removed resources in the new version? Check before updating! Just instantiate the deprecated-api-versions policy with the targeted Kubernetes version and deny_on_deprecation set to false, and get an overview of future-you problems.

In action

As usual, instantiate a ClusterAdmissionPolicy (cluster-wide) or AdmissionPolicy (namespaced) that makes use of the policy.

For this example, let’s work in a k8s cluster of version 1.24.0.

Here’s a definition of a cluster-wide policy that rejects resources that were deprecated or removed in Kubernetes version 1.23.0 and earlier:

kubectl apply -f - <<EOF
apiVersion: policies.kubewarden.io/v1
kind: ClusterAdmissionPolicy
metadata:
  name: my-deprecated-api-versions-policy
spec:
  module: ghcr.io/kubewarden/policies/deprecated-api-versions:v0.1.0-k8sv1.26.0
  mutating: false
  rules:
  - apiGroups: ["*"]
    apiVersions: ["*"]
    resources: ["*"]
    operations:
    - CREATE
    - UPDATE
  settings:
    kubernetes_version: "1.23.0"
    deny_on_deprecation: true
EOF

Info: In spec.rules we are checking every resource in every apiGroup and apiVersion. We do this for simplicity in this example, yet the policy’s metadata.yaml ships with a long, complete, machine-generated spec.rules section that covers just the resources that are deprecated.

You can obtain the right rules by using the kwctl scaffold command.
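
For instance, something along these lines should emit a ready-to-apply manifest with the generated rules (the exact kwctl flags may vary between versions, so treat this as a sketch):

kwctl pull registry://ghcr.io/kubewarden/policies/deprecated-api-versions:v0.1.0-k8sv1.26.0
kwctl scaffold manifest --type ClusterAdmissionPolicy \
  registry://ghcr.io/kubewarden/policies/deprecated-api-versions:v0.1.0-k8sv1.26.0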

Our cluster is on version 1.24.0, so for example, without the policy, we could still instantiate an autoscaling/v2beta2/HorizontalPodAutoscaler, even though it has been deprecated since 1.23.0 (and will be removed in 1.26.0).

Now with the policy, trying to instantiate an autoscaling/v2beta2/HorizontalPodAutoscaler resource that is already deprecated will result in its rejection:

kubectl apply -f - <<EOF
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
EOF

Warning: autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
Error from server: error when creating "STDIN":
admission webhook "clusterwide-my-deprecated-api-versions-policy.kubewarden.admission" denied the request:
autoscaling/v2beta2 HorizontalPodAutoscaler cannot be used. It has been deprecated starting from 1.23.0. It has been removed starting from 1.26.0. It has been replaced by autoscaling/v2.

Have ideas for new policies? Would you like more features on existing ones? Drop us a line at #kubewarden on Slack! We look forward to your feedback 🙂

Kubernetes Jobs Deployment Strategies in Continuous Delivery Scenarios

Wednesday, 5 October, 2022

Introduction

Continuous Delivery (CD) frameworks for Kubernetes, like the one created by Rancher with Fleet, are quite robust and easy to implement. Still, there are some rough edges you should pay attention to. Jobs deployment is one of those scenarios where things may not be straightforward, so you may need to stop and think about the best way to process them.

We’ll explain here the challenges you may face and will give some tips about how to overcome them.

While this blog is based on Fleet’s CD implementation, most of what’s discussed here also applies to other tools like ArgoCD or Flux.

The problem

Let’s start with a small recap about how Kubernetes objects work.

There are some elements in Kubernetes objects that are immutable. That means that changes to the immutable fields are not allowed once one of those objects is created.

A Kubernetes Job is a good example as the template field, where the actual code of the Job is defined, is immutable. Once created, the code can’t be changed. If we make any changes, we’ll get an error during the deployment.
This is how the error looks when we try to update the Job:

The Job "update-job" is invalid: spec.template: Invalid value: 
core.PodTemplateSpec{ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", 
UID:"", ResourceVersion:"", 
.... 
: field is immutable

And this is how the error is shown on Fleet’s UI:

Job replace error as seen on Rancher Fleet

When we do deployments manually and update code within Jobs, we can delete the previous Job by hand and relaunch our deployment. However, when using Continuous Delivery (CD), things should work without manual intervention. It’s critical to find a way for the CD process to run without errors and update Jobs without anyone having to step in.

Thus we have reached the point where we have a new version of a Job code that needs to be deployed, and the old Job should not stop us from doing that in an automated way.

Things to consider before configuring your CD process

Our first recommendation, even if not directly related to the Fleet or Jobs update problem, is always to try using Helm to distribute applications.

Helm installation packages (called Charts) are easy to build. You can quickly build a Chart by putting together all your Kubernetes deployment files in a folder called templates plus some metadata (name, version, etc.) defined in a file called Chart.yaml.
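
As a minimal illustration (all names and versions here are placeholders), a Chart.yaml can be as small as:

apiVersion: v2
name: my-job-chart
description: Kubernetes deployment files for our application, packaged as a Helm chart
version: 0.1.0
appVersion: "1.0.0"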

Using Helm to distribute applications with CD tools like Fleet and ArgoCD offers some additional features that can be useful when dealing with Job updates.

Second, in terms of how Jobs are implemented and how their internal logic behaves, they must be designed to be idempotent. Jobs are meant to run just once, and if our CD process updates and recreates them on each run, we need to be sure that relaunching them doesn’t break anything.

Idempotency is an absolute must while designing Kubernetes CronJobs, but all those best practices should also be applied here to avoid undesired side effects.

Solutions

Solution 1: Let Jobs delete themselves after execution

The process is well described in Kubernetes documentation:

“A way to clean up finished Jobs automatically is to use a TTL mechanism provided by a TTL controller for finished resources, by specifying the spec.ttlSecondsAfterFinished field of the Job”

Example:

apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-ttlsecondsafterfinished
spec:
  ttlSecondsAfterFinished: 5
  template:
    spec:
...

When we add ttlSecondsAfterFinished to the spec, the TTL controller will delete the Job once it finishes. If the field is not present or the value is empty, the Job won’t be deleted, following the traditional behavior. A value of 0 will fire the deletion just after it finishes its execution. An integer value greater than 0 will define how many seconds will pass after the execution before the Job is deleted. This new feature has been stable since Kubernetes 1.23.

While this seems a pretty elegant way to clean finished jobs (future runs won’t complain about the update of immutable resources), it creates some problems for the CD tools.

The entire CD concept relies on the fact that the external repos holding our application definitions are the source of truth. That means CD tools are constantly monitoring our deployments and notifying us about changes.

As a result, Fleet detects a change when the Job is deleted, so the deployment is marked as “Modified”:

Rancher Fleet modified Git repo

If we look at the details, we can see how Fleet detected that the Job is missing:

Rancher Fleet modified Git repo details

The change can also be seen in the corresponding Fleet Bundle status:

...
status:
  conditions:
  - lastUpdateTime: "2022-10-02T19:39:09Z"
    message: Modified(2) [Cluster fleet-default/c-ntztj]; job.batch fleet-job-with-ttlsecondsafterfinished/job-with-ttlsecondsafterfinished
      missing
    status: "False"
    type: Ready
  - lastUpdateTime: "2022-10-02T19:39:10Z"
    status: "True"
    type: Processed
  display:
    readyClusters: 0/2
    state: Modified
  maxNew: 50
  maxUnavailable: 2
  maxUnavailablePartitions: 0
  observedGeneration: 1
...

This solution is really easy to implement, with the only drawback of having repositories marked as Modified, which may be confusing over time.

Solution 2: Use a different Job name on each deployment

This is where Helm comes to our aid.

Using Helm’s processing, we can easily generate a random name for the Job each time Fleet performs an update. The old Job will also be removed as it’s no longer in our Git repository.

apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-random-name-{{ randAlphaNum 8 | lower }}
  labels:
    realname: job-with-random-name
spec:
  template:
    spec:
      restartPolicy: Never
...

Summary

We hope what we have shared here helps to understand better how to manage Jobs in CD scenarios and how to deal with changes on immutable objects.

The code examples used in this blog are available on our GitHub repository: https://github.com/SUSE-Technical-Marketing/fleet-job-deployment-examples

If you have other scenarios that you’d want us to cover, we’d love to hear them and discuss ideas that can help improve Fleet in the future.

Please join us at the Rancher’s community Slack channel for Fleet and Continuous Delivery or at the CNCF official Slack Channel for Rancher.

Epinio and Crossplane: the Perfect Kubernetes Fit

Thursday, 18 August, 2022

One of the greatest challenges that operators and developers face is infrastructure provisioning: it should be resilient, reliable, reproducible and even audited. This is where Infrastructure as Code (IaC) comes in.

In the last few years, we have seen many tools that tried to solve this problem, sometimes offered by the cloud providers (AWS CloudFormation) or vendor-agnostic solutions like Terraform and Pulumi. However, Kubernetes is becoming the standard for application deployment, and that’s where Crossplane fits in the picture. Crossplane is an open source Kubernetes add-on that transforms your cluster into a universal control plane.

The idea behind Crossplane is to leverage Kubernetes manifests to build custom control planes that can compose and provision multi-cloud infrastructure from any provider.

If you’re an operator, its highly flexible approach gives you the power to create custom configurations, and the control plane will track any change, trying to keep the state of your infrastructure as you configured it.

On the other side, developers don’t want to bother with the infrastructure details. They want to focus on delivering the best product to their customers, possibly in the fastest way. Epinio is a tool from SUSE that allows you to go from code to URL in just one push without worrying about all the intermediate steps. It will take care of building the application, packaging your image, and deploying it into your cluster.

This is why these two open source projects fit perfectly – provisioning infrastructure and deploying applications inside your Kubernetes platform!

Let’s take a look at how we can use them together:

# Push our app 
-> % epinio push -n myapp -p assets/golang-sample-app 

# Create the service 
-> % epinio service create dynamodb-table mydynamo 

# Bind the two 
-> % epinio service bind mydynamo myapp 

That was easy! With just three commands, we have:

  1. Deployed our application
  2. Provisioned a DynamoDB Table with Crossplane
  3. Bound the service connection details to our application

Ok, probably too easy, but this was just the developer’s perspective. And this is what Epinio is all about: simplifying the developer experience.

Let’s look at how to set up everything to make it work!

Prerequisites

I’m going to assume that we already have a Kubernetes cluster with Epinio and Crossplane installed. To install Epinio, you can refer to our documentation. This was tested with the latest Epinio version v1.1.0, Crossplane v1.9.0 and provider-aws v0.29.0.

Since we are using the enable-external-secret-stores alpha feature of Crossplane, we need to enable it by providing the args={--enable-external-secret-stores} value during the Helm installation:

-> % helm install crossplane \
    --create-namespace --namespace crossplane-system \
    crossplane-stable/crossplane \
    --set args={--enable-external-secret-stores}

 

Also, provide the same argument to the AWS Provider with a custom ControllerConfig:

apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: aws-config
spec:
  args:
  - --enable-external-secret-stores
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: crossplane-provider-aws
spec:
  package: crossplane/provider-aws:v0.29.0
  controllerConfigRef:
    name: aws-config

 

Epinio services

To use Epinio and Crossplane together, we can leverage the Epinio Services. They provide a flexible way to add custom resources using Helm charts. The operator can prepare a Crossplane Helm chart to claim all resources needed. The Helm chart can then be added to the Epinio Service Catalog. Finally, the developers will be able to consume the service and have all the needed resources provisioned.

 

Prepare our catalog

We must prepare and publish our Helm chart to add our service to the catalog.

In our example, it will contain only a simple DynamoDB Table. In a real scenario, the operator will probably define a claim to a Composite Resource, but for simplicity, we are using a Managed Resource directly.

For a deeper look, I’ll invite you to take a look at the Crossplane documentation about composition.

We can see that this resource will “publish” its connection details to a secret defined with the publishConnectionDetailsTo attribute (this is the alpha feature that we need). The secret and the resource will have the app.kubernetes.io/instance label set to the Epinio Service instance name. Epinio uses this label to correlate services and configurations.

apiVersion: dynamodb.aws.crossplane.io/v1alpha1
kind: Table
metadata:
  name: {{ .Release.Name | quote }}
  namespace: {{ .Release.Namespace | quote }}
  labels:
    app.kubernetes.io/instance: {{ .Release.Name | quote }}
spec:
  publishConnectionDetailsTo:
    name: {{ .Release.Name }}-conn
    metadata:
      labels:
        app.kubernetes.io/instance: {{ .Release.Name | quote }}
      annotations:
        kubed.appscode.com/sync: "kubernetes.io/metadata.name={{ .Release.Namespace }}"
  providerConfigRef:
    name: aws-provider-config
  forProvider:
    region: eu-west-1
    tags:
    - key: env
      value: test
    attributeDefinitions:
    - attributeName: Name
      attributeType: S
    - attributeName: Surname
      attributeType: S
    keySchema:
    - attributeName: Name
      keyType: HASH
    - attributeName: Surname
      keyType: RANGE
    provisionedThroughput:
      readCapacityUnits: 7
      writeCapacityUnits: 7

 

Note: You can see a kubed annotation in this Helm chart. This is because the generated secrets need to be in the same namespace as the services and applications. Since we are using a Managed Resource directly, the secret will be created in the namespace defined by the default StoreConfig (the crossplane-system namespace). We use kubed to copy this secret into the release namespace.
https://github.com/crossplane/crossplane/blob/master/design/design-doc-external-secret-stores.md#secret-configuration-publishconnectiondetailsto

 

We can now package and publish this Helm chart to a repository and add it to the Epinio Service Catalog by applying a service manifest containing the information on where to fetch the chart.
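
Packaging and publishing the chart itself is plain Helm tooling; a rough sketch with placeholder names and URLs could be:

# Package the chart and regenerate the repository index (placeholder names/URLs)
-> % helm package dynamodb-test/ --version 1.0.0
-> % helm repo index . --url https://charts.example.com/reponame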

The application.epinio.io/catalog-service-secret-types annotation defines the list of secret types that Epinio should look for. Crossplane generates its secrets with their own secret type, so we need to declare it explicitly.

apiVersion: application.epinio.io/v1
kind: Service
metadata:
  name: dynamodb-table
  namespace: epinio
  annotations:
    application.epinio.io/catalog-service-secret-types: connection.crossplane.io/v1alpha1
spec:
  name: dynamodb-table
  shortDescription: A simple DynamoDBTable that can be used during development
  description: A simple DynamoDBTable that can be used during development
  chart: dynamodb-test
  chartVersion: 1.0.0
  appVersion: 1.0.0
  helmRepo:
    name: reponame
    url: https://charts.example.com/reponame
  values: ""

 

Now we can see that our custom service is available in the catalog:

-> % epinio service catalog

Create and bind a service

Now that our service is available in the catalog, the developers can use it to provision DynamoDBTables with Epinio:

-> % epinio service create dynamodb-table mydynamo

We can check that a dynamo table resource was created and that the corresponding table is available on AWS:

-> % kubectl get tables.dynamodb.aws.crossplane.io
-> % aws dynamodb list-tables

We can now create an app with the epinio push command. Once deployed, we can bind it to our service with the epinio service bind:

-> % epinio push -n myapp -p assets/golang-sample-app
-> % epinio service bind mydynamo myapp
-> % epinio service list

And that’s it! We can see that our application was bound to our service!

The bind command did a lot of things. It fetched the secrets generated by Crossplane and labeled them as Configurations. It also redeployed the application mounting these configurations inside the container.

We can check this with some Epinio commands:

-> % epinio configuration list

-> % epinio configuration show x937c8a59fec429c4edeb339b2bb6-conn

The access path shown is available inside the application container. We can exec into the app and see the content of those files:

-> % epinio app exec myapp

 

Conclusion

In this blog post, I’ve shown you it’s possible to create an Epinio Service that will use Crossplane to provide external resources to your Epinio application. We have seen that once the heavy lifting is done, provisioning a resource is just a matter of a couple of commands.

While some of these features are still in alpha, the Crossplane team is working hard on them, and I think they will be available soon!

Next Steps: Learn More at the Global Online Meetup on Epinio

Join our Global Online Meetup: Epinio on Wednesday, September 14th, 2022, at 11 AM EST. Dimitris Karakasilis and Robert Sirchia will discuss the Epinio GA 1.0 release and how it delivers applications to Kubernetes faster, along with a live demo! Sign up here.

Secure Supply Chain: Verifying Image Signatures in Kubewarden

Friday, 20 May, 2022


 

After these last releases, Kubewarden now has support for verifying the integrity and authenticity of artifacts using the Sigstore project. In this post, we shall focus on verifying container image signatures using the new verify-image-signatures policy.

To learn more about how Sigstore works, take a look at our previous post.

Verify Image Signatures Policy

This policy validates Pods by checking their container images for signatures (that is, it checks containers, init containers and ephemeral containers in the pod).

The policy can inspect all the container images defined inside of a Pod or it can just analyze the ones that are matching a pattern provided by the user.

Container image tags are mutable; they can be changed to point to completely different content. That’s why it’s a good security practice to reference container images by their immutable checksum.

This policy can rewrite the image definitions that are using a tag to instead reference the image by its checksum.

The policy will:

  • Ensure the image referenced by a tag is satisfying the signature requested by the operator
  • Extract the immutable reference of the image from the signatures
  • Rewrite the image reference to be in the form <image ref>@sha256:<digest>

Let’s see it in action!

For this example, a Kubernetes cluster with Kubewarden already installed is required. The installation process is described in the quick start guide.

We need an image with a signature that we can verify. You can use cosign to sign your images. For this example we’ll use the image ghcr.io/viccuad/app-example:v0.1.0 that was signed using keyless signing.
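
If you want to sign your own image instead, a keyless signature can be produced with cosign along these lines (the image reference is a placeholder):

COSIGN_EXPERIMENTAL=1 cosign sign ghcr.io/your-org/your-image:v0.1.0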

Obtain the issuer and subject using cosign.

COSIGN_EXPERIMENTAL=1 cosign verify ghcr.io/viccuad/app-example:v0.1.0
...
"Issuer": "https://token.actions.githubusercontent.com",
"Subject": "https://github.com/viccuad/app-example/.github/workflows/ci.yml@refs/tags/v0.1.0"
...

Let’s create a cluster-wide policy that will verify all images, and let’s use the issuer and subject for verification:

kubectl apply -f - <<EOF
apiVersion: policies.kubewarden.io/v1alpha2
kind: ClusterAdmissionPolicy
metadata:
  name: verify-image-signatures
spec:
  module: ghcr.io/kubewarden/policies/verify-image-signatures:v0.1.4
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    resources: ["pods"]
    operations:
    - CREATE
    - UPDATE
  mutating: true
  settings:
    signatures:
      - image: "*"
        keyless:
          - issuer: "https://token.actions.githubusercontent.com"
            subject: "https://github.com/viccuad/app-example/.github/workflows/ci.yml@refs/tags/v0.1.0"
EOF

Wait for the policy to be active:

kubectl wait --for=condition=PolicyActive clusteradmissionpolicies verify-image-signatures

Verify we can create pods with containers that are signed:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: verify-image-valid
spec:
  containers:
  - name: test-verify-image
    image: ghcr.io/viccuad/app-example:v0.1.0
EOF

Then check that the image was modified with the digest:

kubectl get pod verify-image-valid -o=jsonpath='{.spec.containers[0].image}'

This will produce the following output:

ghcr.io/viccuad/app-example:v0.1.0@sha256:d97d00f668dc5b7f0af65edbff6b37924c8e9b1edfc0ab0f7d2e522cab162d38

Finally, let’s try to create a pod with an image that is not signed:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: verify-image-invalid
spec:
  containers:
  - name: test-verify-image
    image: ghcr.io/kubewarden/test-verify-image-signatures:unsigned
EOF

We will get the following error:

Error from server: error when creating "STDIN": admission webhook "clusterwide-verify-image-signatures.kubewarden.admission" denied the request: Pod verify-image-invalid is not accepted: verification of image ghcr.io/kubewarden/test-verify-image-signatures:unsigned failed: Host error: Callback evaluation failure: no signatures found for image: ghcr.io/kubewarden/test-verify-image-signatures:unsigned 

Recap

This policy is designed to meet all your needs. However, if you prefer, you can build your own policy using one of the SDKs Kubewarden provides. We will show how to do this in an upcoming blog! Stay tuned!


Secure Supply Chain: Securing Kubewarden Policies

Wednesday, 4 May, 2022

With recent releases, the Kubewarden stack supports verifying the integrity and authenticity of content using the Sigstore project.

In this post, we focus on Kubewarden Policies and how to create a Secure Supply Chain for them.

Sigstore?

Since a full Sigstore dive is not within the scope of this post, we recommend checking out their nice docs.

In short, Sigstore provides an automatable workflow to match the distributed Open Source development model. The workflow specifies how to digitally sign and verify artifacts, which in our case are Kubewarden Policies. It also provides a transparency log to monitor such signatures. The workflow allows signing artifacts with traditional Public-Private key pairs or in Keyless mode.

In the keyless mode, signatures are created with short-lived certs using an OpenID Connect (OIDC) service as identity provider. Those short-lived certs are issued by Sigstore’s PKI infrastructure, Fulcio.

Fulcio acts as a Registration Authority, authenticating that you are who you say you are by using an OIDC service (SSO via your own Okta instance, GitHub, Google, etc). Once authenticated, Fulcio acts as a Certificate Authority, issuing the short-lived certificate that you will use to sign artifacts.

These short-lived certificates include the identity information obtained from the OIDC service inside of the certificate’s extension attributes. The private key associated with the certificate is then used to sign the object, while the certificate itself has a public key that can be used to verify the signatures produced by the private key.

The certificates issued by Fulcio are only valid for a short period of time. This is an interesting property that we will discuss shortly.

Once the artifact is signed, the proof of signature is then sent to an append-only transparency log, Rekor, that allows monitoring of such signatures and protects against timing attacks. The proof of signature is signed by Rekor and this information is stored inside of the signature itself.

By using the timestamp found inside of the proof of signature, the verifier can ensure that the signing action has been performed during the limited lifetime of the certificate.

Because of this, the private key associated with the certificate doesn’t need to be stored safely. It can be discarded at the end of the signature process. An attacker could even reuse the private key, but the signature would not be considered valid if used outside of the limited lifetime of the certificate.

Nobody (developers, project leads or sponsors) needs to have access to keys, and Sigstore never obtains your private key. Hence the term keyless. Additionally, one doesn’t need expensive infrastructure for creating and validating signatures.

Since there’s no need for key secrets and the like in Keyless mode, it is easily automated inside CIs and implemented and monitored in the open. This is one of the reasons that makes it so interesting.

Building a Rust Sigstore stack

The policy server and libraries within the Kubewarden stack are responsible for instantiating and running policies. They are written in Rust and, therefore, we needed a good Rust implementation of Sigstore features. Since there weren’t any available, we are glad to announce that we have created a new crate, sigstore-rs, under the Sigstore org. This was done in an upstream-first manner, and we’re happy to report that it is now taking on a life of its own.

Securing kubewarden policies

As you may already know, Kubewarden Policies are small wasm-compiled binaries (~1 to ~6 MB) that are distributed via container registries as OCI artifacts. Let us see how Kubewarden protects policies against Secure Supply Chain attacks by signing and verifying them before they run.

Signing your Kubewarden Policy

Signing a Policy is done in the same way as signing a container image. This means just adding a new layer containing the signature to a dedicated signature object managed by Sigstore. In the Sigstore workflow, one can sign with a Public-Private key pair, or Keyless. Both approaches can also add key=value annotations to the signatures.

The Public-Private key pair signing is straightforward, using sigstore/cosign:

$ COSIGN_PASSWORD=yourpass cosign generate-key-pair

Private key written to cosign.key
Public key written to cosign.pub

$ COSIGN_PASSWORD=yourpass cosign sign \
  --key cosign.key --annotations blog=yes \
  ghcr.io/kubewarden/policies/user-group-psp:v0.2.0

Pushing signature to: ghcr.io/kubewarden/policies/user-group-psp

The Keyless mode is more interesting:

$ COSIGN_EXPERIMENTAL=1 cosign sign \
  --annotations blog=yes \
  ghcr.io/kubewarden/policies/user-group-psp:v0.2.0

Generating ephemeral keys...
Retrieving signed certificate...
Your browser will now be opened to:
https://oauth2.sigstore.dev/auth/auth?access_type=online&client_id=sigstore&code_challenge=(...)
Successfully verified SCT...
tlog entry created with index: (...)
Pushing signature to: ghcr.io/kubewarden/policies/user-group-psp

What happened? cosign prompted us to authenticate against an OpenID Connect provider in the browser and then instructed Fulcio to generate an ephemeral private key and an x509 certificate containing the associated public key.

If this were to happen in a CI, the CI would provide the OIDC identity token in its environment. cosign has support for detecting some automated environments and obtaining an identity token from them; currently that covers GitHub and Google Cloud, but one can always pass the token explicitly with a flag.
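For example, a CI step could pass the token explicitly. This is only a sketch; $CI_OIDC_TOKEN and the image reference are placeholders for whatever your CI and registry provide:

$ COSIGN_EXPERIMENTAL=1 cosign sign \
  --identity-token "$CI_OIDC_TOKEN" \
  ghcr.io/your-org/your-policy:v0.1.0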

We shall now detail how this works for policies built by the Kubewarden team in GitHub Actions. First, we call cosign and sign the policy in keyless mode. The certificate issued by Fulcio includes the following details about the identity of the signer inside its x509v3 extensions:

  • An issuer, telling you who certified the image:
    https://token.actions.githubusercontent.com
    
  • A subject related to the specific workflow and worker, for example:
    https://github.com/kubewarden/policy-secure-pod-images/.github/workflows/release.yml@refs/heads/main
    

If you are curious and want to see the contents of one of the certificates issued by Fulcio, install the crane CLI tool, jq and openssl, and execute the following command:

crane manifest \
  $(cosign triangulate ghcr.io/kubewarden/policies/pod-privileged:v0.1.10) | \
  jq -r '.layers[0].annotations."dev.sigstore.cosign/certificate"' | \
  openssl x509 -noout -text -in -

The end result is the same: a signature is added as a new layer of a special OCI object that is created and managed by Sigstore. You can view those signatures as added layers, tagged sha256-<sha>.sig, in the repository.

Even better, you can use tools like crane or the Kubewarden CLI tool kwctl to perform the same action, as demonstrated below.

kwctl pull <policy_url>; kwctl inspect <policy_url>

If you want to verify policies locally, you now can use kwctl verify:

$ kwctl verify --github-owner kubewarden registry://ghcr.io/kubewarden/policies/pod-privileged:v0.1.10
$ echo $?
0

When testing policies locally with kwctl pull or kwctl run, you can also enable signature verification by using any verification-related flag. For example:

$ kwctl pull --github-owner kubewarden registry://ghcr.io/kubewarden/policies/pod-privileged:v0.1.10
$ echo $?
0

All the policies from the Kubewarden team are signed in keyless mode by the workers of the GitHub CI job. We don't leave certificates or keys lying around, and the signatures are verifiable by third parties.

Enforcing signature verification for instantiated Kubewarden policies

You can now configure PolicyServers to enforce that all the policies being run are signed. When deploying Kubewarden via Helm charts, you can do so for the default PolicyServer installed by the kubewarden-defaults chart.

For this, PolicyServers have a new spec.VerificationConfig argument. Here, you can set the name of a ConfigMap containing a “verification config” that specifies the required signatures.

You can obtain a default verification config for policies from the Kubewarden team with:

$ kwctl scaffold verification-config
# Default Kubewarden verification config
#
# With this config, the only valid policies are those signed by Kubewarden
# infrastructure.
#
# This config can be saved to its default location (for this OS) with:
#   kwctl scaffold verification-config > /home/youruser/.config/kubewarden/verification-config.yml
#
# Providing a config in the default location enables Sigstore verification.
# See https://docs.kubewarden.io for more Sigstore verification options.
---
apiVersion: v1
allOf:
  - kind: githubAction
    owner: kubewarden
    repo: ~
    annotations: ~
anyOf: ~

The verification config format has several niceties; see its reference docs. For example, kind: githubAction with owner and repo, instead of blindly checking the issuer and subject strings. Or anyOf a list of signatures, with anyOf.atLeast a number of them: this allows accepting at least a specific number of signatures, and makes migrating between signatures in your cluster easy. It's the little things.
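As a hand-written sketch building on the scaffolded config above (the owners other than kubewarden are made-up placeholders, and the exact field names should be double-checked against the reference docs for your kwctl version), a config requiring the Kubewarden signature plus at least one of two extra signers could be saved like this:

$ cat > my-verification-config.yml <<EOF
apiVersion: v1
allOf:
  - kind: githubAction
    owner: kubewarden
anyOf:
  atLeast: 1
  signatures:
    - kind: githubAction
      owner: your-org
    - kind: githubAction
      owner: another-org
EOF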

If you want support for other CIs (such as GitLab, Jenkins, etc.), drop us a note on Slack or file a GitHub issue!

Once you have crafted your verification config, create your ConfigMap:

$ kubectl create configmap my-verification-config \
  --from-file=verification-config=./my-verification-config.yml \
  --namespace=kubewarden

And pass it to your PolicyServers in spec.VerificationConfig, or, if you are using the default PolicyServer from the kubewarden-defaults chart, set it there, for example with:

$ helm upgrade --set policyServer.verificationConfig=my-verification-config \
  --wait -n kubewarden kubewarden-defaults ./kubewarden-defaults
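If you define your own PolicyServer resources instead of relying on the default one, the same ConfigMap name can be set directly on the custom resource. The snippet below is only a sketch: the apiVersion, the image/replicas fields and the exact casing of the verification config field are assumptions based on the PolicyServer CRD at the time of writing, so check them against your installed version.

$ kubectl apply -f - <<EOF
apiVersion: policies.kubewarden.io/v1alpha2
kind: PolicyServer
metadata:
  name: verified-policy-server
spec:
  image: ghcr.io/kubewarden/policy-server:latest   # assumption: use the tag you actually run
  replicas: 1
  verificationConfig: my-verification-config
EOF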

Recap

Using cosign sign, policy authors can sign their policies. All the policies owned by the Kubewarden team have already been signed in this way.

With kwctl verify, operators can verify them, and with kwctl inspect (and other tools such as crane manifest), operators can inspect the signatures. We can keep using kwctl pull and kwctl run to test policies locally as before, and now verify their signatures too. Once we are satisfied, we can deploy Kubewarden PolicyServers so they enforce those signatures. If we want, the same verification config format can be used for both kwctl and the cluster stack.

This way we are sure that the policies come from their stated authors, and have not been tampered with. Phew!

We, the Kubewarden team, are curious about how you approach this. What workflows are you interested in? What challenges do you have? Drop us a word in our Slack channel or file a GitHub issue!

There are more things to secure in the chain, and we're excited about what lies ahead. Stay tuned for more blog entries on how to secure your supply chain with Kubewarden!

Managing Sensitive Data in Kubernetes with Sealed Secrets and External Secrets Operator (ESO)

Thursday, 31 March, 2022

Having multiple environments that can be dynamically configured has become a staple of modern software development. This is especially true in an enterprise context, where software release cycles typically consist of separate compute environments like dev, stage and production. These environments are usually distinguished by the data that drives the specific behavior of the application.

For example, an application may have three different sets of database credentials for authentication (AuthN) purposes, each set belonging to the database instance of a particular environment. This approach essentially allows software developers to interact with a developer-friendly database when carrying out their day-to-day coding. Similarly, QA testers can have an isolated stage database for testing purposes. As you would expect, the production database would be the real-world data store for end users or clients.

To accomplish application configuration in Kubernetes, you can use either ConfigMaps or Secrets. Both serve the same purpose, except that Secrets, as the name implies, are meant for storing sensitive data. Secrets are native Kubernetes resources saved in the cluster data store (i.e., the etcd database) and can be made available to your containers at runtime.

However, using Secrets optimally isn't so straightforward. Some inherent risks exist around Secrets, most of which stem from the fact that, by default, Secrets are stored in a non-encrypted format (base64 encoding) in the etcd datastore. This introduces the challenge of safely storing Secret manifests in private or public repositories. Some security measures that can be taken include encrypting Secrets, using centralized secrets managers, limiting administrative access to the cluster, enabling encryption of data at rest in the cluster datastore and enabling TLS/SSL between the datastore and Pods.

In this post, you’ll learn how to use Sealed Secrets for “one-way” encryption of your Kubernetes Secrets and how to securely access and expose sensitive data as Secrets from centralized secret management systems with the External Secrets Operator (ESO).

 

Using Sealed Secrets for one-way encryption

One of the key advantages of Infrastructure as Code (IaC) is that it allows teams to store their configuration scripts and manifests in git repositories. However, because of the nature of Kubernetes Secrets, this is a huge risk because the original sensitive credentials and values can easily be derived from the base64 encoding format.

``` yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-secret
type: Opaque
data:
  username: dXNlcg==
  password: cGFzc3dvcmQ=
```
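For instance, anyone with read access to this manifest can recover the plaintext values with nothing more than base64:

``` bash
echo "dXNlcg==" | base64 --decode      # prints: user
echo "cGFzc3dvcmQ=" | base64 --decode  # prints: password
```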

Therefore, as a secure workaround, you can use Sealed Secrets. As stated above, Sealed Secrets allow for “one-way” encryption of your Kubernetes Secrets, which can only be decrypted by the Sealed Secrets controller running in your target cluster. This mechanism is based on public-key encryption, a form of cryptography consisting of a public key and a private key pair: one can be used for encryption, and only the other can decrypt what was encrypted. The controller generates the key pair, publishes the public key certificate to its logs and exposes it over an HTTP API.

To use Sealed Secrets, you have to deploy the controller to your target cluster and download the kubeseal CLI tool.

  • Sealed Secrets Controller – This component extends the Kubernetes API and enables lifecycle operations of Sealed Secrets in your cluster.
  • kubeseal CLI Tool – This tool uses the generated public key certificate to encrypt your Secret into a Sealed Secret.

Once generated, the Sealed Secret manifests can be stored in a git repository or shared publicly without any ramifications. When you create these Sealed Secrets in your cluster, the controller decrypts them and retrieves the original Secrets, making them available in your cluster as usual. Below is a step-by-step guide on how to accomplish this.

To carry out this tutorial, you will need to be connected to a Kubernetes cluster. For a lightweight solution on your local machine, you can use Rancher Desktop.

To download kubeseal, you can select the binary for your respective OS (Linux, Windows, or Mac) from the GitHub releases page. Below is an example for Linux.

``` bash
wget https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.17.3/kubeseal-linux-amd64 -O kubeseal
sudo install -m 755 kubeseal /usr/local/bin/kubeseal
```

Installing the Sealed Secrets controller can be done either via Helm or kubectl. This example will use the latter, which installs the Custom Resource Definitions (CRDs), RBAC resources and the controller.

``` bash
wget https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.16.0/controller.yaml
kubectl apply -f controller.yaml
```

You can ensure that the relevant Pod is running as expected by executing the following command:

``` bash
kubectl get pods -n kube-system | grep sealed-secrets-controller
```

Once it is running, you can retrieve the generated public key certificate using kubeseal and store it on your local disk.

``` bash
kubeseal --fetch-cert > public-key-cert.pem
```

You can then create a Secret and seal it with kubeseal. This example will use the manifest detailed at the start of this section, but you can change the key-value pairs under the data field as you see fit.

``` bash
kubeseal --cert=public-key-cert.pem --format=yaml < secret.yaml > sealed-secret.yaml
```

The generated output will look something like this:

``` yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  creationTimestamp: null
  name: my-secret
  namespace: default
spec:
  encryptedData:
    password: AgBvA5WMunIZ5rF9...
    username: AgCCo8eSORsCbeJSoRs/...
  template:
    data: null
    metadata:
      creationTimestamp: null
      name: my-secret
      namespace: default
    type: Opaque
```

This manifest can be used to create the Sealed Secret in your cluster with kubectl, and it can afterward be stored in a git repository without any concern of someone recovering the original values.

``` bash
kubectl create -f sealed-secret.yaml
```

You can then proceed to review the secret and fetch its values.

``` bash
kubectl get secret my-secret -o jsonpath="{.data.username}" | base64 --decode
kubectl get secret my-secret -o jsonpath="{.data.password}" | base64 --decode
```

 

Using External Secrets Operator (ESO) to access Centralized Secrets Managers

Another good practice for managing your Secrets in Kubernetes is to use centralized secrets managers. Secrets managers are hosted third-party platforms used to store sensitive data securely. These platforms typically offer encryption of your data at rest and expose an API for lifecycle management operations such as creating, reading, updating, deleting or rotating secrets. In addition, they provide audit logs for trails and visibility, as well as fine-grained access control over operations on stored secrets. Examples of secrets managers include HashiCorp Vault, AWS Secrets Manager, IBM Secrets Manager, Azure Key Vault, Akeyless, Google Secret Manager, etc. Such systems put organizations in a better position to centralize the management, auditing and securing of secrets. The next question is, “How do you get secrets from your secrets manager into Kubernetes?” The answer is the External Secrets Operator (ESO).

The External Secrets Operator is a Kubernetes operator that integrates with your external secrets management system, reads values from it and inserts them as Secrets in your cluster. ESO extends the Kubernetes API with the following main API resources:

  • SecretStore – This is a namespaced resource that determines how your external Secret will be accessed from an authentication perspective. It contains references to Secrets that have the credentials to access the external API.
  • ClusterSecretStore – As the name implies, this is a global or cluster-wide SecretStore that can be referenced from all namespaces to provide a central gateway to your secrets manager.
  • ExternalSecret – This resource declares the data you want to fetch from the external secrets manager. It will reference the SecretStore to know how to access sensitive data.

Below is an example of how to access data from AWS Secrets Manager and make it available in your K8s cluster as a Secret. As a prerequisite, you will need to create an AWS account. A free-tier account will suffice for this demonstration.

You can create a secret in AWS Secrets Manager as the first step. If you’ve got the AWS CLI installed and configured with your AWS profile, you can use the CLI tool to create the relevant Secret.

``` bash
aws secretsmanager create-secret --name <name-of-secret> --description <secret-description> --secret-string <secret-value> --region <aws-region>
```

Alternatively, you can create the Secret using the AWS Management Console.

In this example, the Secret is named “alias” and has the following values:

``` json
{
  "first": "alpha",
  "second": "beta"
}
```

After you’ve created the Secret, create an IAM user with programmatic access and safely store the generated AWS credentials (access key ID and secret access key). Make sure to limit this user’s service and resource permissions with a custom IAM policy, such as the following:

``` json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetResourcePolicy",
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret",
        "secretsmanager:ListSecretVersionIds"
      ],
      "Resource": [
        "arn:aws:secretsmanager:<aws-region>:<aws-account-id>:secret:<secret-name>"
      ]
    }
  ]
}
```
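If you prefer the CLI over the console for this step as well, a rough sketch of creating such a user and attaching the policy could look like the following; the user name, policy name and file path are placeholders:

``` bash
# Create the IAM user, attach the inline policy from a local JSON file,
# then generate programmatic credentials for it.
aws iam create-user --user-name eso-demo-user
aws iam put-user-policy --user-name eso-demo-user \
  --policy-name eso-demo-secret-read \
  --policy-document file://iam-policy.json
aws iam create-access-key --user-name eso-demo-user
```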

Once that is done, you can install the ESO with Helm.

``` bash
helm repo add external-secrets https://charts.external-secrets.io

helm install external-secrets \
  external-secrets/external-secrets \
  -n external-secrets \
  --create-namespace
```

Next, you can create the Secret that the SecretStore resource will reference for authentication. You can optionally seal this Secret using the approach demonstrated in the previous section that deals with encrypting Secrets with kubeseal.

``` yaml
apiVersion: v1
kind: Secret
metadata:
  name: awssm-secret
type: Opaque
data:
  accessKeyID: PUtJQTl11NKTE5...
  secretAccessKey: MklVpWFl6f2FxoTGhid3BXRU1lb1...
```
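Rather than base64-encoding the values by hand, you can have kubectl generate an equivalent manifest for you (a sketch with placeholder credentials):

``` bash
# Writes a Secret manifest with the values already base64-encoded,
# without creating anything in the cluster yet.
kubectl create secret generic awssm-secret \
  --from-literal=accessKeyID='AKIA...' \
  --from-literal=secretAccessKey='wJalr...' \
  --dry-run=client -o yaml > secret.yaml
```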

If you seal your Secret, you should get output like the code block below.

``` yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  creationTimestamp: null
  name: awssm-secret
  namespace: default
spec:
  encryptedData:
    accessKeyID: Jcl1bC6LImu5u0khVkPcNa==...
    secretAccessKey: AgBVMUQfSOjTdyUoeNu...
  template:
    data: null
    metadata:
      creationTimestamp: null
      name: awssm-secret
      namespace: default
    type: Opaque
```

Next, you need to create the SecretStore.

``` yaml
apiVersion: external-secrets.io/v1alpha1
kind: SecretStore
metadata:
  name: awssm-secretstore
spec:
  provider:
    aws:
      service: SecretsManager
      region: eu-west-1
      auth:
        secretRef:
          accessKeyIDSecretRef:
            name: awssm-secret
            key: accessKeyID
          secretAccessKeySecretRef:
            name: awssm-secret
            key: secretAccessKey
```
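The ClusterSecretStore mentioned earlier looks almost identical. Below is a rough sketch using the same v1alpha1 API; because the store itself is not namespaced, the referenced Secret needs an explicit namespace:

``` bash
kubectl apply -f - <<EOF
apiVersion: external-secrets.io/v1alpha1
kind: ClusterSecretStore
metadata:
  name: awssm-cluster-secretstore
spec:
  provider:
    aws:
      service: SecretsManager
      region: eu-west-1
      auth:
        secretRef:
          accessKeyIDSecretRef:
            name: awssm-secret
            namespace: default   # required here, since the store is cluster-scoped
            key: accessKeyID
          secretAccessKeySecretRef:
            name: awssm-secret
            namespace: default
            key: secretAccessKey
EOF
```

An ExternalSecret can then reference it by setting secretStoreRef.kind to ClusterSecretStore.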

The last resource to be created is the ExternalSecret.

``` yaml
apiVersion: external-secrets.io/v1alpha1
kind: ExternalSecret
metadata:
  name: awssm-external-secret
spec:
  refreshInterval: 1440m
  secretStoreRef:
    name: awssm-secretstore
    kind: SecretStore
  target:
    name: alias-secret
    creationPolicy: Owner
  data:
  - secretKey: first
    remoteRef:
      key: alias
      property: first
  - secretKey: second
    remoteRef:
      key: alias
      property: second
```

You can then chain the creation of these resources in your cluster with the following command:

``` bash
kubectl create -f sealed-secret.yaml,secret-store.yaml,external-secret.yaml
```

After this execution, you can review the results using any of the approaches below.

``` bash
kubectl get secret alias-secret -o jsonpath="{.data.first}" | base64 --decode
kubectl get secret alias-secret -o jsonpath="{.data.second}" | base64 --decode
```

You can also create a basic Job to test access to these external secret values as environment variables. In a real-world scenario, make sure to apply fine-grained RBAC rules to the Service Accounts used by Pods; this limits which Pods can access the external secrets injected into your cluster.

``` yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-secret
spec:
  template:
    spec:
      containers:
        - name: busybox
          image: busybox
          command: ['sh', '-c', 'echo "First comes $ALIAS_SECRET_FIRST, then comes $ALIAS_SECRET_SECOND"']
          env:
            - name: ALIAS_SECRET_FIRST
              valueFrom:
                secretKeyRef:
                  name: alias-secret
                  key: first
            - name: ALIAS_SECRET_SECOND
              valueFrom:
                secretKeyRef:
                  name: alias-secret
                  key: second
      restartPolicy: Never
  backoffLimit: 3
```

You can then view the logs when the Job has been completed.
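For example, you can wait for the Job to finish and then print its output; given the values stored earlier, the message should read “First comes alpha, then comes beta”:

``` bash
# Block until the Job completes, then print the container output.
kubectl wait --for=condition=complete job/job-with-secret
kubectl logs job/job-with-secret
```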

Conclusion

In this post, you learned that using Secrets in Kubernetes introduces risks that can be mitigated with encryption and centralized secrets managers. Furthermore, we covered how Sealed Secrets and the External Secrets Operator can be used as tools for managing your sensitive data. Alternative solutions that you can consider for encryption and management of your Secrets in Kubernetes are Mozilla SOPS and Helm Secrets. If you’re interested in a video walk-through of this post, you can watch the video below.

Let’s continue the conversation! Join the SUSE & Rancher Community, where you can further your Kubernetes knowledge and share your experience.