Canary Releases with Rancher Continuous Delivery


Rancher Continuous Delivery, available since Rancher 2.5.x, brings the ability to perform GitOps at scale on Rancher-managed clusters. Continuous Delivery, powered by Fleet, allows users to manage the state of their clusters using a GitOps-based approach.

A canary release is a popular technique that software developers use to release a new version of an application to a subset of users first. Based on metrics such as availability, latency or custom metrics, the release can then be scaled up to serve more users.

In this blog, we’ll explore using Continuous Delivery to perform canary releases for your application workloads.

The actual canary release will be performed by a project named Flagger. Flagger works as a Kubernetes operator. It lets users define a custom object (a Canary) that tells Flagger which deployment to watch. Flagger then creates an additional primary deployment, along with the services needed to route traffic between the primary and canary versions. In this post, we’ll use Flagger with Istio as the service mesh.

In a nutshell, when we create a deployment, Flagger clones it to a primary deployment. It then amends the service associated with the original deployment to point to this new primary deployment. The original deployment itself is scaled down to 0.

Flagger uses Istio VirtualServices to perform the actual canary release. When a new version of the app is deployed, Flagger scales the original deployment back up to its original spec and points a canary service at it.

Now a percentage of traffic is routed to this canary service. As long as the predefined metrics stay within their thresholds, Flagger routes more and more traffic to the canary service, step by step, up to a configured maximum weight. Once that maximum is reached and the checks still pass, Flagger promotes the canary: the primary deployment is updated with the same spec as the original deployment.

Next, the VirtualService is updated to route 100 percent of traffic back to the primary service. After this traffic switch, the original deployment is scaled back to 0, and Flagger waits for and monitors subsequent deployment updates.
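To make the sequence above easier to follow, here is a toy Python model of the replica counts and traffic split that Flagger manages. It is a sketch, not Flagger’s implementation: the function names (bootstrap, start_canary, shift_traffic, promote) are hypothetical, copying the new spec into the primary is not modeled, and the -primary and -canary suffixes follow the object names that show up in the kubectl output later in this post.

```python
from typing import Dict

State = Dict[str, Dict[str, int]]


def bootstrap(name: str, replicas: int) -> State:
    """State after Flagger first picks up the deployment `name` (toy model)."""
    return {
        # The original deployment is cloned to <name>-primary and scaled to 0.
        "replicas": {name: 0, f"{name}-primary": replicas},
        # All traffic goes to the primary; the canary receives none.
        "weights": {f"{name}-primary": 100, f"{name}-canary": 0},
    }


def start_canary(state: State, name: str, replicas: int) -> None:
    """A new revision scales the original deployment back up; it acts as the canary."""
    state["replicas"][name] = replicas


def shift_traffic(state: State, name: str, canary_weight: int) -> None:
    """Send canary_weight percent to the canary and the rest to the primary."""
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    state["weights"][f"{name}-canary"] = canary_weight
    state["weights"][f"{name}-primary"] = 100 - canary_weight


def promote(state: State, name: str) -> None:
    """After promotion (spec copy not modeled), all traffic returns to the
    primary and the original deployment is scaled back to 0."""
    shift_traffic(state, name, 0)
    state["replicas"][name] = 0
```

Calling bootstrap("fleet-simple-app", 1), then start_canary and shift_traffic(state, "fleet-simple-app", 10), leaves a 90/10 split between the primary and the canary; promote then returns all traffic to the primary and scales the original deployment back to 0.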

Get Started with Flagger and Perform a Canary Release

To get started with Flagger, we will perform the following:

  1. Set up monitoring and Istio
  2. Set up Flagger and flagger-loadtest
  3. Deploy a demo application and perform a canary release

1. Set Up Monitoring and Istio

To set up monitoring and Istio, we’ll create two ClusterGroups in Continuous Delivery:

monitoring

apiVersion: fleet.cattle.io/v1alpha1
kind: ClusterGroup
metadata:
  name: monitoring
  namespace: fleet-default
spec:
  selector:
    matchLabels:
      monitoring: enabled

istio

apiVersion: fleet.cattle.io/v1alpha1
kind: ClusterGroup
metadata:
  name: istio
  namespace: fleet-default
spec:
  selector:
    matchLabels:
      istio: enabled

Now we’ll set up our monitoring and Istio GitRepos to target these ClusterGroups:

monitoring repo

apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: monitoring
  namespace: fleet-default
spec:
  branch: master
  insecureSkipTLSVerify: false
  paths:
  - monitoring
  - monitoring-crd
  repo: https://github.com/ibrokethecloud/core-bundles
  targets:
  - clusterGroup: monitoring

istio repo

apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: istio
  namespace: fleet-default
spec:
  branch: master
  insecureSkipTLSVerify: false
  paths:
  - istio
  - kiali
  repo: https://github.com/ibrokethecloud/core-bundles
  targets:
  - clusterGroup: istio

To trigger the deployment, we’ll assign a cluster to these ClusterGroups by adding the labels monitoring=enabled and istio=enabled to it.

In a few minutes, the monitoring and Istio apps should be installed on the labeled cluster.
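How a cluster ends up with these apps can be sketched in Python: a ClusterGroup selects every cluster whose labels contain all of its matchLabels pairs, and each GitRepo deploys its paths to the clusters in its target group. This is a simplified model under our own assumptions, not Fleet’s code; it treats each path as a single deployable unit, and the cluster names in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]


def matches(match_labels: Labels, cluster_labels: Labels) -> bool:
    """Kubernetes matchLabels semantics: every key/value pair must be present."""
    return all(cluster_labels.get(k) == v for k, v in match_labels.items())


def clusters_in_group(match_labels: Labels, clusters: Dict[str, Labels]) -> List[str]:
    """Names of the clusters whose labels satisfy a ClusterGroup selector."""
    return sorted(name for name, labels in clusters.items()
                  if matches(match_labels, labels))


# Selectors from the monitoring and istio ClusterGroups above.
GROUPS: Dict[str, Labels] = {
    "monitoring": {"monitoring": "enabled"},
    "istio": {"istio": "enabled"},
}

# GitRepo name -> (target ClusterGroup, paths), taken from the GitRepos above.
GITREPOS: Dict[str, Tuple[str, List[str]]] = {
    "monitoring": ("monitoring", ["monitoring", "monitoring-crd"]),
    "istio": ("istio", ["istio", "kiali"]),
}


def planned_deployments(clusters: Dict[str, Labels]) -> List[Tuple[str, str, str]]:
    """(cluster, GitRepo, path) triples that the GitRepos would deploy."""
    plan = []
    for repo, (group, paths) in GITREPOS.items():
        for cluster in clusters_in_group(GROUPS[group], clusters):
            for path in paths:
                plan.append((cluster, repo, path))
    return sorted(plan)
```

With a hypothetical cluster labeled monitoring=enabled and istio=enabled and a second cluster without those labels, only the first cluster is selected, and it receives all four paths (monitoring, monitoring-crd, istio and kiali).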

2. Set up Flagger and flagger-loadtest

As part of installing Flagger, we will also install flagger-loadtest to help generate requests on our workload.

Note: flagger-loadtest is only needed for this demo. In a real-world scenario, we assume that your application serves real traffic, and Flagger will use the metrics from that traffic to drive the traffic switch.

We’ll set up a ClusterGroup named canary as follows:

apiVersion: fleet.cattle.io/v1alpha1
kind: ClusterGroup
metadata:
  name: canary
  namespace: fleet-default
spec:
  selector:
    matchLabels:
      canary: enabled

Now we can set up the flagger GitRepo to target this ClusterGroup:

apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: flagger
  namespace: fleet-default
spec:
  branch: master
  insecureSkipTLSVerify: false
  paths:
  - flagger
  - flagger-loadtest
  repo: https://github.com/ibrokethecloud/user-bundles
  targets:
  - clusterGroup: canary

As we saw earlier, to trigger the deployment we’ll assign the cluster to the canary ClusterGroup by adding the label canary=enabled.

In a few minutes, the Flagger and flagger-loadtest Helm charts will be deployed to this cluster.

Note that when Flagger creates the primary deployment, it copies all the labels and annotations from the source deployment to it. Continuous Delivery uses labels on objects to identify which underlying Bundle they belong to and to reconcile them. Flagger’s copying trips this up: in the default setup, Continuous Delivery reports the additional Flagger-generated deployments as objects that are not in the GitRepo.

To avoid this, we set the includeLabelPrefix value of the Flagger Helm chart to dummy, which instructs Flagger to copy only labels whose keys start with dummy.

This helps us work around the Continuous Delivery reconciliation logic.
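The effect of includeLabelPrefix can be illustrated with a small Python filter that keeps only the labels whose keys start with one of the configured prefixes. This is a simplified model, not Flagger’s code, and the label keys below are hypothetical stand-ins for an app label and a label Continuous Delivery uses to track ownership.

```python
from typing import Dict, Iterable


def filter_labels(labels: Dict[str, str], include_prefixes: Iterable[str]) -> Dict[str, str]:
    """Keep only labels whose key starts with one of the given prefixes."""
    prefixes = tuple(include_prefixes)  # str.startswith accepts a tuple of prefixes
    return {key: value for key, value in labels.items() if key.startswith(prefixes)}


# Hypothetical labels on the source deployment.
SOURCE_LABELS = {
    "app": "fleet-simple-app",               # hypothetical app label
    "example.com/bundle": "canary-demo-app",  # hypothetical ownership label
}
```

filter_labels(SOURCE_LABELS, ["dummy"]) returns an empty dict because no key starts with dummy, so under this model the primary deployment carries none of the labels Continuous Delivery relies on; passing ["app"] instead would keep only the app label.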

The fleet.yaml for the flagger bundle looks like this:

defaultNamespace: istio-system
helm:
  releaseName: flagger
  repo: https://flagger.app
  chart: flagger
  version: 1.6.2
  values:
    crd:
      create: true
    meshProvider: istio
    metricsServer: http://rancher-monitoring-prometheus.cattle-monitoring-system:9090
    includeLabelPrefix: dummy
diff:
  comparePatches:
  - apiVersion: apps/v1
    kind: Deployment
    name: flagger
    namespace: istio-system
    operations:
    - {"op": "remove", "path": "/spec/template/spec/containers/0/resources/limits/cpu"}
    - {"op": "remove", "path": "/spec/template/spec/containers/0/volumeMounts"}
    - {"op": "remove", "path": "/spec/template/spec/volumes"}
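Our reading of the diff.comparePatches section is that Fleet applies these JSON Patch remove operations before comparing the desired Deployment with the live one, so differences in those fields are not reported as a modification. The Python sketch below models that idea with a minimal JSON Pointer remove; it is not Fleet’s implementation, it silently skips paths that do not exist (a strict RFC 6902 remove would fail), and the DESIRED and LIVE objects are hypothetical.

```python
import copy
from typing import Any, Dict, List


def remove_path(doc: Any, pointer: str) -> Any:
    """Return a copy of doc with the value at a JSON Pointer removed.

    Simplification: a missing path is skipped instead of raising an error
    as a strict RFC 6902 "remove" operation would.
    """
    result = copy.deepcopy(doc)
    # RFC 6901 escaping: "~1" stands for "/" and "~0" for "~".
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer.split("/")[1:]]
    node = result
    try:
        for token in tokens[:-1]:
            node = node[int(token)] if isinstance(node, list) else node[token]
        last = tokens[-1]
        if isinstance(node, list):
            del node[int(last)]
        else:
            del node[last]
    except (KeyError, IndexError, ValueError, TypeError):
        pass  # the path does not exist in this object, so there is nothing to drop
    return result


def normalize(obj: Dict[str, Any], operations: List[Dict[str, str]]) -> Dict[str, Any]:
    """Apply the "remove" operations before desired and live objects are compared."""
    for op in operations:
        if op["op"] == "remove":
            obj = remove_path(obj, op["path"])
    return obj


# The comparePatches operations from the fleet.yaml above.
OPERATIONS = [
    {"op": "remove", "path": "/spec/template/spec/containers/0/resources/limits/cpu"},
    {"op": "remove", "path": "/spec/template/spec/containers/0/volumeMounts"},
    {"op": "remove", "path": "/spec/template/spec/volumes"},
]

# Hypothetical desired (from Git) and live (from the cluster) Deployments.
DESIRED = {"spec": {"template": {"spec": {"containers": [
    {"name": "flagger", "resources": {"limits": {"memory": "512Mi"}}}]}}}}
LIVE = {"spec": {"template": {"spec": {
    "containers": [{"name": "flagger",
                    "resources": {"limits": {"cpu": "1000m", "memory": "512Mi"}},
                    "volumeMounts": [{"name": "tmp", "mountPath": "/tmp"}]}],
    "volumes": [{"name": "tmp", "emptyDir": {}}]}}}}
```

Normalizing both objects yields two equal dicts even though the raw objects differ by a CPU limit, volumeMounts and volumes, which is the outcome the patches are meant to produce.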

With all the base services set up, we are ready to deploy our workload.

3. Deploy a Demo Application and Perform a Canary Release

Now we’ll add the canary-demo-app GitRepo to target the canary ClusterGroup:

apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: canary-demo-app
  namespace: fleet-default
spec:
  branch: master
  insecureSkipTLSVerify: false
  paths:
  - canary-demo-app
  repo: https://github.com/ibrokethecloud/user-bundles
  targets:
  - clusterGroup: canary

This will trigger the deployment of the demo app to the canary-demo namespace.

(⎈ |digitalocean:canary-demo)
~
▶ kubectl get deployment
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
fleet-simple-app           0/0     0            0           80s
fleet-simple-app-primary   1/1     1            1           80s
(⎈ |digitalocean:canary-demo)

The Canary object controlling the behavior of the release is as follows:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: fleet-simple-app
  namespace: canary-demo
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fleet-simple-app
  service:
    port: 8080
  analysis:
    interval: 1m
    threshold: 10
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 1m
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.loadtester/
        timeout: 5s
        metadata:
          type: cmd
          cmd: "hey -z 1m -q 10 -c 2 http://fleet-simple-app-canary.canary-demo:8080"

The key item here is the load-test webhook: it generates enough traffic against the canary for Flagger to collect the metrics it needs to start switching traffic.
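The analysis settings above can be read as a simple loop, which the Python sketch below replays. In our simplified reading of Flagger’s documented behavior, Flagger checks the metrics once per interval (1m); each passing check raises the canary weight by stepWeight (10) until it reaches maxWeight (50), after which the canary is promoted, and once the number of failed checks reaches threshold (10) the canary is rolled back. The model ignores webhooks, timing and the primary rollout, and the metric samples are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Analysis:
    # Values from the Canary object above.
    step_weight: int = 10           # analysis.stepWeight
    max_weight: int = 50            # analysis.maxWeight
    threshold: int = 10             # analysis.threshold (failed checks before rollback)
    min_success_rate: float = 99.0  # request-success-rate thresholdRange.min (percent)
    max_duration_ms: float = 500.0  # request-duration thresholdRange.max (milliseconds)


def checks_pass(a: Analysis, success_rate: float, duration_ms: float) -> bool:
    return success_rate >= a.min_success_rate and duration_ms <= a.max_duration_ms


def run_analysis(a: Analysis, samples: List[Tuple[float, float]]) -> Tuple[List[int], str]:
    """Replay one (success_rate, duration_ms) sample per analysis interval.

    Returns the canary weight after each interval and the outcome:
    "promote", "rollback", or "in-progress" if the samples run out.
    """
    weight, failed, history = 0, 0, []
    for success_rate, duration_ms in samples:
        if not checks_pass(a, success_rate, duration_ms):
            failed += 1                    # weight stays where it is
            if failed >= a.threshold:
                return history, "rollback"
        elif weight >= a.max_weight:
            return history, "promote"
        else:
            weight = min(weight + a.step_weight, a.max_weight)
        history.append(weight)
    return history, "in-progress"


if __name__ == "__main__":
    healthy = [(100.0, 120.0)] * 6  # hypothetical passing samples
    print(run_analysis(Analysis(), healthy))  # ([10, 20, 30, 40, 50], 'promote')
```

For scale: hey’s -q flag is a per-worker rate limit and -c sets the number of workers, so the load-test command in the webhook sends at most 10 × 2 = 20 requests per second, or about 1,200 requests over its one-minute (-z 1m) run.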

We should also be able to see the status of the canary object as follows:

(⎈ |digitalocean:canary-demo)
~
▶ kubectl get canary
NAME               STATUS        WEIGHT   LASTTRANSITIONTIME
fleet-simple-app   Initialized   0        2021-03-22T06:25:17Z

We can now trigger a canary release by committing a new version of the deployment’s image to the Git repository that the canary-demo-app GitRepo points to.

In a few minutes, we should see the original deployment scaled up with the new image from the Git repository. In addition, the Canary object moves to a Progressing state and the weight of the canary release changes.

▶ kubectl get deploy
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
fleet-simple-app           1/1     1            1           6m5s
fleet-simple-app-primary   1/1     1            1           6m5s
(⎈ |digitalocean:canary-demo)
~
▶ kubectl get canary
NAME               STATUS        WEIGHT   LASTTRANSITIONTIME
fleet-simple-app   Progressing   0        2021-03-22T06:30:17Z
▶ kubectl get canary
NAME               STATUS        WEIGHT   LASTTRANSITIONTIME
fleet-simple-app   Progressing   10       2021-03-22T06:31:17Z

The progressing canary is also reflected in the changing route weights of the Istio VirtualService:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  creationTimestamp: "2021-03-22T06:25:17Z"
  generation: 2
  managedFields:
  - apiVersion: networking.istio.io/v1alpha3
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .: {}
          k:{"uid":"6ae2a7f1-6949-484b-ab48-c385e9827a11"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        .: {}
        f:gateways: {}
        f:hosts: {}
        f:http: {}
    manager: flagger
    operation: Update
    time: "2021-03-22T06:25:17Z"
  name: fleet-simple-app
  namespace: canary-demo
  ownerReferences:
  - apiVersion: flagger.app/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Canary
    name: fleet-simple-app
    uid: 6ae2a7f1-6949-484b-ab48-c385e9827a11
  resourceVersion: "10783"
  uid: b5aaaf34-7b16-4ba9-972c-b60756943da8
spec:
  gateways:
  - mesh
  hosts:
  - fleet-simple-app
  http:
  - route:
    - destination:
        host: fleet-simple-app-primary
      weight: 90
    - destination:
        host: fleet-simple-app-canary
      weight: 10
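The route weights can also be read programmatically, for example from the JSON returned by kubectl get virtualservice fleet-simple-app -n canary-demo -o json. The Python sketch below works on the spec shown above as a plain dict; the helper names are hypothetical, it assumes the canary split lives in the first HTTP route (as it does here), and it treats a route with a single destination as receiving all of the traffic.

```python
from typing import Any, Dict


def route_weights(vs_spec: Dict[str, Any]) -> Dict[str, int]:
    """Map each destination host in the first HTTP route to its weight."""
    route = vs_spec["http"][0]["route"]
    if len(route) == 1:
        # A single destination receives all of the traffic.
        return {route[0]["destination"]["host"]: 100}
    return {d["destination"]["host"]: d.get("weight", 0) for d in route}


def canary_percent(vs_spec: Dict[str, Any], canary_host: str) -> float:
    """Share of traffic, in percent, sent to canary_host."""
    weights = route_weights(vs_spec)
    total = sum(weights.values())
    return 100.0 * weights.get(canary_host, 0) / total if total else 0.0


# The spec of the VirtualService shown above.
SPEC = {
    "gateways": ["mesh"],
    "hosts": ["fleet-simple-app"],
    "http": [{
        "route": [
            {"destination": {"host": "fleet-simple-app-primary"}, "weight": 90},
            {"destination": {"host": "fleet-simple-app-canary"}, "weight": 10},
        ]
    }],
}
```

A canary_percent of 10.0 for fleet-simple-app-canary matches the WEIGHT of 10 reported by kubectl get canary above.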

After a few more minutes, we should see Flagger promote the canary release and switch the primary deployment to the new version.

▶ kubectl get canary
NAME               STATUS      WEIGHT   LASTTRANSITIONTIME
fleet-simple-app   Promoting   0        2021-03-22T06:37:17Z

▶ kubectl get pods
NAME                                        READY   STATUS    RESTARTS   AGE
fleet-simple-app-64cd54dfd-tkk8v            2/2     Running   0          9m2s
fleet-simple-app-primary-854d4d84b5-qgfc8   2/2     Running   0          74s

This is followed by finalization, and we should see the original deployment being scaled down.

▶ kubectl get canary
NAME               STATUS       WEIGHT   LASTTRANSITIONTIME
fleet-simple-app   Finalising   0        2021-03-22T06:38:17Z
(⎈ |digitalocean:canary-demo)
~
▶ kubectl get pods
NAME                                        READY   STATUS        RESTARTS   AGE
fleet-simple-app-64cd54dfd-tkk8v            2/2     Terminating   0          9m53s
fleet-simple-app-primary-854d4d84b5-qgfc8   2/2     Running       0          2m5s
▶ kubectl get deploy
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
fleet-simple-app           0/0     0            0           15m
fleet-simple-app-primary   1/1     1            1           15m

After this, the Canary object should report that the release succeeded:

▶ kubectl get canary
NAME               STATUS      WEIGHT   LASTTRANSITIONTIME
fleet-simple-app   Succeeded   0        2021-03-22T06:39:17Z
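The phases reported by kubectl get canary during this walkthrough can be summarized as a small state machine. The Python sketch below is a simplified model: the transition table is our own assumption, based on the sequence observed above plus a rollback path to Failed, and Flagger defines further phases that are not modeled here.

```python
from typing import Dict, List, Set

# Assumed transitions between the Canary phases seen in this post.
TRANSITIONS: Dict[str, Set[str]] = {
    "Initialized": {"Progressing"},                        # new revision detected
    "Progressing": {"Progressing", "Promoting", "Failed"},  # weight steps up, or rollback
    "Promoting": {"Finalising"},
    "Finalising": {"Succeeded"},
    "Succeeded": {"Progressing"},                          # a later update starts over
    "Failed": {"Progressing"},
}


def is_valid_history(phases: List[str]) -> bool:
    """True if every consecutive pair of observed phases is an allowed step."""
    return all(b in TRANSITIONS.get(a, set()) for a, b in zip(phases, phases[1:]))
```

The sequence observed in this post (Initialized, Progressing twice, Promoting, Finalising, Succeeded) is a valid history under this model, while jumping straight from Initialized to Succeeded is not.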

That’s it! In summary, in this blog we’ve shown you how to use Continuous Delivery together with third-party tools like Flagger to perform canary releases for your workloads. What tools are you using for Continuous Delivery? Head over to the SUSE & Rancher Community and join the conversation!

Gaurav Mehta