class: title, in-person Progressive Delivery on Kubernetes with Argo Rollouts
.footnote[ *Presented by Anton Weiss*
*Otomato technical training.*
*https://otomato.io*

**Slides: https://devopstrain.pro/rollouts**

*Slide-generation engine borrowed from [container.training](https://github.com/jpetazzo/container.training)*
]

.debug[[workshop/title.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/title.md)]

---

## Introduction

- This presentation was created by [Ant Weiss](https://twitter.com/antweiss) to support instructor-led workshops.

- We included as much information as possible in these slides

- Most of the information this workshop is based on is public knowledge and can also be accessed through the official [Argo Rollouts documentation and tutorials](https://argoproj.github.io/argo-rollouts/)

.debug[[workshop/intro.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/intro.md)]

---

## Training environment

- This is a hands-on training with exercises and examples

- We assume that you have access to a Kubernetes cluster

- The training labs for today's session were generously sponsored by [Otomato Software Delivery](https://otomato.io)

- We will be using [k3d](https://k3d.io) to run these clusters

.debug[[workshop/intro.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/intro.md)]

---

## Getting started

- Get the source code and the slides for this workshop:

.exercise[

- On your Strigo VM:
```bash
git clone https://github.com/otomato-gh/rollouts.workshop.git
cd rollouts.workshop
./scripts/setup_rollouts.sh
# enter new shell for kubectl completion
sudo su - ${USER}
```

]

- This will install the Argo Rollouts controller in your k3d cluster

- And the Argo Rollouts kubectl plugin. It is optional, but convenient for managing and visualizing rollouts from the command line.
.debug[[workshop/intro.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/intro.md)]

---

## The Argo Rollouts kubectl plugin

- The commands for the plugin require a lot of typing:

.exercise[
```bash
kubectl argo rollouts get rollouts dummy
# compare that to:
kubectl get rollouts dummy
```
]

- So instead let's define an alias:

.exercise[
```bash
alias kar="kubectl argo rollouts"
```

```bash
kar version
```
]

.debug[[workshop/intro.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/intro.md)]

---

name: toc-chapter-1

## Chapter 1

- [Progressive Delivery Defined](#toc-progressive-delivery-defined)

.debug[(auto-generated TOC)]

---

name: toc-chapter-2

## Chapter 2

- [Argo Rollouts Basic Concepts](#toc-argo-rollouts-basic-concepts)
- [Our First Rollout](#toc-our-first-rollout)
- [The Rollouts Dashboard](#toc-the-rollouts-dashboard)

.debug[(auto-generated TOC)]

---

name: toc-chapter-3

## Chapter 3

- [Traffic Management](#toc-traffic-management)

.debug[(auto-generated TOC)]

---

name: toc-chapter-4

## Chapter 4

- [Analyzing Our Canaries](#toc-analyzing-our-canaries)

.debug[(auto-generated TOC)]

---

name: toc-chapter-5

## Chapter 5

- [Summing It All Up](#toc-summing-it-all-up)

.debug[(auto-generated TOC)]

.debug[[workshop/toc.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/toc.md)]

---

class: pic

.interstitial[]

---

name: toc-progressive-delivery-defined
class: title

Progressive Delivery Defined

.nav[
[Previous section](#toc-)
|
[Back to table of contents](#toc-module-1)
|
[Next section](#toc-argo-rollouts-basic-concepts)
]

.debug[(automatically generated title slide)]

---

# Progressive Delivery Defined

**Progressive Delivery** is the collective term for a set of deployment techniques that allow for gradual, reliable and low-stress release of new software versions into production environments.

Argo Rollouts allows us to automate these techniques.
Techniques we will be looking at today are:

- Blue-Green
- Canary deployments
- Traffic Mirroring
- Experiments

.debug[[workshop/progdel.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/progdel.md)]

---

## Blue-Green

A Blue-Green deployment (sometimes referred to as Red-Black) has both the new and the old version of the application deployed at the same time.

During this time, only the old version of the application receives production traffic. This allows the developers to run tests against the new version before switching the live traffic over to it.

.debug[[workshop/progdel.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/progdel.md)]

---

## Canary Deployments

A **Canary Deployment** is a process in which a new version released to production receives only a tiny percentage of actual production traffic, while the rest of the traffic continues to be served by the old version. A faulty version can therefore cause at most a minimal, tolerable service disruption.

If the new version functions fine, we gradually shift more traffic over to it from the old version, until all traffic is served by the new version and the old version can be retired.

.debug[[workshop/progdel.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/progdel.md)]

---

## Traffic Mirroring

*Traffic Mirroring* (or traffic shadowing) is more of a testing technique, whereby we release the new version to production and channel a copy of all the production traffic to it. This happens in parallel to that traffic being served by the old version. No responses from the new version are ever sent back to users.

This allows us to test the new version with full production traffic and data without impacting our users.
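Argo Rollouts can automate mirroring as a canary step when traffic routing is handled by a service mesh that supports it (currently Istio). The following is a rough sketch of such a step sequence; the route name, match rule and durations are illustrative and not taken from this workshop's code:

```yaml
strategy:
  canary:
    steps:
    # mirror a copy of matching live traffic to the canary;
    # mirrored responses are dropped - users only see the stable version
    - setMirrorRoute:
        name: mirror-route
        percentage: 100
        match:
        - method:
            exact: GET
    - pause: {duration: 10m} # observe the canary under mirrored load
    # a setMirrorRoute step with only a name removes the mirror route again
    - setMirrorRoute:
        name: mirror-route
```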
.debug[[workshop/progdel.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/progdel.md)]

---

## Experiments

**Experiments**, AKA A/B Testing, is also a testing technique, whereby we release two or more versions to production and run them for a specified period in order to analyze their performance.

This can be a technical step in the process of either a blue-green or a canary rollout. Or it can be a stand-alone technique for verifying the feature completeness of a version before release.

.debug[[workshop/progdel.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/progdel.md)]

---

## How Can Rollouts Help

- With Argo Rollouts we can define and manage precisely:
  - What versions of a service are rolled out
  - The percentage of traffic routed to each version
  - How many stages the rollout has
  - How long each stage takes
  - The criteria for promoting to the next stage
  - The experiments and analysis to run at each stage

- But first let's learn the basics of how Argo Rollouts works

.debug[[workshop/progdel.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/progdel.md)]

---

class: pic

.interstitial[]

---

name: toc-argo-rollouts-basic-concepts
class: title

Argo Rollouts Basic Concepts

.nav[
[Previous section](#toc-progressive-delivery-defined)
|
[Back to table of contents](#toc-module-2)
|
[Next section](#toc-our-first-rollout)
]

.debug[(automatically generated title slide)]

---

# Argo Rollouts Basic Concepts

## A Rollout

A **Rollout** is a Kubernetes workload resource equivalent to a Kubernetes Deployment object. It is intended to replace a Deployment object in scenarios where more advanced deployment or progressive delivery functionality is needed.
A Rollout provides the following features which a Kubernetes Deployment cannot:

- blue-green deployments
- canary deployments
- integration with ingress controllers and service meshes for advanced traffic routing
- integration with metric providers for blue-green & canary analysis
- automated promotion or rollback based on successful or failed metrics

.debug[[workshop/basics.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/basics.md)]

---

## Ingress/Service

This is the mechanism that allows traffic from live users to enter your cluster and be redirected to the appropriate version.

Argo Rollouts uses the standard Kubernetes *Service* resource, with some extra metadata needed for management.

Argo Rollouts is very flexible regarding networking options. First of all, you can have different Services during a Rollout: ones that go only to the new version, only to the old version, or to both.

Specifically for Canary deployments, Argo Rollouts supports several service mesh and ingress solutions for splitting traffic at specific percentages, instead of simple balancing based on pod counts, and it is possible to use multiple routing providers simultaneously.

.debug[[workshop/basics.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/basics.md)]

---

## An Analysis

Argo Rollouts provides several ways to perform analysis to drive progressive delivery. This allows us to achieve various forms of progressive delivery, varying the point in time at which analysis is performed, its frequency, and its occurrence.

An analysis is enabled by the following custom resources:

- An *AnalysisTemplate* is a template spec which defines how to perform a canary analysis: the metrics it should measure, their frequency, and the values which are considered successful or failed. AnalysisTemplates may be parameterized with input values

- An *AnalysisRun* is an instantiation of an AnalysisTemplate. AnalysisRuns are like Jobs in that they eventually complete.
Completed runs are considered Successful, Failed, or Inconclusive, and the result of the run affects whether the Rollout's update will continue, abort, or pause, respectively.

.debug[[workshop/basics.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/basics.md)]

---

## An Experiment

An *Experiment* is a limited run of one or more ReplicaSets for the purposes of analysis. Experiments typically run for a pre-determined duration, but can also run indefinitely until stopped. Experiments may reference an AnalysisTemplate to run during or after the experiment.

The canonical use case for an Experiment is to start a baseline and a canary deployment in parallel, and compare the metrics produced by the baseline and canary pods for an equal comparison.

.debug[[workshop/basics.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/basics.md)]

---

## Explore Argo Rollouts

Let's see what Argo Rollouts components we have in our cluster

.exercise[
```bash
kubectl get all -n argo-rollouts
```
]

And the custom resources used to configure the rollouts:

.exercise[
```bash
kubectl get crds | grep argo
```
]

.debug[[workshop/basics.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/basics.md)]

---

class: pic

.interstitial[]

---

name: toc-our-first-rollout
class: title

Our First Rollout

.nav[
[Previous section](#toc-argo-rollouts-basic-concepts)
|
[Back to table of contents](#toc-module-2)
|
[Next section](#toc-the-rollouts-dashboard)
]

.debug[(automatically generated title slide)]

---

# Our First Rollout

Rollouts replace the Deployments in our K8s cluster.

To deploy our very first version we will deploy a Rollout and a matching Service to access it.
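In essence, a Rollout manifest looks almost exactly like a Deployment - same selector and pod template - with the `apiVersion`/`kind` changed and a `strategy` section added. A condensed sketch, following the upstream Argo Rollouts getting-started example (details may differ slightly from the actual `code/rollout.yaml`):

```yaml
apiVersion: argoproj.io/v1alpha1 # not apps/v1
kind: Rollout                    # not Deployment
metadata:
  name: rollouts-demo
spec:
  replicas: 5
  selector:
    matchLabels:
      app: rollouts-demo
  template: # a regular pod template, exactly as in a Deployment
    metadata:
      labels:
        app: rollouts-demo
    spec:
      containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:blue
        ports:
        - containerPort: 8080
  strategy: # Rollout-specific: canary or blueGreen
    canary:
      steps:
      - setWeight: 20
      - pause: {}
```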
.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

## Deploy our first rollout

.exercise[

- Deploy the Rollout
```bash
kubectl apply -f ~/rollouts.workshop/code/rollout.yaml
```

- Deploy the Service
```bash
kubectl apply -f ~/rollouts.workshop/code/service.yaml
```

]

--

.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

## What did we roll out?

- Let's look at rollout.yaml

```yaml
spec:
  replicas: 5
  strategy:
    canary: # The strategy is Canary Deployment
      steps:
      - setWeight: 20 # We first send 20% to the canary
      - pause: {} # We wait for a manual promotion
      - setWeight: 40 # Promote canary to 40%
      - pause: {duration: 10} # Wait 10 sec
      - setWeight: 60 # Promote canary to 60%
      - pause: {duration: 10}
      - setWeight: 80 # Promote canary to 80%
      - pause: {duration: 10} # Wait 10 sec
      # Finally canary replaces previous version
```

.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

## Watching the Rollout

The initial creation of any Rollout immediately scales the replicas up to 100% (skipping any canary upgrade steps, analysis, etc...) since no upgrade has occurred yet.

The Argo Rollouts **kubectl plugin** allows you to visualize the Rollout, its related resources (ReplicaSets, Pods, AnalysisRuns), and presents live state changes as they occur.
To watch the rollout as it deploys, run the `get rollout --watch` command of the plugin:

.exercise[
```bash
kubectl argo rollouts get rollout rollouts-demo --watch
# or rather
kar get rollout rollouts-demo --watch
```
]

.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

## Updating the Rollout

- Now let's update the Rollout and see the magic of the staged deployment in action:

.exercise[
```bash
# Run in a new shell while "watch" is running
kar set image rollouts-demo rollouts-demo=argoproj/rollouts-demo:yellow
```
]

- Go back to the first shell to watch the rollout progress

.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

## Promoting the Rollout

- We can see from the plugin output that the Rollout is in a paused state, and now has 1 of 5 replicas running the new version of the pod template, and 4 of 5 replicas running the old version.

- This equates to the 20% canary weight as defined by the `setWeight: 20` step.

- When a Rollout reaches a pause step with no duration, it will remain in a paused state indefinitely until it is resumed/promoted. To manually promote a rollout to the next step, run the promote command of the plugin:

.exercise[
```bash
kar promote rollouts-demo
```
]

- Watch the Rollout as it proceeds to execute the remaining steps.

.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

## Aborting a Rollout

- Sometimes a canary doesn't satisfy our quality requirements and we decide to abort it.

- Let's see how.
- First, let's deploy a new version:

.exercise[
```bash
kar set image rollouts-demo rollouts-demo=argoproj/rollouts-demo:red
```
]

.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

## Aborting the Rollout

- Watch the Rollout reach the paused state and roll back to the previous stable version:

.exercise[
```bash
kar abort rollouts-demo
```
]

- When a rollout is aborted, it scales up the "stable" version of the ReplicaSet (in this case the `yellow` image), and scales down any other versions. Although the stable version of the ReplicaSet may be running and healthy, the overall rollout is still considered Degraded, since the desired version (the red image) is not the version which is actually running.

.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

## Going Back to Healthy

- In order for the Rollout to be considered `Healthy` again and not `Degraded`, it is necessary to change the desired state back to the previous, stable version. This typically involves running `kubectl apply` against the previous Rollout spec. In our case, we can simply re-run the set image command with the previous, "yellow" image.

.exercise[
```bash
kar set image rollouts-demo rollouts-demo=argoproj/rollouts-demo:yellow
```
]

- After running this command, you should notice that the Rollout immediately becomes `Healthy` again, and no new ReplicaSets are created.
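We have now exercised the canary strategy end to end. The Blue-Green technique from the first chapter is configured through the same `strategy` field of a Rollout. A minimal sketch, not part of this workshop's code - the Service names are illustrative:

```yaml
strategy:
  blueGreen:
    # Service that routes live traffic to the stable version
    activeService: rollouts-demo-active
    # Service that routes to the new version for pre-release testing
    previewService: rollouts-demo-preview
    # false = wait for a manual promotion before switching live traffic
    autoPromotionEnabled: false
```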
.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

class: pic

.interstitial[]

---

name: toc-the-rollouts-dashboard
class: title

The Rollouts Dashboard

.nav[
[Previous section](#toc-our-first-rollout)
|
[Back to table of contents](#toc-module-2)
|
[Next section](#toc-traffic-management)
]

.debug[(automatically generated title slide)]

---

# The Rollouts Dashboard

The Argo Rollouts kubectl plugin can serve a local UI dashboard to visualize your Rollouts.

Let's take a look:

.exercise[
```bash
kar dashboard
```
]

Then visit localhost:3100 to view the user interface.

.debug[[workshop/dashboard.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/dashboard.md)]

---

class: pic

.interstitial[]

---

name: toc-traffic-management
class: title

Traffic Management

.nav[
[Previous section](#toc-the-rollouts-dashboard)
|
[Back to table of contents](#toc-module-3)
|
[Next section](#toc-analyzing-our-canaries)
]

.debug[(automatically generated title slide)]

---

# Traffic Management

- Controlling the amount of traffic by the number of pods is very limited

- Modern proxies allow us to have weighted load-balancing

- Kubernetes ingress controllers and service meshes can be configured to do weighted load balancing even with one pod per app version

- Argo Rollouts can be integrated with multiple service meshes and ingress controllers

.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

## Ingress Controller in Our Cluster

- In our k3d cluster the Traefik ingress controller is installed

- It's mapped to port 80 of our host machine

- It creates routes based on standard Kubernetes Ingress resources

- Or based on its own CRDs - for smarter traffic control

- Argo Rollouts integrates with Traefik CRDs

.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

## Explore the Traefik CRDs

.exercise[
```bash
kubectl get crd | grep traefik
```
]

- See the *IngressRoute* and the *TraefikService* resources?

- These are the ones we will be using

.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

## Create the Traefik Ingress

- Let's start with creating the ingress and the weighted load-balancing

- We will need:
  - A DNS record for our ingress (see next slide)
  - an additional Service for canary access
  - a TraefikService to load balance between the stable and the canary service
  - an IngressRoute to route ingress traffic to the application

.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

## Setting up DNS

- We will use nip.io for DNS

- In order to do that we'll need to edit the IngressRoute definition:

- Replace `<IP>` in `code/ingressroute.yaml` with the IP of your lab machine:

```yaml
- kind: Rule
  match: Host(`demo.<IP>.nip.io`) # replace <IP> with your machine public IP
```

.exercise[
```bash
MY_IP=$(dig +short myip.opendns.com @resolver1.opendns.com)
sed -i "s/<IP>/$MY_IP/" ~/rollouts.workshop/code/ingressroute.yaml
```
]

.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

## Creating the Ingress

- Let's create the ingress resources:

.exercise[
```bash
kubectl apply -f ~/rollouts.workshop/code/rolloutscanaryservice.yaml
kubectl apply -f ~/rollouts.workshop/code/traefikservice.yaml
kubectl apply -f ~/rollouts.workshop/code/ingressroute.yaml
```
]

- Check by going to demo.`your.machine.ip`.nip.io in a browser

.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

## Updating our rollout

- We want to see canary Traefik(!) management in action

- Let's update our rollout to define traffic management

.exercise[
```bash
kubectl apply -f ~/rollouts.workshop/code/rollout-weighted.yaml
```
]

- Watch it scale down to 1 replica instead of 5 (we don't need multiple replicas if we use traffic management)

.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

## Updating the Rollout with Traffic Management

- We will deploy a new version, promote it and watch its progress in the Web UI

.exercise[

- Update the version:
```bash
kar set image rollouts-demo rollouts-demo=argoproj/rollouts-demo:yellow
```

- Watch the progress in the rollouts console
```bash
kar get rollout rollouts-demo -w
```

- Promote the canary to the second stage:
```bash
kar promote rollouts-demo
```

]

- Watch the demo UI to see the canary progress

.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

## Exercise - add a stage

- Edit `~/rollouts.workshop/code/rollout-weighted.yaml` to include a stage before the final release that requires manual promotion

- Apply the resulting Rollout spec

- Deploy a new version with image `argoproj/rollouts-demo:red`

- Watch the rollout and promote where needed.
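For reference, the key part of a weighted Rollout wired to Traefik looks roughly like this. This is a sketch of the mechanism only - consult `code/rollout-weighted.yaml` for the exact spec used in the labs; the Service and TraefikService names below are illustrative:

```yaml
strategy:
  canary:
    canaryService: rollouts-demo-canary # Service selecting the canary pods
    stableService: rollouts-demo-stable # Service selecting the stable pods
    trafficRouting:
      traefik:
        # the TraefikService whose weights the controller adjusts
        weightedTraefikServiceName: traefik-service
    steps:
    # 20% of traffic via Traefik weights, regardless of pod counts
    - setWeight: 20
    - pause: {}
```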
.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

class: pic

.interstitial[]

---

name: toc-analyzing-our-canaries
class: title

Analyzing Our Canaries

.nav[
[Previous section](#toc-traffic-management)
|
[Back to table of contents](#toc-module-4)
|
[Next section](#toc-summing-it-all-up)
]

.debug[(automatically generated title slide)]

---

# Analyzing Our Canaries

- In order to have real CD - we usually don't want to promote our rollouts manually

- But in order to roll out automatically we need some verification

- Argo Rollouts allows us to analyze our canaries based on telemetry

- It integrates with multiple telemetry providers - Prometheus, DataDog, New Relic, CloudWatch, InfluxDB.

- Let's see how to use it with Prometheus.

.debug[[workshop/analyzed.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/analyzed.md)]

---

## Deploy Prometheus

.exercise[
```bash
kubectl apply -f ~/rollouts.workshop/code/prometheus.yaml
```

- And an ingress for Prometheus
```bash
MY_IP=$(dig +short myip.opendns.com @resolver1.opendns.com)
sed -i "s/<IP>/$MY_IP/" ~/rollouts.workshop/code/prometheus-ingress.yaml
kubectl apply -f ~/rollouts.workshop/code/prometheus-ingress.yaml
```
]

- Prometheus is now available at `http://prom.<IP>.nip.io`

.debug[[workshop/analyzed.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/analyzed.md)]

---

## A Rollout with Metrics

- We will deploy another application for analysis

- It's a bare-bones Python Flask app with Prometheus metrics

- One of its versions will increase a counter called `exceptions` on each healthcheck

- We'll define an AnalysisTemplate to count these exceptions.

.debug[[workshop/analyzed.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/analyzed.md)]

---

## Our AnalysisTemplate

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: otoflask-exceptions
spec:
  args:
  - name: service-name
  metrics:
  - name: exceptions-count
    interval: 20s
    successCondition: result[0] <= 2.0
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.default.svc.cluster.local:9090
        query: exceptions_total{instance="{{args.service-name}}:80"}
```

.debug[[workshop/analyzed.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/analyzed.md)]

---

## Let's deploy the Rollout

- Inspect all the resources in `~/rollouts.workshop/code/analyzedcanary.yaml`

- And deploy it all:

.exercise[
```bash
MY_IP=$(dig +short myip.opendns.com @resolver1.opendns.com)
sed -i "s/<IP>/$MY_IP/" ~/rollouts.workshop/code/analyzedcanary.yaml
kubectl apply -f ~/rollouts.workshop/code/analyzedcanary.yaml
```
]

.debug[[workshop/analyzed.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/analyzed.md)]

---

## First version is OK

- Browse to `prom.YOUR.MACHINE.IP.nip.io:9090` and find the `exceptions_total` count

- Now let's deploy a bad version

.exercise[
```bash
kar set image otoflask otoflask=otomato/prom-flask:0.2
```

- In one shell - watch the rollout
```bash
kar get rollout otoflask -w
```

- In another - check out the AnalysisRun
```bash
kubectl get analysisruns -oyaml
```
]

.debug[[workshop/analyzed.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/analyzed.md)]

---

## Let's Deploy a Fix

- The next version, `0.3`, fixes the bug:

.exercise[
```bash
kar set image otoflask otoflask=otomato/prom-flask:0.3
```
]

- Watch the canary get peacefully rolled out

.debug[[workshop/analyzed.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/analyzed.md)]

---

class: pic

.interstitial[]

---

name: toc-summing-it-all-up
class: title

Summing It All Up

.nav[
[Previous section](#toc-analyzing-our-canaries)
|
[Back to table of contents](#toc-module-5)
|
[Next section](#toc-)
]

.debug[(automatically generated title slide)]

---

# Summing It All Up

- We've learned what progressive delivery is

- We've learned how Argo Rollouts works

- We've seen how to integrate Rollouts with:
  - the Traefik ingress controller for traffic management
  - Prometheus for analysis

.debug[[workshop/summary.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/summary.md)]

---

## That's It for Today!

- Thanks for attending!

- Any future questions: `anton@otomato.io`

- Follow me on twitter - @antweiss

- For more training: https://devopstrain.pro

.debug[[workshop/summary.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/summary.md)]