class: title, in-person Progressive Delivery on Kubernetes with Argo Rollouts
.footnote[ *Presented by Anton Weiss*
*Otomato technical training.*
*https://otomato.io*

**Slides: https://devopstrain.pro/rollouts**

*Slide-generation engine borrowed from [container.training](https://github.com/jpetazzo/container.training)*
]

.debug[[workshop/title.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/title.md)]

---

## Introduction

- This presentation was created by [Ant Weiss](https://twitter.com/antweiss) to support instructor-led workshops.

- We included as much information as possible in these slides

- Most of the information this workshop is based on is public knowledge and can also be accessed through the official [Argo Rollouts documentation and tutorials](https://argoproj.github.io/argo-rollouts/)

.debug[[workshop/intro.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/intro.md)]

---

## Training environment

- This is a hands-on training with exercises and examples

- We assume that you have access to a Kubernetes cluster

- The training labs for today's session were generously sponsored by [Otomato Software Delivery](https://otomato.io)

- We will be using [k3d](https://k3d.io) to run these clusters

.debug[[workshop/intro.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/intro.md)]

---

## Getting started

- Get the source code and the slides for this workshop:

.exercise[

- On your Strigo VM:
```bash
git clone https://github.com/otomato-gh/rollouts.workshop.git
cd rollouts.workshop
./scripts/setup_rollouts.sh
# enter new shell for kubectl completion
sudo su - ${USER}
```

]

- This will install the Argo Rollouts controller in your k3d cluster

- And the Argo Rollouts kubectl plugin. It is optional, but convenient for managing and visualizing rollouts from the command line.
.debug[[workshop/intro.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/intro.md)]

---

## The Argo Rollouts kubectl plugin

- The commands for the plugin require a lot of typing:

.exercise[
```bash
kubectl argo rollouts get rollouts dummy
# compare that to:
kubectl get rollouts dummy
```
]

- So instead let's define an alias:

.exercise[
```bash
alias kar="kubectl argo rollouts"
```

```bash
kar version
```
]

.debug[[workshop/intro.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/intro.md)]

---

name: toc-chapter-1

## Chapter 1

- [Progressive Delivery Defined](#toc-progressive-delivery-defined)

.debug[(auto-generated TOC)]

---

name: toc-chapter-2

## Chapter 2

- [Argo Rollouts Basic Concepts](#toc-argo-rollouts-basic-concepts)
- [Our First Rollout](#toc-our-first-rollout)
- [The Rollouts Dashboard](#toc-the-rollouts-dashboard)

.debug[(auto-generated TOC)]

---

name: toc-chapter-3

## Chapter 3

- [Traffic Management](#toc-traffic-management)

.debug[(auto-generated TOC)]

---

name: toc-chapter-4

## Chapter 4

- [Analyzing Our Canaries](#toc-analyzing-our-canaries)

.debug[(auto-generated TOC)]

---

name: toc-chapter-5

## Chapter 5

- [Summing It All Up](#toc-summing-it-all-up)

.debug[(auto-generated TOC)]

.debug[[workshop/toc.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/toc.md)]

---

class: pic

.interstitial[]

---

name: toc-progressive-delivery-defined
class: title

Progressive Delivery Defined

.nav[
[Previous section](#toc-)
|
[Back to table of contents](#toc-module-1)
|
[Next section](#toc-argo-rollouts-basic-concepts)
]

.debug[(automatically generated title slide)]

---

# Progressive Delivery Defined

**Progressive Delivery** is the collective term for a set of deployment techniques that allow for gradual, reliable and low-stress release of new software versions into production environments.

Argo Rollouts allows us to automate these techniques.
Techniques we will be looking at today are:

- Blue-Green
- Canary deployments
- Traffic Mirroring
- Experiments

.debug[[workshop/progdel.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/progdel.md)]

---

## Blue-Green

A Blue-Green deployment (sometimes referred to as Red-Black) has both the new and the old version of the application deployed at the same time.

During this time, only the old version of the application receives production traffic. This allows the developers to run tests against the new version before switching the live traffic over to it.

.debug[[workshop/progdel.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/progdel.md)]

---

## Canary Deployments

A **Canary Deployment** is a process in which a new version released to production receives only a tiny percentage of actual production traffic, while the rest of the traffic continues to be served by the old version. A faulty version can therefore cause at most a minimal, tolerable service disruption.

If the new version functions fine, we gradually shift more traffic over to it from the old version, until all traffic is served by the new version and the old version can be retired.

.debug[[workshop/progdel.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/progdel.md)]

---

## Traffic Mirroring

*Traffic Mirroring* (or traffic shadowing) is more of a testing technique, whereby we release the new version to production and channel a copy of all the production traffic to it. This happens in parallel to that traffic being served by the old version. No responses from the new version are ever sent back to users.

This allows us to test the new version with full production traffic and data without impacting our users.
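Argo Rollouts can automate mirroring as a canary step when traffic routing is handled by a service mesh that supports it (currently Istio). The following is a rough sketch of such a step sequence; the route name, match rule and durations are illustrative and not taken from this workshop's code:

```yaml
strategy:
  canary:
    steps:
    # mirror a copy of matching live traffic to the canary;
    # mirrored responses are dropped - users only see the stable version
    - setMirrorRoute:
        name: mirror-route
        percentage: 100
        match:
        - method:
            exact: GET
    - pause: {duration: 10m} # observe the canary under mirrored load
    # a setMirrorRoute step with only a name removes the mirror route again
    - setMirrorRoute:
        name: mirror-route
```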
.debug[[workshop/progdel.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/progdel.md)]

---

## Experiments

**Experiments**, AKA A/B Testing, is also a testing technique, whereby we release two or more versions to production and run them for a specified period in order to analyze their performance.

This can be a technical step in the process of either a blue-green or a canary rollout. Or it can be a stand-alone technique for verifying the feature completeness of a version before release.

.debug[[workshop/progdel.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/progdel.md)]

---

## How Can Rollouts Help

- With Argo Rollouts we can define and manage precisely:
  - What versions of a service are rolled out
  - The percentage of traffic routed to each version
  - How many stages the rollout has
  - How long each stage takes
  - The criteria for promoting to the next stage
  - The experiments and analysis to run at each stage

- But first let's learn the basics of how Argo Rollouts works

.debug[[workshop/progdel.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/progdel.md)]

---

class: pic

.interstitial[]

---

name: toc-argo-rollouts-basic-concepts
class: title

Argo Rollouts Basic Concepts

.nav[
[Previous section](#toc-progressive-delivery-defined)
|
[Back to table of contents](#toc-module-2)
|
[Next section](#toc-our-first-rollout)
]

.debug[(automatically generated title slide)]

---

# Argo Rollouts Basic Concepts

## A Rollout

A **Rollout** is a Kubernetes workload resource equivalent to a Kubernetes Deployment object. It is intended to replace a Deployment object in scenarios where more advanced deployment or progressive delivery functionality is needed.
A Rollout provides the following features which a Kubernetes Deployment cannot:

- blue-green deployments
- canary deployments
- integration with ingress controllers and service meshes for advanced traffic routing
- integration with metric providers for blue-green & canary analysis
- automated promotion or rollback based on successful or failed metrics

.debug[[workshop/basics.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/basics.md)]

---

## Ingress/Service

This is the mechanism that allows traffic from live users to enter your cluster and be redirected to the appropriate version.

Argo Rollouts uses the standard Kubernetes *Service* resource, with some extra metadata needed for management.

Argo Rollouts is very flexible regarding networking options. First of all, you can have different Services during a Rollout: ones that go only to the new version, only to the old version, or to both.

Specifically for Canary deployments, Argo Rollouts supports several service mesh and ingress solutions for splitting traffic at specific percentages, instead of simple balancing based on pod counts, and it is possible to use multiple routing providers simultaneously.

.debug[[workshop/basics.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/basics.md)]

---

## An Analysis

Argo Rollouts provides several ways to perform analysis to drive progressive delivery. This allows us to achieve various forms of progressive delivery, varying the point in time at which analysis is performed, its frequency, and its occurrence.

An analysis is enabled by the following custom resources:

- An *AnalysisTemplate* is a template spec which defines how to perform a canary analysis: the metrics it should measure, their frequency, and the values which are considered successful or failed. AnalysisTemplates may be parameterized with input values

- An *AnalysisRun* is an instantiation of an AnalysisTemplate. AnalysisRuns are like Jobs in that they eventually complete.
Completed runs are considered Successful, Failed, or Inconclusive, and the result of the run affects whether the Rollout's update will continue, abort, or pause, respectively.

.debug[[workshop/basics.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/basics.md)]

---

## An Experiment

An *Experiment* is a limited run of one or more ReplicaSets for the purposes of analysis. Experiments typically run for a pre-determined duration, but can also run indefinitely until stopped. Experiments may reference an AnalysisTemplate to run during or after the experiment.

The canonical use case for an Experiment is to start a baseline and a canary deployment in parallel, and compare the metrics produced by the baseline and canary pods for an equal comparison.

.debug[[workshop/basics.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/basics.md)]

---

## Explore Argo Rollouts

Let's see what Argo Rollouts components we have in our cluster

.exercise[
```bash
kubectl get all -n argo-rollouts
```
]

And the custom resources used to configure the rollouts:

.exercise[
```bash
kubectl get crds | grep argo
```
]

.debug[[workshop/basics.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/basics.md)]

---

class: pic

.interstitial[]

---

name: toc-our-first-rollout
class: title

Our First Rollout

.nav[
[Previous section](#toc-argo-rollouts-basic-concepts)
|
[Back to table of contents](#toc-module-2)
|
[Next section](#toc-the-rollouts-dashboard)
]

.debug[(automatically generated title slide)]

---

# Our First Rollout

Rollouts replace the Deployments in our K8s cluster.

To deploy our very first version we will deploy a Rollout and a matching Service to access it.
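In essence, a Rollout manifest looks almost exactly like a Deployment - same selector and pod template - with the `apiVersion`/`kind` changed and a `strategy` section added. A condensed sketch, following the upstream Argo Rollouts getting-started example (details may differ slightly from the actual `code/rollout.yaml`):

```yaml
apiVersion: argoproj.io/v1alpha1 # not apps/v1
kind: Rollout                    # not Deployment
metadata:
  name: rollouts-demo
spec:
  replicas: 5
  selector:
    matchLabels:
      app: rollouts-demo
  template: # a regular pod template, exactly as in a Deployment
    metadata:
      labels:
        app: rollouts-demo
    spec:
      containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:blue
        ports:
        - containerPort: 8080
  strategy: # Rollout-specific: canary or blueGreen
    canary:
      steps:
      - setWeight: 20
      - pause: {}
```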
.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

## Deploy our first rollout

.exercise[

- Deploy the Rollout
```bash
kubectl apply -f ~/rollouts.workshop/code/rollout.yaml
```

- Deploy the Service
```bash
kubectl apply -f ~/rollouts.workshop/code/service.yaml
```

]

--

.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

## What did we roll out?

- Let's look at rollout.yaml

```yaml
spec:
  replicas: 5
  strategy:
    canary: # The strategy is Canary Deployment
      steps:
      - setWeight: 20 # We first send 20% to the canary
      - pause: {} # We wait for a manual promotion
      - setWeight: 40 # Promote canary to 40%
      - pause: {duration: 10} # Wait 10 sec
      - setWeight: 60 # Promote canary to 60%
      - pause: {duration: 10}
      - setWeight: 80 # Promote canary to 80%
      - pause: {duration: 10} # Wait 10 sec
      # Finally canary replaces previous version
```

.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

## Watching the Rollout

The initial creation of any Rollout immediately scales the replicas up to 100% (skipping any canary upgrade steps, analysis, etc...) since no upgrade has occurred yet.

The Argo Rollouts **kubectl plugin** allows you to visualize the Rollout, its related resources (ReplicaSets, Pods, AnalysisRuns), and presents live state changes as they occur.
To watch the rollout as it deploys, run the `get rollout --watch` command of the plugin:

.exercise[
```bash
kubectl argo rollouts get rollout rollouts-demo --watch
# or rather
kar get rollout rollouts-demo --watch
```
]

.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

## Updating the Rollout

- Now let's update the Rollout and see the magic of the staged deployment in action:

.exercise[
```bash
# Run in a new shell while "watch" is running
kar set image rollouts-demo rollouts-demo=argoproj/rollouts-demo:yellow
```
]

- Go back to the first shell to watch the rollout progress

.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

## Promoting the Rollout

- We can see from the plugin output that the Rollout is in a paused state, and now has 1 of 5 replicas running the new version of the pod template, and 4 of 5 replicas running the old version.

- This equates to the 20% canary weight as defined by the `setWeight: 20` step.

- When a Rollout reaches a pause step with no duration, it will remain in a paused state indefinitely until it is resumed/promoted. To manually promote a rollout to the next step, run the promote command of the plugin:

.exercise[
```bash
kar promote rollouts-demo
```
]

- Watch the Rollout as it proceeds to execute the remaining steps.

.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

## Aborting a Rollout

- Sometimes a canary doesn't satisfy our quality requirements and we decide to abort it.

- Let's see how.
- First, let's deploy a new version:

.exercise[
```bash
kar set image rollouts-demo rollouts-demo=argoproj/rollouts-demo:red
```
]

.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

## Aborting the Rollout

- Watch the Rollout reach the paused state and roll back to the previous stable version:

.exercise[
```bash
kar abort rollouts-demo
```
]

- When a rollout is aborted, it scales up the "stable" version of the ReplicaSet (in this case the `yellow` image), and scales down any other versions. Although the stable version of the ReplicaSet may be running and healthy, the overall rollout is still considered Degraded, since the desired version (the red image) is not the version which is actually running.

.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

## Going Back to Healthy

- In order for the Rollout to be considered `Healthy` again and not `Degraded`, it is necessary to change the desired state back to the previous, stable version. This typically involves running `kubectl apply` against the previous Rollout spec. In our case, we can simply re-run the set image command with the previous, "yellow" image.

.exercise[
```bash
kar set image rollouts-demo rollouts-demo=argoproj/rollouts-demo:yellow
```
]

- After running this command, you should notice that the Rollout immediately becomes `Healthy` again, and no new ReplicaSets are created.
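We have now exercised the canary strategy end to end. The Blue-Green technique from the first chapter is configured through the same `strategy` field of a Rollout. A minimal sketch, not part of this workshop's code - the Service names are illustrative:

```yaml
strategy:
  blueGreen:
    # Service that routes live traffic to the stable version
    activeService: rollouts-demo-active
    # Service that routes to the new version for pre-release testing
    previewService: rollouts-demo-preview
    # false = wait for a manual promotion before switching live traffic
    autoPromotionEnabled: false
```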
.debug[[workshop/firstrollout.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/firstrollout.md)]

---

class: pic

.interstitial[]

---

name: toc-the-rollouts-dashboard
class: title

The Rollouts Dashboard

.nav[
[Previous section](#toc-our-first-rollout)
|
[Back to table of contents](#toc-module-2)
|
[Next section](#toc-traffic-management)
]

.debug[(automatically generated title slide)]

---

# The Rollouts Dashboard

The Argo Rollouts kubectl plugin can serve a local UI dashboard to visualize your Rollouts.

Let's take a look:

.exercise[
```bash
kar dashboard
```
]

Then visit localhost:3100 to view the user interface.

.debug[[workshop/dashboard.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/dashboard.md)]

---

class: pic

.interstitial[]

---

name: toc-traffic-management
class: title

Traffic Management

.nav[
[Previous section](#toc-the-rollouts-dashboard)
|
[Back to table of contents](#toc-module-3)
|
[Next section](#toc-analyzing-our-canaries)
]

.debug[(automatically generated title slide)]

---

# Traffic Management

- Controlling the amount of traffic by the number of pods is very limited

- Modern proxies allow us to have weighted load-balancing

- Kubernetes ingress controllers and service meshes can be configured to do weighted load balancing even with one pod per app version

- Argo Rollouts can be integrated with multiple service meshes and ingress controllers

.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

## Ingress Controller in Our Cluster

- In our k3d cluster the Traefik ingress controller is installed

- It's mapped to port 80 of our host machine

- It creates routes based on standard Kubernetes Ingress resources

- Or based on its own CRDs - for smarter traffic control

- Argo Rollouts integrates with Traefik CRDs

.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

## Explore the Traefik CRDs

.exercise[
```bash
kubectl get crd | grep traefik
```
]

- See the *IngressRoute* and the *TraefikService* resources?

- These are the ones we will be using

.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

## Create the Traefik Ingress

- Let's start with creating the ingress and the weighted load-balancing

- We will need:
  - A DNS record for our ingress (see next slide)
  - an additional Service for canary access
  - a TraefikService to load balance between the stable and the canary service
  - an IngressRoute to route ingress traffic to the application

.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

## Setting up DNS

- We will use nip.io for DNS

- In order to do that we'll need to edit the IngressRoute definition:

- Replace `<IP>` in `code/ingressroute.yaml` with the IP of your lab machine:

```yaml
- kind: Rule
  match: Host(`demo.<IP>.nip.io`) # replace <IP> with your machine public IP
```

.exercise[
```bash
MY_IP=$(dig +short myip.opendns.com @resolver1.opendns.com)
sed -i "s/<IP>/$MY_IP/" ~/rollouts.workshop/code/ingressroute.yaml
```
]

.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

## Creating the Ingress

- Let's create the ingress resources:

.exercise[
```bash
kubectl apply -f ~/rollouts.workshop/code/rolloutscanaryservice.yaml
kubectl apply -f ~/rollouts.workshop/code/traefikservice.yaml
kubectl apply -f ~/rollouts.workshop/code/ingressroute.yaml
```
]

- Check by going to demo.`your.machine.ip`.nip.io in a browser

.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

## Updating our rollout

- We want to see canary Traefik(!) management in action

- Let's update our rollout to define traffic management

.exercise[
```bash
kubectl apply -f ~/rollouts.workshop/code/rollout-weighted.yaml
```
]

- Watch it scale down to 1 replica instead of 5 (we don't need multiple replicas if we use traffic management)

.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

## Updating the Rollout with Traffic Management

- We will deploy a new version, promote it and watch its progress in the Web UI

.exercise[

- Update the version:
```bash
kar set image rollouts-demo rollouts-demo=argoproj/rollouts-demo:yellow
```

- Watch the progress in the rollouts console
```bash
kar get rollout rollouts-demo -w
```

- Promote the canary to the second stage:
```bash
kar promote rollouts-demo
```

]

- Watch the demo UI to see the canary progress

.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

## Exercise - add a stage

- Edit `~/rollouts.workshop/code/rollout-weighted.yaml` to include a stage before the final release that requires manual promotion

- Apply the resulting Rollout spec

- Deploy a new version with image `argoproj/rollouts-demo:red`

- Watch the rollout and promote where needed.
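For reference, the key part of a weighted Rollout wired to Traefik looks roughly like this. This is a sketch of the mechanism only - consult `code/rollout-weighted.yaml` for the exact spec used in the labs; the Service and TraefikService names below are illustrative:

```yaml
strategy:
  canary:
    canaryService: rollouts-demo-canary # Service selecting the canary pods
    stableService: rollouts-demo-stable # Service selecting the stable pods
    trafficRouting:
      traefik:
        # the TraefikService whose weights the controller adjusts
        weightedTraefikServiceName: traefik-service
    steps:
    # 20% of traffic via Traefik weights, regardless of pod counts
    - setWeight: 20
    - pause: {}
```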
.debug[[workshop/traffic.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/traffic.md)]

---

class: pic

.interstitial[]

---

name: toc-analyzing-our-canaries
class: title

Analyzing Our Canaries

.nav[
[Previous section](#toc-traffic-management)
|
[Back to table of contents](#toc-module-4)
|
[Next section](#toc-summing-it-all-up)
]

.debug[(automatically generated title slide)]

---

# Analyzing Our Canaries

- In order to have real CD - we usually don't want to promote our rollouts manually

- But in order to roll out automatically we need some verification

- Argo Rollouts allows us to analyze our canaries based on telemetry

- It integrates with multiple telemetry providers - Prometheus, DataDog, New Relic, CloudWatch, InfluxDB.

- Let's see how to use it with Prometheus.

.debug[[workshop/analyzed.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/analyzed.md)]

---

## Deploy Prometheus

.exercise[
```bash
kubectl apply -f ~/rollouts.workshop/code/prometheus.yaml
```

- And an ingress for Prometheus
```bash
MY_IP=$(dig +short myip.opendns.com @resolver1.opendns.com)
sed -i "s/<IP>/$MY_IP/" ~/rollouts.workshop/code/prometheus-ingress.yaml
kubectl apply -f ~/rollouts.workshop/code/prometheus-ingress.yaml
```
]

- Prometheus is now available at `http://prom.<IP>.nip.io`

.debug[[workshop/analyzed.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/analyzed.md)]

---

## A Rollout with Metrics

- We will deploy another application for analysis

- It's a bare-bones Python Flask app with Prometheus metrics

- One of its versions will increase a counter called `exceptions` on each healthcheck

- We'll define an AnalysisTemplate to count these exceptions.

.debug[[workshop/analyzed.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/analyzed.md)]

---

## Our AnalysisTemplate

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: otoflask-exceptions
spec:
  args:
  - name: service-name
  metrics:
  - name: exceptions-count
    interval: 20s
    successCondition: result[0] <= 2.0
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.default.svc.cluster.local:9090
        query: exceptions_total{instance="{{args.service-name}}:80"}
```

.debug[[workshop/analyzed.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/analyzed.md)]

---

## Let's deploy the Rollout

- Inspect all the resources in `~/rollouts.workshop/code/analyzedcanary.yaml`

- And deploy it all:

.exercise[
```bash
MY_IP=$(dig +short myip.opendns.com @resolver1.opendns.com)
sed -i "s/<IP>/$MY_IP/" ~/rollouts.workshop/code/analyzedcanary.yaml
kubectl apply -f ~/rollouts.workshop/code/analyzedcanary.yaml
```
]

.debug[[workshop/analyzed.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/analyzed.md)]

---

## First version is OK

- Browse to `prom.YOUR.MACHINE.IP.nip.io:9090` and find the `exceptions_total` count

- Now let's deploy a bad version

.exercise[
```bash
kar set image otoflask otoflask=otomato/prom-flask:0.2
```

- In one shell - watch the rollout
```bash
kar get rollout otoflask -w
```

- In another - check out the AnalysisRun
```bash
kubectl get analysisruns -oyaml
```
]

.debug[[workshop/analyzed.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/analyzed.md)]

---

## Let's Deploy a Fix

- The next version, `0.3`, fixes the bug:

.exercise[
```bash
kar set image otoflask otoflask=otomato/prom-flask:0.3
```
]

- Watch the canary get peacefully rolled out

.debug[[workshop/analyzed.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/analyzed.md)]

---

class: pic

.interstitial[]

---

name: toc-summing-it-all-up
class: title

Summing It All Up

.nav[
[Previous section](#toc-analyzing-our-canaries)
|
[Back to table of contents](#toc-module-5)
|
[Next section](#toc-)
]

.debug[(automatically generated title slide)]

---

# Summing It All Up

- We've learned what progressive delivery is

- We've learned how Argo Rollouts works

- We've seen how to integrate Rollouts with:
  - the Traefik ingress controller for traffic management
  - Prometheus for analysis

.debug[[workshop/summary.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/summary.md)]

---

## That's It for Today!

- Thanks for attending!

- Any future questions: `anton@otomato.io`

- Follow me on twitter - @antweiss

- For more training: https://devopstrain.pro

.debug[[workshop/summary.md](https://github.com/otomato-gh/rollouts.workshop/tree/main/slides/workshop/summary.md)]