Chaos Engineering is the discipline of experimenting on systems under extreme conditions and disruptive events. The knowledge we obtain from that chaos is used to optimise how systems operate and to build confidence in them.
This 8-hour hands-on workshop gives participants insight into how Web applications and Kubernetes clusters behave by destroying, obstructing, and delaying things in servers and clusters, all for a good cause.
Chaos engineering tries to find the limits of a system. It helps us deduce the consequences when bad things happen. Participants will learn how to simulate adverse effects in a controlled way, with the aim of making systems more resilient: able to resist harmful and unpredictable events and to recover from them.
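As a rough illustration of what "simulating adverse effects in a controlled way" can look like, the sketch below models an experiment as three steps: confirm the system is healthy, inject a fault, then check whether it is still healthy, always undoing the fault afterwards. It is a tool-agnostic toy, not the workshop's code and not the Chaos Toolkit API; the `Experiment` class, the `run` function, and the in-memory `replicas` "system" are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    """A hypothetical, tool-agnostic chaos experiment (illustration only)."""
    title: str
    steady_state: Callable[[], bool]  # probe: does the system look healthy?
    inject_fault: Callable[[], None]  # action: the controlled disruption
    rollback: Callable[[], None]      # undo the disruption


def run(experiment: Experiment) -> str:
    # Do not inject a fault into a system that is already unhealthy.
    if not experiment.steady_state():
        return "aborted"
    try:
        experiment.inject_fault()
        # If the steady state still holds, the system tolerated the fault.
        return "passed" if experiment.steady_state() else "failed"
    finally:
        # Undo the disruption whether the hypothesis held or not.
        experiment.rollback()


# Toy in-memory stand-in for an application with three running instances.
replicas = {"running": 3}


def healthy() -> bool:
    return replicas["running"] >= 1


def kill_one() -> None:
    replicas["running"] -= 1


def restore() -> None:
    replicas["running"] = 3


result = run(Experiment("terminate one instance", healthy, kill_one, restore))
print(result)
```

With three instances the toy system survives losing one. Starting from a single instance, the same experiment would report a failure, which is the kind of weakness an experiment is meant to surface before a real outage does.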
Who should attend
This hands-on workshop will be particularly useful to testers, software engineers, DevOps engineers, SREs, sysadmins, operators, or anyone else interested in making applications and systems more resilient and reliable.
Prerequisites
This workshop has no particular experience prerequisites apart from a basic understanding of containers and Kubernetes. The instructor will provide code and instructions for anything participants will be required to run.
Participants will need to install the following tools before the course:
- Git
- WSL or GitBash (if using Windows)
- kubectl
- Helm v3.x (if using EKS)
- Python v3.x
- Pip
- A Kubernetes cluster: Docker Desktop, Minikube, GKE, EKS, AKS, or any other. Gists with instructions on how to create a cluster are listed below.
Please choose one of the “flavors” of a Kubernetes cluster and make sure that it is operational before the course.
- Docker Desktop with Istio: https://gist.github.com/9a9752cf5355f1b8095bd34565b80aae
- Minikube with Istio: https://gist.github.com/a5870806ae6f21de271bf9214e523b53
- Regional and scalable GKE with Istio: https://gist.github.com/88e810413e2519932b61d81217072daf
- Regional and scalable EKS with Istio: https://gist.github.com/d73fb6f4ff490f7e56963ca543481c09
- Regional and scalable AKS with Istio: https://gist.github.com/b068c3eadbc4140aed14b49141790940
Note that all the aforementioned requirements must be met before the workshop session.
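One possible way to check the tool list before the session is a small script that looks for each command on the PATH. This is only a sketch: the executable names in `REQUIRED_TOOLS` are assumptions (Python may be installed as `python` rather than `python3`, and pip as `pip3`), `find_tools` and `missing_tools` are hypothetical helper names, and Helm is left out because it is only needed on EKS.

```python
import shutil
from typing import Callable, Dict, Iterable, List, Optional

# Assumed executable names; adjust for your platform.
# Add "helm" to this tuple if you are using EKS.
REQUIRED_TOOLS = ("git", "kubectl", "python3", "pip")


def find_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each tool name to its path on PATH, or None if it is not found."""
    return {name: which(name) for name in names}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of tools that were not found, sorted alphabetically."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    missing = missing_tools(find_tools(REQUIRED_TOOLS))
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

The script only confirms that the commands exist. It does not verify versions or that the cluster is operational; running `kubectl get nodes` against the chosen cluster is a simple manual check for the latter.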
What you will learn
This 8-hour virtual hands-on workshop will cover the following modules:
Introduction To Chaos Engineering
- Principles Of Chaos Engineering
- Are You Ready For Chaos?
- Examples Of Chaos Engineering
- The Principles And The Process
- Chaos Experiments Checklist
Choosing The Right Tool
- Requirements Guiding The Choice
- Which Tool To Pick?
Setting Up The Environment
- Installing Chaos Toolkit
Destroying Application Instances
- Introduction
- Creating A Cluster
- Deploying The Application
- Discovering ChaosToolkit Kubernetes Plugin
- Terminating Application Instances
- Defining Steady State Hypothesis
- Pausing After Actions
- Probing Phases And Conditions
- Making The Application Fault-Tolerant
Experimenting With Application Availability
- Validating Application Health
- Validating Application Availability
- Terminating Application Dependencies
Obstructing And Destroying Network
- Installing Istio Service Mesh
- Deploying The Application
- Discovering ChaosToolkit Istio Plugin
- Aborting Network Requests
- Rolling Back Abort Failures
- Making The Application Resilient To Partial Network Failures
- Increasing Network Latency
- Aborting All Requests
- Simulating Denial Of Service Attacks
- Running Denial Of Service Attacks
Draining And Deleting Nodes (works only if NOT using local Kubernetes)
- Draining Worker Nodes
- Uncordoning Worker Nodes
- Making Nodes Drainable
- Deleting Worker Nodes
Creating Chaos Experiment Reports
- Exploring Experiments Journal
- Creating Experiment Report
- Creating A Multi-Experiment Report
Running Chaos Experiments Inside A Kubernetes Cluster
- Setting Up Chaos Toolkit In Kubernetes
- Types Of Experiment Executions
- Running One-Shot Experiments
- Running Scheduled Experiments
- Running Failed Scheduled Experiments
- Sending Experiment Notifications
- Sending Selective Notifications
Executing Random Chaos
- Deploying Dashboard Applications
- Exploring Grafana Dashboards
- Exploring Kiali Dashboards
- Preparing For Termination Of Instances
- Terminating Random Application Instances
- Disrupting Network Traffic
- Preparing For Termination Of Nodes
- Terminating Random Nodes
- Monitoring And Alerting With Prometheus
About the Instructor
Viktor Farcic is a Principal Software Delivery Strategist and Developer Advocate at CloudBees, a member of the Google Developer Experts and Docker Captains groups, and the published author of The DevOps Toolkit Series, DevOps Paradox, and Test-Driven Java Development.