Introduction to Site Reliability Engineering

Price: 320.00 (reduced from 420.00)

This is now a virtual classroom course.

Description

Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems. Its main goal is to create scalable and highly reliable software systems. During this 8-hour course, participants will learn the fundamentals of SRE: the principles and practices that enable enterprises to scale critical services reliably and economically. They will learn what makes SRE such an important discipline when practiced correctly, and how it can improve both the stability and the performance of enterprise applications.

Who should attend

  • Software Engineers interested in learning how to apply SRE within an operations environment
  • DevOps practitioners interested in understanding the role of SRE and how to adopt it within their own organization

Prerequisites

The course has no specific prerequisites.

What you will learn

During this course you will learn about the following topics.

Culture

  • The Dev and Ops Silos
  • What is SRE
  • The difference between SRE and DevOps
  • SRE Principles
  • Getting on board with the SRE Culture
  • SRE team topologies
  • Expectations vs Reality
  • Hiring SREs
  • Educating SREs

Automation

  • What is Toil
  • Techniques for reducing toil
  • Tips for automation
  • Exploring the different levels of automation
  • Automation Pitfalls

Testing

  • Why it matters
  • Pre-Production Testing: Unit, Integration, Load
  • Production Testing: Canaries, Flags and Chaos Engineering

Incident Management

  • Exploring the Team Topologies
  • Incident Response Protocol
  • Troubleshooting Sane Practices
  • Tools of the Trade
  • Writing Postmortems
  • Incident Management Training

Observability

  • Monitoring
  • Golden Signals
  • Alerting
  • Logging
  • Tracing
  • Common Pitfalls
  • Sane Practices

Introduction to SLOs/SLIs

  • What are Service Level Indicators
  • What are Service Level Objectives
  • Error Budgets
  • Good Practices and Common Pitfalls
  • Workshop

The Reliability aspect of SRE

  • What is Reliability
  • The difference between failure and fault
  • Tolerating faults
  • Reliability Practices
  • Ensuring compliance with reliable practices (Production Readiness Reviews)
  • Deployment strategies
  • Benefits of cluster orchestrators
  • Kubernetes
  • The Operator Pattern
  • Disaster Recovery
  • Capacity Planning
  • Drills

Introduction to Chaos Engineering

  • History
  • What is Chaos Engineering
  • Running Chaos Engineering Experiments
  • Tools of the Trade
  • Common Pitfalls and Tips
  • Gameday example

Schedule

The next virtual session is scheduled for 24-25 May 2021, 10:00am – 2:00pm EEST.