Site Reliability Engineering at Google • Christof Leng • GOTO 2017

This presentation was recorded at GOTO Berlin 2017

Christof Leng - Senior Site Reliability Engineer at Google @ChristofLeng

ABSTRACT
Site reliability engineers are Google's experts for operating its tech infrastructure and products. They need to keep up with the enormous scale, rapid growth, and daunting complexity of Google's systems landscape. As traditional methods would not work, SRE treats operations as if [...]

TIMECODES
0:00 Introduction
3:26 Reliability is easy to take for granted
6:24 What is Site Reliability Engineering (SRE)?
9:15 Part I: Dev and Ops
13:49 Is conflict inevitable?
14:48 Service Level Agreement (SLA)
20:19 What do you spend your budget on?
21:09 The rule
22:18 Two nice features of Error Budgets
24:08 Part II: Staffing, Work, Ops Overload
28:55 SRE hires only coders
31:05 50% cap on Ops work
32:09 Keep DEV in the rotation
34:09 Speaking of Dev and Ops work...
35:21 SRE Portability
37:24 Part III: Death, taxes, and outages...
39:07 Minimize Damage
40:59 A word on practice...
41:16 Wheel of Misfortune
43:22 Prevent recurrence
44:21 Post-mortem philosophy
46:13 Summary
47:00 O'Reilly Book

Comments
Author

I am a python developer and I am going to start working as an SRE... wish me luck... it's going to be an exciting new path :)

maximilianoromayfigueroa
Author

The arrogance of these people is just unbounded.

TheCALMInstitute