Site Reliability Engineering at Google • Christof Leng • GOTO 2017
This presentation was recorded at GOTO Berlin 2017
Christof Leng - Senior Site Reliability Engineer at Google @ChristofLeng
ABSTRACT
Site reliability engineers are Google's experts for operating its tech infrastructure and products. They need to keep up with the enormous scale, rapid growth, and daunting complexity of Google's systems landscape. As traditional methods would not work, SRE treats operations as if [...]
TIMECODES
0:00 Introduction
3:26 Reliability is easy to take for granted
6:24 What is Site Reliability Engineering (SRE)?
9:15 Part I: Dev and Ops
13:49 Is conflict inevitable?
14:48 Service Level Agreement (SLA)
20:19 What do you spend your budget on?
21:09 The rule
22:18 Two nice features of Error Budgets
24:08 Part II: Staffing, Work, Ops Overload
28:55 SRE hires only coders
31:05 50% cap on Ops work
32:09 Keep Dev in the rotation
34:09 Speaking of Dev and Ops work...
35:21 SRE Portability
37:24 Part III: Death, taxes, and outages...
39:07 Minimize Damage
40:59 A word on practice...
41:16 Wheel of Misfortune
43:22 Prevent recurrence
44:21 Post-mortem philosophy
46:13 Summary
47:00 O'Reilly Book