Life of an SRE at Google - JC van Winkel - Codemotion Rome 2017

We've all heard about DevOps and about companies adopting DevOps tactics and strategies. But how can we limit the inherent tension, and the resulting conflicts, between the Dev and Ops sides? That tension is bad for effectiveness, for the work environment, and for attrition. We want an organization that people love to work at, that keeps "the site" reliable, and that moves systems forward at a high pace. In its 13-year history, SRE has learned what happens when you live by ground rules such as automation, launching fast and often, having well-defined SLAs, and, in case of outages, writing blameless postmortems.
Comments

Great presentation! And I'm loving the book so far! Thanks for sharing your wisdom.

veganphilosopher

Awesome video!! Clear explanation of the SRE process at Google.

ganeshbabujothiganesan

Really interesting video. I have a question about reliability and what he says at 14:55. If you had 10 systems, each with two nines of uptime (let's suppose they are independent), the user would experience only about one nine, wouldn't they?
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(A ∩ C) - P(B ∩ C) + P(A ∩ B ∩ C). How is that taken into account?

Nuriau_u
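
A rough sketch of the arithmetic behind the question above, not taken from the talk itself. It assumes that a user request needs all 10 systems to be up and that failures are independent; the function names are made up for illustration. Under those assumptions the overall availability is the product of the individual availabilities, and the inclusion-exclusion formula for "at least one system is down" gives the complementary value.

```python
from itertools import combinations
from math import prod


def serial_availability(availabilities):
    """Probability that every system is up, assuming independent failures."""
    return prod(availabilities)


def union_by_inclusion_exclusion(failure_probs):
    """P(at least one system is down) via inclusion-exclusion.

    Independence is assumed, so the probability of an intersection
    is the product of the individual failure probabilities.
    """
    total = 0.0
    for k in range(1, len(failure_probs) + 1):
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(failure_probs, k):
            total += sign * prod(subset)
    return total


ten_systems = [0.99] * 10  # ten systems with two nines each
overall = serial_availability(ten_systems)
p_any_down = union_by_inclusion_exclusion([1 - a for a in ten_systems])

print(f"overall availability: {overall:.4f}")    # 0.9044
print(f"P(any system down):   {p_any_down:.4f}")  # 0.0956
```

So with these assumptions the user sees roughly 90.4% availability, which is about one nine. If failures are correlated, or if a request can succeed without some of the systems, the product no longer applies.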

This is way too similar to Ben Treynor's speech from 2014.

dijoxx

Error Budget 🧐😏
Hmm, I wonder if a carryover or rollover concept exists? I could always strive to build that up for long-term compound growth, so that when the day comes and we find ourselves at the center of some sort of site-wide DR scenario, we can just look at the SREs and say, “Just smile and wave, boys, just smile and wave!”

ichoudhury

Hi there. What is the difference between a system reliability engineer and a site reliability engineer? Please try to answer soon.

kirandeshmukh