Björn Rabenstein. Applied Alerting Philosophy. DevOps Fest 2019

Upcoming DevOps Conference:
DevOps Fest 2020, June 5-6, Kyiv, Ukraine

A talk from the DevOps Fest 2019 conference in Kyiv, Ukraine.

More than five years ago, Rob Ewaschuk created an innocuous Google doc titled “My Philosophy On Alerting”. It went somewhat viral and later formed the foundation of a chapter in the well-known book Site Reliability Engineering: How Google Runs Production Systems. In parallel, the metrics-based monitoring and alerting system Prometheus was developed at SoundCloud. It is the open-source tool for putting Rob’s philosophy into practice. In this talk, I present “applied alerting philosophy” and explain how we use Prometheus at SoundCloud to create meaningful and actionable alerts. In particular, SoundCloud follows a fairly radical “you build it, you run it” approach, which requires additional care in routing alerts to the right group of engineers. Prometheus’s “label everything” mantra proves very helpful here.