When Prometheus Can’t Take the Load Anymore - Liron Cohen, Riskified

Riskified started out with a pair of Prometheus servers in each of its clusters, but soon enough Prometheus couldn't take the load anymore. When that happened, the SRE team set out to find the best tool for a multi-cluster, highly available (HA), long-term Prometheus setup. They decided to evaluate Thanos, Cortex, and M3. In this session, Liron will share her takeaways on the different tools: which one provides the best performance and high availability, which is the most cost-effective, and which is the easiest to deploy and operate.
By the end, you'll have a better understanding of the different tools and which one is the best solution for your use case.
Comments

Nice talk. I chose Thanos for the simplicity as well, but I'm starting to think about Cortex more and more lately.

giovannicoutinho