Make Prometheus Use Less Memory and Restart Faster - Ganesh Vernekar, Grafana Labs

These days, the most common reason for a Prometheus server to run out of memory is an excessive number of time series in the so-called head block. The head block is the part of the internal TSDB that holds the freshest data, and it has to be kept in memory until it is consolidated into a block on disk. A large head block also leads to a long restart time, because the head block has to be rebuilt from the write-ahead log (WAL). On large servers, a restart can take 10 minutes or more. Since restarts happen regularly to upgrade the binary or to change flags, the resulting interruption of sample collection is problematic. Even worse, after an OOM crash the same WAL replay has to happen, often causing another OOM crash immediately. Ganesh Vernekar will talk about the work, started in late 2019, to persist parts of the head block earlier, reducing both the memory footprint and the restart time.
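The description says the head block must be rebuilt from the WAL on restart, and that persisting parts of it earlier cuts both memory use and restart time. The toy Go model below illustrates that trade-off under heavy simplifying assumptions: a single series, an assumed chunk size of 120 samples, and completed chunks that simply leave memory once "persisted". `Head`, `Replay`, and `buildWAL` are hypothetical names invented for this sketch; they are not Prometheus APIs and this is not the implementation described in the talk.

```go
package main

import "fmt"

// Sample is one (timestamp, value) pair of a single toy series.
type Sample struct {
	T int64
	V float64
}

// chunkSize is an assumed number of samples per head chunk; the toy
// model cuts a new chunk every chunkSize samples.
const chunkSize = 120

// Head is a hypothetical single-series head block, not the Prometheus type.
type Head struct {
	open           []Sample   // chunk currently being appended to (always in memory)
	inMemory       [][]Sample // completed chunks kept in memory (no early persistence)
	persisted      int        // completed chunks written out early (no longer in memory)
	lastPersistedT int64      // timestamp of the newest persisted sample, -1 if none
	persist        bool       // whether completed chunks leave memory right away
}

// Append adds a sample. When the open chunk is full, it is either kept in
// memory or "persisted", in which case only a counter and a timestamp remain.
func (h *Head) Append(s Sample) {
	h.open = append(h.open, s)
	if len(h.open) < chunkSize {
		return
	}
	if h.persist {
		h.persisted++
		h.lastPersistedT = s.T
	} else {
		h.inMemory = append(h.inMemory, h.open)
	}
	h.open = nil
}

// SamplesInMemory counts the samples the toy head holds in RAM.
func (h *Head) SamplesInMemory() int {
	n := len(h.open)
	for _, c := range h.inMemory {
		n += len(c)
	}
	return n
}

// Replay rebuilds a head from a WAL after a restart. When persist is true,
// samples at or before persistedUpTo are assumed to be on disk already and
// are skipped. It returns the rebuilt head and the number of samples re-appended.
func Replay(wal []Sample, persistedUpTo int64, persist bool) (*Head, int) {
	h := &Head{persist: persist, lastPersistedT: persistedUpTo}
	replayed := 0
	for _, s := range wal {
		if persist && s.T <= persistedUpTo {
			continue
		}
		h.Append(s)
		replayed++
	}
	return h, replayed
}

// buildWAL returns n samples with timestamps 0..n-1.
func buildWAL(n int) []Sample {
	wal := make([]Sample, n)
	for i := range wal {
		wal[i] = Sample{T: int64(i), V: float64(i)}
	}
	return wal
}

func main() {
	wal := buildWAL(1000)

	// Without early persistence, every sample is replayed and held in memory.
	full, fullWork := Replay(wal, -1, false)
	fmt.Println("full replay:", fullWork, "samples replayed,", full.SamplesInMemory(), "in memory")

	// With early persistence, the live head recorded which chunks reached disk.
	live := &Head{persist: true, lastPersistedT: -1}
	for _, s := range wal {
		live.Append(s)
	}
	partial, partialWork := Replay(wal, live.lastPersistedT, true)
	fmt.Println("partial replay:", partialWork, "samples replayed,", partial.SamplesInMemory(), "in memory")
}
```

With 1,000 samples, the full replay rebuilds and holds all 1,000, while the partial replay only rebuilds the 40 samples after the last persisted chunk (timestamps 960 to 999). A real system would still need to keep references to the persisted chunks on disk; the toy model only counts them.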

CNCF [Cloud Native Computing Foundation]

Comments

Snapshotting the latest chunk is an amazing feature; WAL replay takes a lot of time in my case. Looking forward to using this feature. Thanks for building this!

lokeshwarank

Hi, are these features already available in the latest Prometheus Docker image?

srikanthjnr

Excellent depiction and explanation. Do we have slides available?

chakradharnr

Can we have some scenarios based on remote-write failures? For example, how does the WAL work when the remote database fails? Also, what happens if the Prometheus pod itself is down for a few minutes? How do the WAL and chunks behave then?

prabhatranjan
