Monitoring, Logging, And Alerting In Kubernetes

What is the best combination of tools for monitoring, logging, and alerting in Kubernetes?

#prometheus #grafana #loki #robusta

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 Introduction to monitoring, logging, and alerting
00:35 Metrics And Alerting With Prometheus
08:04 Notifications With Robusta
10:01 Logs Collection With Loki
13:45 Dashboards With Grafana

Comments

IMPORTANT: I made a mistake in the video by saying that Alertmanager queries Prometheus. That's incorrect; it's the other way around. Prometheus evaluates the rules and sends alerts to Alertmanager, which in turn forwards them to final destinations like Slack, email, etc.
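
To make the direction of that flow concrete, here is a minimal toy sketch in Python. It is not the real Prometheus or Alertmanager API: the class names, metric names, thresholds, and receivers are all hypothetical and exist only to show that the rule evaluator pushes alerts and the alert handler never queries for them.

```python
from dataclasses import dataclass, field


@dataclass
class Alertmanager:
    """Toy stand-in (hypothetical): accepts pushed alerts and forwards them."""
    receivers: list = field(default_factory=list)  # e.g. ["slack", "email"]
    delivered: list = field(default_factory=list)  # (receiver, alert name) pairs

    def receive(self, alert: dict) -> None:
        # It only handles what is pushed to it; it never queries Prometheus.
        for receiver in self.receivers:
            self.delivered.append((receiver, alert["name"]))


@dataclass
class Prometheus:
    """Toy stand-in (hypothetical): holds samples, evaluates rules, pushes alerts."""
    alertmanager: Alertmanager
    samples: dict = field(default_factory=dict)  # metric name -> latest value
    rules: list = field(default_factory=list)    # (alert name, metric, threshold)

    def evaluate_rules(self) -> None:
        for name, metric, threshold in self.rules:
            value = self.samples.get(metric, 0)
            if value > threshold:
                # The push goes from Prometheus to Alertmanager.
                self.alertmanager.receive({"name": name, "value": value})


am = Alertmanager(receivers=["slack", "email"])
prom = Prometheus(
    alertmanager=am,
    samples={"http_error_rate": 0.12, "cpu_usage": 0.40},
    rules=[
        ("HighErrorRate", "http_error_rate", 0.05),
        ("HighCpu", "cpu_usage", 0.90),
    ],
)
prom.evaluate_rules()
print(am.delivered)
```

Only the error-rate rule fires here, so the toy Alertmanager forwards a single alert to each of its two receivers; the CPU rule stays silent because its sample is below the threshold.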

What do you use for monitoring, logging, and alerting? What's your favorite stack?

DevOpsToolkit

I highly recommend kube-prometheus-stack, an all-in-one Helm chart that deploys Prometheus, Grafana, and Alertmanager, each with its own operators. So instead of pre-defining things in values.yaml, you can use CRs to define targets, rules, alerts, dashboards, datasources, etc., in a Kubernetes way. For the logging part, I found Banzai Cloud's logging-operator to be very interesting; again, it is a way to simplify the deployment of software for log collection, aggregation, and shipment (Loki being just one possible destination). It is also built around an operator and deploys instances of Fluentd and Fluent Bit.

coocoobau

I think we should include tracing here. It can be Jaeger, Tempo, or something else. And all those things should be standardized by OpenTelemetry.

nask

It would be interesting to see an example that uses OpenTelemetry to gather the observability data (avoiding agent vendor lock-in) and uses the OTel pipelines to expose the data to different vendor solutions.

daivol

Great video!!! Observability is so important and allows for a lot of evolution that has not been explored yet.
I use the mnemonic AMLET for alerting, monitoring, logging, eventing (context and others), and tracing.
I also think that Grafana is the de facto place to have all the data to observe (even as SaaS).
Thanks to Tempo and Loki we can add more meaning to metrics dashboards (and I have a small preference for Sensu Go over Robusta to serve as the glue around all that) and leverage all of it with a runbook system for auto-remediation (StackStorm, AWX, Ansible platform, Jenkins, Rundeck...). The dream!

soubinan

It would be interesting to have a deeper dive into things like Thanos, Tempo, Mimir, etc. Also, what do you think of using their Jsonnet libraries to manage those? I found the community Helm charts to be not that well maintained, and Jsonnet is actually pretty flexible for an enterprise setup.

jemag

I suggest you also add an alert example, just like you did with querying. Otherwise, great video, enjoyed it 👍🏽

andriespiitso

If you are serious about monitoring, you need to set up your own monitoring system even on managed Kubernetes like EKS, GKE, and AKS. I hope you will take the topic of monitoring further by introducing Prometheus Operator, Grafana Cloud Agent (and GCA Operator), Grafana Operator, and perhaps also Grafana Tempo. I would also love to see a separate video about VictoriaMetrics, which is much better than Prometheus itself.

jirityr

As ever, you are right. I tried Loki and wow! It works perfectly with Grafana. Thanks a lot, genius!

luismorteo

This could not have come at a better time! Looking forward to part 2 with tracing, OpenTelemetry, etc., and maybe also coverage of the maintenance aspects. Prometheus purges data automatically, which makes it maintenance-free; how does Loki compare? With logs, the data volumes are larger and much more workload-dependent, so one could easily overwhelm the system. Plus, some organizations may need log archives to be kept for several years; how Loki supports that use case would be interesting to see. My organization uses Elasticsearch. Can Loki be a replacement for Elasticsearch today, or in the future? The reason I would prefer Loki over Elasticsearch is that I can correlate logs with metrics, events, and maybe even traces.

In the case of Java/Spring Boot based apps, tracing can be very simple to achieve with auto-instrumentation. This would provide great visibility into the workings of the application. I am exploring it myself this week.

lhxperimental

Great stuff, but in my opinion the really tricky part is managing these things at scale. First of all there is the storage aspect, but Prometheus also seems to break down when the cluster gets too big. At that point you need either a federated setup or something else, and it would be useful to hear your thoughts on that.

dmsalomon

A few months later:

The extended Grafana stack makes things more flexible:
Grafana Tempo + OpenTelemetry for auto-instrumentation + Grafana Agent
Grafana Loki
Prometheus
Grafana Alertmanager

Basically, that covers
metrics, logs, APM/tracing, and alerts.
Also, Grafana can add silences through the UI, so we don't need to expose Prometheus Alertmanager to mute alerts.

vn

In the latest versions, Grafana also shows the Alertmanager alerts, and they can be silenced from there too (bell icon).

oftheriverinthenight

I concur with others that tracing is conspicuously absent, as is OpenTelemetry (OTel), which is the emerging standard that ties all these CNCF pieces together with others such as Fluent Bit.

joebowbeer

MELT stack = Monitoring, Event (alerting +OnCall), Logging and Tracing

richarmunicosamaniego

Great video.
Are there any self-managed logging solutions besides ELK/EFK and Loki-Grafana?

AhmedAyman-gsoz

Thanks, Viktor, for your nice and informative video. Really helpful.

RaviSharma-vwpy

How did I miss this video, 2 days wasted. Thanks

SerhiiHromov

Nice explanation! Thank you very much.

azerbaijan

Thanks for this great video. What about Blackbox Exporter?

samehammar
