Distributed Metrics/Logging Design Deep Dive with Google SWE! | Systems Design Interview Question 14

True sigma males don't care about your logs

00:00 Introduction
00:55 Functional Requirements
01:42 Capacity Estimates
02:41 Database Design
03:36 Architectural Overview

Jordan has no life

Comments

best part is that you really focus on 'why' aspects of things, keep rocking !!

rahulaga

Thanks for the videos. I think one important functional requirement that most logging solutions offer (GCP logging ...) is text search. So potentially having a text search engine (ELS) is something to consider.

firoufirou

This video really helped me in one of my interviews, thanks a lot!!

advaitchabukswar

Good one Jordan. You are very clear in your thoughts. Keep this going!! :) A metrics/logging system is challenging because of both high-scale writes and reads. It looks like the write scale here depends on how much Kafka can scale. If we are looking at a very active public service receiving 100 billion msgs/day (roughly 1.16 million msgs/sec on average), I am guessing Kafka can handle that? What about read load? Since a lot of people may use the logs for customer investigations, there could be a lot of read load on the time series DB, since the other path is for batch insights. As I am typing this, I am thinking about Splunk. Could you make a video on how to design a Splunk-like system? (Maybe these are the building blocks.)

prashantbharadwaj
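
A quick sanity check on the throughput arithmetic, as a minimal sketch: it converts a 100 billion msgs/day workload to an average per-second rate and, for comparison, shows what 1000 msgs/sec adds up to per day. The helper names `per_second` and `per_day` are made up for illustration, and the result is an average that ignores peak traffic.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(messages_per_day: float) -> float:
    """Average messages per second for a given daily volume."""
    return messages_per_day / SECONDS_PER_DAY


def per_day(messages_per_second: float) -> float:
    """Total messages per day for a given sustained per-second rate."""
    return messages_per_second * SECONDS_PER_DAY


# 100 billion messages per day, averaged over the day
print(f"{per_second(100e9):,.0f} msgs/sec")  # 1,157,407 msgs/sec

# 1000 messages per second, sustained for a day
print(f"{per_day(1000):,.0f} msgs/day")      # 86,400,000 msgs/day
```

Real traffic is rarely flat, so a capacity estimate would normally add headroom for peaks on top of this average.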

High quality content. Keep doing these videos 👍

saisreenath

Thanks for the amazing video.
In one of my interviews I was asked to design a flight recorder to record the data within a flight. Could you please make a video on that?

helperclass

awesome content! sorry for skewing your metrics towards the other 97% but that was funny as ...! 😅

raysdev

My suggestion to keep this channel going would be to get ripped and document your fitness journey. Just a thought :D

VyasaVaniGranth

hey Jordan, what prevents us from sending the unstructured data directly from the client to S3? If we do not care about data enrichment we might as well just send it straight from the client, unless I'm missing something?

also a couple of follow-up questions just to clarify it for myself:
- why do we need a logging service, why can't we just push the data from the client straight to the queue?
- as far as I understand we leverage the time series DB for queries on relevant "recent" data, so I assume we would need some sort of cleanup jobs that run periodically? And we use a data warehouse (like Snowflake) to enable analytical queries that would be too big to run on our main DB?

dind
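
The periodic cleanup job asked about in the comment above could look roughly like the sketch below. It assumes an in-memory list of `(timestamp, value)` points standing in for one series; `drop_expired` and the retention value are hypothetical. Many time series databases offer built-in retention or TTL settings, in which case a hand-written job like this may not be needed.

```python
def drop_expired(points, now, retention_seconds):
    """Return (kept_points, dropped_count) for one series.

    points is a list of (timestamp, value) tuples; anything older than
    now - retention_seconds is treated as expired.
    """
    cutoff = now - retention_seconds
    kept = [(ts, value) for ts, value in points if ts >= cutoff]
    return kept, len(points) - len(kept)


# Example: keep only the last 100 seconds of data at time 250
series = [(0, 1.0), (100, 2.0), (200, 3.0)]
kept, dropped = drop_expired(series, now=250, retention_seconds=100)
print(kept, dropped)  # [(200, 3.0)] 2
```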

I quite hate system design interviews and regurgitating proper nouns I’ve never engaged with, think I’ve chosen the wrong career

bryanbrianbrian

Thank you for the amazing content! Instead of S3, can we use Cassandra? what would be the trade offs?

ShreyaGupta-nctd

Thank you for the amazing content! Can we use Cassandra instead of S3? What would be the trade offs?

ShreyaGupta-nctd

Thank you! How did you manage to grasp systems design in such a short time? What is your approach to studying?

TheImplemented

Thanks for this!
Is a Flink consumer just like a normal Java/Spring queue consumer that is monitoring an AWS Kinesis stream? (I've never used Flink/Kafka.)
Do we have to use Flink in conjunction with Kafka queues, or would any service work?

rajrsa

Hey, how about using Apache Pinot or Druid to support better querying capabilities directly on the real-time data?

tavneet

Once we have the data in the time series DB, how do you suppose we go about hooking up a monitoring/alerting service to it? I'm not sure what the optimal route is between 1. a push-based model, where for every new metric (or batch) in the time series DB we query an alarms/rules DB, or 2. a pull-based model, where the alarming service periodically queries the time series DB for all alarms/rules in the DB. 1 seems excessive since the majority of real-time metrics aren't going to fire an alarm. 2 seems excessive in that most alarms aren't firing at a given instant.

calvio
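
The pull-based option (2) from the comment above can be sketched as follows. This is only an illustration under stated assumptions: the `db` dict stands in for the time series DB, `Rule`, `query_window`, and `evaluate` are hypothetical names, and a rule fires when the mean over its window exceeds a threshold.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Rule:
    metric: str          # name of the metric to check
    threshold: float     # fire when the windowed mean exceeds this
    window_seconds: int  # how far back to look


def query_window(db, metric, now, window_seconds):
    """Values for one metric with now - window_seconds <= timestamp <= now."""
    return [v for ts, v in db.get(metric, []) if now - window_seconds <= ts <= now]


def evaluate(db, rules, now):
    """One polling pass: return the rules that are firing at time `now`."""
    firing = []
    for rule in rules:
        values = query_window(db, rule.metric, now, rule.window_seconds)
        if values and mean(values) > rule.threshold:
            firing.append(rule)
    return firing


# Stand-in data: metric name -> list of (timestamp, value)
db = {"cpu": [(0, 50), (30, 95), (60, 97)], "errors": [(60, 1)]}
rules = [Rule("cpu", 90, 30), Rule("errors", 5, 60)]

firing = evaluate(db, rules, now=60)
print([r.metric for r in firing])  # ['cpu']
```

In this shape the cost of each pass grows with the number of rules rather than with the write rate, which is the trade-off the comment is weighing against the push-based option.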

Can we use HDFS instead of S3? That way we'd achieve data locality and it would be part of the Hadoop cluster. Would it be cheaper as well?

KathaPatel-om

How can single-leader replication in the time series DB handle the enormous amount of writes? Won't it be overwhelming for that single leader?

Piyush-kyee

Hi Jordan - Thanks for this video. Do you mind sharing which Pinterest video you referred to in this design?

sumeet
