Realtime Advertisement Clicks Aggregator | System Design

Let’s design a real-time advertisement click aggregator with Kafka, Flink, and Cassandra. We start with a simple design and gradually make it scalable while discussing the trade-offs.

🥹 If you found this helpful, follow me online here:

00:00 Why Track & Aggregate Clicks?
01:07 Simple System
02:12 Will it scale?
04:00 Logs, Kafka & Stream Processing
12:02 Database Bottlenecks
17:13 Replace MySQL
18:59 Data Model
25:45 Data Reconciliation
29:00 Offline Batch Process
32:10 Future Videos

#systemDesign #programming #softwareDevelopment
Comments

What I would've done differently: have both warm and cold storage. If your data access pattern is mostly reading data from the last 90 days (pick your number), then store that data in warm storage like Vitess (sharded MySQL or some other distributed relational DB), and run a background process that periodically vacuums stale data from the warm tier and exports it to a cold tier like a data lake.

This way you're optimizing both read-query latency and storage cost. Best of both worlds.
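The vacuum process described above can be sketched in a few lines. This is a minimal, illustrative version: rows are plain `(timestamp, payload)` tuples standing in for warm-tier SQL rows and cold-tier data-lake objects, and the retention window is the commenter's suggested 90 days. Exporting to cold storage before deleting from warm means a crash between the two steps duplicates data rather than losing it.

```python
from datetime import datetime, timedelta, timezone

WARM_RETENTION_DAYS = 90  # "pick your number", per the comment above

def vacuum_stale(warm_rows, cold_rows, now):
    """Move rows older than the retention cutoff from warm to cold tier.

    warm_rows / cold_rows are plain lists here; in practice the warm tier
    would be a sharded SQL store and the cold tier a data lake (assumed).
    Export first, then delete, so a mid-run crash duplicates instead of
    losing data.
    """
    cutoff = now - timedelta(days=WARM_RETENTION_DAYS)
    stale = [r for r in warm_rows if r[0] < cutoff]
    cold_rows.extend(stale)                                # export to cold
    warm_rows[:] = [r for r in warm_rows if r[0] >= cutoff]  # vacuum warm
    return len(stale)

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
warm = [(now - timedelta(days=200), "old click aggregate"),
        (now - timedelta(days=10), "recent click aggregate")]
cold = []
moved = vacuum_stale(warm, cold, now)
print(moved, len(warm), len(cold))  # 1 1 1
```

A production version would run this on a schedule and make the export idempotent (e.g. keyed by time bucket) so retries are safe.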

indavarapuaneesh
Автор

We should use a count-min sketch for real-time click aggregation on the stream processor; it is going to be very fast, and you can query data at minute-level granularity. A MapReduce system can be useful for exact click information: clicks can be batched, put into HDFS, reduced into aggregates, and saved to the DB.
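For readers unfamiliar with the structure this comment proposes: a count-min sketch is a small fixed-size table of counters that answers "how many clicks did ad X get?" approximately, never undercounting but possibly overcounting on hash collisions. A minimal sketch (generic, not tied to the video's design; the hashing scheme here is just one simple choice):

```python
import hashlib

class CountMinSketch:
    """Approximate counter: estimates may overcount, never undercount."""

    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, key):
        # One independent-ish hash per row, derived by salting with the
        # row index. Real implementations use cheaper hashes (e.g. murmur).
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{key}".encode()).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, key, count=1):
        for row, col in self._buckets(key):
            self.table[row][col] += count

    def estimate(self, key):
        # Collisions only inflate counts, so the minimum cell is closest.
        return min(self.table[row][col] for row, col in self._buckets(key))

sketch = CountMinSketch()
for _ in range(1000):
    sketch.add("ad_42")
sketch.add("ad_7", 5)
print(sketch.estimate("ad_42"))  # >= 1000, exact unless all rows collide
```

In the streaming setup from the video you would keep one sketch per time window (e.g. per minute), which is why the comment pairs it with an exact offline MapReduce pass for reconciliation.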

rishabhjain

Your videos are really really great, no fluff, straight to the topic and covers a lot of details. Thank you and keep it up!

kevindebruyne

I think you deserve a much larger audience! The quality of the content is really good. Thanks for sharing.

freezefrancis

Thank you so much for this perfect explanation!!

sarthakgupta

Some notes about this design:
- "Adding more topics" is a very vague statement. We have to define the data model that captures each click event and then partition the data based on advertisement_id and some form of timestamp.
- Not sure why replication lag is stated as an issue here. The read patterns for this design don't require reading consistent data, so this should not be a problem.
- "Relational DBs won't do well with aggregation queries" is a little misleading. Doing aggregation queries efficiently requires storing the data in a column-major format, which unlocks efficient compression and data loading.
- Why provision stream-processing infra just to upload data to cold storage? Once a log file reaches X MB, we can place an event in Kafka with a (file_id, offset) pair. A consumer then reads this and uploads the data to S3. This avoids unnecessary dollar cost as well as the operational cost of maintaining stream infrastructure.

protyaybanerjee

@5:35 0.1KB * 3B = 3 TB — hi, how is this computed? I thought 3B has 9 zeros, so multiplying by 0.1 gives 8 zeros, i.e. 3e8 KB; since 1 TB is 1e9 KB, I'd expect 0.3 TB. Did I get something wrong?
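The commenter's arithmetic is worth checking explicitly (using the figures as quoted in the comment, and decimal units where 1 TB = 1e9 KB):

```python
# Back-of-the-envelope storage estimate: 3 billion click events per day
# at ~0.1 KB each (figures as quoted in the comment above).
clicks_per_day = 3_000_000_000       # "3B"
event_size_kb = 0.1
total_kb = clicks_per_day * event_size_kb   # 3e8 KB
total_tb = total_kb / 1e9                   # decimal units: 1 TB = 1e9 KB
print(total_tb)  # ~0.3 TB per day, not 3 TB
```

So under these numbers the daily volume is about 0.3 TB; the 3 TB figure would correspond to roughly a 1 KB event size or a ten-day window.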

ax

This was by far the best video... thanks for doing it.

pratikjain

Thanks for the effort making this! Very informative and a perfect companion to the system design volume 2 book.

nosh

Thanks for such a clear and detailed explanation.
Could you please share a couple of blogs/articles for reference where companies are using these kinds of systems?

PrateekSaini

Capturing the click with application logging is a good idea; the main crux is at 6:30 and 21:30.

sumonmal

Thanks for the tutorials! I think you're following the topics of the book System Design Interview Volume 2, but in a way that's a lot easier to understand. I struggled a lot with those topics in the book until I came across your tutorials!

weixing

Since I'm working in adtech and looking to modernize our approach, I was fortunate to come across your video, and it helped me a lot. My question: how about using ClickHouse instead of Cassandra? Will it work well, or lead to any issues?

karthikbidder

Can we use MapReduce for stream processing? Will it meet the latency requirement, or do we have to use other stream processors such as Flink/Spark?

tonyliu

That was an awesome video; I had a similar approach and got it validated. I was wondering if you could also start a code series on building such systems (as demonstrated in the video).

parthmahajan

An event data streaming platform is a more complex system, where data is processed either in real-time streams or in batches: ETL, data pipelines, etc.

mohsanabbas

Your videos are great! Very clearly articulated! I was curious why we have to use a NoSQL DB if we are storing only the aggregated data keyed by advertiser ID. What are the drawbacks of using a columnar DB like Snowflake in this case?

roopashastri

Thanks for clearly explaining the end-to-end design. Just a couple of questions:
1) Could you explain a bit about how the Apache log files get the click information, and how that is real-time?
2) Also, do you have a link to these notes/diagrams? The one in the description doesn't work.

chetanyaahuja

Correct me if I am wrong, but this seems more like a Lambda architecture: the streaming aggregation is fast but inaccurate, whereas the S3 batch path is slow but accurate.

utkarshgupta

We could also keep state in the Kafka Streams application (local or global state stores) and use Interactive Queries to fetch the aggregation result. Can you please share how to decide when to offload the aggregation result to an external DB vs. when to use Interactive Queries? I understand durability can be one factor, but what are the others?

VishalThakur-wovx