Capacity Planning and Estimation: How much data does YouTube store daily?

Back-of-the-envelope calculations are often expected in system design interviews. They make us state the parameters that influence a result, and because estimating capacity requires several intermediate estimates, they also let us state each assumption individually.

Eg: Estimate the hardware requirements to set up a system like YouTube.
Eg: Estimate the number of petrol pumps in the city of Mumbai.

Chapters
00:06 Storage Requirements
01:20 Supplementary storage requirements
03:54 Back of Envelope calculations
05:38 YouTube caching estimation
08:58 YouTube video processing estimation
12:14 Conclusion

------STORAGE
Let's start with storage requirements:
Assume about 1 billion active users.
I assume 1 in 1,000 of them uploads a video each day.
Which means 1 million new videos a day.

What's the size of each video?
Assume the average length of a video to be 10 minutes.
Assume a 10-minute video to be 1 GB in size. Or...
A video is a sequence of images. 10 minutes is 600 seconds. Each second has 25 frames. So a video has 25 * 600 = 15,000 frames.
Each frame is 1 MB. Which means (1.5 * 10^4) * (10^6) bytes = 15 GB.
This estimate is very inaccurate, so we must either revise it or hope the interviewer corrects us. A normal 10-minute video is about 700 MB.
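The frame-based estimate can be checked with a few lines of arithmetic. This is only a sketch: the 25 frames per second and 1 MB per uncompressed frame are assumed values, and the variable names are illustrative.

```python
# Frame-based size estimate for one 10-minute video.
# Assumptions (illustrative): 25 frames per second, 1 MB per uncompressed frame.
VIDEO_SECONDS = 10 * 60
FRAMES_PER_SECOND = 25
BYTES_PER_FRAME = 10**6

frames = VIDEO_SECONDS * FRAMES_PER_SECOND
raw_gb = frames * BYTES_PER_FRAME / 10**9

# About 700 MB is the figure quoted for a normal 10-minute video.
typical_gb = 0.7
overestimate_factor = raw_gb / typical_gb

print(f"{frames:,} frames, {raw_gb:.0f} GB uncompressed")
print(f"roughly {overestimate_factor:.0f}x larger than a typical encoded video")
```

The gap comes from treating every frame as an independent uncompressed image, which real video encoding does not do.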

As each video is about 1 GB, the storage requirement per day is 1 GB * 1 million = 1 PB.
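Written out as a small calculation, the daily figure follows directly from the three assumptions made so far. The constant names are illustrative, and decimal units (1 PB = 10^6 GB) are assumed.

```python
# Daily raw storage from the stated assumptions.
ACTIVE_USERS = 10**9        # about 1 billion active users
USERS_PER_UPLOADER = 1000   # 1 in 1,000 users uploads a video per day
GB_PER_VIDEO = 1            # a 10-minute video is taken to be about 1 GB
GB_PER_PB = 10**6           # decimal units

videos_per_day = ACTIVE_USERS // USERS_PER_UPLOADER
daily_gb = videos_per_day * GB_PER_VIDEO
daily_pb = daily_gb / GB_PER_PB

print(f"{videos_per_day:,} videos/day -> {daily_pb:g} PB/day")
```

Keeping each assumption as a named constant makes it easy to revise one of them if the interviewer pushes back.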

This is the bare minimum needed to store the original videos. If we want redundancy for fault tolerance and performance, we have to store copies. I'll choose 3 copies.
That's 3 petabytes of raw data storage.
What about video formats and encoding? Let's assume a single type of encoding, mp4, and that each 720p video is also stored at 480p, 360p, 240p and 144p. Each step down in resolution takes approximately half the size of the previous one.

If X is the original storage requirement = 1 PB,
We have X + X/2 + X/4 + X/8 + X/16 ≈ 2*X.
With redundancy, that's 2X * 3 = 6*X.

That's 6 PB (processed) + 3 PB (raw) = 9 PB, or roughly 10 PB of data. About 100 hard drives. The cost of this system is about 1 million dollars per day.
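A sketch of the same sum without rounding, assuming one stored copy per resolution from 720p down to 144p, each half the size of the previous one, and 3 replicas of everything. The exact total lands a little under the rounded figure but stays on the order of 10 PB.

```python
# Daily storage including lower resolutions and replication.
# Assumptions (illustrative): each step down in resolution halves the size,
# and both raw and processed data are kept in 3 copies.
RAW_PB = 1.0
REPLICAS = 3
RESOLUTIONS = ["720p", "480p", "360p", "240p", "144p"]

# Size of each rendition in PB: 1, 1/2, 1/4, 1/8, 1/16.
sizes_pb = {res: RAW_PB / 2**i for i, res in enumerate(RESOLUTIONS)}

processed_pb = sum(sizes_pb.values())             # close to 2 * RAW_PB
processed_replicated_pb = processed_pb * REPLICAS
raw_replicated_pb = RAW_PB * REPLICAS
total_pb = processed_replicated_pb + raw_replicated_pb

print(f"processed: {processed_pb} PB, replicated: {processed_replicated_pb} PB")
print(f"total per day: {total_pb} PB (order of 10 PB)")
```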

For a 3-year plan, we can expect a storage cost of about 1 billion dollars.
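The 3-year figure is the daily cost multiplied out. A minimal sketch, assuming a flat 1 million dollars per day and 365-day years:

```python
# Three-year storage cost from the daily estimate.
DAILY_COST_USD = 1_000_000   # assumed flat daily cost
PLAN_YEARS = 3
DAYS_PER_YEAR = 365

total_cost_usd = DAILY_COST_USD * PLAN_YEARS * DAYS_PER_YEAR
print(f"about {total_cost_usd / 10**9:.1f} billion dollars over {PLAN_YEARS} years")
```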

Now let's look at the real numbers:
Video upload rate = 3 * 10^4 minutes of footage per minute.
That's 3 * 10^4 * 1440 ≈ 4.3 * 10^7 minutes of video footage per day, or about 7 * 10^5 hours.
Video encoding can reduce a 1-hour film to 1 GB. So roughly 1 million GB is the requirement. That's about 1 PB.
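The real-number check can be reproduced step by step. This sketch uses the upload rate and the 1 GB per hour figure quoted above; the names are illustrative.

```python
# Daily storage from the quoted upload rate.
UPLOAD_MINUTES_PER_MINUTE = 3 * 10**4   # minutes of footage uploaded per minute
MINUTES_PER_DAY = 24 * 60               # 1440
GB_PER_HOUR = 1                         # encoded size of one hour of video
GB_PER_PB = 10**6

minutes_per_day = UPLOAD_MINUTES_PER_MINUTE * MINUTES_PER_DAY
hours_per_day = minutes_per_day / 60
storage_pb = hours_per_day * GB_PER_HOUR / GB_PER_PB

print(f"{minutes_per_day:.2e} minutes/day = {hours_per_day:,.0f} hours/day")
print(f"about {storage_pb:.2f} PB/day")
```

The result is a bit under 1 PB, which is why rounding up to 1 million GB is a reasonable simplification.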

So our original estimate is close to what the real numbers give.

If we are off by an order of magnitude, that's fine. However, being off by 3 or more orders of magnitude is too much. We can then highlight the following:
Where our assumption was wrong, or
Which factor we didn't take into account.
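One way to make this rule of thumb concrete is to measure the gap between an estimate and a reference value on a log scale. The helper names and the "borderline" label for the range between 1 and 3 orders are assumptions added for illustration; only the two thresholds come from the text.

```python
import math


def orders_of_magnitude_apart(estimate: float, reference: float) -> float:
    """Return how many powers of ten separate two positive values."""
    return abs(math.log10(estimate / reference))


def verdict(estimate: float, reference: float) -> str:
    """Classify an estimate using the rule of thumb from the text."""
    gap = orders_of_magnitude_apart(estimate, reference)
    if gap <= 1:
        return "fine"
    if gap < 3:
        return "borderline"  # assumed label, the text does not name this range
    return "revisit assumptions"


# Our 1 PB/day estimate against the roughly 0.72 PB/day from real upload rates.
print(verdict(1.0, 0.72))
# A hypothetical estimate that is 5,000 times too large.
print(verdict(5000.0, 1.0))
```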

#CapacityPlanning #SystemDesign #YouTube
Comments
Author

9:30 I am confused: 10^7 min / 60 converts it into hours, right? Then dividing by 3 is wrong... because 10^4 * 1000 / 60 is what you want to compute, and 1000/60 is far from 1/3, so you should get 1000 processors, not 20.

prashantgupta
Author

I was interviewed for Youtube recently, and this was the exact same question I was asked. Gave a similar reply. Love your solution, and the fact that you uploaded this video! Subscribed!

saurabhbhalla
Author

[1:30, 2:00] Hi Gaurav - the part where you account a multiplier for the storage requirement due to replication across data centers is really smart! I haven't seen this mentioned in many books.

harisridhar
Author

I was really looking for a way to calculate number of processors based on the bandwidth estimation. And there you have it. Thanks man! Love it. :)

aadhiraakutty
Author

In the situation above we are taking 10^7 minutes, which means all these videos are played once a day. But multiple users play a video at the same time, so we have to multiply by an average. Let's say each video is played by 1000 users simultaneously. Then the time will be 10^7 x 1000, and the processor count will change.
By the way, great explanation. Thanks

AMANSingh-ggnz
Author

Bang on, this capacity estimation is very accurate and detailed. However, I would avoid it in an actual system design interview, since this estimation will take almost 10-15 minutes of your time.
But I will say it's important to go through the whole video to capture the essence and use the required details in your interview.

pulkitbMv
Author

Please upload more such videos! It's much better to calculate, make mistakes and reach the answer than cramming and googling for the answer to questions like this!

KomalSingh-bhzr
Author

Your brainstorming videos on designing systems and infrastructures are really helpful.

shreyanshsingh
Author

9:30
1. Total Video Uploads Per Day = 10^7 minutes

2. Convert minutes to hours = Since there are 60 minutes in an hour, the total duration of videos uploaded in a day is
10^7 minutes / 60 ≈ 1.67 x 10^5 hours

3. Calculate Data Size = If we assume that 1 hour of raw video is about 1GB, then the total data uploaded in a day is
1.67 x 10^5 x 1GB

4. Calculate Data Rate = There are 86400 seconds in a day. So, the rate of data upload per second is
(1.67 x 10^5 x 1GB) /86400 ≈ 1.93GB/sec = 1930MB/sec

5. Account for Redundancy and Resolution Processing: In real scenarios, we need to process more data due to redundancy and high resolution. So, we multiply the data rate by 10. This gives us
19300 MB/Sec

6. Calculate Number of Processors: If one processor can process 20MB of data per second, then to process 19300 MB/sec
we would need 19300 MB/sec / 20 MB/sec = 965 processors

So, to handle one day's worth of YouTube video uploads we would need about 965 processors.

codexamofficial
Author

So nicely he explains concepts..!!
Thank you so much for gr8 info..!!
Seeing u after so long..!!

Sushil
Author

This is great stuff...!!! So good to see these videos being accessible easily here.

anish
Author

This is one of the best videos 😍
In terms of system design 🙏

pythonepointtutorial
Author

It is really a good conceptual video. I always like the concepts you pick and showcase.

suppuhs
Author

Great video Gaurav!!
And yeah, it would be 1000 processors, as 10^7 minutes is 10^7/60 hours and not 10^4/3 hours

nishitanand
Author

Gaurav, thank you for your elaborate work! Cheers 😌

anastasianaumko
Author

The computational power and storage you estimated are just for uploading. If you take into account delivering the videos, serving ads, and providing recommendations, then the calculation is off by several orders of magnitude.

TheSalaho
Author

Great. Keep up. I like your way of expressing things

SESURAJAPURAMARUL
Author

Thanks for the upload gaurav :), thanks for the tips to approach such problems

ashutoshpandey
Author

Your uploads are informative, good job man.

prasant
Author

Love you bro. you are always there with something new and different from other youtubers. you are real. ❤❤❤❤

alixaprodev