Design file-sharing system like Google Drive / Dropbox (System design interview with EM)

Today's system design mock interview: "Design Google Drive."

Candidate: Alex, engineering manager (ex-Shopify) and now a coach on our platform.

Chapters:
00:00 Intro
00:36 Question "Design a file-sharing system like Dropbox, Google Drive, etc"
00:55 1. Clarifications and requirements
09:51 2. High-level design (components)
13:08 2. High-level design (APIs)
19:20 3. Drill-down (client responsibilities)
22:09 3. Drill-down (schema)
29:41 3. Drill-down (upload flow)
33:53 3. Drill-down (download flow)
36:24 4. Refinements (regionality)
37:45 4. Refinements (S3)
39:22 4. Refinements (CDN)
40:13 4. Refinements (versioning)
41:00 4. Refinements (encryption)
41:45 4. Refinements (database)
42:42 5. Follow-up (read vs write)
44:07 5. Follow-up (folders)
47:35 6. Outro

About us:
IGotAnOffer is the leading career coaching marketplace ambitious professionals turn to for help at high-stakes moments in their career. Get a job, negotiate your salary, get a promotion, plan your next career steps - we've got you covered whenever you need us.

Comments

I think that this is the best IGotAnOffer video so far. Please bring in Alex for another one - perhaps to design Google Maps? Thanks.

evgenirusev

@31:00 I believe the rationale for adopting SQL to maintain consistency, with its reference to the CAP theorem, is misleading. Consistency in the CAP theorem differs significantly from consistency as defined in ACID. In CAP it means "every read should see the latest write", whereas in ACID it means that your DB always transitions from one valid state to another valid state.

lakshminarayanannandakumar

Great question asked about compression at 20:55 with a well-structured answer.

DevendraLattu

Great video, thanks. One thing I'm really missing is some sort of judgement: what went well and what was not ideal. It seems to me that the DB design wasn't really well thought out, or maybe it's just me. Sorting such things out as a conclusion to the video would be of great value to those who watch these videos!

almirdavletov

This candidate has real-life experience and it shows in the interview. He starts out simple and builds on top of it. I love it.

scottlim

Very useful - Simple, Clear, no hurry, flow is really good

arghyaDance

I think the main requirement of a file-sharing system is how edits are handled: an edit to a file should not sync the whole file across devices, only the data chunk that was edited. Without this requirement, it's the same as any other design with models and data floating around. Overall it was a great design interview. But one question I have across all the design interviews is about the math performed at the beginning (number of users, traffic, QPS, etc.): how is it even used?

aju
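The chunk-level sync mentioned in the comment above can be illustrated with a minimal sketch. It assumes fixed-size chunks and SHA-256 fingerprints; the names `chunk_hashes` and `changed_chunks` and the tiny `CHUNK_SIZE` are hypothetical choices for the demo, not anything shown in the video.

```python
import hashlib

# Assumption: a tiny chunk size so the demo is readable; real systems use megabytes.
CHUNK_SIZE = 4


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and fingerprint each one with SHA-256."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunks(old: bytes, new: bytes, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Return indexes of chunks in `new` that are absent from or differ in `old`."""
    old_h = chunk_hashes(old, chunk_size)
    new_h = chunk_hashes(new, chunk_size)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or old_h[i] != h]


old_version = b"aaaabbbbcccc"
new_version = b"aaaaBBBBccccdd"
to_upload = changed_chunks(old_version, new_version)  # only these chunks need syncing
```

Fixed-size chunks only help when edits overwrite bytes in place. An insertion shifts every later chunk boundary, which is why sync tools often use content-defined chunking instead.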

I'm genuinely happy to discover this beautiful channel. This was very insightful. Thank you and keep sharing.

naseredinwesleti

Is that bit about partitions in S3 accurate? S3 uses a key-based structure where each object is stored with a unique key. The key can include slashes ("/") to create a hierarchy, effectively mimicking a folder structure but there aren't any actual folders in S3; it's all based on the keys you assign to your objects.
So, what does he mean by splitting into more folders when they become too large?

yacovskiv
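The point about flat keys in the comment above can be made concrete with a small sketch. It is an in-memory stand-in, not the S3 API: `list_prefix` is a hypothetical helper that mimics a listing with a prefix and a delimiter, which is how "folders" are derived from key names alone.

```python
def list_prefix(keys, prefix="", delimiter="/"):
    """Return (objects, common_prefixes) directly under `prefix`.

    The key space is flat; a "folder" is only a shared key prefix that ends
    with the delimiter.
    """
    objects, prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Deeper keys collapse into one "folder" entry.
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(prefixes)


# Hypothetical keys for illustration.
keys = ["users/1/a.txt", "users/1/docs/b.txt", "users/2/c.txt", "readme.md"]
top_level = list_prefix(keys)
user_one = list_prefix(keys, "users/1/")
```

Here `top_level` contains the object `readme.md` and the pseudo-folder `users/`, even though no folder was ever created.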

Seemed legit to me for the most part, but why would you use a CDN? Unless there are lots of users with certain big files that are the exact same, what benefit would a CDN provide?

RenegadePawn

Good video! I think there is a miscalculation: the total storage for 100 million users is around 1,500 PB, not 1.5 PB.

Rajjj

For 100M users with 15 GB of storage each, shouldn't the total storage be 1.5 exabytes? Explanation: 100,000,000 * 15 GB / (1000 * 1000) = 1,500 PB = 1.5 EB.

prasadhraju
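The estimate in the two comments above can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 PB = 10^6 GB, 1 EB = 10^9 GB) and that every user fills the full 15 GB quota, so the result is an upper bound.

```python
USERS = 100_000_000      # 100 million users
GB_PER_USER = 15         # quota per user, in GB

total_gb = USERS * GB_PER_USER
total_pb = total_gb / 1000**2   # GB -> PB (decimal units)
total_eb = total_gb / 1000**3   # GB -> EB (decimal units)

print(f"{total_gb:,} GB = {total_pb:,.0f} PB = {total_eb} EB")
```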

I can't really understand how the file upload use case works in practice. Initially the client sends a POST: File /filemetadata to the server. In response, the server sends a pre-validated S3 storage URL as a redirect, which the client follows to write the file directly into the S3 bucket. The S3 service then responds to the client with the file's location in the bucket. Now my question is: how does the server store this file location in its MySQL metadata table? Does the client make one more request to POST this metadata to the server?

vivekmit
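One common way to arrange the flow asked about above is sketched below with in-memory stand-ins. Everything here is an assumption for illustration: `ObjectStore` and `ApiServer` are hypothetical classes, not the S3 API and not necessarily the design from the video. In this sketch the server picks the object key itself when it issues the upload token, so the location is already in the metadata row before any bytes are uploaded, and a final call only flips the status.

```python
import secrets


class ObjectStore:
    """Stand-in for blob storage: accepts an upload only with an issued token."""

    def __init__(self):
        self.valid_tokens = {}   # token -> object key
        self.objects = {}        # object key -> bytes

    def presign(self, key):
        token = secrets.token_hex(8)
        self.valid_tokens[token] = key
        return token

    def put(self, token, data):
        key = self.valid_tokens.pop(token)   # KeyError if the token is unknown
        self.objects[key] = data
        return key


class ApiServer:
    """Stand-in for the metadata service backed by a SQL table."""

    def __init__(self, store):
        self.store = store
        self.files = {}          # file_id -> metadata row
        self.next_id = 1

    def create_file(self, name, size):
        file_id = self.next_id
        self.next_id += 1
        key = f"uploads/{file_id}"           # the server chooses the location
        self.files[file_id] = {"name": name, "size": size,
                               "key": key, "status": "pending"}
        return file_id, self.store.presign(key)

    def complete_upload(self, file_id):
        row = self.files[file_id]
        if row["key"] not in self.store.objects:
            raise ValueError("object not found in store")
        row["status"] = "uploaded"
        return row


store = ObjectStore()
api = ApiServer(store)
file_id, token = api.create_file("report.pdf", 5)   # step 1: metadata request
store.put(token, b"hello")                          # step 2: direct upload
row = api.complete_upload(file_id)                  # step 3: confirmation
```

The confirmation in step 3 does not have to come from the client. It could equally be triggered on the server side, for example by a storage event notification, with the same status update.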

Well, the way I see it, metadata is better suited to a NoSQL setup,
because the file structure can then be stored in the metadata as lightweight storage for syncing to the user's new devices,

and the JSON structure makes it possible to store a nested folder structure.

So we have the tables: user_details (personal details + meta), user_login, user_folders, and file_meta (with versions).

Also, NoSQL is fine because we do not have large volumes of concurrent requests updating the same keys,
so locking and transaction atomicity are not an issue.
Reads are also easily handled.

rsbgarg

It's a great video, but I think two important things were missed:
1) File permissions were missing from the schema design, and they are a must-have for any file-sharing system.
2) For very large files, how can upload and download be optimized to save network bandwidth, instead of just redirecting to S3?
Please take these inputs positively and keep sharing such videos. 🙂

vivekengi

It is not OK for the client to tell the API server that the upload is done. The client may be unable to tell the API server that the upload has finished because of a network outage, and your metadata database will then be out of sync. You will want to keep that logic on the server.

brabebhin

At 23:50 the guy talks about bigint vs int. I could not find anything on the internet to support his story about the YouTube database crashing. Does anyone have a link to this story? I would like to learn more.

russ

What about chunking files for upload and using fingerprinting for data integrity? I think that was missed entirely. And what about the sync service? You have offloaded it to notifications and the client.

lucknowi

Great video. The way he approaches depth shows that he is very strong

AkritiBhat

I just wonder, if those calculations were not done then would there be any change in the design presented?

adilsheikh