I built data pipelines at Netflix that processed 2,000 TB per day. Here’s what I learned about huge data!

Use code EARLYSUB30 at checkout to be one of my first 100 paid academy subscribers!

#dataengineering
#netflix
Comments

I’m so glad I found this video. I was just sitting here with 60 million gigabytes, figuring out which joins to use, so this was perfect timing.

sevrantw

Can't wait to build hyperscale pipelines for my startup with 0 users

bilbobeutlin

What I absolutely love about your videos is that, as a beginner in the data engineering field, you often talk about things I had no conception of. In this video, for example, I had never heard of SMB or broadcast joins. This gives me an opportunity to learn these things, even just hearing them mentioned by someone as widely experienced as you.

You don’t even necessarily have to go into detail; these short-form videos act as beacons of knowledge that I can throw myself into learning about.

Thanks a lot, and keep these coming Zach!

subhasishsarkar
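For anyone who, like the commenter above, hasn’t met broadcast joins yet: a broadcast join ships the small table to every executor so the big table never has to shuffle. A minimal PySpark sketch, with made-up paths, table names, and join key:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

big = spark.read.parquet("s3://bucket/events")       # billions of rows
small = spark.read.parquet("s3://bucket/countries")  # small enough to fit in memory

# F.broadcast() hints Spark to replicate `small` to every executor,
# turning the join into a map-side hash join with no shuffle of `big`.
joined = big.join(F.broadcast(small), on="country_code", how="left")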

In the future, a wristwatch will have a little blinking light with 60 million gigabytes of data in it.

supercompooper

Thanks Zach, hopefully one day I will understand what all of that means

supafiyalaito

Boyfriend simulator: you sit with your bf and he starts talking about this nerdy stuff you have no idea about but need to keep listening because you love him

lucas.p.f

I love that you kept it short and to the point.

Bostonaholic

Holy crap. I’m currently learning about data science, the various roles, etc., with the hope of one day switching careers. But the current state of learning is all about the languages and software used, not about the infrastructure and what to do with massive datasets. So this just 🤯

RichardOles

Great content, an honour to be able to listen to someone who has handled that volume of data.

rembautimes

2 pita bites a day, the same as me when I’m on a diet.😊

JGComments

Thank you Zach for taking the time to give us the hard truth and hand down your experience. It helps a lot of enthusiastic students/people to know how we can in some way support or help others in the subjects we like. I don’t imagine myself processing 2,000 TB per day, but it helps give a bigger picture. Once again, appreciate the short video and thank you for sharing.

WM-eggh

I am a regional IT installer who runs Cat6 Ethernet pipelines managing 1 Gb loads on HP laptops. This video is really awesome and breaks down your workflow and mindset in a complicated field really efficiently. I would love to get more short videos about the industry like this.

jacobp

If you come across a scenario where you have to join two large datasets, you could do an iterative broadcast join. Basically, you break one of the DataFrames into multiple smaller DataFrames and broadcast-join each of them in a loop until all of the pieces have been joined.

ArjunRajaS
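A minimal PySpark sketch of the iterative broadcast join described above, assuming an inner join; every name here (paths, "user_id", the chunk count) is hypothetical:

from functools import reduce
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

events = spark.read.parquet("s3://bucket/events")  # large fact table
users = spark.read.parquet("s3://bucket/users")    # too big to broadcast in one piece

NUM_CHUNKS = 10  # tune so each chunk fits under the broadcast memory limit

# Split `users` deterministically into NUM_CHUNKS pieces by hashing the join key.
chunks = [
    users.where((F.abs(F.hash("user_id")) % NUM_CHUNKS) == i)
    for i in range(NUM_CHUNKS)
]

# Broadcast-join each small chunk against the big table, then union the
# partial results; for an inner join the union equals the full join.
partials = [events.join(F.broadcast(c), on="user_id", how="inner") for c in chunks]
joined = reduce(lambda a, b: a.unionByName(b), partials)

This only pays off when each chunk actually fits in executor memory; otherwise a plain sort-merge join is usually the safer default.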

Half of what you said I had no idea what you were talking about, but I was very engaged and now I’m gonna look all this stuff up for centering my div!

oakleyorbit

Thanks for the info Zach. Could you please make a more detailed video on the SMB join?

rohanbhakat
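While waiting for that video: an SMB (sort-merge-bucket) join works by writing both tables bucketed and sorted on the join key with the same bucket count, so matching buckets can be merged without a shuffle. A rough PySpark sketch under those assumptions, with made-up names:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = spark.read.parquet("s3://bucket/events")
users = spark.read.parquet("s3://bucket/users")

# Write both sides bucketed AND sorted on the join key, same bucket count.
(events.write.bucketBy(512, "user_id").sortBy("user_id")
       .mode("overwrite").saveAsTable("events_bucketed"))
(users.write.bucketBy(512, "user_id").sortBy("user_id")
      .mode("overwrite").saveAsTable("users_bucketed"))

# Because the bucket layouts match, Spark can merge buckets pair-wise:
# the physical plan shows a SortMergeJoin with no Exchange (shuffle).
joined = spark.table("events_bucketed").join(spark.table("users_bucketed"), "user_id")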

Just started following you. Really appreciate you sharing your knowledge with the community.

mohammedaamer

I've never heard of these terms before, thank you for sharing your real-case scenarios (the FB notification example).

SahilKashyap

In the 37 years I’ve been working in data, I’ve never heard anyone call it “Peter” 😂. It’s PETA.

dazzassti

I'd like to learn more about these pitabytes. What are they? What do they taste like?

hearhaw

The amount of knowledge you shared here is astonishing

nikolagrkovic