I built data pipelines at Netflix that processed 2,000 TB per day. Here’s what I learned about huge data!

Use code EARLYSUB30 at checkout to be one of my first 100 paid academy subscribers!

#dataengineering
#netflix
Comments

I’m so glad I found this video. I was just sitting here with 60 million gigabytes, figuring out which joins to use, so this was perfect timing.

sevrantw

Can't wait to build hyperscale pipelines for my startup with 0 users

bilbobeutlin

What I absolutely love about your videos is that, as a beginner in the data engineering field, you often talk about things I had no conception of. In this video, for example, I had never heard of SMB or broadcast joins. This gives me an opportunity to learn these things, even just hearing them mentioned by someone as widely experienced as you.

You don’t even necessarily have to go into detail; these short-form videos act as beacons of knowledge that I can throw myself into learning about.

Thanks a lot, and keep these coming Zach!

subhasishsarkar
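For anyone who, like the commenter above, hasn’t met broadcast joins yet: a broadcast join ships the small table to every executor so the big table never has to shuffle. A minimal PySpark sketch, with made-up paths, table names, and join key:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

big = spark.read.parquet("s3://bucket/events")       # billions of rows
small = spark.read.parquet("s3://bucket/countries")  # small enough to fit in memory

# F.broadcast() hints Spark to replicate `small` to every executor,
# turning the join into a map-side hash join with no shuffle of `big`.
joined = big.join(F.broadcast(small), on="country_code", how="left")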

In the future, a wristwatch will have a little blinking light with 60 million gigabytes of data in it.

supercompooper

Thanks Zach, hopefully one day I will understand what all of that means

supafiyalaito

Boyfriend simulator: you sit with your bf and he starts talking about this nerdy stuff you have no idea about but need to keep listening because you love him

lucas.p.f

I love that you kept it short and to the point.

Bostonaholic

Holy crap. I’m currently learning about data science, the various roles, etc., with the hope of one day switching careers. But the current state of learning is all about the languages and software used, not about the infrastructure and what to do with massive datasets. So this just 🤯

RichardOles

Great content, an honour to be able to listen to someone who has handled that volume of data.

rembautimes

2 pita bites a day, the same as me when I’m on a diet.😊

JGComments

Thank you Zach for taking the time to give us the hard truth and hand down your experience. It helps a lot of enthusiastic students/people to know how we can in some way support or help others in the subjects we like. I don’t imagine myself processing 2,000 TB per day, but it helps give a bigger picture. Once again, appreciate the short video and thank you for sharing.

WM-eggh

I am a regional IT installer who runs Cat6 Ethernet pipelines managing 1 Gb loads on HP laptops. This video is really awesome and breaks down your workflow and mindset in a complicated field really efficiently. I would love to get more short videos about the industry like this.

jacobp

If you come across a scenario where you have to join two large datasets, you could do an iterative broadcast join. Basically, you break one of the DataFrames into multiple smaller DataFrames and broadcast-join each of them in a loop until all of the pieces have been joined.

ArjunRajaS
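A minimal PySpark sketch of the iterative broadcast join described above, assuming an inner join; every name here (paths, "user_id", the chunk count) is hypothetical:

from functools import reduce
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

events = spark.read.parquet("s3://bucket/events")  # large fact table
users = spark.read.parquet("s3://bucket/users")    # too big to broadcast in one piece

NUM_CHUNKS = 10  # tune so each chunk fits under the broadcast memory limit

# Split `users` deterministically into NUM_CHUNKS pieces by hashing the join key.
chunks = [
    users.where((F.abs(F.hash("user_id")) % NUM_CHUNKS) == i)
    for i in range(NUM_CHUNKS)
]

# Broadcast-join each small chunk against the big table, then union the
# partial results; for an inner join the union equals the full join.
partials = [events.join(F.broadcast(c), on="user_id", how="inner") for c in chunks]
joined = reduce(lambda a, b: a.unionByName(b), partials)

This only pays off when each chunk actually fits in executor memory; otherwise a plain sort-merge join is usually the safer default.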

Half of what you said I had no idea what you were talking about, but I was very engaged and now I’m gonna look all this stuff up for centering my div!

oakleyorbit

Thanks for the info Zach. Could you please make a more detailed video on the SMB join?

rohanbhakat
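While waiting for that video: an SMB (sort-merge-bucket) join works by writing both tables bucketed and sorted on the join key with the same bucket count, so matching buckets can be merged without a shuffle. A rough PySpark sketch under those assumptions, with made-up names:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = spark.read.parquet("s3://bucket/events")
users = spark.read.parquet("s3://bucket/users")

# Write both sides bucketed AND sorted on the join key, same bucket count.
(events.write.bucketBy(512, "user_id").sortBy("user_id")
       .mode("overwrite").saveAsTable("events_bucketed"))
(users.write.bucketBy(512, "user_id").sortBy("user_id")
      .mode("overwrite").saveAsTable("users_bucketed"))

# Because the bucket layouts match, Spark can merge buckets pair-wise:
# the physical plan shows a SortMergeJoin with no Exchange (shuffle).
joined = spark.table("events_bucketed").join(spark.table("users_bucketed"), "user_id")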

Just started following you. Really appreciate you sharing your knowledge with the community.

mohammedaamer

I've never heard of these terms before, thank you for sharing your real-case scenarios (the FB notification example).

SahilKashyap

In the 37 years I’ve been working in data, I’ve never heard anyone call it “Peter” 😂. It’s PETA.

dazzassti

I'd like to learn more about these pitabytes. What are they? What do they taste like?

hearhaw

The amount of knowledge you shared here is astonishing

nikolagrkovic