Data Lake Fundamentals, Apache Iceberg and Parquet in 60 minutes on DataExpert.io

We'll be covering data lakes, the Parquet file format, data compression, and shuffle!

Comments

This channel is gold for any young data engineer. I wish I could pay you but you're probably already swimming in enough data :D

alonzo_go

Zach! We just started our project where we will be transferring our data to a data lake in Parquet! This is a very timely video. Awesome job, as always!

nobodyinparticula

Great lesson Zach! I have always wondered what the hell a Data Lake is. Great explanations and super easy to understand!

justinwilkinson

Great video Zach, awesome content, I learnt a lot. Can you please make a video or share some content about why we should avoid shuffling, common shuffle issues, and ways to fix them?

rohitdeshmukh
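
One of the most common shuffle fixes is to broadcast the small side of a join so the large side never has to be repartitioned on the join key. A minimal PySpark sketch of that idea, assuming a large fact table and a small lookup table; the table and column names here are hypothetical:

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast_join_sketch").getOrCreate()

events = spark.table("bootcamp.events")          # large fact table
countries = spark.table("bootcamp.countries")    # small lookup table

# broadcast() ships the small table to every executor, so the join happens
# locally and the large table is never shuffled on country_code.
joined = events.join(broadcast(countries), on="country_code", how="left")
joined.write.mode("overwrite").saveAsTable("bootcamp.events_enriched")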

Zach, I watched this while going to the office, and I loved it. I learnt a hell of a lot of things. Thanks for it!

vivekjha

Great and insightful lessons Zach, just high-quality content! Your community of loyal DEs is growing :) Keep it up!

theloniusmonkey

Awesome video man! Just discovered your channel and excited to see more like this

andydataguy

Need more of these videos, beginner friendly 💡

muhammadzakiahmad

Thanks Zach, the practical you showed helped me learn a lot. Can you please tell me: if I do daily sorted inserts into my Iceberg table from my OLTP system using an ETL pipeline, will Iceberg treat each insert on its own and compress and store it that way, or will it also look at common columns in the existing data files and compress across them?

ManishJindalmanisism
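
On the question above: each insert commit writes new Parquet data files, and the run-length/dictionary encoding happens only within those new files; Iceberg does not go back and re-sort or re-compress existing data files on insert. To get the sort benefit across many daily inserts, you run a periodic compaction. A minimal sketch, assuming Spark with the Iceberg extensions and a catalog named demo; table and column names are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg_compaction_sketch").getOrCreate()

# Daily sorted insert: only the files written by this commit benefit from the sort.
spark.sql("""
    INSERT INTO demo.db.events
    SELECT * FROM staging.daily_events
    ORDER BY event_date, country, user_id
""")

# Periodic compaction that rewrites small files and re-sorts data across commits,
# using Iceberg's rewrite_data_files procedure with the 'sort' strategy.
spark.sql("""
    CALL demo.system.rewrite_data_files(
        table => 'db.events',
        strategy => 'sort',
        sort_order => 'event_date, country, user_id'
    )
""")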

Wow - I learned so much from this video - Amazing! Thank you for sharing.

qculryq

Amazing class Zach! Keep going, thanks!

murilloandradef

It's a great video Zach, thoroughly enjoyed it.

vivekjha

@zach Thanks for this informative video. I have one question. You mentioned sorting the data on low-cardinality columns first and then moving towards high-cardinality ones for better RLE, which makes sense for getting more compressed data. But on the read side, taking Iceberg as an example, we generally filter on high-cardinality columns, and so we want to sort on those columns so that predicate pushdown reads a much smaller subset of the data. These two settings seem to contradict each other: one gives us smaller data on disk, while the other pushes us to sort on high-cardinality columns to read less.

atifiu
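
One way to frame the trade-off in the question above: put the low-cardinality columns first so run-length encoding collapses them into long runs, and put the high-cardinality filter column last in the sort order; within each run its values are still sorted, so file- and row-group-level min/max stats can still prune reads, just less aggressively than if it were the leading sort key. A minimal sketch of declaring such a write order on an Iceberg table, assuming Spark with the Iceberg SQL extensions and a catalog named demo; table and column names are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg_sort_order_sketch").getOrCreate()

# Low-cardinality columns first (better RLE compression); the high-cardinality
# filter column last (its min/max ranges per file still narrow within each run,
# which helps predicate pushdown on reads).
spark.sql("""
    ALTER TABLE demo.db.events
    WRITE ORDERED BY country, device_type, user_id
""")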

Wow, amazing content Zach.
Thank you so much!

srinubathina

Wow, the way people push VC is creative now. Good video.

JP-zzql

I have a question: during the whole video you've been dealing with historical data and moving it. What about newly received data, how do you deal with it? Do you insert it into some other table and then update your Iceberg table using cron jobs, or do you insert it directly into Iceberg, and how?

LMGaming
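
On the ingestion question above: Iceberg commits are atomic, so a scheduled batch job can append or merge new records straight into the table without a side table. A minimal sketch, assuming Spark with the Iceberg extensions, a catalog named demo, and a landing path in S3; all names are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg_incremental_load_sketch").getOrCreate()

# Read the newly arrived batch from a landing area.
new_batch = spark.read.parquet("s3://landing-bucket/events/2024-01-01/")
new_batch.createOrReplaceTempView("new_events")

# Upsert the batch directly into the Iceberg table: matched keys are updated,
# everything else is inserted, all in one atomic commit.
spark.sql("""
    MERGE INTO demo.db.events t
    USING new_events s
    ON t.event_id = s.event_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")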

Hello Zach, thanks for the content. After May, when is the next bootcamp?

pauladataanalyst

Casually ending the gender debate 😂 good video sir! Very informative

zwartepeat

This is amazing. You are a fabulous teacher. Had a question on replication: is the replication factor not a requirement any more in modern cloud data lakes?

thoughtfulsd

The tables you are using for your sources... Are those Iceberg tables, which are really just files and folders in S3 under the hood, placed there before the training? I'm just confused about where the raw data is coming from and what it looks like.

YEM_
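
On the question above: an Iceberg table really is just metadata files plus Parquet data files sitting in object storage, and you can inspect those underlying files through the table's metadata tables. A minimal sketch, assuming Spark with the Iceberg extensions and a catalog named demo; the table name is hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg_files_sketch").getOrCreate()

# Each row is one Parquet data file: its s3:// path, row count, and size.
spark.sql("""
    SELECT file_path, record_count, file_size_in_bytes
    FROM demo.db.events.files
""").show(truncate=False)

# The snapshots metadata table shows the commits that produced those files.
spark.sql("""
    SELECT snapshot_id, committed_at, operation
    FROM demo.db.events.snapshots
""").show()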