Spark ETL with Lakehouse | Apache Iceberg

In this video, we discuss Spark ETL with Lakehouse | Apache Iceberg.

Medium Blog:

GitHub Repo:

Blogs Link:

YouTube Playlist:

We will discuss Spark ETL pipelines with all the different types of sources. Detailed plan (a minimal Iceberg sketch follows the list):

0. Chapter0 - Spark ETL with Files (JSON | Parquet | CSV | ORC | Avro)
1. Chapter1 - Spark ETL with SQL Database (MySQL | PostgreSQL)
2. Chapter2 - Spark ETL with NoSQL Database (MongoDB)
3. Chapter3 - Spark ETL with Azure (Blob | ADLS)
4. Chapter4 - Spark ETL with AWS (S3 bucket)
5. Chapter5 - Spark ETL with Hive tables
6. Chapter6 - Spark ETL with APIs
7. Chapter7 - Spark ETL with Lakehouse (Delta)
8. Chapter8 - Spark ETL with Lakehouse (Hudi)
9. Chapter9 - Spark ETL with Lakehouse (Apache Iceberg)
10. Chapter10 - Spark ETL with Lakehouse (Delta vs Iceberg vs Hudi)
11. Chapter11 - Spark ETL with Lakehouse (Delta table Optimization)
12. Chapter12 - Spark ETL with Lakehouse (Apache Kafka)
13. Chapter13 - Spark ETL with GCP (BigQuery)
14. Chapter14 - Spark ETL with Hadoop (Apache Sqoop)
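
Since this video covers Chapter9, here is a minimal PySpark sketch of the Iceberg ETL pattern. It is an illustration, not the exact code from the video: the input path, column names, and the catalog/table names ("local", "db.orders") are placeholder assumptions, and it presumes a SparkSession spark already configured with an Iceberg catalog named "local".

# Minimal Iceberg ETL sketch; the path, columns, and catalog/table names
# ("local", "db.orders") are placeholders, and `spark` is assumed to be a
# SparkSession already configured with an Iceberg catalog named "local".
from pyspark.sql import functions as F

# Extract: read raw CSV (any of the sources listed above works the same way)
raw = spark.read.option("header", "true").csv("/data/raw/orders")

# Transform: deduplicate and clean up a numeric column
clean = (raw
         .dropDuplicates(["order_id"])
         .withColumn("amount", F.col("amount").cast("double"))
         .filter(F.col("amount") > 0))

# Load: write to an Iceberg table via the DataFrameWriterV2 API
clean.writeTo("local.db.orders").using("iceberg").createOrReplace()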
Comments

hey buddy

please don't stop making videos. Continue consistently and your channel will really grow with time. You have good content.

SMUser-sjdc

Very nice, mate. Can you please also mention how you installed Iceberg? For example, which version of the JAR file and where it is placed, etc.

Sayan_Mukherjee
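
For anyone with the same question, one common way to wire Iceberg into Spark is to pull the runtime JAR at launch through spark.jars.packages rather than copying it into $SPARK_HOME/jars. The sketch below is one typical setup, not necessarily the video's: the Spark/Iceberg versions, the catalog name "local", and the warehouse path are assumptions.

# Sketch of a typical Iceberg setup; versions, the catalog name "local",
# and the warehouse path are assumptions, not the video's exact values.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-etl")
    # Fetch the Iceberg runtime JAR at launch (Spark 3.5 / Scala 2.12 build)
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # A Hadoop-type catalog named "local" backed by a filesystem warehouse
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "file:///tmp/iceberg-warehouse")
    .getOrCreate()
)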

hello, nice video. When we have many Parquet files in a data lake, what is the correct way to deal with them? Read all of them and insert the rows into an Iceberg table, or just link an Iceberg table to all the Parquet files?

ufiepte
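
Both options work: rewriting the Parquet data into an Iceberg table gives Iceberg full control over the file layout, while Iceberg's add_files procedure registers the existing Parquet files in place without rewriting them. Below is a sketch of the second approach; the catalog name "local", the table "db.events", its schema, and the path are placeholder assumptions.

# Sketch: register existing Parquet files into an Iceberg table without
# rewriting them, using Iceberg's add_files Spark procedure. The catalog
# name "local", table "db.events", and the path are assumptions; `spark`
# is a session configured as in the setup sketch above.

# The target table must exist with a matching schema first
spark.sql("""
    CREATE TABLE IF NOT EXISTS local.db.events (id BIGINT, payload STRING)
    USING iceberg
""")

# add_files imports file metadata; the Parquet data stays where it is
spark.sql("""
    CALL local.system.add_files(
        table => 'db.events',
        source_table => '`parquet`.`/data/lake/events`'
    )
""")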

Iceberg is not a file format, it is a table format. The underlying files can still be in Parquet format. Please don't give incorrect information.

BlueberretRam
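
Right, and the distinction is easy to demonstrate from Iceberg's metadata tables, which list the data files a table tracks along with their format. A sketch, reusing the assumed local.db.orders table from the example above:

# Sketch: list the data files Iceberg tracks for a table; file_format is
# typically PARQUET (or ORC/Avro), showing Iceberg is the table format on
# top. The table name local.db.orders is the assumed example from above.
spark.sql("""
    SELECT file_path, file_format
    FROM local.db.orders.files
""").show(truncate=False)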