Ingest data with Spark and Microsoft Fabric notebooks | Lab 10

Discover how to use Apache Spark and Python to ingest data into a Microsoft Fabric lakehouse. Fabric notebooks provide a scalable, repeatable way to do it.
In this lab, you’ll create a Microsoft Fabric notebook and use PySpark to connect to an Azure Blob Storage path, then load the data into a lakehouse using write optimizations.
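As a taste of the first step, here is a minimal PySpark sketch of reading external data over a wasbs:// path; the account, container, and folder names are placeholders, not necessarily the ones used in the video:

# Build a wasbs:// URL for the source container. All names below are
# illustrative placeholders.
blob_account = "yourstorageaccount"
blob_container = "yourcontainer"
blob_path = "data/parquet"
wasbs_path = f"wasbs://{blob_container}@{blob_account}.blob.core.windows.net/{blob_path}"

# Fabric notebooks expose a ready-made `spark` session; read the files
# into a DataFrame and peek at a few rows.
df = spark.read.parquet(wasbs_path)
display(df.limit(10))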
For this experience, you’ll build the code across several notebook cells rather than one, which may not be how you’d structure it in your own environment; it does, however, make step-by-step debugging easier.
Because you’re also working with a small sample dataset, the optimization gains won’t match what you’d see in production at scale; you can still observe an improvement, though, and when every millisecond counts, optimization is key.
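The write optimizations shown in the lab are typically switched on through Spark session settings. A minimal sketch, assuming the configuration keys documented for Fabric at the time of writing (verify them against your runtime):

# Merge many small output files into fewer, larger ones at write time.
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")

# V-Order applies extra sorting and encoding to the Parquet files so
# Fabric engines can read them faster.
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")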

The video is based on the Microsoft lab that supports the "Ingest data with Spark and Microsoft Fabric notebooks" module on Microsoft Learn (see links below).

What you'll find in this video:
00:00 Intro
00:57 Lakehouse destination
02:43 Create notebook and load external data
10:54 Transform and load data to a Delta table
17:48 Analyze Delta table data with SQL queries
20:06 Summary
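For reference, the transform-and-load chapter (10:54) boils down to a pattern like the one below; the column names and table name are illustrative assumptions about the sample data, not a transcript of the video:

from pyspark.sql.functions import col, year, month

# Derive partition columns from a timestamp column (the column name is
# an assumption; adjust it to your schema).
transformed_df = (
    df.withColumn("year", year(col("tpep_pickup_datetime")))
      .withColumn("month", month(col("tpep_pickup_datetime")))
)

# Save as a managed Delta table in the lakehouse, partitioned so later
# queries can prune whole year/month folders instead of scanning everything.
(transformed_df.write
    .mode("overwrite")
    .partitionBy("year", "month")
    .format("delta")
    .saveAsTable("yellow_taxi"))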
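Likewise, the SQL analysis chapter (17:48) amounts to querying the Delta table; this assumes the hypothetical yellow_taxi table from the sketch above:

# Query the Delta table with Spark SQL; the same statement also works
# in a %%sql notebook cell.
result = spark.sql("""
    SELECT year, month, COUNT(*) AS trips
    FROM yellow_taxi
    GROUP BY year, month
    ORDER BY year, month
""")
display(result)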

*** Useful links: ***

*** Socials: ***

*** Hungry for more? Learn with me ***

If you're reading this, please leave a like and comment for the algorithm.
Comments

Hi Kamil,

Thanks for the video. I have a requirement where my data source is on a cloud service, and I can currently only connect to it through a gateway machine with ODBC installed, from which I extract the data into Fabric.

Is there a way to connect to the cloud service directly from a notebook, instead of going through the gateway?

Kindly advise.

hmhkh