Ingest data with Spark and Microsoft Fabric notebooks | Lab 10

Discover how to use Apache Spark and Python to ingest data into a Microsoft Fabric lakehouse. Fabric notebooks provide a scalable, repeatable way to do it.
In this lab, you’ll create a Microsoft Fabric notebook and use PySpark to connect to an Azure Blob Storage path, then load the data into a lakehouse using write optimizations.
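As a taste of the first step, here is a minimal PySpark sketch of reading external data over a wasbs:// path; the account, container, and folder names are placeholders, not necessarily the ones used in the video:

# Build a wasbs:// URL for the source container. All names below are
# illustrative placeholders.
blob_account = "yourstorageaccount"
blob_container = "yourcontainer"
blob_path = "data/parquet"
wasbs_path = f"wasbs://{blob_container}@{blob_account}.blob.core.windows.net/{blob_path}"

# Fabric notebooks expose a ready-made `spark` session; read the files
# into a DataFrame and peek at a few rows.
df = spark.read.parquet(wasbs_path)
display(df.limit(10))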
For this experience, you’ll build the code across several notebook cells rather than one, which may not be how you’d structure it in your own environment; it does, however, make step-by-step debugging easier.
Because you’re also working with a small sample dataset, the optimization gains won’t match what you’d see in production at scale; you can still observe an improvement, though, and when every millisecond counts, optimization is key.
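The write optimizations shown in the lab are typically switched on through Spark session settings. A minimal sketch, assuming the configuration keys documented for Fabric at the time of writing (verify them against your runtime):

# Merge many small output files into fewer, larger ones at write time.
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")

# V-Order applies extra sorting and encoding to the Parquet files so
# Fabric engines can read them faster.
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")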

The video is based on the Microsoft lab that supports the "Ingest data with Spark and Microsoft Fabric notebooks" module on Microsoft Learn (see links below).

What you'll find in this video:
00:00 Intro
00:57 Lakehouse destination
02:43 Create notebook and load external data
10:54 Transform and load data to a Delta table
17:48 Analyze Delta table data with SQL queries
20:06 Summary
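For reference, the transform-and-load chapter (10:54) boils down to a pattern like the one below; the column names and table name are illustrative assumptions about the sample data, not a transcript of the video:

from pyspark.sql.functions import col, year, month

# Derive partition columns from a timestamp column (the column name is
# an assumption; adjust it to your schema).
transformed_df = (
    df.withColumn("year", year(col("tpep_pickup_datetime")))
      .withColumn("month", month(col("tpep_pickup_datetime")))
)

# Save as a managed Delta table in the lakehouse, partitioned so later
# queries can prune whole year/month folders instead of scanning everything.
(transformed_df.write
    .mode("overwrite")
    .partitionBy("year", "month")
    .format("delta")
    .saveAsTable("yellow_taxi"))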
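Likewise, the SQL analysis chapter (17:48) amounts to querying the Delta table; this assumes the hypothetical yellow_taxi table from the sketch above:

# Query the Delta table with Spark SQL; the same statement also works
# in a %%sql notebook cell.
result = spark.sql("""
    SELECT year, month, COUNT(*) AS trips
    FROM yellow_taxi
    GROUP BY year, month
    ORDER BY year, month
""")
display(result)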

*** Useful links: ***

*** Socials: ***

*** Hungry for more? Learn with me ***

If you're reading this, please leave a like and comment for the algorithm.
Comments

Hi Kamil,

Thanks for the video. I have a requirement where my data source is on a cloud service, and I can currently only connect to it through a gateway machine with ODBC installed, from which I extract the data into Fabric.

Is there a way to connect to the cloud service directly from a notebook, instead of going through the gateway?

Kindly advise.

hmhkh