Using Fabric notebooks (pySpark) to clean and transform real-world JSON data


#dataengineering #python #microsoftfabric

In this video, I use Microsoft Fabric's data engineering experience, specifically the Synapse Data Engineering notebooks (pySpark engine), to read a JSON file from our Lakehouse Files area, parse the JSON structure, clean the data a bit, transform some of the columns, and then LOAD the data into a nice Lakehouse table.
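For reference, here is a minimal sketch of what that kind of notebook pipeline can look like. This is not the exact code from the video: the file path, column names and table name are placeholders, loosely modelled on an OpenWeatherMap-style payload, and `spark` is the session that Fabric notebooks provide automatically.

```python
from pyspark.sql import functions as F

# Read a raw JSON file from the Lakehouse Files area
# (path and structure are illustrative, not the video's exact file).
raw_df = spark.read.option("multiline", "true").json(
    "Files/weather/2024/01/01/weather.json"
)

# Parse the nested JSON structure into flat, well-named columns.
parsed_df = raw_df.select(
    F.col("main.temp").alias("temp"),
    F.col("main.humidity").alias("humidity"),
    F.col("dt").alias("unix_timestamp"),
)

# Load the cleaned dataframe into a Lakehouse (Delta) table.
parsed_df.write.mode("append").format("delta").saveAsTable("weather")
```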

Upcoming parts of the series include:
- Data validation of pySpark dataframes in Fabric using Great Expectations (see the sketch after this list)
- Visualising the weather data in Power BI.
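The Great Expectations part is covered in the next video, but as a rough preview, here is a minimal sketch using the legacy `SparkDFDataset` wrapper (removed in recent Great Expectations versions). The dataframe and column names are assumptions carried over from the sketch above, not code from the series.

```python
from great_expectations.dataset import SparkDFDataset

# Wrap an existing pySpark dataframe (parsed_df from the earlier sketch).
validated_df = SparkDFDataset(parsed_df)

# Declare a few simple expectations (column names are illustrative).
validated_df.expect_column_values_to_not_be_null("temp")
validated_df.expect_column_values_to_be_between(
    "humidity", min_value=0, max_value=100
)

# Run all declared expectations and check the overall result.
results = validated_df.validate()
print(results.success)
```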

--LINK TO OPENWEATHERMAP--

--OTHER VIDEOS YOU MIGHT LIKE--

--TIMELINE--
0:00 Intro
1:09 Recap on the Lakehouse Files
1:40 Intro to notebook structure
2:12 Reading JSON into pySpark dataframe
4:07 Exploring the data in Azure Storage Explorer & VS Code
5:28 Parsing the JSON
8:52 Datetime conversion
9:55 Calculated columns
11:25 Rounding numbers
12:37 Code refactoring
15: Load dataframe to Lakehouse table
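To make the transformation steps in the timeline more concrete, here is a hedged sketch of the datetime conversion, calculated-column and rounding steps, continuing from the parsed_df sketch earlier. The column names and the Celsius-to-Fahrenheit formula are assumptions for illustration, not necessarily what the video uses.

```python
from pyspark.sql import functions as F

# Datetime conversion: turn a unix epoch column into a proper timestamp.
df = parsed_df.withColumn(
    "observed_at", F.from_unixtime("unix_timestamp").cast("timestamp")
)

# Calculated column: e.g. derive Fahrenheit from a Celsius temperature.
df = df.withColumn("temp_f", F.col("temp") * 9 / 5 + 32)

# Rounding: trim noisy floating-point values to two decimal places.
df = df.withColumn("temp_f", F.round("temp_f", 2))
```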
--LINKEDIN--

--ABOUT WILL--
Hi, I'm Will! I'm hugely passionate about data and using it to create a better world. I currently work as a Consultant, focusing on Data Strategy, Data Engineering and Business Intelligence (within the Microsoft/Azure/Fabric environment). I have previously worked as a Data Scientist. I started Learn Microsoft Fabric to share my learnings on how Microsoft Fabric works and to help you build your career and create meaningful things in Fabric.

--SUBSCRIBE--
Not subscribed yet? You should! There are lots of new videos in the pipeline covering all aspects of Microsoft Fabric.

Comments

Hi Will. Really good material. Keep going!!! Congratulations!!!

woliveiras

I really like your videos; they are quite simple, short, and things are well explained.

josuedegbun

Awesome video!!!!
I understand the Data Factory pipeline runs daily and loads the daily JSON file into the Lakehouse folders.

Then the notebook code extracts the data, transforms it, and loads it into the table, appending daily.

How is the notebook executed daily?

Thank you!

AmritaOSullivan

Hi Will. Thanks for your tutorials! A very smooth learning experience. Do you have sample code for how to loop through YYYY/MM/DD folders and read and load the files incrementally? Also, have you shared your tutorial notebooks on your GitHub by chance? I only see some older notebooks there.

gvasvas

Hey @LearnMicrosoftFabric, why do you prefer Azure Storage Explorer over OneLake File Explorer?

pphong

Hi Will, in Azure Data Factory, transformations were handled with no-code drag-and-drop components. But in the Fabric version, there are Power Query-like transformations and notebooks. Are these the only options for transformation inside Fabric? Thanks

HasanCatalgol

Another question: currently the JSON file path is hard-coded in the code. How can that be made dynamic? Thanks!!!

AmritaOSullivan