Using Fabric notebooks (pySpark) to clean and transform real-world JSON data


#dataengineering #python #microsoftfabric

In this video, I use Microsoft Fabric's data engineering experience, specifically the Synapse Data Engineering notebooks (pySpark engine), to read a JSON file from our Lakehouse Files area, parse the JSON structure, clean the data a bit, transform some of the columns, and then LOAD the data into a nice Lakehouse table.
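For reference, here is a minimal sketch of what that kind of notebook pipeline can look like. This is not the exact code from the video: the file path, column names and table name are placeholders, loosely modelled on an OpenWeatherMap-style payload, and `spark` is the session that Fabric notebooks provide automatically.

```python
from pyspark.sql import functions as F

# Read a raw JSON file from the Lakehouse Files area
# (path and structure are illustrative, not the video's exact file).
raw_df = spark.read.option("multiline", "true").json(
    "Files/weather/2024/01/01/weather.json"
)

# Parse the nested JSON structure into flat, well-named columns.
parsed_df = raw_df.select(
    F.col("main.temp").alias("temp"),
    F.col("main.humidity").alias("humidity"),
    F.col("dt").alias("unix_timestamp"),
)

# Load the cleaned dataframe into a Lakehouse (Delta) table.
parsed_df.write.mode("append").format("delta").saveAsTable("weather")
```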

Upcoming parts of the series include:
- Data validation of pySpark dataframes in Fabric using Great Expectations (see the sketch after this list)
- Visualising the weather data in Power BI.
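The Great Expectations part is covered in the next video, but as a rough preview, here is a minimal sketch using the legacy `SparkDFDataset` wrapper (removed in recent Great Expectations versions). The dataframe and column names are assumptions carried over from the sketch above, not code from the series.

```python
from great_expectations.dataset import SparkDFDataset

# Wrap an existing pySpark dataframe (parsed_df from the earlier sketch).
validated_df = SparkDFDataset(parsed_df)

# Declare a few simple expectations (column names are illustrative).
validated_df.expect_column_values_to_not_be_null("temp")
validated_df.expect_column_values_to_be_between(
    "humidity", min_value=0, max_value=100
)

# Run all declared expectations and check the overall result.
results = validated_df.validate()
print(results.success)
```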

--LINK TO OPENWEATHERMAP--

--OTHER VIDEOS YOU MIGHT LIKE--

--TIMELINE--
0:00 Intro
1:09 Recap on the Lakehouse Files
1:40 Intro to notebook structure
2:12 Reading JSON into pySpark dataframe
4:07 Exploring the data in Azure Storage Explorer & VS Code
5:28 Parsing the JSON
8:52 Datetime conversion
9:55 Calculated columns
11:25 Rounding numbers
12:37 Code refactoring
15: Load dataframe to Lakehouse table
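To make the transformation steps in the timeline more concrete, here is a hedged sketch of the datetime conversion, calculated-column and rounding steps, continuing from the parsed_df sketch earlier. The column names and the Celsius-to-Fahrenheit formula are assumptions for illustration, not necessarily what the video uses.

```python
from pyspark.sql import functions as F

# Datetime conversion: turn a unix epoch column into a proper timestamp.
df = parsed_df.withColumn(
    "observed_at", F.from_unixtime("unix_timestamp").cast("timestamp")
)

# Calculated column: e.g. derive Fahrenheit from a Celsius temperature.
df = df.withColumn("temp_f", F.col("temp") * 9 / 5 + 32)

# Rounding: trim noisy floating-point values to two decimal places.
df = df.withColumn("temp_f", F.round("temp_f", 2))
```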
--LINKEDIN--

--ABOUT WILL--
Hi, I'm Will! I'm hugely passionate about data and using it to create a better world. I currently work as a Consultant, focusing on Data Strategy, Data Engineering and Business Intelligence (within the Microsoft/Azure/Fabric environment). I have previously worked as a Data Scientist. I started Learn Microsoft Fabric to share my learnings on how Microsoft Fabric works and to help you build your career and create meaningful things in Fabric.

--SUBSCRIBE--
Not subscribed yet? You should! There are lots of new videos in the pipeline covering all aspects of Microsoft Fabric.

Comments

Hi Will. Really good material. Keep going!!! Congratulations!!!

woliveiras

I really like your videos; they are quite simple, short, and things are well explained.

josuedegbun

Awesome video!!!!
I understand the Data Factory pipeline runs daily and loads the daily JSON file into the Lakehouse folders.

Then the notebook code extracts the data, transforms it, and loads it into the table, appending daily.

How is the notebook executed daily?

Thank you!

AmritaOSullivan

Hi Will. Thanks for your tutorials! A very smooth learning experience. Do you have sample code for how to loop through YYYY/MM/DD folders and read and load the files incrementally? Also, have you shared your tutorial notebooks on your GitHub by chance? I only see some older notebooks there.

gvasvas

Hey @LearnMicrosoftFabric, why do you prefer Azure Storage Explorer over OneLake File Explorer?

pphong

Hi Will, in Azure Data Factory, transformations were handled with no-code drag-and-drop components. But in the Fabric version, there are Power Query-like transformations and notebooks. Are these the only options for transformation inside Fabric? Thanks

HasanCatalgol

Another question: currently the JSON file path is hard-coded in the code. How can that be made dynamic? Thanks!!!

AmritaOSullivan