Using Fabric notebooks (pySpark) to clean and transform real-world JSON data
#dataengineering #python #microsoftfabric
In this video, I use Microsoft Fabric's data engineering experience - specifically a Synapse Data Engineering notebook (pySpark engine) - to read a JSON file from the Lakehouse Files area, parse the JSON structure, clean the data a bit, transform some of the columns, and then LOAD the data into a nice Lakehouse Table.
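For context, here's a minimal sketch of the kind of code the first half of the walkthrough covers (this is not the exact code from the video: the file path and the column names, which assume an OpenWeatherMap-style payload, are illustrative). A second sketch after the timeline covers the later transform-and-load steps.

```python
# Sketch: read raw JSON from the Lakehouse Files area and flatten the nested
# structure into top-level columns. In a Fabric notebook, `spark` is already
# available; the path below is a hypothetical example.
from pyspark.sql import functions as F

raw_df = (
    spark.read
    .option("multiline", "true")                    # raw API responses are often multi-line JSON
    .json("Files/weather/raw/*.json")               # hypothetical folder in the Lakehouse Files area
)

# Pull nested fields up to the top level (names assume an OpenWeatherMap
# "current weather" style payload - adjust to your actual schema).
parsed_df = (
    raw_df.select(
        F.col("name").alias("city"),
        F.col("dt").alias("observation_unix"),      # unix timestamp in seconds
        F.col("main.temp").alias("temp_kelvin"),
        F.col("main.humidity").alias("humidity_pct"),
        F.col("wind.speed").alias("wind_speed_ms"),
        F.explode("weather").alias("weather"),       # `weather` is an array of structs
    )
    .withColumn("weather_description", F.col("weather.description"))
    .drop("weather")
)

parsed_df.show(5, truncate=False)
```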
Next parts of the series include:
- Data validation of pySpark dataframes in Fabric using Great Expectations
- Visualising the weather data in Power BI.
--LINK TO OPENWEATHERMAP--
--OTHER VIDEOS YOU MIGHT LIKE--
--TIMELINE--
0:00 Intro
1:09 Recap on the Lakehouse Files
1:40 Intro to notebook structure
2:12 Reading JSON into pySpark dataframe
4:07 Exploring the data in Azure Storage Explorer & VS Code
5:28 Parsing the JSON
8:52 Datetime conversion
9:55 Calculated columns
11:25 Rounding numbers
12:37 Code refactoring
15: Load dataframe to Lakehouse table
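And a minimal sketch of the later timeline steps (datetime conversion, calculated columns, rounding, and the final load into a Lakehouse table). It continues from the parsing sketch above, so the column names are the same assumptions, and the table name `weather_observations` is hypothetical.

```python
# Continuing from `parsed_df` in the earlier sketch.
from pyspark.sql import functions as F

clean_df = (
    parsed_df
    # Datetime conversion: unix seconds -> proper timestamp column
    .withColumn("observation_ts", F.from_unixtime("observation_unix").cast("timestamp"))
    # Calculated column: Kelvin -> Celsius
    .withColumn("temp_celsius", F.col("temp_kelvin") - 273.15)
    # Round the noisy float columns to 2 decimal places
    .withColumn("temp_celsius", F.round("temp_celsius", 2))
    .withColumn("wind_speed_ms", F.round("wind_speed_ms", 2))
    .drop("observation_unix", "temp_kelvin")
)

# LOAD: write the cleaned dataframe as a managed Lakehouse (Delta) table
clean_df.write.mode("overwrite").format("delta").saveAsTable("weather_observations")
```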
--LINKEDIN--
--ABOUT WILL--
Hi, I'm Will! I'm hugely passionate about data and using it to create a better world. I currently work as a Consultant, focusing on Data Strategy, Data Engineering and Business Intelligence (within the Microsoft/Azure/Fabric environment). I have previously worked as a Data Scientist. I started Learn Microsoft Fabric to share what I'm learning about how Microsoft Fabric works, and to help you build your career and build meaningful things in Fabric.
--SUBSCRIBE--
Not subscribed yet? You should! There are lots of new videos in the pipeline covering all aspects of Microsoft Fabric.