How to read CSV, JSON, PARQUET into Spark DataFrame in Microsoft Fabric (Day 5 of 30)

Learn Apache Spark in Microsoft Fabric over the 30 days of September.

Spark is the engine behind both the Data Engineering AND the Data Science experiences in Microsoft Fabric, so in September I'll be walking you through Apache Spark: what it is, why you should learn it, how to use it, and how it integrates into Microsoft Fabric.

No previous Spark knowledge is required, but some basic Python would be useful!

#pyspark #microsoftfabric #apachespark

Here's the schedule:

Timeline
0:00 In this tutorial
0:47 Exploring the dataset
1:10 Uploading File to Lakehouse
2:09 Read CSV into DataFrame
5:45 Writing DataFrame to JSON
7:15 Reading JSON into DataFrame
7:45 Writing DataFrame to Parquet
9:09 Reading multiple files into a DataFrame
10:27 Magic _metadata column
12:08 Links to further learning
12:38 Wrap up

--ABOUT WILL--
Hi, I'm Will! I'm hugely passionate about data and using it to create a better world. I currently work as a Consultant, focusing on Data Strategy, Data Engineering and Business Intelligence (within the Microsoft/Azure/Fabric environment). I have previously worked as a Data Scientist. I started Learn Microsoft Fabric to share my learnings on how Microsoft Fabric works and help you build your career and build meaningful things in Fabric.

--SUBSCRIBE--
Not subscribed yet? You should! There are lots of new videos in the pipeline covering all aspects of Microsoft Fabric.
Comments

I loved this video, thank you! I really got comfortable with the different elements of reading into a df and writing to different file types!

AmritaOSullivan

Nice explanation in this video.
Thank you

mohammadaamirkhan

Fantastic. Love your style with a step by step approach.

Just a question: how about writing to the Delta Parquet format in overwrite mode?

adilmajeed

Is there any way to locate my old notebooks in a more organized manner? While I can view all the notebooks in my personal workspace (or the workspace where I created the notebook), they are currently only sorted by name, type, owner, etc. Is this the only method available?

pphong

Is there a way to read azure sql server tables directly using apache spark?

RohanAndrewMichigan