How to read csv file in PySpark dataframe | Read csv in Google colab using pyspark example code

#pyspark #googlecolab #pandas #jupyternotebook #databricks

The same code is enough to read a CSV file in Databricks using PySpark, and it works in Jupyter Notebook as well. The .csv file name and path can vary per user.

There are other methods in PySpark for reading CSV files, but in this video I demonstrate the most basic and simple PySpark commands. The same task can also be done in Python with the Pandas library; reading a CSV file in Google Colab with Pandas needs a few minor changes, and I will cover that topic in a separate video.

• PySpark code to create the SparkSession used for reading the CSV file

spark = SparkSession.builder.master("master_name").appName("app_name").getOrCreate()

Jump directly to the particular topic using below Timestamps:

0:00 - Introduction
0:57 - How to create SparkSession
2:44 - How to read csv in dataframe
3:50 - import file in google colab
4:47 - Copy csv file path
5:15 - Display dataframe created from csv
6:21 - PySpark dataframe schema

I am using the exact code from the read-CSV PySpark example shown in this video.

For this particular example, I have used a .csv file that is already provided in the sample data folder under Google Colab's files panel. It is also possible to read a CSV file in Google Colab from your desktop, which you can find in another video on my YouTube channel @datawithvedant.
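The preloaded sample files in Colab live under /content/sample_data. As a sketch (assuming the california_housing_train.csv sample that Colab currently ships; the check degrades gracefully outside Colab):

```python
import os

# Colab preloads a few sample CSVs under /content/sample_data;
# this path exists only inside a Colab runtime
sample_path = "/content/sample_data/california_housing_train.csv"
csv_available = os.path.exists(sample_path)

if csv_available:
    # Inside Colab you would read it exactly like any other CSV:
    # df = spark.read.csv(sample_path, header=True, inferSchema=True)
    pass
```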

Moreover, you can read data from Google Drive in Colab once you mount the drive on Google Colab. A small snippet of code mounts the drive so you can access every file on your Google Drive. It is possible from the UI as well, which you can also find on my channel.
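A minimal sketch of mounting Drive (the MyDrive/data.csv path is a hypothetical example; google.colab is available only inside the Colab runtime, so this falls back to a local path elsewhere):

```python
try:
    # Available only inside a Google Colab runtime
    from google.colab import drive
    drive.mount("/content/drive")
    # After mounting, Drive files appear under /content/drive/MyDrive/
    csv_path = "/content/drive/MyDrive/data.csv"  # hypothetical file
except ImportError:
    # Outside Colab (e.g. local Jupyter), fall back to a local file
    csv_path = "data.csv"
```

Once mounted, the path can be passed to spark.read.csv() just like any local file.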