How to use PySpark DataFrame API? | DataFrame Operations on Spark

In this tutorial we continue with PySpark. In the previous session we covered the setup, learned the basics of PySpark, and explored a few of the features it offers, such as the DataFrame API and Spark SQL. In this session we will explore these features further before we dive into building data pipelines with PySpark (the Spark API).
Spark is a distributed engine designed for processing large amounts of data, and it scales beyond a single machine. If you run into pandas out-of-memory errors because of your data size, it is time to explore Spark: it is built for large datasets and is the engine behind AWS Glue.
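
As a quick orientation, here is a minimal sketch of starting a local SparkSession and creating a DataFrame. The app name and the toy rows are illustrative assumptions, not taken from the video:

```python
from pyspark.sql import SparkSession

# Start (or reuse) a local SparkSession; "pyspark-intro" is a placeholder name.
spark = SparkSession.builder.appName("pyspark-intro").getOrCreate()

# A toy DataFrame standing in for a dataset too large for pandas.
df = spark.createDataFrame([("alice", 34), ("bob", 45)], ["name", "age"])
df.show()
```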


#apachespark #pyspark #dataframe

Topics covered in this video (a consolidated code sketch of these operations follows the list):
0:00 - Introduction to PySpark
0:28 - Spark in current context of Data
1:16 - Spark DataFrame API
2:22 - Jupyter Notebook
3:00 - Read Data from Database
4:06 - DataFrame API Operations - Rename and Select
4:35 - Sort DataFrame
5:14 - Filter Operation in DataFrame and Spark SQL
7:40 - DataFrame & SQL Join & Aggregate Operation
9:22 - Create new Columns based on condition
11:06 - Replace Null & Drop Columns
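
The sketch below strings the listed operations together in one place. It is a hedged illustration: the JDBC URL, credentials, table names (orders, customers), and column names are all hypothetical, not taken from the video, and the JDBC read assumes the PostgreSQL driver jar is on Spark's classpath. Note that Spark's built-in relational source is JDBC.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dataframe-ops").getOrCreate()

# Read Data from Database: Spark's built-in relational source is JDBC.
# URL, tables, and credentials here are hypothetical placeholders.
jdbc_url = "jdbc:postgresql://localhost:5432/shop"
orders = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "orders")
    .option("user", "spark")
    .option("password", "secret")
    .load()
)
customers = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "customers")
    .option("user", "spark")
    .option("password", "secret")
    .load()
)

# Rename and Select.
orders = orders.withColumnRenamed("cust_id", "customer_id")
slim = orders.select("order_id", "customer_id", "amount")

# Sort the DataFrame by amount, descending.
slim.orderBy(F.col("amount").desc()).show(5)

# Filter, first with the DataFrame API, then with Spark SQL on a temp view.
big = slim.filter(F.col("amount") > 100)
slim.createOrReplaceTempView("orders_v")
big_sql = spark.sql("SELECT * FROM orders_v WHERE amount > 100")

# Join and Aggregate: total and count of orders per customer.
joined = slim.join(customers, on="customer_id", how="inner")
per_customer = joined.groupBy("customer_id").agg(
    F.sum("amount").alias("total_amount"),
    F.count("order_id").alias("order_count"),
)

# Create a new column based on a condition.
flagged = per_customer.withColumn(
    "tier",
    F.when(F.col("total_amount") > 1000, "gold").otherwise("standard"),
)

# Replace nulls and drop a column.
cleaned = flagged.fillna({"total_amount": 0}).drop("order_count")
cleaned.show()
```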
Comments
Great video! Does the database connection always need to be JDBC, or can it be ODBC as well?

breadandcheese
Hi, thanks for the session on this. I'm trying to create a dashboard on the Spark UI using PySpark. Is that possible?

yuvan
Thank you for your videos. It would be nice if you could also show Kafka implementation examples.

nbkurup