Master Databricks and Apache Spark Step by Step: Lesson 21 - PySpark Using RDDs

In this video, we use PySpark to analyze data with Resilient Distributed Datasets (RDDs). RDDs are the foundation of Spark. You learn what RDDs are, what lazy evaluation is and why it matters, and how to use transformations and actions. Everything is demonstrated using a Databricks notebook.
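To give a flavor of the pattern the lesson demonstrates, here is a minimal PySpark sketch, assuming a Databricks notebook where the SparkContext is already available as sc:

# Create an RDD from a Python list.
numbers = sc.parallelize([1, 2, 3, 4, 5, 6])

# Transformations are lazy: these lines only build the lineage, nothing runs yet.
squares = numbers.map(lambda x: x * x)
evens = squares.filter(lambda x: x % 2 == 0)

# Actions trigger evaluation of the whole lineage on the cluster.
print(evens.collect())   # [4, 16, 36]
print(evens.count())     # 3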

Video slides and code at:

Apache Spark Transformations Docs

Apache Spark Actions Docs

Apache Spark RDD

For information on how to upload files to Databricks see:
Comments

Can't wait for more of your videos on PySpark!

anthonygonsalvis

Bryan, thanks for the series, but I was expecting more explanation of parallelize, partitions, etc., which seem to be the very purpose of using Spark. Many training videos just explain the PySpark code for reading and parsing DataFrames, but how do you really parallelize big data? What are partitions, and how do you partition? Can you please explain these more?

Raaj_ML
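For readers with the same question, here is a rough sketch of parallelize and partitions, assuming the Databricks-provided sc (the data and partition counts are just illustrative):

# parallelize splits a local collection into partitions that executors process in parallel.
rdd = sc.parallelize(range(1_000_000), numSlices=8)
print(rdd.getNumPartitions())   # 8

# Each partition is handled by one task; here each task computes a partial sum.
partial_sums = rdd.mapPartitions(lambda part: [sum(part)])
print(partial_sums.collect())   # eight partial sums, one per partition

# repartition(n) reshuffles into n partitions; coalesce(n) reduces partitions without a full shuffle.
print(rdd.repartition(16).getNumPartitions())   # 16
print(rdd.coalesce(4).getNumPartitions())       # 4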

Golden content and a grand series!
Quick question: what is the difference between a plain SQL statement and PySpark's spark.sql statement? Both seem to launch Spark jobs when executed in Databricks. Would they both leverage distributed computing?

annukumar
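For readers with the same question: a %sql cell and spark.sql() both compile to the same distributed Spark execution plans, so both leverage the cluster. A minimal sketch, using a hypothetical sales table and columns:

# In a %sql cell you would write:
#   SELECT region, SUM(amount) AS total FROM sales GROUP BY region

# The same query from Python via spark.sql() returns a DataFrame.
df = spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region")
df.show()

# The DataFrame API produces an equivalent execution plan.
from pyspark.sql import functions as F
spark.table("sales").groupBy("region").agg(F.sum("amount").alias("total")).show()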

Hey Bryan, thank you so much for this series. I have a question: what's the difference between a SparkSession and a SparkContext?

itsshehri
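For readers with the same question: the SparkSession is the entry point for the DataFrame and SQL APIs, and it wraps a SparkContext, the lower-level entry point used for RDDs. A minimal sketch (in Databricks both are already created for you as spark and sc):

from pyspark.sql import SparkSession

# Outside Databricks you build the session yourself; getOrCreate() reuses an existing one.
spark = SparkSession.builder.appName("rdd-lesson").getOrCreate()
sc = spark.sparkContext   # the SparkContext lives inside the SparkSession

rdd = sc.parallelize([("a", 1), ("b", 2)])          # RDD API goes through the SparkContext
df = spark.createDataFrame(rdd, ["key", "value"])   # DataFrame API goes through the SparkSession
df.show()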