Train Machine Learning Model with SparkML (...and Python) | Hands-on tutorial

Building and training a Machine Learning (#ML) model with Spark is not hard. In this tutorial we build a simple binary classification model with Spark, using its built-in Logistic Regression algorithm, and then evaluate it by getting performance metrics from the model.

There are some differences from how we do it in Scikit-Learn. Spark provides a built-in SparkML engine with a rich #SparkML API, which you can leverage to build your own Machine Learning model.

In this tutorial we are using SparkUI v.3.2.1 with pyspark-shell.

The critical points to pay attention to are:
- Datatypes (DTypes)
- String Indexer and One-Hot-Encoding for categorical features.
- Vector Assembler.
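
To make the three points above more concrete, here is a minimal plain-Python sketch of what indexing, one-hot encoding, and vector assembling do to a row. It is not the Spark API: the helper names (`fit_string_index`, `one_hot`, `assemble`) and the toy `cities`/`ages` data are made up for illustration. It assumes the behaviour of Spark's defaults as I understand them, where StringIndexer gives the most frequent category index 0 and One-Hot-Encoding drops the last category.

```python
from collections import Counter


def fit_string_index(values):
    """Map each category to an index, most frequent first (ties alphabetical)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {category: float(i) for i, category in enumerate(ordered)}


def one_hot(index, num_categories, drop_last=True):
    """Encode an index as a 0/1 list; with drop_last the final category is all zeros."""
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if int(index) < size:
        vec[int(index)] = 1.0
    return vec


def assemble(*parts):
    """Concatenate plain numbers and lists into one flat feature vector."""
    features = []
    for part in parts:
        if isinstance(part, (list, tuple)):
            features.extend(float(x) for x in part)
        else:
            features.append(float(part))
    return features


# Hypothetical toy data: a categorical "city" column and a numeric "age" column.
cities = ["Vilnius", "Kaunas", "Vilnius", "Riga", "Vilnius", "Kaunas"]
ages = [34, 51, 29, 42, 38, 45]

index_of = fit_string_index(cities)
rows = [
    assemble(age, one_hot(index_of[city], len(index_of)))
    for city, age in zip(cities, ages)
]

for city, features in zip(cities, rows):
    print(city, features)
```

The datatype matters here because the model only accepts a single numeric vector per row: string columns have to be indexed and encoded first, and numeric columns must already be numeric before they are assembled.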

All these parts are explained and demonstrated in detail in this tutorial. You will also learn what SparkContext and SparkSession are and how they differ. You will then be able to check the data schema, handle data types in a Spark DataFrame, and select features from your data. As required for ML modelling, you will also learn how to split your data into train and test sets.
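
As a rough illustration of the train/test split idea, here is a plain-Python sketch (not the Spark API). In Spark this is usually done with `DataFrame.randomSplit`, which takes weights and an optional seed; the `random_split` helper, the 80/20 fraction, and the seed below are assumptions made for this example. The point it shows is that a random per-row split gives approximate sizes, not exact ones.

```python
import random


def random_split(rows, train_fraction=0.8, seed=42):
    """Send each row to train or test by an independent random draw."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for row in rows:
        if rng.random() < train_fraction:
            train.append(row)
        else:
            test.append(row)
    return train, test


# Hypothetical data: 1000 row ids standing in for DataFrame rows.
rows = list(range(1000))
train, test = random_split(rows)

print(len(train), len(test))
```

Fixing the seed is worth doing in a notebook, so that re-running a cell does not silently change which rows the model is trained on.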

You will also learn how to set up ML stages with Spark and build a custom ML Pipeline to train your Machine Learning model.

At the end, you will learn how to get model performance metrics, such as Precision, Recall, or ROC curve values.
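
For reference, here is a small plain-Python sketch of what Precision and Recall measure for a binary classifier. It does not use Spark's own metrics; the function names and the toy `labels`/`predictions` lists are made up for illustration.

```python
def confusion_counts(labels, predictions):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(labels, predictions):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    tp, fp, fn, _ = confusion_counts(labels, predictions)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical true labels and model predictions for eight test rows.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall = precision_recall(labels, predictions)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A ROC curve extends the same idea: it plots the true positive rate against the false positive rate as the decision threshold on the predicted probability changes.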

The tutorial is prepared in a Jupyter Notebook using the Python programming language, so all the steps are executed with #pyspark.

The content of the video:
0:00 - Intro
0:32 - Start of Hands-on with Jupyter Notebook
0:46 - 1. Import main dependencies for Spark and Python
1:14 - Theory: Spark Session vs. Spark Context
3:10 - 1. Continuing importing dependencies
3:28 - 2. Load External CSV data to Spark (as Spark DataFrame)
5:40 - 3. Train and Test splits
6:39 - 4. Check Data Types
8:27 - 5. One-Hot-Encoding with Spark
10:07 - Theory: StringIndexer and One-Hot-Encoder
11:01 - 5. Continuing with StringIndexer hands-on
12:19 - 6. Vector Assembling
12:55 - Theory: Vector Assembling in Spark
13:53 - 6. Continuing with Vector Assembling
15:24 - 7. Make Spark ML Pipeline
18:31 - 8. Train ML Model with Spark
20:07 - 9. Get Model Performance Metrics

Spark API and SparkML API methods used in the tutorial (incl. documentation):

Thank you for watching!

Please subscribe to this channel, @DataScienceGarage, to get more high-quality videos about #DataScience, #Python, #AI, #MachineLearning, #DeepLearning and much more!
Comments

Omg, thank you so much for uploading this video. 💯
This will help me a lot in preparing for my final exam.

ChiNguyen-dzcl

Thank you for watching this video. I appreciate the time you spent on it and hope it was worth it.
If so, please subscribe to the channel and you will get more HQ videos on Spark and related Data Science topics in the future.

Also, here are some other videos from @DataScienceGarage that you may like:

DataScienceGarage