Feature Selection and Dimensionality Reduction using Covariance Matrix Heatmap

This Python tutorial explains how to handle one of the most common tasks in data science and data analysis: feature selection and dimensionality reduction.
There are many ways to do it, but in this video I demonstrate one of the quickest, suitable for beginners as well as data scientists and machine learning experts.

It covers data inspection and feature selection from a pairplot (made with seaborn) and from a heatmap (seaborn and matplotlib).

To implement this solution, you need several Python modules installed, including pandas, seaborn, and matplotlib.

The content of the video:
0:09 - Introduction and some theory.
1:59 - Coding part begins. Preparing Python modules.
2:14 - Reading Dataset with Pandas.

Step #1.
2:41 - Inspecting imported dataframe (features).

Step #1.1
2:49 - Selecting Numerical and Dummy (if exists) variables from dataset.
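A minimal sketch of this selection step, assuming pandas and a small made-up DataFrame (the column names and values are hypothetical, not the dataset from the video). It relies on `select_dtypes`, which keeps numeric columns; 0/1 dummy variables are numeric, so they are kept as well.

```python
import pandas as pd

# Hypothetical toy dataset; the video's actual dataset is not reproduced here.
df = pd.DataFrame({
    "price": [250.0, 310.5, 198.0, 420.0],
    "area_m2": [54, 70, 41, 95],
    "has_garage": [0, 1, 0, 1],                              # dummy (0/1) variable
    "city": ["Vilnius", "Kaunas", "Vilnius", "Klaipeda"],    # text column
})

# Keep numeric columns only; the text column "city" is dropped.
numeric_df = df.select_dtypes(include="number")
print(list(numeric_df.columns))
```

Text columns such as `city` would need to be encoded first (for example with `pd.get_dummies`) if they should take part in the later steps.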

Step #1.2
3:21 - Generate a pairplot with Seaborn.

Step #2 and Step #2.1
3:42 - Variable selection from Covariance Matrix. Scaling features from raw dataset.
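One common way to scale features is z-score standardization. The sketch below assumes that method and uses plain pandas on hypothetical data; the video may use a different scaler.

```python
import pandas as pd

# Hypothetical numeric features; not the dataset used in the video.
numeric_df = pd.DataFrame({
    "price": [100.0, 210.0, 290.0, 405.0, 500.0],
    "area_m2": [30, 45, 60, 80, 100],
    "age_years": [40, 35, 20, 12, 5],
})

# Z-score: subtract each column's mean and divide by its sample
# standard deviation (pandas uses ddof=1 by default).
scaled_df = (numeric_df - numeric_df.mean()) / numeric_df.std()

print(scaled_df.round(2))
```

After this step every column has mean 0 and standard deviation 1, so features measured in different units become comparable.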

Step #2.2
4:05 - Generate Covariance Matrix with Matplotlib and Seaborn.
5:08 - Selecting cmap (colormap) value for heatmap from Seaborn official documentation.
6:04 - Result. Covariance Matrix showing Correlation coefficients between selected features.
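The reason a covariance matrix can show correlation coefficients is that, for standardized features, the two matrices coincide. The sketch below checks this on hypothetical data and wraps the plotting in a helper; `plot_heatmap` is an illustrative name, and the `coolwarm` colormap is just one possible `cmap` choice, not necessarily the one picked in the video.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not the dataset used in the video.
numeric_df = pd.DataFrame({
    "price": [100.0, 210.0, 290.0, 405.0, 500.0],
    "area_m2": [30, 45, 60, 80, 100],
    "age_years": [40, 35, 20, 12, 5],
})

# Standardize, then take the covariance of the scaled features.
scaled_df = (numeric_df - numeric_df.mean()) / numeric_df.std()
cov_matrix = scaled_df.cov()

# For standardized columns the covariance matrix equals the correlation matrix.
same = np.allclose(cov_matrix.to_numpy(), numeric_df.corr().to_numpy())
print(same)


def plot_heatmap(matrix):
    """Draw an annotated heatmap of a square matrix (illustrative helper)."""
    # Imported here so the numeric part above runs without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(matrix, annot=True, fmt=".2f", cmap="coolwarm",
                vmin=-1, vmax=1, ax=ax)
    ax.set_title("Covariance matrix of standardized features")
    return fig
```

Calling `plot_heatmap(cov_matrix)` followed by `plt.show()` (or `fig.savefig(...)`) renders the figure.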

Step #3.
6:16 - Construct a Pandas DataFrame from the selected most important features.
6:45 - The result. Pandas DataFrame constructed from the most important features.
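The description does not state the exact selection rule, so the sketch below assumes one simple option: keep the features whose absolute correlation with a target column exceeds a threshold. The target name `price`, the threshold `0.5`, and the data are all hypothetical.

```python
import pandas as pd

# Hypothetical toy data; "noise" is built to be uncorrelated with "price".
df = pd.DataFrame({
    "area_m2": [10, 20, 30, 40, 55],
    "noise": [3, 1, 2, 1, 3],
    "age_years": [50, 40, 30, 20, 10],
    "price": [100, 200, 300, 400, 500],
})

target = "price"     # assumed target column
threshold = 0.5      # assumed cut-off on |correlation|

# Correlation of every feature with the target (the target itself is dropped).
corr_with_target = df.corr()[target].drop(target)

# Keep features that are strongly related to the target, in either direction.
selected = corr_with_target[corr_with_target.abs() > threshold].index.tolist()

# Reduced DataFrame: selected features plus the target.
reduced_df = df[selected + [target]]
print(reduced_df.columns.tolist())
```

In practice the threshold is a judgment call, and highly inter-correlated features may also be pruned so that only one of each redundant group remains.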

--------

This video was created to demonstrate how to implement feature selection and dimensionality reduction on a very simple dataset.
In the real world, please pay close attention to data pre-processing and data cleaning!

I hope this is useful for data scientists, data analysts, and everyone who works with data.

Best wishes! - Vytautas.
Comments

If you liked this video and want to learn more, please check out my other videos, especially the following ones:

DataScienceGarage

Hi - Any ideas on dimension reduction for categorical variables?

kapilgupta