Dimension reduction | data science

Dimensionality reduction is a common technique used in data science to reduce the number of features in a data set while preserving its important characteristics.
00:10
This can help simplify the data set and make it easier to analyze and visualize.
00:16
Here are some popular dimensionality reduction techniques, each with a real-world example.
00:22
Principal Component Analysis (PCA).
00:26
PCA is a commonly used technique for dimensionality reduction.
00:30
It works by finding the orthogonal directions of maximum variance in the data and projecting the data onto the first few of those directions.
00:38
For example, if we have a data set with 10 features, PCA can be used to reduce it to two or three principal components that explain the most variance in the data.
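As a rough illustration of that 10-features-to-2-components case, here is a minimal NumPy sketch. The `pca` helper and the synthetic data are assumptions made up for this example, and it uses the singular value decomposition of the centered data as one common way to find the directions of maximum variance.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    # Center each feature so the SVD captures variance rather than the mean offset.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of the total variance explained by each kept component.
    explained_ratio = (S ** 2)[:n_components] / np.sum(S ** 2)
    return X_centered @ components.T, components, explained_ratio

# Synthetic data (an assumption for illustration): 200 samples with 10 features
# that are driven by only 2 underlying factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

X_reduced, components, explained_ratio = pca(X, n_components=2)
print(X_reduced.shape)   # (200, 2)
print(explained_ratio)   # share of variance captured by each component
```

Because the synthetic data really has only two underlying factors, the two components should capture almost all of the variance. On real data, the explained ratio is a useful guide for choosing how many components to keep.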
00:49
Real-world example.
00:51
PCA can be used in image processing to reduce the dimensionality of high-resolution images while preserving important features such as edges and textures.
01:02
t-distributed Stochastic Neighbor Embedding (t-SNE).
01:06
t-SNE is a powerful technique for visualizing high-dimensional data.
01:11
It works by reducing the dimensionality of the data while preserving the pairwise similarities between neighboring points in the original high-dimensional space, so points that are close together stay close together in the low-dimensional map.
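To make this concrete, here is a minimal sketch that assumes scikit-learn is installed and uses its `TSNE` class. The three synthetic clusters and the perplexity value are assumptions chosen for illustration, not recommended settings.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic data (an assumption for illustration): three well-separated
# clusters of 40 points each in a 50-dimensional space.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 50))
X = np.vstack([center + rng.normal(size=(40, 50)) for center in centers])
labels = np.repeat(np.arange(3), 40)

# Embed into 2 dimensions for plotting. Perplexity roughly controls how many
# neighbors each point pays attention to and must be below the sample count.
tsne = TSNE(n_components=2, perplexity=15, init="pca", random_state=0)
embedding = tsne.fit_transform(X)
print(embedding.shape)  # (120, 2)
```

The `embedding` array can be passed to a scatter plot and colored by `labels`. Note that t-SNE output is meant for visualization: distances between far-apart clusters in the map are not reliable, and results vary with the perplexity and the random seed.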
01:20
Real-world example.
01:22
t-SNE can be used to visualize the relationship between different types of cancer based on gene expression data.
01:30
Linear Discriminant Analysis (LDA).
01:34
LDA is a supervised learning technique that can be used for both classification and dimensionality reduction.
01:41
It works by finding the directions that maximize the separation between classes while minimizing the variance within each class.
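For the two-class case this idea has a closed-form solution, often called Fisher's linear discriminant. The sketch below is a minimal NumPy version under that two-class assumption. The `fisher_lda_direction` helper and the synthetic data are made up for this example.

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Return the unit vector that best separates two classes labeled 0 and 1."""
    X0, X1 = X[y == 0], X[y == 1]
    mean0, mean1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: spread of each class around its own mean, summed.
    S_w = (X0 - mean0).T @ (X0 - mean0) + (X1 - mean1).T @ (X1 - mean1)
    # Fisher's solution: the direction is proportional to S_w^{-1} (mean1 - mean0).
    w = np.linalg.solve(S_w, mean1 - mean0)
    return w / np.linalg.norm(w)

# Synthetic data (an assumption for illustration): two classes in 4 dimensions
# whose means differ mainly along the first feature.
rng = np.random.default_rng(0)
shift = np.array([3.0, 1.0, 0.0, 0.0])
X = np.vstack([rng.normal(size=(100, 4)), rng.normal(size=(100, 4)) + shift])
y = np.repeat([0, 1], 100)

w = fisher_lda_direction(X, y)
z = X @ w  # 4 features reduced to a single discriminant score per sample

# A simple classifier on the reduced data: threshold halfway between class means.
threshold = (z[y == 0].mean() + z[y == 1].mean()) / 2
accuracy = np.mean((z > threshold) == (y == 1))
print(z.shape, accuracy)
```

Unlike PCA, the direction here depends on the labels `y`, which is what makes LDA a supervised technique. With more than two classes, LDA yields at most one fewer direction than the number of classes.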
01:50
Real-world example.
01:52
LDA can be used in speech recognition to reduce the dimensionality of the speech features while preserving the information that is important for distinguishing different words.
02:03
Non-negative Matrix Factorization (NMF).
02:07
NMF is a matrix factorization technique that can be used for dimensionality reduction and feature extraction.
02:15
It works by decomposing a high-dimensional data matrix with non-negative entries into the product of two lower-dimensional matrices that are also non-negative.
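One common way to compute such a factorization is the multiplicative update rule of Lee and Seung, sketched below in NumPy. The `nmf` helper, the iteration count, and the tiny document-term count matrix are all assumptions made up for this example.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Approximate a non-negative matrix V as W @ H with W, H non-negative."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative and do not
        # increase the squared reconstruction error.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical document-term counts: 6 documents (rows) by 6 terms (columns).
# The first three documents use only the first three terms, the rest the others.
V = np.array([
    [3, 2, 1, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [4, 3, 2, 0, 0, 0],
    [0, 0, 0, 2, 3, 1],
    [0, 0, 0, 3, 2, 2],
    [0, 0, 0, 1, 2, 3],
], dtype=float)

W, H = nmf(V, rank=2)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(W.shape, H.shape)  # (6, 2) (2, 6)
print(rel_error)
```

In a topic modeling reading, each row of `H` weights the terms of one topic and each row of `W` says how strongly a document uses each topic. The result depends on the random initialization, so the topics can come out in either order.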
02:21
Real-world example.
02:23
NMF can be used in topic modeling to extract the most important topics from a large corpus of documents.
02:31
These dimensionality reduction techniques can help data scientists handle high-dimensional data sets and extract important information from them more efficiently.