PCA Principal Component Analysis made easy

Questions we seek to answer:
1) Why is dimension reduction needed?
2) How is dimension reduction different from feature selection?
3) What is the basic intuition behind principal components?
4) What do eigenvectors and eigenvalues look like?
5) How do we select a subset of the principal components?
6) When is PCA used?
Comments
Author

Dimension reduction is a really interesting and intriguing topic.

shantanuchakrabory
Author

1) Dimensions need to be reduced:
a) To make programs/algorithms more efficient in time and space.
b) To give the user a compact visualization of the main components responsible for most of the variability in the data.
2)
a) In feature selection, the reduced feature set is a subset of the original feature set. In feature reduction, however, new features are created as linear combinations (weighted sums) of the original features.
b) In feature selection the user knows which features/variables are significant, which conveys a sense of transparency, whereas in feature reduction the new variables are hard to interpret.
3) Intuition 1: Covariance matrix * Eigenvector = Eigenvalue * Eigenvector
Intuition 2: Applying the covariance matrix to an eigenvector does not change the direction of the vector; it only scales the vector by the eigenvalue.
4) The eigenvalues can be arranged as a 1 (row) * m (one column per component) matrix, whereas each eigenvector has m entries, one per original feature. How quickly the eigenvalues fall off gives us a notion of how much of the variability in the data each PC accounts for.
5) We can select principal components by looking at the graph of number of dimensions (x-axis) against cumulative explained variance (y-axis). We can stop at the number of dimensions beyond which the curve flattens. Let's say that at the 10th PC (x-axis) the explained variability is around 97% and after that it remains more or less the same; then we can keep the first 10 principal components.
6) We can use PCA:
a) When we want to reduce the number of variables but are not sure which of them to drop.
b) When we want the variables to be uncorrelated/orthogonal to each other.
c) When we are comfortable with the new variables being less interpretable.
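
Points 3) to 5) can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not from the video: the data set is synthetic, the helper names `pca_eig` and `choose_k` are made up, and the 0.97 threshold simply mirrors the 97% example in point 5).

```python
import numpy as np

def pca_eig(X):
    """Eigen-decompose the covariance matrix of X (rows = samples)."""
    Xc = X - X.mean(axis=0)                  # center every feature
    cov = np.cov(Xc, rowvar=False)           # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]        # sort by eigenvalue, largest first
    return Xc, cov, eigvals[order], eigvecs[:, order]

def choose_k(eigvals, threshold=0.97):
    """Smallest number of PCs whose cumulative explained variance reaches threshold."""
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(eigvals))

# Synthetic data (an assumption): 5 observed features driven by 2 hidden factors.
rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 2))
W = np.array([[2.0, 1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 2.0, 0.0, 1.0]])
X = latent @ W + 0.05 * rng.standard_normal((500, 5))

Xc, cov, eigvals, eigvecs = pca_eig(X)

# Point 3): covariance matrix * eigenvector = eigenvalue * eigenvector
v, lam = eigvecs[:, 0], eigvals[0]
assert np.allclose(cov @ v, lam * v)

# Point 5): keep the first k components that explain enough variance
k = choose_k(eigvals, threshold=0.97)
scores = Xc @ eigvecs[:, :k]                 # data expressed in the first k PCs

print("explained variance ratio:", eigvals / eigvals.sum())
print("components kept:", k)
```

The columns of `scores` are uncorrelated with each other, which is the property mentioned in 6) b). In practice a library routine such as scikit-learn's `PCA` would normally replace the hand-written helpers.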

Thanks again Sir, for this wonderful initiative of yours, in helping us.

arghyakusumdas