Correlation vs. Covariance | Standardization of Data | with example in Python/NumPy



It is common that multiple feature dimensions in high-dimensional data are not independent. Often there is a linear relationship between them, called correlation. We can investigate it by looking at the correlation coefficients, i.e., the off-diagonal elements of the correlation matrix. This matrix is computed as the covariance matrix of the standardized data. Standardization refers to the process of first centering the data by the empirical mean and then dividing by the empirical standard deviation in each dimension.
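The relationship described above can be sketched in a few lines of NumPy (a minimal illustration, not the video's exact code; the toy data and the injected linear relationship are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples, 3 feature dimensions (rows = samples).
X = rng.normal(size=(200, 3))
X[:, 1] += 0.8 * X[:, 0]  # inject a linear relationship between dims 0 and 1

# Standardize: center by the empirical mean, divide by the
# empirical standard deviation in each dimension.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Correlation matrix = covariance matrix of the standardized data.
R = (Z.T @ Z) / len(Z)

# Diagonal entries are 1; off-diagonals are the correlation coefficients.
print(np.round(R, 2))
```

Note that the normalization constant (N vs. N-1) cancels out in the correlation coefficients, so this matches `np.corrcoef` either way.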

-----
Information on Nonlinear Relationships:

Techniques for nonlinear dimensionality reduction help because they discover (nonlinear) manifolds embedded in the high-dimensional space (e.g., a 2d plane in 3d space).

-------

Timestamps:
00:00 Introduction
01:51 Components of Covariance Matrix
03:38 Estimating the Covariance Matrix
06:37 Limitation of Covariances for dependency
07:12 Correlation instead of Covariance
07:28 Standardization
10:37 Standardized Data Matrix
11:29 Correlation Matrix
12:33 Discussing correlations
14:30 Python: Creating linear dataset
16:12 Python: Concatenate into data matrix
16:51 Python: Pure Covariance of the data
17:48 Python: Standardizing the data
21:22 Python: Using Broadcasting
22:26 Python: Calculating correlation matrix
23:22 Python: Correlation Matrix by NumPy
24:06 Final Remarks on nonlinear dependencies
25:06 Outro
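The Python steps in the timestamps above can be sketched roughly as follows (an assumed reconstruction of the workflow, not the video's actual code; the dataset, slope, and noise level are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# Create a linear dataset: y depends linearly on x, plus noise.
x = rng.uniform(0.0, 10.0, size=500)
y = 2.0 * x + rng.normal(scale=1.0, size=500)

# Concatenate into a data matrix with one column per feature.
D = np.stack([x, y], axis=1)

# Pure covariance of the data (scale-dependent, hard to interpret).
C = np.cov(D, rowvar=False)

# Standardize using broadcasting: the (2,) mean/std arrays broadcast
# against the (500, 2) data matrix.
Z = (D - D.mean(axis=0)) / D.std(axis=0)

# Correlation matrix from the standardized data ...
R = Z.T @ Z / len(Z)

# ... and the same matrix computed directly by NumPy.
R_np = np.corrcoef(D, rowvar=False)
print(np.round(R, 3))
```

The off-diagonal entry of `R` is close to 1 here, reflecting the strong linear dependence between the two features, while the raw covariance in `C` depends on the units of the data.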
Comments

Amazing! Thanks a lot! It gives deeper insights. Could you let me know if this interactive plot was made in Python? Is the code available on your GitHub?

krithiksingh

Great content. Your explanations and short demos are terrific.
A great follow-up would be explaining the Fisher information matrix.
That topic could tie statistics and machine learning together even more strongly.

danielnovikov

Future video recommendation: the connection to eigenvalues and eigenvectors would be interesting.

orjihvy