Correlation vs. Covariance | Standardization of Data | with example in Python/NumPy

In high-dimensional data, it is common for several feature dimensions not to be independent of each other. Often the dependency is linear, in which case it is called correlation. We can investigate it by looking at the correlation coefficients, i.e., the off-diagonal elements of the correlation matrix. This matrix is computed as the covariance matrix of the standardized data. Standardization refers to first centering the data by subtracting the empirical mean and then dividing by the empirical standard deviation, separately in each dimension.
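As a rough illustration of the description above, the following sketch standardizes a small synthetic dataset and computes the correlation matrix as the covariance matrix of the standardized data. The dataset, the seed, and all variable names (`x1`, `x2`, `x3`, `X`, `Z`) are assumptions made for this example and are not taken from the video; only the NumPy calls themselves are standard.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Hypothetical toy dataset: x2 depends linearly on x1, x3 is independent noise.
n_samples = 1000
x1 = rng.normal(loc=2.0, scale=3.0, size=n_samples)
x2 = 0.5 * x1 + rng.normal(scale=0.5, size=n_samples)
x3 = rng.normal(scale=10.0, size=n_samples)

# Data matrix with one row per sample and one column per feature.
X = np.stack([x1, x2, x3], axis=1)  # shape (n_samples, 3)

# Standardize: center by the empirical mean, divide by the empirical std.
mean = X.mean(axis=0)        # shape (3,), broadcast across all rows
std = X.std(axis=0, ddof=1)  # ddof=1 matches the 1/(N-1) covariance estimate
Z = (X - mean) / std

# Covariance matrix of the standardized data, i.e. the correlation matrix.
corr_manual = Z.T @ Z / (n_samples - 1)

# NumPy's built-in; rowvar=False because the features are columns here.
corr_numpy = np.corrcoef(X, rowvar=False)

print(np.round(corr_manual, 3))
print(np.allclose(corr_manual, corr_numpy))
```

The diagonal of the result is 1 in every dimension, and the off-diagonal entries are the correlation coefficients. Unlike raw covariances, they do not change when a feature is rescaled, which is why `x3` with its much larger spread does not dominate the matrix.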
-----
Information on Nonlinear Relationships:
These techniques help because they discover (nonlinear) manifolds embedded in the high-dimensional space (e.g., a 2D plane in 3D space).
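To see why nonlinear relationships need separate treatment, here is a minimal sketch in which one variable is fully determined by the other, yet the correlation coefficient is essentially zero. The quadratic relationship and the sample grid are assumptions picked for illustration only.

```python
import numpy as np

# Hypothetical example: y is a deterministic, but nonlinear, function of x.
x = np.linspace(-1.0, 1.0, 201)  # grid symmetric around zero
y = x**2

# The correlation coefficient only measures linear dependence.
r = np.corrcoef(x, y)[0, 1]
print(abs(r) < 1e-8)
```

Because the grid is symmetric around zero, the positive and negative contributions to the covariance cancel, so the coefficient vanishes up to floating-point error even though `y` depends entirely on `x`.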
-------
Timestamps:
00:00 Introduction
01:51 Components of Covariance Matrix
03:38 Estimating the Covariance Matrix
06:37 Limitation of Covariances for dependency
07:12 Correlation instead of Covariance
07:28 Standardization
10:37 Standardized Data Matrix
11:29 Correlation Matrix
12:33 Discussing correlations
14:30 Python: Creating linear dataset
16:12 Python: Concatenate into data matrix
16:51 Python: Pure Covariance of the data
17:48 Python: Standardizing the data
21:22 Python: Using Broadcasting
22:26 Python: Calculating correlation matrix
23:22 Python: Correlation Matrix by NumPy
24:06 Final Remarks on nonlinear dependencies
25:06 Outro