14. Mahalanobis distance with complete example and Python implementation


Mahalanobis distance is the distance between a point (vector) and a distribution. It is useful in multivariate anomaly detection and in classification on skewed (imbalanced) datasets.

Prasanta Chandra Mahalanobis was an Indian scientist and statistician who founded the Indian Statistical Institute.

Euclidean distance works well as long as the features are equally important and independent of each other.

If the variables are strongly correlated, their covariance is high, and dividing by it shrinks the Mahalanobis distance. If the features of x are uncorrelated, the covariance is low, so the resulting distance comes out larger.
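The idea above can be sketched in a few lines of NumPy. This is a minimal illustration with made-up numbers, not the video's own worked example:

```python
import numpy as np

# Toy dataset (invented values): rows are observations, columns are features.
data = np.array([
    [64.0, 580.0, 29.0],
    [66.0, 570.0, 33.0],
    [68.0, 590.0, 37.0],
    [69.0, 660.0, 46.0],
    [73.0, 600.0, 55.0],
])

x = np.array([66.0, 640.0, 44.0])  # the point whose distance we want

mu = data.mean(axis=0)            # mean of the distribution
cov = np.cov(data, rowvar=False)  # covariance matrix of the features
diff = x - mu

# Mahalanobis distance: sqrt((x - mu)^T * S^(-1) * (x - mu)).
# Multiplying by the inverse covariance is what corrects for correlated,
# differently scaled features, unlike plain Euclidean distance.
md = np.sqrt(diff @ np.linalg.inv(cov) @ diff)
print(md)
```

If the covariance matrix were the identity, this would reduce to ordinary Euclidean distance.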
Comments

Having read through half a dozen articles on the subject, my pattern recognition textbook, several hours of lecture, and a handful of videos, I can say with good authority that this is the best explanation of the process I've seen so far.
This probably would have been easier if I'd ever taken linear algebra.

bpin

Wow!! You really simplified this. Like many others, I was googling through tons of 'super complicated results.' With this, I was able to do it side by side (albeit using numpy). Thanks a lot! It's a pity the website is de-registered.

jbz

Why did you transpose the data in line 19?

MeMonarch

I tried to follow your example but did not get the same values as your covariance matrix. First I did it in Excel (calculating it manually) and then a second time in Python to check myself.

import numpy as np
np.set_printoptions(suppress=True)  # suppresses the use of scientific notation

A = [1, 2, 4, 2, 5]
B = [100, 300, 200, 600, 100]
C = [10, 15, 20, 10, 30]
data = np.array([A, B, C])  # rows are variables, columns are observations
covMatrix = np.cov(data)  # sample covariance: divides by n-1
# covMatrix = np.cov(data, bias=True)  # population covariance: divides by n
print(covMatrix)


Results for the Covariance Matrix are:

[[   2.7  -110.     13. ]
 [-110.  43000.   -900. ]
 [  13.   -900.     70. ]]

craiggers
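The discrepancy in the comment above most likely comes from the `bias` parameter: `np.cov` divides by n-1 by default (sample covariance), while a manual calculation often divides by n (population covariance). A quick check on the same data:

```python
import numpy as np

A = [1, 2, 4, 2, 5]
B = [100, 300, 200, 600, 100]
C = [10, 15, 20, 10, 30]
data = np.array([A, B, C])

sample_cov = np.cov(data)                  # divides by n-1 (default, bias=False)
population_cov = np.cov(data, bias=True)   # divides by n

print(sample_cov[0, 0])      # variance of A with n-1: 2.7
print(population_cov[0, 0])  # variance of A with n:   2.16
```

The two matrices differ only by the constant factor (n-1)/n, here 4/5.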

How do I calculate the Mahalanobis distance between a point (which lies within the distribution itself) and the centre of the distribution? I need to find the outliers present in the data. Will the formula differ? Can you please help me in this regard.

robi
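On the outlier question above: the formula does not change when the point comes from the data itself. Compute the distance of every observation to the distribution's own mean and flag the large ones. A sketch with invented toy data (the chi-square cutoff is my assumption, not from the video):

```python
import numpy as np

# Invented data: rows are observations, columns are features.
data = np.array([
    [64.0, 580.0, 29.0],
    [66.0, 570.0, 33.0],
    [68.0, 590.0, 37.0],
    [69.0, 660.0, 46.0],
    [73.0, 600.0, 55.0],
])

mu = data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
diffs = data - mu

# Squared Mahalanobis distance of every point to the center.
md2 = np.einsum('ij,jk,ik->i', diffs, cov_inv, diffs)

# For roughly normal data, md2 follows a chi-square distribution with
# p degrees of freedom; 7.815 is the 95% cutoff for p = 3 features.
outliers = np.where(md2 > 7.815)[0]
print(md2, outliers)
```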

The best video on YouTube discussing Mahalanobis. I have one question, please: if I'm using Mahalanobis distance for classification on a binary dataset, does this mean that for every unknown object I'll have only 2 distances? Or, as with Euclidean, will I have n distances, where n is the number of training objects?

AhmedHamed-otez
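On the question above: in the usual Mahalanobis classifier you fit one mean and covariance per class, so a binary problem yields exactly two distances per unknown object, not one per training sample. A minimal sketch with invented data:

```python
import numpy as np

def mahalanobis(x, mean, cov_inv):
    d = x - mean
    return np.sqrt(d @ cov_inv @ d)

# Invented two-class training data (rows = samples, columns = features).
class0 = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [2.5, 2.0]])
class1 = np.array([[6.0, 7.0], [7.0, 6.0], [6.5, 6.5], [7.5, 7.0]])

# One (mean, inverse covariance) pair per class.
params = []
for cls in (class0, class1):
    mu = cls.mean(axis=0)
    params.append((mu, np.linalg.inv(np.cov(cls, rowvar=False))))

x = np.array([6.2, 6.8])  # unknown object
distances = [mahalanobis(x, mu, ci) for mu, ci in params]  # 2 classes -> 2 distances
predicted = int(np.argmin(distances))
print(distances, predicted)
```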

This video helped me a lot! I'm using Mahalanobis to calculate the distance between an RGB color and a set of RGB colors, but sometimes the MD² comes out negative, so my Python program crashes when it calculates the sqrt. Do you have any idea what I should do in this case?

diogohalmeida
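On the negative-MD² problem above: a negative squared distance usually means the estimated covariance matrix is not positive definite (common with few samples or perfectly correlated channels), compounded by floating-point error. One common workaround, sketched here with an invented singular example, is the pseudo-inverse plus clamping small negatives to zero:

```python
import numpy as np

def safe_mahalanobis(x, mean, cov):
    d = x - mean
    # pinv tolerates a singular / ill-conditioned covariance matrix,
    # where np.linalg.inv would fail or return garbage.
    md2 = d @ np.linalg.pinv(cov) @ d
    # Clamp tiny negative values caused by floating-point round-off.
    return np.sqrt(max(md2, 0.0))

# Invented RGB reference set with perfectly correlated channels,
# which makes the covariance matrix singular on purpose.
colors = np.array([[10.0, 20.0, 30.0],
                   [20.0, 40.0, 60.0],
                   [30.0, 60.0, 90.0]])
mu = colors.mean(axis=0)
cov = np.cov(colors, rowvar=False)

md = safe_mahalanobis(np.array([15.0, 30.0, 45.0]), mu, cov)
print(md)
```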

Thank you very much for this, it was precisely what I was looking for! Also, your handwriting is beautiful!

Rafaelkenjinagao

The correct covariance matrix (population version, dividing by n) for the data is:
   2.16    -88     10.4
  -88    34400   -720
   10.4   -720     56

Bmahsih-iryx