Euclidean Distance - Practical Machine Learning Tutorial with Python p.15

preview_player
Показать описание
In the previous tutorial, we covered how to use the K Nearest Neighbors algorithm via Scikit-Learn to achieve 95% accuracy in predicting benign vs malignant tumors based on tumor attributes. Now, we're going to dig into how K Nearest Neighbors works so we have a full understanding of the algorithm itself, to better understand when it will and wont work for us.
We will come back to our breast cancer dataset, using it on our custom-made K Nearest Neighbors algorithm and compare it to Scikit-Learn's, but we're going to start off with some very simple data first. K Nearest Neighbors boils down to proximity, not by group, but by individual points. Thus, all this algorithm is actually doing is computing distance between points, and then picking the most popular class of the top K classes of points nearest to it. There are various ways to compute distance on a plane, many of which you can use here, but the most accepted version is Euclidean Distance, named after Euclid, a famous mathematician who is popularly referred to as the father of Geometry, and he definitely wrote the book (The Elements) on it.

Рекомендации по теме
Комментарии
Автор

remember that feeling when the teacher announced to the class that we were watching a movie today?
that's the feeling that these drawing style videos give me!
great work, you really are helping me a lot!

iliasp
Автор

Relieved to see a short video after 22 minutes :P

irock
Автор

from math import sqrt
x = [ 1, 3 ]
y = [ 2, 5 ]
euclidean_distance = sqrt( sum( [(x-y)**2 for x, y in zip(x, y) ] ) )

much easier if using ndarrays

typedfast
Автор

I didn't know Edward Snowden had a YouTube channel. Great )

OtabekJuraev
Автор

Thank you! You made euclidian distance click for me.

Warby
Автор

If I am working with a dataset that continues values in different ranges for instance some are on a scale of 1-100 and others are on a scale in seconds between 4.30 seconds and 5.40 seconds, how would I go about normalising these for use with Euclidean Distance? otherwise, if I'm correct in thinking, that the ED calculation would be thrown off by the numbers in the range of 1-100 being much larger?

tomkmb
Автор

does it matter which point is x1 and x2? or is x1 always the "lowest" point on the x-axis and x2 the "highest" point on the x-axis? Same with y1 and y2?

christoffere
Автор

Theres something beautiful about doing linear algebra instead:

import numpy as np

diff = np.array([1, 3]) - np.array([2, 5])
sqerr = np.sqrt(diff @ diff)

randallwalkerdiaz
Автор

Thanks for your videos. They are very helpful

jaywiji
Автор

Do you use Excalidraw to draw illustrations in this video?

A--_--M
Автор

Is it necessary to compute the sqrt as sqrt(x) > sqrt(y) is equivalent to x > y ? Wouldnt it save some time ? I havent watched your algorithms yet and it might be negligeable but i didnt want to forget to ask that

MrBoubource
Автор

I still dont understand where the numbers for the points are coming from are thoses just numbers to represent the features?

quanmedley
Автор

how to create function to estimate the euclidean distance between two points in any dimension?

manszeho
Автор

A simple way for youngsters to understand this formula is to think about the Pythagorean theorem and how (for 2D, lets keep it simple) you can translate your entire X/Y axis so that O (0, 0) matches one of your points.
For distance purposes it is irrelevant where your Origin is.
So now, If you have two points P1 and P2, and P1 matches the origin, what's the distance to P2 ? You know how to calculate it, using a^2+b^2=c^2.
a is the X coordinate of P2.
b is the Y coordinate of P2.
c is the distance from P1 to P2 (the hypotenuse )
So... c=sqrt(a^2+b^2) = sqrt ( (a-0)^2 + (b-0)^2).
See the zeroes ? That's the P1 coordinates, .
There you have it, Euclid and Pythagoras working together :)

PedroLucas-hkvo
Автор

just curious about whether we could cal the Euclidean distance when all n dimension feature of xi are categorical? since i'm dealing with cell phone scam call data, the raw list r just non-sense, so i code them into categorical variables to grab info out of it, but to use the ADASYN algorithm for imbalance dataset, we need to cal the K based on euclidean distance....

zhengtracey
Автор

hello, if I have 20 predictions to test whether to turn ON or OFF a system, then how would I do the Euclidean Distance? for knn alg

bellsandoor
Автор

I am currently working with this Euclidean distance on a project of mine (I'm new to Machine learning), but I am questioning what you can do with this result once you have calculated the Euclidean distance? a.k.a. What exactly do I have to imagine that this result is? What can this result be used for and in what way?

iamthatoneprogrammer
Автор

How to find out Euclidean distance for mxn genotype matrix data?, this matrix contains binary data. Here m is no.of individuals and n is no.of markers. Please reply soon.

ynteors
Автор

great video man keep doing this awesome work

rahulmodak
Автор

how to use weighted euclidean distance and what is the advantage of the same over simple euclidean distance...
plz help

garvit
welcome to shbcf.ru