K-Nearest Neighbors Clearly Explained with Practical Demonstration | IRIS | Machine Learning | 2021

K-Nearest Neighbors (KNN) is a supervised machine learning algorithm: it starts from a labeled dataset of training pairs (x, y) and tries to capture the relationship between x and y.
The objective of KNN is to learn a function h that maps the input x to the output y, so that for a new test point x, h(x) can confidently predict the corresponding output y.
1. In KNN classification, a new test point is classified by a majority vote of its neighbors: the point is assigned to the category most common among its k nearest neighbors.
2. If k = 1, the new point is simply assigned to the category of its single nearest neighbor.
Now, the performance of a KNN classifier largely hinges on the distance metric used to identify the k nearest neighbors of a test point. The most popular distance metrics in KNN are Euclidean distance, Manhattan distance, Minkowski distance, and Hamming distance.
Again, the most routinely used one is the Euclidean distance. If point P1 has coordinates (x1, y1) and point P2 has coordinates (x2, y2), then the Euclidean distance between them is d = √((x2 − x1)² + (y2 − y1)²).
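To make the formula concrete, here is a quick sketch in Python with NumPy; the function name is my own, not from the video:

```python
import numpy as np

# Euclidean (straight-line) distance between two points of any dimension.
def euclidean_distance(p1, p2):
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    return np.sqrt(np.sum((p2 - p1) ** 2))

print(euclidean_distance((1, 2), (4, 6)))  # sqrt(3**2 + 4**2) = 5.0
```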
For a specified number of nearest neighbors k and a distance metric d, a KNN classifier executes three steps to classify an unknown test point x (see the sketch after this list).
1. First, it scans the whole training set, computes the distance d between x and every training point, and keeps the k points closest to x.
2. Then, it counts the number of data points belonging to each category among these k nearest neighbors.
3. Finally, the unknown test point x is assigned to the category with the most points among those k nearest neighbors.
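Here is a minimal from-scratch sketch of those three steps; the function and variable names are my own, not from the video:

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, x, k):
    # Step 1: compute the distance from x to every training point,
    # then keep only the k closest ones.
    distances = sorted(
        (math.dist(x, p), label)
        for p, label in zip(train_points, train_labels)
    )
    k_nearest = distances[:k]
    # Step 2: count the points of each category among the k neighbors.
    votes = Counter(label for _, label in k_nearest)
    # Step 3: assign x to the category with the most votes.
    return votes.most_common(1)[0][0]
```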
Now, let’s explore this concept with an example. In this diagram, if you choose k = 3, the test sample (the green polygon) is assigned to the blue circle category, because 2 of its 3 nearest neighbors are blue circles. On the other hand, if you choose k = 5, the test sample is assigned to the yellow square category, because 3 of its 5 nearest neighbors are yellow squares.
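Reusing the knn_classify sketch above with made-up coordinates standing in for the diagram, the prediction flips exactly as described when k changes:

```python
# Continues from the knn_classify sketch above; coordinates are invented
# to mimic the diagram, not taken from the video.
points = [(1.0, 1.0), (1.5, 1.2),             # blue circles
          (3.0, 3.0), (3.2, 2.8), (2.5, 2.5)]  # yellow squares
labels = ["blue circle", "blue circle",
          "yellow square", "yellow square", "yellow square"]
test = (2.0, 2.0)  # the green polygon

print(knn_classify(points, labels, test, k=3))  # -> blue circle
print(knn_classify(points, labels, test, k=5))  # -> yellow square
```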
Let’s demonstrate k-nearest neighbors in Google Colab. We will define a KNN classifier to classify the 3 types of flowers in the Iris dataset. So, let’s go there.
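For reference, here is a minimal sketch of that kind of classifier using scikit-learn's KNeighborsClassifier and its built-in Iris loader; the exact code in the video may differ:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset: 150 flowers, 4 features, 3 species.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the data to evaluate the classifier.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit a KNN classifier with k = 5 (Euclidean distance by default).
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Evaluate on the held-out test set.
print("Accuracy:", accuracy_score(y_test, knn.predict(X_test)))
```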

First, we import the pandas library, the most popular Python library for data manipulation and analysis. Next, we import numpy, which lets us represent our data as n-dimensional arrays quickly and efficiently. We also import matplotlib to plot figures easily; the pyplot functions in matplotlib help us create and decorate figures. Lastly, we import seaborn, a Python data visualization library built on top of matplotlib.
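In code, those imports would look like this (standard aliases assumed):

```python
import pandas as pd               # data manipulation and analysis
import numpy as np                # fast n-dimensional arrays
import matplotlib.pyplot as plt   # creating and decorating figures
import seaborn as sns             # data visualization built on matplotlib
```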

#KNN #MachineLearning #IrisDataset #KNNAlgorithm #KNNClassifier #KNN2021