From Scratch: How to Code K-Nearest Neighbors in Python for Machine Learning Interviews!

Code K-Nearest Neighbors from Scratch in Python (No Sklearn) | Machine Learning Interviews and Data Science Interviews

🟢Get all my free data science interview resources

// Comment
Got any questions? Something to add?
Write a comment below to chat.

// Let's connect on LinkedIn:

====================
Contents of this video:
====================
0:00 Intro
0:43 KNN Overview
1:23 How KNN Works
2:52 Implementation
6:29 Space and time complexity
8:07 How to select K
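
For reference alongside the 2:52 Implementation chapter, here is a minimal sketch of a brute-force KNN classifier. The video's exact code may differ; the attribute and method names (self.x, self.y, get_distance, predict) follow the ones referenced in the comments below, and binary 0/1 labels are assumed.

import math

class KNN:
    def fit(self, x, y):
        # KNN is a lazy learner: fitting just memorizes the training data.
        self.x = x  # list of feature vectors
        self.y = y  # list of 0/1 labels

    def get_distance(self, a, b):
        # Euclidean distance between two feature vectors.
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    def predict(self, point, k):
        # Distance from the query point to every training point.
        distance_label = [
            (self.get_distance(point, train_point), train_label)
            for train_point, train_label in zip(self.x, self.y)
        ]
        # Sort by distance and keep the k nearest neighbors.
        neighbors = sorted(distance_label)[:k]
        # Fraction of positive labels among the k nearest neighbors.
        return sum(label for _, label in neighbors) / k
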
Comments

Hey Emma, appreciate the time and effort creating amazing content. Your channel has helped me get a DS offer from a top tech company; the A/B testing series is intuitive. Needless to say, the ML-related topics aren't as complex as in other sources; they are easy to understand, and the implementation part is awesome. Looking forward to watching more ML-related videos!

stellawww

The time complexity is not O(MN), since N is actually constant; thus the complexity is O(M) + O(M log M) = O(M log M).

Also, it is unnecessary to save all distances; only the top k matter, for which a constant k-size heap could be used. Thus the space complexity is constant.

techedu
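
A sketch of the fixed-size heap idea from the comment above, assuming the same (distance, label) pairs as in the implementation. Python's heapq is a min-heap, so distances are negated to keep a max-heap whose root is the farthest of the current k candidates:

import heapq

def k_nearest_bounded(distance_label, k):
    heap = []
    for dist, label in distance_label:
        if len(heap) < k:
            heapq.heappush(heap, (-dist, label))
        elif dist < -heap[0][0]:
            # Closer than the farthest kept candidate: replace it.
            heapq.heapreplace(heap, (-dist, label))
    # O(M log k) time and O(k) extra space, versus O(M log M) time
    # and O(M) space for sorting the full distance list.
    return [(-neg_dist, label) for neg_dist, label in heap]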

Thanks Emma! This is helpful! Looking forward to the videos on optimizing the naive KNN algorithm and more ML algorithms!

shanggao

I really appreciate your effort in preparing this content! By far, going over your videos is the most efficient way for me to recap the key concepts and prepare for data science interviews. Thanks a lot!

hankyujang

log(n_samples) is not necessarily larger than n_features: 2**20 ≈ 1 million and 2**30 ≈ 1 billion, and we could easily have more than 30 features.

ZhensongRen

Thanks a lot for your content and efforts. Your videos are very helpful for revision. Looking forward to more videos on other algorithms.

ashritkulkarni

Thank you Emma! Great content as always. It is so hard to find reliable resources for MLE interviews nowadays haha

MinhNguyen-lzpg

Amazing content, thank you so much 🤟

bettysi

I think the predict function only handles a single-point prediction. I have made some updates so it works on a whole dataset:

def predict(self, data, k):
    predict_output = []
    for point in data:
        distance_label = [
            (self.get_distance(point, train_point), train_label)
            for train_point, train_label in zip(self.x, self.y)
        ]
        neighbors = sorted(distance_label)[:k]
        # Fraction of positive labels among the k nearest neighbors.
        predict_output.append(sum(label for _, label in neighbors) / k)
    return predict_output

licdad
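
Assuming the sketch class above with this batch predict swapped in, a hypothetical usage (the data here is made up for illustration) could look like:

model = KNN()
model.fit([[0, 0], [1, 1], [10, 10]], [0, 0, 1])
print(model.predict([[0.5, 0.5], [9, 9]], k=2))  # -> [0.0, 0.5]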

Hi Emma, thank you so much for making awesome videos! They helped me a lot!

nanwang

Isn't the space complexity of the distance_label array going to be O(m) + O(n)? We first calculate the distance across each of the m features, sum it into a single value, and then store that value for each of the n points in the training set.

leolloyd
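
One way to see the accounting the comment above asks about, as an annotated sketch (m features, n training points; not the video's exact code):

import math

def build_distance_label(point, train_x, train_y):
    # The inner generator streams over the m features, so computing one
    # distance takes O(1) extra space (it would be O(m) only if the
    # squared differences were materialized in a list first).
    # The outer list stores one (distance, label) pair per training
    # point, so distance_label itself is O(n); the per-feature work is
    # transient and does not add an O(m) storage term.
    return [
        (math.sqrt(sum((p - t) ** 2 for p, t in zip(point, tp))), label)
        for tp, label in zip(train_x, train_y)
    ]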

Hi Emma, thank you so much for all your videos, they are all super helpful! Can you please do more ML and Python coding videos in the future?

jennywu

Great video as always, Emma! Do you think in new-grad interviews they ask about A/B testing if it's not in the job description?

SuperLOLABC

Hi Emma, thank you so much for your videos! I learned so much from them. Can you do one on Decision Trees?

CC-jirb

Would it be possible to use a min-heap instead of sorting the points by distance (and implement this in near-linear time instead of O(N log N))?

roguenoir
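
A sketch of the idea in the question above: heapify runs in O(N) and popping the k closest costs O(k log N), so neighbor selection takes O(N + k log N), effectively linear when k is small (heapq.nsmallest(k, distance_label) wraps the same pattern):

import heapq

def k_nearest_heapify(distance_label, k):
    # Build a min-heap over all N (distance, label) pairs in O(N),
    # then pop the k closest in O(k log N): O(N + k log N) total,
    # versus O(N log N) for a full sort.
    heap = list(distance_label)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(min(k, len(heap)))]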

Nice video! May I get the source code from this video?

likhithadusanapudi

The interviewer asked me about this. It was quite embarrassing to give the wrong answer.

Ragnarik

Am I missing self.distance somewhere? Thank you!

jiayiwu