From Scratch: How to Code K-Nearest Neighbors in Python for Machine Learning Interviews!

Code K-Nearest Neighbors from Scratch in Python (No Sklearn) | Machine Learning Interviews and Data Science Interviews

🟢Get all my free data science interview resources

// Comment
Got any questions? Something to add?
Write a comment below to chat.

// Let's connect on LinkedIn:

====================
Contents of this video:
====================
0:00 Intro
0:43 KNN Overview
1:23 How KNN Works
2:52 Implementation
6:29 Space and time complexity
8:07 How to select K
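
For reference alongside the 2:52 Implementation chapter, here is a minimal sketch of a brute-force KNN classifier. The video's exact code may differ; the attribute and method names (self.x, self.y, get_distance, predict) follow the ones referenced in the comments below, and binary 0/1 labels are assumed.

import math

class KNN:
    def fit(self, x, y):
        # KNN is a lazy learner: fitting just memorizes the training data.
        self.x = x  # list of feature vectors
        self.y = y  # list of 0/1 labels

    def get_distance(self, a, b):
        # Euclidean distance between two feature vectors.
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    def predict(self, point, k):
        # Distance from the query point to every training point.
        distance_label = [
            (self.get_distance(point, train_point), train_label)
            for train_point, train_label in zip(self.x, self.y)
        ]
        # Sort by distance and keep the k nearest neighbors.
        neighbors = sorted(distance_label)[:k]
        # Fraction of positive labels among the k nearest neighbors.
        return sum(label for _, label in neighbors) / k
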
Comments

Hey Emma, appreciate the time and effort creating amazing content. Your channel has helped me get a DS offer from a top tech company; the A/B testing series is intuitive. Needless to say, the ML-related topics aren't as complex as in other sources; they are easy to understand, and the implementation part is awesome. Looking forward to watching more ML-related videos!

stellawww

The time complexity is not O(MN), since N is actually constant; thus the complexity is O(M) + O(M log M) = O(M log M).

Also, it is unnecessary to save all distances; only the top k matter, for which a constant k-size heap could be used. Thus the space complexity is constant.

techedu
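
A sketch of the fixed-size heap idea from the comment above, assuming the same (distance, label) pairs as in the implementation. Python's heapq is a min-heap, so distances are negated to keep a max-heap whose root is the farthest of the current k candidates:

import heapq

def k_nearest_bounded(distance_label, k):
    heap = []
    for dist, label in distance_label:
        if len(heap) < k:
            heapq.heappush(heap, (-dist, label))
        elif dist < -heap[0][0]:
            # Closer than the farthest kept candidate: replace it.
            heapq.heapreplace(heap, (-dist, label))
    # O(M log k) time and O(k) extra space, versus O(M log M) time
    # and O(M) space for sorting the full distance list.
    return [(-neg_dist, label) for neg_dist, label in heap]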

Thanks Emma! This is helpful! Looking forward to the videos on optimizing the naive KNN algorithm and more ML algorithms!

shanggao

I really appreciate your effort in preparing this content! By far, going over your videos is the most efficient way for me to recap the key concepts and prepare for data science interviews. Thanks a lot!

hankyujang

log(n_samples) is not necessarily larger than n_features: 2**20 ≈ 1 million and 2**30 ≈ 1 billion, and we could easily have more than 30 features.

ZhensongRen

Thanks a lot for your content and efforts. Your videos are very helpful for revision. Looking forward to more videos on other algorithms.

ashritkulkarni

Thank you Emma! Great content as always. It is so hard to find reliable resources for MLE interviews nowadays haha

MinhNguyen-lzpg

Amazing content, thank you so much 🤟

bettysi

I think the predict function only handles a single-point prediction. I have made some updates so it works on a whole dataset:

def predict(self, data, k):
    predict_output = []
    for point in data:
        distance_label = [
            (self.get_distance(point, train_point), train_label)
            for train_point, train_label in zip(self.x, self.y)
        ]
        neighbors = sorted(distance_label)[:k]
        # Fraction of positive labels among the k nearest neighbors.
        predict_output.append(sum(label for _, label in neighbors) / k)
    return predict_output

licdad
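
Assuming the sketch class above with this batch predict swapped in, a hypothetical usage (the data here is made up for illustration) could look like:

model = KNN()
model.fit([[0, 0], [1, 1], [10, 10]], [0, 0, 1])
print(model.predict([[0.5, 0.5], [9, 9]], k=2))  # -> [0.0, 0.5]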

Hi Emma, thank you so much for making awesome videos! They helped me a lot!

nanwang

Isn't the space complexity of the distance_label array going to be O(m) + O(n)? We first calculate the distance across each of the m features, sum it into a single value, and then store that value for each of the n points in the training set.

leolloyd
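
One way to see the accounting the comment above asks about, as an annotated sketch (m features, n training points; not the video's exact code):

import math

def build_distance_label(point, train_x, train_y):
    # The inner generator streams over the m features, so computing one
    # distance takes O(1) extra space (it would be O(m) only if the
    # squared differences were materialized in a list first).
    # The outer list stores one (distance, label) pair per training
    # point, so distance_label itself is O(n); the per-feature work is
    # transient and does not add an O(m) storage term.
    return [
        (math.sqrt(sum((p - t) ** 2 for p, t in zip(point, tp))), label)
        for tp, label in zip(train_x, train_y)
    ]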

Hi Emma, thank you so much for all your videos, they are all super helpful! Can you please do more ML and Python coding videos in the future?

jennywu

Great video as always, Emma! Do you think in new-grad interviews they ask about A/B testing if it's not in the job description?

SuperLOLABC

Hi Emma, thank you so much for your videos! I learned so much from them. Can you do one on Decision Trees?

CC-jirb

Would it be possible to use a min-heap instead of sorting the points by distance (and implement this in near-linear time instead of O(N log N))?

roguenoir
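
A sketch of the idea in the question above: heapify runs in O(N) and popping the k closest costs O(k log N), so neighbor selection takes O(N + k log N), effectively linear when k is small (heapq.nsmallest(k, distance_label) wraps the same pattern):

import heapq

def k_nearest_heapify(distance_label, k):
    # Build a min-heap over all N (distance, label) pairs in O(N),
    # then pop the k closest in O(k log N): O(N + k log N) total,
    # versus O(N log N) for a full sort.
    heap = list(distance_label)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(min(k, len(heap)))]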

Nice video! May I get the source code from this video?

likhithadusanapudi

The interviewer asked me about this. It was quite embarrassing to give the wrong answer.

Ragnarik

Am I missing self.distance somewhere? Thank you!

jiayiwu