K-means Clustering From Scratch In Python [Machine Learning Tutorial]

In this project, we'll build a k-means clustering algorithm from scratch. Clustering is an unsupervised machine learning technique that can find patterns in your data. K-means is one of the most popular forms of clustering.

We'll create our algorithm using Python and pandas. We'll then compare it to the reference implementation from scikit-learn.

Project Steps
- Write out pseudocode for the algorithm
- Code the k-means algorithm
- Plot the clusters from the algorithm
- Compare performance to the scikit-learn algorithm
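
As a rough illustration of the steps above, here is a minimal sketch of the k-means loop. It is not the code from the video: it uses NumPy arrays instead of pandas, and the function names (random_centroids, get_labels, new_centroids, kmeans), the choice of starting centroids from existing rows, and the convergence check are all assumptions made for this example.

```python
import numpy as np


def random_centroids(data, k, rng):
    """Pick k distinct rows as starting centroids (one common choice, assumed here)."""
    idx = rng.choice(len(data), size=k, replace=False)
    return data[idx].copy()


def get_labels(data, centroids):
    """Label each point with the index of its nearest centroid (Euclidean distance)."""
    # Broadcasting gives a (n_points, k) matrix of point-to-centroid distances.
    distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def new_centroids(data, labels, centroids):
    """Move each centroid to the mean of its points; an empty cluster stays put."""
    updated = centroids.copy()
    for j in range(len(centroids)):
        members = data[labels == j]
        if len(members) > 0:
            updated[j] = members.mean(axis=0)
    return updated


def kmeans(data, k, max_iterations=100, seed=0):
    """Alternate the label and update steps until the centroids stop moving."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = random_centroids(data, k, rng)
    for _ in range(max_iterations):
        labels = get_labels(data, centroids)
        updated = new_centroids(data, labels, centroids)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return get_labels(data, centroids), centroids
```

Calling kmeans(data, k=3) on a 2-D array of scaled features returns one label per row plus the final centroid coordinates. The result depends on the random starting centroids, so different seeds can give different clusterings.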

Chapters

00:00 Intro
00:37 k-means overview
02:51 Loading in and cleaning FIFA data
06:11 Scaling the data
10:31 Initialize random centroids
14:20 Finding cluster labels for each data point
19:29 Update centroid values
23:30 Plotting k-means iterations
28:24 Pulling the algorithm together
35:25 Comparing our implementation to scikit-learn
37:56 Conclusion and next steps
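
The "Scaling the data" chapter matters because k-means relies on distances, so a feature with a large numeric range can dominate the others. The sketch below shows one common option, min-max scaling. The helper name min_max_scale and the 1 to 10 target range are assumptions for illustration; the video may scale its data differently.

```python
import numpy as np


def min_max_scale(data, low=1.0, high=10.0):
    """Rescale each column linearly so its values fall between low and high.

    The default range of 1 to 10 is an assumption made for this example.
    """
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    # A constant column has zero range; use 1 to avoid dividing by zero.
    col_range[col_range == 0] = 1.0
    return (data - col_min) / col_range * (high - low) + low
```

After scaling, every feature contributes on a comparable scale to the distance between a point and a centroid.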

Comments

Outstanding! Thank you, man! This really helped me with my master's thesis. I really appreciate that you explained every small step, used as many visuals as possible, and focused on us being able to learn!
- In case others run into the same problem: with scikit-learn's k-means, calling fit(data) gave me a "split" error (AttributeError: 'NoneType' object has no attribute 'split'). I checked my BLAS, updated all libraries through conda, then shut everything down and reopened it. This resolved the problem, but it took a long time.

(I asked ChatGPT for help)

MarianneHMiettinen

Thank you, sir. This is how tutorials should be conducted: with in-depth explanations, step-by-step implementation, and the release of all code and datasheets to enable everyone to practice and advance their own personal projects. Congrats!

maleck

This was amazing. Brilliantly explained, demonstrated and presented clearly. Helped me so much with my current bootcamp task. Thank you.

animal

From the bottom of my heart; thank you. This was so clear and easily understandable, fantastic video!

stevenlomon

Terrific implementation! I also really liked the way you used PCA for iterative visualization... Nicely done

TimHerrin

Your explanation is absolutely clear. You have excellent knowledge. Keep posting new topics and encouraging us ❤

Risewithvishwas

It's a great job, the only one on YouTube that explains every part of the code 👍👍

allaguimaouia

One of the best tutorials on the internet, thank you.

mo_l

This is a nice and powerful way to learn. Thanks for teaching.

elu

Absolutely fantastic
Would love a similar video on PAM clustering for mixed integer and categorical variables

Silverwing_

I never thought that we could visualize k-means using dimensionality reduction (PCA)!! Awesome tutorial, sir

photoish

Awesome stuff, Vik. Thanks for sharing.

jessemunson

Amazingly clear! Thank you so much, Dataquest!

tejasvinnarayan

Very helpful and clear explanations. Thank you!

sashagalanova

This is THE best tutorial online. I am so grateful for this! Thank you

amandamorrow

Great video. Really helpful looking at implementing it manually. Thank you so much

krlwshu

I can't thank you enough. Thank you for this content.

TidianeDiallo-ss

Such good and clearly delivered material. Thanks a lot!

hounddog

Thank you, thank you, thank you!!! Being able to perform and explain what runs under the hood is really important, I agree. Please keep these videos coming 🙌🏼❤️ The "From Scratch" series :)

oskeeg

You might be a hero... thanks a lot for the content...

ytustatistics