K Nearest Neighbors Application - Practical Machine Learning Tutorial with Python p.14

In the last part we introduced classification, which is a supervised form of machine learning, and explained the intuition behind the K Nearest Neighbors algorithm. In this tutorial, we're going to apply the algorithm to a simple example using Scikit-Learn, and then in the subsequent tutorials we'll build our own algorithm to learn more about how it works under the hood.
To exemplify classification, we're going to use a Breast Cancer Dataset, which is a dataset donated to the University of California, Irvine (UCI) collection from the University of Wisconsin-Madison. UCI has a large Machine Learning Repository.
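As a rough sketch of the workflow described above, the following fits Scikit-Learn's `KNeighborsClassifier` and scores it on a held-out split. The data here is synthetic stand-in data generated for illustration, not the real UCI file; the feature names follow the column list used in the comments, and the class labels 2 and 4 are assumed to mean benign and malignant.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

FEATURES = [
    "clump_thickness", "unif_cell_size", "unif_cell_shape", "marg_adhesion",
    "single_epith_cell_size", "bare_nuclei", "bland_chrom", "norm_nucleoli",
    "mitoses",
]

# Synthetic stand-in data (NOT the real dataset): low feature values are
# labelled 2, high feature values are labelled 4.
rng = np.random.default_rng(0)
low = rng.integers(1, 5, size=(60, 9))     # values 1..4
high = rng.integers(6, 11, size=(60, 9))   # values 6..10
df = pd.DataFrame(np.vstack([low, high]), columns=FEATURES)
df["class"] = [2] * 60 + [4] * 60

X = df.drop(columns=["class"]).to_numpy()  # features
y = df["class"].to_numpy()                 # labels

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = KNeighborsClassifier()               # default n_neighbors=5
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# predict() expects a 2-D array: one row per sample.
example_measures = np.array([[4, 2, 1, 1, 1, 2, 3, 2, 1]])
prediction = clf.predict(example_measures)
print(accuracy, prediction)
```

With the real file, the synthetic block would be replaced by loading the CSV into `df`; the split, fit, and predict steps stay the same.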

Comments
Author

Thank you so much Edward Snowden. 😂 I took 6 Machine Learning courses in uni. First I failed Machine Learning, then failed Natural Language Processing, then failed Bioinformatics, which also included some Machine Learning. The other three courses were Data Mining, Machine Learning (again) and... I can't remember the 6th lol. Not once did I understand code that clearly. Good job.

ahmadjamalmughal
Author

Note that cross_validation is deprecated. Use model_selection, which interestingly gets me a higher accuracy.

josiahls
Author

Hey guys, if you are having the error ValueError: labels ['class'] not contained in axis be sure to save your file as ANSI ENCODING, that was what solved my problems here!

daniloktz
Author

For more recent followers: train_test_split has been moved. To fix it, use:
from sklearn.model_selection import train_test_split
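A minimal usage sketch with the new import path, assuming placeholder arrays `X` and `y` in place of the tutorial's features and labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 10 samples with 2 features each, labels alternating 2 / 4.
X = np.arange(20).reshape(10, 2)
y = np.array([2, 4] * 5)

# Same call as in the tutorial, only the import location differs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```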

barrettbryson
Author

Hey Mate! I don't know if it works for others, but the deprecation warning is fixed (for me) by simply providing a 2-dimensional list for the np.array declaration. This way you don't have to use reshape.

example_measures = np.array( [ [4, 2, 1, 1, 1, 2, 3, 2, 1] ] )

When you provided a 2nd set of data you didn't need to reshape there either, because you naturally made it 2-dimensional.
But this isn't to say that different versions of everything won't have different effects. For me, if I get the warning in question, the algorithm returns no prediction, whereas yours appeared to still run the prediction properly anyway.
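To make the shape difference concrete, here is a small NumPy-only sketch. The second row in `two_samples` is a made-up extra sample for illustration.

```python
import numpy as np

# A flat list gives a 1-D array, which is what triggers the complaint.
flat = np.array([4, 2, 1, 1, 1, 2, 3, 2, 1])
print(flat.shape)       # 1-D: nine values, no sample axis

# reshape(1, -1) turns it into one sample (row) with nine features.
reshaped = flat.reshape(1, -1)

# Extra square brackets build the same 2-D array directly.
nested = np.array([[4, 2, 1, 1, 1, 2, 3, 2, 1]])

# Two samples written as a list of lists are already 2-D.
two_samples = np.array([
    [4, 2, 1, 1, 1, 2, 3, 2, 1],
    [4, 2, 1, 2, 2, 2, 3, 2, 1],   # hypothetical second sample
])
print(reshaped.shape, nested.shape, two_samples.shape)
```

Both `reshaped` and `nested` hold identical data, so either can be passed to `predict`.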

Loving your videos!

nintendo_dringus
Author

id, clump_thickness, unif_cell_size, unif_cell_shape, marg_adhesion, single_epith_cell_size, bare_nuclei, bland_chrom, norm_nucleoli, mitoses, class

TheHimanshu
Author

Found out what the reshape was moaning about ... just needed an extra set of square brackets (i.e. no need to reshape)

example_measures = np.array([[4, 2, 1, 1, 1, 2, 3, 2, 1]])
#example_measures = example_measures.reshape(1, -1);

LOVED the video - thank you!

duncancarr
Author

For those who get the errors labels ['class'] not contained in axis or labels ['id'] not contained in axis: remember to insert "id, clump_thickness, unif_cell_size, unif_cell_shape, marg_adhesion, single_epith_cell_size, bare_nuclei, bland_chrom, norm_nucleoli, mitoses, class" as the first line of your data file.

jimiyu
Author

Load the data, replace missing values, and drop the 'id' column in one step: df = pd.read_csv('breast_cancer_wisconsin.csv', header=0).replace('?', -99999).drop(['id'], axis=1). Method chaining keeps the code short and concise. Great tutorial series by the way :)
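A minimal sketch of such a chain, run against a tiny in-memory stand-in for the file so it is self-contained. The two data rows are made up for illustration, and the sentinel `-99999` for missing values is an assumption; any value that suits your model can be substituted.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for 'breast_cancer_wisconsin.csv'
# (header line plus made-up data rows; '?' marks a missing value).
csv_text = (
    "id,clump_thickness,unif_cell_size,unif_cell_shape,marg_adhesion,"
    "single_epith_cell_size,bare_nuclei,bland_chrom,norm_nucleoli,mitoses,class\n"
    "1,5,1,1,1,2,1,3,1,1,2\n"
    "2,8,4,5,1,2,?,7,3,1,4\n"
)

MISSING = -99999  # assumed placeholder for missing values

# Load, replace '?' markers, and drop the 'id' column in one chained expression.
df = (
    pd.read_csv(io.StringIO(csv_text), header=0)
    .replace("?", MISSING)
    .drop(columns=["id"])
)
print(df.columns.tolist())
```

With the real file, `io.StringIO(csv_text)` would be replaced by the file path.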

stephan
Author

You got me interested in machine learning and now I'm using it for my MA thesis. Thank you.

aryanyekrangi
Author

Hi,

I just did this tutorial. This is awesome!
Your explanation is so clear and straightforward, it makes all of ML look so easy!

Thanks a lot for this (and all other) tutorials !

Moving on to next video.

Regards
Prasad

gprasas
Author

I don't know if much has changed between the making of this video and the time I'm writing this, but it's incredible to see that when I follow along, the model predicts with 99% accuracy at times. Kinda crazy.

isaiahtoro
Author

You are a superb teacher and an exceptional programmer. Keep up the good work!!!!

profmo
Author

If anyone is having trouble importing cross_validation from sklearn: it has been moved to model_selection, so you can import the split function as: from sklearn.model_selection import train_test_split

arycloud
Author

Can't wait for episode 15! I have to do this exact thing: rewrite KNN in Java.

levyroth
Author

Your ML tutorials are the best on the internet👍

nemesis_rc
Author

If you use "example_measures=np.array([[4, 2, 1, 1, 1, 2, 3, 2, 1], ])", you don't need to reshape the array. What scikit-learn needs in this instance is just a 2D array.

shuhangzhan
Author

U r superb man! Your Tutorials made machine learning a "learn with fun tutorial"... Thanks!

shivambhirud
Author

Better explained than when I was a student in a university.

alexmattheis
Author

If you have the error "ValueError: labels ['class'] not contained in axis", the following steps will solve your issue:
1. Save the downloaded file as it is, without adding a header line of column names like id, class, clump_thickness.
2. Assign the column names in code like this:
df.columns = ["id", "clump_thickness", "unif_cell_size", "unif_cell_shape", "marg_adhesion",
"single_epith_cell_size", "bare_nuclei", "bland_chrom", "norm_nucleoli", "mitoses", "class"]
This line should come after df = pd.read_csv(). Pass header=None to read_csv so that the first data row is not consumed as the header.
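An alternative sketch of the same idea: `pd.read_csv` can take the column names directly through its `names` parameter, together with `header=None` so that the first data row is kept as data. The two rows below are a made-up, headerless stand-in for the downloaded file.

```python
import io
import pandas as pd

COLUMNS = [
    "id", "clump_thickness", "unif_cell_size", "unif_cell_shape",
    "marg_adhesion", "single_epith_cell_size", "bare_nuclei",
    "bland_chrom", "norm_nucleoli", "mitoses", "class",
]

# Hypothetical headerless stand-in for the raw data file (made-up rows).
raw = "1,5,1,1,1,2,1,3,1,1,2\n2,8,4,5,1,2,?,7,3,1,4\n"

# header=None: treat every line as data; names=...: label the columns.
df = pd.read_csv(io.StringIO(raw), header=None, names=COLUMNS)

# 'class' and 'id' now exist, so dropping or selecting them no longer fails.
X = df.drop(columns=["id", "class"])
y = df["class"]
print(len(df), list(df.columns))
```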

joshsato