K means clustering using python

The scikit-learn library for Python is a powerful machine learning tool.

K-means clustering, which is easily implemented in Python, uses geometric (Euclidean) distance to place centroids around which our data can be grouped into clusters.
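As a minimal sketch of the idea (not the code from the video), the snippet below runs scikit-learn's `KMeans` on six made-up 2D points. The points, `n_clusters=2`, and `random_state=0` are assumptions chosen only for illustration. It also checks that every point is labelled with its nearest centroid, which is the distance rule described above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up data: two well-separated groups of 2D points (illustrative only)
points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.3],
])

# n_clusters is the "K" in K-means; random_state makes the run repeatable
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(points)      # one cluster label per point
centroids = model.cluster_centers_      # one centroid per cluster

# Euclidean distance from every point to every centroid
dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
nearest = dists.argmin(axis=1)

print(labels)
print(centroids)
print((nearest == labels).all())        # each point sits with its nearest centroid
```

Each centroid is simply the mean of the points assigned to it, so with well-separated data like this the centroids land in the middle of each group.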

In the example attached to this article, I look at 99 hypothetical patients who are prompted to sync their smartwatch healthcare app data with a research team. The data is recorded continuously, but to comply with healthcare regulations, the patients have to actively synchronize it. The example works equally well if we consider 99 hypothetical customers responding to a marketing campaign.

To prompt the patients, several reminder campaigns are run each year, 32 in total. Each campaign uses exactly one of the following reminder types: e-mail, short message service (SMS), online message, telephone call, pamphlet, or letter. A record is kept of when patients sync their data, as a marker of their response to the campaign.

Our goal is to cluster the patients so that we can learn which campaign type they respond to. This can be used to tailor their reminders for the next year.
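The setup described above could be sketched roughly as follows. This is not the dataset or notebook from the video: the responses are random synthetic values, the column names (`campaign_1` ...), the assignment of reminder types to campaigns, and the choice of three clusters are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
reminder_types = ["e-mail", "SMS", "online message",
                  "telephone call", "pamphlet", "letter"]

# Hypothetical layout: 32 campaigns, each tied to exactly one reminder type
columns = [f"campaign_{i + 1}" for i in range(32)]
type_of = {col: reminder_types[i % len(reminder_types)]
           for i, col in enumerate(columns)}

# Synthetic responses: 1 = patient synced after the campaign, 0 = did not
responses = pd.DataFrame(rng.integers(0, 2, size=(99, 32)), columns=columns)

# Cluster the 99 patients on their 32 responses (k=3 is an arbitrary choice)
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(responses)

# Per patient: share of campaigns of each reminder type they responded to
by_type = pd.DataFrame({
    t: responses[[c for c in columns if type_of[c] == t]].mean(axis=1)
    for t in reminder_types
})

# Per cluster: average response rate for each reminder type
profile = by_type.groupby(labels).mean()
print(profile.round(2))
```

With real data, the rows of `profile` would suggest which reminder type each cluster of patients tends to respond to. With the random data used here, the rates carry no meaning.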

In the attached video, I show you just how easy this is to accomplish in Python. I use the Python kernel in a Jupyter notebook. There is also a mention of dimensionality reduction using principal component analysis (PCA), also done using scikit-learn. This is done so that we can view the data as a scatter plot using the plotly library.
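A rough sketch of the dimensionality-reduction step, again on synthetic stand-in data rather than the video's dataset: the 32 response columns are projected onto two principal components so that each patient gets an `x` and `y` coordinate. The column names, cluster count, and random seed are assumptions. Note that PCA is fitted once, and only on the response columns, not on the cluster labels.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Synthetic stand-in for the 99 x 32 response table (illustrative only)
features = pd.DataFrame(
    rng.integers(0, 2, size=(99, 32)),
    columns=[f"campaign_{i + 1}" for i in range(32)],
)

# Cluster on the original 32 columns
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)

# Project the same 32 columns onto 2 principal components
pca = PCA(n_components=2)
coords = pca.fit_transform(features)    # shape (99, 2)

# One row per patient: 2D position plus cluster label, ready for plotting
plot_df = pd.DataFrame({"x": coords[:, 0], "y": coords[:, 1], "cluster": labels})

print(plot_df.head())
print(pca.explained_variance_ratio_)    # share of variance kept by each axis
```

The `x` and `y` columns can then be handed to a scatter plot, with one trace or colour per value of `cluster`. The explained variance ratio is worth a glance, since two components may retain only a small part of the information in 32 columns.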

Comments

This is an excellent tutorial. I had found loads of people doing clustering of two dimensional arrays and writing the clusters to numpy array. This is great because it shows many fields and applying the clusters to the original data ... which seems far more like a real world application. Many Thanks.

stuartkirkup

Thanks. Awesome exercise. I made a different attempt here. Instead of patient_cluster I used table for k-means, just to visualize more attributes in a different dataset. The only problem was that data was stored as a tuple instead of a dictionary, so iplot(data) doesn't seem to work. So I made a few small changes, as follows, with data = [trace0, trace1, trace2, trace3] (I used K=4):

clusters = KMeans(n_clusters=4)

df["cluster"] =

pca = PCA(n_components = 2)
cols = df.columns[1:]
df['x'] = pca.fit_transform(df[cols])[:, 0]
df['y'] = pca.fit_transform(df[cols])[:, 1]

trace0 = go.Scatter(x=df[df['cluster'] == 0]['x'],
                    y=df[df['cluster'] == 0]['y'],
                    name='Cluster1',
                    mode='markers',
                    marker=dict(size=10,
                                color='rgba(255, 0, 0, 0.5)',
                                line=dict(width=1, color='rgb(0, 0, 0)')))

trace1 = go.Scatter(x=df[df['cluster'] == 1]['x'],
                    y=df[df['cluster'] == 1]['y'],
                    name='Cluster2',
                    mode='markers',
                    marker=dict(size=10,
                                color='rgba(0, 255, 0, 0.5)',
                                line=dict(width=1, color='rgb(0, 0, 0)')))

trace2 = go.Scatter(x=df[df['cluster'] == 2]['x'],
                    y=df[df['cluster'] == 2]['y'],
                    name='Cluster3',
                    mode='markers',
                    marker=dict(size=10,
                                color='rgba(0, 0, 255, 0.8)',
                                line=dict(width=1, color='rgb(0, 0, 0)')))

trace3 = go.Scatter(x=df[df['cluster'] == 3]['x'],
                    y=df[df['cluster'] == 3]['y'],
                    name='Cluster4',
                    mode='markers',
                    marker=dict(size=10,
                                color='rgba(255, 255, 0, 0.5)',
                                line=dict(width=1, color='rgb(0, 0, 0)')))

I still got a nice plot. Did I do it right, or does it still need some correction? I calculated the SSE, and it seems that more iterations give a higher SSE value and good-quality clusters. I want to know more about SSE.

vaibhavhiwase

Thanks Juan! That was a great video. I was just wondering how we can measure the accuracy of the results?

JagjotSingh

Hey Juan, Thank you for this explanation!

abhaybhutkar

Hello Juan, can you please explain why, when you do cluster.fit_predict, the columns start from 2 in [2:]? Thank you very much for your amazing video.

defoezhang

Hello Juan, is it correct to apply PCA including the cluster column?

pedroribeiro

Hi Juan, this was a great tutorial! I was wondering how to make a 3D clustering model. Do you know how?

victorhe

@Juan Klopper, please share the link to the video on PCA if you have made one.

sighage

Can I get your example data for my exercise, sir?

dhiyamahdiasriny

Hi Juan: Do you have the databases for this exercise? Can you share them with us? Thanks.

gabrielcornejo

Is that a Joburg accent I hear? Dankie for the video

chikomufc

Thanks a lot for the video. Please share the Excel file.

RaviRanjanProfile
