Introduction to Cluster Analysis with R - an Example

Provides an illustration of how to perform cluster analysis with R.

Cluster analysis is a statistical technique used to group similar objects or data points based on their characteristics. The goal is to identify patterns or structures within data without any prior knowledge of the groups. By measuring the similarity or distance between objects, cluster analysis divides the data into distinct clusters where members of each cluster are more similar to one another than to members of other clusters. This method is widely used in various fields such as marketing for customer segmentation, biology for classifying species, and machine learning for exploratory data analysis.
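The following minimal R sketch makes the idea of grouping by distance concrete. It is not taken from the video. It uses six made-up 2-D points, computes pairwise Euclidean distances with dist(), builds a complete-linkage tree with hclust(), and cuts that tree into two groups with cutree(). The coordinates are hypothetical and were chosen so that the two groups are obvious.

```r
# Toy illustration of distance-based grouping.
# ASSUMPTION: hypothetical coordinates, chosen only to make two clear groups.
pts <- matrix(c(1.0, 1.0,
                1.2, 0.8,
                0.9, 1.1,
                5.0, 5.2,
                5.1, 4.9,
                4.8, 5.0),
              ncol = 2, byrow = TRUE,
              dimnames = list(paste0("p", 1:6), c("x", "y")))

# Pairwise Euclidean distances between all points
d <- dist(pts, method = "euclidean")
print(round(d, 2))

# Hierarchical clustering, then cut the tree into two groups
hc <- hclust(d, method = "complete")
groups <- cutree(hc, k = 2)
print(groups)
```

In the printed distance matrix, every within-group distance is smaller than every between-group distance. That is the property the paragraph above describes.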

For citation as reference in a research paper, use:
Meshram, A., and Rai, B. (2019). “User-Independent Detection for Freezing of Gait in Parkinson’s Disease Using Random Forest Classification,” International Journal of Big Data and Analytics in Healthcare, Vol. 4, Issue 1, 57-72.

Rai, B. K. (2017). “Feature Selection and Predictive Modeling of Housing Data Using Random Forest,” International Journal of Social, Behavioral, Educational, Economic, Business and Industrial Engineering, Vol. 11, No. 4, 880-884.

Lu, X., Rai, B., Yan, Z., and Li, Y. (2018). “Cluster-based Smartphone Predictive Analytics for Application Usage and Next Location Prediction,” International Journal of Business Intelligence Research, Vol. 9, No. 2, 64-80.

Topics
00:00 Read data file
00:45 Scatter plot
02:30 Data normalization
04:27 Calculate Euclidean distance
05:54 Cluster dendrogram with complete linkage
08:20 Cluster dendrogram with average linkage
08:52 Cluster membership
10:47 Cluster means
12:35 Silhouette plot
13:31 Scree plot
14:47 Non-hierarchical k-means clustering & interpretation
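The steps in the topic list can be sketched end to end in R as shown below. This is a minimal sketch and not the code from the video. It makes several assumptions: R's built-in USArrests data stands in for the video's data file, normalization is done with z-scores via scale(), three clusters are used purely for illustration, and the silhouette plot comes from the cluster package, a recommended package that ships with standard R installations.

```r
# Minimal sketch of the workflow in the topic list.
# ASSUMPTION: built-in USArrests stands in for the video's data file.
library(cluster)  # provides silhouette()

dat <- USArrests  # 50 US states, 4 numeric variables

# Scatter plot of two variables, labeled by state
plot(Murder ~ Assault, data = dat)
text(dat$Assault, dat$Murder, labels = rownames(dat), pos = 4, cex = 0.6)

# Data normalization: z-scores (mean 0, sd 1 for every column)
z <- scale(dat)

# Euclidean distance between every pair of states
d <- dist(z, method = "euclidean")

# Dendrograms with complete and average linkage
hc_complete <- hclust(d, method = "complete")
plot(hc_complete, hang = -1, cex = 0.6, main = "Complete linkage")
hc_average <- hclust(d, method = "average")
plot(hc_average, hang = -1, cex = 0.6, main = "Average linkage")

# Cluster membership after cutting each tree into 3 groups
# (k = 3 is an illustrative choice, not a recommendation)
member_c <- cutree(hc_complete, k = 3)
member_a <- cutree(hc_average, k = 3)
print(table(complete = member_c, average = member_a))

# Cluster means on the original (unnormalized) scale
print(aggregate(dat, by = list(cluster = member_c), FUN = mean))

# Silhouette plot: values near 1 mean a point sits well inside its cluster
sil <- silhouette(member_c, d)
plot(sil)

# Scree plot: total within-cluster sum of squares for k = 1..10
set.seed(123)
wss <- sapply(1:10, function(k) kmeans(z, centers = k, nstart = 10)$tot.withinss)
plot(1:10, wss, type = "b", xlab = "Number of Clusters", ylab = "Within group SS")

# Non-hierarchical k-means with 3 clusters
set.seed(123)
km <- kmeans(z, centers = 3, nstart = 25)
print(km$size)
print(aggregate(dat, by = list(cluster = km$cluster), FUN = mean))
```

Normalization matters because dist() is sensitive to scale. Without it, Assault, which is measured in the hundreds, would dominate the distances. The nstart argument reruns k-means from several random starts and keeps the best result, which makes the scree curve less noisy.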

Cluster analysis is an important tool for analyzing big data and for working in the data science field.

R is a free software environment for statistical computing and graphics, and it is widely used in both academia and industry. R runs on both Windows and macOS. It was ranked no. 1 in a KDnuggets poll on top languages for analytics, data mining, and data science. RStudio, a user-friendly environment for R, has also become popular.

Comments

Great tutorial!!...the way you explain is easy to understand...you should do more like this

DineshKumarT

Thank you so much, Dr. B. Rai. I am inspired by your way of teaching, even online; hopefully everyone is enjoying your teaching.

ramasamythirunavukkarasu

This is an excellent tutorial -- well presented and thorough. I followed along with my own application example (country healthcare per capita expenditure versus infant mortality rates of various types) and got very interesting results.

factChecker

Excellent explanation and code. I took the Johns Hopkins data science course, and clustering was part of the course. This video really helps explain the concept.

stephenhobbs

Great tutorial. I really like how you stuck to explaining the steps through a practical application. Thank you for this.

ArcenisRojas

5-star explanation. Thank you! Highly recommended for beginner and intermediate R users. You got a new follower!

sebastiansocianu

Really thank you so much!!! The best tutorial on this topic!!!

karoargote

My goodness, this video is so complete, and clearly explained with details of the script... Thank you so very much... 100 points to you...!! You have a new fan...

rarosification

If I had a thousand likes, you would have received them all, sir. Love the way you have explained and covered the concepts.

kanikalungani

Great tutorial! You are really helping a lot of people like me, and the best part is that the drama, background music, etc. are completely missing, unlike in many other tutorials. Also saw some Bhojpuri songs :)... Thank you, sir!

rupeshbharadwaj

Hi Bharatendra, great video - really helpful!

Everything goes well until I get to the scree plot, where I am getting:

> withinGroupSumOfSquares = (nrow(normNum)-1) * sum(apply(normNum, 2, var, na.rm=TRUE))
> for(i in 2:20) withinGroupSumOfSquares[i] = sum(kmeans(normNum, centers=i)$withinss)
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
> plot(1:20, withinGroupSumOfSquares, type="b", xlab = "Number of Clusters", ylab = "Within group SS")
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' and 'y' lengths differ

Can you help me? Thank you.

jonathanrhein
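The commenter's data isn't shown, so the cause of this scree-plot error is an assumption. The most likely explanation is that normNum contains NA, NaN, or Inf values. Missing values in the raw data or a constant column (sd = 0) both produce these values after scaling. kmeans() then stops at i = 2, which leaves withinGroupSumOfSquares with only one element, and that is why plot() reports that the 'x' and 'y' lengths differ. The sketch below uses R's built-in airquality data as a hypothetical stand-in, because its Ozone and Solar.R columns contain NAs. It cleans the data before normalizing.

```r
# ASSUMPTION: airquality (built into R) stands in for the commenter's data;
# its Ozone and Solar.R columns contain NAs, which make kmeans() fail with
# "NA/NaN/Inf in foreign function call (arg 1)".
num <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]

# 1. Keep only rows without missing values
num <- num[complete.cases(num), , drop = FALSE]

# 2. Drop constant columns: scaling divides by sd = 0 and yields NaN
num <- num[, sapply(num, sd) > 0, drop = FALSE]

# 3. Normalize, then confirm every value is finite before clustering
normNum <- scale(num)
stopifnot(all(is.finite(normNum)))

# 4. Scree plot: k = 1 uses the total SS; k = 2..20 use kmeans()
kmax <- 20
withinGroupSumOfSquares <- numeric(kmax)
withinGroupSumOfSquares[1] <- (nrow(normNum) - 1) * sum(apply(normNum, 2, var))
set.seed(1)
for (i in 2:kmax) {
  withinGroupSumOfSquares[i] <- sum(kmeans(normNum, centers = i, iter.max = 50)$withinss)
}
plot(1:kmax, withinGroupSumOfSquares, type = "b",
     xlab = "Number of Clusters", ylab = "Within group SS")
```

Preallocating the vector with numeric(kmax) means a failure inside the loop shows up as zeros in the plot rather than as a length mismatch. The stopifnot() check catches non-finite values before kmeans() is ever called.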

This is truly an excellent, clear and concise tutorial. You covered a lot of topics in a short amount of time. I will be watching your other videos. Well done!

markshanks

Thanks for such wonderful explanation.
By the way, I was working on a similar dataset, and apply didn't work for me. Although I removed all character vectors, the numeric vectors were still returning 'NA'. I used sapply instead and it served the purpose.

Thanks again!!

tradingtraveller
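The commenter's data isn't shown, so the following explanation is an assumption. apply() first converts a data frame to a matrix. If even one non-numeric column remains, the whole matrix becomes character, and mean() returns NA for every column. A factor column triggers this too, and read.csv() created factor columns for text by default before R 4.0.0. sapply(), by contrast, iterates over a data frame's columns directly, so numeric columns keep their type. The small data frame below is hypothetical.

```r
# Hypothetical data frame with one non-numeric column left in by mistake
df <- data.frame(x = c(1, 2, 3),
                 y = c(10, 20, 30),
                 label = c("a", "b", "c"))

# apply() converts df to a matrix first; the character column makes the
# whole matrix character, so mean() returns NA (with a warning) everywhere.
res_apply <- suppressWarnings(apply(df, 2, mean))
print(res_apply)

# sapply() treats the data frame as a list of columns, so each numeric
# column keeps its type; keep only numeric columns before averaging.
num_cols <- sapply(df, is.numeric)
res_sapply <- sapply(df[num_cols], mean)
print(res_sapply)
```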

Great Explanation!
Thank you, sir, for this video lecture.
I will be watching your other videos.

kapilrana

Thank you so much. This was easy to follow and I did my own analysis as we went along with almost no trouble. This was a breakthrough video for me.

janelutken

Fantastic explanation! I followed along with a different dataset and it worked perfectly! Great work!!

archeops.

This is a brilliant tutorial which is easy to understand and follow.

sarahroffe

Wah!!! How could you explain it so well!! Great job.

harikamacharla

Hi Bharatendra, this is an excellent tutorial - the first one that worked for me. Great effort, keep up the good work!

emiltsenov

Good job explaining the content along with the code.

kishoreyarramshetty
