Introduction to Cluster Analysis with R - an Example

Provides an illustration of how to do cluster analysis with R.

Cluster analysis is a statistical technique used to group similar objects or data points based on their characteristics. The goal is to identify patterns or structures within data without any prior knowledge of the groups. By measuring the similarity or distance between objects, cluster analysis divides the data into distinct clusters where members of each cluster are more similar to one another than to members of other clusters. This method is widely used in various fields such as marketing for customer segmentation, biology for classifying species, and machine learning for exploratory data analysis.
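As a minimal sketch of grouping by distance, the R snippet below builds a made-up toy data set and checks that points in the same cluster are, on average, closer to each other than to points in the other cluster. The data values and the names `pts`, `d`, `groups`, `within_mean`, and `between_mean` are illustrative assumptions, not taken from the video.

```r
# Toy data: six points forming two visibly separate groups (hypothetical values)
pts <- data.frame(
  x = c(1.0, 1.2, 0.8, 5.0, 5.3, 4.9),
  y = c(1.1, 0.9, 1.0, 5.2, 4.8, 5.1),
  row.names = c("a", "b", "c", "d", "e", "f")
)

# Pairwise Euclidean distances between all points
d <- dist(pts, method = "euclidean")

# Hierarchical clustering, then cut the tree into two clusters
groups <- cutree(hclust(d, method = "complete"), k = 2)

# Compare the average distance within clusters to the average between clusters
dm <- as.matrix(d)
same <- outer(groups, groups, "==")          # TRUE where two points share a cluster
within_mean  <- mean(dm[same & upper.tri(dm)])
between_mean <- mean(dm[!same & upper.tri(dm)])

print(groups)
print(c(within = within_mean, between = between_mean))
```

For well-separated data like this, the within-cluster average should come out much smaller than the between-cluster average, which is the property cluster analysis tries to achieve.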

For citation as reference in a research paper, use:
Meshram, A., and Rai, B. (2019). “User-Independent Detection for Freezing of Gait in Parkinson’s Disease Using Random Forest Classification,” International Journal of Big Data and Analytics in Healthcare, Vol. 4, Issue 1, 57-72.

Rai, B. K. (2017). “Feature Selection and Predictive Modeling of Housing Data Using Random Forest,” International Journal of Social, Behavioral, Educational, Economic, Business and Industrial Engineering, Vol. 11, No. 4, 880-884.

Xiaoling, Lu., Rai, B., Yan, Z., and Li, Y. (2018). “Cluster-based Smartphone Predictive Analytics for Application Usage and Next Location Prediction,” International Journal of Business Intelligence Research, Vol. 9, No. 2, 64-80.

Topics
00:00 Read data file
00:45 Scatter plot
02:30 Data normalization
04:27 Calculate Euclidean distance
05:54 Cluster dendrogram with complete linkage
08:20 Cluster dendrogram with average linkage
08:52 Cluster membership
10:47 Cluster means
12:35 Silhouette plot
13:31 Scree plot
14:47 Non-hierarchical k-means clustering & interpretation
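
The topics above can be sketched as one short R script. This is a rough outline under stated assumptions, not the code from the video: the built-in `USArrests` data set stands in for the video's data file, the choice of three clusters is arbitrary, and all variable names (`dat`, `norm_dat`, `member_c`, `wss`, `km`, and so on) are illustrative.

```r
# Assumption: the built-in USArrests data stands in for the video's data file
dat <- USArrests

# Scatter plot of two variables to eyeball possible groupings
plot(Murder ~ UrbanPop, data = dat)

# Normalize each column to mean 0 and standard deviation 1
norm_dat <- scale(dat)

# Euclidean distance between every pair of rows
d <- dist(norm_dat, method = "euclidean")

# Cluster dendrograms with complete and average linkage
hc_complete <- hclust(d, method = "complete")
hc_average  <- hclust(d, method = "average")
plot(hc_complete, hang = -1, cex = 0.6)
plot(hc_average, hang = -1, cex = 0.6)

# Cluster membership for a 3-cluster solution (k = 3 is an arbitrary choice)
member_c <- cutree(hc_complete, k = 3)
member_a <- cutree(hc_average, k = 3)
print(table(member_c, member_a))

# Cluster means on the normalized and the original scale
print(aggregate(as.data.frame(norm_dat), list(cluster = member_c), mean))
print(aggregate(dat, list(cluster = member_c), mean))

# Silhouette plot (uses the 'cluster' package, if it is installed)
if (requireNamespace("cluster", quietly = TRUE)) {
  plot(cluster::silhouette(member_c, d))
}

# Scree plot: total within-cluster sum of squares for k = 1..10
set.seed(123)  # k-means starts from random centers
wss <- sapply(1:10, function(k) {
  kmeans(norm_dat, centers = k, nstart = 10)$tot.withinss
})
plot(1:10, wss, type = "b",
     xlab = "Number of clusters", ylab = "Within-group SS")

# Non-hierarchical k-means clustering with 3 clusters
km <- kmeans(norm_dat, centers = 3, nstart = 10)
print(km$centers)
print(table(km$cluster))
```

Comparing the two membership tables shows how much the linkage method changes the grouping, and the bend in the scree plot gives a rough hint about a reasonable number of clusters for k-means.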

Cluster analysis is an important tool for analyzing big data and for working in the data science field.

R is a free software environment for statistical computing and graphics, and is widely used in both academia and industry. R works on both Windows and macOS. It was ranked no. 1 in a KDnuggets poll on top languages for analytics, data mining, and data science. RStudio is a user-friendly environment for R that has become popular.
Comments
Author

Great tutorial!!...the way you explain is easy to understand...you should do more like this

DineshKumarT
Author

Thank you so much, Dr. B. Rai. I am inspired by your way of teaching, even online. Hopefully everyone is enjoying your teaching.

ramasamythirunavukkarasu
Author

This is an excellent tutorial -- well presented and thorough. I followed along with my own application example (country healthcare per capita expenditure versus infant mortality rates of various types) and got very interesting results.

factChecker
Author

Excellent explanation and code. I took the Johns Hopkins data science course, and clustering was part of the course. This video really helps explain the concept.

stephenhobbs
Author

Great tutorial. I really like how you stuck to explaining the steps through a practical application. Thank you for this.

ArcenisRojas
Author

5-star explanation. thank you! Very much recommended for beginners and intermediate R users. You got a new follower!

sebastiansocianu
Author

Really thank you so much!!! The best tutorial on this topic!!!

karoargote
Author

My goodness, this video is so complete, and clearly explained with details of the script... Thank you so very much... 100 points to you...!!  You have a new fan...

rarosification
Author

If I had a thousand likes, you would have received them all, sir. Love the way you have explained and covered the concepts.

kanikalungani
Author

Great tutorial! You are really helping a lot of people like me, and the best part is that drama, background music, etc. are completely missing, unlike many other tutorials. Also saw some Bhojpuri songs :)...thank you sir!

rupeshbharadwaj
Author

Hi Bharatendra, great video - really helpful!

Everything goes well until the point of doing the scree plot, I am getting:

> withinGroupSumOfSquares = (nrow(normNum)-1) * sum(apply(normNum, 2, var, na.rm=TRUE))
> for(i in 2:20) withinGroupSumOfSquares[i] = sum(kmeans(normNum, centers=i)$withinss)
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
> plot(1:20, withinGroupSumOfSquares, type="b", xlab = "Number of Clusters", ylab = "Within group SS")
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' and 'y' lengths differ

Can you help me? Thank you.

jonathanrhein
Author

This is truly an excellent, clear and concise tutorial. You covered a lot of topics in a short amount of time. I will be watching your other videos. Well done!

markshanks
Author

Thanks for such a wonderful explanation.
By the way, I was working on a similar dataset, and apply didn't work for me. Although I removed all character vectors, the numeric vectors were still returning 'NA'. I used sapply instead and it solved the problem.

Thanks again!!

tradingtraveller
Author

Great Explanation!
Thank you Sir For this Video Lecture
I will be watching your other videos.

kapilrana
Author

Thank you so much. This was easy to follow and I did my own analysis as we went along with almost no trouble. This was a breakthrough video for me.

janelutken
Author

Fantastic explanation! I followed along with a different dataset and it worked perfectly! Great work!!

archeops.
Author

This is a brilliant tutorial which is easy to understand and follow.

sarahroffe
Author

Wah!!! how could u explain it so well!! Great job.

harikamacharla
Author

Hi Bharatendra, this is an excellent tutorial - the first one that worked for me. Great effort, keep up the good work!

emiltsenov
Author

Good job explaining the content along with the code.

kishoreyarramshetty