K-Means clustering with Python Example

This video is divided into 5 parts:
00:00 Introduction to Data
00:59 Clean data and choose variables for clustering
05:30 Sklearn K-Means example code
07:50 Interpreting clustering results
11:30 Wrapping up
k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (the cluster center or centroid), which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances. Better Euclidean solutions can instead be found using k-medians or k-medoids.
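The within-cluster sum of squared distances described above is what scikit-learn's `KMeans` reports as `inertia_`. A minimal sketch of the kind of example shown in the video, using synthetic blob data rather than the video's dataset (the data here is an assumption for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D data: three loose blobs (illustrative, not the video's dataset)
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(50, 2)),
])

# Fit k-means with k=3: each point is assigned to its nearest centroid,
# and the algorithm minimizes the within-cluster sum of squared
# Euclidean distances (exposed as `inertia_`)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)  # the 3 centroids (cluster prototypes)
print(km.inertia_)          # the objective k-means minimizes
```

Each fitted centroid is the mean of the points assigned to it, which is exactly why k-means minimizes squared (not plain) Euclidean distances.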
The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both k-means and Gaussian mixture modeling. They both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the Gaussian mixture model allows clusters to have different shapes.
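The iterative refinement mentioned above is Lloyd's algorithm: alternate an assignment step (each point to its nearest centroid) with an update step (each centroid to the mean of its points) until nothing changes. A minimal NumPy sketch, with random initialization assumed for simplicity (real implementations typically use smarter seeding such as k-means++):

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Minimal sketch of Lloyd's iterative refinement for k-means."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its cluster
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # converged to a local optimum
        centers = new_centers
    return centers, labels

# Usage on two obvious blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(5, 0.3, (40, 2))])
centers, labels = lloyd_kmeans(X, k=2)
```

The two steps mirror the E and M steps of expectation-maximization with hard assignments, which is why the paragraph above compares k-means to Gaussian mixture modeling; convergence is only to a local optimum, hence scikit-learn's `n_init` restarts.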
The unsupervised k-means algorithm has a loose relationship to the k-nearest neighbor classifier, a popular supervised machine learning technique for classification that is often confused with k-means because of the name. Applying the 1-nearest neighbor classifier to the cluster centers obtained by k-means classifies new data into the existing clusters. This is known as the nearest centroid classifier, or Rocchio algorithm.
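That nearest-centroid rule is what scikit-learn's `KMeans.predict` does once the model is fitted. A small sketch (again on assumed synthetic data) showing that `predict` is equivalent to a 1-nearest-neighbor lookup over the centroids:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs (illustrative data)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(4, 0.3, (40, 2))])
km = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)

# New points are classified into the existing clusters by nearest centroid
new_points = np.array([[0.1, -0.2], [3.9, 4.2]])
labels = km.predict(new_points)

# The same result by hand: argmin over squared distances to each centroid,
# i.e. a 1-nearest-neighbor rule applied to the centroids
d2 = ((new_points[:, None, :] - km.cluster_centers_[None, :, :]) ** 2).sum(axis=2)
assert (labels == d2.argmin(axis=1)).all()
```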
Created by
Kunaal Naik
------
Follow us on Facebook
Also on Instagram
Also you can check our website
--------
----
#k-means_clustering #deeplearning #machinelearning #decision_trees #gradient_boosting #variance #gradient_descent #python #technology #programming
#coding #bigdata #computerscience #data #dataanalytics #tech #datascientist #iot #pythonprogramming
#programmer #ml #developer #software #robotics #java #innovation #coder #javascript #datavisualization
#analytics #neuralnetworks #bhfyp