Expectation Maximization for the Gaussian Mixture Model | Full Derivation


Gaussian Mixture Models (GMMs) are extremely handy for clustering data. For example, think of clustering the grades of students after an exam into two clusters: those who passed and those who failed. For this we have to infer the parameters of the GMM (cluster probabilities, means, and standard deviations) from the data. However, since the class assignment is a latent variable, we have to resort to Expectation Maximization, and the Maximum Likelihood Estimate turns into an iterative procedure.
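As a rough illustration of the setup (not code from the video), here is how the E-Step's responsibilities could be computed for the exam-grade example. The grades, cluster means, standard deviations, and cluster probabilities below are made-up values:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    # Density of a univariate normal distribution N(x | mu, sigma^2)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Hypothetical exam grades and two clusters: "failed" and "passed"
grades = np.array([35.0, 42.0, 48.0, 70.0, 78.0, 85.0])
pi = np.array([0.5, 0.5])        # cluster probabilities
mu = np.array([40.0, 80.0])      # cluster means
sigma = np.array([10.0, 10.0])   # cluster standard deviations

# E-Step: un-normalized responsibilities r[i, k] ∝ pi_k * N(x_i | mu_k, sigma_k)
unnorm = pi * gaussian_pdf(grades[:, None], mu, sigma)
# Normalizing makes each row a distribution over the clusters
resp = unnorm / unnorm.sum(axis=1, keepdims=True)
```

Each row of `resp` now gives the posterior probability that the corresponding grade belongs to the "failed" or "passed" cluster.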

In this video we start at the derived general equations and fully derive all equations for the E-Step and the M-Step with NO EXCUSES - every derivative, manipulation and trick is presented in detail *.

The interesting observation is that although EM in general requires an expectation and a maximization in every iteration, this is not necessary for the GMM: both steps reduce to straightforward closed-form update equations.
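The closed-form updates can be sketched as a short EM loop for a one-dimensional GMM. This is my own minimal sketch (the function name `em_gmm_1d` and the quantile-based initialization are assumptions, not from the video), but the E-Step and M-Step updates match the standard results the derivation arrives at:

```python
import numpy as np

def em_gmm_1d(x, n_clusters=2, n_iter=50):
    """Fit a 1-D Gaussian Mixture Model with EM (illustrative sketch)."""
    n = len(x)
    # Initialization (an assumption here): uniform weights, quantile means
    pi = np.full(n_clusters, 1.0 / n_clusters)
    mu = np.quantile(x, (np.arange(n_clusters) + 0.5) / n_clusters)
    sigma = np.full(n_clusters, x.std())
    for _ in range(n_iter):
        # E-Step: normalized responsibilities r[i, k]
        dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
               / (sigma * np.sqrt(2.0 * np.pi))
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-Step: closed-form updates, no inner optimization needed
        Nk = r.sum(axis=0)                                   # effective cluster sizes
        pi = Nk / n                                          # cluster probabilities
        mu = (r * x[:, None]).sum(axis=0) / Nk               # means
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)  # std devs
    return pi, mu, sigma
```

Note that the M-Step is a plain assignment of new parameter values, which is exactly the point: the maximization has already been solved analytically.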

* If something is still unclear, please write a comment :)

-------

-------

Timestamps:
00:00 Introduction
01:10 Clustering
01:40 Infer Parameters w/ Missing Data
03:05 Joint of the GMM
04:45 E-Step: Un-Normalized Responsibilities
10:29 E-Step: Normalizing the Responsibilities
11:13 M-Step: The Q-Function
15:27 M-Step: Maximization formally
16:57 M-Step: Lagrange Multiplier
20:20 M-Step: Cluster Probabilities
30:50 M-Step: Means
35:00 M-Step: Standard Deviations
39:37 Summary
42:52 Important Remark
43:37 Outro
Comments

this is legitimately such a great explanation. thanks! <3

agrawal.akash

11:30 Isn't it a lower bound of the marginal log-likelihood instead?

vslaykovsky

There was an error in the hand-written M-Step at the beginning of the video. For the first 3 minutes I was able to overlay a correction. Please refer to that as the correct expression for the M-Step.

MachineLearningSimulation

How are you sure that the zero points of Q's derivative are maxima? Couldn't they be saddle points or minima as well? Or did you just skip the part where you have to check the second derivatives?

patrickg.

Hi, what about the EM algorithm for a single bivariate Gaussian with missing values?

bartosz

Is it possible to have the Gaussian variables be latent and the class be observed, i.e. the continuous variable is the latent one instead? What would this look like?

nickelandcopper

What would the syntax in R look like if we want to apply this to a survival mixture model?

sulasrisuddin

Just wondering: could such an EM approach work well in cases where X is high-dimensional?

orjihvy