Expectation Maximization Algorithm | Intuition & General Derivation


Maximum Likelihood Estimation is a great starting point for fitting the parameters of a model when you only have access to data. However, it breaks down once your model contains latent random variables, i.e., nodes for which you do not observe any data. A remedy is to work with the marginal likelihood instead of the full likelihood, but this approach leads to some difficulties that we have to overcome.
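
As a quick sketch of what "marginal likelihood" means here (generic notation, not necessarily the symbols used in the video; D stands for the observed data, the "words", and T for the latent variables, the "thoughts"): the joint likelihood is summed over every configuration of the latent variables,

\[
p(D \mid \theta) = \sum_{T} p(D, T \mid \theta),
\qquad
\log p(D \mid \theta) = \log \sum_{T} p(D, T \mid \theta).
\]

The log of a sum is what makes this quantity hard to maximize directly.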

In this video, I show how to derive a lower bound on the marginal log-likelihood, including all the necessary tricks like importance sampling and Jensen's inequality. We then end up with a chicken-and-egg problem: we need the distribution's parameters to perform the estimate, but we also need the estimate to update the parameters. Consequently, we have to resort to an iterative algorithm, which consists of the E-Step (Expectation) and the M-Step (Maximization).
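
A hedged sketch of the derivation (the notation in the video may differ): multiply and divide by an arbitrary distribution q(T) over the latent variables (the importance-sampling trick), read the sum as an expectation under q, and apply Jensen's inequality to the concave logarithm,

\[
\log p(D \mid \theta)
= \log \sum_{T} q(T)\,\frac{p(D, T \mid \theta)}{q(T)}
\;\ge\; \sum_{T} q(T)\,\log \frac{p(D, T \mid \theta)}{q(T)}.
\]

The right-hand side is the lower bound that the E-Step and M-Step then optimize alternately.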

An important remark is that the derivations I present here are just a framework. For each application scenario, for instance Gaussian Mixture Models, the maximization has to be carried out anew for that specific model in order to end up with simple update equations.
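
To make the framework concrete, here is a minimal Python/NumPy sketch of EM for a one-dimensional Gaussian Mixture Model. It is an illustration only; the model choice, function name, and initialization are my own assumptions and are not taken from the video.

import numpy as np

def em_gmm_1d(x, n_components=2, n_iters=50, seed=0):
    """Minimal EM for a 1-D Gaussian mixture (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Initialize mixture weights, means, and variances.
    weights = np.full(n_components, 1.0 / n_components)
    means = rng.choice(x, size=n_components, replace=False)
    variances = np.full(n_components, np.var(x))

    for _ in range(n_iters):
        # E-Step: responsibilities resp[i, k] = p(component k | x_i, old parameters).
        log_joint = (
            np.log(weights)
            - 0.5 * np.log(2.0 * np.pi * variances)
            - 0.5 * (x[:, None] - means) ** 2 / variances
        )
        log_marginal = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_marginal)

        # M-Step: closed-form updates that reuse the fixed responsibilities.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk

    return weights, means, variances

# Example usage on synthetic data drawn from two Gaussians.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])
print(em_gmm_1d(data))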

-------
Info on why the Expectation Maximization algorithm does not work for the Bernoulli-Bernoulli model:

[TODO] I will work on a video on this, stay tuned ;)

-------


Timestamps:
00:00 Introduction
00:48 Latent means missing data
02:15 How to define the Likelihood?
02:55 Marginal Likelihood
05:05 Disclaimer: It will not work
05:48 Marginal Likelihood (cont.)
06:15 Marginal Log-Likelihood
08:11 Importance Sampling Trick
11:31 Jensen's Inequality
13:03 A lower bound (error, see comments below)
15:23 The Posterior over the latent variables
16:20 A lower bound (cont.) (error, see comments below)
17:56 The Chicken-Egg Problem
20:18 Old and new parameters
21:55 The Maximization Procedure
22:56 A simplified upper bound
25:04 Responsibilities
25:46 The EM Algorithm
28:28 An MLE under missing data
29:07 Outro
Comments

Error at 13:20: It is a lower bound, not an upper bound. Maximizing an upper bound is not meaningful. See also @Flemming's comment for more details.

MachineLearningSimulation

Very well produced video! But log is concave, so you flipped the sign/direction of Jensen's inequality. In other words, you are finding a lower bound on the log-likelihood. BTW, that is in fact arguably desirable, as maximizing a lower bound is informative while maximizing an upper bound is not. Maybe that should be clarified for people learning this stuff.

flemming
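
For reference, the direction of the inequality both comments above refer to: because the logarithm is concave, Jensen's inequality reads

\[
\log \mathbb{E}_{q}\!\left[X\right] \;\ge\; \mathbb{E}_{q}\!\left[\log X\right],
\]

so the quantity derived in the video is a lower bound on the marginal log-likelihood, not an upper bound.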

3:50: Theta bar has two components, right? (You said three components.)

pravingaikwad

Amazing, lovely video. Great job. I feel a bit unlucky that I did not come across your channel earlier.

todianmishtaku

Wow, I am still amazed how EM works. It’s really brilliant probability.

orjihvy

Hi Felix, this is a nice video on EM, thanks for that. One question: I don't clearly understand why we have to take the posterior as q(T). Why not something else? Why does only the posterior suit q(T)?

imvijay
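
A short note on the question above (generic notation): the posterior is the choice that makes the lower bound tight. For any distribution q(T),

\[
\log p(D \mid \theta)
= \sum_{T} q(T)\,\log \frac{p(D, T \mid \theta)}{q(T)}
\;+\; \mathrm{KL}\big(q(T)\,\|\,p(T \mid D, \theta)\big),
\]

and the KL divergence vanishes exactly when q(T) = p(T | D, theta), so only then does the bound touch the true marginal log-likelihood.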

The video explains the algorithm formally and in a very clear way. My question is: what if we have a mix of missing data, i.e., some missing Words and some missing Thoughts?

lucavisconti

10:27: I don't think this is right. The summation is over the whole (q * p/q); we cannot conveniently apply the summation to q alone.

ananthakrishnank
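
Regarding the step questioned at 10:27 (generic notation, assuming q(T) > 0 wherever the joint is nonzero): nothing is pulled out of the summation; the joint is merely multiplied and divided by q(T) inside the sum, which can then be read as an expectation under q,

\[
\sum_{T} p(D, T \mid \theta)
= \sum_{T} q(T)\,\frac{p(D, T \mid \theta)}{q(T)}
= \mathbb{E}_{q(T)}\!\left[\frac{p(D, T \mid \theta)}{q(T)}\right].
\]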

The video is of high quality.

It would be highly appreciated if the summation symbol were written as just Σ. It is a bit confusing when I look at your handwritten summation symbol; I thought it was summing from 1 to 1 (haha). But this confusion does not degrade your video quality.

Thanks

ryanyu

Hi there! Can you please explain why we have a parameter vector for 'words' but just a single parameter for 'thoughts'?

Thanks in advance!

kartikkamboj

I think I see why theta_k is associated with the responsibilities, instead of theta_{k+1}.

orjihvy
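
For context (generic notation): in the E-Step the responsibilities are computed from the old parameters theta_k, and the M-Step then maximizes over the new parameters while keeping those responsibilities fixed,

\[
r(T) = p(T \mid D, \theta_k),
\qquad
\theta_{k+1} = \arg\max_{\theta} \sum_{T} r(T)\,\log p(D, T \mid \theta).
\]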

The only one that made me understand this evil trick (E_q[P/q] = Σ q * P/q).
Thank you!

EngRiadAlmadani