Neural networks [5.4] : Restricted Boltzmann machine - contrastive divergence

Comments

Can we say, intuitively, that the model learns from the difference between the given sample and the general pattern it has learned so far?

revolutionarydefeatism

Hi Hugo, I can't see why splitting the partial derivative into a positive phase and a negative phase is an obvious step :( Where does this trick come from, and why? Thanks!

fengji

I am confused by the equation at 2:28.
Could you explain why this partial derivative is composed of a positive phase minus a negative phase?

ayakoyamaguchi
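
A sketch of where the split comes from, assuming the standard energy-based formulation used in the lecture, p(x) = \sum_h e^{-E(x,h)} / Z with Z = \sum_{x,h} e^{-E(x,h)}:

\frac{\partial\,(-\log p(x))}{\partial\theta}
  = \frac{\partial}{\partial\theta}\Big[ -\log \sum_h e^{-E(x,h)} + \log Z \Big]
  = \underbrace{\mathbb{E}_{h\mid x}\!\Big[\frac{\partial E(x,h)}{\partial\theta}\Big]}_{\text{positive phase}}
    \;-\;
    \underbrace{\mathbb{E}_{x,h}\!\Big[\frac{\partial E(x,h)}{\partial\theta}\Big]}_{\text{negative phase}}

The positive phase depends on the observed example x; the negative phase is exactly the gradient of log Z, i.e. it comes from the partition function, which is why it is the troublesome term.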

Does the negative phase refer to the partition function? Or in which video is that partition function approximated?

duvansepulveda

If yes, it's weird, because the loss function for training a neural network requires class labels, while we are doing unsupervised learning here.
Could you please help explain?

SonNguyen
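
For context on the loss being discussed: the objective used in this lecture (as later comments also mention) is the average negative log-likelihood of the inputs themselves, so no class labels appear anywhere,

\frac{1}{T} \sum_{t=1}^{T} -\log p\big(x^{(t)}\big)

taken over the T unlabeled training examples x^{(t)}; it is a fully unsupervised criterion.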

Hugo, thanks for the slides. I have a question about why the 2nd term is not tractable. Some papers say it runs over 2^m states. Could you explain in a bit more detail? Thank you very much.

teacher
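
A sketch of why the second (negative-phase) term blows up, assuming n binary visible units and m binary hidden units:

\mathbb{E}_{x,h}\!\Big[\frac{\partial E(x,h)}{\partial\theta}\Big]
  = \sum_{x \in \{0,1\}^n} \sum_{h \in \{0,1\}^m} p(x,h)\,\frac{\partial E(x,h)}{\partial\theta},
\qquad
Z = \sum_{x \in \{0,1\}^n} \sum_{h \in \{0,1\}^m} e^{-E(x,h)}

The sum over h can be carried out analytically for an RBM (it factorizes given x), but the remaining sum over the 2^n visible configurations, and the partition function Z inside p(x,h), cannot, so the expectation is intractable for any realistic number of units; the 2^m mentioned in papers is this kind of exponential count of configurations.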

Hello Hugo,
Thank you very much for your excellent lecture series. I enjoy your lectures a lot.

Regarding this RBM lecture, I have some questions:
1. I found that many papers derive contrastive divergence from the "KL distance". In your lecture, you started with the average log-likelihood; is there any explanation for this? It is a bit confusing to me.
2. Is there any further reading to understand where the "positive phase" and "negative phase" come from?

Best,

ThuongNgC
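
On question 1, a short worked link between the two starting points (a standard identity, not specific to the lecture; \hat{p} denotes the empirical distribution of the training set):

\frac{1}{T}\sum_{t=1}^{T} -\log p_{\theta}\big(x^{(t)}\big)
  = -\,\mathbb{E}_{x\sim\hat{p}}\big[\log p_{\theta}(x)\big]
  = \mathrm{KL}\big(\hat{p}\,\Vert\,p_{\theta}\big) + H(\hat{p})

Since the entropy H(\hat{p}) does not depend on the parameters \theta, minimizing the average negative log-likelihood and minimizing the KL divergence from the data distribution to the model are equivalent, which is why different papers can start from either quantity.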

Around 2:50 the notation can be confusing, since you are using E for the expectation and E for the energy function. I can see the energy E appears in italics, but still... just for clarity, the two could use different symbols.

dgg

Very good, congrats!

I've read a lot about RBMs and CD, and only now do I think I have understood them.

I have a question: are there other, more modern ways to train an RBM?

Regards from Brazil.

andtenorio

Hey, Hugo! I'm quite confused about PCD. If we use the previous iteration's Gibbs sampling result instead of the current training sample, does that mean all we need is just one sample, and the other samples are not used? I'm confused...

youlihanshu
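
A minimal sketch of the difference between CD-k and PCD, assuming binary units, mini-batches stored as rows of x, and the usual energy E(x,h) = -h^T W x - c^T x - b^T h; all names below (sample_h_given_x, negative_sample_pcd, etc.) are made up for illustration, not taken from the lecture:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sample_h_given_x(x, W, b):
    # p(h_j = 1 | x) = sigmoid(b_j + W_j . x); returns a binary sample and the probabilities
    p = sigmoid(b + x @ W.T)
    return (rng.random(p.shape) < p).astype(float), p

def sample_x_given_h(h, W, c):
    # p(x_k = 1 | h) = sigmoid(c_k + h . W_:,k)
    p = sigmoid(c + h @ W)
    return (rng.random(p.shape) < p).astype(float), p

def negative_sample_cd(x, W, b, c, k=1):
    # CD-k: the Gibbs chain for the negative phase starts at the current training batch x
    x_tilde = x
    for _ in range(k):
        h, _ = sample_h_given_x(x_tilde, W, b)
        x_tilde, _ = sample_x_given_h(h, W, c)
    return x_tilde

def negative_sample_pcd(chain, W, b, c, k=1):
    # PCD: the Gibbs chain starts where it stopped at the previous update
    # (the "persistent" chain), NOT at the current training batch
    x_tilde = chain
    for _ in range(k):
        h, _ = sample_h_given_x(x_tilde, W, b)
        x_tilde, _ = sample_x_given_h(h, W, c)
    return x_tilde  # this becomes the chain state for the next update

def update(x, x_tilde, W, b, c, lr=0.01):
    # The training batch x is still used here, in the positive phase;
    # PCD only changes how x_tilde is obtained.
    _, ph_pos = sample_h_given_x(x, W, b)        # h_hat(x)
    _, ph_neg = sample_h_given_x(x_tilde, W, b)  # h_hat(x_tilde)
    W += lr * (ph_pos.T @ x - ph_neg.T @ x_tilde) / x.shape[0]
    b += lr * (ph_pos - ph_neg).mean(axis=0)
    c += lr * (x - x_tilde).mean(axis=0)

So with PCD every training example still enters the update through the positive phase; what changes is only the initialization of the negative-phase chain, and in practice one keeps a whole batch of persistent chains ("fantasy particles") rather than a single sample.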

Hello Hugo,
I did not understand how you removed the expectations (E_{x} and E_{x, h} disappeared) starting from 10:36.
If you could give a bit more explanation, that would be great.
Thanks anyway.

mahmoudalbardan
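
One way to see what happens to the two expectations (using the usual RBM energy, so that the conditional over h factorizes): the positive-phase expectation E_{h|x}[.] can be computed exactly in closed form, while the negative-phase one is replaced by a point estimate at a sample x_tilde obtained by Gibbs sampling,

\mathbb{E}_{x,h}\!\Big[\frac{\partial E(x,h)}{\partial W}\Big]
  = \mathbb{E}_{x}\Big[\,\mathbb{E}_{h\mid x}\Big[\frac{\partial E(x,h)}{\partial W}\Big]\Big]
  \;\approx\; \mathbb{E}_{h\mid \tilde{x}}\Big[\frac{\partial E(\tilde{x},h)}{\partial W}\Big]
  = -\,\hat{h}(\tilde{x})\,\tilde{x}^{\top},
\qquad \hat{h}(\tilde{x})_j = p(h_j = 1 \mid \tilde{x})

so the expectations do not really disappear: the outer one over x is approximated by a single Gibbs sample \tilde{x}, and the inner one over h is evaluated analytically.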

Thank you for the video, it really helped me understand the details of CD a lot better

graufx

@5:00 So, basically, E_{x,h}[.] = E_x E_{h|x}[.] ≈ E_{h|xtilde}[.], where xtilde is sampled from p(x) using Gibbs sampling.

kiuhnmmnhuik

Hi, Hugo. First, thank you for this nice video. I have a question about the Gibbs sampling procedure. At about 10:19, you said the sampling will terminate at step k. Is this k the dimension of the data? Personally, I don't think it should be, because there seems to be no connection between the data dimension and the number of sampling steps, but if it is, please tell me why. Thank you.

chenwang

Hey, why is the average NLL used, i.e. why take the log of the function? What is the reason behind it? If you can recommend a paper or a previous video, that would be cool. Thanks.

michaelosinowo

Hi Hugo,
first of all, a big thanks for the tutorial videos. I'm trying to express my understanding of the first of the 3 main ideas of CD. You said it is to replace the expectation with a point estimate at x_tilde. My question is: am I right if I justify the 'point estimate' of an expectation as follows?
You want to compute the expectation of a function and you don't have a means to do it exactly for some reason. But you do have a way to find the most probable (expected) value of x (the random variable with respect to which the expectation is taken), so you assume that the whole probability mass is concentrated on that particular value of x (denoted x_tilde). You then compute the value of the function there and multiply it by the probability of that x_tilde (which is 1) to get an estimate of the expectation. So the expectation is basically the value of the function at the most probable (expected) value of the random variable.
Hope I'm clear with my question. And thanks again for this wonderful series of videos.

pathbholapathik
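
One way to make the "point estimate" precise (a standard single-sample Monte Carlo argument, not specific to the lecture): the expectation is approximated by the value of the function at one configuration drawn from the distribution,

\mathbb{E}_{x\sim p}\big[f(x)\big] = \sum_{x} p(x)\,f(x) \;\approx\; f(\tilde{x}), \qquad \tilde{x}\sim p(x)

which is unbiased on average over draws of \tilde{x}. Concentrating all the mass on the single most probable x would instead be a mode approximation; in CD, \tilde{x} comes from a few steps of Gibbs sampling started at the training example, so it is (approximately) a sample rather than the mode.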

Very clear and visual, thank you so much!

anirudhsingh

Thank you so much for the excellent explanation

snigdhapurohit

Dear Dr. Larochelle, thank you so much for your lectures and for making them public.

shamimabanu

Hello Hugo,
I would like to ask a question about the wake-sleep algorithm and contrastive divergence. They seem similar to me in their basic idea. Are there any differences between them? Could I take it that wake-sleep is used to pre-train a DBN, while contrastive divergence is used to train an RBM?

yuanhuang