Neural networks [7.9] : Deep learning - DBN pre-training

Comments

Thank you Hugo for all the time you put into making this edifying set of videos. You saved me a lot of time pinpointing where to apply my efforts. Merry Christmas :)

williamkyburz

Hi Hugo, if you don't mind, could you please share the PPT (slides) that you used?

hendriktampubolon

To make sure I understand correctly:
1. At each new hidden layer i+1, we are effectively training an RBM on q(h_i|x).
2. For the first new hidden layer (i=2), we happen to have a good choice of q(h_1|x) = p(h_1|x) by tying weights.
3. Once we train this first new layer (i=2), q(h_1|x) =/= p(h_1|x).
4. For each following new hidden layer (i>2), is there no good choice of q(h_i|x)? Due to non-linearities, we cannot choose perfect weights, but what do we tend to choose in practice? (A rough sketch of how I picture the greedy procedure is below.)
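
Here is a minimal numpy sketch of how I picture that greedy procedure (my own toy code with made-up names, assuming binary units and CD-1 updates; this is not the code from the course):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_rbm_cd1(data, n_hidden, n_epochs=10, lr=0.05, rng=None):
    """Train one RBM with 1-step contrastive divergence on `data` (rows are cases)."""
    rng = rng or np.random.default_rng(0)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b, c = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(n_epochs):
        for v0 in data:
            # positive phase: q(h|v0)
            ph0 = sigmoid(v0 @ W + c)
            h0 = (rng.random(n_hidden) < ph0).astype(float)
            # negative phase: one Gibbs step v0 -> h0 -> v1 -> h1
            pv1 = sigmoid(h0 @ W.T + b)
            v1 = (rng.random(n_visible) < pv1).astype(float)
            ph1 = sigmoid(v1 @ W + c)
            # CD-1 parameter updates
            W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
            b += lr * (v0 - v1)
            c += lr * (ph0 - ph1)
    return W, c

def pretrain_dbn(x, layer_sizes):
    """Greedily stack RBMs; the RBM for layer i+1 is trained on q(h_i | x)."""
    reps, weights = x, []
    for n_hidden in layer_sizes:
        # (the tied-weights argument would correspond to initializing the new
        # RBM with the transpose of the previous layer's weights; here I just
        # use random initialization for simplicity)
        W, c = train_rbm_cd1(reps, n_hidden)
        weights.append((W, c))
        # representation fed to the next RBM: the sigmoid probabilities of
        # q(h_i | x) (one could also sample binary states instead)
        reps = sigmoid(reps @ W + c)
    return weights

# toy usage: 100 binary inputs of dimension 20, two hidden layers of size 30
x = (np.random.default_rng(1).random((100, 20)) < 0.5).astype(float)
dbn_weights = pretrain_dbn(x, layer_sizes=[30, 30])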

aSeaofTroubles

Dear Hugo, can you tell me some real-life applications of deep belief networks? What are the advantages of using DBNs instead of CNNs? I have read some papers about DBNs, but I can't really understand how supervised DBNs work.

minh

Could you please help me by sharing the slides (PPT) of this video? I need them badly. Of course, you shared the link to the slides in PDF form, but if you don't mind and you have the PowerPoint version, I would be really happy. Thanks a lot.

ninanashew

@16:00: Using probabilities instead of sampling: if I'm not mistaken, the vector of probabilities is just the mean vector in this case, so here we do exactly what we do, say, in dropout, where we sample during training and use the mean during testing/prediction. Basically we approximate E[f(z)] with f(E[z]), which is exact when f is linear (and gives inequalities when f is convex/concave).
In our case, if log p(h) is "linear enough" in a region of high probability w.r.t. q(h|x), we should be fine. I guess that's a lot of ifs... and my intuition is possibly wrong :(
BTW, aren't the sigmoid and the softmax functions the mean functions (inverses of the link functions) of the Bernoulli and multinoulli (categorical) distributions? So it seems we're using this approximation all over the place.
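
A tiny numerical check of this point (my own toy example with made-up numbers, not from the lecture): propagating the probabilities amounts to computing f(E[h]) instead of E[f(h)], and the two are close but not equal:

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

p = np.array([0.2, 0.9, 0.6])            # q(h|x): Bernoulli means of 3 hidden units
w, b = np.array([2.0, -1.5, 0.7]), 0.3   # weights into one next-layer unit

# Monte Carlo estimate of E[f(h)] with h ~ Bernoulli(p)
samples = (rng.random((100000, 3)) < p).astype(float)
mc = sigmoid(samples @ w + b).mean()

# "probabilities instead of sampling": f(E[h])
mean_field = sigmoid(p @ w + b)

print(mc, mean_field)   # close but not equal; the gap grows as f gets more curved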

kiuhnmmnhuik

Hi Hugo, could you explain why we only optimize P(h) but not P(v|h) when pre-training a deep belief network?

bomb

It was hard for me to understand your explanation because of the mathematics. After reading many papers I was stuck on the direction from h^1 to v; now it seems it is related to Gibbs sampling, so thank you for resolving this problem. However, I wonder: where is the output layer that provides the result?
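
Here is how I picture it now, as a toy numpy sketch (my own made-up weights and names, not from the video): Gibbs sampling in the top RBM over (h^1, h^2), then a single directed pass from h^1 down to v.

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

# toy dimensions and parameters (W1: v-h1 weights, W2: h1-h2 weights)
n_v, n_h1, n_h2 = 6, 4, 3
W1, b_v, b_h1 = rng.normal(size=(n_v, n_h1)), np.zeros(n_v), np.zeros(n_h1)
W2, b_h2 = rng.normal(size=(n_h1, n_h2)), np.zeros(n_h2)

# 1) Gibbs sampling in the top RBM over (h1, h2)
h1 = sample(np.full(n_h1, 0.5))
for _ in range(200):
    h2 = sample(sigmoid(h1 @ W2 + b_h2))
    h1 = sample(sigmoid(h2 @ W2.T + b_h1))

# 2) single top-down (directed) step: v ~ p(v | h1)
v = sample(sigmoid(h1 @ W1.T + b_v))
print(v)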

rezarawassizadeh

Hi Hugo, at 7:46, could you please elaborate on how the expression at the bottom is equivalent to training an RBM on data generated from q(h1|x)?

saurabhmehta

Since an RBM is being used for pre-training, can the hidden units take only binary values?

sameermalik