Neural networks [5.6] : Restricted Boltzmann machine - persistent CD

Comments
Author

Hi Hugo Larochelle,
One important practical consideration during training: when updating the weights according to the update rule (0:53), you use the probabilities themselves in place of h(x^t), but when generating the Gibbs chain, you use binary vectors. This greatly speeds up training.

jadoo
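A minimal NumPy sketch of the trick described in the comment above (illustrative only, not the course's reference implementation; the names `cd1_update`, `W`, `b`, `c` are made up): the Gibbs chain is driven by sampled binary vectors, while the gradient estimate uses the hidden probabilities in place of h(x):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(W, b, c, x, lr=0.01, rng=None):
    """One CD-1 step for a binary RBM (sketch, biases included).

    W: (H, V) weights, b: (V,) visible bias, c: (H,) hidden bias, x: (V,) binary input.
    """
    rng = rng or np.random.default_rng()
    # Positive phase: hidden probabilities given the data.
    ph_data = sigmoid(W @ x + c)
    # Drive the Gibbs chain with *binary* hidden samples ...
    h_sample = (rng.random(ph_data.shape) < ph_data).astype(float)
    pv = sigmoid(W.T @ h_sample + b)
    v_sample = (rng.random(pv.shape) < pv).astype(float)
    # ... but use the *probabilities* in the gradient estimate (less sampling noise).
    ph_model = sigmoid(W @ v_sample + c)
    W += lr * (np.outer(ph_data, x) - np.outer(ph_model, v_sample))
    b += lr * (x - v_sample)
    c += lr * (ph_data - ph_model)
    return W, b, c
```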
Author

Hi,

I have a question about deep Boltzmann machines. I am trying to implement a DBM for MNIST digit classification. My plan is to implement the paper by Hinton and Salakhutdinov, and then "Efficient Learning of Deep Boltzmann Machines". But when going through the code provided by Salakhutdinov, he first pretrains the two RBMs, with layers of 500 and 1000 units. After pretraining, he trains the deep Boltzmann machine (the variational approach), but he takes the weight matrices to be random again. Why is this? Ideally, I think we should take the weight matrices from pretraining, right? I am confused here. Can you please clarify this?

saisagar
Author

Hi,
Can I discriminate between two currents using the backpropagation algorithm in MATLAB? I also need to train and test on my own data.
I went through the MATLAB code from Ruslan Salakhutdinov ("backpropg.m"), but I am getting an error at "load fullmnist_dbm".
Could you please help me solve this error? Which file should I load to resolve it?
Thank you

ramyakeerthi
Author

How many multiplications are needed by the contrastive divergence algorithm to update the weights for a single input vector x, assuming that two steps of Gibbs sampling are used? Assume that the network has M input nodes, N hidden nodes, and no biases.

Could you help me answer this question?

hbm
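One common counting convention (an assumption, not an official answer): every conditional p(h|v) or p(v|h) is a matrix-vector product with the M x N weight matrix (M·N multiplications), and each of the two gradient terms is an M x N outer product. Under that convention, CD with two Gibbs steps costs about 7·M·N multiplications, as this small hypothetical helper tallies:

```python
def cd_k_multiplications(M, N, k=2):
    """Rough multiplication count for CD-k on an M-visible, N-hidden RBM, no biases.

    Counting convention (an assumption): every p(h|v) or p(v|h) is one M*N
    matrix-vector product, and each gradient term is one M*N outer product.
    """
    positive_phase = M * N        # p(h | x) for the data
    gibbs_chain = k * 2 * M * N   # k steps, each: sample v given h, then h given v
    gradient = 2 * M * N          # outer products h(x) x^T and h(x~) x~^T
    return positive_phase + gibbs_chain + gradient

print(cd_k_multiplications(M=784, N=100, k=2))  # 7 * M * N = 548800
```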
Author

Hi,

I understand that using the PCD method helps to better optimize the negative log-likelihood, but I don't understand why this is important. I assume an RBM is mainly interesting as a way to get another, "better" representation of our data, right? And you argued before that CD-k was already enough for getting a good h(x). So why would we be interested in a lower negative log-likelihood?

Is it perhaps because we also need to make sure that, at test time, we are able to create correct new representations h(x) of data which we haven't seen before?

lucasvanwalstijn
Author

When you update the weights using PCD, do you keep the original positive sample that used the training vector from the first epoch? Or do both the negative and the positive samples change after one epoch, by using the last state of the Gibbs chain?

dontwakemeup
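For what it's worth, here is a minimal sketch of how PCD is typically implemented (illustrative, not Hugo's code; `pcd_update` and the variable names are made up). The positive statistics are recomputed from the current training example x at every update, while the negative statistics come from the persistent chain, whose state is carried over between updates instead of being reset to x:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pcd_update(W, x, v_persistent, lr=0.01, rng=None):
    """One PCD step (biases omitted for brevity). W: (H, V), x and v_persistent: (V,)."""
    rng = rng or np.random.default_rng()
    # Positive phase: always uses the current training example x; it is not kept
    # from a previous epoch and it is not replaced by the chain's sample.
    ph_data = sigmoid(W @ x)
    # Negative phase: one Gibbs step starting from the *persistent* chain state, not from x.
    ph_chain = sigmoid(W @ v_persistent)
    h = (rng.random(ph_chain.shape) < ph_chain).astype(float)
    pv = sigmoid(W.T @ h)
    v_persistent = (rng.random(pv.shape) < pv).astype(float)
    ph_model = sigmoid(W @ v_persistent)
    W += lr * (np.outer(ph_data, x) - np.outer(ph_model, v_persistent))
    return W, v_persistent  # the chain state is carried over to the next update
```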
Author

How do the actual inputs stay relevant if you don't feed them into the chain? Is the x in p(h|x) still the training example x, and not the negative sample ~x?

peterfrendl
Author

Hi Hugo,
I have a doubt about implementing PCD with mini-batches (7:00).
Should I maintain a chain for each training point and update its chain once per epoch, or
should I maintain only one mini-batch of chains and update it at each mini-batch?

I believe that the second approach is more reasonable, but it is not clear to me.
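A common implementation (an assumption about standard practice, not a statement about the course's reference code) follows roughly the second option: keep one persistent chain per slot in the mini-batch, i.e. a fixed array of "fantasy particles", and advance all of them by one Gibbs step at every mini-batch update, independently of which training examples happen to be in the batch. A sketch with made-up names:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sketch of mini-batch PCD (biases omitted).
rng = np.random.default_rng(0)
V, H, batch_size, lr = 784, 100, 64, 0.01
W = 0.01 * rng.standard_normal((H, V))
# One persistent chain ("fantasy particle") per slot in the mini-batch,
# kept across all mini-batches and epochs.
fantasy_v = rng.integers(0, 2, size=(batch_size, V)).astype(float)

def pcd_minibatch_step(W, X, fantasy_v):
    """X: (batch_size, V) current mini-batch. Returns updated W and chain states."""
    ph_data = sigmoid(X @ W.T)            # positive phase from the current data batch
    ph_chain = sigmoid(fantasy_v @ W.T)   # advance every persistent chain by one step
    h = (rng.random(ph_chain.shape) < ph_chain).astype(float)
    pv = sigmoid(h @ W)
    fantasy_v = (rng.random(pv.shape) < pv).astype(float)
    ph_model = sigmoid(fantasy_v @ W.T)
    grad = (ph_data.T @ X - ph_model.T @ fantasy_v) / X.shape[0]
    return W + lr * grad, fantasy_v
```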