Deep Learning (CS7015): Lec 4.2 Learning Parameters of Feedforward Neural Networks (Intuition)

lec04mod02
Comments

What will be the dimension of WL? For convenience, call the last-layer weight matrix WL just W, and let the output vector of the last hidden layer be h = [h1, h2, ..., hn]. The output should be a (k x 1) vector; call it [L1, L2, ..., Lk]. Ignoring the bias of the last layer for now, the first element of the output layer should be
L1 = W11*h1 + W12*h2 + ... + W1n*hn,
and the last element should be
Lk = Wk1*h1 + Wk2*h2 + ... + Wkn*hn.
This makes W a matrix of dimension (k x n), and multiplying a (k x n) matrix by an (n x 1) vector indeed gives an output vector of dimension (k x 1).
Therefore, instead of grad(WL), shouldn't it be grad(transpose of WL)?
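
[Editor's note: a minimal NumPy sketch of the shape check above; the sizes and variable names are hypothetical. Under the usual convention, the gradient of a scalar loss with respect to W is laid out with the same shape as W itself, so no transpose of W is needed:]

    import numpy as np

    # Hypothetical sizes: n units in the last hidden layer, k output units
    n, k = 4, 3

    W = np.random.randn(k, n)    # last-layer weight matrix, shape (k, n)
    h = np.random.randn(n, 1)    # output of the last hidden layer, shape (n, 1)

    out = W @ h                  # (k, n) @ (n, 1) -> (k, 1), the output vector
    print(out.shape)             # (3, 1)

    # For out = W @ h, the gradient dL/dW = (dL/dout) @ h.T,
    # which is (k, 1) @ (1, n) = (k, n) -- the same shape as W.
    dL_dout = np.random.randn(k, 1)   # placeholder upstream gradient
    dL_dW = dL_dout @ h.T
    print(dL_dW.shape)           # (3, 4), same as W.shape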

desiquant

In that "nasty" matrix, why is the 3rd column from the right repeated?

anuragdathatreya