Deep Learning (CS7015): Lec 4.2 Learning Parameters of Feedforward Neural Networks (Intuition)


What will be the dimension of WL? For convenience, call the last-layer weight matrix WL just W, and let the output vector of the last hidden layer be h = [h1, h2, ..., hn]. The output should be a (k*1) vector; call it [L1, L2, ..., Lk]. If we leave out the bias of the last layer for now, the first element of the output layer should be:
L1 = W11*h1 + W12*h2 + ... + W1n*hn, and the last element of the output layer should be Lk = Wk1*h1 + Wk2*h2 + ... + Wkn*hn.
This makes WL a matrix of dimension k*n: when a k*n matrix multiplies an n*1 vector, we get an output vector of dimension k*1.
Therefore, instead of grad(WL), shouldn't it be grad(transpose of WL)?
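
The shape argument in this comment can be checked with a small sketch. It is only an illustration: the sizes n = 4 and k = 3, the numeric values, and the helper name `matvec` are assumptions made for the example, not anything taken from the lecture, and the bias is left out as in the comment.

```python
# Hypothetical sizes: n hidden units, k output units (not from the lecture).
n, k = 4, 3

# h: output of the last hidden layer, treated as an n x 1 column vector.
h = [1.0, 2.0, 3.0, 4.0]

# W: last-layer weights stored as k rows of n entries, i.e. a k x n matrix.
# Row i is filled with the constant 0.1 * (i + 1) purely for illustration.
W = [[0.1 * (i + 1) for _ in range(n)] for i in range(k)]


def matvec(W, h):
    """Return W h, where entry i is W_i1*h_1 + ... + W_in*h_n (no bias)."""
    assert all(len(row) == len(h) for row in W), "each row of W needs n entries"
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in W]


L = matvec(W, h)

shape_W = (len(W), len(W[0]))  # (k, n)
shape_L = (len(L),)            # k entries, i.e. a k x 1 vector
print("W shape:", shape_W, "output length:", shape_L[0])
```

With W stored as k rows of n weights, each output element is one row of W combined with h, which matches the L1 and Lk expressions written out above.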


In that "nasty" matrix, why is the 3rd column from the right repeated?
