Deep Learning (CS7015): Lec 4.2 Learning Parameters of Feedforward Neural Networks (Intuition)

lec04mod02
Comments

What will be the dimension of WL? For convenience, call the last-layer weight matrix WL just W, and let the output vector of the last hidden layer be h = [h1, h2, ..., hn]. The output should be a (k x 1) vector; call it [L1, L2, ..., Lk]. Ignoring the bias of the last layer for now, the first element of the output layer should be
L1 = W11*h1 + W12*h2 + ... + W1n*hn,
and the last element should be
Lk = Wk1*h1 + Wk2*h2 + ... + Wkn*hn.
This makes W a matrix of dimension (k x n), and multiplying a (k x n) matrix by an (n x 1) vector indeed gives an output vector of dimension (k x 1).
Therefore, instead of grad(WL), shouldn't it be grad(transpose of WL)?
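
[Editor's note: a minimal NumPy sketch of the shape check above; the sizes and variable names are hypothetical. Under the usual convention, the gradient of a scalar loss with respect to W is laid out with the same shape as W itself, so no transpose of W is needed:]

    import numpy as np

    # Hypothetical sizes: n units in the last hidden layer, k output units
    n, k = 4, 3

    W = np.random.randn(k, n)    # last-layer weight matrix, shape (k, n)
    h = np.random.randn(n, 1)    # output of the last hidden layer, shape (n, 1)

    out = W @ h                  # (k, n) @ (n, 1) -> (k, 1), the output vector
    print(out.shape)             # (3, 1)

    # For out = W @ h, the gradient dL/dW = (dL/dout) @ h.T,
    # which is (k, 1) @ (1, n) = (k, n) -- the same shape as W.
    dL_dout = np.random.randn(k, 1)   # placeholder upstream gradient
    dL_dW = dL_dout @ h.T
    print(dL_dW.shape)           # (3, 4), same as W.shape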

desiquant

In that "nasty" matrix, why is the 3rd column from the right repeated?

anuragdathatreya