Random Initialization (C1W3L11)

Comments

I noticed that my models would not converge nicely (last assignment of C1W4: 3 ReLU layers + 1 sigmoid layer) compared to a notebook reference that I'm following.
If I just initialized my weights from a normal distribution, the cost would get stuck at a high value. I tried scaling the weights, switching to a uniform distribution, and changing the learning rate to various values; nothing worked.
Then, following your code, I saw that if I divided the weights of each layer by the square root of the number of input features to that layer, it would start converging beautifully. It would be interesting to know why!


Thanks for your lessons!

swfsql
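
A minimal sketch of the scaling described in the comment above, assuming a plain NumPy setup (the function name and layer sizes below are illustrative, not the course's notebook code): each layer's Gaussian weights are divided by the square root of the number of inputs to that layer, which keeps the pre-activations at roughly unit variance instead of letting their variance grow with the layer width.

import numpy as np

def initialize_parameters(layer_dims, seed=1):
    # layer_dims is e.g. [n_x, n_h1, n_h2, n_y]; W[l] has shape
    # (layer_dims[l], layer_dims[l-1]) as in the course convention.
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        n_prev, n_curr = layer_dims[l - 1], layer_dims[l]
        # Plain randn gives pre-activations with variance ~ n_prev; dividing by
        # sqrt(n_prev) keeps them near unit variance, so the sigmoid output
        # does not saturate and the gradients stay useful.
        params["W" + str(l)] = rng.standard_normal((n_curr, n_prev)) / np.sqrt(n_prev)
        params["b" + str(l)] = np.zeros((n_curr, 1))
    return params

params = initialize_parameters([12288, 20, 7, 5, 1])
print(params["W1"].std())  # roughly 1 / sqrt(12288) ≈ 0.009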

If you use the tanh activation function you have an even bigger problem: the gradients will always be equal to zero, so no learning is possible at all, not even the degenerate kind of learning where all the weights move in the same direction.

RealMcDudu
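
A minimal sketch of the zero-gradient effect described above, assuming a one-hidden-layer network with tanh hidden units and a sigmoid output (shapes and data are made up for illustration): with all parameters initialized to zero, the weight gradients come out exactly zero on the first backward pass.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))        # 3 features, 5 examples
Y = rng.integers(0, 2, size=(1, 5))    # binary labels

W1, b1 = np.zeros((4, 3)), np.zeros((4, 1))
W2, b2 = np.zeros((1, 4)), np.zeros((1, 1))

# Forward pass
Z1 = W1 @ X + b1
A1 = np.tanh(Z1)                        # tanh(0) = 0, so A1 is all zeros
Z2 = W2 @ A1 + b2
A2 = 1.0 / (1.0 + np.exp(-Z2))          # sigmoid output

# Backward pass for the cross-entropy loss
m = X.shape[1]
dZ2 = A2 - Y
dW2 = (dZ2 @ A1.T) / m                  # zero, because A1 is zero
dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)      # zero, because W2 is zero
dW1 = (dZ1 @ X.T) / m                   # zero as well

print(np.allclose(dW1, 0), np.allclose(dW2, 0))  # True True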

It seems like the most general statement of the solution is that the weight matrices must be full rank.

arthurkalb
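
A quick illustration of the rank observation above (the shapes are arbitrary, and this only checks the rank itself, not whether full rank alone is sufficient): a zero-initialized weight matrix has rank 0, while a randomly initialized one is full rank with probability 1.

import numpy as np

rng = np.random.default_rng(0)
W_zero = np.zeros((4, 3))
W_rand = rng.standard_normal((4, 3)) * 0.01

print(np.linalg.matrix_rank(W_zero))   # 0
print(np.linalg.matrix_rank(W_rand))   # 3, i.e. full rank for a 4x3 matrix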

Where can we access the practice questions?

sakshipathak

Since we are using leaky ReLU in most cases now, should we initialize the weights to be as extreme as possible, so that when backpropagation takes place they have a higher chance of landing in different local extrema?

X_platform

What is the best choice for the learning rate (alpha)?

jagadeeshkumarm

Can anyone explain why gradient descent learns slowly when the slope is 0 (flat)? Aren't we trying to find the max and min of this function? Thanks.

jessicajiang
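
A small sketch that may help with the question above: the gradient descent update is w := w - alpha * dJ/dw, so the step size is proportional to the slope; near a flat region (slope ≈ 0) the parameters barely move, which is exactly why progress stalls there even though flat points are where the minima (and maxima) live.

alpha = 0.1
for slope in (2.0, 0.5, 0.001):
    step = alpha * slope               # the update moves w by alpha * slope
    print(f"slope = {slope}: parameter moves by {step}")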

If W = 0 and B = 0, then A = 0. Similarly, all the vectors should be zero, shouldn't they?

shubhamsaha