Random Initialization (C1W3L11)

Comments

I noticed that my models would not converge nicely (last assignment of C1W4: 3 ReLU layers + 1 sigmoid layer) compared to a notebook reference that I'm following.
If I just initialized my weights from a normal distribution, the cost would get stuck at a high value. I tried scaling the weights, switching to a uniform distribution, and changing the learning rate to various values; nothing worked.
Then, following your code, I saw that if I divided the weights of each layer by the square root of the number of input features to that layer, it would start converging beautifully. It would be interesting to know why!


Thanks for your lessons!

swfsql
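
A minimal sketch of the scaling described in the comment above, assuming a plain NumPy setup (the function name and layer sizes below are illustrative, not the course's notebook code): each layer's Gaussian weights are divided by the square root of the number of inputs to that layer, which keeps the pre-activations at roughly unit variance instead of letting their variance grow with the layer width.

import numpy as np

def initialize_parameters(layer_dims, seed=1):
    # layer_dims is e.g. [n_x, n_h1, n_h2, n_y]; W[l] has shape
    # (layer_dims[l], layer_dims[l-1]) as in the course convention.
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        n_prev, n_curr = layer_dims[l - 1], layer_dims[l]
        # Plain randn gives pre-activations with variance ~ n_prev; dividing by
        # sqrt(n_prev) keeps them near unit variance, so the sigmoid output
        # does not saturate and the gradients stay useful.
        params["W" + str(l)] = rng.standard_normal((n_curr, n_prev)) / np.sqrt(n_prev)
        params["b" + str(l)] = np.zeros((n_curr, 1))
    return params

params = initialize_parameters([12288, 20, 7, 5, 1])
print(params["W1"].std())  # roughly 1 / sqrt(12288) ≈ 0.009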

If you use the tanh activation function you have an even bigger problem: the gradients will always be equal to zero, so no learning is possible at all, not even the degenerate kind of learning where all the weights move in the same direction.

RealMcDudu
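
A minimal sketch of the zero-gradient effect described above, assuming a one-hidden-layer network with tanh hidden units and a sigmoid output (shapes and data are made up for illustration): with all parameters initialized to zero, the weight gradients come out exactly zero on the first backward pass.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))        # 3 features, 5 examples
Y = rng.integers(0, 2, size=(1, 5))    # binary labels

W1, b1 = np.zeros((4, 3)), np.zeros((4, 1))
W2, b2 = np.zeros((1, 4)), np.zeros((1, 1))

# Forward pass
Z1 = W1 @ X + b1
A1 = np.tanh(Z1)                        # tanh(0) = 0, so A1 is all zeros
Z2 = W2 @ A1 + b2
A2 = 1.0 / (1.0 + np.exp(-Z2))          # sigmoid output

# Backward pass for the cross-entropy loss
m = X.shape[1]
dZ2 = A2 - Y
dW2 = (dZ2 @ A1.T) / m                  # zero, because A1 is zero
dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)      # zero, because W2 is zero
dW1 = (dZ1 @ X.T) / m                   # zero as well

print(np.allclose(dW1, 0), np.allclose(dW2, 0))  # True True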

It seems like the most general statement of the solution is that the weight matrices must be full rank.

arthurkalb
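
A quick illustration of the rank observation above (the shapes are arbitrary, and this only checks the rank itself, not whether full rank alone is sufficient): a zero-initialized weight matrix has rank 0, while a randomly initialized one is full rank with probability 1.

import numpy as np

rng = np.random.default_rng(0)
W_zero = np.zeros((4, 3))
W_rand = rng.standard_normal((4, 3)) * 0.01

print(np.linalg.matrix_rank(W_zero))   # 0
print(np.linalg.matrix_rank(W_rand))   # 3, i.e. full rank for a 4x3 matrix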

Where can we access the practice questions?

sakshipathak

Since we are using leaky ReLU in most cases now, should we initialize the weights to be as extreme as possible, so that when backpropagation takes place they have a higher chance of landing in different local extrema?

X_platform

What is the best choice for the learning rate (alpha)?

jagadeeshkumarm

Can anyone explain why gradient descent learns slowly when the slope is 0 (flat)? Aren't we trying to find the max and min of this function? Thanks.

jessicajiang
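
A small sketch that may help with the question above: the gradient descent update is w := w - alpha * dJ/dw, so the step size is proportional to the slope; near a flat region (slope ≈ 0) the parameters barely move, which is exactly why progress stalls there even though flat points are where the minima (and maxima) live.

alpha = 0.1
for slope in (2.0, 0.5, 0.001):
    step = alpha * slope               # the update moves w by alpha * slope
    print(f"slope = {slope}: parameter moves by {step}")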

If W = 0 and B = 0, then A = 0. Similarly, all the vectors should be zero, shouldn't they?

shubhamsaha