Dropout Regularization (C2W1L06)

Comments

Dropout helps ensure that the model does not become overly reliant on any particular feature, i.e. that it still performs well even when that feature is absent.
keep_prob = 0.8 means each hidden unit is kept with probability 0.8 and dropped (ignored) with probability 0.2.

No dropout is applied at test time, because we do not want the output to be random.
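
A minimal numpy sketch of what that looks like in practice (my own illustration; the a3 / d3 / keep_prob names follow the video, but the train/test switch is just an assumption about how you might wire it up):

```python
import numpy as np

keep_prob = 0.8  # probability that a given hidden unit is kept

def dropout_layer3(a3, keep_prob, training=True):
    """Inverted dropout applied to the activations a3 of layer 3."""
    if not training:
        # No dropout at test time: predictions stay deterministic.
        return a3
    # Each entry of d3 is True with probability keep_prob (~20% become False).
    d3 = np.random.rand(*a3.shape) < keep_prob
    a3 = a3 * d3          # zero out the dropped units
    a3 = a3 / keep_prob   # rescale so the expected value of a3 is unchanged
    return a3
```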

exampreparationonline

As we can see, the d3 mask is roughly 20% zeros (False), so multiplying a3 by d3 sets about 20% of a3's values to 0; that is the dropout mechanism during training. But since a3 still feeds into the layer-4 computation (z4), we divide a3 by keep_prob (0.8 in this case), which scales the remaining values back up so that the expected value of a3 stays the same. Hope it helps.
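
A quick numerical check of that scaling step (my own illustrative snippet, reusing the a3 / d3 / keep_prob names from the video):

```python
import numpy as np

np.random.seed(0)
keep_prob = 0.8
a3 = np.random.rand(5, 10000)               # toy activations for layer 3
d3 = np.random.rand(*a3.shape) < keep_prob  # ~20% of the entries are False

a3_dropped = (a3 * d3) / keep_prob          # inverted dropout

print(a3.mean())          # ~0.50
print(a3_dropped.mean())  # close to the same value, thanks to the / keep_prob
```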

SandeepKumar-ieni

So the dropout mask d3 is recalculated on every iteration!?
Doesn't that make the result jump around like a monkey, preventing the network from actually converging to one result!?

Why wouldn't calculating a fixed/final d3 before training be better?

Jaspinik

According to 0:40, `1 - keep_prob` is the probability that the node will be eliminated. Andrew (I'm paraphrasing a bit) says that it is equivalent to "removing all the ingoing links to that node". In other words, this is equivalent to setting the whole column of d3 (2:34) that corresponds to the node to be eliminated to zero. By doing this, the dot product of the weights of this column with the previous layer's activation units would be 0 (which is what we want).
If that's the case, shouldn't we instead have:
```
# one coin flip per column: keep the whole column with probability keep_prob
d3 = np.zeros((a3.shape[0], a3.shape[1]))
d3[:, np.random.rand(a3.shape[1]) < keep_prob] = 1
```
where a3.shape[1] is the size of the current layer (whose nodes are dropped out) and a3.shape[0] the size of the previous layer.

If, instead, we implement it as shown in the video, that is,
```
d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob
```
it is not guaranteed that the whole column corresponding to an eliminated node will be zero. Thoughts?
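
For what it's worth, here is a small snippet contrasting the two masks (entirely my own illustration; note that in the course's notation a3 has shape (number of units in layer 3, number of examples m), so the per-entry mask from the video drops each unit independently for each example rather than for the whole batch at once):

```python
import numpy as np

np.random.seed(1)
keep_prob = 0.8
a3 = np.random.rand(4, 6)   # 4 hidden units, 6 training examples

# Video's version: an independent coin flip per entry, so a unit can be
# dropped for one example and kept for another.
d3_per_entry = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob

# Column-wise version from the comment above: one coin flip per column,
# which zeroes entire columns of the mask.
d3_per_column = np.zeros(a3.shape)
d3_per_column[:, np.random.rand(a3.shape[1]) < keep_prob] = 1

print(d3_per_entry.astype(int))
print(d3_per_column.astype(int))
```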

giofou

Thank you Andrew

Do we consider the bias to be dropped out as well, or do we always keep it?

IgorAherne

Why do you have to divide by 0.8 to conserve the mean?

diegomoya

1. Why do we have to eliminate different nodes in a layer for different training examples? Why does it have to be different?
2. What is the difference between test time and training time?
3. Instead of np.random.randn(a3.shape[0], a3.shape[1]), can we write np.random.randn(a3.shape)?
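
On question 3, a brief note (my own addition, not from the video): numpy's np.random.rand and np.random.randn take the dimensions as separate arguments, not as a tuple, so you would unpack the shape or use a tuple-based generator, e.g.:

```python
import numpy as np

a3 = np.zeros((4, 6))
keep_prob = 0.8

# The video's form: dimensions passed as separate arguments.
d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob

# Equivalent alternatives:
d3 = np.random.rand(*a3.shape) < keep_prob          # unpack the shape tuple
d3 = np.random.random_sample(a3.shape) < keep_prob  # this generator accepts a tuple
```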

shwethasubbu

I couldn't understand the reason for the division by keep_prob at the end. Can anyone help? Thank you.

abhramajumder

Great explanation. Need to watch it again.

sandipansarkar

What's the point of dividing a3 by 0.8? You said it gains back the 20% of values that were lost, but I'm not getting it: first we shut down 20% of the units, and then we switch them back on? Is that what you mean by regaining the lost 20%? If you're talking about the values other than 0, then yes, they get scaled up (by a factor of 1/0.8), but I don't know why we're doing this, i.e. how it will affect the results in z4.
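
A hypothetical toy example of how the scaling plays out in z4 (my own snippet, not from the video; W4 and b4 here are made-up weights):

```python
import numpy as np

np.random.seed(2)
keep_prob = 0.8
a3 = np.random.rand(3, 10000)               # toy activations of layer 3
W4 = np.random.randn(2, 3)                  # made-up layer-4 weights
b4 = np.zeros((2, 1))
d3 = np.random.rand(*a3.shape) < keep_prob  # ~20% of entries dropped

z4_full       = W4 @ a3 + b4                       # no dropout
z4_no_scaling = W4 @ (a3 * d3) + b4                # shrunk by ~keep_prob on average
z4_rescaled   = W4 @ ((a3 * d3) / keep_prob) + b4  # expectation matches z4_full

print(z4_full.mean(), z4_no_scaling.mean(), z4_rescaled.mean())
```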

smilebig

So when training a neural network we have dropout layer(s) to prevent over-fitting.

Once we've got all the weights figured out and are ready to use the network in a production environment, should we drop those layers?

They seem to only hurt our use of the network.

Geoters

Please help me to understand...
1. Is the difference between L2 and inverted dropout that L2 acts like an "average" over the w's, so it changes through the iterations, while inverted dropout always uses a fixed number? Because both just reduce the w's.
2. Does inverted dropout only simulate the shut-off across iterations by, say, averaging out the effects of the nodes?
3. If so, is it still possible for some x features to have a major effect on every node in a layer, meaning that every node in the layer "learns" the same thing?
4. If so, why don't we just randomly kill some w values for a node, meaning they will not be involved in that node's learning, while the other nodes in the same layer can still learn from them?

rekasil

What if every element in d3 is True (i.e. every random value is smaller than keep_prob)? I mean, it's random, so that case is possible.

VV-mzyz

Do we perform the inverted dropout calculation (a = a / keep_prob) on the input layer neurons as well?

doyugen

In training, the remaining nodes should share the effect of the missing nodes; that's the whole idea, isn't it? I don't find dividing by 0.8 a very appropriate way to do that.

uditarpit

Excellent explanation, though it wasn't clear what happens at test/dev time. Do we multiply all the weights by 0.2 at test/dev time for Dropout(0.2)?

rahuldey

5:40 What does it really mean that we divide a(3) by 0.8 so that the expected value of z(4) stays the same? Someone please help me.

nikqlcb

If we are going to remove nodes, then why did we add those nodes in the first place? Why don't we just start with a smaller network?

farooqkhan