Understanding Dropout (C2W1L07)

Comments

Clarification about Understanding Dropout

Please note that from around 2:40 - 2:50, the dimension of w[1] should be 7x3 instead of 3x7, and w[3] should be 3x7 instead of 7x3.

In general, the number of neurons in the previous layer gives us the number of columns of the weight matrix, and the number of neurons in the current layer gives us the number of rows in the weight matrix.
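
As a concrete check, here is a tiny numpy sketch of that convention (the 3-unit and 7-unit layer sizes come from the example in the video; the variable names are my own):

```python
import numpy as np

# Suppose layer l-1 has 3 units and layer l has 7 units.
n_prev, n_curr = 3, 7

# W[l] has shape (units in current layer, units in previous layer) = (7, 3).
W = np.random.randn(n_curr, n_prev) * 0.01
b = np.zeros((n_curr, 1))

a_prev = np.random.randn(n_prev, 1)   # activations from layer l-1
z = W @ a_prev + b                    # (7, 3) @ (3, 1) + (7, 1) -> (7, 1)
print(W.shape, z.shape)               # (7, 3) (7, 1)
```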

manuel

I had a question about 3:15, since I expected a low keep_prob at hidden layer 1 rather than at hidden layer 2. As Andrew mentioned, dropout shrinks the weights on input nodes that could cause overfitting, so I assumed keep_prob should be low for both hidden layers 1 and 2.
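
For reference, here is a minimal inverted-dropout sketch with a per-layer keep_prob; this is my own illustration rather than the course notebook, and the specific keep_prob values are made up:

```python
import numpy as np

def dropout_layer(a, keep_prob):
    """Apply inverted dropout to the activations a (illustration only)."""
    d = np.random.rand(*a.shape) < keep_prob   # random keep mask
    return a * d / keep_prob                   # zero some units, rescale the rest

# Layers whose weight matrices are large (more parameters, more overfitting risk)
# typically get a lower keep_prob, i.e. more aggressive dropout.
keep_probs = {1: 0.7, 2: 0.7, 3: 1.0}          # example values only

a1 = np.random.randn(7, 5)                     # fake layer-1 activations, 5 examples
a1 = dropout_layer(a1, keep_probs[1])
```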

kswill

The video does not play for me, on either laptop or mobile. Is there any particular reason?

alonewalker

2:00
If L2 is more adaptive, what is the advantage of using dropout?
Is it the robustness?
It seems that dropout directly forces the network to be robust.

NolanZewariligon

I think the dimensions of the weight matrix w1 should be [7][3], not [3][7], and w3 should be [3][7]...

sungyunpark

Very important lecture. Need to watch it again.

sandipansarkar

How can dropout be related to L2 regularization? L1 is more plausible.
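
For what it's worth, the connection is usually shown for a single linear unit with squared loss and dropout applied to its inputs; the derivation below is my own summary, not something from the video. Averaging over the dropout mask adds a ridge-style (L2) penalty whose per-feature strength depends on $x_j^2$:

$$
\tilde{x}_j = \frac{\xi_j}{p}\, x_j, \qquad \xi_j \sim \mathrm{Bernoulli}(p)
$$

$$
\mathbb{E}_{\xi}\!\left[(y - w^\top \tilde{x})^2\right]
= (y - w^\top x)^2 + \frac{1-p}{p} \sum_j x_j^2\, w_j^2
$$

The induced penalty is quadratic in the weights, which is why dropout is usually compared to L2 rather than L1 regularization.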

jpzhang

Funny scaling factor 😂. It's very polite of you to call all the tech blunders "funny" and add humour effortlessly without being rude 🤩. Great teacher!!

preetysingh

At what point is a dropout rate too high? 50% sounds like a lot if the training step is called frequently. I'm afraid it throws out useful weights before they converge.

ABCYT

Do we use another random dropout mask at each iteration? Suppose we selected keep_prob = 0.8 for layer 3; as far as I understand, at each iteration it picks another random 20% of units to shut off. Can anyone confirm this for me?
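
As far as I understand it, yes: a fresh random mask is drawn on every training iteration, so a different random ~20% of the layer-3 units is zeroed out each time. A small sketch of what that looks like (my own code, not the assignment's):

```python
import numpy as np

keep_prob = 0.8
a3 = np.random.randn(4, 10)     # pretend layer-3 activations for 10 examples

for iteration in range(3):
    # A new Bernoulli(keep_prob) mask is sampled on every iteration,
    # so a different subset of units is shut off each time.
    d3 = np.random.rand(*a3.shape) < keep_prob
    a3_dropped = a3 * d3 / keep_prob
    print(iteration, int(d3.sum()), "of", d3.size, "units kept")
```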

travel

Do we just keep randomly changing the dropped-out neurons in every iteration once we start? How would that be useful? We need to find the best combination in the real world.

coolamigo

What if, instead of dropping hidden units randomly, I trained my NN with fewer units and a shallower architecture?

deveshnagar

Just to make sure I understand: the downside of using dropout is that we cannot use the loss function (the J function, as previously stated in the video) as an indicator of whether our model is converging or diverging, because the neurons active in the hidden layers keep changing across iterations. Therefore we simply cannot compare costs, since the data is treated differently every epoch. Is that correct?
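
If I remember the lecture correctly, the suggested workaround is to first run with dropout turned off (keep_prob = 1.0), check that J decreases monotonically, and only then turn dropout back on. A self-contained toy sketch of why the cost is only reliable with dropout off (the tiny logistic-regression setup here is entirely my own invention):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 20))                  # 3 features, 20 examples
Y = (X.sum(axis=0, keepdims=True) > 0) * 1.0      # toy labels
W = rng.standard_normal((1, 3)) * 0.01
b = np.zeros((1, 1))

def forward_cost(W, b, X, Y, keep_prob):
    # Inverted dropout on the input layer (illustration only).
    D = rng.random(X.shape) < keep_prob
    Xd = X * D / keep_prob
    A = 1 / (1 + np.exp(-(W @ Xd + b)))           # sigmoid output
    return float(np.mean(-(Y * np.log(A) + (1 - Y) * np.log(1 - A))))

# With keep_prob < 1 the cost J is noisy from call to call,
# so it is not a reliable convergence indicator:
print([round(forward_cost(W, b, X, Y, 0.8), 4) for _ in range(3)])

# With keep_prob = 1.0 the cost is deterministic; use this setting to check
# that J decreases monotonically, then switch dropout back on for training.
print([round(forward_cost(W, b, X, Y, 1.0), 4) for _ in range(3)])
```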

marcellinuschrisnada

Half of the things he says are incomprehensible to me. God knows what he means!!

indiangirl