Momentum Optimizer in Deep Learning | Explained in Detail

In this video, we will understand in detail what the Momentum Optimizer in Deep Learning is.

The Momentum Optimizer in Deep Learning is a technique that reduces the time taken to train a model.

The path that mini-batch gradient descent takes while learning is zig-zag, not straight, so some time is wasted moving in the zig-zag direction. The Momentum Optimizer smooths out the zig-zag path and makes it much straighter, thus reducing the time taken to train the model.

The Momentum Optimizer uses an Exponentially Weighted Moving Average, which averages out the vertical movement so that the net movement is mostly in the horizontal direction, towards the minimum. Thus the zig-zag path becomes straighter.
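As a rough sketch of the idea (illustrative, not code from the video), here is how an Exponentially Weighted Moving Average smooths a zig-zag sequence; beta = 0.9 is an assumed, commonly used smoothing factor:

def ewma(values, beta=0.9):
    # v_t = beta * v_{t-1} + (1 - beta) * theta_t
    v = 0.0
    smoothed = []
    for theta in values:
        v = beta * v + (1 - beta) * theta
        smoothed.append(v)
    return smoothed

# A zig-zag sequence stays near 0 after smoothing
# instead of jumping between 1 and -1:
print(ewma([1, -1, 1, -1, 1, -1]))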

In this video, we will also understand what the Exponentially Weighted Moving Average is, making this a full, in-depth explanation of the Momentum Optimizer in Deep Learning.
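As a minimal sketch of the update rule itself (assuming the common formulation; the names momentum_step, lr, and beta are illustrative, not from the video):

def momentum_step(w, grad, v, lr=0.01, beta=0.9):
    # Velocity = EWMA of the gradients; step along the smoothed
    # direction instead of the raw, zig-zagging gradient.
    v = beta * v + (1 - beta) * grad
    w = w - lr * v
    return w, v

# Toy usage: minimize f(w) = w**2, whose gradient is 2*w.
w, v = 5.0, 0.0
for _ in range(500):
    w, v = momentum_step(w, 2 * w, v)
print(w)  # approaches the minimum at w = 0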

➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖

Timestamps:
0:00 Agenda
1:00 Why do we need Momentum?
2:53 Exponentially Weighted Moving Average
8:29 Momentum in Mini-Batch Gradient Descent
9:50 Why does Momentum work?

➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖

Comments

This is EXACTLY HOW I needed to learn: Maths + Visualization with equations! Thank you so much!

anuranroy

Very few of them have even explained what momentum is made up of and what its equation is. You took just 2 minutes to add that explanation, but it helped so much in understanding the remaining 10 minutes of the video without pausing. Great work. Please keep it up.

pranaysingh

Very few resources on the internet explain these concepts with this kind of depth and clarity. Either they are in-depth but not understandable, or clear but not in-depth. Loved your explanation.

bijoyroy

Your lectures are very short and easy to understand. I hope you will make more videos like this about optimization algorithms in deep learning. Thank you, very useful video.

minhnhat

Very nice explanation, thank you. Mathematics from scratch is what I was looking for. This really helped!!

chinmaysoni

This was exactly what I was looking for. Thanks a lot.

amirrezasadeghi

Greatly explained! Thank you!! (I find it even better than Andrew's one on momentum.) Keep it up!!

redalamphd

At 5:27, when computing V3, aren't you missing the factor (1-beta) from V2?

olgaptacek

This is the best explanation. Thank you

ghilesdjebara

You're a great man, dude! Thanks a lot.

pranaysingh

At 7:00, I think the formula is supposed to be V(t) = Beta*Theta(t) + (1-Beta)*V(t-1) rather than V(t) = Beta*V(t-1) + (1-Beta)*Theta(t). Am I seeing that correctly?

careyshane
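For the two formula questions above, the EWMA recurrence as conventionally written (e.g. in Andrew Ng's course) is

$$v_t = \beta\, v_{t-1} + (1 - \beta)\,\theta_t$$

which, unrolled for three points (with $v_0 = 0$), gives

$$v_3 = (1 - \beta)\,\theta_3 + \beta(1 - \beta)\,\theta_2 + \beta^2(1 - \beta)\,\theta_1$$

so every earlier term does carry a $(1 - \beta)$ factor, discounted by an extra power of $\beta$.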

Oh my god, this was clearly explained. Thanks for this perfect insight.

melikakeshavarz

At 3:09, you say we give higher weightage to new points and lower weightage to old points, but at 7:47 you seem to say the opposite, so there is some confusion here. I would appreciate it if you could resolve this.

Ankit-hsnb
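On the weightage question above: unrolling the same recurrence shows that the weight on the $k$-th most recent point is $(1 - \beta)\beta^k$. With an illustrative $\beta = 0.9$, the weights are 0.1, 0.09, 0.081, ... going back in time, so newer points always receive the higher weight and older points are discounted by powers of $\beta$.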

Wonderful video. Made the concept look very easy...

aashwinsharma

Very informative video, brother. Thank you very much for the explanation; it was great.

mohamedmohudoom

What would be the difference between this and Adadelta?

mamahuhu_one

Thank you for a detailed video! I'm not an expert in this area. Could you explain what W and B are? From my understanding, W is the vector of parameters in the cost function, e.g., we want to minimize f(W). Is that correct? If so, what is B? How is it different from W? Thanks!

OnTastySpots
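On the W and B question above: in the usual neural-network notation (an assumption about this video's usage), W is the weight matrix and B the bias of a layer; both are trained parameters, and momentum keeps a separate velocity for each. A minimal sketch:

import numpy as np

# Hypothetical linear layer y = W @ x + b; both W and b are learned.
W, b = np.zeros((1, 3)), np.zeros(1)
vW, vb = np.zeros_like(W), np.zeros_like(b)
beta, lr = 0.9, 0.01

def momentum_update(grad_W, grad_b):
    global W, b, vW, vb
    vW = beta * vW + (1 - beta) * grad_W  # EWMA of dLoss/dW
    vb = beta * vb + (1 - beta) * grad_b  # EWMA of dLoss/db
    W = W - lr * vW
    b = b - lr * vb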

You are so good, but some of your videos have no subtitles ("captions unavailable"). Please enable them for all your videos. Thanks a lot.

zshahlaie

Very well explained, sir. Can you please start a DSA playlist for Python?

anirbanrana

I want to contact you for business work.

alidakhil