Tutorial 14 - Stochastic Gradient Descent with Momentum

In this post I’ll talk about a simple addition to the classic SGD algorithm, called momentum, which almost always works better and faster than plain Stochastic Gradient Descent. SGD with momentum is a method that accelerates gradient vectors in the right directions, leading to faster convergence. It is one of the most popular optimization algorithms, and many state-of-the-art models are trained using it. Before jumping to the update equations of the algorithm, let’s look at some of the math that underlies how momentum works.
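To make this concrete, here is a minimal Python sketch (not from the video) comparing plain SGD with SGD plus momentum; the toy loss L(w) = 0.1 * (w - 3)**2 and all hyperparameter values below are illustrative assumptions.

def grad(w):
    return 0.2 * (w - 3.0)                        # dL/dw for the toy loss

learning_rate, gamma, steps = 0.1, 0.9, 100

# Plain SGD: w <- w - learning_rate * gradient
w_sgd = 0.0
for _ in range(steps):
    w_sgd -= learning_rate * grad(w_sgd)

# SGD with momentum: v <- gamma * v + learning_rate * gradient, then w <- w - v
w_mom, v = 0.0, 0.0
for _ in range(steps):
    v = gamma * v + learning_rate * grad(w_mom)
    w_mom -= v

print(f"plain SGD:      w = {w_sgd:.3f}")         # still noticeably short of 3
print(f"SGD + momentum: w = {w_mom:.3f}")         # much closer to the minimum at 3

On this shallow slope the momentum term keeps accumulating gradients that point the same way, so the steps grow and the minimum is reached far sooner than with plain SGD.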

Below are the various playlists I have created on ML, Data Science and Deep Learning. Please subscribe and support the channel. Happy learning!

You can buy my book on Finance with Machine Learning and Deep Learning from the URL below.

🙏 You just need to do 3 things to support my channel:
LIKE, SHARE & SUBSCRIBE to my YouTube channel.
Comments

You're doing really great. It's really good that you're focusing on the theory part and making it crystal clear for everyone.

allenalex

I just love you, Krish. No need to search the web; Krish Naik is there to clear up all the ideas. I like your approach of teaching theory first and then practice. Doing the practical work without clearing up the theory is useless. Thank you.

sukumarroychowdhury

Krish, you are doing a really great job. Even though I completed my MSc in Data Science and have some work experience, I am learning so much more from your tutorials. Lots of love from Saudi Arabia 😃

story_teller_

If you were confused by the SGD momentum equations at 11:30, here are all the equations written out again (see the code sketch after this comment).

Weight update formula:
w2 = w1 - (learning_rate * dL/dw1)

Define a new variable g1 = dL/dw1
and v1 = learning_rate * g1

So you can write the weight update formula again as
w2 = w1 - v1

Now come to the exponential moving average part:
v1 = learning_rate * g1
v2 = gamma * v1 + (learning_rate * g2)
v_n = gamma * v_(n-1) + (learning_rate * g_n)

So the final equation will be
w_n = w_(n-1) - v_n

Case 1: if gamma is 0, then
w_n = w_(n-1) - learning_rate * g_n  (plain SGD)

Case 2: if gamma is not 0,
w_n = w_(n-1) - v_n = w_(n-1) - (gamma * v_(n-1) + learning_rate * g_n)

shahrukhsharif
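Translating the recurrence above into code: a minimal Python sketch, where the toy loss L(w) = w**2 and all hyperparameter values are illustrative assumptions, not taken from the video.

def dl_dw(w):
    return 2.0 * w                         # gradient of the toy loss L(w) = w**2

def momentum_sgd(w, gamma, learning_rate, steps):
    v = 0.0                                # v_0 = 0: no velocity before the first step
    for _ in range(steps):
        g = dl_dw(w)                       # g_n = dL/dw_(n-1)
        v = gamma * v + learning_rate * g  # v_n = gamma * v_(n-1) + learning_rate * g_n
        w = w - v                          # w_n = w_(n-1) - v_n
    return w

# Case 1: gamma = 0 reduces to plain SGD (w_n = w_(n-1) - learning_rate * g_n).
print(momentum_sgd(w=5.0, gamma=0.0, learning_rate=0.1, steps=50))
# Case 2: gamma = 0.9 keeps an exponentially decaying sum of all past gradients.
print(momentum_sgd(w=5.0, gamma=0.9, learning_rate=0.1, steps=50))

Both calls converge toward the minimum of the toy loss at w = 0; only the path they take differs.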

Understanding the concepts is very important. When I started deep learning, I was not able to understand any of the terminology. After watching your tutorials, I am able to correlate everything. Thank you so much.

pravinkaushikbsp

Yes, we need to understand the basic concepts first and then apply them practically. Well organized lecture topics. Great, keep going sir.

brindhasenthilkumar

Thank you for explaining SGD+Momentum. I have a much more intuitive understanding of the method now.

melodytune

Utmost respect. I was looking for this theory, and the way you explained it is just great.

swapnilkushwaha

Continue your work. The theoretical concepts are very important; the practical implementations won't take much time.

abhishekkaushik

That was a great video. Hope my understanding continues till the end. Only need to know one thing: you don't have to remember all the things, just know what is going on. That's all. Thanks.

sandipansarkar

You are amazing. Please do not stop making videos.

raminehlopezyazdani

Awesome videos :). I was always confused by the momentum concept in the optimizer; now I understand it crystal clear.

rishabhkumar-qsjb

Continue, sir.

I'm understanding all of this. This is awesome.

Thank you sir for this free educational video; this help means a lot to us...
Keep continuing...
And I'm clicking ads so that you can get money as a reward... 🙏

Artista

Awesome work, dude. Really like your videos. Keep going.

abhishekkaushik

Awesome work, sir! Your sequence of topics is very well organized.

alikalair

SGD with momentum: in the last part at 11:30, it should be V(t+1), because we are computing the upcoming value, and hence V(t) is the most recent known value (see the note after this comment).

adityachandra
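For reference, the indexing convention this comment suggests can be written out as follows; it is the same recurrence as above, with the newly computed velocity given the next index:

V(t+1) = gamma * V(t) + learning_rate * g(t+1)
W(t+1) = W(t) - V(t+1)

Here V(t) is the most recently computed (known) velocity, and V(t+1) is the one being computed for the upcoming step.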

I have been following this playlist, and I don't want to lie, this whole tutorial confused me really badly 😂...

CoderX-mchv

Very well explained. I have not seen any other tutorial with so much emphasis on the foundations. By the way, your video goes out of focus at times; maybe your camera is set to auto-focus.

ranjithmadhavan

At 10:30, why is the learning rate not multiplied by the term \gamma V_t? (See the note after this comment.)

MohandAlbaz
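For reference: in the formulation discussed at 10:30, the learning rate was already applied to each past gradient when it was folded into V, so the term \gamma V_t carries it implicitly. Unrolling the recurrence makes this visible:

V(t) = learning_rate * (g(t) + gamma * g(t-1) + gamma^2 * g(t-2) + ...)

A common alternative formulation (used, for example, by PyTorch's SGD optimizer) accumulates raw gradients and applies the learning rate only at the end:

V(t) = gamma * V(t-1) + g(t)
W(t) = W(t-1) - learning_rate * V(t)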

Shouldn"t the last equation be V(t) instead of V(t-1)

darshmehta