AdaGrad Explained in Detail with Animations | Optimizers in Deep Learning Part 4

Adaptive Gradient Algorithm (AdaGrad) is an algorithm for gradient-based optimization. The learning rate is adapted component-wise for each parameter by accumulating the squared magnitudes of its past gradients, so frequently updated parameters take smaller steps while rarely updated (sparse) parameters take larger ones.
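
For reference, a minimal NumPy sketch of the per-parameter update AdaGrad performs; the names (params, grads, cache, lr, eps) are illustrative and not taken from the video:

```python
# Minimal AdaGrad step: every parameter keeps its own accumulator of
# squared gradients and divides its step size by the square root of it.
import numpy as np

def adagrad_step(params, grads, cache, lr=0.01, eps=1e-8):
    cache = cache + grads ** 2                        # running sum of squared gradients
    params = params - lr * grads / (np.sqrt(cache) + eps)
    return params, cache

# Two parameters with very different raw gradient magnitudes
params = np.array([1.0, 1.0])
cache = np.zeros_like(params)
grads = np.array([10.0, 0.1])
params, cache = adagrad_step(params, grads, cache)
print(params)   # both moved by roughly lr, despite the 100x gap in raw gradients
```

Because the accumulator only ever grows, the effective learning rate keeps shrinking over time, which is AdaGrad's well-known drawback.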

============================
Do you want to learn from me?
============================

📱 Grow with us:

👍If you find this video helpful, consider giving it a thumbs up and subscribing for more educational videos on data science!

💭Share your thoughts, experiences, or questions in the comments below. I love hearing from you!

⌚Time Stamps⌚

00:00 - Intro
00:15 - Adaptive Gradient Introduction
03:42 - Elongated Bowl Problem
07:22 - Visual Representation
09:42 - How do Optimizers behave?
17:22 - Mathematical Intuition
24:20 - Disadvantage
26:16 - Outro
Comments

Firstly, thanks for your sessions.
The answer to your question: at first, the update to the normal (non-sparse) feature, which is "b" in your case, is more significant than the update to "m". But once b reaches its optimum value, its weight change is near zero, whereas m has not yet reached its optimum. So m keeps trying to reach its optimum and continues to change, even with small weight updates, and thus takes a long time but still converges.

ManbirSinghMago
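
A quick toy run (made-up numbers, plain gradient descent rather than the exact setup from the video) that reproduces the behaviour described in the comment above: with a sparse input feature, the intercept b settles near its optimum within a few dozen steps, while the slope m keeps crawling toward its value long after. This is the kind of lag that AdaGrad's per-parameter learning rates are meant to reduce.

```python
import numpy as np

# Sparse feature: only 5 of 200 samples are non-zero
x = np.zeros(200)
x[::40] = 1.0
y = 3.0 * x + 2.0                  # target line: slope m = 3, intercept b = 2

m, b, lr = 0.0, 0.0, 0.1
for step in range(1001):
    err = (m * x + b) - y
    grad_m = 2 * np.mean(err * x)  # tiny on average, since x is mostly zero
    grad_b = 2 * np.mean(err)      # large until b is roughly right
    m -= lr * grad_m
    b -= lr * grad_b
    if step in (0, 50, 200, 500, 1000):
        # b is already near 2 by step 50; m is still far from 3 and keeps moving
        print(f"step {step:4d}   m = {m:4.2f}   b = {b:4.2f}")
```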

The answer to your question (why the descent is fastest at first along the direction where the input feature is not sparse) can be understood from the graph itself: at first the path descends in the direction of the non-sparse feature until it reaches the lowest point achievable in that direction. From there it has to descend in the direction it has not yet moved, in our case the sparse-feature direction, because only now are those weight updates noticeable, the other direction having already reached its lowest point.

satyamgawade

Brilliantly explained, sir. I do not see such deep concepts explained this well on any other YouTube channel or in textbooks... 🙂

SidIndian

Sir, your attention to detail 🛐🛐. I got a satisfactory reason for every change.

TNSR

3:22 Revising concepts for placements from IIT KGP with your playlist, and suddenly you took the example of IIT. This made me jovial, sir. Thanks for your help!

Shivam_kgp

Thanks for the amazing content, sir... your explanations are so good that we have no chance of forgetting the concept in our entire life 😅

ramsu

17:06: The answer is that at first, since b is the normal (non-sparse) parameter, the algorithm reduces the error mainly through b, eventually reaching the optimal value for b. After reaching b's optimum, it starts working toward the optimal value of the other parameter.

innocentgamer

Sir, thanks a lot for providing such education for free to us Indians. You are helping us so much. May God bless you, sir. ♥️♥️ Much love.

alroygama

A simple explanation for the movement in the m direction that sir mentioned: as the curve is updated, the b value dominates the update process. Finally, as seen in the graph, b reaches almost zero, so the b updates become negligible, making b effectively sparser than m. Now the same thing that happened to m happens to b, but even more aggressively, leaving m as the only parameter still updating.

sheiphanshaijan

17:04 At first b was dominating and the step size was large, but as we start taking smaller steps, the gradient of m, which was already small, starts to be noticed and begins participating in finding the optimal minimum. I think that's why it starts moving in that direction. Please correct me if I'm wrong.

Sara-fpzw

So detailed and nicely explained. Man I logged in just to subscribe !!

SleepeJobs

Nitish sir... Thank you so much for the awesome deep learning videos ♥️ By the way, a request: could you please make a 3-4 hour video on time series at a beginner/entry level? Or even compiling your earlier videos into one would work. I really want to learn time series from you, and my interview is in the third week of August.

raj

Answer: What we are observing here is the relative update of b and m in their respective directions. Assume that both b and m need some delta change from their initial values during the training process. b is able to achieve that delta update in a few initial epochs and attains the right value, but m is a slower learner, so it takes some more epochs to reach that stage.

radhikawadhawan

Sir, you are a really, really great teacher; moreover, your content is premium.

saduddinshaikh-rt

You're Love. Thank you for existing.

arskas

Answer: the value of b reaches the optimal solution, so its derivatives become near zero, which stops the updates along the b axis. Now, relatively, the derivatives of w become large, so training picks up speed along the w axis.

Bhudeep

A student from IIT watching your video, thank you!!! 03:22

Kumar-clk

Thanks, Nitish, for this easy explanation.

narendraparmar

At 7:26, b is on the x axis and m is on the y axis; changing b changes the loss more. It should be the opposite.

shubhamkumar-nwui

17:00 Because the value of b had already reached a point where its gradient became zero. In other words, the optimal value of b was near, so b started updating by smaller amounts and eventually saturated.

AbhinavSharma-yflz