23. Accelerating Gradient Descent (Use Momentum)

MIT 18.065 Matrix Methods in Data Analysis, Signal Processing, and Machine Learning, Spring 2018
Instructor: Gilbert Strang

In this lecture, Professor Strang explains both momentum-based gradient descent and Nesterov's accelerated gradient descent.

License: Creative Commons BY-NC-SA
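
A minimal NumPy sketch for readers who want to experiment alongside the lecture: it compares plain gradient descent with the heavy-ball momentum update on a small ill-conditioned quadratic. The test matrix, step sizes, and momentum coefficient below are illustrative choices based on the standard quadratic analysis, not values quoted from the lecture.

import numpy as np

# Quadratic model f(x) = 1/2 x^T S x with eigenvalues m = 1 and L = 100,
# so the condition number is kappa = L/m = 100 (an illustrative example).
m, L = 1.0, 100.0
S = np.diag([m, L])
grad = lambda x: S @ x

def plain_gd(x0, steps=100):
    """Ordinary gradient descent with the best constant step 2/(L+m) for this model."""
    s = 2.0 / (L + m)
    x = x0.copy()
    for _ in range(steps):
        x = x - s * grad(x)
    return x

def heavy_ball(x0, steps=100):
    """Momentum (heavy-ball) descent: a velocity z remembers earlier gradients."""
    # Standard optimal choices for a quadratic with eigenvalues in [m, L]:
    s = (2.0 / (np.sqrt(L) + np.sqrt(m))) ** 2                            # step size
    beta = ((np.sqrt(L) - np.sqrt(m)) / (np.sqrt(L) + np.sqrt(m))) ** 2   # momentum coefficient
    x, z = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        z = beta * z + grad(x)   # accumulate the gradient history
        x = x - s * z            # step along the accumulated direction
    return x

x0 = np.array([1.0, 1.0])
print("plain GD :", np.linalg.norm(plain_gd(x0)))    # shrinks like (kappa-1)/(kappa+1) per step
print("momentum :", np.linalg.norm(heavy_ball(x0)))  # roughly (sqrt(kappa)-1)/(sqrt(kappa)+1) per step

Running it shows the momentum iterates collapsing toward zero far faster than plain gradient descent, which is the improvement in convergence rate the lecture analyzes.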
Comments

Jesus man, I remember back before I started college when I checked out Prof Strang’s calculus series.
He’s aged quite a lot since that series, but he’s still as sharp as a tack. And I’m astonished that, even at his age, he knows so much about machine learning; I didn’t think it was his field.
Huge kudos, Gilbert Strang, huge kudos.

gigik

Such a great lecturer, just as in his classic Linear Algebra lecture series. Really nice to see him up and healthy, sharp and as great a step-by-step explainer as ever.

franzdoe

Professor Strang, thank you for an old-fashioned lecture on Accelerating Gradient Descent.
These topics are very theoretical for the average student.

georgesadler

Why are there no more comments for such a great course? MIT is a great university!

dengdengkenya

I'm so happy to see you here. I only trust you when it comes to lectures.

nguyenbaodung

Wow, this old man is so smart. I would love to see more lectures from him and learn much more of this stuff.

marjavanderwind

He radiates knowledge. Love the content!

honprarules

Those who have the sixth edition of Introduction to Linear Algebra can enjoy this course!!! In my view this course really increases the value of the book.

Arin

I loved this amazing lecture. Great professor, and great content. Thanks for sharing it openly on YouTube.

MsVanessasimoes

Prof Boyd is also a very good teacher!
I enjoy his lectures very much.

何浩源-ry

Finally, a lecture that explains the magic numbers in momentum! Those shorter video formats are great for an introduction but leave me confused about the math behind them. Love the ground-up approach to explaining.

Could anyone tell me what book Professor Strang mentioned at 06:53 of the lecture?

casual_dancer

At 27:00, why follow the direction of the eigenvector? It just comes out of nowhere.

vnpikachu

Such great lecturing makes me wonder how much of MIT students' success is due to innate ability and how much to superior teaching.

vaisuliafu

Crystal clear! Thank you very much for sharing it

antaresd

It’s nice you got it on a linear line.

brendawilliams

Why is it enough to assume x follows an eigenvector to demonstrate the rate of convergence?

Schweini

Tough course to follow, from what I can tell (I'm currently in my 4th semester of undergrad).
Great lecture from Prof. Gilbert Strang. I feel kinda dumb after listening to this lecture, will try again.

newbie

Wow, beautiful, now I see why it oscillates.

meow

Why do we need to make the eigenvector component as small as possible?

vishalpoddar

Can this procedure be expanded to deal with problems in multiple dimensions? So a, b, c, and d are not scalars but actually vectors themselves, representing the inputs x1, x2, x3 to a function f(x1, x2, x3). How would you form R that way, and would you have different condition numbers for each element of b?

alessandromarialaspina