01L – Gradient descent and the backpropagation algorithm

Speaker: Yann LeCun

Chapters
00:00:00 – Supervised learning
00:03:43 – Parametrised models
00:07:23 – Block diagram
00:08:55 – Loss function, average loss
00:12:23 – Gradient descent
00:30:47 – Traditional neural nets
00:35:07 – Backprop through a non-linear function
00:40:41 – Backprop through a weighted sum
00:50:55 – PyTorch implementation
00:57:18 – Backprop through a functional module
01:05:08 – Backprop through a functional module
01:12:15 – Backprop in practice
01:33:15 – Learning representations
01:42:14 – Shallow networks are universal approximators!
01:47:25 – Multilayer architectures == compositional structure of data
Comments

Thanks for posting these! With this, you reach a very wide audience and help anyone who does not have access to such teachers and universities! 👏

AICoffeeBreak

You're doing a massive favor to the community that wants access to high-quality content without paying a huge amount of money. Thank you so much!

makotokinoshita

From seeing Yann's name in a research paper during a literature survey in my internship program to attending his lectures is really a thrill. Quite enriching and mathematically profound stuff here. Thanks for sharing it for free!

sutirthabiswas

Wow! Yann is such a great teacher. I thought I knew this material fairly well, but Yann is enriching my understanding with every slide. It seems to me that his teaching method is extremely efficient. I suppose that's because he has such a deep understanding of the material.

dr.mikeybee

Thanks very much for the content. What a time to be alive, to hear from the master himself.

johnhammer

I can totally see how a quantum computer could be used to perform gradient descent in all directions simultaneously, helping to find the true global minimum across all valleys in one go! 😲 It's mind-blowing to think about the potential for quantum computing to revolutionize optimization problems like this!

OpenAITutor

You are a great man. Thanks to you, someone even in a third-world country can learn DL from one of its inventors himself. THIS IS CRAZY!

thanikhurshid

At 1:05:40 Yann is explaining the two Jacobians, but I was having trouble getting the intuition. Then I realized that the first Jacobian was getting the gradient to modify the weights w[k+1] for function z[k+1], and the second Jacobian was backpropagating the gradient to function z[k], which can then be used to calculate the gradient at k for yet another Jacobian to adjust the weights w[k]. So one Jacobian is for the parameters and the other is for the state, since both the parameter variable and the state variable are column vectors. Yann explains it really well. I'm amazed that I seem to be understanding this complicated mix of symbols and logic. Thank you.

dr.mikeybee
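A minimal PyTorch sketch of the point made in the comment above, assuming a single linear module z[k+1] = W·z[k] (the names W, z, and dL_dz_next are illustrative, not taken from the lecture slides): one Jacobian-vector product sends the gradient to the parameters, the other sends it back to the previous state.

```python
import torch

# One "functional module": z_next = W @ z, with z the state and W the parameters.
W = torch.randn(3, 4, requires_grad=True)   # parameters w[k+1]
z = torch.randn(4, requires_grad=True)      # state z[k] coming from the layer below

z_next = W @ z            # state z[k+1]
loss = z_next.sum()       # stand-in for the rest of the network and the loss
loss.backward()

# dL/dz_next is a vector of ones here; backward() applies two Jacobian-vector products:
dL_dz_next = torch.ones(3)
print(torch.allclose(W.grad, torch.outer(dL_dz_next, z.detach())))  # gradient w.r.t. the parameters
print(torch.allclose(z.grad, W.detach().T @ dL_dz_next))            # gradient sent back to the state z[k]
```

Both products happen inside the single backward() call, which is why each module only needs to know its own two Jacobians.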

I just can't believe this content is free. Amazing! Long live open source! Thanks, Alfredo :)

neuroinformaticafbf

It is my honor to learn from you, Sir...

mahdiamrollahi

Mehn!! These are gold, especially for people who don't have access to these kinds of teachers, these methods of teaching, and the material, etc. (that's a lot of people, actually).

fuzzylogicq

I have watched this lecture twice in the last year. Mister LeCun is great! :)

copuzvv

Thank you so much for sharing this 🥰 This was the best video for learning gradient descent and backpropagation.

monanasery

I really love that discussion about solving non-convex problems... finally we get out of the books! At least we unleash our minds.

alexandrevalente

Thank you so much, Alfredo, for organizing the material in such a nice and compact way for us! Yann's insights and your examples, explanations, and visualizations are an awesome tool for anybody willing to learn (or to remember stuff) about deep learning. Greetings from Greece; I owe you a coffee for your tireless effort.

PS. Sorry for my bad English. I am not a native speaker.

mpalaourg

This intimate atmosphere allows for a better understanding of the subject matter. Great questions 【ツ】 and of course great answers. Thank you

mataharyszary

Alfredo Canziani ... drinks are on me if you ever visit India ... this is extremely high quality content!

gurdeeepsinghs

I don't know if this helps anyone, but it might. Weighted sums like s[0] are always to the first power; there are no squared or cubed weighted sums. So, by the power rule, the derivative of nx to the first power is just n, and the derivative of w·s[0] with respect to s[0] is always the weight w. That's why the application of the chain rule is so simple. Here's some more help: if y = 2x, y' = 2; if q = 3y, q' = 3; so (q(y(x)))' = 2 * 3. Picture the graph of q(y(x)). What is the slope? It's 6. And however many layers you add to a neural net, the partial slopes will be products of the weights.

dr.mikeybee
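A tiny autograd check of the slope argument above (the numbers are only for illustration): composing y = 2x with q = 3y gives a slope of 2 · 3 = 6, and the gradient of a weighted sum with respect to its input is just the weight vector.

```python
import torch

# Composing two linear maps: q(y(x)) = 3 * (2 * x), so dq/dx = 2 * 3 = 6.
x = torch.tensor(5.0, requires_grad=True)
y = 2 * x
q = 3 * y
q.backward()
print(x.grad)                    # tensor(6.)

# The gradient of a weighted sum w · s with respect to s is just the weight vector w.
w = torch.tensor([0.5, -1.0, 2.0])
s = torch.zeros(3, requires_grad=True)
(w * s).sum().backward()
print(torch.equal(s.grad, w))    # True
```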

The discussions of stochastic gradient descent (12:23) and of Adam (1:16:15) are great; they clear up a general misconception.

WeAsBee

Great content! It's just great to have this quality of information available.

jobiquirobi