Synthetic Gradients Explained

DeepMind released an optimization strategy that could become the most popular approach for training very deep neural networks, even more so than backpropagation. I don't think it got enough love, so I'm going to explain how it works myself and why I think it's so cool. Already know how backpropagation works? Skip to 14:10.
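For anyone who wants a concrete reference before watching: the first half of the video covers ordinary backpropagation. Below is a minimal numpy sketch of that idea (this is not the notebook linked under "Code for this video"; all names are illustrative), training a two-layer network on XOR:

import numpy as np

# Toy XOR data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 8))   # input -> hidden
W2 = rng.standard_normal((8, 1))   # hidden -> output
lr = 0.5

for step in range(5000):
    # forward pass
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)

    # backward pass: the error flows from the output back to the hidden layer
    out_delta = (out - y) * out * (1 - out)        # d(squared error)/d(output pre-activation)
    h_delta = (out_delta @ W2.T) * h * (1 - h)     # chain rule through W2 and the sigmoid

    # gradient-descent updates
    W2 -= lr * h.T @ out_delta
    W1 -= lr * X.T @ h_delta

The point the video builds on is that every layer has to wait for this backward pass before it can update, which is exactly what synthetic gradients try to avoid.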

Code for this video:

Please Subscribe! And like. And comment. That's what keeps me going.

Follow me on:

Snapchat: @llSourcell

More learning resources:

Join us in the Wizards Slack channel:

Sign up for my newsletter for exciting updates in the field of AI:
Comments

Dude! This is the first time backpropagation has made any kind of sense to me! Thanks! I actually also like the slower format for these technical topics. And seeing the code made all the complicated abstract math much more concrete and understandable. Best!

hannobrink

SIRAJ!! YOU ARE THE BEST, MAN!! THANKS SO MUCH FOR THE EFFORT YOU PUT INTO MAKING THE VIDEOS AND FOR KEEPING US SHARP! I am a PhD student working with neural networks and support vector machines, and you HAVE MADE THINGS SO MUCH FUN!

armandduplessis

Would have been awesome to run it on the same example and show the difference and how it would outperform the regular method.

HusoAmir

I have been following your channel since its inception. You had 67 subscribers when I joined; congratulations on crossing 200k subscribers. It is a commendable achievement to get so many subscribers for a programming/technical channel. Keep up the good work.

vikram.pandya

It's really helpful that you're using pure Python. I felt kinda lost when using the built-in functions of TensorFlow, but now it makes much more sense how it's all connected. Thank you!

dpmitko

Nice video, Siraj, but did I miss it, or did you not compare the training results of synthetic gradients with normal backprop? IMO, that should have been part of this vid.

seanpedersen

Been waiting for this for so long. Thanks!

rahulsoibam

Thanks for the video! It's hard to keep up with all these new techniques in machine learning. You make it a lot easier to do so.

Fireking

Siraj, you are a beast! Only a few months after it was posted on arXiv, and you are already well into the paper. A synthetic gradient model tells me you will be one of the multi-millionaires of the future :D We might meet each other one day, who knows.

IgorAherne

Very interesting video and technique. Just a suggestion: timings and convergence rate for the two versions of the binary adder network would have been useful to show the speed-up with the use of synthetic gradients.

tonyholdroyd

Man, you explained it so beautifully... you are awesome!

shivroh

Only a month ago, Hinton was suggesting that backprop had run its course. Everyone is obsessed with it, but it is clearly suboptimal.

And here you are, putting forward new ideas already. Love your neuroplasticity.

antonylawler

Could you do a video on quasi-diagonal Riemannian stochastic gradient descent? I think it deserves way more popularity!

jat

Holy moly! I know this. I have seen this before. This is like the bootstrapping in TD learning from RL.

solidsnake
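The analogy above seems sound: when the target for a synthetic-gradient model is built from the next module's synthetic gradient rather than from a full backward pass, it is bootstrapping in the same sense as a temporal-difference target. Roughly (my notation, not the paper's):

TD(0) target for a value estimate:
\[ V(s_t) \leftarrow V(s_t) + \alpha\big[r_t + \gamma V(s_{t+1}) - V(s_t)\big] \]
Bootstrapped target for a synthetic-gradient model at module $i$:
\[ \hat g_i(h_i) \;\text{is regressed toward}\; \hat g_{i+1}(h_{i+1})\,\frac{\partial h_{i+1}}{\partial h_i} \]

In both cases the learning target is built from another estimate ($V(s_{t+1})$ or $\hat g_{i+1}$) rather than from the ground truth (the full return, or the fully backpropagated gradient).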

@Siraj, we are still back-propagating the true gradient to train the synthetic-gradient model, right? That is still essentially backpropagation. And since the code example does those steps sequentially in each iteration, we might not be seeing much of the power of synthetic gradients; is the method mainly meant to allow parallel processing? If so, does the update of the synthetic weights have to happen at least once per iteration? If that is the case, wouldn't we still have the sequential dependency we had in backpropagation?

TummalaAnvesh
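On the question above: in the simple version, yes, the synthetic-gradient model is itself trained against the true gradient, and a single-process demo still runs the steps one after another. The payoff is that a layer no longer has to wait for the backward pass before updating its weights (the paper calls this removing update locking), which is what makes parallel or asynchronous training of the modules possible; the synthetic model only needs a true (or bootstrapped) gradient whenever one becomes available. A minimal sketch of one decoupled layer, with illustrative names (not the video's code):

import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = rng.standard_normal((2, 8)) * 0.1   # the layer's weights
M = np.zeros((8, 8))                    # synthetic-gradient model: maps h -> predicted dLoss/dh
lr, lr_sg = 0.1, 0.01

def layer_step(x):
    # Forward through the layer and compute its weight update immediately,
    # using the *predicted* gradient -- no waiting for the rest of the network.
    h = sigmoid(x @ W)
    synth_grad = h @ M                  # predicted dLoss/dh
    delta = synth_grad * h * (1 - h)    # push the prediction through the sigmoid derivative
    dW = lr * x.T @ delta
    return h, dW

def synth_model_step(h, true_grad):
    # Later (possibly on another worker), fit the synthetic-gradient model
    # to the true gradient once it arrives (simple L2 regression).
    err = h @ M - true_grad
    return lr_sg * h.T @ err

# usage sketch:
#   h, dW = layer_step(x_batch); W -= dW
#   ...once dLoss/dh for this h is available:
#   M -= synth_model_step(h, dLoss_dh)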

Siraj, I'm interested to know what sources you use, whether aggregate or primary, to keep up to date on new technology. Do you read papers from any particular publication? GitHub pages of particular research projects? Twitter feeds that aggregate promising research?

guypersson

A very interesting method they introduced there. But in my opinion, synthetic gradients will only shine on more complex architectures, because of the update locking of the earlier layers. I believe that is the reason why you did not run it in this video: with only 3 layers, normal backpropagation would have been faster.

highqualityinspector

Sounds like at every layer but the last (the output layer), the synthetic gradient is using a second-order approximation: the gradient of the gradient from the next layer. That might explain why the technique would only improve accuracy with large numbers of layers, by capturing the loss in the gradient as it propagates back through the network.

jxchtajxbt

Hey Siraj, why do you transpose the matrix near 24:02?

luizgarciaaa
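I can't speak for the exact line at 24:02, but in most numpy backprop code the transposes come straight from making the chain rule's shapes line up: to send an error signal back through y = x @ W you multiply by W.T, and to get the weight gradient you multiply by x.T. A small illustrative example (not the video's variables):

import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal((4, 3))      # batch of 4 examples, 3 features
W = rng.standard_normal((3, 5))      # layer weights
y = x @ W                            # forward pass: shape (4, 5)

dL_dy = rng.standard_normal((4, 5))  # error signal arriving from the next layer

dL_dx = dL_dy @ W.T                  # (4, 5) @ (5, 3) -> (4, 3): error passed back to the inputs
dL_dW = x.T @ dL_dy                  # (3, 4) @ (4, 5) -> (3, 5): gradient for the weights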

Thanks, Siraj. Ever since I heard of this technique in a lecture by Alex Graves, I have been interested in synthetic gradients. Google goes further with self-gated activation functions. I can hardly keep up with the rate of progress.

menzithesonofhopehlope