Synthetic Gradients Explained

DeepMind released an optimization strategy that could become the most popular approach for training very deep neural networks, even more so than backpropagation. I don't think it got enough love, so I'm going to explain how it works myself and why I think it's so cool. Already know how backpropagation works? Skip to 14:10.
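For anyone who wants a concrete reference before watching: the first half of the video covers ordinary backpropagation. Below is a minimal numpy sketch of that idea (this is not the notebook linked under "Code for this video"; all names are illustrative), training a two-layer network on XOR:

import numpy as np

# Toy XOR data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 8))   # input -> hidden
W2 = rng.standard_normal((8, 1))   # hidden -> output
lr = 0.5

for step in range(5000):
    # forward pass
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)

    # backward pass: the error flows from the output back to the hidden layer
    out_delta = (out - y) * out * (1 - out)        # d(squared error)/d(output pre-activation)
    h_delta = (out_delta @ W2.T) * h * (1 - h)     # chain rule through W2 and the sigmoid

    # gradient-descent updates
    W2 -= lr * h.T @ out_delta
    W1 -= lr * X.T @ h_delta

The point the video builds on is that every layer has to wait for this backward pass before it can update, which is exactly what synthetic gradients try to avoid.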

Code for this video:

Please Subscribe! And like. And comment. That's what keeps me going.

Follow me on:

Snapchat: @llSourcell

More learning resources:

Join us in the Wizards Slack channel:

Sign up for my newsletter for exciting updates in the field of AI:
Comments

Dude! This is the first time backpropagation has made any kind of sense to me! Thanks! I actually also like the slower format for these technical topics. And seeing the code made all the complicated abstract math much more concrete and understandable. Best!

hannobrink

SIRAJ!! YOU ARE THE BEST, MAN!! THANKS SO MUCH FOR THE EFFORT YOU PUT INTO MAKING THE VIDEOS AND FOR KEEPING US SHARP! I am a PhD student working with neural networks and support vector machines, and you HAVE MADE THINGS SO MUCH FUN!

armandduplessis

Would have been awesome to run it on the same example and show the difference and how it would outperform the regular method.

HusoAmir

I have been following your channel since its inception. You had 67 subscribers when I joined; congratulations on crossing 200k subscribers. It is a commendable achievement to get so many subscribers for a programming/technical channel. Keep up the good work.

vikram.pandya

It's really helpful that you're using pure Python. I felt kinda lost when using the built-in functions of TensorFlow, but now it makes much more sense how it's all connected. Thank you!

dpmitko

Nice video, Siraj, but did I miss it, or did you not compare the training results of synthetic gradients with normal backprop? IMO, that should have been part of this vid.

seanpedersen

Been waiting for this for so long. Thanks!

rahulsoibam

Thanks for the video! It's hard to keep up with all these new techniques in machine learning. You make it a lot easier to do so.

Fireking

Siraj, you are a beast! Only a few months after it was posted on arXiv, and you are already well into the paper. A synthetic gradient model tells me you will be one of the multi-millionaires of the future :D We might meet each other one day, who knows.

IgorAherne

Very interesting video and technique. Just a suggestion: timings and convergence rate for the two versions of the binary adder network would have been useful to show the speed-up with the use of synthetic gradients.

tonyholdroyd

Man, you explained it so beautifully... you are awesome!

shivroh

Only a month ago, Hinton was suggesting that backprop had run its course. Everyone is obsessed with it, but it is clearly suboptimal.

And here you are, putting forward new ideas already. Love your neuroplasticity.

antonylawler

Could you do a video on quasi-diagonal Riemannian stochastic gradient descent? I think it deserves way more popularity!

jat

Holy moly! I know this. I have seen this before. This is like the bootstrapping in TD learning from RL.

solidsnake
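The analogy above seems sound: when the target for a synthetic-gradient model is built from the next module's synthetic gradient rather than from a full backward pass, it is bootstrapping in the same sense as a temporal-difference target. Roughly (my notation, not the paper's):

TD(0) target for a value estimate:
\[ V(s_t) \leftarrow V(s_t) + \alpha\big[r_t + \gamma V(s_{t+1}) - V(s_t)\big] \]
Bootstrapped target for a synthetic-gradient model at module $i$:
\[ \hat g_i(h_i) \;\text{is regressed toward}\; \hat g_{i+1}(h_{i+1})\,\frac{\partial h_{i+1}}{\partial h_i} \]

In both cases the learning target is built from another estimate ($V(s_{t+1})$ or $\hat g_{i+1}$) rather than from the ground truth (the full return, or the fully backpropagated gradient).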

@Siraj, we are still back-propagating the true gradient to train the synthetic-gradient model, right? That is still essentially backpropagation. And since the code example does those steps sequentially in each iteration, we might not be seeing much of the power of synthetic gradients; is the method mainly meant to allow parallel processing? If so, does the update of the synthetic weights have to happen at least once per iteration? If that is the case, wouldn't we still have the sequential dependency we had in backpropagation?

TummalaAnvesh
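On the question above: in the simple version, yes, the synthetic-gradient model is itself trained against the true gradient, and a single-process demo still runs the steps one after another. The payoff is that a layer no longer has to wait for the backward pass before updating its weights (the paper calls this removing update locking), which is what makes parallel or asynchronous training of the modules possible; the synthetic model only needs a true (or bootstrapped) gradient whenever one becomes available. A minimal sketch of one decoupled layer, with illustrative names (not the video's code):

import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = rng.standard_normal((2, 8)) * 0.1   # the layer's weights
M = np.zeros((8, 8))                    # synthetic-gradient model: maps h -> predicted dLoss/dh
lr, lr_sg = 0.1, 0.01

def layer_step(x):
    # Forward through the layer and compute its weight update immediately,
    # using the *predicted* gradient -- no waiting for the rest of the network.
    h = sigmoid(x @ W)
    synth_grad = h @ M                  # predicted dLoss/dh
    delta = synth_grad * h * (1 - h)    # push the prediction through the sigmoid derivative
    dW = lr * x.T @ delta
    return h, dW

def synth_model_step(h, true_grad):
    # Later (possibly on another worker), fit the synthetic-gradient model
    # to the true gradient once it arrives (simple L2 regression).
    err = h @ M - true_grad
    return lr_sg * h.T @ err

# usage sketch:
#   h, dW = layer_step(x_batch); W -= dW
#   ...once dLoss/dh for this h is available:
#   M -= synth_model_step(h, dLoss_dh)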

Siraj, I'm interested to know what sources you use, whether aggregate or primary, to keep up to date on new technology. Do you read papers from any particular publication? GitHub pages of particular research projects? Twitter feeds that aggregate promising research?

guypersson

A very interesting method they introduced there. But in my opinion, synthetic gradients will only shine on more complex architectures, because of the update locking of the earlier layers. I believe that is the reason why you did not run it in this video: with only 3 layers, normal backpropagation would have been faster.

highqualityinspector

Sounds like at every layer but the last (the output layer), the synthetic gradient is using a second-order approximation: the gradient of the gradient from the next layer. That might explain why the technique would only improve accuracy with large numbers of layers, by capturing the loss in the gradient as it propagates back through the network.

jxchtajxbt

Hey Siraj, why do you transpose the matrix near 24:02?

luizgarciaaa
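I can't speak for the exact line at 24:02, but in most numpy backprop code the transposes come straight from making the chain rule's shapes line up: to send an error signal back through y = x @ W you multiply by W.T, and to get the weight gradient you multiply by x.T. A small illustrative example (not the video's variables):

import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal((4, 3))      # batch of 4 examples, 3 features
W = rng.standard_normal((3, 5))      # layer weights
y = x @ W                            # forward pass: shape (4, 5)

dL_dy = rng.standard_normal((4, 5))  # error signal arriving from the next layer

dL_dx = dL_dy @ W.T                  # (4, 5) @ (5, 3) -> (4, 3): error passed back to the inputs
dL_dW = x.T @ dL_dy                  # (3, 4) @ (4, 5) -> (3, 5): gradient for the weights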

Thanks, Siraj. Ever since I heard of this technique in a lecture by Alex Graves, I have been interested in synthetic gradients. Google goes further with self-gated activation functions. I can hardly keep up with the rate of progress.

menzithesonofhopehlope