An Old Problem - Ep. 5 (Deep Learning SIMPLIFIED)

If deep neural networks are so powerful, why aren’t they used more often? The reason is that they are very difficult to train due to an issue known as the vanishing gradient.

Deep Learning TV on

To train a neural network over a large set of labelled data, you must continuously compute the difference between the network’s predicted output and the actual output. This difference is called the cost, and the process for training a net is known as backpropagation, or backprop. During backprop, weights and biases are tweaked slightly until the lowest possible cost is achieved. An important aspect of this process is the gradient, which is a measure of how much the cost changes with respect to a change in a weight or bias value.
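
To make the idea concrete, here is a minimal Python sketch (not code from the video) of a single weight being tweaked by its gradient; the squared-error cost, the 0.1 learning rate, and the toy input and target values are all assumptions for illustration.

def cost(w, x, y):
    """Squared difference between the prediction w * x and the target y."""
    return (w * x - y) ** 2

def gradient(w, x, y):
    """How much the cost changes with respect to the weight w."""
    return 2 * (w * x - y) * x   # derivative of the squared-error cost

w = 0.5                  # arbitrary starting weight
learning_rate = 0.1      # assumed step size
for step in range(20):
    w -= learning_rate * gradient(w, x=1.0, y=2.0)   # tweak toward lower cost

print(round(w, 3))       # approaches 2.0, where the cost is at its lowest

Each pass of the loop plays the role of one weight tweak during training: compute the gradient, then step the weight slightly in the direction that lowers the cost.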

Backprop suffers from a fundamental problem known as the vanishing gradient. During training, the gradient decreases in value back through the net. Because higher gradient values lead to faster training, the layers closest to the input layer take the longest to train. Unfortunately, these initial layers are responsible for detecting the simple patterns in the data, while the later layers combine the simple patterns into complex patterns. Without properly detected simple patterns, a deep net lacks the building blocks it needs to handle the complexity. This problem is the equivalent of trying to build a house without the proper foundation.
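
As a rough illustration (again, not the video's code), the NumPy sketch below builds a small, randomly initialized net with sigmoid activations and prints the size of the error signal at each layer during the backward pass; the 10-layer, 8-unit architecture and the weight scale are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_layers, width = 10, 8                               # assumed toy architecture
weights = [rng.normal(0, 0.5, (width, width)) for _ in range(n_layers)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass, keeping each layer's activation for the backward pass.
a = rng.normal(0, 1, width)
activations = [a]
for W in weights:
    a = sigmoid(W @ a)
    activations.append(a)

# Backward pass: start from a dummy error signal at the output and
# carry it back layer by layer with the chain rule.
delta = np.ones(width)
for i in reversed(range(n_layers)):
    local_grad = activations[i + 1] * (1 - activations[i + 1])   # sigmoid derivative
    delta = weights[i].T @ (delta * local_grad)
    print(f"layer {i:2d}  gradient size ~ {np.linalg.norm(delta):.2e}")

# The printed values typically shrink as i approaches the input layer,
# which is the vanishing gradient in miniature.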

Have you ever had this difficulty while using backpropagation? Please comment and let me know your thoughts.

So what causes the gradient to decay back through the net? Backprop, as the name suggests, requires the gradient to be calculated first at the output layer, then backwards across the net to the first hidden layer. Each time the gradient is calculated, the net must compute the product of all the previous gradients up to that point. Since all the gradients are fractions between 0 and 1 – and the product of fractions in this range results in a smaller fraction – the gradient continues to shrink.

For example, if the first two gradients are one fourth and one third, then the next gradient would be one fourth of one third, which is one twelfth. The following gradient would be one twelfth of one fourth, which is one forty-eighth, and so on. Since the layers near the input layer receive the smallest gradients, the net takes a very long time to train, and as a result the overall accuracy suffers.
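
The same running product can be checked in a few lines of Python, using the exact fractions from the example above.

from fractions import Fraction

factors = [Fraction(1, 4), Fraction(1, 3), Fraction(1, 4)]   # per-layer gradients from the example
running = Fraction(1)
for f in factors:
    running *= f
    print(running)   # prints 1/4, then 1/12, then 1/48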

Credits
Nickey Pickorita (YouTube art) -
Isabel Descutner (Voice) -
Dan Partynski (Copy Editing) -
Jagannath Rajagopal (Creator, Producer and Director) -

Comments

This clip explains why deep neural nets are so hard to train. If you've used backprop before, you'll relate to this. Enjoy :-)!

DeepLearningTV

Thanks for this channel!! I really appreciate your simplified approach to grasping the core concepts.
Looking forward to the next videos.
Keep up the good work!!!

albertoferreiro

Wow I really sounded like I might cry before a year of voice and speech 😂

IsabelsChannel

Yo, breathe. You sound like you did a long run before every single sentence :D

Tarnov

Hi. I just wanted you to know that I love you. That is all. Goodbye :)

Jabrils

at 3:05 should it say "it starts with the right ?"

joesgarage

why are the gradient values between 0 and 1?

gitanjalinair

How are the author names of those three papers spelled please? I promise it's for research (and not for masterbaiting)

fosheimdet

Wow, this video is very useful. I hope to watch more of your videos!

hoangtrunghieu

Thank you so much. I'm currently taking an intro to deep learning, covering the basics of supervised and unsupervised networks. The instructor explaining this kept rambling on and confusing me. This is very helpful!

Foogly

Hey, yea, let me know what you find. Backprop with ReLU is one solution for beating the vanishing gradient. Check it out and let us know what you find :-)

DeepLearningTV

The learning rate is also worth mentioning. The deeper the net, the more it's prone to jumping over the cost minima. I remember how in the 80's people invented all sorts of tricks, such as adding noise to the weights, to remedy this problem. You can find a sweet spot where the convergence improves, but... that was the 80's, so we all know how it turned out back then.

cykkm

This channel is AMAZING! I love it when something's so neatly explained that even my grandma can understand. Great job fellas! :D

ajayshaan

Hi. Is forward propagation characterized as a training method for the neural network, or is it just the way a neural network classifies the input data?

ChingMavis

Nice lecture and nice voice. What tool did you use for this lecture?

nguyenxuanhung

I'm a bit confused. Gradient values don't necessarily have to be between 0 and 1, right? One of the arguments made was that the gradients at earlier layers become smaller and smaller because of the compounding multiplication of values between 0 and 1. Can anyone help me understand?

kevintan

Yes, that is the problem I was stuck on for some time. Since I learned the magic of backpropagation, I was under the impression that it was the solution and that we only needed more computing power. Later I heard it does not work well, and I finally learned what the problem is by going through an online course on Coursera. Unfortunately, that course does not yet offer a solution.

I watched lectures from Geoff Hinton and Oxford and was not able to grasp a solution.

Finally, this video matches my current state, and the next video gives me an idea of how it can be solved. I still haven't tried it myself, but at least I got an idea of the solution, and it feels right.

+1

Thanks again

kkochubey

When you're talking numbers and doing calculations orally, it's better if you can write them out graphically as you go along.

EranM

Oh man! I'm facing this problem right now. It has taken more than 12 hours and the cost is still around 0.48.

farisalasmary

From another point of view, I also think that training deep ANNs with backprop leads to something we already know as the curse of dimensionality. In a way, you have explained it very well.

sillfsxa