Neural Network Training (Part 3): Gradient Calculation

In this video we will see how to calculate the gradients of a neural network. Each gradient is the individual error measure for one of the weights in the network. In the next video we will see how these gradients can be used to modify the weights of the neural network.
Comments

It seems that most people (including myself) are having trouble with the mysterious "F Prime", which is not really explained at all. Thank you very, very much to Patterion for clearing this up. If you don't understand how F Prime is calculated, check Patterion's comment. Here is how to get the 0.045 value -> you use the sigmoid function twice, like this: 0.25 * (1.0 / (1 + Math.Exp(-1.0 * (1.13)))) * (1 - (1.0 / (1 + Math.Exp(-1.0 * (1.13))))) = 0.046 (because we use 1.13 and not 1.1278).

bytepushersmusic

@00YURIN00

It actually is not the derivative of a constant, but the derivative of the transfer (sigmoid) function, evaluated at x = 1.13.

That is:
f = 1/(1 + e^(-x)) =>
f' = (e^x)/[e^(2x) + 2e^(x) + 1] =>
f'(1.13) = 0.1845, and then
d = 0.25 * 0.1845 = 0.046.

f is the sigmoid function and f' is its derivative.
The 1.13 is actually 1.1278, which is why it comes out to 0.046 instead of 0.045.

Patterion

Also, f'(x) is the derivative of the sigmoid function s(x).
The sigmoid function is s(x) = 1/(1 + e^(-x));
its derivative is f'(x) = s(x)(1 - s(x)).

jekabskarklins
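
Putting the last few comments together, here is a minimal sketch of the delta calculation in Java (an assumed language for this thread; the 0.25 error and the 1.1278 sum are the values quoted above):

    class DeltaCheck {
        // Sigmoid activation: s(x) = 1 / (1 + e^(-x))
        static double sigmoid(double x) {
            return 1.0 / (1.0 + Math.exp(-x));
        }

        // Its derivative: s'(x) = s(x) * (1 - s(x))
        static double sigmoidPrime(double x) {
            double s = sigmoid(x);
            return s * (1.0 - s);
        }

        public static void main(String[] args) {
            double error = 0.25;   // error at the node, as quoted above
            double sum   = 1.1278; // the "Sum" value, shown rounded as 1.13
            System.out.println(sigmoidPrime(sum));         // ~0.1848
            System.out.println(error * sigmoidPrime(sum)); // ~0.046
        }
    }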

Just a note, because it's not mentioned in this video:
for hidden-layer nodes there are two values written inside, "Sum" and "Out".
Sum is the sum of all node output values from the preceding layer, each multiplied by the weight of its edge.
Out is the output of the activation function, which takes Sum as its parameter.
The activation function is the sigmoid function in this example.

jekabskarklins
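
The note above in code form, a minimal sketch in Java (the input values and weights are hypothetical placeholders, chosen so that Sum comes out to the 1.13 used elsewhere in this thread):

    double[] inputs  = {1.0, 0.5};   // "Out" values of the preceding layer (hypothetical)
    double[] weights = {0.8, 0.66};  // weights of the edges into this node (hypothetical)

    // Sum = each preceding node's output times the weight of its edge
    double sum = 0.0;
    for (int i = 0; i < inputs.length; i++) {
        sum += inputs[i] * weights[i];
    }

    // Out = activation function applied to Sum (sigmoid in this example)
    double out = 1.0 / (1.0 + Math.exp(-sum)); // sum = 1.13, out ~ 0.76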

Jeff, great lessons, but I wish you used a tablet for writing instead of a mouse.

farennikov

Very informative videos, thank you for uploading.

jacobschneider

I guess that is why we have the summation in the delta for the hidden layer.

alimohanad

Take it easy: you can simply apply delta_k = output_k * (1 - output_k) * (output_k - desired output), where k is the output node.

alimohanad
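
A minimal sketch of that output-layer formula in Java (the values are hypothetical placeholders):

    // delta_k = out_k * (1 - out_k) * (out_k - desired_k)
    // out * (1 - out) is f'(Sum) written in terms of the node's output
    double out     = 0.77; // actual output of output node k (hypothetical)
    double desired = 1.0;  // target value for node k (hypothetical)
    double delta   = out * (1.0 - out) * (out - desired); // ~ -0.041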

So if I'm computing the gradient for the weight from I1 to H1, I'll be using the value in I1 and multiplying it by -0.02 (in blue)? Is that right?

rgbondad
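
The rule being asked about is: gradient of a weight = (output of the node feeding the weight) * (delta of the node it points to). A sketch in Java (the I1 value is a hypothetical placeholder; the -0.02 delta is the one quoted in the question):

    // Gradient for the weight on the edge I1 -> H1
    double i1Out    = 1.0;   // value at input node I1 (hypothetical)
    double h1Delta  = -0.02; // delta at hidden node H1, as quoted above
    double gradient = i1Out * h1Delta; // -0.02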

What if the weights are not given? How do you estimate them based on the data given to you?

blazekiller

How do you find the derivative at 1.13? Is it just 1.13^-1?

MaxI-npne

Sorry, I finally get it. 0.25 is the error and x is the sum of the inputs,
so I can use the formula 0.25 * s(x) * (1 - s(x)), where s is the sigmoid function and x = 1.13.
And yes, I too get 1.1278 rather than 1.13, so I am finally getting closer to the results :)

If it was hard for me, maybe this will help some other beginner.

MrSillymee

I think you want the error closest to 0, not the lowest possible value of the error.

ryanavery

One question:

how do you get [small delta] from f'(Sum) * (sum of W from k to i) * [small delta k] if there is more than one neuron in the output layer?

It seems that every neuron in the output layer will give a different [small delta]. Which neuron's [small delta] should be used as [small delta k] in the equation?
It has all been easy so far except for this.

hexenpl
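
The summation answers exactly this: you do not pick one output neuron's delta, you sum the weighted deltas of all of them. A minimal sketch in Java (all values are hypothetical placeholders):

    // Hidden-node delta with several output neurons:
    // delta_i = f'(Sum_i) * SUM over k of (w_ik * delta_k)
    double hiddenSum     = 1.13;           // "Sum" at hidden node i
    double[] wToOutputs  = {0.3, -0.4};    // weights from node i to each output node
    double[] outputDelta = {0.046, -0.02}; // delta of each output node

    double weightedDeltas = 0.0;
    for (int k = 0; k < wToOutputs.length; k++) {
        weightedDeltas += wToOutputs[k] * outputDelta[k];
    }
    double s = 1.0 / (1.0 + Math.exp(-hiddenSum));
    double hiddenDelta = s * (1.0 - s) * weightedDeltas; // f'(Sum_i) * summed term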

When I calculated H2 I got 0.005 (not minus) -> input in WolframAlpha: (sigmoid(1.05) * (1 - sigmoid(1.05))) * 0.045 * 0.58.
Maybe I am missing something?

BugheroTheDon

0.25 * f'(1.13) = 0.047. Amazing. How did you do that?

complicated

I tried sigmoid(x)^2 * (-e^(-x)) and I get -0.184546..., but I cannot figure out where you get this 0.25 * ...?

And I have not studied higher math, so all this is hard for me.

MrSillymee
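
For what it's worth, the stray minus sign above is the likely problem: by the chain rule two minus signs cancel, so the sigmoid's derivative is positive, s'(x) = s(x)^2 * e^(-x) = s(x)(1 - s(x)). A quick check in Java:

    double x = 1.13;
    double s = 1.0 / (1.0 + Math.exp(-x));
    System.out.println(s * s * Math.exp(-x)); // ~0.1845, positive
    System.out.println(s * (1.0 - s));        // ~0.1845, same value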

Man, you spent so much time on these tutorials. Thank you for this. But really, it's just horrible that you try to draw with the mouse. Please get a tablet!

Aviszzs

I had to shave twice while watching this video.

wnbdriver

For example, if we had a network similar to the one in the example, but with 3 hidden layers instead of one, would the gradient calculation procedure be the same as in the video, or would some modifications be needed? Thank you in advance for your reply!

AsuusG
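
On the last question: the procedure is the same however many hidden layers there are; each hidden layer's deltas are computed from the deltas of the layer after it, using the same f'(Sum) * sum-of-weighted-deltas rule. A minimal sketch in Java (the array layout is an assumption, not the video's code):

    // sums[l][i]       : "Sum" of node i in layer l
    // weights[l][i][j] : weight from node i in layer l to node j in layer l+1
    // deltas[l][i]     : delta of node i in layer l; the output layer's deltas
    //                    (last row) must already be filled in
    static void backpropagate(double[][] sums, double[][][] weights, double[][] deltas) {
        for (int l = sums.length - 2; l >= 1; l--) { // walk the hidden layers backwards
            for (int i = 0; i < sums[l].length; i++) {
                double weightedDeltas = 0.0;
                for (int j = 0; j < deltas[l + 1].length; j++) {
                    weightedDeltas += weights[l][i][j] * deltas[l + 1][j];
                }
                double s = 1.0 / (1.0 + Math.exp(-sums[l][i]));
                deltas[l][i] = s * (1.0 - s) * weightedDeltas;
            }
        }
    }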