Neural Network Training (Part 3): Gradient Calculation

In this video we will see how to calculate the gradients of a neural network. Each gradient is the individual error measure for one of the weights in the network. In the next video we will see how these gradients can be used to modify the weights of the neural network.
Comments

It seems that most people (including myself) are having trouble with the mysterious "F Prime", which is not really explained at all. Thank you very, very much to Patterion for clearing this up. If you don't understand how F Prime is calculated, check Patterion's comment. Here is how to get the 0.045 value -> you use the sigmoid function twice, like this: 0.25 * (1.0 / (1 + Math.Exp(-1.0 * (1.13)))) * (1 - (1.0 / (1 + Math.Exp(-1.0 * (1.13))))) = 0.046 (because we use 1.13 and not 1.1278).

bytepushersmusic

@00YURIN00

It actually is not the derivative of a constant, but the derivative of the transfer (sigmoid) function, evaluated at x = 1.13.

That is:
f = 1/(1 + e^(-x)) =>
f' = (e^x)/[e^(2x) + 2e^(x) + 1] =>
f'(1.13) = 0.1845, and then
d = 0.25 * 0.1845 = 0.046.

f is the sigmoid function and f' is its derivative.
The 1.13 is actually 1.1278, which is why it comes out to 0.046 instead of 0.045.

Patterion

Also, f'(x) is the derivative of the sigmoid function s(x).
The sigmoid function is s(x) = 1/(1 + e^(-x));
its derivative is f'(x) = s(x)(1 - s(x)).

jekabskarklins
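
Putting the last few comments together, here is a minimal sketch of the delta calculation in Java (an assumed language for this thread; the 0.25 error and the 1.1278 sum are the values quoted above):

    class DeltaCheck {
        // Sigmoid activation: s(x) = 1 / (1 + e^(-x))
        static double sigmoid(double x) {
            return 1.0 / (1.0 + Math.exp(-x));
        }

        // Its derivative: s'(x) = s(x) * (1 - s(x))
        static double sigmoidPrime(double x) {
            double s = sigmoid(x);
            return s * (1.0 - s);
        }

        public static void main(String[] args) {
            double error = 0.25;   // error at the node, as quoted above
            double sum   = 1.1278; // the "Sum" value, shown rounded as 1.13
            System.out.println(sigmoidPrime(sum));         // ~0.1848
            System.out.println(error * sigmoidPrime(sum)); // ~0.046
        }
    }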

Just a note, because it's not mentioned in this video:
for hidden-layer nodes there are two values written inside, "Sum" and "Out".
Sum is the sum of all node output values from the preceding layer, each multiplied by the weight of its edge.
Out is the output of the activation function, which takes Sum as its parameter.
The activation function is the sigmoid function in this example.

jekabskarklins
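
The note above in code form, a minimal sketch in Java (the input values and weights are hypothetical placeholders, chosen so that Sum comes out to the 1.13 used elsewhere in this thread):

    double[] inputs  = {1.0, 0.5};   // "Out" values of the preceding layer (hypothetical)
    double[] weights = {0.8, 0.66};  // weights of the edges into this node (hypothetical)

    // Sum = each preceding node's output times the weight of its edge
    double sum = 0.0;
    for (int i = 0; i < inputs.length; i++) {
        sum += inputs[i] * weights[i];
    }

    // Out = activation function applied to Sum (sigmoid in this example)
    double out = 1.0 / (1.0 + Math.exp(-sum)); // sum = 1.13, out ~ 0.76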

Jeff, great lessons, but I wish you used a tablet for writing instead of a mouse.

farennikov

Very informative videos, thank you for uploading.

jacobschneider

I guess that is why we have the summation in the delta for the hidden layer.

alimohanad

Take it easy: you can simply apply delta_k = output_k * (1 - output_k) * (output_k - desired output), where k is the output node.

alimohanad
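
A minimal sketch of that output-layer formula in Java (the values are hypothetical placeholders):

    // delta_k = out_k * (1 - out_k) * (out_k - desired_k)
    // out * (1 - out) is f'(Sum) written in terms of the node's output
    double out     = 0.77; // actual output of output node k (hypothetical)
    double desired = 1.0;  // target value for node k (hypothetical)
    double delta   = out * (1.0 - out) * (out - desired); // ~ -0.041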

So if I'm computing the gradient for the weight from I1 to H1, I'll be using the value in I1 and multiplying it by -0.02 (in blue)? Is that right?

rgbondad
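
The rule being asked about is: gradient of a weight = (output of the node feeding the weight) * (delta of the node it points to). A sketch in Java (the I1 value is a hypothetical placeholder; the -0.02 delta is the one quoted in the question):

    // Gradient for the weight on the edge I1 -> H1
    double i1Out    = 1.0;   // value at input node I1 (hypothetical)
    double h1Delta  = -0.02; // delta at hidden node H1, as quoted above
    double gradient = i1Out * h1Delta; // -0.02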

What if the weights are not given? How do you estimate them based on the data given to you?

blazekiller

How do you find the derivative at 1.13? Is it just 1.13^-1?

MaxI-npne

Sorry, I finally get it. 0.25 is the error and x is the sum of the inputs,
so I can use the formula 0.25 * s(x) * (1 - s(x)), where s is the sigmoid function and x = 1.13.
And yes, I too get 1.1278 rather than 1.13, so I am finally getting closer to the results :)

If it was hard for me, maybe this will help some other beginner.

MrSillymee

I think you want the error closest to 0, not the lowest possible value of the error.

ryanavery

One question:

how do you get [small delta] from f'(Sum) * (sum of W from k to i) * [small delta k] if there is more than one neuron in the output layer?

It seems that every neuron in the output layer will give a different [small delta]. Which neuron's [small delta] should be used as [small delta k] in the equation?
It has all been easy so far except for this.

hexenpl
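
The summation answers exactly this: you do not pick one output neuron's delta, you sum the weighted deltas of all of them. A minimal sketch in Java (all values are hypothetical placeholders):

    // Hidden-node delta with several output neurons:
    // delta_i = f'(Sum_i) * SUM over k of (w_ik * delta_k)
    double hiddenSum     = 1.13;           // "Sum" at hidden node i
    double[] wToOutputs  = {0.3, -0.4};    // weights from node i to each output node
    double[] outputDelta = {0.046, -0.02}; // delta of each output node

    double weightedDeltas = 0.0;
    for (int k = 0; k < wToOutputs.length; k++) {
        weightedDeltas += wToOutputs[k] * outputDelta[k];
    }
    double s = 1.0 / (1.0 + Math.exp(-hiddenSum));
    double hiddenDelta = s * (1.0 - s) * weightedDeltas; // f'(Sum_i) * summed term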

When I calculated H2 I got 0.005 (not minus) -> input in WolframAlpha: (sigmoid(1.05) * (1 - sigmoid(1.05))) * 0.045 * 0.58.
Maybe I am missing something?

BugheroTheDon

0.25 * f'(1.13) = 0.047. Amazing. How did you do that?

complicated

I tried sigmoid(x)^2 * (-e^(-x)) and I get -0.184546..., but I cannot figure out where you get this 0.25 * ...?

And I have not studied higher math, so all this is hard for me.

MrSillymee
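
For what it's worth, the stray minus sign above is the likely problem: by the chain rule two minus signs cancel, so the sigmoid's derivative is positive, s'(x) = s(x)^2 * e^(-x) = s(x)(1 - s(x)). A quick check in Java:

    double x = 1.13;
    double s = 1.0 / (1.0 + Math.exp(-x));
    System.out.println(s * s * Math.exp(-x)); // ~0.1845, positive
    System.out.println(s * (1.0 - s));        // ~0.1845, same value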

Man, you spent so much time on these tutorials. Thank you for this. But really, it's just horrible that you try to draw with the mouse. Please get a tablet!

Aviszzs

I had to shave twice while watching this video.

wnbdriver

For example, if we had a network similar to the one in the example, but with 3 hidden layers instead of one, would the gradient calculation procedure be the same as in the video, or would some modifications be needed? Thank you in advance for your reply!

AsuusG
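
On the last question: the procedure is the same however many hidden layers there are; each hidden layer's deltas are computed from the deltas of the layer after it, using the same f'(Sum) * sum-of-weighted-deltas rule. A minimal sketch in Java (the array layout is an assumption, not the video's code):

    // sums[l][i]       : "Sum" of node i in layer l
    // weights[l][i][j] : weight from node i in layer l to node j in layer l+1
    // deltas[l][i]     : delta of node i in layer l; the output layer's deltas
    //                    (last row) must already be filled in
    static void backpropagate(double[][] sums, double[][][] weights, double[][] deltas) {
        for (int l = sums.length - 2; l >= 1; l--) { // walk the hidden layers backwards
            for (int i = 0; i < sums[l].length; i++) {
                double weightedDeltas = 0.0;
                for (int j = 0; j < deltas[l + 1].length; j++) {
                    weightedDeltas += weights[l][i][j] * deltas[l + 1][j];
                }
                double s = 1.0 / (1.0 + Math.exp(-sums[l][i]));
                deltas[l][i] = s * (1.0 - s) * weightedDeltas;
            }
        }
    }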