The Gradient of Mean Squared Error — Topic 78 of Machine Learning Foundations

#MLFoundations #Calculus #MachineLearning

In this video, we first derive by hand the gradient of mean squared error (a popular cost function in machine learning, e.g., for stochastic gradient descent). Second, we use the Python library PyTorch to confirm that our manual derivations match those calculated with *automatic* differentiation. Third and finally, we use PyTorch to visualize gradient descent in action over rounds of training.
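
As a minimal sketch of the kind of check the video describes — comparing a hand-derived MSE gradient against PyTorch's autograd — something like the following works (the data values and variable names here are illustrative, not taken from the video's notebook):

```python
import torch

# Illustrative data (not the values from the video): points lying
# roughly on a line, so a linear model yhat = m*x + b is sensible.
x = torch.tensor([0., 1., 2., 3., 4., 5., 6., 7.])
y = torch.tensor([1.9, 1.3, 0.6, 0.3, 0.1, -0.7, -1.2, -1.4])

# Parameters of the line, flagged for automatic differentiation
m = torch.tensor([0.9], requires_grad=True)
b = torch.tensor([0.1], requires_grad=True)

yhat = m * x + b                   # forward pass: predictions
C = torch.mean((yhat - y) ** 2)    # mean squared error cost

C.backward()                       # autodiff fills in m.grad and b.grad

# Hand-derived gradients of MSE for comparison:
#   dC/dm = (2/n) * sum((yhat_i - y_i) * x_i)
#   dC/db = (2/n) * sum(yhat_i - y_i)
n = len(x)
with torch.no_grad():
    dC_dm = 2.0 / n * torch.sum((yhat - y) * x)
    dC_db = 2.0 / n * torch.sum(yhat - y)

print(m.grad.item(), dC_dm.item())  # each pair should match
print(b.grad.item(), dC_db.item())
```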

Jon Krohn is Chief Data Scientist at the machine learning company Nebula. He authored the book Deep Learning Illustrated, an instant #1 bestseller that was translated into six languages. Jon is renowned for his compelling lectures, which he offers in-person at Columbia University, New York University, and leading industry conferences, as well as online via O'Reilly, his YouTube channel, and the SuperDataScience podcast.

Comments

I can't thank you enough for such awesome videos, for free.
It took you a lot of time to create these videos, and I finished them in just a few days; that hardly seems fair...hahaha
But I couldn't stop binging your videos; it's so fascinating how beautifully you explain everything.
Thanks a ton! ❤

aashishrana

Your course is really amazing. I've found it extremely helpful!

paddygthatsme

So glad to see you happy, Jon

Wow, what a video! 23:32

justsimple

Excellent explanation, thanks for making things easier to understand.

nisarirshad

Amazing video. I have a question: since a random forest classifier has many hyperparameters, if we want to optimize it, would we use this same technique of partial derivatives, making rounds until we reach the point where the gradient of cost w.r.t. the hyperparameters is at a minimum? If so, then I've really grasped an amazing intuition for how machine learning algorithms work behind the scenes, something I could never understand until today. Thank you very much for making such amazing content.

ahsanzafar

Hey, first of all, thank you so much for these videos. I have a question; I am stuck on one point. You said the derivative is the slope of the graph at any point, and that you plug in that point to get the slope there. Here you are calculating the partial derivative of C with respect to m and getting 36.3050, but you are plugging in all the values of x. I don't get how we can use all the values of x to get the derivative.
Using all the values of x to calculate the partial derivative of C with respect to m doesn't make sense to me. In the previous video we did this with one point; here we use all the points. We could have millions of x values. I just can't grasp why the entire dataset of x values is used to calculate the derivative; we are not differentiating with respect to x, we are using m.

itsshehri
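
For anyone stuck on the same point, here is a sketch using the standard definition of MSE (the notation may differ slightly from the video's): the cost C is an average over all n data points, and m appears inside every term of that average, so the partial derivative with respect to m keeps the sum over the whole dataset:

```latex
C = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2,
\qquad \hat{y}_i = m x_i + b,
\qquad
\frac{\partial C}{\partial m} = \frac{2}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right) x_i
```

Each x_i is a fixed number from the dataset, not a variable being differentiated; the slope is still evaluated at a single point (m, b) in parameter space, but every data point contributes one term to it.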

Why is yhat not considered to be a constant?

adriennekline

More instructors should use JupyterLab. However, the example is too simple.

pnachtwey