Lesson 13: Deep Learning Foundations to Stable Diffusion

We also discuss the importance of the chain rule in calculating the gradient of the mean squared error (MSE) applied to a model, and demonstrate how to use PyTorch to calculate derivatives and simplify the process by creating classes for ReLU and linear functions. We then explore the issues with floating-point math and introduce the log-sum-exp trick to overcome them. Finally, we create a training loop for a simple neural network.
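
The lesson refactors the forward and backward passes into small classes. Here is a minimal sketch in that spirit (not the exact code from the video; the shapes and names below are my own, and gradients are stashed on a `.g` attribute rather than using autograd):

```python
import torch

class Relu():
    def __call__(self, inp):
        self.inp = inp
        self.out = inp.clamp_min(0.)
        return self.out
    def backward(self):
        # gradient of ReLU: 1 where the input was positive, else 0
        self.inp.g = (self.inp > 0).float() * self.out.g

class Lin():
    def __init__(self, w, b):
        self.w, self.b = w, b
    def __call__(self, inp):
        self.inp = inp
        self.out = inp @ self.w + self.b
        return self.out
    def backward(self):
        # chain rule: push the upstream gradient through the matmul
        self.inp.g = self.out.g @ self.w.t()
        self.w.g = self.inp.t() @ self.out.g
        self.b.g = self.out.g.sum(0)

class Mse():
    def __call__(self, inp, targ):
        self.inp, self.targ = inp, targ
        self.out = (inp.squeeze(-1) - targ).pow(2).mean()
        return self.out
    def backward(self):
        # d/dx of mean((x - t)^2) is 2*(x - t)/n
        self.inp.g = 2. * (self.inp.squeeze(-1) - self.targ).unsqueeze(-1) / self.targ.shape[0]

# tiny forward/backward pass on random data
x = torch.randn(16, 10); y = torch.randn(16)
w1, b1 = torch.randn(10, 8), torch.zeros(8)
w2, b2 = torch.randn(8, 1), torch.zeros(1)
layers = [Lin(w1, b1), Relu(), Lin(w2, b2)]
loss_fn = Mse()

out = x
for l in layers: out = l(out)
loss = loss_fn(out, y)
loss_fn.backward()
for l in reversed(layers): l.backward()
print(w1.g.shape, b2.g.shape)  # gradients are stored on the parameter tensors
```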

0:00 - Introduction
2:54 - Linear models & rectified lines (ReLU) diagram
10:15 - Multi-Layer Perceptron (MLP) from scratch
18:15 - Loss function from scratch - Mean Squared Error (MSE)
23:14 - Gradients and backpropagation diagram
31:30 - Matrix calculus resources
33:27 - Gradients and backpropagation code
38:15 - Chain rule visualized + how it applies
49:08 - Using Python’s built-in debugger
1:00:47 - Refactoring the code

Comments

Great, very enlightening. I liked the small details too, thank you!

michaelmuller

That e^a trick shows that, even though algebra is such a pain, it comes in handy so often to make things run smoothly. Reminds me of the trick to avoid overflow in binary search: mid = low + ((high - low) / 2).

My favorite thing about these lectures is the small hints for math and Python along the way. Thanks for being so detail-oriented!

mattst.hilaire
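
For reference, the "e^a trick" mentioned above is the max-subtraction that keeps log-sum-exp from overflowing: since log Σ e^(x_i) = a + log Σ e^(x_i − a), choosing a = max(x) keeps every exponent at or below zero. A minimal sketch (variable names are my own):

```python
import torch

def logsumexp(x):
    # naive exp(x).sum().log() overflows once any x_i is large (exp(89.) is already inf in float32)
    a = x.max(-1, keepdim=True).values
    # factor out e^a: log(sum(exp(x))) = a + log(sum(exp(x - a)))
    return a.squeeze(-1) + (x - a).exp().sum(-1).log()

x = torch.tensor([[1000., 1001., 1002.]])
print(x.exp().sum(-1).log())      # naive: inf
print(logsumexp(x))               # stable: ~1002.41
print(torch.logsumexp(x, -1))     # matches PyTorch's built-in
```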

When we compare the result of the softmax with the one-hot vector (at 1:21:00), we take only the value of the softmax where the one-hot vector is one. Isn't this a missed opportunity to incorporate the other "wrong" predictions into the loss function? E.g. if the model is highly confident in its prediction for some other wrong class (e.g. numbers that look similar), then penalising that more heavily could further speed up training?

markozege
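
On the question above: because softmax normalizes over all classes, extra confidence on a wrong class necessarily shrinks the probability picked out by the one-hot index, so the negative log likelihood already grows when the model is confidently wrong. A small numeric check (a sketch, not code from the lesson):

```python
import torch
import torch.nn.functional as F

targ = torch.tensor([0])                   # correct class is index 0
mild  = torch.tensor([[2.0, 1.0, 1.0]])    # little confidence in the wrong classes
harsh = torch.tensor([[2.0, 5.0, 1.0]])    # very confident in wrong class 1

print(F.cross_entropy(mild, targ))   # ~0.55
print(F.cross_entropy(harsh, targ))  # ~3.07 -- the confident wrong prediction is already penalised more
```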