Lesson 5: Deep Learning 2019 - Back propagation; Accelerated SGD; Neural net from scratch

In lesson 5 we put all the pieces of training together to understand exactly what is going on when we talk about *back propagation*. We'll use this knowledge to create and train a simple neural network from scratch.

We'll also see how we can look inside the weights of an embedding layer, to find out what our model has learned about our categorical variables. This will let us get some insights into which movies we should probably avoid at all costs...

Although embeddings are most widely known in the context of word embeddings for NLP, they are at least as important for categorical variables in general, such as for tabular data or collaborative filtering. They can even be used with non-neural models with great success.
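At its core, an embedding layer for a categorical variable is just a lookup into a trainable weight matrix. A minimal sketch in plain NumPy (the sizes are hypothetical, e.g. a day-of-week variable mapped to 3-dimensional vectors; in a real model the matrix would be learned by gradient descent):

```python
import numpy as np

n_categories, emb_dim = 7, 3  # e.g. day-of-week -> 3-dim vector
rng = np.random.default_rng(0)
emb_weights = rng.normal(size=(n_categories, emb_dim))  # trainable in practice

def embed(ids):
    """Look up the embedding vector for each categorical id."""
    return emb_weights[ids]

batch = np.array([0, 3, 3, 6])
vectors = embed(batch)
print(vectors.shape)  # (4, 3): one 3-dim vector per input id
```

After training, inspecting the rows of `emb_weights` is exactly the "looking inside the weights of an embedding layer" described above.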
Comments

Recap (ResNet Network Architecture): 3:35
Fine-Tuning:
Overview: 8:30
Per-Layer Feature Visualization: 11:40
Freezing Early Layers: 12:50
Discriminative Learning Rates: 14:25
Fine-Tuning for Collaborative Filtering:
Model Structure:
Affine Functions: 19:44
Overview: 21:40
One-Hot Encoding of IDs, Embedding Vectors: 23:36
Latent Features: 32:40
Use of Bias Term: 33:08
Questions:
"When we load a pre-trained model, should we explore the activation grids to see what it's good at?": 35:57
"Can we have an explanation of what the first argument in fit_one_cycle actually represents?": 36:32
"What is an affine function?" (And, why you need nonlinearities): 37:20
Loading the MovieLens 100k Dataset: 38:29
Tricks for Training (Scaled Sigmoid, LR Finder): 43:20
Interpreting Trained Model:
Biases: 48:00
Weights (With PCA): 54:25
How collab_learner Works: 1:00:00
Interpreting Embeddings (Neokami Paper): 1:07:00
Optimization Improvements:
Weight Decay: 1:12:10
PyTorch Code for Weight Decay On MNIST: 1:24:00
Adam: 1:43:00
Understanding the Tabular Model:
Overview: 2:03:00
Cross-Entropy Loss: 2:04:05
SoftMax Activation: 2:07:20
PyTorch Code for Tabular Model: 2:11:00

ollinboerbohan

25:21, after spending far too much time being a beginner at matrix multiplication, I'd like to clarify for anyone else who's confused about why this works:

It will only produce the output shown if the one-hot-encoded matrix is multiplied by the weight matrix in that order. See it as One-Hot-Matrix (dot) Weight-Matrix.

It only works if the one-hot matrix is to the left of the weight matrix (not as laid out in the Excel document, where the one-hot matrix is to the right). A 15x5 (dot) 209x15 matrix multiplication doesn't work (which makes me feel sort of stupid for even trying to figure it out, in hindsight). Only a 209x15 (dot) 15x5 matrix multiplication will give this result, because matrix multiplication is not commutative.
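A quick NumPy check of the point above, using the same sizes from the comment (209 samples, 15 categories, 5 latent factors; the random values are just for illustration). Multiplying the one-hot matrix on the left by the weight matrix is identical to a plain row lookup, while the reversed order doesn't even have compatible shapes:

```python
import numpy as np

rng = np.random.default_rng(1)
one_hot_ids = rng.integers(0, 15, size=209)  # 209 samples, 15 categories
one_hot = np.eye(15)[one_hot_ids]            # shape (209, 15)
W = rng.normal(size=(15, 5))                 # weight (embedding) matrix

out = one_hot @ W                            # (209, 15) @ (15, 5) -> (209, 5)
assert np.allclose(out, W[one_hot_ids])      # same result as a direct row lookup
# W @ one_hot would be (15, 5) @ (209, 15): inner dimensions don't match.
```

This is why frameworks implement embedding layers as an indexed lookup rather than an actual matrix multiply: the result is the same, the lookup is far cheaper.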

Lorkin

Discriminative learning rates in fast.ai (16:26) -- writing "slice(1e-5, 1e-3)" means final layers get LR 1e-3, first layers get 1e-5, and middle layers are logarithmically interpolated.
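A rough sketch of that logarithmic interpolation (plain NumPy; the split into exactly three layer groups is an assumption for illustration — fastai chooses the grouping per architecture):

```python
import numpy as np

def discriminative_lrs(lo, hi, n_groups):
    """Per-group learning rates, log-spaced from lo to hi,
    mimicking how slice(lo, hi) is spread across layer groups."""
    return np.geomspace(lo, hi, n_groups)

# Three layer groups: earliest layers train slowest, the head fastest.
lrs = discriminative_lrs(1e-5, 1e-3, 3)
# lowest -> highest: 1e-5, 1e-4, 1e-3
```

The middle group lands at 1e-4, the geometric mean of the endpoints, which is what "logarithmically interpolated" means here.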

kevalan

Could you possibly add timestamps to your videos in case people want to re-watch a select topic and not have to skip around looking for it?

rodeezy

Best deep learning MOOC I have ever found. Love it. The best lesson I have learned.

kaafkgehag

1:14:01 This “burden” of the statisticians may be responsible for the many smart people who kept saying neural networks don't work during the AI winter. I think in M. Nielsen's famous internet book on neural networks, he quotes a physicist from the 60s saying “give me 5 parameters and I can fit an elephant”, or something to that effect. I've also read quite a few books from the computational finance community saying NNs are ridiculous: millions of parameters and an overfitting nightmare. I think credit goes to the researchers who finally showed us that this actually works.

kawingchan

I think the momentum explanation at 1:49:57 is incorrect. In my understanding, momentum is about "remembering" your direction in multiple dimensions, not about increasing the step size in a single dimension.
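The "remembering direction across dimensions" view can be seen in a tiny simulation (plain Python/NumPy; the lr and beta values are illustrative, not from the lesson). With momentum, a dimension whose gradient keeps flipping sign barely moves, while a dimension with a consistent gradient builds up speed:

```python
import numpy as np

def sgd_momentum_step(w, grad, v, lr=0.1, beta=0.9):
    """One SGD-with-momentum update: v is an exponentially weighted
    accumulation of past gradients, so consistent directions build up
    while oscillating directions largely cancel out."""
    v = beta * v + grad
    w = w - lr * v
    return w, v

w = np.zeros(2)
v = np.zeros(2)
for step in range(10):
    # Oscillating gradient in dim 0, consistent gradient in dim 1:
    grad = np.array([(-1.0) ** step, 1.0])
    w, v = sgd_momentum_step(w, grad, v)
# dim 0 has barely moved; dim 1 has travelled much further
```

So momentum is not simply "a bigger step": the step size per dimension depends on how consistent that dimension's gradient history has been.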

kevalan

Isn't the derivative at 1:38:35 wrong? Shouldn't it be 3dw^2?

EDIT: Never mind, I was confused because he was using wd to mean weight decay, not two separate variables w and d. Jeremy's answer is correct.
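A quick numerical sanity check of that resolution (plain Python; the values of wd and w are arbitrary): treating wd as one constant, the derivative of wd·w² with respect to w is 2·wd·w, not 3·d·w².

```python
# Central-difference check that d/dw (wd * w**2) == 2 * wd * w
wd, w, eps = 0.01, 3.0, 1e-6   # wd is a single constant (weight decay)
f = lambda w: wd * w ** 2
numeric = (f(w + eps) - f(w - eps)) / (2 * eps)
analytic = 2 * wd * w
# numeric and analytic agree to within floating-point noise
```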

occasionalvideos

SGD uses a learning rate of 0.0001, RMSProp 0.002, and Adam 1. So the optimizer comparison you show us isn't really valid (or try the same experiment with the same learning rate).

DavidSmith-zgdy

1:11:07 The scatter plot looks interesting: there seems to be a linear boundary below which no instances occur. I wonder if there's any significance or explanation. One good question may be whether the real-world distance is the straight-line distance from A to B, or the distance as measured by road.

kawingchan

1:10:04 About entity-embedding visualization: the other popular method is t-SNE. I'm guessing the authors may have used that.

kawingchan

Because Andrew is at Stanford he has to use Greek letters, ok?

ramahujan

Ok, I have got only one question: why was there smoke outside?

whateverhonestly

Entity Embeddings of Categorical Variables and possible interpretation: 1:07:09

EtienneCharlier-Biz

Does anyone know what drawing tool the host is using?

dapingzheng

Why use an exponentially weighted average for the loss?
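For context: the per-batch loss is very noisy, so an exponentially weighted moving average makes the reported curve readable, with recent batches dominating and old ones decaying. A minimal sketch in pure Python (beta=0.98 is an illustrative choice; the bias correction is the standard one used with EWMAs):

```python
def ewma(values, beta=0.98):
    """Exponentially weighted moving average with bias correction:
    smooths a noisy sequence while tracking its recent level."""
    avg, out = 0.0, []
    for i, v in enumerate(values):
        avg = beta * avg + (1 - beta) * v
        out.append(avg / (1 - beta ** (i + 1)))  # correct early-step bias
    return out

noisy = [1.0, 0.2, 1.1, 0.3, 0.9]
smoothed = ewma(noisy)
# smoothed varies far less than the raw values while staying in their range
```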

DavidSmith-zgdy

1:12:50, Andrew is a Stanforder, he has to use Greek letters 😂

AIPlayerrrr

@Jeremy Howard, I am not sure your embeddings for days and months make much sense. If you knew nothing about days or months, how would one get to the clear path you mention? It just doesn't seem like tracing it out the way you did makes much sense.

dbzkidkev

Don't we call the embedding the learned parameters themselves, not the one-hot encodings?

vladimirgetselevich

Is there a way to learn and then interpret embeddings from image models using distances? E.g., how far is a corn-plant image from a potato-plant image vs. a rice-plant image?
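One common approach is to take each image's feature vector (e.g. the activations of a CNN's penultimate layer) and compare them with cosine distance. A sketch with made-up 3-dimensional vectors standing in for real features:

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical feature vectors; real ones would come from a trained model.
corn   = np.array([0.9, 0.1, 0.0])
potato = np.array([0.8, 0.3, 0.1])
rice   = np.array([0.1, 0.0, 0.9])

# With these made-up vectors, corn sits closer to potato than to rice.
d_potato = cosine_distance(corn, potato)
d_rice = cosine_distance(corn, rice)
```

Whether such distances are semantically meaningful depends on what the model was trained on; embeddings from a classifier tend to cluster by the labels it learned.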

sohaibarif