Lesson 7: Practical Deep Learning for Coders 2022

00:00 - Tweaking first and last layers
02:47 - What are the benefits of using larger models
05:58 - Understanding GPU memory usage
08:04 - What is GradientAccumulation?
20:52 - How to run all the models with specifications
22:55 - Ensembling
37:51 - Multi-target models
41:24 - What does `F.cross_entropy` do
45:43 - When do you use softmax and when not to?
46:15 - Cross-entropy loss
49:53 - How to calculate binary-cross-entropy
52:19 - Two versions of cross-entropy in PyTorch
54:24 - How to create a learner for predicting two targets
1:02:00 - Collaborative filtering deep dive
1:08:55 - What are latent factors?
1:11:28 - Dot product model
1:18:37 - What is an embedding?
1:22:18 - How do you choose the number of latent factors
1:27:13 - How to build a collaborative filtering model from scratch
1:29:57 - How to understand the `forward` function
1:32:47 - Adding a bias term
1:34:29 - Model interpretation
1:39:06 - What is weight decay and how does it help
1:43:47 - What is regularization
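
Several timestamps above cover what `F.cross_entropy` does under the hood. A minimal sketch in plain PyTorch (not the lesson's notebook code; the logits and targets are made up) showing that it is just softmax followed by the negative log of the correct class's probability:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])  # correct class per sample

# Manual version: softmax, then negative log-likelihood of the target class
probs = logits.softmax(dim=1)
manual = -probs[range(4), targets].log().mean()

# Built-in version
builtin = F.cross_entropy(logits, targets)
assert torch.allclose(manual, builtin)
```

Note that `F.cross_entropy` takes raw logits, not probabilities: it applies the softmax internally (in a numerically stabler log-sum-exp form), which is why you don't add a softmax layer before this loss.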

Comments

You are amazing as always! You have such a gift, and we are blessed to have you teaching these classes. I am truly amazed by your level of commitment to society.

sunderrajan

This course is truly priceless, much deeper and more didactic than a lot of paid courses out there 🤩 Thanks, Jeremy!

yoverale

Jeremy my man, you are truly one hell of a human being. I wish you the best

tumadrep

I love how Jeremy explains techniques like gradient accumulation. He makes them seem so obvious and powerful that they're hard to forget. Never again will I think big models are out of scope for my experiments! :D

maraoz
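
The gradient accumulation trick praised above can be sketched in a few lines of plain PyTorch. The tiny model, loss, and `accum` value here are illustrative, not the lesson's code; the key idea is scaling each mini-batch loss by `1/accum` and calling `opt.step()` only every `accum` batches, so the summed gradients match one large batch:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

accum = 4   # accumulate gradients over 4 mini-batches
count = 0

for step in range(8):
    xb = torch.randn(8, 4)                 # mini-batch of 8 samples
    yb = torch.randn(8, 1)
    loss = loss_fn(model(xb), yb) / accum  # scale so summed grads match
    loss.backward()                        # one big batch of 8 * accum
    count += 1
    if count == accum:
        opt.step()       # update weights only every `accum` batches
        opt.zero_grad()
        count = 0
```

Memory stays at the small-batch level because only gradients (same size as the weights) persist between batches, while activations are freed after each `backward()`.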

"At this point if you've heard about embeddings before you might be thinking: that can't be it. And yeah, it's just as complex as the rectified linear unit which turned out to be: replace negatives with zeros. Embedding actually means: “look something up in an array”. So there's a lot of things that we use, as deep learning practitioners, to try to make you as intimidated as possible so that you don't wander into our territory and start winning our Kaggle competitions." 🤣

merelogics
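
The point of the quote above can be verified directly: `nn.Embedding` is just an array indexed by integer ids, so an embedding-layer lookup and plain array indexing give identical results. A minimal sketch (sizes are arbitrary):

```python
import torch
from torch import nn

torch.manual_seed(0)
# 5 items, each represented by a 3-dimensional learned vector
emb = nn.Embedding(num_embeddings=5, embedding_dim=3)

idx = torch.tensor([2, 4])
looked_up = emb(idx)      # embedding-layer "lookup"
manual = emb.weight[idx]  # plain array indexing: same thing
assert torch.equal(looked_up, manual)
```

The layer exists mainly so the lookup is differentiable and fast: gradients flow back only into the rows that were indexed.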

I loved the collaborative filtering stuff and your explanation of embeddings!

pranavdeshpande

Gradient accumulation is a nice trick; however, for sufficiently large datasets and run times, your memory bandwidth latency will increase by the same multiple you accumulate.

tljstewart

Hello, where can I find the notebook for this? I found Road to the Top Part 1 and Part 2, but can't find Part 3 anywhere.

toromanow

Jeremy, in the deep learning implementation of collaborative filtering, the input is the concatenation of the user and item embeddings. However, my understanding is that the model is not learning the embedding matrices here; instead, it's learning the weights (176 * 100) in the first layer and (100 * 1) in the second layer. Am I missing something? I appreciate your input.

vinodjoshi
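
On the question above: in the neural-net version, the embedding matrices are themselves trainable parameters, so they receive gradients and are updated alongside the linear-layer weights. A minimal sketch (the layer sizes are illustrative, not the lesson's 176 and 100):

```python
import torch
from torch import nn

torch.manual_seed(0)

class CollabNN(nn.Module):
    def __init__(self, n_users, n_items, emb_dim=8, hidden=16):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)
        self.layers = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, users, items):
        # Concatenate the looked-up user and item vectors, then run the MLP
        x = torch.cat([self.user_emb(users), self.item_emb(items)], dim=1)
        return self.layers(x)

model = CollabNN(n_users=10, n_items=20)
users = torch.tensor([0, 1])
items = torch.tensor([3, 7])
loss = model(users, items).pow(2).mean()
loss.backward()
# The embedding matrices received gradients: they are learned, too
assert model.user_emb.weight.grad is not None
```

Everything registered as an `nn.Parameter` (which includes `nn.Embedding.weight`) is returned by `model.parameters()` and therefore updated by the optimizer.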

I understand the advantage of gradient accumulation in terms of being able to run your training on smaller GPUs by "imitating" a larger batch size when calculating the gradients, but wouldn't a major drawback of gradient accumulation be an increase in training time and ultimately in energy use? That is, isn't your training going to run half as fast when accum is set to 2? And the more you increase the accum number, the slower the training gets, because your actual batch sizes are getting smaller and smaller?

matthewrice