Lesson 11 (2019) - Data Block API, and generic optimizer

We start lesson 11 with a brief look at a smart and simple initialization technique called Layer-wise Sequential Unit Variance (LSUV). We implement it from scratch, and then use the methods introduced in the previous lesson to investigate the impact of this technique on our model training. It looks pretty good!
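
As a rough illustration, here is a minimal PyTorch sketch of the idea (a simplification under assumed names — `lsuv_init` and the single-layer demo are ours, not the lesson's exact code): run a real batch through the model, shift the layer's bias until its activations have roughly zero mean, then rescale its weights until they have roughly unit standard deviation.

```python
import torch
import torch.nn as nn

def lsuv_init(model, layer, xb, tol=1e-3):
    # Forward hook that records the layer's activation statistics.
    def hook(m, inp, out):
        hook.mean, hook.std = out.mean().item(), out.std().item()
    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        model(xb)
        while abs(hook.mean) > tol:      # shift the bias until mean ~ 0
            layer.bias.data -= hook.mean
            model(xb)
        while abs(hook.std - 1) > tol:   # rescale the weights until std ~ 1
            layer.weight.data /= hook.std
            model(xb)
    handle.remove()

model = nn.Sequential(nn.Linear(10, 50), nn.ReLU(), nn.Linear(50, 1))
xb = torch.randn(64, 10)
lsuv_init(model, model[0], xb)  # initialize the first layer from real data
```

Note that the order of the two loops matters a little: rescaling the weights afterwards can nudge the mean slightly away from zero again, which comes up in the comments below.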

Then we look at one of the jewels of fastai: the Data Block API. We already saw how to use this API in part 1 of the course; now we learn how to create it from scratch, and in the process we'll also learn a lot about how to use and customize it better. We'll look closely at each step:

- Transformations: we create a simple but powerful list container and function-composition mechanism to transform data on the fly
- Split and label: we create flexible functions for each
- DataBunch: we'll see that `DataBunch` is a very simple container for our `DataLoader`s (a rough sketch of these pieces follows below)
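
To give a flavor of how little machinery is involved, here is a rough sketch of those three pieces (simplified from the course notebooks; names such as `split_by_func` are illustrative rather than the lesson's exact API):

```python
from torch.utils.data import DataLoader

def compose(x, funcs, **kwargs):
    # Apply a list of transforms in order; items are transformed lazily,
    # one at a time, as they are fetched.
    for f in funcs:
        x = f(x, **kwargs)
    return x

def split_by_func(items, is_valid):
    # Flexible split: any predicate can decide which items are validation.
    train = [o for o in items if not is_valid(o)]
    valid = [o for o in items if is_valid(o)]
    return train, valid

class DataBunch:
    # Just a thin container around the training and validation DataLoaders.
    def __init__(self, train_dl, valid_dl):
        self.train_dl, self.valid_dl = train_dl, valid_dl

    @property
    def train_ds(self): return self.train_dl.dataset

    @property
    def valid_ds(self): return self.valid_dl.dataset

# e.g. data = DataBunch(DataLoader(train_ds, batch_size=64, shuffle=True),
#                       DataLoader(valid_ds, batch_size=128))
```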

Next up, we build a new `StatefulOptimizer` class, and show that nearly all optimizers used in modern deep learning training are just special cases of this one class. We use it to implement weight decay, momentum, Adam, and LAMB, and take a detailed look at how momentum changes training.
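
To make that concrete, here is a hedged sketch of such a class, using plain functions for the "stats" (which maintain per-parameter state such as momentum buffers) and the "steppers" (which use that state to update the weights); the course notebooks use a slightly richer design, and the names here are illustrative:

```python
import torch

class StatefulOptimizer:
    def __init__(self, params, steppers, stats=None, **hypers):
        self.params = list(params)
        self.steppers, self.stats = steppers, stats or []
        self.state, self.hypers = {}, hypers

    def step(self):
        for p in self.params:
            if p.grad is None: continue
            state = self.state.setdefault(p, {})
            for stat in self.stats:              # update running statistics
                state = stat(state, p, **self.hypers)
            for stepper in self.steppers:        # then update the parameter
                stepper(p, state=state, **self.hypers)
            self.state[p] = state

    def zero_grad(self):
        for p in self.params:
            if p.grad is not None:
                p.grad.detach_()
                p.grad.zero_()

def average_grad(state, p, mom, **kw):
    # Stat: exponentially weighted moving average of the gradients.
    if 'grad_avg' not in state:
        state['grad_avg'] = torch.zeros_like(p.grad)
    state['grad_avg'].mul_(mom).add_(p.grad)
    return state

def weight_decay_step(p, lr, wd, **kw):
    # Stepper: decoupled weight decay, shrinking the weights directly.
    p.data.mul_(1 - lr * wd)

def momentum_step(p, lr, state, **kw):
    # Stepper: SGD-with-momentum update using the stored average.
    p.data.add_(state['grad_avg'], alpha=-lr)

model = torch.nn.Linear(4, 1)
opt = StatefulOptimizer(model.parameters(),
                        steppers=[weight_decay_step, momentum_step],
                        stats=[average_grad], lr=0.1, mom=0.9, wd=1e-2)
```

Adam and LAMB then just need different stats (debiased first and second moments) and correspondingly different steppers.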

Finally, we look at data augmentation, and benchmark various data augmentation techniques. We develop a new GPU-based data augmentation approach which we find speeds things up quite dramatically, and which then lets us add more sophisticated warp-based transformations.
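
To illustrate the batch-level idea (a sketch under assumed names, not the lesson's exact implementation): rather than transforming one image at a time on the CPU, we can draw one random transform per image and apply them all at once to the batch tensor, using `affine_grid` and `grid_sample`, the same mechanism that extends to warps.

```python
import math
import torch
import torch.nn.functional as F

def gpu_rotate_batch(xb, max_deg=10.0):
    # One random rotation angle per image, applied to the whole batch at once.
    bs = xb.size(0)
    theta = (torch.rand(bs, device=xb.device) * 2 - 1) * math.radians(max_deg)
    cos, sin, zero = theta.cos(), theta.sin(), torch.zeros_like(theta)
    # A 2x3 affine matrix per image; scales, shifts and warps work the same way.
    mat = torch.stack([cos, -sin, zero, sin, cos, zero], dim=1).view(bs, 2, 3)
    grid = F.affine_grid(mat, xb.size(), align_corners=False)
    return F.grid_sample(xb, grid, align_corners=False)

xb = torch.rand(64, 3, 28, 28)   # move the batch to .cuda() for the speedup
augmented = gpu_rotate_batch(xb)
```
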
Comments

1:40:09 Why is the debias term not mom**(step+1)?

aswahd
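
(For context on the question above: in the Adam paper the bias correction divides each running average by 1 - beta^t, with the step count t starting at 1, so whether the exponent reads `step` or `step + 1` depends only on whether steps are counted from 1 or from 0. A minimal sketch of the paper's convention:)

```python
import torch

beta = 0.9
grad = torch.randn(10)
avg = torch.zeros_like(grad)
for step in range(1, 4):                  # Adam counts steps from 1
    avg = beta * avg + (1 - beta) * grad  # exponentially weighted average
    debiased = avg / (1 - beta ** step)   # standard bias correction
```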

58:40
If I have a problem where the input is a single number, and predicting the target requires knowing things like "is the input greater or less than 'a'? greater or less than 'b'?", then my input dimension is 1 but I need 2 output filters.
If so, doesn't that contradict the argument for having fewer output channels than inputs? Am I missing something?

jonatani

Hi Jeremy, at 8:56 in lsuv_model(), why didn't you scale the variance first and shift the mean afterwards (i.e. swap the two while loops), so as to get exactly zero mean and std = 1?

loctruong