Lecture 9 - Normalization and Regularization

This lecture gives an overview of normalization layers in deep networks, such as LayerNorm and BatchNorm. It also discusses methods for regularizing networks, including L2 regularization and Dropout. Finally, it covers some challenges arising from the interaction of optimization, initialization, normalization, and regularization.
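
For concreteness, here is a minimal NumPy sketch of the techniques mentioned in the description (training-mode behavior only; the learned scale/shift parameters, BatchNorm's running statistics, and the lecture's exact formulations are omitted):

    import numpy as np

    def layer_norm(Z, eps=1e-5):
        # normalize each example (row) across its own features
        return (Z - Z.mean(axis=1, keepdims=True)) / np.sqrt(Z.var(axis=1, keepdims=True) + eps)

    def batch_norm(Z, eps=1e-5):
        # normalize each feature (column) across the examples in the minibatch
        return (Z - Z.mean(axis=0, keepdims=True)) / np.sqrt(Z.var(axis=0, keepdims=True) + eps)

    def dropout(Z, p=0.5):
        # inverted dropout: zero activations with probability p, rescale survivors by 1/(1-p)
        mask = (np.random.rand(*Z.shape) > p) / (1.0 - p)
        return Z * mask

    def sgd_step_with_weight_decay(W, grad, lr=0.1, wd=1e-4):
        # L2 regularization acts as weight decay: the gradient of (lambda/2)*||W||^2 is lambda*W
        return W - lr * (grad + wd * W)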

Contents:
00:00:00 - Introduction
00:01:07 - Initialization vs. optimization
00:12:22 - Normalization
00:13:51 - Layer normalization
00:19:12 - LayerNorm illustration
00:23:05 - Batch normalization
00:27:29 - Minibatch dependence
00:34:03 - Regularization of deep networks
00:36:34 - Regularization
00:40:27 - L2 regularization a.k.a. weight decay
00:53:27 - Caveats of L2 regularization
00:55:38 - Dropout
00:58:55 - Dropout as stochastic approximation
01:04:36 - Many solutions ... many more questions
01:06:44 - BatchNorm: An illustrative example
01:12:36 - BatchNorm: Other benefits?
01:15:46 - The ultimate takeaway message
Comments

29:01 I especially like how each concept/technique is presented not just in terms of what it is, but also what the implications are if we apply it. This really helps build intuitions that are otherwise hard to form.

justinchu

BN just modifies the actual values, not the relations between different samples, so I don't think it causes dependency between samples. Moreover, I think this relation is relatively stable when using a reasonable batch size, and values of a similar scale can be augmented.

艾曦-eg
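
A quick numeric check of the minibatch-dependence point raised in this comment (a small NumPy sketch with made-up values): under BatchNorm, the normalized output for a fixed example does change when the other examples in its minibatch change, which is the dependence discussed at 00:27:29.

    import numpy as np

    def batch_norm(Z, eps=1e-5):
        # same per-feature normalization over the batch as in the sketch above
        return (Z - Z.mean(axis=0)) / np.sqrt(Z.var(axis=0) + eps)

    x = np.array([[1.0, 2.0, 3.0]])                         # one fixed example
    batch_a = np.concatenate([x, np.zeros((3, 3))])         # batch mates near zero
    batch_b = np.concatenate([x, 10.0 * np.ones((3, 3))])   # batch mates with large values

    print(batch_norm(batch_a)[0])  # normalized output for x in batch_a...
    print(batch_norm(batch_b)[0])  # ...differs in batch_b, even though x itself is unchanged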