NN - 16 - L2 Regularization / Weight Decay (Theory + @PyTorch code)

In this video we look into L2 regularization, also known as weight decay, understand how it works and the intuition behind it, and see it in action with some PyTorch code.
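
As a teaser, here is a minimal sketch of the two equivalent routes to an L2 penalty in PyTorch (the model, data, and the value of `lam` below are illustrative, not taken from the video): add the penalty to the loss yourself, or pass `weight_decay` to the optimizer.

```python
import torch
import torch.nn as nn

# Toy model and data (illustrative only).
model = nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
criterion = nn.MSELoss()
lam = 1e-2  # regularization strength (illustrative)

# Option 1: explicit L2 penalty added to the loss.
# The gradient of (lam/2) * ||w||^2 is lam * w.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.zero_grad()
loss = criterion(model(x), y)
l2_penalty = sum(p.pow(2).sum() for p in model.parameters())
(loss + 0.5 * lam * l2_penalty).backward()
optimizer.step()

# Option 2: weight decay in the optimizer.
# Plain SGD with weight_decay=lam adds lam * w to each gradient,
# matching the (lam/2) * ||w||^2 penalty above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=lam)
optimizer.zero_grad()
criterion(model(x), y).backward()
optimizer.step()
```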

Become a member and get full access to this online course:

*** 🎉 Special YouTube 60% Discount on Yearly Plan – valid for the 1st 100 subscribers; Voucher code: First100 🎉 ***

"NN with Python" Course Outline:
*Intro*
* Administration
* Intro - Long
* Notebook - Intro to Python
* Notebook - Intro to PyTorch
*Comparison to other methods*
* Linear Regression vs. Neural Network
* Logistic Regression vs. Neural Network
* GLM vs. Neural Network
*Expressivity / Capacity*
* Hidden Layers: 0 vs. 1 vs. 2+
*Training*
* Backpropagation - Part 1
* Backpropagation - Part 2
* Implement a NN in NumPy
* Notebook - Implementation redo: Classes instead of Functions (NumPy)
* Classification - Softmax and Cross Entropy - Theory
* Classification - Softmax and Cross Entropy - Derivatives
* Notebook - Implementing Classification (NumPy)
*Autodiff*
* Automatic Differentiation
* Forward vs. Reverse mode
*Symmetries in Weight Space*
* Tanh & Permutation Symmetries
* Notebook - Tanh, Permutation, ReLU symmetries
*Generalization*
* Generalization and the Bias-Variance Trade-Off
* Generalization Code
* L2 Regularization / Weight Decay
* DropOut regularization
* Notebook - DropOut (PyTorch)
* Notebook - DropOut (NumPy)
* Notebook - Early Stopping
*Improved Training*
* Weight Initialization - Part 1: What NOT to do
* Notebook - Weight Initialization 1
* Weight Initialization - Part 2: What to do
* Notebook - Weight Initialization 2
* Notebook - TensorBoard
* Learning Rate Decay
* Notebook - Input Normalization
* Batch Normalization - Part 1: Theory
* Batch Normalization - Part 2: Derivatives
* Notebook - BatchNorm (PyTorch)
* Notebook - BatchNorm (NumPy)
*Activation Functions*
* Classical Activations
* ReLU Variants
*Optimizers*
* SGD Variants: Momentum, NAG, AdaGrad, RMSprop, AdaDelta, Adam, AdaMax, Nadam - Part 1: Theory
* SGD Variants: Momentum, NAG, AdaGrad, RMSprop, AdaDelta, Adam, AdaMax, Nadam - Part 2: Code
*Auto Encoders*
* Variational Auto Encoders

~~~~~ SUPPORT ~~~~~
~~~~~~~~~~~~~~~~~

Intro/Outro Music: Dreamer - by Johny Grimes
Comments

Great explanation! Very helpful for intuition.

ericdu

Great content, you deserve many more views!

leevroko

Great information. I have a doubt: what happens if I introduce L1 and L2 regularization during parallel training of a model? Does it make any difference? I am working on pipeline-parallel training of an autoencoder and want to add these L1 and L2 penalties to it.

Emily-peq
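
Regarding the question above about combining L1 and L2: a generic way to do it in PyTorch is to add both penalty terms to the loss before calling backward(). The sketch below is illustrative (the model shape and lambda values are made up) and does not address pipeline parallelism specifically; in that setting each stage would presumably penalize only its own local parameters.

```python
import torch
import torch.nn as nn

# Small autoencoder-like model (illustrative).
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 20))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

l1_lambda, l2_lambda = 1e-5, 1e-4  # illustrative values

x = torch.randn(32, 20)
optimizer.zero_grad()
recon = model(x)
loss = criterion(recon, x)

# Elastic-net-style penalty: L1 encourages sparsity, L2 shrinks weights.
l1_penalty = sum(p.abs().sum() for p in model.parameters())
l2_penalty = sum(p.pow(2).sum() for p in model.parameters())
(loss + l1_lambda * l1_penalty + l2_lambda * l2_penalty).backward()
optimizer.step()
```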

Great. It looks like PyTorch does not provide built-in L1/L2 regularizers the way Keras does.

caiyu
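
PyTorch indeed has no per-layer kernel_regularizer like Keras. The built-in mechanism is the optimizers' weight_decay argument (L2 only); per-layer control can be approximated with parameter groups, as in this sketch (layer sizes and the decay value are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

# Keras-style per-layer L2 approximated with optimizer parameter groups:
# decay only the weight matrices, leave biases unregularized.
decay, no_decay = [], []
for name, p in model.named_parameters():
    (no_decay if name.endswith("bias") else decay).append(p)

optimizer = torch.optim.SGD(
    [
        {"params": decay, "weight_decay": 1e-2},    # L2 on weights
        {"params": no_decay, "weight_decay": 0.0},  # no penalty on biases
    ],
    lr=0.1,
)
```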