Why Does Batch Norm Work? (C2W3L06)

Comments

Once the effect of the previous layer is covered, everything becomes clear. Brilliant explanation. Batch normalization works similarly to the way input standardization works.

maplex

I like this guy - he has a calm voice and patience.

holgip

Wow, this is the best explanation I've seen so far! I really like Andrew Ng; he has an amazing talent for explaining even the most complicated things in a simple way. When he has to use mathematics to explain a concept, he does it so brilliantly that the concept becomes even simpler to understand, not more complicated as with some tutors.

digitalghosts

Great work, you have a natural talent for making difficult topics easy to learn.

aamira

Beautifully explained, classic Andrew Ng

AnuragHalderEcon

This guy makes it look so easy... one has to love him

randomforrest

The "covariate shift" explanation has been falsified as an explanation for why BatchNorm works. If you are interested, check out the paper "How does batch normalization help optimization?"

siarez

When X changes (even though f(X) = y stays the same), you can't expect the same model to perform well. For example, if X1 is pictures of black cats only (y = 1 for cats, y = 0 for non-cats) and X2 is pictures of cats of all colors, the model won't do well on X2. This is covariate shift.
This covariate shift is tackled during training through input standardization and batch normalization.
Batch normalization keeps the mean and variance of the distribution of the hidden unit values in the previous layer fixed, so it doesn't let those values shift around much.
Because the values can't change too much, the coupling between the parameters of different layers is reduced, the layers become more independent, and hence learning speeds up.
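
As a minimal NumPy sketch of the step described above (the function name, shapes, and epsilon value are illustrative assumptions, not from the video): batch norm normalizes each hidden unit over the mini-batch and then rescales with the learned gamma and beta.

import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-8):
    # z: pre-activations for one layer, shape (batch_size, n_units)
    mu = z.mean(axis=0)                      # one mean per hidden unit, over the mini-batch
    var = z.var(axis=0)                      # one variance per hidden unit
    z_norm = (z - mu) / np.sqrt(var + eps)   # each unit now has mean 0, variance 1
    return gamma * z_norm + beta             # learned gamma/beta set the final mean and spread

# Example: 64 examples, 5 hidden units, with an arbitrary shift/scale from the "previous layer"
z = np.random.randn(64, 5) * 3.0 + 2.0
gamma, beta = np.ones(5), np.zeros(5)
z_tilde = batch_norm_forward(z, gamma, beta)
print(z_tilde.mean(axis=0).round(3), z_tilde.std(axis=0).round(3))  # ~0s and ~1s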

epistemophilicmetalhead

"Don't use it for regularization" - so should I just use it all the time as general good practice, or are there times when I shouldn't use it?

bgenchel

The original paper that introduced the batch normalization technique (by Sergey Ioffe and Christian Szegedy) says that removing dropout speeds up training without increasing overfitting, and there are also recommendations not to use dropout together with batch normalization, since dropout adds noise to the statistics calculations (mean and variance)... so should we really use dropout with batch norm?

ping_gunter

Since gamma and beta are parameters that will be updated, how can the mean and variance remain unchanged?

yuchenzhao

Thanks for sharing the great video, explained in a simple and clear manner.

MuhammadIrshadAli

Should a neural network always have batch normalization?

YuCai-vk

Good for understanding, but a few more numerical calculations would show the effect better.

NeerajGarg

6:00 - I have a question: don't the values of beta[2] and gamma[2] also change during training? Then the distribution of the hidden unit values z[2] also keeps changing, so isn't the covariate shift problem still there?

haoming

7:55 - why don't we use the mean and variance of the entire training set instead of just those of a mini-batch? Wouldn't this reduce the noise further (similar to using a larger mini-batch size)? Unless we want that noise for its regularizing effect?

s

Keras people need to watch this video!

XX-vujo

Doesn't an activation function such as the sigmoid in each node already normalize the outputs of the neurons for the most part?

pemfiri

I'm confused. Is this normalizing all the neurons within each layer, or normalizing, for each neuron, all of its activations computed over a mini-batch?
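
For what it's worth, a small NumPy illustration of the two readings in this question (shapes invented for the example); batch norm takes its statistics per hidden unit, over the examples of the mini-batch:

import numpy as np

# Pre-activations for one layer: 32 examples in the mini-batch, 4 hidden units.
z = np.random.randn(32, 4)

# What batch norm does: one mean/variance per hidden unit,
# computed over the 32 examples of the mini-batch.
mu_per_unit = z.mean(axis=0)     # shape (4,)

# What it does not do: one mean per example over the 4 units of the layer
# (that would be closer to layer normalization).
mu_per_example = z.mean(axis=1)  # shape (32,)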

anynamecanbeuse

You said that batch norm limits the change in the values seen by the 3rd layer (or, more generally, any deeper layer) caused by the parameters of earlier layers. However, when you perform gradient descent, the new parameters introduced by batch norm (gamma and beta) are also being learned and keep changing with each update. So the mean and variance of the earlier layers' values also change and are not fixed at 0 and 1 (or, more generally, whatever you set them to). I therefore can't build an intuition for how fixing the mean and variance of the earlier layers' values prevents covariate shift. Can anyone help me out with this?
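
A rough numerical sketch of the point being asked about (the weight matrices and the gamma/beta values below are invented for illustration): however much the earlier-layer weights drift, the normalized values keep mean 0 and variance 1 within the mini-batch, and the learned gamma and beta alone set the mean and spread that the next layer actually sees.

import numpy as np

rng = np.random.default_rng(0)
a_prev = rng.normal(size=(128, 16))           # activations coming from an earlier layer

for scale in [1.0, 5.0, 50.0]:                # pretend the earlier weights keep changing
    W = rng.normal(size=(16, 8)) * scale
    z = a_prev @ W                            # raw pre-activations: a wildly different spread each time
    mu, var = z.mean(axis=0), z.var(axis=0)
    z_norm = (z - mu) / np.sqrt(var + 1e-8)   # always ~mean 0, variance 1 per unit
    gamma, beta = 2.0, 3.0                    # whatever values have been learned so far
    z_tilde = gamma * z_norm + beta           # mean ~3, std ~2, determined by gamma/beta alone
    print(scale, round(z.std(), 1), round(z_tilde.mean(), 2), round(z_tilde.std(), 2))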

banipreetsinghraheja