What is Layer Normalization? | Deep Learning Fundamentals

You might have heard about Batch Normalization before. It is a great way to make your networks train faster and perform better, but Batch Norm also has some shortcomings. That's why researchers came up with an improvement over Batch Norm called Layer Normalization.

In this video, we learn how Layer Normalization works, how it compares to Batch Normalization, and in which cases it works best.
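In short, Batch Norm normalizes each feature across the batch, while Layer Norm normalizes each sample across its own features. Here is a minimal PyTorch sketch of that difference (illustrative only, not code from the video; the batch size of 3 and the 4 features are assumed):

```python
import torch

x = torch.randn(3, 4)  # a batch of 3 samples with 4 features each

# Batch Normalization: one mean/variance per feature, computed across the batch
bn_mean = x.mean(dim=0, keepdim=True)                # shape (1, 4)
bn_var = x.var(dim=0, unbiased=False, keepdim=True)
x_bn = (x - bn_mean) / torch.sqrt(bn_var + 1e-5)

# Layer Normalization: one mean/variance per sample, computed across its features
ln_mean = x.mean(dim=1, keepdim=True)                # shape (3, 1)
ln_var = x.var(dim=1, unbiased=False, keepdim=True)
x_ln = (x - ln_mean) / torch.sqrt(ln_var + 1e-5)
```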

👇 Get your free AssemblyAI token here


#MachineLearning #DeepLearning
Comments

hands down best and fastest explanation on youtube

solidsnake

for some reason, I have always had doubts about whether I truly understand this concept. But after watching your video, I can confidently say I fully understand. Thank you for your efforts!

samtj

At 02:30, you said that in normalization you calculate the average and the mean for each neuron. I suppose you meant the average and the SD there.

AshishKumarSingh-id

You are confusing the definitions. At the 2:15 mark you claim a batch with 3 data points is coming into the layer, and then instead of having 1 mean and variance for the entire batch, you calculate 3 means and variances, one for each neuron, which does not look right. Please revisit your video.

maryamaghili
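For reference on the point raised in the two comments above: in standard Batch Normalization the batch statistics are computed per neuron, i.e. one mean and one variance for each neuron, each averaged over the samples in the batch, while Layer Normalization computes one mean and variance per sample over its neurons. A minimal PyTorch sketch of the resulting shapes (illustrative only; the batch of 3 samples and the 4 neurons are assumed):

```python
import torch
import torch.nn as nn

x = torch.randn(3, 4)  # batch of 3 samples, 4 neurons per sample

bn = nn.BatchNorm1d(num_features=4)    # one mean/variance tracked per neuron
ln = nn.LayerNorm(normalized_shape=4)  # one mean/variance per sample over its 4 neurons

print(bn(x).shape)            # torch.Size([3, 4])
print(ln(x).shape)            # torch.Size([3, 4])
print(bn.running_mean.shape)  # torch.Size([4]) -> one statistic per neuron, not per batch
```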

really excellent explainer -- thanks for making this!

suicidaldonut

Commenting here again, great series of videos. I know all the concepts, but still going through them again, just to make sure that perhaps I could get a 2nd perspective on the topics. And indeed I am able to view things differently. Really good way of explaining!

ujjalkrdutta

This is the best explanation I've ever seen!!!! Thanks

nullpointerx

She is my dream girl. The content is super concise and sweet, thanks!

josemuarnapoleon

Layer Normalization requires a number for normalized_shape; can you please advise what would be a good number for this? Is this the same as the number of layers?

FazeLyndon
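To clarify the normalized_shape question above: in PyTorch's nn.LayerNorm, normalized_shape is the size of the trailing dimension(s) that get normalized, typically the feature or embedding size of a single sample/token, not the number of layers. A minimal sketch (the embedding size of 512 and the other sizes are assumed for illustration):

```python
import torch
import torch.nn as nn

embedding_dim = 512                    # assumed feature size of one token
ln = nn.LayerNorm(normalized_shape=embedding_dim)

x = torch.randn(8, 20, embedding_dim)  # (batch, sequence length, features)
y = ln(x)                              # each token's 512 features are normalized independently
print(y.shape)                         # torch.Size([8, 20, 512])
```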

QQ: does layer norm require features to have a similar scale? Otherwise each feature is normalized with a different scale, and a feature with a small value is likely to get biased results?

nullpointerx

And another question: if we can guarantee that the sequences have the same length, then BN should work?

nullpointerx

Thank you very much! This was a great explanation! By any chance, does anyone have an idea why the Transformer architecture uses layer normalization? In the video you mention that layer normalization works better with RNNs, yet the Transformer model does not use any recurrence and still uses layer normalization.

mesutt.

Is it normalization or standardization? You mention mean and standard deviation, which is not normalization.

themightyquinn

For some reason I don't understand your calculation in the batch normalization part.

adekoyasamuel

Thank you, because of your model I trained a Latvian-language ASR.

dainispolis

Simple and best explanation! By the way, you are very beautiful!

fakhriddintojiboev

I think the working of LN is well explained, and the comparison with BN is also well presented. But I didn't understand why LN is better for sequences.

atchutram
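On the question of why Layer Norm suits sequences: its statistics are computed per time step over the feature dimension, so neither the sequence length nor the batch size enters the calculation. A minimal sketch with assumed sizes (not code from the video):

```python
import torch
import torch.nn as nn

features = 16                 # assumed feature size per time step
ln = nn.LayerNorm(features)

short_seq = torch.randn(1, 5, features)   # (batch, time, features)
long_seq = torch.randn(1, 50, features)

# The same module handles both: every time step is normalized over its own
# 16 features, regardless of sequence length or batch size.
print(ln(short_seq).shape)  # torch.Size([1, 5, 16])
print(ln(long_seq).shape)   # torch.Size([1, 50, 16])
```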

Please correct me if I am wrong.

I think Layer Normalization has a flaw!

Consider two features: house area and number of rooms.

House 1: Area = 1 m², Rooms = 1. After normalization, this would be `[0, 0]`.
House 2: Area = 100 m², Rooms = 100. This would also result in `[0, 0]` after normalization.

How can two distinct sets of features become the same after normalization?

temanangka
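The collapse described in the comment above can be reproduced directly: because each house's two feature values are identical, the per-sample mean removes all of the signal, and both inputs map to the same vector (before the learned scale and shift, which start at 1 and 0, change anything). A minimal sketch (illustrative only):

```python
import torch
import torch.nn as nn

ln = nn.LayerNorm(2)  # 2 features: area, number of rooms

house1 = torch.tensor([1.0, 1.0])      # area = 1 m², rooms = 1
house2 = torch.tensor([100.0, 100.0])  # area = 100 m², rooms = 100

print(ln(house1))  # tensor([0., 0.], ...)
print(ln(house2))  # tensor([0., 0.], ...) -> identical after normalization
```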

if you have a really small batch size,

endlesswu