Weight Standardization (Paper Explained)

It's common for neural networks to include activation normalization such as BatchNorm or GroupNorm. This paper extends normalization to the weights of the network as well. This surprisingly simple change leads to a boost in performance and, combined with GroupNorm, new state-of-the-art results.

Abstract:
In this paper, we propose Weight Standardization (WS) to accelerate deep network training. WS is targeted at the micro-batch training setting where each GPU typically has only 1-2 images for training. The micro-batch training setting is hard because small batch sizes are not enough for training networks with Batch Normalization (BN), while other normalization methods that do not rely on batch knowledge still have difficulty matching the performances of BN in large-batch training. Our WS ends this problem because when used with Group Normalization and trained with 1 image/GPU, WS is able to match or outperform the performances of BN trained with large batch sizes with only 2 more lines of code. In micro-batch training, WS significantly outperforms other normalization methods. WS achieves these superior results by standardizing the weights in the convolutional layers, which we show is able to smooth the loss landscape by reducing the Lipschitz constants of the loss and the gradients. The effectiveness of WS is verified on many tasks, including image classification, object detection, instance segmentation, video recognition, semantic segmentation, and point cloud recognition. The code is available here: this https URL.

Authors: Siyuan Qiao, Huiyu Wang, Chenxi Liu, Wei Shen, Alan Yuille
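
For intuition, the core operation is tiny: before each forward pass, every convolutional filter is re-centered to zero mean and rescaled to unit standard deviation over its fan-in, and an activation normalizer such as GroupNorm is still applied to the outputs. Below is a minimal PyTorch sketch of that idea (my own illustration under those assumptions, not the authors' released code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    """Conv2d that standardizes each filter's weights before every forward pass."""

    def forward(self, x):
        w = self.weight  # shape: (out_channels, in_channels, kH, kW)
        # Re-center and rescale each filter over its fan-in (in_channels * kH * kW).
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        std = w.std(dim=(1, 2, 3), keepdim=True) + 1e-5  # small epsilon for numerical stability
        w_hat = (w - mean) / std
        return F.conv2d(x, w_hat, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

# As in the paper, WS is meant to be paired with an activation normalizer such as GroupNorm.
block = nn.Sequential(
    WSConv2d(3, 64, kernel_size=3, padding=1),
    nn.GroupNorm(num_groups=32, num_channels=64),
    nn.ReLU(),
)
out = block(torch.randn(1, 3, 32, 32))  # -> shape (1, 64, 32, 32)
```

The mean subtraction and the division by the standard deviation are essentially the "2 more lines of code" the abstract refers to.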

Comments

1:00 Main results
2:00 Why batch norm is suboptimal
5:20 Weight Standardization Method
12:55 Backpropagation through WS
16:05 Theory
16:30 Ablations
18:10 Conclusion

YannicKilcher

Woah, gosh, you're churning out videos at such a furious rate that it's difficult to keep up. Nonetheless, I hope to emulate you in terms of such consistency. Keep it up 🙏🙏

arnavdas

A pretty simple idea hidden under piles of math. Thank you for the explanation!

Carbon-XII

At 1:43: Mask R-CNN is not recurrent; the R stands for region (region-based CNN).

HoriaCristescu

Hi Yannic,

I was thinking about what you said at 12:10. The weights become large, which introduces instability or variance, so this method should help achieve convergence faster: if we keep training the model, the weights will eventually settle into a stable region.

Just a point, let me know what you think.

RohitKumarSingh

Considered leaving a comment. Nice video!

CristianGarcia

I'm slightly triggered because this type of technical neural network research only addresses CNNs, as if they were the only NN architecture. Feedforward NNs and RNNs need love too.

Chrnalis

Great videos. Thanks for your efforts.

nikre

Great video, Yannic. Re-centering and rescaling the weights reduces otherwise large (overfitted) weights. This acts as regularization and should improve performance even when applied without Group Normalization. Thoughts?

MuditBachhawatIn

Did you implement a faster paper-processing net in your biological neural net?

impolitevegan

Thank you for the interesting video! I'm curious about your experience with Weight Standardization. In the video you said you would give it a try and that you think it will bring something. Since you posted this video some time ago, I would like to know your feedback on the gains, if there were any.

azinjahedi

What happened to L2 weight regularization? Nobody uses it anymore. This looks like an evolved version of L2 regularization.

Great paper though. I will definitely use it in my projects.

herp_derpingson

Great video, thank you. However, I have a question: if the weights need to have zero mean, isn't it easier to initialize them with zero mean and then force the gradient to have zero mean as well? We would get to keep the entire current workflow except for a tweak to the initializer and optimizer, no?
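
A minimal sketch of that alternative (purely illustrative, not from the paper): give each filter a zero-mean initialization and register a gradient hook that re-centers the gradient, so a plain SGD step keeps the per-filter means at zero. Note that this only mimics the re-centering half of WS; the division by the per-filter standard deviation would still have to happen somewhere.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)

# Hypothetical: re-center each filter at initialization.
with torch.no_grad():
    conv.weight -= conv.weight.mean(dim=(1, 2, 3), keepdim=True)

# Hypothetical: re-center each filter's gradient so updates preserve the zero mean.
def zero_mean_grad(grad):
    return grad - grad.mean(dim=(1, 2, 3), keepdim=True)

conv.weight.register_hook(zero_mean_grad)

# One step of plain SGD leaves every filter's mean at (numerically) zero.
opt = torch.optim.SGD(conv.parameters(), lr=0.1)
conv(torch.randn(8, 3, 32, 32)).sum().backward()
opt.step()
print(conv.weight.mean(dim=(1, 2, 3)).abs().max())  # ~0
```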

iejtstr

Tiny, inconsequential correction: the R in R-CNN is Regression, not Recurrent. Edit: I'm wrong, see comments. D'oh.

JackofSome