RoBERTa: A Robustly Optimized BERT Pretraining Approach

This paper shows that the original BERT model, if trained properly, can match or outperform the improvements that have been proposed since its release, raising questions about the necessity of, and reasoning behind, those proposals.

Abstract:
Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.
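
As the abstract notes, the pretrained models were released publicly. As a quick illustration, the minimal sketch below queries a released RoBERTa checkpoint with the masked-LM objective it was pretrained on; it assumes the Hugging Face `transformers` library and the `roberta-base` checkpoint, not the authors' original fairseq training code.

```python
# Minimal sketch: probing a pretrained RoBERTa checkpoint via its masked-LM head.
# Assumes the Hugging Face `transformers` package and the public `roberta-base`
# weights (an assumption for illustration; not the paper's training pipeline).
from transformers import pipeline

# RoBERTa uses "<mask>" as its mask token.
unmasker = pipeline("fill-mask", model="roberta-base")

for prediction in unmasker("Language model pretraining has led to significant performance <mask>."):
    # Each prediction is a dict with the filled token and its probability score.
    print(f'{prediction["token_str"]:>12}  score={prediction["score"]:.3f}')
```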

Authors: Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov

Comments

I'm doing my master's in ML and your videos have the perfect level of detail for me. I've watched a few of them now and they always get to the core of a paper in detail without taking more time than necessary. Thanks a lot for your efforts!

ursinbrunner

These videos are a real service; thank you!

michaelcarlon

I'm preparing for a PhD in machine learning and these videos are indispensable. Thank you!

danielmichelin

Man, you've helped me a lot.
As a sophomore computer science student, I've picked up some tips for reading papers. Thank you!

zysftvf

I didn't understand the paper clearly after reading it at first, but your video explains things in a simple and understandable manner. Thank you!

laurynasgrusas

Thank you! I am also working on a master's in ML and this is incredibly helpful.

hihiendru

At 14:47, how can you highlight the number "256" and say "five hundred and twelve" out loud? That's some weird stuff going on there! It's almost as if, for computer science folk, the powers of two have become their own separate mental entries, beyond the standard numerical quantities, where one can prime another and lead to slips like this. But a perfect summary video, up to your usual standard. We can always rely on you to give us the main points.

Murphyalex

For NLP practitioners, this is very helpful.

thepresistence

Your videos are great! Do you think you can do one about ALBERT?

PotatoKaboom

In the FULL-SENTENCES and DOC-SENTENCES variants, do they only use the MLM loss?

soumyasarkar