Attention Is All You Need

Abstract:
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

Authors:
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
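
For readers who want the core idea in code: below is a minimal NumPy sketch of the scaled dot-product attention the paper is built on, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V (Equation 1 in the paper). The toy shapes and variable names are illustrative, not from the paper or the video.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of the values

# Toy example: 3 query positions attending over 4 key/value positions, d_k = 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)    # (3, 8)
```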

Comments:

Friendship ended with LSTM, transformer is now my best friend.

finlayl

Nobody knew this paper would change the world

tanmayjain

I've watched this maybe 5 times over 1 year, each time getting more and more from it. I think I finally intuitively understand how this works. Thanks for your work and your time!

RobotProctor

I was searching for a channel like "Two minute papers", but one that isn't two minutes long and goes into depth. I think I found it!

Subbed!

herp_derpingson

Finally, someone is drawing vectors to describe what is meant by encoding with vectors, and how the vectors relate to one another. So many people talk about this but barely understand the details.

TimKaseyMythHealer

Really good explanation. You know how to convey the essence without getting lost in the details. Details may matter later, but the most important thing at first is the core nature of the strategy, and you made that crystal clear. Thanks!!!

dariodemattiesreyes

The explanation of querying key-value pairs is really nice.

kema
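
The query/key-value framing in the comment above can be read as a soft dictionary lookup: the softmax over query-key dot products gives lookup weights, and the output is the corresponding weighted sum of values. A tiny worked example, with keys and values chosen arbitrarily for illustration:

```python
import numpy as np

# One query against three key-value pairs.
q = np.array([1.0, 0.0])
K = np.array([[1.0, 0.0],    # key 0 matches q exactly
              [0.0, 1.0],    # key 1 is orthogonal to q
              [0.7, 0.7]])   # key 2 partially matches q
V = np.array([[10.0], [20.0], [30.0]])

scores = K @ q / np.sqrt(q.size)           # scaled dot products
w = np.exp(scores) / np.exp(scores).sum()  # softmax lookup weights
print(w)      # ~[0.43, 0.21, 0.35]: most weight on the matching key
print(w @ V)  # ~[19.2]: a blend of the values, tilted toward 10.0
```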

By far the best explanation of the paper "Attention Is All You Need". Well explained. Thanks, Yannic Kilcher!

jugsma

You have done an excellent job of explaining the attention method in simple words. Thanks so much!

vijeta

Very well done! I agree with the other comments that this is the clearest explanation I have seen so far. Thanks for the great work!

shandou

Excellent video, thank you so much for illustrating these concepts so clearly.

chandlerclement

Thank you so much, Yannic Kilcher. The paper seemed complex, but you "encoded", performed "multi-head attention", and "decoded" it in such a simple way (: An amazing job! Undoubtedly the best explanation.

akhilvenkataraju
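
For the "multi-head" part of the joke above: Section 3.2.2 of the paper runs attention in several lower-dimensional subspaces in parallel, then concatenates and projects the results. A rough self-attention sketch, with placeholder shapes and random matrices standing in for learned projection weights:

```python
import numpy as np

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Split d_model into n_heads subspaces, attend in each,
    then concatenate the head outputs and project with Wo."""
    d_model = X.shape[-1]
    d_head = d_model // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv        # queries, keys, values all come from X
    heads = []
    for h in range(n_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_head)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)  # row-wise softmax
        heads.append(w @ V[:, s])           # per-head weighted sum of values
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                # 5 tokens, d_model = 16
Wq, Wk, Wv, Wo = (rng.normal(size=(16, 16)) for _ in range(4))
print(multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads=4).shape)  # (5, 16)
```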

Thank you very much! This has helped me a lot. All I could find on this specific paper was confusing and hard to understand; I think you explained it extremely well in your video! Please make more of these; I think you might help lots of people :D

deathslnce

Excellent explanation of Transformers. Clear, easy to follow, and great information. Thanks!

BrettHannigan

I just got a clear understanding of how the positional encoding works here. Kudos to you. Great explanation!

mdnayemuddin
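
The positional encoding the comment refers to is the sinusoidal scheme from Section 3.5 of the paper: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A short sketch with arbitrary illustrative sizes (assumes an even d_model):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encodings, one row per position."""
    pos = np.arange(max_len)[:, None]            # positions 0..max_len-1
    i = np.arange(0, d_model, 2)[None, :]        # even dimension indices
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                 # even dims get sine
    pe[:, 1::2] = np.cos(angles)                 # odd dims get cosine
    return pe

pe = positional_encoding(max_len=50, d_model=16)
print(pe.shape)  # (50, 16); each row is added to the token embedding at that position
```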

Great video, and one that stands out among machine learning videos on YouTube.
Thank you!

tassoskat

An amazing explanation. Truly amazing. I can't say how much I appreciate you putting the dot product and softmax into intuitive, easy-to-understand words. Very grateful!

YtongT
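
The dot-product-and-softmax intuition from the comment above can also be seen numerically: softmax turns raw similarity scores into weights that sum to 1, and the paper's 1/sqrt(d_k) scaling (Section 3.2.1) keeps large dot products from saturating it. The scores below are chosen arbitrarily for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([8.0, 4.0, 2.0])    # raw query-key dot products
print(softmax(scores))                # ~[0.98, 0.02, 0.00]: nearly one-hot
print(softmax(scores / np.sqrt(64)))  # scaled as if d_k = 64: ~[0.48, 0.29, 0.23]
```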

It's amazing to see the paper responsible for all of the AI interest and innovation happening now described as merely 'interesting' shortly after it came out. I love it.

languagemodeler

VERY helpful, thanks! I'd love to see a "part 2" ...

Julian-tfnj

You have such a cool state of mind... it really makes your teaching style more interesting.

fahds