Attention Is All You Need

Abstract:
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

Authors:
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

Related videos

Comments

Friendship ended with LSTM, transformer is now my best friend.

finlayl

Nobody knew this paper would change the world

tanmayjain

I've watched this maybe 5 times over 1 year, each time getting more and more from it. I think I finally intuitively understand how this works. Thanks for your work and your time!

RobotProctor

I was searching for a channel like "Two Minute Papers", but one that isn't two minutes long and goes in depth. I think I found it!

Subbed!

herp_derpingson

Finally, someone is drawing vectors to describe what is meant by encoding with vectors, and how the vectors relate to one another. So many talk about this, but barely understand the details.

TimKaseyMythHealer

Really good explanation. You know how to provide the essence without getting lost in details. Details might be important later, but the most important thing at first is the core nature of the strategy, and you made it crystal clear. Thanks!!!

dariodemattiesreyes

The explanation of querying a key-value pair is really nice

kema

By far the best explanation of the paper "Attention Is All You Need". Well explained. Thanks, Yannic Kilcher!

jugsma

You have done an excellent job of explaining the attention method in simple words. Thanks so much!

vijeta

Very well done! I agree with the other comments that this is the clearest explanation I have seen so far. Thanks for the great work!

shandou

Excellent video, thank you so much for illustrating these concepts so clearly.

chandlerclement

Thank you so much Yannic Kilcher, the paper seemed complex but you "encoded", performed "multi-head attention" and "decoded" it in such a simple way (: An amazing job! Undoubtedly the best explanation

akhilvenkataraju

Thank you very much! This has helped me a lot. All I could find on this specific paper was confusing and hard to understand, I think it was explained extremely well in your video! Please make more of these, I think you might help lots of people :D

deathslnce

Excellent explanation of Transformers. Clear, easy to follow, and great information. Thanks!

BrettHannigan

I just got a clear understanding of how the positional encoder works here. Kudos to you. Great Explanation!

mdnayemuddin

Great video, and very unique among machine learning videos on YouTube.
Thank you!

tassoskat

An amazing explanation. Truly amazing. I can't say how much I appreciate you putting dot product and softmax into intuitive and easy-to-understand words. Very grateful.

YtongT

It's amazing to have this explanation of the paper that is responsible for all of the AI interest and innovation happening now, described as 'interesting' shortly after it came out. I love it.

languagemodeler

VERY helpful, thanks! I'd love to see a "part 2" ...

Julian-tfnj

You have such a cool state of mind ... it really makes your teaching style more interesting.

fahds

Attention Is All You Need

Attention mechanism: Overview

Transformer Neural Networks - EXPLAINED! (Attention is all you need)

Attention is all you need (Transformer) - Model explanation (including math), Inference and Training

Illustrated Guide to Transformers Neural Network: A step by step explanation

Attention in transformers, visually explained | Chapter 6, Deep Learning

Attention Is All You Need - Paper Explained

Attention is all you need explained

Attention for Neural Networks, Clearly Explained!!!

Transformers, explained: Understand the model behind GPT, BERT, and T5

Transformers: The best idea in AI | Andrej Karpathy and Lex Fridman

But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning

Live -Transformers Indepth Architecture Understanding- Attention Is All You Need

Transformer Neural Networks, ChatGPT's foundation, Clearly Explained!!!

What are Transformers (Machine Learning Model)?

The Transformer neural network architecture EXPLAINED. “Attention is all you need”

AI Language Models & Transformers - Computerphile

Attention Mechanism In a nutshell

Pytorch Transformers from Scratch (Attention is all you need)

The math behind Attention: Keys, Queries, and Values matrices

Transformers for beginners | What are they and how do they work

CS480/680 Lecture 19: Attention and Transformer Networks

Transformer (Attention is all you need)

How do transformers work? (Attention is all you need)