Transformers for beginners | What are they and how do they work

Over the past five years, Transformers, a neural network architecture, have completely transformed state-of-the-art natural language processing.


The encoder takes the input sentence and converts it into a series of numbers called vectors, which represent the meaning of the words. These vectors are then passed to the decoder, which generates the translated sentence.
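
As a rough sketch of that first step (the vocabulary, dimensions, and numbers below are made up for illustration; real models learn the embedding table during training), converting words to vectors amounts to a table lookup:

```python
import numpy as np

# Toy embedding table: one vector per vocabulary word.
# The values here are random placeholders; training would tune them
# so that words with similar meanings get similar vectors.
rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2}
d_model = 4                                      # embedding size (tiny for demo)
embedding_table = rng.normal(size=(len(vocab), d_model))

sentence = ["the", "cat", "sat"]
token_ids = [vocab[word] for word in sentence]   # words -> integer ids
vectors = embedding_table[token_ids]             # ids -> vectors
print(vectors.shape)                             # (3, 4): one vector per word
```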

Now, the magic of the transformer network lies in how it handles attention. Instead of looking at each word one by one, it considers the entire sentence at once. It calculates a similarity score between each word in the input sentence and every other word, giving higher scores to the words that are more important for translation.
To do this, the transformer network uses a mechanism called self-attention. Self-attention allows the model to weigh the importance of each word in the sentence based on its relevance to other words. By doing this, the model can focus more on the important parts of the sentence and less on the irrelevant ones.
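
A minimal NumPy sketch of that self-attention computation (the weight matrices Wq, Wk, Wv are random stand-ins here; real Transformers learn them and run several attention "heads" in parallel):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # similarity of every word with every other word
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = scores / scores.sum(axis=-1, keepdims=True)  # softmax: rows sum to 1
    return weights @ V                           # each output is a weighted mix of all words

rng = np.random.default_rng(0)
d_model = 4
X = rng.normal(size=(3, d_model))                # 3 words, one 4-dim vector each
Wq, Wk, Wv = [rng.normal(size=(d_model, d_model)) for _ in range(3)]
print(self_attention(X, Wq, Wk, Wv).shape)       # (3, 4): one updated vector per word
```
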
In addition to self-attention, transformer networks also use something called positional encoding. Since the model treats words as individual entities, it doesn't have any inherent understanding of word order. Positional encoding helps the model to understand the sequence of words in a sentence by adding information about their position.
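
The sinusoidal scheme from the original Transformer paper is one common way to build these position signals (learned positional embeddings are another option); a small sketch:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings (d_model assumed even here)."""
    positions = np.arange(seq_len)[:, None]      # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]     # even dimension indices
    angles = positions / (10000 ** (dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                 # even dims get sine
    pe[:, 1::2] = np.cos(angles)                 # odd dims get cosine
    return pe

# The encoding is simply added to the word vectors, so each vector
# now also carries information about where its word sits in the sentence.
pe = positional_encoding(seq_len=3, d_model=4)
print(pe[0], pe[1])   # position 0 and position 1 get distinct patterns
```
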
Once the encoder has calculated the attention scores and combined them with positional encoding, the resulting vectors are passed to the decoder. The decoder uses a similar attention mechanism to generate the translated sentence, one word at a time.
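
A schematic sketch of that one-word-at-a-time (greedy) decoding loop; `decoder_step`, `bos_id`, and `eos_id` are hypothetical stand-ins for the full decoder stack and the special start/end-of-sentence tokens:

```python
def greedy_decode(encoder_vectors, decoder_step, bos_id, eos_id, max_len=50):
    """Generate a translation one token at a time.

    decoder_step(encoder_vectors, output_ids) stands in for the whole
    decoder stack (masked self-attention over the words generated so far,
    plus attention over the encoder's vectors) and returns the id of the
    most likely next word.
    """
    output_ids = [bos_id]                  # start-of-sentence token
    for _ in range(max_len):
        next_id = decoder_step(encoder_vectors, output_ids)
        output_ids.append(next_id)         # feed the prediction back in
        if next_id == eos_id:              # stop at end-of-sentence
            break
    return output_ids
```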

Transformers are the architecture behind GPT, BERT, and T5.

#transformers #naturallanguageprocessing #nlp
Comments

This is the only video around that REALLY EXPLAINS the transformer! I immensely appreciate your step by step approach and the use of the example. Thank you so much 🙏🙏🙏

lyeln

It's great. I have only one query: what is the input to the masked multi-head attention? It's not clear to me; kindly guide me.

kxnmvws

I had watched 3 or 4 videos about transformers before this tutorial. Finally, this tutorial made me understand the concept of transformers. Thanks for your complete and clear explanations and your illustrative example. Especially, your description of query, key, and value was really helpful.

MrPioneer

Hello, and thank you so much. One question: I don't understand where the numbers in the word embedding and positional encoding come from.

mvbovfv

Very nice high-level description of the Transformer.

mdfarhadhussain

I accidentally came across this video; very well explained. You are doing an excellent job.

PallaviPadav

Great explanation! Keep uploading such nice informative content.

aditichawla

Very well explained! I could instantly grasp the concept! Thank you, Miss!

exoticcoder

Well explained. Before watching this video I was very confused about how transformers work, but your video helped me a lot.

VishalSingh-wtyj

Can you please let us know the input for masked multi-head attention? You just said "decoder". Can you please explain? Thanks.

whvyolw

Very well explained, even with such a niche viewer base. Please keep making more of these.

harshilldaggupati

Thank you very much for explaining and breaking it down 😀 Compared to other channels, your explanation is easy to understand. Thank you very much for making this video and sharing it with everyone ❤

satishbabu

Hello Ma’am
Your AI and Data Science content is consistently impressive! Thanks for making complex concepts so accessible. Keep up the great work! 🚀 #ArtificialIntelligence #DataScience #ImpressiveContent 👏👍

soravsingla

This is a fantastic, very good explanation. Thank you so much.

servatechtips

Nice explanation of such a complex topic.

ykakde

Can you also talk about the purpose of the "feed forward" layer? It looks like it's only there to add non-linearity. Is that right?

_seeker

Best explanation. I watched multiple videos, but this one provided the clearest concept. Keep it up.

imranzahoor

Wow... you are amazing. Thank you for the clear explanation.

oocoxbu

Best video ever, explaining the concepts in a really lucid way, ma'am. Thanks a lot; please keep posting. I subscribed 😊🎉

dljqync

I didn't understand what the input to the masked multi-head self-attention layer in the decoder is. Can you please explain?

nikhilrao