Blowing up Transformer Decoder architecture

TIMESTAMPS
0:00 Introduction
2:00 What is the Encoder doing?
3:30 Text Processing
5:05 Why are we batching data?
6:03 Position Encoding
6:34 Query, Key and Value Tensors
7:57 Masked Multi Head Self Attention
15:30 Residual Connections
17:47 Multi Head Cross Attention
21:25 Finishing up the Decoder Layer
22:17 Training the Transformer
24:33 Inference for the Transformer
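
A minimal PyTorch sketch of the decoder-layer steps listed above (masked self-attention with residual connections, cross-attention over the encoder output, and a feed-forward block). The module names, dimensions, and the use of nn.MultiheadAttention are illustrative assumptions, not the presenter's actual code:

import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    # Masked self-attention -> cross-attention -> feed-forward,
    # each followed by a residual connection and layer normalization.
    def __init__(self, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, y, enc_out, causal_mask):
        a, _ = self.self_attn(y, y, y, attn_mask=causal_mask)   # masked multi-head self-attention
        y = self.norm1(y + a)                                   # Add & Norm 1 (residual connection)
        a, _ = self.cross_attn(y, enc_out, enc_out)             # queries from decoder, keys/values from encoder
        y = self.norm2(y + a)                                   # Add & Norm 2
        y = self.norm3(y + self.ff(y))                          # feed-forward, then Add & Norm 3
        return y

layer = DecoderLayer()
tgt = torch.randn(2, 5, 512)                                    # (batch, target length, d_model)
enc = torch.randn(2, 7, 512)                                    # encoder output: (batch, source length, d_model)
mask = torch.triu(torch.ones(5, 5, dtype=torch.bool), diagonal=1)
print(layer(tgt, enc, mask).shape)                              # torch.Size([2, 5, 512])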
Comments

I've been closely following the Transformer playlist, which has greatly helped in my comprehension of the Transformer Architecture. Your excellent work is evident, and I can truly appreciate the dedication you've shown in simplifying complex concepts. Your approach of deconstructing intricate ideas into manageable steps is truly praiseworthy. I also find it highly valuable how you begin each video with an overview of the entire architecture and contextualize the current steps within it. Your efforts are genuinely commendable, and I'm sincerely grateful for your contributions. Thank you.

ahmadfaraz

Mind BLOWING.. lucky enough to find your lectures.

SarvaniChinthapalli

Your drawing skill is actually amazing!

JoeChang

Man you're a pure treasure! Keep up this outstanding work! 🙏🏼

galileo

Best drawing to explain this concept 👏🏼👏🏼👏🏼

MapumbaPaulus

You are really great at articulation. Thank you 😇

MaheshKumar-bnq

Truly amazing video. I have read the original paper, but this video definitely helped me understand it better, especially the way you visualize the whole architecture.

limbenny

Can you explain in another video some examples of the Q, K and V vectors? It is still confusing to me what they represent.

jonfe
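
A rough illustration of the question above: each token embedding is passed through three learned linear layers to produce its query, key, and value vectors; queries are compared with keys to get attention weights, and the output is a weighted sum of values. The dimensions below are assumptions for a single unbatched sentence:

import torch
import torch.nn as nn

d_model, seq_len = 512, 6
x = torch.randn(seq_len, d_model)              # token embeddings for one sentence

W_q = nn.Linear(d_model, d_model, bias=False)  # learned projection producing queries
W_k = nn.Linear(d_model, d_model, bias=False)  # learned projection producing keys
W_v = nn.Linear(d_model, d_model, bias=False)  # learned projection producing values

Q, K, V = W_q(x), W_k(x), W_v(x)               # each is (seq_len, d_model)
scores = Q @ K.T / d_model ** 0.5              # (seq_len, seq_len): how much each query matches each key
weights = scores.softmax(dim=-1)               # each row sums to 1
out = weights @ V                              # each output row is a weighted mix of value vectors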

Great video !!! Clear explanation about dimensions and the whole process.

lathashreeh

Will you make a video on transformers using a vision transformer + transformer decoder for image captioning?

tiffanyk

Thank you! Your video taught me a lot.

任晶-lo

Illustrating your explanations with code actually provides much deeper insights. Thanks, man! Quick note on this video: I was wondering why you haven't included the "output embeddings" in your sketch of the decoder?

nicolasdr
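
On the output-embedding question above: in the original architecture the target tokens are first mapped through an embedding layer plus positional encodings before entering the first decoder layer. A small sketch under assumed sizes (a learned positional embedding is used here for brevity; the original paper uses sinusoidal encodings):

import torch
import torch.nn as nn

vocab_size, d_model, max_len = 3000, 512, 100
tok_emb = nn.Embedding(vocab_size, d_model)    # the "output embedding" for target tokens
pos_emb = nn.Embedding(max_len, d_model)       # positional information (learned here for brevity)

tgt_ids = torch.tensor([[1, 5, 17, 42]])       # hypothetical target token ids: (batch=1, tgt_len=4)
positions = torch.arange(tgt_ids.size(1)).unsqueeze(0)
decoder_in = tok_emb(tgt_ids) + pos_emb(positions)   # (1, 4, 512), fed to the first decoder layer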

This is Awesome!!!!
thank you so much for the

lakshman

7:00 I feel as though the implementations that just repeat the Q, K, V matrices are making a mistake, mostly because the purpose of multi-head attention is to learn different attentions, right? In the attention blocks, the linear layers / learnable parameters are at the beginning for each of Q, K and V, then one big one after the heads are concatenated, so without the individual ones at the beginning (I'm assuming each initialized to random values) I believe the multiple heads would be useless. Thoughts or corrections?

philipbutler
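
For context on the comment above, a sketch of one common arrangement: separate learned projections for Q, K and V at the start, reshaped so that each head works with its own distinct parameter slice (rather than a repeated copy), followed by the single large output projection after the heads are concatenated. All sizes are illustrative assumptions:

import torch
import torch.nn as nn

batch, seq_len, d_model, num_heads = 2, 10, 512, 8
head_dim = d_model // num_heads
x = torch.randn(batch, seq_len, d_model)

# Learned projections at the start, one each for Q, K and V.
q_proj = nn.Linear(d_model, d_model)
k_proj = nn.Linear(d_model, d_model)
v_proj = nn.Linear(d_model, d_model)
out_proj = nn.Linear(d_model, d_model)         # the single large linear after concatenating heads

def split_heads(t):
    # (batch, seq, d_model) -> (batch, heads, seq, head_dim): each head gets its own slice of parameters
    return t.reshape(batch, seq_len, num_heads, head_dim).transpose(1, 2)

q, k, v = split_heads(q_proj(x)), split_heads(k_proj(x)), split_heads(v_proj(x))
attn = (q @ k.transpose(-2, -1) / head_dim ** 0.5).softmax(dim=-1)   # (batch, heads, seq, seq)
heads = (attn @ v).transpose(1, 2).reshape(batch, seq_len, d_model)  # concatenate heads
y = out_proj(heads)                                                  # (batch, seq, d_model)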

One thing I don't understand is that at 20:35, the matrix obtained by multiplying the cross-attention matrix, derived from the encoder, with the v matrix is said to represent one English word per row. But the q part of the cross-attention matrix comes from the Kannada sentences in the masked attention, so shouldn't each row of the resulting matrix correspond to a Kannada word?

jackwoo
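
A shape-level sketch of the cross-attention step discussed above: queries come from the decoder (target/Kannada side) and keys/values from the encoder output (source/English side), so the result has one row per target position, each row being a weighted mixture of source value vectors. Lengths are illustrative assumptions:

import torch

d_model, src_len, tgt_len = 512, 7, 5           # e.g. 7 English tokens, 5 Kannada tokens
q = torch.randn(tgt_len, d_model)               # queries: output of the masked self-attention (target side)
k = torch.randn(src_len, d_model)               # keys: encoder output (source side)
v = torch.randn(src_len, d_model)               # values: encoder output (source side)

weights = (q @ k.T / d_model ** 0.5).softmax(dim=-1)   # (tgt_len, src_len): target rows, source columns
out = weights @ v                                       # (tgt_len, d_model): one row per target position
print(out.shape)                                        # torch.Size([5, 512])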

Great work Ajay, Can you share the diagram link which you have showed in the video?

sandhyas

While we have yet to translate the sentence into Kannada, how can we pass it to the decoder??

AbdulRahman-tjwc
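
Regarding the question above: during training the known target sentence is used with teacher forcing; it is shifted right behind a start token, and a causal mask keeps each position from seeing future tokens. A sketch with hypothetical token ids:

import torch

START, END = 1, 2                                  # hypothetical special token ids
target_ids = torch.tensor([5, 17, 42, 8, END])     # hypothetical Kannada token ids for one sentence

decoder_input = torch.cat([torch.tensor([START]), target_ids[:-1]])   # target shifted right
labels = target_ids                                                    # what the decoder must predict

T = decoder_input.size(0)
causal_mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)  # blocks attention to future positions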

Thank you for all the videos about the transformer. Although I understood the architecture, I still don't know what to use as the decoder input (embedded target) and mask for the TEST phase.

sarahgh
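
One way the question above is usually handled at test time: start the decoder input with only the start token, predict the next token, append it, and repeat until an end token; the causal mask simply grows with the generated sequence. The model interface and token ids below are assumptions:

import torch

# Greedy decoding sketch. `model(src_ids, tgt_ids)` is assumed to return logits of
# shape (tgt_len, vocab_size); start_id / end_id are assumed special token ids.
def greedy_decode(model, src_ids, start_id, end_id, max_len=50):
    tgt = torch.tensor([start_id])                  # begin with only the start token
    for _ in range(max_len):
        logits = model(src_ids, tgt)                # the decoder only sees tokens generated so far
        next_id = logits[-1].argmax().item()        # pick the most likely next token
        tgt = torch.cat([tgt, torch.tensor([next_id])])
        if next_id == end_id:
            break
    return tgt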

Great work indeed. It helped clear up a lot of things, especially the part where softmax is used for the decoder output. So the first row will output the first word of the target language. But in scenarios where two source words correspond to one target-language word, how does softmax handle that? Can you please help me figure this out?

hajrawaheed
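
On the softmax question above: the softmax at the decoder output is applied independently to each row (each target position) over the target vocabulary, so every target position gets its own probability distribution regardless of how many source words it attended to. A small sketch with assumed sizes:

import torch
import torch.nn as nn

d_model, vocab_size, tgt_len = 512, 3000, 5
decoder_out = torch.randn(tgt_len, d_model)        # one row per target position

to_vocab = nn.Linear(d_model, vocab_size)          # final projection to vocabulary logits
probs = to_vocab(decoder_out).softmax(dim=-1)      # softmax applied per row, over the vocabulary
print(probs.shape, probs.sum(dim=-1))              # torch.Size([5, 3000]); each row sums to 1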

At the end of the decoder block, isn't there supposed to be another "Add & Norm" operation as in the architecture? Did he miss it?

supremachine
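
On the last question: the original architecture does apply a third Add & Norm after the feed-forward block inside each decoder layer, as in this minimal fragment (shapes are assumed):

import torch
import torch.nn as nn

d_model = 512
ff = nn.Sequential(nn.Linear(d_model, 2048), nn.ReLU(), nn.Linear(2048, d_model))
norm3 = nn.LayerNorm(d_model)

y = torch.randn(2, 5, d_model)    # output of the cross-attention sub-layer
y = norm3(y + ff(y))              # the third residual connection + layer norm in the decoder layer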