Transformer - Part 8 - Decoder (3): Encoder-decoder self-attention

This is the third video about the transformer decoder and the final video introducing the transformer architecture. Here we mainly learn about the encoder-decoder multi-head self-attention layer, used to incorporate information from the encoder into the decoder. It should be noted that this layer is also commonly known as the cross-attention layer.
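For readers who want to see the computation spelled out, here is a minimal single-head NumPy sketch of the encoder-decoder (cross-) attention step described above. It is illustrative only: the variable names and sizes are assumptions, not taken from the video, and multi-head splitting, the residual connection, and layer normalization are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_states, Wq, Wk, Wv):
    """Single-head encoder-decoder (cross-) attention.

    decoder_states: (n_d, d_model) -- one row per target-side position
    encoder_states: (n_e, d_model) -- one row per source-side position
    Wq, Wk, Wv:     (d_model, d_k) -- learned projection matrices
    """
    Q = decoder_states @ Wq                  # queries come from the decoder, (n_d, d_k)
    K = encoder_states @ Wk                  # keys come from the encoder,    (n_e, d_k)
    V = encoder_states @ Wv                  # values come from the encoder,  (n_e, d_k)
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # similarity of each target position to each source position, (n_d, n_e)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V                       # source information routed to each target position, (n_d, d_k)

# Tiny example with made-up sizes
rng = np.random.default_rng(0)
d_model, d_k, n_e, n_d = 8, 4, 5, 3
enc = rng.normal(size=(n_e, d_model))
dec = rng.normal(size=(n_d, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(cross_attention(dec, enc, Wq, Wk, Wv).shape)  # (3, 4)
```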

Comments

Very few people know these concepts well enough to give detailed explanations with formulae. Thanks a ton. I had a lot of questions, and this video helped resolve them.

subusrable

Best YouTube video explaining the Transformer ever!

SungheeYun

Undoubtedly, these 8 videos best explain transformers. I tried other videos and tutorials, but you are the best.

shaifulchowdhury

Beautifully explained, thank you. Transformers are so simple yet powerful.

notanape

I had been struggling to understand the size mismatch between the encoder and decoder; your video made it clear. Others usually skip this part. Thanks, sir.

AI_Life_Journey

These videos are wonderful, thank you for putting in the work. Everything was communicated so clearly and thoroughly.

My interpretation of the attention mechanism is that the result of the similarity (weight) matrix multiplied by the value matrix gives us an offset vector, which we then add to the value and normalize to get a contextualized vector. It's interesting that in the decoder we derive this offset from a value vector in the source language, add it to the target words, and it is still somehow meaningful. I presume that it is the final linear layer which ensures that this resulting normalized output vector maps coherently to a discrete word in the target language.

If we can do this across languages, I wonder if this can be done across modalities.

ryanhewitt
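The "offset" interpretation in the comment above corresponds to the Add & Norm step that follows each attention layer. The sketch below is a simplified illustration under assumed names and shapes (the real layer normalization also has learned scale and bias parameters):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Simplified layer normalization: zero mean and unit variance per position.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

rng = np.random.default_rng(1)
n_d, d_model = 3, 8
x = rng.normal(size=(n_d, d_model))                  # decoder representations of the target tokens
attention_output = rng.normal(size=(n_d, d_model))   # the "offset" built from the source-side values

contextualized = layer_norm(x + attention_output)    # Add & Norm: offset added to the target tokens, then normalized
print(contextualized.shape)  # (3, 8)
```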

Great, I regret not seeing your class earlier; many tutorials say little about the decoder part.

zimingzhang

Thanks a lot, teacher. You made many things clear for me 🙏🏽❤️

cedricmanouan

Thank you professor for this amazing series on the transformer!

nappingyiyi

Your videos are both precise and very educational, many thanks!

wawa

These are such clear explanations, thanks so much.

violinplayer

Thanks a lot, this is the only complete course about transformers that I have found. One question: why K = [q1 q2 ... q_(nE)] and not K = [k1 ...] (or is it a typo?)

antonisnesios

Dear Lennart, that was awesome. Could you please make a tutorial in Python as well? :)

TechSuperGirl

Thank you for your work, these are incredible videos. But there is one thing I didn't understand: during the training phase, the entire correctly translated sentence is given as input to the decoder, and to prevent the transformer from "cheating", masked self-attention is used. How many times does this step happen? Because if it only happened once, then the hidden words would not be usable during training. During the training phase, does backpropagation occur after each step, and does the mask then move, hiding fewer words?

nomecognome-fw
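The masking question above comes up often. The sketch below is a generic illustration (not the presenter's answer) of the causal mask used in the decoder's masked self-attention: it is built once, added to the score matrix, and lets the whole target sentence be processed in a single forward pass, with one prediction per position; the mask does not move between steps.

```python
import numpy as np

n_d = 5  # number of target tokens in the training sentence

# Causal (look-ahead) mask: entry (i, j) is 0 if position i may attend to
# position j (i.e. j <= i) and -inf otherwise, so the softmax gives it zero weight.
mask = np.triu(np.full((n_d, n_d), -np.inf), k=1)
print(mask)
# [[  0. -inf -inf -inf -inf]
#  [  0.   0. -inf -inf -inf]
#  [  0.   0.   0. -inf -inf]
#  [  0.   0.   0.   0. -inf]
#  [  0.   0.   0.   0.   0.]]
# The mask is added to the (n_d, n_d) attention scores before the softmax, so a
# single forward pass yields a prediction for every prefix at once and one
# backpropagation step updates them all.
```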

Thanks for the great lecture! One thing I'd like to ask: why do you still call it "self-attention" when information from the encoder and decoder is combined? Wouldn't just "attention" or even "cross-attention" make more sense here? If not, what is the "self" in self-attention, and what is not self-attention?

paulvoigtlaender

In this encoder-decoder architecture, I wanted to understand: if we have N encoders stacked together (one after the other), is Encoder_1 feeding Decoder_1, or is Encoder_N feeding Decoder_1?

mrinalde

After training the model, when we give an unknown source sentence to the model, how does it predict or decode the words?

akhileshbisht
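As a general illustration of the inference question above (hedged: `encode`, `decode_step`, `bos_id`, and `eos_id` are hypothetical placeholders, not names from the video), decoding is typically autoregressive: the encoder runs once on the unknown source sentence, and the decoder is then called repeatedly, each time fed the tokens it has generated so far.

```python
def greedy_decode(src_tokens, encode, decode_step, bos_id, eos_id, max_len=50):
    """Greedy autoregressive decoding sketch with hypothetical model callables."""
    memory = encode(src_tokens)      # encoder output, computed once
    output = [bos_id]                # start with the start-of-sequence token
    for _ in range(max_len):
        next_id = decode_step(output, memory)  # most probable next token given the output so far
        output.append(next_id)
        if next_id == eos_id:        # stop once end-of-sequence is produced
            break
    return output[1:]                # drop the start-of-sequence token
```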

Thank you for this video. During the calculations in the encoder-decoder attention layer, are the matrices Wq, Wk, Wv specific to that layer and learned only in this layer? Also, am I correct in understanding that the decoder's masked self-attention and the encoder-decoder attention act as essentially different layers, with different sets of W matrices?

КонстантинДемьянов-лп

Many thanks, professor. However, I am not sure whether we should use transpose(K) * Q or Q * transpose(K). Suppose that Q.shape = (nd, d) and K.shape = (ne, d); I think we should use Q * transpose(K) to produce an output with shape (nd, ne).

chenqu
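The shape reasoning in the comment above checks out under a row-wise convention (each query or key stored as a row, which is an assumption about the notation rather than the video's choice):

```python
import numpy as np

n_d, n_e, d = 3, 5, 4    # target length, source length, query/key dimension
Q = np.zeros((n_d, d))   # one query per decoder position (rows)
K = np.zeros((n_e, d))   # one key per encoder position (rows)

scores = Q @ K.T         # (n_d, d) @ (d, n_e) -> (n_d, n_e)
print(scores.shape)      # (3, 5): one similarity score per (target, source) pair
# With a column-wise convention (vectors stored as columns, Q of shape (d, n_d),
# K of shape (d, n_e)), the same scores appear as transpose(K) @ Q with shape
# (n_e, n_d); the two results are transposes of each other.
```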