How a Transformer works at inference vs training time

I made this video to illustrate the difference between how a Transformer is used at inference time (i.e. when generating text) vs. how a Transformer is trained.

The video explains in detail the difference between input_ids, decoder_input_ids and labels (a short code sketch follows the list):
- the input_ids are the inputs to the encoder
- the decoder_input_ids are the inputs to the decoder
- the labels are the targets for the decoder.
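
A minimal sketch of how these three tensors map onto the Hugging Face Transformers API, assuming a t5-small checkpoint purely for illustration (any encoder-decoder model behaves the same way):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# input_ids: the tokenized input sequence, fed to the encoder
input_ids = tokenizer(
    "translate English to German: Hello, how are you?", return_tensors="pt"
).input_ids

# labels: the tokenized target sequence, used as targets for the decoder
labels = tokenizer("Hallo, wie geht es dir?", return_tensors="pt").input_ids

# When labels are provided, the model derives decoder_input_ids internally
# by shifting the labels one position to the right, then returns the loss.
outputs = model(input_ids=input_ids, labels=labels)
print(outputs.loss)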

Resources:

Comments

I rarely comment on YT videos, but I wanted to say thanks. This video doesn't have all the marketing BS and provides the type of understanding I was looking for.

TempusWarrior

Beautifully explained! I would like to shamelessly request a series where you go one step deeper into this beautiful architecture.

sohelshaikhh

Inference:
1. Tokens are generated one at a time, conditioned on the input + previous generations (see the sketch after this list)
2. The language modelling head converts the hidden states to logits
3. Greedy search or beam search is possible
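
A minimal sketch of these three inference steps, assuming a t5-small checkpoint and the generate() API (the exact model used in the video may differ):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

input_ids = tokenizer(
    "translate English to German: I love you.", return_tensors="pt"
).input_ids

# generate() runs the loop: each step conditions on the input plus the tokens
# generated so far, the LM head maps hidden states to logits, and the next
# token is chosen until </s> or max_new_tokens is reached.
greedy_ids = model.generate(input_ids, max_new_tokens=20)             # greedy search
beam_ids = model.generate(input_ids, num_beams=4, max_new_tokens=20)  # beam search

print(tokenizer.decode(greedy_ids[0], skip_special_tokens=True))
print(tokenizer.decode(beam_ids[0], skip_special_tokens=True))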

Training:
1. input_ids: the input prompt; labels: the target output
2. The decoder_input_ids are copied from the labels and prepended with the decoder start token (e.g. <s>)
3. The decoder generates text all at once, but uses a causal attention mask to hide future tokens in the decoder_input_ids
4. -100 is assigned to padded positions in the labels to tell the cross-entropy function not to compute the loss there (see the sketch below)

ykhvtzm

For someone coming from a software engineering background, this was hands down the most useful explanation of the transformer architecture.

kevinsummerian

You are a great teacher, Niels! Would really appreciate it if you added more such videos on hot ML/DL topics.

ashishnegi

This is the best explanation I have come across so far on this particular topic (inference vs training). I hope that more videos like this are released in the future. Well done!

farrugiamarc

I didn't find a lot of resources that include both drawings of the process and code examples/snippets that demonstrate the drawings practically. Thank you, this helps me a lot :)

vsucc

I am using the huggingface library and this video finally gave me a clear understanding of the wordings used and the transformer architecture flow. Thank you!

omgwenxx

This is the best video on transformers. Everybody explains the structure and the attention mechanism, but you chose to explain the training and inference phases. Thank you so much for this video. You are awesome 😎.
Love from India ❤

shivamsengupta

This is one of the cleanest explanations of transformer inference and training on the web. Great video!

jasonzhang

The clearest explanation of a transformer model I have seen. Thanks, Niels!

zobinhuang

Excellent overview of how the encoder and decoder work together. Thanks.

forecenterforcustomermanag

Thanks so much, you hit upon the points that are confusing for a first-time user of LLMs. Thank you!

sanjaybhatikar

Unbelievably great and intuitive explanation. Something for us to learn. Thanks a lot, Niels.

RamDhiwakarSeetharaman

Thanks, man. We need more videos of this type.

amitsingha

Niels, thank you very much for this video! It was really helpful! The concept behind Transformers is pretty complicated, but your explanation definitely helped me to understand it.

lucasbandeira

Great video, very comprehensible explanation of a complex subject.

henrik-ts

Great video! I have to say thank you. This video is just what I needed: I had learned some basic ideas about word2vec, LSTM, RNN and the like, but I could not understand how the Transformer works or what its inputs and outputs are, and your video made all of that clear to me. Some commenters called this video "pointless", but I cannot agree: different audiences have different backgrounds, so it is really hard to make something that works for everyone. Someone lacking basics like word2vec (why we use input_ids) would not be able to follow this video, while someone already expert in Transformers/Diffusion won't need to watch it! This video taught me how the encoder and decoder work at every single step, in great detail. Really appreciated!

zagorot

This is one of the greatest explanations I know. Thanks!

thorty

Thank you Niels, this was really helpful to me for understanding this complex topic. These aspects of the model are not normally covered in most resources I've seen.

marcoxs