How a Transformer works at inference vs training time

I made this video to illustrate the difference between how a Transformer is used at inference time (i.e. when generating text) vs. how a Transformer is trained.

The video explains in detail the difference between input_ids, decoder_input_ids and labels (a short code sketch follows the list):
- the input_ids are the inputs to the encoder
- the decoder_input_ids are the inputs to the decoder
- the labels are the targets for the decoder.
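
A minimal sketch of how these three tensors map onto the Hugging Face Transformers API, assuming a t5-small checkpoint purely for illustration (any encoder-decoder model behaves the same way):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# input_ids: the tokenized input sequence, fed to the encoder
input_ids = tokenizer(
    "translate English to German: Hello, how are you?", return_tensors="pt"
).input_ids

# labels: the tokenized target sequence, used as targets for the decoder
labels = tokenizer("Hallo, wie geht es dir?", return_tensors="pt").input_ids

# When labels are provided, the model derives decoder_input_ids internally
# by shifting the labels one position to the right, then returns the loss.
outputs = model(input_ids=input_ids, labels=labels)
print(outputs.loss)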

Resources:

Comments

I rarely comment on YT videos, but I wanted to say thanks. This video doesn't have all the marketing BS and provides the type of understanding I was looking for.

TempusWarrior

Beautifully explained! I would like to shamelessly request a series where you go one step deeper into this beautiful architecture.

sohelshaikhh

Inference:
1. Tokens are generated one at a time, conditioned on the input + previous generations (see the sketch after this list)
2. The language modelling head converts the hidden states to logits
3. Greedy search or beam search is possible
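
A minimal sketch of these three inference steps, assuming a t5-small checkpoint and the generate() API (the exact model used in the video may differ):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

input_ids = tokenizer(
    "translate English to German: I love you.", return_tensors="pt"
).input_ids

# generate() runs the loop: each step conditions on the input plus the tokens
# generated so far, the LM head maps hidden states to logits, and the next
# token is chosen until </s> or max_new_tokens is reached.
greedy_ids = model.generate(input_ids, max_new_tokens=20)             # greedy search
beam_ids = model.generate(input_ids, num_beams=4, max_new_tokens=20)  # beam search

print(tokenizer.decode(greedy_ids[0], skip_special_tokens=True))
print(tokenizer.decode(beam_ids[0], skip_special_tokens=True))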

Training:
1. input_ids: the input prompt; labels: the target output
2. The decoder_input_ids are copied from the labels and prepended with the decoder start token (e.g. <s>)
3. The decoder generates text all at once, but uses a causal attention mask to hide future tokens in the decoder_input_ids
4. -100 is assigned to padded positions in the labels to tell the cross-entropy function not to compute the loss there (see the sketch below)

ykhvtzm

For someone coming from a software engineering background, this was hands down the most useful explanation of the transformer architecture.

kevinsummerian

You are a great teacher, Niels! Would really appreciate it if you added more such videos on hot ML/DL topics.

ashishnegi

This is the best explanation I have come across so far on this particular topic (inference vs training). I hope that more videos like this are released in the future. Well done!

farrugiamarc

I didn't find a lot of resources that include both drawings of the process and code examples/snippets that demonstrate the drawings practically. Thank you, this helps me a lot :)

vsucc

I am using the huggingface library and this video finally gave me a clear understanding of the wordings used and the transformer architecture flow. Thank you!

omgwenxx

This is the best video on transformers. Everybody explains the structure and the attention mechanism, but you chose to explain the training and inference phases. Thank you so much for this video. You are awesome 😎.
Love from India ❤

shivamsengupta

This is one of the cleanest explanations of transformer inference and training on the web. Great video!

jasonzhang

The clearest explanation of a transformer model I have seen. Thanks, Niels!

zobinhuang

Excellent overview of how the encoder and decoder work together. Thanks.

forecenterforcustomermanag

Thanks so much, you hit upon the points that are confusing for a first-time user of LLMs. Thank you!

sanjaybhatikar

Unbelievably great and intuitive explanation. Something for us to learn. Thanks a lot, Niels.

RamDhiwakarSeetharaman

Thanks, man. We need more videos of this type.

amitsingha

Niels, thank you very much for this video! It was really helpful! The concept behind Transformers is pretty complicated, but your explanation definitely helped me to understand it.

lucasbandeira

Great video, very comprehensible explanation of a complex subject.

henrik-ts

Great video! I have to say thank you. This video is just what I needed: I had learned some basic ideas about word2vec, LSTM, RNN and the like, but I could not understand how the Transformer works or what its inputs and outputs are, and your video made all of that clear to me. Some commenters called this video "pointless", but I cannot agree: different audiences have different backgrounds, so it is really hard to make something that works for everyone. Someone lacking basics like word2vec (why we use input_ids) would not be able to follow this video, while someone already expert in Transformers/Diffusion won't need to watch it! This video taught me how the encoder and decoder work at every single step, in great detail. Really appreciated!

zagorot

This is one of the greatest explanations I know. Thanks!

thorty

Thank you Niels, this was really helpful to me for understanding this complex topic. These aspects of the model are not normally covered in most resources I've seen.

marcoxs