Attention is all you need (Transformer) - Model explanation (including math), Inference and Training

A complete explanation of all the layers of a Transformer Model: Multi-Head Self-Attention, Positional Encoding, including all the matrix multiplications and a complete description of the training and inference process.

Chapters
00:00 - Intro
01:10 - RNN and their problems
08:04 - Transformer Model
09:02 - Maths background and notations
12:20 - Encoder (overview)
12:31 - Input Embeddings
15:04 - Positional Encoding
20:08 - Single Head Self-Attention
28:30 - Multi-Head Attention
35:39 - Query, Key, Value
37:55 - Layer Normalization
40:13 - Decoder (overview)
42:24 - Masked Multi-Head Attention
44:59 - Training
52:09 - Inference
Comments

This is arguably the best explanation of multi-head attention on the internet, hands down. Very thorough, and most important to folks like me who use the attention mechanism as the underpinning of a novel neural architecture to be applied to deep reinforcement learning. Sir, please never stop making this type of video.

gabrielnsionu

From my point of view, this is the best explanation of "Attention is all you need". Guys, "this explanation is all you need". Thank you very much.

DembaDiop-omgv

I have read and watched a lot to understand the Transformer architecture. However, this is the best one of them so far. Nobody else went into this level of minute detail. Thank you. Please keep it up.

sinaabdi

The best Transformer explanation on the internet so far, and I have seen almost all of them. Kudos! You are a true teacher. I dare to compare you with Andrew Ng. Please become a professor and not a corporate slave.

hackie

What a gem of a video! I would request people to read the paper and then come back here so that you will understand the value we get from the instructor. Awesome work, keep it up!

saravanannatarajan

I have been religiously watching your videos and they have helped me understand difficult papers so smoothly. Kudos 👏 you are doing a great job. It feels like you are the next Andrej Karpathy.

nabanitadash

Umar, you are a great teacher. I have not seen such a great explanation of the transformer. Your transformer from scratch coding is also awesome. So, basically, you understand which parts need more explanation. Thanks for your effort.

snehotoshbanerjee

I'm so glad I found this again. Do NOT rely on YouTube watch history; it doesn't look at all your history. This is definitely the best explanation of transformers and attention, and believe me, I've watched quite a few! Kudos again, Umar.

JulianHarris

best explanation of the paper on the whole internet

kerrykilian

the best laid out presentation of Transformers, thank you Umar Jamil🥰

jamesmina

You did the best job of describing the complicated details in a fluid manner. Sat, watched and took notes in one sitting. Hands down best one so far.

sushantpenshanwar

The clearest explanation of a very important breakthrough paper that I have seen on YouTube. Thank you!

_seeker

I cannot tell you how grateful I am for this explanation. Nowhere else have I found such a detailed and easy-to-understand description. A go-to video for every student preparing for interviews.

utkarshashinde

I must say it started off a bit badly when you began writing with the red stick, and I almost tuned out. It turns out I have to agree that this is the best explanation of self-attention I have seen on YouTube. Congratulations, this is really good and properly explained, especially the QKV.

laodrofotic

Wow, this is an incredibly detailed explanation of the Transformer Model! Thank you for sharing all the insights and resources. Understanding the layers and processes involved is crucial for anyone working with this model. Keep up the great work!

rachadlakis

This video is surely among the top 3 of the 50 videos that I watched to understand this subject.
We are very grateful to you. Keep the energy, the YouTube numbers will follow!

NJCLM

Your video has clarified and tied together the missing pieces from reading papers and watching other videos, and is the best explanation I've seen. My background is in psychology and psychometrics, so learning transformer architectures for my dissertation has been a slog - but you've saved me a lot of time wasted on confusing explanations. Thank you so much!

barretvermilion

we love you Umar...never stop delivering

IsaacKLusuku

Thanks Umar for the amazing video. This is the most comprehensive yet understandable walkthrough of the transformer architecture that I came across. Super helpful. I feel like I have a good foundation for tackling more complex LLMs because of it.

keviny

This is the best explanation. It took me 4 hours to take notes, revise, and follow you word by word, with intuitions, and now I feel that I truly understand the transformer architecture and the mathematical intuition behind every detail.

A thing that you cannot find in any other video.

Thank you so much, sir, this is very instructive and helpful.

hamzaomari