Attention is all you need (Transformer) - Model explanation (including math), Inference and Training


A complete explanation of all the layers of a Transformer Model: Multi-Head Self-Attention, Positional Encoding, including all the matrix multiplications and a complete description of the training and inference process.
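To make the description above concrete, here is a minimal pure-Python sketch of two of the pieces it lists. It uses the standard library only and is not the code from the video. It shows the sinusoidal positional encoding from the paper and single-head scaled dot-product attention, softmax(QKᵀ/√d_k)V. For brevity it assumes identity projection matrices, so Q, K and V are all the position-encoded input itself. The real model first multiplies the input by learned W_Q, W_K and W_V, and multi-head attention runs several such heads on smaller projections and concatenates their outputs. All function names here are hypothetical.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: PE[pos][2i] = sin(pos / 10000^(2i/d_model)),
    PE[pos][2i+1] = cos(pos / 10000^(2i/d_model))."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):  # i is the even dimension index 2k
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m)
    return [[sum(a[r][t] * b[t][c] for t in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for a single head; returns (output, weights)."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    scores = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights

if __name__ == "__main__":
    # Hypothetical 3-token sentence with toy 4-dimensional embeddings
    emb = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]
    pe = positional_encoding(len(emb), 4)
    x = [[e + p for e, p in zip(er, pr)] for er, pr in zip(emb, pe)]
    # Self-attention: Q, K and V all come from x (identity projections assumed)
    out, weights = attention(x, x, x)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and says how much one token attends to every token in the sentence. The output for that token is the corresponding weighted average of the rows of V.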

Chapters
00:00 - Intro
01:10 - RNN and their problems
08:04 - Transformer Model
09:02 - Maths background and notations
12:20 - Encoder (overview)
12:31 - Input Embeddings
15:04 - Positional Encoding
20:08 - Single Head Self-Attention
28:30 - Multi-Head Attention
35:39 - Query, Key, Value
37:55 - Layer Normalization
40:13 - Decoder (overview)
42:24 - Masked Multi-Head Attention
44:59 - Training
52:09 - Inference
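Two of the smaller building blocks in the chapters above fit in a few lines. The decoder's masked attention uses a causal mask, so each position can only attend to itself and earlier positions. Layer normalization rescales each token's feature vector to zero mean and unit variance, then applies a scale γ and shift β. The following is a minimal pure-Python sketch of both, not the video's code. The names `causal_mask`, `masked_softmax` and `layer_norm` are hypothetical. Here γ and β default to 1 and 0, whereas a real model learns them. `eps = 1e-5` is an assumed value that matches PyTorch's default for `nn.LayerNorm`.

```python
import math

def causal_mask(n):
    # mask[i][j] is True when position i may attend to position j (j <= i)
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask):
    """Row-wise softmax where disallowed positions are set to -inf (weight 0)."""
    out = []
    for row, mrow in zip(scores, mask):
        masked = [s if allowed else -math.inf for s, allowed in zip(row, mrow)]
        m = max(masked)  # finite as long as each row allows at least one position
        exps = [math.exp(s - m) for s in masked]  # exp(-inf) == 0.0
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one feature vector to zero mean / unit variance, then scale and shift."""
    d = len(x)
    gamma = gamma if gamma is not None else [1.0] * d
    beta = beta if beta is not None else [0.0] * d
    mean = sum(x) / d
    var = sum((v - mean) ** 2 for v in x) / d  # biased variance
    return [g * (v - mean) / math.sqrt(var + eps) + b for v, g, b in zip(x, gamma, beta)]

if __name__ == "__main__":
    scores = [[1.0, 2.0, 3.0]] * 3  # toy attention scores for 3 tokens
    for row in masked_softmax(scores, causal_mask(3)):
        print([round(w, 3) for w in row])
    print([round(v, 3) for v in layer_norm([1.0, 2.0, 3.0, 4.0])])
```

The first printed row puts all of its weight on token 0, because nothing after it is visible. This is what lets the decoder be trained on whole target sentences at once without seeing future tokens.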


Comments

This is arguably the best explanation of multi-head attention on the internet, hands down. It is very thorough and, most importantly, useful to folks like me who use attention as the underpinning mechanism of a novel neural architecture for deep reinforcement learning. Sir, please never stop making this type of video.

gabrielnsionu

In my view, the best explanation of "Attention Is All You Need". Guys, "this explanation is all you need". Thank you very much!

DembaDiop-omgv

I have read and watched a lot to understand the Transformer architecture, and this is the best of them so far. Nobody else went into such minute detail. Thank you. Please keep it up.

sinaabdi

The best Transformer explanation on the internet so far, and I have seen almost all of them. Kudos! You are a true teacher. I dare to compare you with Andrew Ng. Please become a professor and not a corporate slave.

hackie

What a gem of a video! I would suggest reading the paper first and then coming back here, so that you understand the value the instructor provides. Awesome work, keep it up!

saravanannatarajan

I have been religiously watching your videos, and they have helped me understand difficult papers smoothly. Kudos 👏 you are doing a great job. It feels like you are the next Andrej Karpathy.

nabanitadash

Umar, you are a great teacher. I have not seen such a great explanation of the transformer. Your transformer-from-scratch coding is also awesome. You clearly understand which parts need more explanation. Thanks for your effort.

snehotoshbanerjee

I'm so glad I found this again. Do NOT rely on YouTube watch history; it doesn't show all of your history. This is definitely the best explanation of transformers and attention, and believe me, I've watched quite a few! Kudos again, Umar.

JulianHarris

best explanation of the paper on the whole internet

kerrykilian

the best laid out presentation of Transformers, thank you Umar Jamil🥰

jamesmina

You did the best job of describing the complicated details in a fluid manner. Sat, watched and took notes in one sitting. Hands down best one so far.

sushantpenshanwar

The clearest explanation of a very important breakthrough paper that I have seen on YouTube. Thank you!

_seeker

I cannot tell you how grateful I am for this explanation. Nowhere else have I found such a detailed and easy-to-understand description. A go-to video for every student preparing for interviews.

utkarshashinde

I must say it started off a bit badly when you began writing with the red stick; I almost tuned out. It turns out I have to agree: this is the best explanation of self-attention I have seen on YouTube. Congratulations, this is really good and properly explained, especially the QKV part.

laodrofotic

Wow, this is an incredibly detailed explanation of the Transformer Model! Thank you for sharing all the insights and resources. Understanding the layers and processes involved is crucial for anyone working with this model. Keep up the great work!

rachadlakis

This video is surely in the top 3 of the 50 videos I watched to understand this subject.
We are very grateful to you. Keep up the energy, and the YouTube numbers will follow!

NJCLM

Your video has clarified and tied together the missing pieces from reading papers and watching other videos, and it is the best explanation I've seen. My background is in psychology and psychometrics, so learning transformer architectures for my dissertation has been a slog, but you've saved me a lot of time I would have wasted on confusing explanations. Thank you so much!

barretvermilion

We love you, Umar... never stop delivering.

IsaacKLusuku

Thanks Umar for the amazing video. This is the most comprehensive yet understandable walkthrough of the transformer architecture that I came across. Super helpful. I feel like I have a good foundation for tackling more complex LLMs because of it.

keviny

This is the best explanation. It took me 4 hours to take notes, revise, and follow you word by word along with the intuitions, and now I feel that I truly understand the transformer architecture and the mathematical intuition behind every detail.

That is something you cannot find in any other video.

Thank you so much, sir, this is very instructive and helpful.

hamzaomari
