Visual Guide to Transformer Neural Networks - (Episode 2) Multi-Head & Self-Attention

Visual Guide to Transformer Neural Networks (Series) - Step by Step Intuitive Explanation

Episode 0 - [OPTIONAL] The Neuroscience of "Attention"

Episode 1 - Position Embeddings

Episode 2 - Multi-Head & Self-Attention

Episode 3 - Decoder’s Masked Attention

This video series explains the math, as well as the intuition, behind the Transformer neural network, first introduced in the “Attention Is All You Need” paper.

--------------------------------------------------------------
References and Other Great Resources
--------------------------------------------------------------

Attention is All You Need

Jay Alammar – The Illustrated Transformer

The A.I. Hacker – Illustrated Guide to Transformers Neural Networks: A Step by Step Explanation

Amirhossein Kazemnejad Blog Post – Transformer Architecture: The Positional Encoding

Yannic Kilcher YouTube Video – Attention Is All You Need
Comments

*CORRECTIONS*

A big shoutout to the following awesome viewers for these 2 corrections:

1. @Henry Wang and @Holger Urbanek - At (10:28), "d_k" is actually the hidden dimension of the Key matrix, not the sequence length. In the original paper (Attention Is All You Need), the model dimension d_model is 512, which with 8 heads gives d_k = 512/8 = 64. (See the first sketch after this list.)

2. @JU PING NG - The result of the concatenation at (14:58) is supposed to be 7 x 9 instead of 21 x 3 (that is to say, the z matrices are concatenated horizontally, not vertically). With this we can apply nn.Linear(9, 5) to get the final 7 x 5 shape. (See the second sketch after this list.)
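
Below is a minimal PyTorch sketch of the scaled dot-product attention step that the first correction refers to. The shapes (a 7-token sequence with a per-head dimension of 3, matching the z matrices in the second correction) are illustrative assumptions, not the paper's values:

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    # d_k is the hidden dimension of the Key matrix (its last axis),
    # not the sequence length.
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # (seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)            # one attention distribution per query
    return weights @ V                             # (seq_len, d_v)

# Illustrative shapes: 7 tokens, per-head dimension 3.
Q, K, V = torch.randn(7, 3), torch.randn(7, 3), torch.randn(7, 3)
z = scaled_dot_product_attention(Q, K, V)
print(z.shape)  # torch.Size([7, 3])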
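
And a sketch of the corrected shapes from the second note: three per-head z matrices (each 7 x 3, assuming 3 heads) are concatenated horizontally into a 7 x 9 matrix and then projected with nn.Linear(9, 5) to the final 7 x 5 shape:

import torch
import torch.nn as nn

# Three per-head outputs, each seq_len x d_v = 7 x 3.
z1, z2, z3 = torch.randn(7, 3), torch.randn(7, 3), torch.randn(7, 3)

# Concatenate along the feature axis (horizontally): 7 x 9, not 21 x 3.
z_cat = torch.cat([z1, z2, z3], dim=-1)
print(z_cat.shape)  # torch.Size([7, 9])

# Output projection from the correction.
w_o = nn.Linear(9, 5)
print(w_o(z_cat).shape)  # torch.Size([7, 5])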

Here are the timestamps associated with the concepts covered in this video:
0:00 – Recaps of Part 0 and 1
0:56 – Difference between Simple and Self-Attention
3:11 – Multi-Head Attention Layer – Query, Key and Value matrices
11:44 – Intuition for Multi-Head Attention Layer with Examples

HeduAI

Need to say this out loud: I saw Yannic Kilcher's video, read tons of material on the internet, and went through at least 7 playlists, and this is the first time I really understood the inner mechanism of the Q, K and V vectors in Transformers. You did a great job here.

thegigasurgeon

All 3 parts have been the best presentation I've ever seen of Transformers. Your step-by-step visualizations have filled in so many gaps left by other videos and blog posts. Thank you very much for creating this series.

nitroknocker

Damn. This is exactly what a developer coming from another background needs.

Simple analogies for a rapid understanding.

Thanks a ton.

Keep it up, please!

nurjafri

Absolutely underrated; hands down one of the best explanations I've found on the internet.

ML-oknf

Best explanation of Transformers ever!!!

rohanvaidya

I've been stuck for so long trying to understand Transformer neural networks, and this is by far the best explanation! The examples are so fun, making it easier to comprehend. Thank you so much for your effort!

malekkamoua

The important detail that sets you apart from the other videos and websites is that not only did you present the model's architecture with numerous formulas, but you also demonstrated them with vectors and matrices, successfully walking us through each complicated and trivial concept. You really did a good job!

HuyLe-nnft

This channel needs more love (the way she explains is out of the box). I can say this because I have 4 years of experience in data science; she did a lot of hard work to achieve this much clarity in the concepts. (Love from India)

rohtashbeniwal

Were you the one who wrote Transformers in the first place? Because no one has explained them like you did. This is undoubtedly the best explanation I have seen. Please keep posting more videos. Thanks a lot.

chaitanyachhibba

This is one of the best Transformer videos on YouTube. I hope YouTube always recommends this Value (V), a.k.a. the video, as the first Key (K), a.k.a. the video title, when someone uses the Query (Q) "Transformer"!! 😄

EducationPersonal

As someone NOT in the field reading the Attention paper, after having watched DOZENS of videos on the topic, this is the FIRST explanation that laid it out in an intuitive manner without leaving anything out. I don't know your background, but you are definitely a great teacher. Thank you.

adscript

The best explanation I've ever seen of such a powerful architecture. I'm glad to have found this joy after searching for positional encoding details while implementing a Transformer from scratch today. Valar Morghulis!

sebastiangarciaacosta

Finally! You delivered me from long nights of searching for good explanations of Transformers! It was awesome! I can't wait to see part 3 and beyond!

MGMG-lilt

Self-attention is a villain that has stumped me for a long time. Your presentation has helped me to better understand this genius idea.

forresthu

I'm currently reading a book about Transformers and was scratching my head over the reason for the multi-headed attention architecture. Thank you so much for the clearest explanation yet, which finally gave me this satisfying 💡 moment.

jao

I'll just repeat what everybody else has said: these videos are the best! Thank you for the effort.

Srednicki

The best, best, best explanation of Transformers; you are adding so much value to the world.

devchoudhary

I went through many videos from Coursera, YouTube, and some online blogs, but none explained the Query, Key, and Value matrices so clearly. You made my day.

shubheshswain

This is quite literally the best attention-mechanism video out there, guys.

sujithkumar