Attention in transformers, visually explained | Chapter 6, Deep Learning

Demystifying attention, the key mechanism inside transformers and LLMs.
An equally valuable form of support is to simply share the videos.

Demystifying self-attention, multiple heads, and cross-attention.

And yes, at 22:00 (and elsewhere), "breaks" is a typo.

------------------

Here are a few other relevant resources

Build a GPT from scratch, by Andrej Karpathy

If you want a conceptual understanding of language models from the ground up, @vcubingx just started a short series of videos on the topic:

If you're interested in the herculean task of interpreting what these large networks might actually be doing, the Transformer Circuits posts by Anthropic are great. In particular, it was only after reading one of these that I started thinking of the combination of the value and output matrices as a single low-rank map from the embedding space to itself, which, at least in my mind, made things much clearer than other sources. (A small numerical sketch of that low-rank view follows this list of resources.)

Site with exercises related to ML programming and GPTs

An early paper on how directions in embedding spaces have meaning:
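
To make the low-rank view of the value and output matrices mentioned above concrete, here is a minimal NumPy sketch; the sizes are made up for illustration and are not taken from the video or from any particular model.

    import numpy as np

    # Illustrative sizes only: a small embedding dimension and a smaller head dimension.
    d_embed, d_head = 512, 64
    rng = np.random.default_rng(0)

    W_V = rng.normal(size=(d_head, d_embed))   # value matrix: embedding space -> head space
    W_O = rng.normal(size=(d_embed, d_head))   # output matrix: head space -> embedding space

    # Composed, they map the embedding space to itself, but the rank of the product
    # can never exceed d_head, which is the sense in which the pair acts as one low-rank map.
    W_OV = W_O @ W_V
    print(W_OV.shape)                   # (512, 512)
    print(np.linalg.matrix_rank(W_OV))  # 64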

------------------

Timestamps:
0:00 - Recap on embeddings
1:39 - Motivating examples
4:29 - The attention pattern
11:08 - Masking
12:42 - Context size
13:10 - Values
15:44 - Counting parameters
18:21 - Cross-attention
19:19 - Multiple heads
22:16 - The output matrix
23:19 - Going deeper
24:54 - Ending

------------------

These animations are largely made using a custom Python library, manim. See the FAQ comments here:

All code for specific videos is visible here:

The music is by Vincent Rubinetti.

------------------

3blue1brown is a channel about animating math, in all senses of the word animate. If you're reading the bottom of a video description, I'm guessing you're more interested than the average viewer in lessons here. It would mean a lot to me if you chose to stay up to date on new ones, either by subscribing here on YouTube or otherwise following on whichever platform below you check most regularly.

------------------

Comments

A few added notes based on common comments I see.

Concerning masked self-attention, several people ask about cases where it feels like later words should update the meaning of earlier words, as in languages where adjectives follow nouns. The model can always put the richest meaning into the last token (e.g. early nouns getting baked into later adjectives). For example, @victorlevoso8984 noted below how empirical evidence suggests the meaning of a sentence often gets baked into the embedding of the punctuation mark at its end. Keep in mind that the model doesn't have to conceptualize things the way we humans do, and in all likelihood doesn't, so I wouldn't over-index on the motivating example given in this video.
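
For a concrete picture of the masking step, here is a minimal NumPy sketch (sizes and details are made up for illustration, not taken from the video): entries of the attention pattern that would let a later token inform an earlier one are set to negative infinity before the softmax, so they become zero afterward.

    import numpy as np

    def masked_attention_pattern(Q, K):
        """Causal ("masked") attention pattern: token i may only attend to tokens j <= i.

        Rows index queries here; the video draws the transposed (column) layout.
        """
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)               # every query dotted with every key
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)      # later tokens can't inform earlier ones
        scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
        weights = np.exp(scores)
        return weights / weights.sum(axis=-1, keepdims=True)

    rng = np.random.default_rng(0)
    Q = rng.normal(size=(5, 8))  # 5 tokens, 8-dimensional query/key space (made-up sizes)
    K = rng.normal(size=(5, 8))
    print(np.round(masked_attention_pattern(Q, K), 2))  # each row sums to 1, upper triangle is 0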

Also, one thing I should have called out more explicitly is how I personally like to think of vectors like embeddings, keys, queries, etc. as columns, and as a convention display them this way, but other sources, including the Attention is All You Need paper, may present them organized in a row-by-row fashion. This is relevant to parsing the equation shown at 10:29, where the expression from the paper that looks like Q K^T would, by the conventions of this video, instead look like K^T Q.
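
A quick numerical check of that last point, as a sketch with made-up sizes: stacking the query and key vectors as rows and computing Q K^T gives exactly the transpose of the grid produced by the column convention, K^T Q.

    import numpy as np

    rng = np.random.default_rng(1)
    n_tokens, d_k = 4, 6                  # made-up sizes for illustration

    # Row convention (as in the "Attention Is All You Need" paper): one query/key per row.
    Q_rows = rng.normal(size=(n_tokens, d_k))
    K_rows = rng.normal(size=(n_tokens, d_k))
    pattern_rows = Q_rows @ K_rows.T

    # Column convention (as in this video): the same vectors stored as columns.
    Q_cols, K_cols = Q_rows.T, K_rows.T
    pattern_cols = K_cols.T @ Q_cols

    # The same grid of dot products, just transposed.
    print(np.allclose(pattern_rows, pattern_cols.T))  # True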

3blue1brown

I'm a university lecturer with a PhD in AI, and I cannot compete with the quality of this work. Videos like this put the entire higher education system to shame. Fantastic! ❤️

philrod

I've got to say - "Attention Is All You Need" is an incredible title for a research paper.

Steamrick

Are you kidding me? ONE WEEK FOR 2 MASTERPIECES?!
Thank you so much!

actualBIAS

3b1b is the only content producer whose videos I prepare for by first making coffee, then upvoting, and only then hitting the play button.

sriramsrinivasan

As a graduating PhD student working in Natural Language Processing, I still found that video to be extremely beneficial. Awesome!

hailking

How I wish this video had been available when the "Attention Is All You Need" paper first came out. It was really hard to visualize by simply reading the paper; I read it multiple times but could not figure out what it was trying to do.

Subsequently, Jay Alammar posted a blog post called "The Illustrated Transformer". That was a huge help for me back then. But this video raises the illustration to an entirely different level.

Great job! I'm sure many undergraduates and hobbyists studying machine learning will benefit greatly.

QuantAI-kpxt

Attention existed before the 2017 paper "Attention Is All You Need".

The main contribution was that attention was... all you needed for sequence processing (you didn't need recurrence). Self-attention specifically was novel though.

Henry-fvbc

Geez Grant, I spent thousands of dollars on a very good deep learning executive certification from Carnegie Mellon, and your series here is better than their math slides. This series is really turning out great.

DataRae-AIEngineer

I cannot stress enough what a tour de force this is. It's probably one of the best math classes ever done anywhere in the world, at any time.

You're the best in the game and an inspiration to many. Thank you so, so much, Grant; you're doing God's work here.

MatheusC

As a Master's student in Data Science and AI, I never really understood how attention worked. Thank you for making this video!

muelleer

There are people … all over the world … like me … who really, really, really appreciate you. I cannot thank you enough for taking the time to share your knowledge and help others to understand this technology much more deeply. Seriously, kudos and sincerest thanks. ❤

JustGrowers

Just wow, the educational value of this video is incredible.
There are so many highly relevant and original ideas here for explaining abstract concepts and drastically simplifying comprehension.
I'm so thankful that you've made this content available to everyone for free.
I absolutely love it!!

Otomega

As the director of video content for a major educational publisher, I can say this is some of the best educational content I've ever seen. Your content gives me ideas about how to shape the future of undergraduate-level STEM videos. A true legend and inspiration in this space. Thank you for the meticulously outstanding work that you do.

JonyBetancourt

You not only put out some of the best content on YouTube but also give constant shout-outs to other content creators that you admire. You are the GOAT, 3Blue1Brown.

michaelthompson

If I could write poetry about how much I appreciate and learn from your videos, I would, but I'm not a poet. Thanks to everyone who worked on these videos.

glizzy

The fact that this is freely available on YT is insane. Thanks for all the amazing work throughout the years.

fluffybunny

I'm a Computer Science student currently working with a Transformer for my master's thesis, and this video is absolute gold to me. I think this is the best explanation video I've ever seen. Holy shit, it is so clear and insightful. I'm so looking forward to the third video of the series!!!! The first one was absolutely amazing too. Thank you sooo much for this genius piece of work!!!!

annachester

Grant is all you need.

This was probably the tenth video or podcast I've seen on the subject, and only now do I understand the underlying motivation for each of its components.

bola

Thank you for the mention, Grant!

For those who relate to the pain of wanting more practice problems for Machine Learning, I hear you.

I’ve created coding problems (run against test cases in your browser!) & quizzes covering the core ML concepts.

Check out the resource Grant mentioned (linked in the description) or just click on my channel!

gptLearningHub