ATTENTION | An Image is Worth 16x16 Words | Vision Transformers (ViT) Explanation and Implementation

This video covers everything about self-attention in the Vision Transformer (ViT) and its implementation from scratch.
I go over all the details, explain everything happening inside attention in the Vision Transformer through visualizations, and show what an implementation of self-attention from scratch looks like in PyTorch.
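
As a rough illustration of what such a from-scratch single-head self-attention module might look like in PyTorch (the class and parameter names here, like `SelfAttention` and `embed_dim`, are my own placeholders, not necessarily what the video uses):

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head self-attention over a sequence of patch embeddings (illustrative sketch)."""
    def __init__(self, embed_dim):
        super().__init__()
        self.wq = nn.Linear(embed_dim, embed_dim)  # query projection
        self.wk = nn.Linear(embed_dim, embed_dim)  # key projection
        self.wv = nn.Linear(embed_dim, embed_dim)  # value projection
        self.scale = embed_dim ** -0.5             # scaling factor for the dot products

    def forward(self, x):
        # x: (batch, num_patches, embed_dim)
        q, k, v = self.wq(x), self.wk(x), self.wv(x)
        # relevance scores: how much each patch attends to every other patch
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (batch, patches, patches)
        attn = attn.softmax(dim=-1)
        # context representation: attention-weighted sum of the values
        return attn @ v                                  # (batch, patches, embed_dim)
```

Feeding in a batch of patch embeddings of shape (2, 197, 768), for example (196 patches plus a CLS token in a ViT-Base-style setup), would return a tensor of the same shape.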

I cover the Vision Transformer (ViT) in three parts:
2. Self-Attention in Vision Transformer (ViT) - this video

*Other Good Resources*

*Timestamps*:
00:00 Intro
00:33 Intuition of What is Attention & Why It's Helpful
03:23 Inside Attention - What is Relevant
07:53 Inside Attention - Building Context Representation
08:45 Building Context Representation For All Patches
09:45 Why Multi Head Attention
11:15 Building Context Representation For Multi Head Attention
12:35 Combining the Wq, Wk, Wv Matrices
13:34 Shapes of Every Matrix in Attention
14:48 Implementation Parts of Attention
15:12 PyTorch Implementation of Attention in Vision Transformer (ViT)
18:26 Outro
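
For the chapters on combining the Wq, Wk, Wv matrices, the shapes of every matrix, and the PyTorch implementation (12:35–15:12), a minimal multi-head attention sketch might look like the following. This is my own illustrative version with assumed defaults (embed_dim=768, num_heads=12), not necessarily the exact code shown in the video:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Multi-head self-attention with Wq, Wk, Wv fused into one projection (illustrative sketch)."""
    def __init__(self, embed_dim=768, num_heads=12):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)   # Wq, Wk, Wv combined into a single matrix
        self.proj = nn.Linear(embed_dim, embed_dim)      # output projection after concatenating heads

    def forward(self, x):
        B, N, D = x.shape                                 # (batch, num_patches, embed_dim)
        qkv = self.qkv(x)                                 # (B, N, 3*D)
        qkv = qkv.reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)              # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5   # (B, heads, N, N)
        attn = attn.softmax(dim=-1)
        out = attn @ v                                    # (B, heads, N, head_dim)
        out = out.transpose(1, 2).reshape(B, N, D)        # concatenate heads back to (B, N, D)
        return self.proj(out)
```

The fused qkv projection is a common way to implement the "combined Wq, Wk, Wv" idea: one linear layer produces queries, keys, and values in a single matrix multiply, which are then split and reshaped per head.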

Background Track - Fruits of Life by Jimena Contreras
Comments

Amazing explanation... I had not come across such a beautiful and easy explanation of transformers, which seemed extremely difficult... this channel deserves millions of subscribers 🎉

DrAIScience

Best explanation of multi-head attention I have come across! I already had a reasonable intuition but still gathered so much more; massive respect for your work 🙏

sladewinter

Great content! This is helping a lot!! Keep it up :)

sebastiancavada

Sir, can you explain Dual Attention Vision Transformers (DaViT) please?

shashankdevraj

Would rearranging by heads before splitting into q, k, v make any logical difference? It just means fewer lines of code and fewer operations, but I was mostly curious to verify, as it felt the same to me.

sladewinter

Helpful, much appreciated. Sir, how about self-attention in an image context?

muhammadawais