Coding the self attention mechanism with key, query and value matrices

In this lecture, we code an advanced attention mechanism from scratch, with trainable key, query and value weight matrices. From the queries and keys we compute the attention scores, normalize them into attention weights, and then use these weights to calculate the context vectors.
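
Below is a minimal sketch of that pipeline, assuming PyTorch; the tensor shapes and names (d_in, d_out, W_query, W_key, W_value) are illustrative, not the lecture's exact code.

```python
import torch

torch.manual_seed(123)

# Toy input: 6 tokens, each a 3-dimensional embedding
inputs = torch.rand(6, 3)
d_in, d_out = 3, 2

# Trainable key, query and value weight matrices
W_query = torch.nn.Parameter(torch.rand(d_in, d_out))
W_key   = torch.nn.Parameter(torch.rand(d_in, d_out))
W_value = torch.nn.Parameter(torch.rand(d_in, d_out))

# Project the input embeddings into queries, keys and values
queries = inputs @ W_query   # shape (6, d_out)
keys    = inputs @ W_key     # shape (6, d_out)
values  = inputs @ W_value   # shape (6, d_out)

# Attention scores: dot product of every query with every key
scores = queries @ keys.T    # shape (6, 6)

# Attention weights: scale by sqrt(d_k), then softmax over each row
weights = torch.softmax(scores / keys.shape[-1] ** 0.5, dim=-1)

# Context vectors: attention-weighted sums of the value vectors
context = weights @ values   # shape (6, d_out)
print(context)
```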

This is a very dense lecture consisting of detailed whiteboard notes, mathematical intuition, and hands-on Python coding.

0:00 Lecture objective
4:04 Context vector recap
8:37 Key, Query and Value Weight Matrices
15:33 Coding the Key, Query and Value Weight Matrices
22:31 Transforming Input Embeddings to Keys, Queries and Values
25:07 Calculating attention scores
30:07 Coding attention scores
35:17 Calculating attention weights
39:58 Coding attention weights
42:39 Scaling by square root of key dimension
51:06 Calculating context vectors
53:50 Context vectors visually explained
57:43 Context vector mathematical formula
1:00:46 Self Attention Python class - Basic version
1:09:08 Self Attention Python class - Advanced version
1:12:29 One figure to visualise self attention
1:15:29 Key, Query, Value intuition
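
The two "Self Attention Python class" chapters above wrap these steps into a reusable module. For reference, here is a compact sketch of what such a class might look like, assuming PyTorch; the class and attribute names are illustrative, and nn.Linear is one common way to hold the trainable weight matrices.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Scaled dot-product self-attention with trainable query, key and value matrices."""

    def __init__(self, d_in, d_out):
        super().__init__()
        # nn.Linear (without bias) stores a trainable weight matrix
        # and comes with a sensible default initialization
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key   = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):
        queries = self.W_query(x)
        keys    = self.W_key(x)
        values  = self.W_value(x)
        # Scores -> scaled softmax weights -> context vectors
        scores  = queries @ keys.transpose(-2, -1)
        weights = torch.softmax(scores / keys.shape[-1] ** 0.5, dim=-1)
        return weights @ values

# Usage on a toy sequence of 6 token embeddings
sa = SelfAttention(d_in=3, d_out=2)
print(sa(torch.rand(6, 3)))
```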

Why do we divide the attention scores by the square root of the key dimension? Calculation sheet for the dot-product variance:
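
A quick numeric check of the variance argument, assuming query and key components are independent with zero mean and unit variance: the dot product of two d_k-dimensional vectors then has variance approximately d_k, and dividing by sqrt(d_k) brings it back to roughly 1, which keeps the softmax from saturating. The numbers below are illustrative.

```python
import torch

torch.manual_seed(0)
d_k = 64
n = 100_000  # number of random query/key pairs to sample

# Dot products of random vectors with zero-mean, unit-variance components
q = torch.randn(n, d_k)
k = torch.randn(n, d_k)
dots = (q * k).sum(dim=-1)

print(dots.var())                 # approximately d_k (here ~64)
print((dots / d_k ** 0.5).var())  # approximately 1 after scaling
```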

=================================================

=================================================
Vizuara philosophy:

As we work through the AI/ML/DL material, we will share thoughts on what is actually useful in industry and what has become irrelevant. We will also highlight which subjects contain open areas of research, so interested students can start their research journey there.

If you are confused or stuck in your ML journey, perhaps courses and offline videos are not inspiring enough. What might inspire you is watching someone else learn and implement machine learning from scratch.

No cost. No hidden charges. Pure old school teaching and learning.

=================================================

🌟 Meet Our Team: 🌟

🎓 Dr. Raj Dandekar (MIT PhD, IIT Madras department topper)

🎓 Dr. Rajat Dandekar (Purdue PhD, IIT Madras department gold medalist)

🎓 Dr. Sreedath Panat (MIT PhD, IIT Madras department gold medalist)

🎓 Sahil Pocker (Machine Learning Engineer at Vizuara)

🎓 Abhijeet Singh (Software Developer at Vizuara, GSOC 24, SOB 23)

🎓 Sourav Jana (Software Developer at Vizuara)
Comments

The best explanation of this topic on the internet by far. Sailing into the clouds of complex concepts while maintaining contact with the roots (fundamentals) makes it a gem. Please maintain this theme of explanation.
Grateful to you.
Waiting for more such content.

JhilmilShrivas

This video is fantastic! The content is really well-made and engaging. Keep up the amazing work!

anish-vq

Thank you for all your efforts. I am following this series from the beginning and coding along with you, which helps in building an understanding of the topic.

MrGirishbarhate

Please tag this as lecture 15 and add tags to subsequent lectures so that no one misses the order.

pendekantimaheshbabu

Following the concepts, but I need to brush up on the math to better understand what is going on. Do you have a set of videos that teach the math from scratch in an easy-to-follow manner, similar to how you teach? Thx.

helrod

What do the terms query, key, and value mean in attention mechanisms?

poornadayapule

Is there any discord community about this lecture series?

sageraza