Coding the self attention mechanism with key, query and value matrices

In this lecture, we code an advanced attention mechanism from scratch, with trainable key, query and value weight matrices. From the queries and keys we compute the attention scores, normalize them into attention weights, and then use these weights to calculate the context vectors.
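
Below is a minimal sketch of that pipeline, assuming PyTorch; the tensor shapes and names (d_in, d_out, W_query, W_key, W_value) are illustrative, not the lecture's exact code.

```python
import torch

torch.manual_seed(123)

# Toy input: 6 tokens, each a 3-dimensional embedding
inputs = torch.rand(6, 3)
d_in, d_out = 3, 2

# Trainable key, query and value weight matrices
W_query = torch.nn.Parameter(torch.rand(d_in, d_out))
W_key   = torch.nn.Parameter(torch.rand(d_in, d_out))
W_value = torch.nn.Parameter(torch.rand(d_in, d_out))

# Project the input embeddings into queries, keys and values
queries = inputs @ W_query   # shape (6, d_out)
keys    = inputs @ W_key     # shape (6, d_out)
values  = inputs @ W_value   # shape (6, d_out)

# Attention scores: dot product of every query with every key
scores = queries @ keys.T    # shape (6, 6)

# Attention weights: scale by sqrt(d_k), then softmax over each row
weights = torch.softmax(scores / keys.shape[-1] ** 0.5, dim=-1)

# Context vectors: attention-weighted sums of the value vectors
context = weights @ values   # shape (6, d_out)
print(context)
```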

This is a very dense lecture consisting of detailed whiteboard notes, mathematical intuition, and hands-on Python coding.

0:00 Lecture objective
4:04 Context vector recap
8:37 Key, Query and Value Weight Matrices
15:33 Coding the Key, Query and Value Weight Matrices
22:31 Transforming Input Embeddings to Keys, Queries and Values
25:07 Calculating attention scores
30:07 Coding attention scores
35:17 Calculating attention weights
39:58 Coding attention weights
42:39 Scaling by square root of key dimension
51:06 Calculating context vectors
53:50 Context vectors visually explained
57:43 Context vector mathematical formula
1:00:46 Self Attention Python class - Basic version
1:09:08 Self Attention Python class - Advanced version
1:12:29 One figure to visualise self attention
1:15:29 Key, Query, Value intuition
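
The two "Self Attention Python class" chapters above wrap these steps into a reusable module. For reference, here is a compact sketch of what such a class might look like, assuming PyTorch; the class and attribute names are illustrative, and nn.Linear is one common way to hold the trainable weight matrices.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Scaled dot-product self-attention with trainable query, key and value matrices."""

    def __init__(self, d_in, d_out):
        super().__init__()
        # nn.Linear (without bias) stores a trainable weight matrix
        # and comes with a sensible default initialization
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key   = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):
        queries = self.W_query(x)
        keys    = self.W_key(x)
        values  = self.W_value(x)
        # Scores -> scaled softmax weights -> context vectors
        scores  = queries @ keys.transpose(-2, -1)
        weights = torch.softmax(scores / keys.shape[-1] ** 0.5, dim=-1)
        return weights @ values

# Usage on a toy sequence of 6 token embeddings
sa = SelfAttention(d_in=3, d_out=2)
print(sa(torch.rand(6, 3)))
```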

Why do we divide the attention scores by the square root of the key dimension? Calculation sheet for the dot-product variance:
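
A quick numeric check of the variance argument, assuming query and key components are independent with zero mean and unit variance: the dot product of two d_k-dimensional vectors then has variance approximately d_k, and dividing by sqrt(d_k) brings it back to roughly 1, which keeps the softmax from saturating. The numbers below are illustrative.

```python
import torch

torch.manual_seed(0)
d_k = 64
n = 100_000  # number of random query/key pairs to sample

# Dot products of random vectors with zero-mean, unit-variance components
q = torch.randn(n, d_k)
k = torch.randn(n, d_k)
dots = (q * k).sum(dim=-1)

print(dots.var())                 # approximately d_k (here ~64)
print((dots / d_k ** 0.5).var())  # approximately 1 after scaling
```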

=================================================

=================================================
Vizuara philosophy:

As we work through the AI/ML/DL material, we will share thoughts on what is actually useful in industry and what has become irrelevant. We will also highlight which subjects contain open areas of research, so interested students can start their research journey there.

If you are confused or stuck in your ML journey, perhaps courses and offline videos are not inspiring enough. What might inspire you is watching someone else learn and implement machine learning from scratch.

No cost. No hidden charges. Pure old school teaching and learning.

=================================================

🌟 Meet Our Team: 🌟

🎓 Dr. Raj Dandekar (MIT PhD, IIT Madras department topper)

🎓 Dr. Rajat Dandekar (Purdue PhD, IIT Madras department gold medalist)

🎓 Dr. Sreedath Panat (MIT PhD, IIT Madras department gold medalist)

🎓 Sahil Pocker (Machine Learning Engineer at Vizuara)

🎓 Abhijeet Singh (Software Developer at Vizuara, GSOC 24, SOB 23)

🎓 Sourav Jana (Software Developer at Vizuara)
Comments

The best explanation of this topic on the internet by far. Sailing into the clouds of complex concepts while maintaining contact with the roots (fundamentals) makes it a gem. Please maintain this theme of explanation.
Grateful to you.
Waiting for more such content.

JhilmilShrivas

This video is fantastic! The content is really well-made and engaging. Keep up the amazing work!

anish-vq

Thank you for all your efforts. I am following this series from the beginning and coding along with you, which helps in building an understanding of the topic.

MrGirishbarhate

Please tag this as lecture 15 and add tags to subsequent lectures so that no one misses the order.

pendekantimaheshbabu

Following the concepts, but I need to brush up on the math to better understand what is going on. Do you have a set of videos that teach the math from scratch in an easy-to-follow manner, similar to how you teach? Thx.

helrod

What do the terms query, key, and value mean in attention mechanisms?

poornadayapule

Is there any discord community about this lecture series?

sageraza