Lecture 14: Simplified Attention Mechanism - Coded from scratch in Python | No trainable weights

In this lecture, we code a simplified attention mechanism from scratch in Python. In the process, we learn about context vectors, attention scores, and attention weights. We pay equal attention to theory, visual intuition, and code. A minimal code sketch of the full pipeline follows the timestamps below.

0:00 Lecture objective
2:29 Context vectors
9:34 Coding embedding vectors in Python
14:45 What are attention scores?
19:18 Dot product and attention scores
22:57 Coding attention scores in Python
26:22 Simple normalisation
34:07 Softmax normalisation
37:34 Coding attention weights in Python
43:46 Context vector calculation visualised
50:19 Coding context vectors in Python
55:29 Coding attention score matrix for all queries
01:00:22 Coding attention weight matrix for all queries
01:04:27 Coding context vector matrix for all queries
01:14:10 Need for trainable weights in the attention mechanism
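For readers who want to code along, here is a minimal sketch of the pipeline outlined in the timestamps above, assuming PyTorch as the library. The six words and their 3-dimensional embedding values are illustrative placeholders, not necessarily the exact numbers used in the lecture.

import torch

inputs = torch.tensor([       # one 3-d embedding vector per word (assumed values)
    [0.43, 0.15, 0.89],       # "Your"
    [0.55, 0.87, 0.66],       # "journey"
    [0.57, 0.85, 0.64],       # "starts"
    [0.22, 0.58, 0.33],       # "with"
    [0.77, 0.25, 0.10],       # "one"
    [0.05, 0.80, 0.55],       # "step"
])

# Attention scores for a single query: dot product of the query with every input
query = inputs[1]                          # take "journey" as the query
scores = inputs @ query                    # shape (6,)

# Attention weights: softmax normalisation, so the weights sum to 1
weights = torch.softmax(scores, dim=0)

# Context vector: weighted sum of all the input vectors
context = weights @ inputs                 # shape (3,)

# The same three steps for ALL queries at once, as matrix products
all_scores = inputs @ inputs.T             # (6, 6) attention score matrix
all_weights = torch.softmax(all_scores, dim=-1)   # row-wise normalisation
all_contexts = all_weights @ inputs        # one context vector per word
print(all_contexts)

Each row of all_weights sums to 1, and each row of all_contexts is the context vector for the corresponding word. Since nothing here is learned, the attention pattern is fixed entirely by the raw embeddings, which motivates the trainable weight matrices discussed at the end of the lecture.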

=================================================
Vizuara philosophy:

As we learn the AI/ML/DL material, we will share thoughts on what is actually useful in industry and what has become irrelevant. We will also share plenty of information on which subjects contain open areas of research; interested students can start their research journey there.

If you are confused or stuck in your ML journey, perhaps courses and offline videos are not inspiring enough. What might inspire you is watching someone else learn and implement machine learning from scratch.

No cost. No hidden charges. Pure old-school teaching and learning.

=================================================

🌟 Meet Our Team: 🌟

🎓 Dr. Raj Dandekar (MIT PhD, IIT Madras department topper)

🎓 Dr. Rajat Dandekar (Purdue PhD, IIT Madras department gold medalist)

🎓 Dr. Sreedath Panat (MIT PhD, IIT Madras department gold medalist)

🎓 Sahil Pocker (Machine Learning Engineer at Vizuara)

🎓 Abhijeet Singh (Software Developer at Vizuara, GSOC 24, SOB 23)

🎓 Sourav Jana (Software Developer at Vizuara)
Comments

Hi Raj, I am 58 years old and follow LLM topics with interest. Your videos are very good and have a solid foundation. You kept 3 dimensions for each word vector, and that made things very easy to understand. Thank you very much for your efforts. God bless you!

pendekantimaheshbabu

I am really enjoying the content. This is what I was looking for. I hope this series will cover all three phases.

abhijitbarman

Amazing content, thank you very much :)

Xmen

I think the following is the best approach to get the most from these exceptional lectures: first watch the whole lecture and let your subconscious mind capture the essence, then watch it again with full attention. Everything becomes so easy to understand.

tripchowdhry
Автор

Thank you, Sir, for this amazing lecture :D

Omunamantech

Splendid! How did we fill up the tensor at the start of the Python code?

tripchowdhry

As mentioned in the lecture, the vector embeddings capture the semantic meaning of each word. How do these vector embeddings capture the meaning derived from tokens, sir?

poornadayapule

I have one doubt, sir. In the last lecture you showed [8, 4, 256], where 8 is the number of rows in one batch, 4 is the number of words per row, and 256 is the number of dimensions for each word. In this lecture, how did you choose the 3-dimensional vector for "journey"? Are those values derived from the token IDs, or are they random? Could you please clarify this, sir?

damakoushik

Great lecture, sir! Can we get access to the lecture notes? They would be a great resource to refer to.

Techiiot

Why don't we use cosine similarity or Euclidean distance, instead of only the dot product?

binnypero

Could you please explain, sir, how you chose the values for each word in three-dimensional space? Are the values in the video random, or real values for the corresponding words? Please answer, sir. Thanking you and waiting for your response.

damakoushik

20:07 Google represents the magnitude of a vector as its absolute value. That's strange.

Jvo_Rien

This lesson was a bit more complex, so I will review it until it sticks. ❓QUESTION: at the 1:08:45 time frame, did you mean to show "r2*c1, r2*c2, r2*c3"? I see "r3*c3" instead. ❓ Thank you!

helrod