Lecture 14: Simplified Attention Mechanism - Coded from scratch in Python | No trainable weights

In this lecture, we code a simplified attention mechanism from scratch in Python. In the process, we learn about context vectors, attention scores, and attention weights. We pay equal attention to theory, visual intuition, and code. A minimal code sketch of the full pipeline follows the timestamps below.

0:00 Lecture objective
2:29 Context vectors
9:34 Coding embedding vectors in Python
14:45 What are attention scores?
19:18 Dot product and attention scores
22:57 Coding attention scores in Python
26:22 Simple normalisation
34:07 Softmax normalisation
37:34 Coding attention weights in Python
43:46 Context vector calculation visualised
50:19 Coding context vectors in Python
55:29 Coding attention score matrix for all queries
01:00:22 Coding attention weight matrix for all queries
01:04:27 Coding context vector matrix for all queries
01:14:10 Need for trainable weights in the attention mechanism
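For readers who want to code along, here is a minimal sketch of the pipeline outlined in the timestamps above, assuming PyTorch as the library. The six words and their 3-dimensional embedding values are illustrative placeholders, not necessarily the exact numbers used in the lecture.

import torch

inputs = torch.tensor([       # one 3-d embedding vector per word (assumed values)
    [0.43, 0.15, 0.89],       # "Your"
    [0.55, 0.87, 0.66],       # "journey"
    [0.57, 0.85, 0.64],       # "starts"
    [0.22, 0.58, 0.33],       # "with"
    [0.77, 0.25, 0.10],       # "one"
    [0.05, 0.80, 0.55],       # "step"
])

# Attention scores for a single query: dot product of the query with every input
query = inputs[1]                          # take "journey" as the query
scores = inputs @ query                    # shape (6,)

# Attention weights: softmax normalisation, so the weights sum to 1
weights = torch.softmax(scores, dim=0)

# Context vector: weighted sum of all the input vectors
context = weights @ inputs                 # shape (3,)

# The same three steps for ALL queries at once, as matrix products
all_scores = inputs @ inputs.T             # (6, 6) attention score matrix
all_weights = torch.softmax(all_scores, dim=-1)   # row-wise normalisation
all_contexts = all_weights @ inputs        # one context vector per word
print(all_contexts)

Each row of all_weights sums to 1, and each row of all_contexts is the context vector for the corresponding word. Since nothing here is learned, the attention pattern is fixed entirely by the raw embeddings, which motivates the trainable weight matrices discussed at the end of the lecture.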

=================================================
Vizuara philosophy:

As we learn the AI/ML/DL material, we will share thoughts on what is actually useful in industry and what has become irrelevant. We will also share plenty of information on which subjects contain open areas of research; interested students can start their research journey there.

If you are confused or stuck in your ML journey, perhaps courses and offline videos are not inspiring enough. What might inspire you is watching someone else learn and implement machine learning from scratch.

No cost. No hidden charges. Pure old-school teaching and learning.

=================================================

🌟 Meet Our Team: 🌟

🎓 Dr. Raj Dandekar (MIT PhD, IIT Madras department topper)

🎓 Dr. Rajat Dandekar (Purdue PhD, IIT Madras department gold medalist)

🎓 Dr. Sreedath Panat (MIT PhD, IIT Madras department gold medalist)

🎓 Sahil Pocker (Machine Learning Engineer at Vizuara)

🎓 Abhijeet Singh (Software Developer at Vizuara, GSOC 24, SOB 23)

🎓 Sourav Jana (Software Developer at Vizuara)
Comments

Hi Raj, I am 58 years old and follow LLM topics with interest. Your videos are very good and have a solid foundation. You kept 3 dimensions for each word vector, and that made things very easy to understand. Thank you very much for your efforts. God bless you!

pendekantimaheshbabu

I am really enjoying the content. This is what I was looking for. I hope this series will cover all three phases.

abhijitbarman

Amazing content, thank you very much :)

Xmen

I think the following is the best approach to get the most from these exceptional lectures: first watch the whole lecture and let your subconscious mind capture the essence, then watch it again with full attention. Everything becomes so easy to understand.

tripchowdhry
Автор

Thank you, Sir, for this amazing lecture :D

Omunamantech

Splendid! How did we fill up the tensor at the start of the Python code?

tripchowdhry

As mentioned in the lecture, the vector embeddings capture the semantic meaning of each word. How do these vector embeddings capture the meaning derived from tokens, sir?

poornadayapule

I have one doubt, sir. In the last lecture you showed [8, 4, 256], where 8 is the number of rows in one batch, 4 is the number of words per row, and 256 is the number of dimensions for each word. In this lecture, how did you choose the 3-dimensional vector for "journey"? Are those values derived from the token IDs, or are they random? Could you please clarify this, sir?

damakoushik

Great lecture, sir! Can we get access to the lecture notes? They would be a great resource to refer to.

Techiiot

Why don't we use cosine similarity or Euclidean distance, instead of only the dot product?

binnypero

Could you please explain, sir, how you chose the values for each word in three-dimensional space? Are the values in the video random, or real values for the corresponding words? Please answer, sir. Thanking you and waiting for your response.

damakoushik

20:07 Google represents the magnitude of a vector as its absolute value. That's strange.

Jvo_Rien

This lesson was a bit more complex, so I will review it until it sticks. ❓QUESTION: at the 1:08:45 time frame, did you mean to show "r2*c1, r2*c2, r2*c3"? I see "r3*c3" instead. ❓ Thank you!

helrod