Multi-Head Attention Part 2: The entire mathematics explained

In this lecture, we learn about multi-head attention with weight splits, walking through a step-by-step mathematical explanation of every stage of the multi-head attention calculation.

After this in-depth lecture, you will have a solid grasp of the foundations of multi-head attention, in both theory and code. A short code sketch of the steps covered is included after the chapter list below.

0:00 Multi-head attention recap
3:18 Multi-head attention with weight splits introduction
9:41 Defining inputs
11:44 Decide output dimension, number of heads
13:45 Initialize trainable key, query, value weight matrices
16:12 Calculate the key, query and value matrices
19:14 Unroll key, query, value dimensions to include num_heads
24:48 Group matrices by number of heads
28:45 Finding attention scores
36:00 Finding attention weights
44:05 Finding multi-head context vectors
54:50 Hands on example testing
59:31 Conclusion
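
For readers who want to follow along in code, here is a minimal PyTorch sketch of the steps listed in the chapters above. The variable names (d_in, d_out, num_heads) and the concrete sizes are illustrative assumptions, not values taken verbatim from the lecture:

import torch

torch.manual_seed(0)

# Define inputs: a batch of 1 sequence with 6 tokens, each of dimension d_in
batch_size, num_tokens, d_in = 1, 6, 3
x = torch.rand(batch_size, num_tokens, d_in)

# Decide the output dimension and number of heads
d_out, num_heads = 4, 2
head_dim = d_out // num_heads

# Initialize trainable query, key, value weight matrices
W_q = torch.nn.Linear(d_in, d_out, bias=False)
W_k = torch.nn.Linear(d_in, d_out, bias=False)
W_v = torch.nn.Linear(d_in, d_out, bias=False)

# Calculate the query, key and value matrices: shape (batch, num_tokens, d_out)
q, k, v = W_q(x), W_k(x), W_v(x)

# Unroll the last dimension to include num_heads: (batch, num_tokens, num_heads, head_dim)
q = q.view(batch_size, num_tokens, num_heads, head_dim)
k = k.view(batch_size, num_tokens, num_heads, head_dim)
v = v.view(batch_size, num_tokens, num_heads, head_dim)

# Group the matrices by head: (batch, num_heads, num_tokens, head_dim)
q, k, v = q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)

# Attention scores: (batch, num_heads, num_tokens, num_tokens)
# (in a decoder-style model a causal mask would be applied here; omitted for brevity)
scores = q @ k.transpose(2, 3)

# Attention weights: scale by sqrt(head_dim), then softmax over the last dimension
weights = torch.softmax(scores / head_dim**0.5, dim=-1)

# Multi-head context vectors: combine the heads back into (batch, num_tokens, d_out)
context = (weights @ v).transpose(1, 2).reshape(batch_size, num_tokens, d_out)
print(context.shape)  # torch.Size([1, 6, 4])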

=================================================
Vizuara philosophy:

As we work through AI/ML/DL material, we will share thoughts on what is actually useful in industry and what has become irrelevant. We will also point out which topics contain open areas of research, so that interested students can start their research journey there.

For students who are confused or stuck in their ML journey, courses and pre-recorded videos may not be inspiring enough. What might inspire you is watching someone else learn and implement machine learning from scratch.

No cost. No hidden charges. Pure old-school teaching and learning.

=================================================

🌟 Meet Our Team: 🌟

🎓 Dr. Raj Dandekar (MIT PhD, IIT Madras department topper)

🎓 Dr. Rajat Dandekar (Purdue PhD, IIT Madras department gold medalist)

🎓 Dr. Sreedath Panat (MIT PhD, IIT Madras department gold medalist)

🎓 Sahil Pocker (Machine Learning Engineer at Vizuara)

🎓 Abhijeet Singh (Software Developer at Vizuara, GSOC 24, SOB 23)

🎓 Sourav Jana (Software Developer at Vizuara)
Comments:

Now I fully understand tensors also - exceptional lecture

tripchowdhry

Thank You Sir For This Amazing Lecture :D

Omunamantech

Thank you so much, and very much excited to learn the full LLM.

msumanth

Can you please share the lecture notes?

sandeepk

Llama 3.2 has a context length of 128,000, but there are only 50,000 words in English.
How do you resolve this discrepancy?

tripchowdhry