Lecture 17: Multi Head Attention Part 1 - Basics and Python code

In this lecture, we cover the basics of the multi-head attention mechanism and implement it in Python. This is Part 1 of the two-part multi-head attention series. Minimal code sketches of the two implementations discussed in the video follow the chapter list below.

0:00 Causal attention: concept recap
10:00 Causal attention class: code recap
13:19 What is multi-head attention?
20:18 Coding multi-head attention in Python
29:07 Making multi-head more efficient
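Around the 20:18 chapter, multi-head attention is built the straightforward way: stack several single-head causal attention modules and concatenate their outputs. The sketch below illustrates this approach; the class and parameter names (CausalAttention, MultiHeadAttentionWrapper, d_in, d_out, context_length, num_heads) are assumptions for illustration and may not match the exact code written in the video.

```python
import torch
import torch.nn as nn

class CausalAttention(nn.Module):
    """A single causal self-attention head."""
    def __init__(self, d_in, d_out, context_length, dropout=0.0, qkv_bias=False):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask: each token attends only to itself and earlier tokens
        self.register_buffer("mask",
            torch.triu(torch.ones(context_length, context_length), diagonal=1))

    def forward(self, x):
        b, num_tokens, _ = x.shape
        q, k, v = self.W_query(x), self.W_key(x), self.W_value(x)
        scores = q @ k.transpose(1, 2)                               # (b, T, T)
        scores.masked_fill_(self.mask.bool()[:num_tokens, :num_tokens], -torch.inf)
        weights = torch.softmax(scores / k.shape[-1] ** 0.5, dim=-1)
        weights = self.dropout(weights)
        return weights @ v                                           # context vectors (b, T, d_out)

class MultiHeadAttentionWrapper(nn.Module):
    """Naive multi-head attention: run several heads and concatenate their outputs."""
    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        self.heads = nn.ModuleList(
            [CausalAttention(d_in, d_out, context_length, dropout, qkv_bias)
             for _ in range(num_heads)]
        )

    def forward(self, x):
        # Each head returns (b, T, d_out); concatenation gives (b, T, num_heads * d_out)
        return torch.cat([head(x) for head in self.heads], dim=-1)

# Example: 2 sequences of 6 tokens with embedding dim 3 -> output dim num_heads * d_out = 4
torch.manual_seed(123)
x = torch.rand(2, 6, 3)
mha = MultiHeadAttentionWrapper(d_in=3, d_out=2, context_length=6, dropout=0.0, num_heads=2)
print(mha(x).shape)   # torch.Size([2, 6, 4])
```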
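The 29:07 chapter covers making multi-head attention more efficient. A common way to do this, sketched below under the same naming assumptions, is to project queries, keys, and values once and split the result into heads with reshapes rather than looping over separate single-head modules, with an output projection mixing the heads at the end.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Efficient multi-head attention: one Q/K/V projection each, split into heads."""
    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.out_proj = nn.Linear(d_out, d_out)   # mixes information across heads
        self.dropout = nn.Dropout(dropout)
        self.register_buffer("mask",
            torch.triu(torch.ones(context_length, context_length), diagonal=1))

    def forward(self, x):
        b, num_tokens, _ = x.shape
        # Project once, then split the last dimension into (num_heads, head_dim)
        q = self.W_query(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)

        scores = q @ k.transpose(2, 3)                                # (b, heads, T, T)
        scores.masked_fill_(self.mask.bool()[:num_tokens, :num_tokens], -torch.inf)
        weights = self.dropout(torch.softmax(scores / self.head_dim ** 0.5, dim=-1))

        context = (weights @ v).transpose(1, 2)                       # (b, T, heads, head_dim)
        context = context.contiguous().view(b, num_tokens, self.num_heads * self.head_dim)
        return self.out_proj(context)                                 # (b, T, d_out)
```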

=================================================
Vizuara philosophy:

As we work through AI/ML/DL material, we will share thoughts on what is actually useful in industry and what has become irrelevant. We will also point out which topics contain open areas of research, so interested students can start their research journey there.

For students who are confused or stuck in their ML journey, courses and offline videos may not be inspiring enough. What might inspire you is watching someone else learn and implement machine learning from scratch.

No cost. No hidden charges. Pure old-school teaching and learning.

=================================================

🌟 Meet Our Team: 🌟

🎓 Dr. Raj Dandekar (MIT PhD, IIT Madras department topper)

🎓 Dr. Rajat Dandekar (Purdue PhD, IIT Madras department gold medalist)

🎓 Dr. Sreedath Panat (MIT PhD, IIT Madras department gold medalist)

🎓 Sahil Pocker (Machine Learning Engineer at Vizuara)

🎓 Abhijeet Singh (Software Developer at Vizuara, GSOC 24, SOB 23)

🎓 Sourav Jana (Software Developer at Vizuara)
Comments

Thank You Sir For This Amazing Lecture :D

Omunamantech

Sir, can you please elaborate on mhawrapper, enumerate, and super() in 3-4 sentences during the usual sessions going forward? This is from a person without a software background. No pressure. Thanks for your wonderful teaching.

SasiKumar-sppy
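A brief clarification sketch for the points raised above (not an excerpt from the video): super().__init__() calls the parent class constructor, here nn.Module, so PyTorch can track the layer's parameters and submodules; the wrapper class referred to as "mhawrapper" simply holds several single heads and concatenates their outputs (see the sketch after the chapter list above); and enumerate() yields (index, item) pairs when looping over a sequence. The names below are hypothetical.

```python
import torch.nn as nn

class TinyModule(nn.Module):
    def __init__(self):
        # super() refers to the parent class (nn.Module); calling its __init__
        # sets up parameter/submodule tracking before we add our own layers.
        super().__init__()
        self.layer = nn.Linear(4, 4)

# enumerate() yields (index, element) pairs while iterating.
for i, name in enumerate(["query", "key", "value"]):
    print(i, name)   # 0 query / 1 key / 2 value
```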

Can the number of attention heads be tuned or optimized via hyperparameter tuning, or is it fixed? Please reply. Also, I want to know how we can interpret the context vector: the attention weights tell us the importance of each word with respect to the others, but after they are converted into context vectors the dimension changes, so how does the model retain that information? What is the logic of converting attention weights to context vectors, and why is it needed?

binnypero
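On the first part of the question above: the number of heads is a standard hyperparameter and can be tuned, subject to dividing the embedding dimension evenly (GPT-2 small, for instance, uses 12 heads). On the second part, the usual relationship, not specific to this video, is that each context vector is a weighted sum of the value vectors, with the attention weights as coefficients; the weights are consumed in forming that sum rather than stored, and the output dimension is that of the value vectors. A tiny numeric sketch with made-up values:

```python
import torch

# Attention weights for one token over a 3-token sequence (they sum to 1)
attn_weights = torch.tensor([0.6, 0.3, 0.1])

# Value vectors for the 3 tokens, each of dimension d_out = 2
values = torch.tensor([[1.0, 0.0],
                       [0.0, 1.0],
                       [2.0, 2.0]])

# Context vector = weighted sum of value rows -> shape (2,)
context = attn_weights @ values
print(context)   # tensor([0.8000, 0.5000])
```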