Lecture 17: Multi Head Attention Part 1 - Basics and Python code

In this lecture, we cover the basics of the multi-head attention mechanism and implement it in Python. This is Part 1 of the two-part multi-head attention series. Minimal code sketches of the two implementations discussed in the video follow the chapter list below.

0:00 Causal attention: concept recap
10:00 Causal attention class: code recap
13:19 What is multi-head attention?
20:18 Coding multi-head attention in Python
29:07 Making multi-head more efficient
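Around the 20:18 chapter, multi-head attention is built the straightforward way: stack several single-head causal attention modules and concatenate their outputs. The sketch below illustrates this approach; the class and parameter names (CausalAttention, MultiHeadAttentionWrapper, d_in, d_out, context_length, num_heads) are assumptions for illustration and may not match the exact code written in the video.

```python
import torch
import torch.nn as nn

class CausalAttention(nn.Module):
    """A single causal self-attention head."""
    def __init__(self, d_in, d_out, context_length, dropout=0.0, qkv_bias=False):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask: each token attends only to itself and earlier tokens
        self.register_buffer("mask",
            torch.triu(torch.ones(context_length, context_length), diagonal=1))

    def forward(self, x):
        b, num_tokens, _ = x.shape
        q, k, v = self.W_query(x), self.W_key(x), self.W_value(x)
        scores = q @ k.transpose(1, 2)                               # (b, T, T)
        scores.masked_fill_(self.mask.bool()[:num_tokens, :num_tokens], -torch.inf)
        weights = torch.softmax(scores / k.shape[-1] ** 0.5, dim=-1)
        weights = self.dropout(weights)
        return weights @ v                                           # context vectors (b, T, d_out)

class MultiHeadAttentionWrapper(nn.Module):
    """Naive multi-head attention: run several heads and concatenate their outputs."""
    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        self.heads = nn.ModuleList(
            [CausalAttention(d_in, d_out, context_length, dropout, qkv_bias)
             for _ in range(num_heads)]
        )

    def forward(self, x):
        # Each head returns (b, T, d_out); concatenation gives (b, T, num_heads * d_out)
        return torch.cat([head(x) for head in self.heads], dim=-1)

# Example: 2 sequences of 6 tokens with embedding dim 3 -> output dim num_heads * d_out = 4
torch.manual_seed(123)
x = torch.rand(2, 6, 3)
mha = MultiHeadAttentionWrapper(d_in=3, d_out=2, context_length=6, dropout=0.0, num_heads=2)
print(mha(x).shape)   # torch.Size([2, 6, 4])
```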
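The 29:07 chapter covers making multi-head attention more efficient. A common way to do this, sketched below under the same naming assumptions, is to project queries, keys, and values once and split the result into heads with reshapes rather than looping over separate single-head modules, with an output projection mixing the heads at the end.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Efficient multi-head attention: one Q/K/V projection each, split into heads."""
    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.out_proj = nn.Linear(d_out, d_out)   # mixes information across heads
        self.dropout = nn.Dropout(dropout)
        self.register_buffer("mask",
            torch.triu(torch.ones(context_length, context_length), diagonal=1))

    def forward(self, x):
        b, num_tokens, _ = x.shape
        # Project once, then split the last dimension into (num_heads, head_dim)
        q = self.W_query(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)

        scores = q @ k.transpose(2, 3)                                # (b, heads, T, T)
        scores.masked_fill_(self.mask.bool()[:num_tokens, :num_tokens], -torch.inf)
        weights = self.dropout(torch.softmax(scores / self.head_dim ** 0.5, dim=-1))

        context = (weights @ v).transpose(1, 2)                       # (b, T, heads, head_dim)
        context = context.contiguous().view(b, num_tokens, self.num_heads * self.head_dim)
        return self.out_proj(context)                                 # (b, T, d_out)
```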

=================================================
Vizuara philosophy:

As we work through AI/ML/DL material, we will share thoughts on what is actually useful in industry and what has become irrelevant. We will also point out which topics contain open areas of research, so interested students can start their research journey there.

For students who are confused or stuck in their ML journey, courses and offline videos may not be inspiring enough. What might inspire you is watching someone else learn and implement machine learning from scratch.

No cost. No hidden charges. Pure old-school teaching and learning.

=================================================

🌟 Meet Our Team: 🌟

🎓 Dr. Raj Dandekar (MIT PhD, IIT Madras department topper)

🎓 Dr. Rajat Dandekar (Purdue PhD, IIT Madras department gold medalist)

🎓 Dr. Sreedath Panat (MIT PhD, IIT Madras department gold medalist)

🎓 Sahil Pocker (Machine Learning Engineer at Vizuara)

🎓 Abhijeet Singh (Software Developer at Vizuara, GSOC 24, SOB 23)

🎓 Sourav Jana (Software Developer at Vizuara)
Comments

Thank You Sir For This Amazing Lecture :D

Omunamantech

Sir, can you please elaborate on mhawrapper, enumerate, and super() in 3-4 sentences during the usual sessions going forward? This is from a person without a software background. No pressure. Thanks for your wonderful teaching.

SasiKumar-sppy
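A brief clarification sketch for the points raised above (not an excerpt from the video): super().__init__() calls the parent class constructor, here nn.Module, so PyTorch can track the layer's parameters and submodules; the wrapper class referred to as "mhawrapper" simply holds several single heads and concatenates their outputs (see the sketch after the chapter list above); and enumerate() yields (index, item) pairs when looping over a sequence. The names below are hypothetical.

```python
import torch.nn as nn

class TinyModule(nn.Module):
    def __init__(self):
        # super() refers to the parent class (nn.Module); calling its __init__
        # sets up parameter/submodule tracking before we add our own layers.
        super().__init__()
        self.layer = nn.Linear(4, 4)

# enumerate() yields (index, element) pairs while iterating.
for i, name in enumerate(["query", "key", "value"]):
    print(i, name)   # 0 query / 1 key / 2 value
```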

Can the number of attention heads be tuned or optimized via hyperparameter tuning, or is it fixed? Please reply. Also, I want to know how we can interpret the context vector: the attention weights tell us the importance of each word with respect to the others, but after they are converted into context vectors the dimension changes, so how does the model retain that information? What is the logic of converting attention weights to context vectors, and why is it needed?

binnypero
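On the first part of the question above: the number of heads is a standard hyperparameter and can be tuned, subject to dividing the embedding dimension evenly (GPT-2 small, for instance, uses 12 heads). On the second part, the usual relationship, not specific to this video, is that each context vector is a weighted sum of the value vectors, with the attention weights as coefficients; the weights are consumed in forming that sum rather than stored, and the output dimension is that of the value vectors. A tiny numeric sketch with made-up values:

```python
import torch

# Attention weights for one token over a 3-token sequence (they sum to 1)
attn_weights = torch.tensor([0.6, 0.3, 0.1])

# Value vectors for the 3 tokens, each of dimension d_out = 2
values = torch.tensor([[1.0, 0.0],
                       [0.0, 1.0],
                       [2.0, 2.0]])

# Context vector = weighted sum of value rows -> shape (2,)
context = attn_weights @ values
print(context)   # tensor([0.8000, 0.5000])
```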