Lecture 16: Causal Self Attention Mechanism | Coded from scratch in Python
In this lecture, we learn and code the causal attention mechanism from scratch. We cover masking, dropout, and the key concepts involved in causal attention.
This is a very dense lecture consisting of detailed whiteboard notes, mathematics intuition and hands on Python coding.
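As a taste of the negative-infinity masking covered in the lecture: scores for future tokens are set to -inf before softmax, so they receive exactly zero attention weight. A minimal sketch (the score values here are made up for illustration):

```python
import torch

# Toy attention scores for a 3-token sequence (values are illustrative only)
scores = torch.tensor([[0.5, 0.8, 0.2],
                       [0.1, 0.9, 0.4],
                       [0.7, 0.3, 0.6]])

# Upper-triangular mask: True above the diagonal marks "future" positions
mask = torch.triu(torch.ones(3, 3), diagonal=1).bool()

# Replace future scores with -inf, then softmax row-wise
masked = scores.masked_fill(mask, -torch.inf)
weights = torch.softmax(masked, dim=-1)
print(weights)  # rows sum to 1; entries above the diagonal are exactly 0
```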
0:00 Self attention recap
9:30 What is causal attention?
14:25 Coding the causal attention mask in Python
24:45 Data leakage
26:39 Negative infinity masking and softmax
32:33 Dropout in causal attention
36:13 Coding causal attention dropout in Python
40:45 Coding the Causal Attention Class in Python
51:27 register_buffer in PyTorch
53:53 Next steps
PyTorch Upper and Lower Triangular Matrix implementation:
PyTorch Masked_fill implementation:
PyTorch Dropout implementation:
PyTorch register_buffer implementation:
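The pieces above (triangular mask, masked_fill, dropout, register_buffer) come together in a causal attention class. Here is a minimal single-head sketch, assuming learned query/key/value projections; names like `W_query` and the scaling choice are illustrative, not necessarily identical to the lecture code:

```python
import torch
import torch.nn as nn

class CausalAttention(nn.Module):
    """Single-head causal self-attention (illustrative sketch)."""

    def __init__(self, d_in, d_out, context_length, dropout=0.1):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)
        self.dropout = nn.Dropout(dropout)
        # register_buffer keeps the mask with the module (it moves with
        # .to(device) and is saved in state_dict) without making it a
        # trainable parameter.
        self.register_buffer(
            "mask",
            torch.triu(torch.ones(context_length, context_length), diagonal=1),
        )

    def forward(self, x):
        b, num_tokens, d_in = x.shape
        queries = self.W_query(x)
        keys = self.W_key(x)
        values = self.W_value(x)

        # Raw attention scores for every query-key pair
        scores = queries @ keys.transpose(1, 2)
        # Mask future positions with -inf so softmax zeroes them out,
        # preventing data leakage from tokens the model should not yet see
        scores = scores.masked_fill(
            self.mask.bool()[:num_tokens, :num_tokens], -torch.inf
        )
        # Scale by sqrt(d_k), normalize, then apply dropout to the weights
        weights = torch.softmax(scores / keys.shape[-1] ** 0.5, dim=-1)
        weights = self.dropout(weights)
        return weights @ values
```

Usage: `CausalAttention(d_in=3, d_out=2, context_length=6)` applied to a batch of shape `(2, 6, 3)` returns context vectors of shape `(2, 6, 2)`.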
=================================================
=================================================
Vizuara philosophy:
As we learn the AI/ML/DL material, we will share thoughts on what is actually useful in industry and what has become irrelevant. We will also point out which topics contain open areas of research, so interested students can start their research journey there.
If you are confused or stuck in your ML journey, perhaps courses and offline videos are not inspiring enough. What might inspire you is seeing someone else learn and implement machine learning from scratch.
No cost. No hidden charges. Pure old school teaching and learning.
=================================================
🌟 Meet Our Team: 🌟
🎓 Dr. Raj Dandekar (MIT PhD, IIT Madras department topper)
🎓 Dr. Rajat Dandekar (Purdue PhD, IIT Madras department gold medalist)
🎓 Dr. Sreedath Panat (MIT PhD, IIT Madras department gold medalist)
🎓 Sahil Pocker (Machine Learning Engineer at Vizuara)
🎓 Abhijeet Singh (Software Developer at Vizuara, GSOC 24, SOB 23)
🎓 Sourav Jana (Software Developer at Vizuara)