Lecture 13: Introduction to the Attention Mechanism in Large Language Models (LLMs)

In this lecture, we learn about the attention mechanism.

In particular, we look at 5 aspects:
(1) Why we care about “attention”
(2) RNNs and their limitations
(3) The working of the attention mechanism
(4) History of RNNs, LSTMs, Bahdanau Attention and Transformers
(5) Self attention

0:00 Why we care about “attention”
6:19 4 types of attention mechanisms
10:21 Problems with modeling long sequences
16:12 How RNNs work
23:35 RNN Limitations
27:12 Bahdanau Attention Mechanism
42:03 History of RNNs, LSTMs, Attention and Transformers
44:08 Self attention
48:00 Lecture recap

=================================================

Vizuara philosophy:

As we learn the AI/ML/DL material, we will share thoughts on what is actually useful in industry and what has become irrelevant. We will also share a lot of information on which subjects contain open areas of research. Interested students can start their research journey there.

If you are confused or stuck in your ML journey, perhaps courses and offline videos are not inspiring enough. What might inspire you is seeing someone else learn and implement machine learning from scratch.

No cost. No hidden charges. Pure old school teaching and learning.

=================================================

🌟 Meet Our Team: 🌟

🎓 Dr. Raj Dandekar (MIT PhD, IIT Madras department topper)

🎓 Dr. Rajat Dandekar (Purdue PhD, IIT Madras department gold medalist)

🎓 Dr. Sreedath Panat (MIT PhD, IIT Madras department gold medalist)

🎓 Sahil Pocker (Machine Learning Engineer at Vizuara)

🎓 Abhijeet Singh (Software Developer at Vizuara, GSOC 24, SOB 23)

🎓 Sourav Jana (Software Developer at Vizuara)

Vizuara

Comments

Attention has an infinite reference window, whereas RNNs and LSTMs have a short reference window. Is this right?

Ashishkumar-idnn

Thank you, sir, for this amazing lecture :D

Omunamantech

Can anyone suggest which laptop would be good for AI/ML projects?

Elsaof-flph

Agreed... another great lecture.
Understanding the theory and what is under the hood helps to better understand and adjust to the practical application of things. This differentiates your teaching from many others who only teach the sizzle.

helrod
