How did the Attention Mechanism start an AI frenzy? | LM3

The attention mechanism is well known for its use in Transformers. But where does it come from? Its origins lie in fixing a strange problem of RNNs.
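For anyone skimming the description, here is a minimal sketch of the idea the video covers: instead of squeezing the whole source sentence into one fixed-size vector, the decoder takes a weighted sum over all encoder RNN states. This uses an additive (Bahdanau-style) scoring function; all names and shapes are illustrative assumptions, not taken from the video's animation code.

import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_context(decoder_state, encoder_states, W_dec, W_enc, v):
    # decoder_state:  (d,)   current decoder hidden state
    # encoder_states: (T, d) one hidden state per source token
    # W_dec, W_enc:   (d, d) learned projections (random here)
    # v:              (d,)   learned scoring vector
    # Score each encoder state against the current decoder state.
    scores = np.tanh(decoder_state @ W_dec + encoder_states @ W_enc) @ v  # (T,)
    weights = softmax(scores)        # attention weights, sum to 1
    return weights @ encoder_states  # (d,) context vector, a weighted sum of encoder states

# Toy usage: 5 source tokens, hidden size 8.
rng = np.random.default_rng(0)
d, T = 8, 5
ctx = attention_context(
    rng.normal(size=d), rng.normal(size=(T, d)),
    rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d),
)
print(ctx.shape)  # (8,)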

The source code for the animations can be found here:

The animations in this video were made using 3blue1brown's library, manim:

Chapters
0:00 Introduction
0:22 Machine Translation
2:01 Attention Mechanism
8:04 Outro

Music (In Order):
Helynt - Route 10
Helynt - Bo-Omb Battlefield
Helynt - Underwater
Helynt - Twinleaf Town

Follow me!
Comments

With that, these are the three videos I had planned out. Do check out the previous ones if you missed them!

What kind of videos would you guys like to see next?

vcubingx

This is one of the best explanations of attention I have seen so far. Understanding the bottleneck motivation really makes this clear right around 3:15.

scottmcevoy

you’re doing god’s work brother, thank you for the series

blackveganarchist

I really like how easy you make it to understand the why of things. I think you've accomplished your goal of making it seem like I could come up with this!

Please cover multi headed self attention next! :)

I am worried that this simple approach skips important pieces of the puzzle though. Transformers do have a lot of moving parts it seems. But it seems like you're only getting started!

antoineberkani

What a great video, I paid attention the whole time

calix-tang

Thank you for your work! Your videos were very helpful for understanding the evolution of transformers 👍

FlyingHenroxx

Best explanation I have found so far, thank you

shukurullomeliboyev

Thank you, Vivek. Absolutely love your content. Please also keep adding Math content, though. Maybe create a playlist about different functions, limits etc? Whatever suits you.

kevindave

Nice, time to boost this video in the algorithm by typing out a comment

TheRoganExperienceJoe

Great material and presentation, thanks a lot for your work! I'd like to see a deep dive into how embeddings work. We can get embeddings from decoder-only models like GPTs, Llamas, etc., since they use some form of embeddings for their internal representations, right? But there are also encoder-only models like BERT and others (OpenAI's text-embedding models) which are actually used instead. What is the difference, and why does one work better than the other? Is it just because of compute differences, or are there some inherent differences?

Fussfackel

What about the meaning of the Q, K and V matrices?

maurogdilalla

If you hate others, you're really just hating yourself, because we are all one with god source

aidanthompson

Weird, 3b1b has the same series going on now.

OBGynKenobi