Transformers explained | The architecture behind LLMs

All you need to know about the transformer architecture: How to structure the inputs, attention (Queries, Keys, Values), positional embeddings, residual connections. Bonus: an overview of the difference between Recurrent Neural Networks (RNNs) and transformers.
9:19 Order of multiplication should be the opposite: x1(vector) * Wq(matrix) = q1(vector). Otherwise we do not get the 1x3 dimensionality at the end. Sorry for messing up the animation!
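
For anyone who wants to check the shapes from the correction above, here is a minimal NumPy sketch (the 4-dimensional embedding and the 3-dimensional query/key/value size are made-up toy values, not the exact numbers from the video):

import numpy as np

rng = np.random.default_rng(0)
d_model, d_qkv = 4, 3                      # toy sizes, for illustration only
x1 = rng.normal(size=(1, d_model))         # one token embedding as a 1 x 4 row vector
Wq = rng.normal(size=(d_model, d_qkv))     # query projection matrix, 4 x 3

q1 = x1 @ Wq                               # vector times matrix, as in the correction
print(q1.shape)                            # (1, 3)

# The same projections applied to a whole toy sequence give scaled dot-product attention:
seq_len = 5
X = rng.normal(size=(seq_len, d_model))    # one row per token
Wk = rng.normal(size=(d_model, d_qkv))
Wv = rng.normal(size=(d_model, d_qkv))
Q, K, V = X @ Wq, X @ Wk, X @ Wv           # queries, keys, values
scores = Q @ K.T / np.sqrt(d_qkv)          # seq_len x seq_len attention scores
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
output = weights @ V                       # weighted sum of values
print(output.shape)                        # (5, 3)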

Outline:
00:00 Transformers explained
00:47 Text inputs
02:29 Image inputs
03:57 Next word prediction / Classification
06:08 The transformer layer: 1. MLP sublayer
06:47 2. Attention explained
07:57 Attention vs. self-attention
08:35 Queries, Keys, Values
09:19 Order of multiplication should be the opposite: x1(vector) * Wq(matrix) = q1(vector).
11:26 Multi-head attention
13:04 Attention scales quadratically
13:53 Positional embeddings
15:11 Residual connections and Normalization Layers
17:09 Masked Language Modelling
17:59 Difference to RNNs

Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Dres. Trost GbR, Siltax, Vignesh Valliappan, @Mutual_Information, Kshitij

📄 Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." Advances in neural information processing systems 30 (2017).

▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

🔗 Links:

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
Music 🎵 : Sunset n Beachz - Ofshane
Video editing: Nils Trost
Comments

Thanks for the explanation. At 9:19: shouldn't the order of multiplication be the opposite here? E.g. x1 (vector) * Wq (matrix) = q1 (vector). Otherwise I don't understand how we get the 1x3 dimensionality at the end.

YuraCCC

Understood about 10%, but I like these videos and intuitively feel their usefulness.

Thomas-gk

Thanks, you helped so much with explaining Transformers to my PhD advisors <3

phiphi

BEST of BEST explanation: 1) visually, 2) intuitively, 3) by numerical examples. And your English is easier for foreigners to listen to than a native speaker's.

heejuneAhn

Had to go back and rewatch a section after I realized I'd been spacing out staring at the coffee bean's reactions.

uwisplaya

Great Video!! Nice improvement over the original

DatNgo-ukft

Thanks so much for this video. I’ve gone through a number of videos on transformers and this is much easier to grasp and understand for a non-data scientist like myself.

Clammer

Letitia, you're awesome and I look forward to learning more from you.

darylallen

You know how to explain things. This one is not easy: I can see the amount of work that went into this video, and it was a lot. I hope that your career takes you where you deserve.

DaveJ

I think I had at least 10 aha moments watching this, and I've watched many videos on these topics. Incredible job, thank you!

mccartym

Absolute banger of a video. Wish I had seen this when I was learning about transformers in uni last year :-)

l.suurmeijer

What a wonderful video! Thank you so much for sharing it!

manuelafernandesblancorodr

This is a very well-made explanation. I hadn't known that the feedforward layers only received one token at a time. Thanks for clearing that up for me! 😁
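
To make that point about the feedforward sublayer concrete, here is a small NumPy sketch (toy sizes, random weights): because the same two weight matrices are applied to every position independently, feeding the tokens one at a time gives exactly the same result as feeding the whole sequence.

import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_hidden = 5, 8, 16          # toy sizes, for illustration only
X = rng.normal(size=(seq_len, d_model))        # one embedding per token
W1 = rng.normal(size=(d_model, d_hidden))
W2 = rng.normal(size=(d_hidden, d_model))

def mlp(h):                                    # position-wise feedforward sublayer
    return np.maximum(h @ W1, 0) @ W2          # ReLU between the two linear layers

whole_sequence = mlp(X)                                # all tokens at once
token_by_token = np.stack([mlp(x) for x in X])         # one token at a time
print(np.allclose(whole_sequence, token_by_token))     # True: identical results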

xxlvulkann

As far as I am aware, word embeddings have changed from legacy static embeddings like Word2Vec/GloVe (as in the famous queen = woman + king - man metaphor) to BPE & unigram subword tokenization. This change gave me quite a headache, as most papers do not mention any details of their "word embedding". Perhaps, Letitia, you can make a video to clarify this a bit for us.
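
If you want to inspect modern subword tokenization yourself, here is a quick sketch (assuming the Hugging Face transformers package is installed; the gpt2 checkpoint ships a byte-level BPE tokenizer, and the exact splits depend on its learned merges):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")        # byte-level BPE tokenizer
print(tok.tokenize("unbelievably transformable"))  # rarer words get split into subword pieces
print(tok("Hello world")["input_ids"])             # the ids that are then mapped to embeddings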

tildarusso

Tomorrow I have my thesis evaluation and I was thinking about watching this video again, but the YouTube algorithm suggested it without me searching for anything. Thank u, YouTube algo..
😅❤🔥

rahulrajpvrd

Time is quadratic, but memory is linear -- see the FlashAttention paper.
But the number of parameters is constant -- that's the magic!
Thanks for the excellent videos! 👍
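
A small NumPy sketch of that point (toy sizes, single head, no softmax): the attention score matrix grows quadratically with sequence length, while the learned projection matrices, and therefore the parameter count, do not depend on sequence length at all. FlashAttention, as the comment notes, keeps memory linear by never materializing the full score matrix.

import numpy as np

rng = np.random.default_rng(0)
d_model = 8                                            # toy embedding size
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
n_params = sum(W.size for W in (Wq, Wk, Wv))           # parameter count: independent of seq_len

for seq_len in (16, 32, 64):
    X = rng.normal(size=(seq_len, d_model))            # one embedding per token
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d_model)  # seq_len x seq_len score matrix
    print(seq_len, scores.shape, n_params)             # scores grow as seq_len^2, params stay 192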

davidespinosa

Thank you very much for the very clear explanations and detailed analysis of the transformer architecture. You're truly the 3blue1brown of machine learning!

cosmic_reef_

One of the best videos on transformers that I have ever watched. Views 📈

abhishek-tandon

Best didactic explanation of Transformers so far. Thank you for sharing it.

jcneto

Thank you for the video! Maybe an explanation on the Mamba Architecture next?

SamehSyedAjmal