The matrix math behind transformer neural networks, one step at a time!!!

Transformers, the neural network architecture behind ChatGPT, do a lot of math. Fortunately, almost all of it can be expressed as matrix math, which GPUs are optimized to run quickly. Matrix math is also how we code neural networks, so learning how ChatGPT does it will help you code your own. Thus, in this video, we go through the math one step at a time and explain what each step does, so that you can use it on your own with confidence.
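As a rough illustration of the kind of matrix math the video walks through, here is a minimal NumPy sketch of scaled dot-product self-attention. The toy embeddings and randomly generated weights are made up for this example; they are not the values used in the video.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable SoftMax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q = X @ Wq                               # queries: one row per token
    K = X @ Wk                               # keys
    V = X @ Wv                               # values
    scores = Q @ K.T / np.sqrt(K.shape[1])   # scaled dot-product similarities
    return softmax(scores) @ V               # attention-weighted sum of values

# Toy example: 3 tokens, embedding size 2 (illustrative values only).
X = np.array([[1.0, 0.5],
              [0.2, -1.0],
              [0.3, 0.8]])
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(2, 2)) for _ in range(3))

out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (3, 2): one context-aware vector per token
```

Every step is a matrix multiplication (plus one SoftMax), which is exactly why GPUs can run transformers so fast.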

NOTE: This StatQuest assumes that you are already familiar with:

If you'd like to support StatQuest, please consider...
...or...

...buying my book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...

...or just donating to StatQuest!
venmo: @JoshStarmer

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:

0:00 Awesome song and introduction
1:43 Word Embedding
3:37 Position Encoding
4:28 Self Attention
12:09 Residual Connections
13:08 Decoder Word Embedding and Position Encoding
15:33 Masked Self Attention
20:18 Encoder-Decoder Attention
21:31 Fully Connected Layer
22:16 SoftMax
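
To give a flavor of one of the steps above, here is a small sketch of the masking trick from the Masked Self Attention section: a causal mask sets the scores for later tokens to negative infinity before the SoftMax, so each token can only attend to itself and earlier tokens. The score values are illustrative, not taken from the video.

```python
import numpy as np

# Raw query-key scores for 3 tokens (illustrative values).
scores = np.array([[1.0, 2.0, 0.5],
                   [0.3, 1.5, 2.2],
                   [0.7, 0.1, 1.1]])

mask = np.triu(np.ones_like(scores), k=1)      # 1s above the diagonal
masked = np.where(mask == 1, -np.inf, scores)  # block attention to the future

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

attn = softmax(masked)
print(np.round(attn, 2))  # upper triangle is 0: no peeking ahead
```

After the SoftMax, the masked positions become exactly 0, so during training the decoder can't "cheat" by looking at words it hasn't generated yet.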

#StatQuest #Transformer #ChatGPT
Comments

Josh Starmer is the GOAT. Literally every morning I wake up with some StatQuest, and it really helps me get ready for my statistics classes for the day. Thank you Josh!

samglick

So happy I made it here. I've passed through all the complicated topics and am now just a few topics away from completion. It's all thanks to your dedication to teaching. Thank you!

navneettiwari

Very educational, and also innovative in the way it's done. I have never seen such teaching elsewhere. You are the BEST!

NJCLM

You weren't kidding, it's here! You're a man of your word and a man of the people.

jpfdjsldfji

As an electronics hobbyist/student from way back in the 70s I like to keep up as best I can with technology. I'm really glad I don't have to remember all the details in this series. There are so many layers upon layers that at times I do "just keep going to the end" of the videos. Nevertheless I still manage to learn key aspects and new terms from your excellent teaching abilities. There must be an incredible amount of work involved in creating these lessons.
I will purchase your book because you deserve some form of appreciation and it'll serve as a great reference resource. Much respect Josh and thanks, Kieron.

colekeircom

Josh! Thanks for this video. It has been much easier for me to follow the matrix representation of the computation than the arrows used previously. I really appreciate your explanation using matrices!

BaronSpartan

Josh Starmer is the GOAT, thank you, dear Josh.

jamesmina

DUDE JOSH, FINALLY! I have been waiting for this episode for a year or more. I’m so proud of you bro. You got there!

mraarone

This is really good. The simple example you used was very effective for demonstrating the inner workings of the transformer.

TheCJD

Amazing, thank you Josh. You deserve millions more subscribers

MakeDataUseful

Thanks for introducing the concepts behind transformers.

liuwingki

StatQuest is the best thing I ever found on the internet.

Aa-fkjg

Always been a huge fan of the channel, and at this point in my life this video really couldn't have come at a better time. Thanks for helping us viewers with some of the best content on the planet (I said what I said)!

roro

Your videos are a didactic stroke of genius! 👍

NewsLetter-sqeh

I will recommend this video to my friends who want to study transformers ❤❤

sachinmohanty

Please add this video to your Neural Network playlist. I recently started watching that playlist.

adityabhosale

Wow, 'Squatch! Long time no see my friend! Good to see you.

Your videos are so much fun that it doesn't feel like we're actually in class. Thank you Josh.

itsawonderfullife

Amazing video! Can't wait for the next one. By the way, I think there's a small typo at 5:15 where the first query weight in the matrix notation should be 2.22 instead of 0.22

Erkthbs

Thanks a lot, keep going please please

mortezamahdavi

Thanks for the great content! One minor thing: at 5:24, the first element of the Query weight matrix should be 2.22, not 0.22.

Keshi-lzef