How might LLMs store facts | Chapter 7, Deep Learning

Unpacking the multilayer perceptrons in a transformer, and how they may store facts
An equally valuable form of support is to share the videos.

AI Alignment Forum post from the DeepMind researchers referenced at the video's start:

Anthropic posts about superposition referenced near the end:

Some added resources for those interested in learning more about mechanistic interpretability, offered by Neel Nanda

Mechanistic interpretability paper reading list

Getting started in mechanistic interpretability

An interactive demo of sparse autoencoders (made by Neuronpedia)

Coding tutorials for mechanistic interpretability (made by ARENA)

Sections:
0:00 - Where facts in LLMs live
2:15 - Quick refresher on transformers
4:39 - Assumptions for our toy example
6:07 - Inside a multilayer perceptron
15:38 - Counting parameters
17:04 - Superposition
21:37 - Up next

------------------

These animations are largely made using a custom Python library, manim. See the FAQ comments here:

All code for specific videos is visible here:

The music is by Vincent Rubinetti.

------------------

3blue1brown is a channel about animating math, in all senses of the word animate. If you're reading the bottom of a video description, I'm guessing you're more interested than the average viewer in lessons here. It would mean a lot to me if you chose to stay up to date on new ones, either by subscribing here on YouTube or otherwise following on whichever platform below you check most regularly.

Comments

Okay, the superposition bit blew my mind! The idea that you can fit so many nearly perpendicular vectors into higher-dimensional spaces is wild to even conceptualize, but so INSANELY useful, not just for embedding spaces, but also possibly for things like compression algorithms!
Thank you so much for this truly brilliant series!

PiercingSight

Still can't help but be blown away by the quality of pedagogy in these videos...

mr_rede_de_stone

The near-perpendicular embedding is wild! Reminds me of the ball in 10 dimensions.

ishtaraletheia

Broo, just watched chapter 4 and re-watched the previous three chapters. Your first three videos had just dropped when I was learning about neural networks in grad school, like perfect timing. Took a couple of years off drifting around. Now I'm going back into machine learning, hopefully gonna do a PhD, so I was re-watching the video series, then realized this one was rather new and got curious, noticed the last chapter 7 was missing, then checked your home page and lo and BEHOLD you'd released chapter 7 like 53 minutes ago. Talk about impeccable timing. I feel like you dropped these just for me to go into machine learning haha... kinda like "welcome back, let's continue". Anyway, thank you so much for taking me on this wonderful journey.

rigelr

This is simply the most comprehensible explanation of transformers anywhere. Both the script and the visuals are fantastic. I learned a lot. Thank you so much.

lesmcfarland

There are no words for how good this is and all of the 3Blue1Brown videos are. Thank you, Grant Sanderson, you are providing a uniquely amazing service to humanity.

Vadim-rhlr

The combination of Linear + ReLU + Linear, with the result added back onto the original, is the residual-block structure used in Residual Networks (see the sketch below). As 3b1b demonstrated in this video, the advantage residual networks have over a simple perceptron network is that the layers perturb (nudge) the input vector rather than replacing it completely.

SunnyKimDev
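
A minimal sketch of the block that comment describes, assuming PyTorch is available; the d_model and d_mlp sizes are arbitrary placeholders, not values from the video or any real model:

```python
import torch
import torch.nn as nn

class MLPBlock(nn.Module):
    """Linear -> ReLU -> Linear, with the result added back onto the input."""
    def __init__(self, d_model: int = 768, d_mlp: int = 3072):  # placeholder sizes
        super().__init__()
        self.up = nn.Linear(d_model, d_mlp)    # project the embedding up to a larger space
        self.act = nn.ReLU()                   # the video's toy example uses ReLU; real models often use GELU
        self.down = nn.Linear(d_mlp, d_model)  # project back down to the embedding size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: the block nudges x rather than replacing it.
        return x + self.down(self.act(self.up(x)))

vectors = torch.randn(4, 768)  # a toy batch of 4 embedding vectors
out = MLPBlock()(vectors)
print(out.shape)               # torch.Size([4, 768]) -- same shape in, same shape out
```

Because the output has the same shape as the input, these blocks can be stacked, each one adding a small correction to the running embedding.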

This video is pure gold! This complex topic is just so clearly and correctly explained in this video! I will show this to all my students in AI-related classes. Very highly recommended for everyone wanting to understand AI!

Gabriel-tpvc

Decided on a whim last night to get a refresher on the previous 2 videos in the series. Beautiful timing; great work as usual

kinderbeno

Hey Grant, I'm sure I can't understand everything from this series so I'm skipping this video, but the purpose of this comment is to thank you for creating the manim Python library. You took (I would say) "animation for education" to an entirely different level and encouraged many people to do it in your style of animation using manim. Because of you, indirectly, I'm learning many things on YouTube. Thanks again, and I wish you more success in your career with your loved ones.

pavithran

When I was giving a talk last year on how transformers worked... I envisioned something like this video in my mind. But of course it was amateur hour in comparison, both visually and in explanation. You are a true professional at this, Grant.

awelshphoto

You have explained this topic so well that it almost looks trivial. Amazing.

EzequielPalumbo

Unbelievable that such a clear explanation exists for something so "very hard".

Very excited for the next chapter!!

achmadzidan

Grant does such a good job of explaining these in an interesting manner - I bet 3b1b has had a measurable impact on the whole of humanity's grasp of math at this point.

juliuszkocinski

During the whole video I was thinking "ok but it can only encode as many 'ideas' as there are rows in the embedding vector, so 'basketball' or 'Michael' seem oddly specific when we're limited to such a low number". When you went over the superposition idea everything clicked, it makes so much more sense now! Thank you so much for making these videos, Grant!

mchammer

The script you ran with randomly distributed vectors was mind-opening, let alone once tuned - that's incredible. It's such an awesome quirk of high dimensions. I spent a good chunk of yesterday (should have spent a good chunk of today but oh well) working on animations to try to communicate traversing a high dimensional configuration space and why gradient ascent really sucks for one particular problem, so the whole topic couldn't be more top-of-mind. (my script already contains a plug for your previous video with the "directions have meanings" foundation. this series is so good!)

AlphaPhoenixChannel
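
A rough sketch of the kind of experiment that comment refers to, assuming NumPy; the dimension and vector count are placeholders, and the tuning step mentioned in the video is omitted, so this is random sampling only:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, count = 10_000, 1_000  # placeholder sizes

# Sample random directions and normalize them to unit length.
vecs = rng.normal(size=(count, dim))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

# Cosine similarity of every distinct pair; in high dimensions these
# cluster tightly around 0, i.e. the vectors are all nearly perpendicular.
cos = vecs @ vecs.T
off_diag = cos[~np.eye(count, dtype=bool)]
angles = np.degrees(np.arccos(off_diag))
print(f"pairwise angles: mean {angles.mean():.2f} deg, "
      f"min {angles.min():.2f} deg, max {angles.max():.2f} deg")
```

With these settings every pairwise angle lands within a few degrees of 90, which is the high-dimensional quirk the comment is pointing at; tuning the vectors afterwards, as in the video, pushes them even closer to perpendicular.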

I've been waiting for this video for so long. Thanks for this!

utsav

Fascinating stuff. I think the idea of polysemanticity is another fascinating way to explore why LLM scaling works so well: as more parameters are added, the number of ways they can combine with other parameters grows combinatorially (even faster than exponentially!)

Daniel_Van_Zant

I'm here to support all the other people like me who need to or choose to watch more than once to understand.

InternetOfGames

Fantastic explanation. Also, am I the only one who appreciates his correct use of "me" and "I"? So rare these days.

grotmx