How might LLMs store facts | Chapter 7, Deep Learning

Unpacking the multilayer perceptrons in a transformer, and how they may store facts
An equally valuable form of support is to share the videos.

AI Alignment Forum post from the DeepMind researchers referenced at the video's start:

Anthropic posts about superposition referenced near the end:

Some added resources for those interested in learning more about mechanistic interpretability, offered by Neel Nanda

Mechanistic interpretability paper reading list

Getting started in mechanistic interpretability

An interactive demo of sparse autoencoders (made by Neuronpedia)

Coding tutorials for mechanistic interpretability (made by ARENA)

Sections:
0:00 - Where facts in LLMs live
2:15 - Quick refresher on transformers
4:39 - Assumptions for our toy example
6:07 - Inside a multilayer perceptron
15:38 - Counting parameters
17:04 - Superposition
21:37 - Up next

------------------

These animations are largely made using a custom Python library, manim. See the FAQ comments here:

All code for specific videos is visible here:

The music is by Vincent Rubinetti.

------------------

3blue1brown is a channel about animating math, in all senses of the word animate. If you're reading the bottom of a video description, I'm guessing you're more interested than the average viewer in lessons here. It would mean a lot to me if you chose to stay up to date on new ones, either by subscribing here on YouTube or otherwise following on whichever platform below you check most regularly.

Comments

Okay, the superposition bit blew my mind! The idea that you can fit so many nearly perpendicular vectors into higher-dimensional spaces is wild to even conceptualize, but so INSANELY useful, not just for embedding spaces, but also possibly for things like compression algorithms!
Thank you so much for this truly brilliant series!

PiercingSight

Still can't help but be blown away by the quality of pedagogy in these videos...

mr_rede_de_stone

The near-perpendicular embedding is wild! Reminds me of the ball in 10 dimensions.

ishtaraletheia

Broo, just watched chapter 4 and re-watched the previous three chapters. Your first three videos had just dropped when I was learning about neural networks in grad school, like perfect timing. Took a couple of years off drifting around. Now I'm going back into machine learning, hopefully gonna do a PhD, so I was re-watching the video series, then realized this one was rather new and got curious, noticed the last chapter 7 was missing, then checked your home page and lo and BEHOLD you'd released chapter 7 like 53 minutes ago. Talk about impeccable timing. I feel like you dropped these just for me to go into machine learning haha... kinda like "welcome back, let's continue". Anyway, thank you so much for taking me on this wonderful journey.

rigelr

This is simply the most comprehensible explanation of transformers anywhere. Both the script and the visuals are fantastic. I learned a lot. Thank you so much.

lesmcfarland

There are no words for how good this is and all of the 3Blue1Brown videos are. Thank you, Grant Sanderson, you are providing a uniquely amazing service to humanity.

Vadim-rhlr

The combination of Linear + ReLU + Linear, with the result added back onto the original, is the residual-block structure used in Residual Networks (see the sketch below). As 3b1b demonstrated in this video, the advantage residual networks have over a simple perceptron network is that the layers perturb (nudge) the input vector rather than replacing it completely.

SunnyKimDev
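
A minimal sketch of the block that comment describes, assuming PyTorch is available; the d_model and d_mlp sizes are arbitrary placeholders, not values from the video or any real model:

```python
import torch
import torch.nn as nn

class MLPBlock(nn.Module):
    """Linear -> ReLU -> Linear, with the result added back onto the input."""
    def __init__(self, d_model: int = 768, d_mlp: int = 3072):  # placeholder sizes
        super().__init__()
        self.up = nn.Linear(d_model, d_mlp)    # project the embedding up to a larger space
        self.act = nn.ReLU()                   # the video's toy example uses ReLU; real models often use GELU
        self.down = nn.Linear(d_mlp, d_model)  # project back down to the embedding size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: the block nudges x rather than replacing it.
        return x + self.down(self.act(self.up(x)))

vectors = torch.randn(4, 768)  # a toy batch of 4 embedding vectors
out = MLPBlock()(vectors)
print(out.shape)               # torch.Size([4, 768]) -- same shape in, same shape out
```

Because the output has the same shape as the input, these blocks can be stacked, each one adding a small correction to the running embedding.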

This video is pure gold! This complex topic is just so clearly and correctly explained in this video! I will show this to all my students in AI-related classes. Very highly recommended for everyone wanting to understand AI!

Gabriel-tpvc

Decided on a whim last night to get a refresher on the previous 2 videos in the series. Beautiful timing; great work as usual

kinderbeno

Hey Grant, I'm sure I can't understand everything from this series so I'm skipping this video, but the purpose of this comment is to thank you for creating the manim Python library. You took (I would say) "animation for education" to an entirely different level and encouraged many people to do it in your style of animation using manim. Because of you, indirectly, I'm learning many things on YouTube. Thanks again, and I wish you more success in your career with your loved ones.

pavithran

When I was giving a talk last year on how transformers worked... I envisioned something like this video in my mind. But of course it was amateur hour in comparison, both visually and in explanation. You are a true professional at this, Grant.

awelshphoto

You have explained this topic so well that it almost looks trivial. Amazing.

EzequielPalumbo

Unbelievable that such a clear explanation exists for something so "very hard".

Very excited for the next chapter!!

achmadzidan

Grant does such a good job of explaining these in an interesting manner - I bet 3b1b has had a measurable impact on the whole of humanity's grasp of math at this point.

juliuszkocinski

During the whole video I was thinking "ok but it can only encode as many 'ideas' as there are rows in the embedding vector, so 'basketball' or 'Michael' seem oddly specific when we're limited to such a low number". When you went over the superposition idea everything clicked, it makes so much more sense now! Thank you so much for making these videos, Grant!

mchammer

The script you ran with randomly distributed vectors was mind-opening, let alone once tuned - that's incredible. It's such an awesome quirk of high dimensions. I spent a good chunk of yesterday (should have spent a good chunk of today but oh well) working on animations to try to communicate traversing a high dimensional configuration space and why gradient ascent really sucks for one particular problem, so the whole topic couldn't be more top-of-mind. (my script already contains a plug for your previous video with the "directions have meanings" foundation. this series is so good!)

AlphaPhoenixChannel
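
A rough sketch of the kind of experiment that comment refers to, assuming NumPy; the dimension and vector count are placeholders, and the tuning step mentioned in the video is omitted, so this is random sampling only:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, count = 10_000, 1_000  # placeholder sizes

# Sample random directions and normalize them to unit length.
vecs = rng.normal(size=(count, dim))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

# Cosine similarity of every distinct pair; in high dimensions these
# cluster tightly around 0, i.e. the vectors are all nearly perpendicular.
cos = vecs @ vecs.T
off_diag = cos[~np.eye(count, dtype=bool)]
angles = np.degrees(np.arccos(off_diag))
print(f"pairwise angles: mean {angles.mean():.2f} deg, "
      f"min {angles.min():.2f} deg, max {angles.max():.2f} deg")
```

With these settings every pairwise angle lands within a few degrees of 90, which is the high-dimensional quirk the comment is pointing at; tuning the vectors afterwards, as in the video, pushes them even closer to perpendicular.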

I've been waiting for this video for so long. Thanks for this!

utsav

Fascinating stuff. I think the idea of polysemanticity is another fascinating way to explore why LLM scaling works so well: as more parameters are added, the number of ways they can combine with other parameters grows combinatorially (even faster than exponentially!)

Daniel_Van_Zant

I'm here to support all the other people like me who need to or choose to watch more than once to understand.

InternetOfGames

Fantastic explanation. Also, am I the only one who appreciates his correct use of "me" and "I"? So rare these days.

grotmx