The Key to Modern AI: How I Finally Understood Self-Attention (With PyTorch)

Understand the core mechanism that powers modern AI: self-attention. In this video, I break down self-attention in large language models at three levels: conceptual, process-driven, and implementation in PyTorch.

Self-attention is the foundation of technologies like ChatGPT and GPT-4, and by the end of this tutorial, you’ll know exactly how it works and why it’s so powerful.

Key Takeaways:
* High-Level Concept: Self-attention uses sentence context to dynamically update word meanings, mimicking human understanding.
* The Process: Learn how attention scores, weights, and value matrices transform input data into context-enriched embeddings.
* Hands-On Code: See step-by-step how to implement self-attention in PyTorch, including creating embeddings and computing attention weights (see the sketch after this list).
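
As a companion to the takeaways above, here is a minimal sketch of single-head self-attention in PyTorch. It is not the video's exact code: the sizes (`seq_len`, `d_model`) and the random weight matrices are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy setup: 4 tokens, each an 8-dim vector (sizes are arbitrary here).
seq_len, d_model = 4, 8
x = torch.randn(seq_len, d_model)  # stand-in for learned token embeddings

# Learned projections produce queries, keys, and values from the same input.
W_q = torch.randn(d_model, d_model)
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)
Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Attention scores: similarity of every query with every key,
# scaled by sqrt(d_k) to keep the softmax gradients well-behaved.
scores = Q @ K.T / (d_model ** 0.5)

# Softmax over each row turns scores into attention weights that sum to 1.
weights = F.softmax(scores, dim=-1)

# Each output is a weighted mix of the value vectors:
# the "context-enriched" representation of each token.
out = weights @ V
print(out.shape)  # torch.Size([4, 8])
```

Multi-head attention, masking, and batching are left out here to keep the core pipeline visible: scores, then softmax weights, then a weighted sum of values.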

By understanding self-attention, you’ll unlock the key to understanding transformers and large language models.
Comments

It's paramount to have a good understanding of word2vec (i.e., word embedding vectors), but even more important is understanding n-grams, so you can grasp why word embeddings were such a significant advancement in NLP.
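
To make that contrast concrete, here is a tiny, hypothetical sketch (the five-word vocabulary and the sentence are made up): n-gram features are sparse counts with no built-in notion of word similarity, while an embedding layer maps each word to a dense, learnable vector.

```python
import torch
import torch.nn as nn

# Hypothetical 5-word vocabulary, for illustration only.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
sentence = ["the", "cat", "sat", "on", "the", "mat"]

# n-gram (unigram) view: a sentence is a sparse count vector;
# two related words share nothing unless they are literally the same token.
counts = torch.zeros(len(vocab))
for w in sentence:
    counts[vocab[w]] += 1
print(counts)  # tensor([2., 1., 1., 1., 1.])

# Embedding view: each word id maps to a dense, learned vector,
# so related words can end up close together in the vector space.
emb = nn.Embedding(num_embeddings=len(vocab), embedding_dim=4)
ids = torch.tensor([vocab[w] for w in sentence])
vectors = emb(ids)    # shape (6, 4): one dense vector per token
print(vectors.shape)
```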

SkegAudio

Better explanation than most, keep up the good work.
There is no requirement for math to be boring or inscrutable.

adeadetayo

I think this is a very good, basic explanation. It's quite illustrative. I believe it's an excellent introduction to understanding how self-attention works in transformers and how it's implemented in large language models.

sapdalf

These whiteboard-style videos are really helpful. Keep it up! You've got a subscriber in me, and I look forward to seeing the channel grow!

ryparks

Subbed. I really like how you presented the topic. The software you use is great for breaking ideas down. I would have loved it if you had gone through the paper at the same time, breaking the complicated equations down into what “they’re essentially saying”.

coopernik

Great explainer! As for other videos I’d be interested in: what is the deal with positional encoding, specifically the current state of the art; how does a text embedding actually guide the diffusion process in image generation; and how is there even a gradient that can be useful in training these attention matrices?
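
As a starting point on the positional-encoding question, here is a minimal sketch of the original sinusoidal scheme from "Attention Is All You Need"; approaches closer to the current state of the art (e.g., rotary embeddings) address the same underlying need to inject token order into otherwise order-blind attention.

```python
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Classic fixed sinusoidal encoding from the 'Attention Is All You Need' paper."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)           # even dimensions
    angle = pos / (10000 ** (i / d_model))                         # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)  # even indices get sine
    pe[:, 1::2] = torch.cos(angle)  # odd indices get cosine
    return pe

# The encodings are simply added to the token embeddings before attention.
pe = sinusoidal_positional_encoding(seq_len=4, d_model=8)
print(pe.shape)  # torch.Size([4, 8])
```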

mshonle

Thanks for the explanation. I've been thinking about how to better describe LLMs in general, and one angle I came up with is that it's more like a language calculator.

Similar to how you wouldn't say a calculator understands arithmetic (it does arithmetic), LLMs don't understand language; they do language.

I don't know if others would agree with that or not.

AB-wfek

I'm still bummed out that people aren't able to draw a connection with older parsers and chatbots. I strongly feel that knowing how the Stanford parser or ChatScript works gives great insight into how LLMs work. LLMs would feel a lot less black-boxy, because they improve on those systems rather than exactly replacing them.

timmygilbert

Subbed. What tool are you using for your presentation?

michael_gaio

Why do you do this? Are you working at OpenAI/Anthropic developing LLMs?

MudroZvon

Just another video that describes but does not explain. Why is being able to describe so often confused with understanding?

yvesbernas

What happens when we choose not to speak anymore?
Who then is paying attention?

Jeremy-Ai

It repeats the same phrases and words as the paper and other videos, and doesn't explain or add anything.

davide