Large Language Models in Five Formulas

Tutorial on building intuition about LLMs.

00:00 - Intro
02:15 - 1: Generation (Perplexity)
15:40 - 2: Memory (Attention)
28:00 - 3: Efficiency (GEMM)
38:40 - 4: Scaling (Chinchilla)
46:37 - 5: Reasoning (RASP)
55:33 - Conclusion
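
For reference, the standard textbook forms of the first four formulas (my own summary; the video's exact notation may differ, and the fifth item, RASP, is a small programming language for describing transformer computations rather than a single equation):

```latex
\mathrm{Perplexity:}\quad \mathrm{PPL}(x_{1:N}) = \exp\!\Big(-\tfrac{1}{N}\sum_{i=1}^{N}\log p(x_i \mid x_{<i})\Big)
\mathrm{Attention:}\quad \mathrm{Attn}(Q,K,V) = \mathrm{softmax}\!\Big(\tfrac{QK^\top}{\sqrt{d}}\Big)\,V
\mathrm{GEMM:}\quad C \leftarrow \alpha\,AB + \beta\,C
\mathrm{Chinchilla:}\quad L(N,D) = E + \tfrac{A}{N^{\alpha}} + \tfrac{B}{D^{\beta}}
```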

Developed for an invited tutorial at the Harvard Data Science Initiative.

Note: This tutorial is rather high-level and leaves out much of the scientific and citation history. There are other great guides that provide this in detail. My goal here was chalkboard-level intuition.
Comments

This appears to be a distillation of the most important concepts in large language models today. Thanks for the exposition.

nintishia

Extremely high-entropy video. Amazing clarity, delivery, content, and flow. Pure genius!

muhannadobeidat

This is a great modern supplement to Karpathy's guide to language models! Thanks, Sasha! Just subbed.

DistortedV

I found this to be an incredibly unique and interesting approach to explaining LLMs and an excellent introduction. Thank you so much for the video!

sarthak-ti

Thank you for making this video so interesting with those nice graphics and examples. I need to sit down and watch it attentively.

sheikhakbar

Excellent presentation! Easy to follow, with tons of great material, including the links to the slides.

joedigiovanni

Knowledge/sec in this video is off the charts, and the info is cutting edge!

icriou

Amazing content, thanks for putting this together!

donatocapitella

Thanks a lot Prof. Rush for this material.

syedmostofamonsur

For someone like me who is new to this field and wants to understand the nitty-gritty of language models, it's necessary to watch each part separately, understand it first, and then move on to the next part. But I can still sense how fantastically it is explained for those who have a basic understanding of deep learning.

arkaprovobhattacharjee

Thanks for the video, a good high-level overview. I also like the Excalidraw slides.

ItzGanked

Hey Sasha, what tools do you use to make your presentations? They're so different from the typical academic presentations :)

shubhamtoshniwal

This is very insightful. Thanks for posting!

pebre

This was a wonderful video. Thanks so much for this!

ChinaTalkMedia

Great complement to Karpathy's video.

FabienFabienB

Thanks for this awesome explanation! Can someone explain one point to me? The issue with argmax at 22:15 is that it has no derivative, so neural network parameters cannot be trained through it. If I understand correctly, the argmax selects the word that should be "attended to" when predicting the next word (park). Why is argmax the desired function here? What if the prediction of the next word depends not on the single most important word, but on the two most important words in the context? In that case, doesn't softmax have an additional benefit over the "naive" argmax, in that it can also produce distributions with more than one mode?

benjaminsteenhoek
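
A minimal numpy sketch of the softmax-vs-argmax contrast raised in the question above (the scores are made up for illustration, not taken from the video):

```python
import numpy as np

def softmax(scores):
    # Shift by the max for numerical stability before exponentiating.
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Hypothetical attention scores for one query against four context words.
scores = np.array([2.0, 1.9, -1.0, -1.2])

# argmax: a hard one-hot selection. It is piecewise constant in the
# scores, so its gradient is zero almost everywhere and backpropagation
# cannot train through it.
hard = np.zeros_like(scores)
hard[np.argmax(scores)] = 1.0
print(hard)             # [1. 0. 0. 0.]

# softmax: a smooth, differentiable relaxation. When two scores are
# close, it spreads weight over both words -- exactly the multi-modal
# behaviour the question asks about.
print(softmax(scores))  # ~[0.50, 0.45, 0.02, 0.02]
```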

Was the narration generated? I would love to use the same technique for narrating text.

AllNightLearner

At 32:41, isn't each element of AB the product of a row of A with a column of B? Waiting for your answer.

ZylinTeo
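
A quick numpy check of the definition stated in the question above: element (i, j) of AB is indeed the dot product of row i of A with column j of B (the matrices here are made up for illustration):

```python
import numpy as np

# Element (i, j) of AB is the dot product of row i of A with column j of B.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])
AB = A @ B  # [[19. 22.]
            #  [43. 50.]]

i, j = 0, 1
assert AB[i, j] == A[i, :] @ B[:, j]  # 1*6 + 2*8 == 22
```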

Well, every output must be mathematically derivable from what the model ingests, so can we not build a formula for every pattern of output? Say, the human sense and the grammatical sense of each word it constructs. And while it constructs an output, can it not also output how it did it?

martiancoders

WOOHOO! Just found this channel; it is almost better than porn. How do we give you our money so you keep making videos? Please tell us :o

Tubernameu