Let's reproduce GPT-2 (124M)

We reproduce GPT-2 (124M) from scratch. This video covers the whole process: first we build the GPT-2 network, then we optimize its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 papers and their hyperparameters, then we hit run and come back the next morning to see our results and enjoy some amusing model generations. Keep in mind that in some places this video builds on knowledge from earlier videos in the Zero to Hero playlist (see my channel). You could also see this video as building my nanoGPT repo, which by the end is about 90% similar.

Links:

Supplementary links:

Chapters:
00:00:00 intro: Let’s reproduce GPT-2 (124M)
00:03:39 exploring the GPT-2 (124M) OpenAI checkpoint
00:13:47 SECTION 1: implementing the GPT-2 nn.Module
00:28:08 loading the huggingface/GPT-2 parameters
00:31:00 implementing the forward pass to get logits
00:33:31 sampling init, prefix tokens, tokenization
00:37:02 sampling loop
00:41:47 sample, auto-detect the device
00:45:50 let’s train: data batches (B,T) → logits (B,T,C)
00:52:53 cross entropy loss
00:56:42 optimization loop: overfit a single batch
01:02:00 data loader lite
01:06:14 parameter sharing wte and lm_head
01:13:47 model initialization: std 0.02, residual init
01:22:18 SECTION 2: Let’s make it fast. GPUs, mixed precision, 1000ms
01:28:14 Tensor Cores, timing the code, TF32 precision, 333ms
01:39:38 float16, gradient scalers, bfloat16, 300ms
02:00:18 flash attention, 96ms
02:06:54 nice/ugly numbers. vocab size 50257 → 50304, 93ms
02:14:55 SECTION 3: hyperparameters, AdamW, gradient clipping
02:21:06 learning rate scheduler: warmup + cosine decay
02:26:21 batch size schedule, weight decay, FusedAdamW, 90ms
02:34:09 gradient accumulation
02:46:52 distributed data parallel (DDP)
03:10:21 datasets used in GPT-2, GPT-3, FineWeb (EDU)
03:23:10 validation data split, validation loss, sampling revive
03:28:23 evaluation: HellaSwag, starting the run
03:43:05 SECTION 4: results in the morning! GPT-2, GPT-3 repro
03:56:21 shoutout to llm.c, equivalent but faster code in raw C/CUDA
03:59:39 summary, phew, build-nanogpt github repo

Corrections:
I will post all errata and follow-ups to the build-nanogpt GitHub repo (link above)

SuperThanks:
I experimentally enabled them on my channel yesterday. Totally optional and only use if rich. All revenue goes to supporting my work in AI + Education.

Comments:

It’s rare to find such high-quality, free resources that make complex topics accessible and engaging!

kiw

Do not ever look at how long your videos are. Your content is perfect and you should keep explaining things step by step. You are doing a great job. I believe you will be remembered in history as one of the pillars of AI.

carlosgermosen

Thanks Andrej! You have taught me everything I know about the theory and practice of neural networks, starting with CS231n till now. I love how you explain things starting with simple examples to build intuitions (template matching for CV, bigram/table look up for sequence modelling), and then build to state of the art. Your lessons have had a profound impact on my learning, and I can imagine there are 1000s of engineers out there just like me.

zachyamaoka

Sorry, I love your videos and what you are doing for me. I couldn't attend Stanford or get into OpenAI, but learning from you is a blessing to me. I would pay you back 100 times in the coming years. I was watching your git repository for the last two months and could see many private code pushes, but I was confused about what you were working on. This is what you were working on: providing quality practical knowledge to all of us on YouTube.

neilamrathod

Thanks for spreading the knowledge! Happy to see a 4hr workout session 😅

manojr

Thanks! 4 hours of decoding a "Decoder-Transformer". Kudos, and I appreciate your existence in this field.

unclecode

My anterior mid-cingulate cortex is getting bigger just watching this video because it’s hard! Thank you for your lessons, master Karpathy.

anthonycho

The fact this video is free is incredible.

mohamedalansary

Thank you Andrej! Zero to Hero boosted my professional career!

marcotuc-ilmarinaio

I am an undergraduate student. This is the lost lecture that professors never touched upon but that is absolutely crucial, thank you!!
I especially love how you start from the basics for so many notions, and I really learned a lot.

chenmarkson

Thanks AK, appreciate you sharing your knowledge with the world!

tanaysood

My life is simple;
Andrej drops GPT-2 The Movie, I watch.

Doggidog

Hello Andrej, thank you so much for sharing and for the effort! Really appreciate it!

rainwang

Andrej is himself doing what OpenAI was supposed to do in the early days: make AI open. Thank you, Andrej!

CDK

You are the Excalibur of cutting through the hype. Thank you so much. Your ethics are inspiring, and your educational materials priceless.

PlasticCant

I've learned a lot from your Neural Network video playlist. Thank you

Themojii

4 hours of pure interest and understanding. Totally worth this donation. Thanks.

Thanks a lot for the great material. I really appreciate your videos. It takes a lot of effort and patience to come up with these. This is just a token of appreciation, not much but

kartikeyasharma

Thank you Andrej! You are really good at making seemingly difficult things so easy to understand. This makes learning so much easier and more fun.

xiaochen

You are the Math and CS teacher I never had in school. Loved the approach of building incrementally in code while developing solid intuition about the Math (Calculus) and Systems (GPU hardware).

ankugoel