Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm

Full coding of LLaMA 2 from scratch, with a full explanation, including Rotary Positional Embedding, RMS Normalization, Multi-Query Attention, KV Cache, Grouped Query Attention (GQA), the SwiGLU activation function, and more!
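
As a taste of two of the components named above, a minimal PyTorch sketch of RMSNorm and of the key/value-head repetition behind Grouped Query Attention (class and function names are illustrative, not the video's exact code):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer normalization, as used in LLaMA."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable gain (gamma)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root mean square of the features, then rescale.
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)

def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    # Grouped Query Attention: (B, Seq_Len, N_KV_Heads, Head_Dim)
    # -> (B, Seq_Len, N_KV_Heads * n_rep, Head_Dim), so the smaller set of
    # key/value heads can be shared across all query heads.
    if n_rep == 1:
        return x
    b, seq_len, n_kv_heads, head_dim = x.shape
    return (
        x[:, :, :, None, :]
        .expand(b, seq_len, n_kv_heads, n_rep, head_dim)
        .reshape(b, seq_len, n_kv_heads * n_rep, head_dim)
    )
```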

I explain the most commonly used inference methods: Greedy, Beam Search, Temperature Scaling, Random Sampling, Top K, and Top P.
I also explain the math behind the Rotary Positional Embedding, with step-by-step proofs.
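
To make the sampling strategies concrete, a minimal sketch (illustrative, not the video's exact code) of temperature scaling followed by Top P (nucleus) sampling over a vector of next-token logits:

```python
import torch

def sample_top_p(logits: torch.Tensor, temperature: float = 0.7, p: float = 0.9) -> int:
    # Temperature scaling: values < 1 sharpen the distribution, values > 1 flatten it.
    probs = torch.softmax(logits / temperature, dim=-1)
    # Keep the smallest set of tokens whose cumulative probability reaches p.
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    mask = cumulative - sorted_probs > p  # tokens lying entirely beyond the nucleus
    sorted_probs[mask] = 0.0
    sorted_probs /= sorted_probs.sum()
    # Sample within the nucleus and map back to the original vocabulary index.
    next_token = torch.multinomial(sorted_probs, num_samples=1)
    return int(sorted_idx[next_token])
```

Greedy decoding corresponds to simply taking `int(torch.argmax(logits))`, and Top K replaces the nucleus cut-off with a fixed number of candidate tokens.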

Prerequisites:

Chapters
00:00:00 - Introduction
00:01:20 - LLaMA Architecture
00:03:14 - Embeddings
00:05:22 - Coding the Transformer
00:19:55 - Rotary Positional Embedding
01:03:50 - RMS Normalization
01:11:13 - Encoder Layer
01:16:50 - Self Attention with KV Cache
01:29:12 - Grouped Query Attention
01:34:14 - Coding the Self Attention
02:01:40 - Feed Forward Layer with SwiGLU
02:08:50 - Model weights loading
02:21:26 - Inference strategies
02:25:15 - Greedy Strategy
02:27:28 - Beam Search
02:31:13 - Temperature
02:32:52 - Random Sampling
02:34:27 - Top K
02:37:03 - Top P
02:38:59 - Coding the Inference
Comments

Would love to see lighter-weight LLMs trained on custom datasets. Thanks for the video! This channel is a gold mine.

imbingle

No comments... I need to learn many things. Thank you very much for creating such interesting and helpful content.
I am fortunate that I found your channel.

sounishnath

Highly recommended for anyone who wants to understand open-source LLMs inside and out.

TheMzbac

Might you consider creating a Discord server? I'd love to hang out with the people who are watching these videos!

pi

Haven't watched the full video yet, but thanks for the promising content. Please keep it going.
I would like to see more of the environment setup and the debugging process.

gabchen

55:44 "I could have also written the code and not told you anything, but I like to give proof to what I do." Wow, thank you for going that extra mile; we really appreciate it.

Patrick-wnuj

Very good video. You have a knack for conveying complex content in an understandable format. Thank you, and keep up the great work!

RaghavendraK

You are a hidden gem; great explanations of both the theoretical and technical concepts.

mazenyasser

It's an honor to be among the 23,500 viewers who watched this video. Thank you so much, Umar Jamil, for your content <3

justcars

Marked for my next watch. Thanks for producing high-quality videos for this series. Hope you have fun in China.

dongdongqiaqia

Great video, @Umar.
I think on line 47 the transformation goes from (B, Seq_Len, H, Head_Dim) -> (B, Seq_Len, H, Head_Dim/2, 2).

abishekkamal
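
On the reshape discussed in the comment above, a small illustrative snippet (shapes only, not the video's actual line 47) showing the pairing step Rotary Positional Embedding performs before treating each pair of features as a complex number:

```python
import torch

B, seq_len, H, head_dim = 2, 8, 4, 16
x = torch.randn(B, seq_len, H, head_dim)

# Group consecutive features into pairs:
# (B, Seq_Len, H, Head_Dim) -> (B, Seq_Len, H, Head_Dim/2, 2)
x_pairs = x.float().reshape(B, seq_len, H, head_dim // 2, 2)

# Each (real, imaginary) pair becomes one complex number:
# (B, Seq_Len, H, Head_Dim/2, 2) -> (B, Seq_Len, H, Head_Dim/2)
x_complex = torch.view_as_complex(x_pairs)
print(x_complex.shape)  # torch.Size([2, 8, 4, 8])
```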

Thank you for such a detailed analysis of the architecture and implementation features of the model! You are very good at presenting information!

jflimnl

This is hardcore machine learning engineering!

saima

Very excited for this!!! Weekend is going to be fun!

ravimandliya

Thanks for explaining all of these concepts. Keep up the good work 😎

marshallmcluhan

EXCELLENT! I would like to see the same series with LLaVA.

renanangelodossantos

Umar bhai, your tutorials on transformer architectures and open-source LLMs are truly remarkable. As a Pakistani, seeing your expertise in deep learning is incredibly inspiring. Have you ever considered creating Urdu versions of your content? It could make your valuable knowledge more accessible to a wider audience. Your contributions are invaluable to the global tech community. Keep up the fantastic work! Huge fan of your work. May ALLAH bless you with health and success!

sharjeel_mazhar

Thank you so much for sharing this, it was really well done!

yonistoller

Thanks! I learned a lot from your excellent video.

n.

Can somebody help explain why, when calculating theta, we are not including the -2, e.g., theta = theta ** (-2 * theta_numerator / head_dim)?

jensenlwt
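
On the theta question above: in most RoPE implementations the -2 is already baked in, because the exponent's numerator steps over the even indices 0, 2, 4, ... and the base is inverted, so 1 / base^(2i / head_dim) equals base^(-2i / head_dim). A hedged sketch (variable names are illustrative, not necessarily the video's exact code):

```python
import torch

head_dim = 128
base = 10000.0

# theta_numerator = 2i for i = 0 .. head_dim/2 - 1, i.e. it already carries the "2".
theta_numerator = torch.arange(0, head_dim, 2).float()

# The reciprocal supplies the minus sign: 1 / base^(2i/head_dim) == base^(-2i/head_dim).
theta = 1.0 / (base ** (theta_numerator / head_dim))
```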