Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm

Full coding of LLaMA 2 from scratch, with a full explanation, including Rotary Positional Embedding, RMS Normalization, Multi-Query Attention, KV Cache, Grouped Query Attention (GQA), the SwiGLU activation function, and more!
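
As a taste of two of the components named above, a minimal PyTorch sketch of RMSNorm and of the key/value-head repetition behind Grouped Query Attention (class and function names are illustrative, not the video's exact code):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer normalization, as used in LLaMA."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable gain (gamma)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root mean square of the features, then rescale.
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)

def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    # Grouped Query Attention: (B, Seq_Len, N_KV_Heads, Head_Dim)
    # -> (B, Seq_Len, N_KV_Heads * n_rep, Head_Dim), so the smaller set of
    # key/value heads can be shared across all query heads.
    if n_rep == 1:
        return x
    b, seq_len, n_kv_heads, head_dim = x.shape
    return (
        x[:, :, :, None, :]
        .expand(b, seq_len, n_kv_heads, n_rep, head_dim)
        .reshape(b, seq_len, n_kv_heads * n_rep, head_dim)
    )
```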

I explain the most commonly used inference methods: Greedy, Beam Search, Temperature Scaling, Random Sampling, Top K, and Top P.
I also explain the math behind the Rotary Positional Embedding, with step-by-step proofs.
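
To make the sampling strategies concrete, a minimal sketch (illustrative, not the video's exact code) of temperature scaling followed by Top P (nucleus) sampling over a vector of next-token logits:

```python
import torch

def sample_top_p(logits: torch.Tensor, temperature: float = 0.7, p: float = 0.9) -> int:
    # Temperature scaling: values < 1 sharpen the distribution, values > 1 flatten it.
    probs = torch.softmax(logits / temperature, dim=-1)
    # Keep the smallest set of tokens whose cumulative probability reaches p.
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    mask = cumulative - sorted_probs > p  # tokens lying entirely beyond the nucleus
    sorted_probs[mask] = 0.0
    sorted_probs /= sorted_probs.sum()
    # Sample within the nucleus and map back to the original vocabulary index.
    next_token = torch.multinomial(sorted_probs, num_samples=1)
    return int(sorted_idx[next_token])
```

Greedy decoding corresponds to simply taking `int(torch.argmax(logits))`, and Top K replaces the nucleus cut-off with a fixed number of candidate tokens.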

Prerequisites:

Chapters
00:00:00 - Introduction
00:01:20 - LLaMA Architecture
00:03:14 - Embeddings
00:05:22 - Coding the Transformer
00:19:55 - Rotary Positional Embedding
01:03:50 - RMS Normalization
01:11:13 - Encoder Layer
01:16:50 - Self Attention with KV Cache
01:29:12 - Grouped Query Attention
01:34:14 - Coding the Self Attention
02:01:40 - Feed Forward Layer with SwiGLU
02:08:50 - Model weights loading
02:21:26 - Inference strategies
02:25:15 - Greedy Strategy
02:27:28 - Beam Search
02:31:13 - Temperature
02:32:52 - Random Sampling
02:34:27 - Top K
02:37:03 - Top P
02:38:59 - Coding the Inference
Comments

Would love to see lighter-weight LLMs trained on custom datasets. Thanks for the video! This channel is a gold mine.

imbingle

No comments... I need to learn many things. Thank you very much for creating such interesting and helpful content.
I am fortunate that I found your channel.

sounishnath

Highly recommended for anyone who wants to understand open-source LLMs inside and out.

TheMzbac

Might you consider creating a Discord server? I'd love to hang out with the people who are watching these videos!

pi

Haven't watched the full video yet, but thanks for the promising content. Please keep it going.
I would like to see more of the environment setup and the debugging process.

gabchen

55:44 "I could have also written the code and not told you anything, but I like to give proof to what I do." Wow, thank you for going that extra mile; we really appreciate it.

Patrick-wnuj

Very good video. You have a knack for conveying complex content in an understandable format. Thank you, and keep up the great work!

RaghavendraK

You are a hidden gem; great explanations of both the theoretical and technical concepts.

mazenyasser

It's an honor to be among the 23,500 viewers who watched this video. Thank you so much, Umar Jamil, for your content <3

justcars

Marked for my next watch. Thanks for producing high-quality videos for this series. Hope you have fun in China.

dongdongqiaqia

Great video, @Umar.
I think on line 47 the transformation goes from (B, Seq_Len, H, Head_Dim) -> (B, Seq_Len, H, Head_Dim/2, 2).

abishekkamal
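
On the reshape discussed in the comment above, a small illustrative snippet (shapes only, not the video's actual line 47) showing the pairing step Rotary Positional Embedding performs before treating each pair of features as a complex number:

```python
import torch

B, seq_len, H, head_dim = 2, 8, 4, 16
x = torch.randn(B, seq_len, H, head_dim)

# Group consecutive features into pairs:
# (B, Seq_Len, H, Head_Dim) -> (B, Seq_Len, H, Head_Dim/2, 2)
x_pairs = x.float().reshape(B, seq_len, H, head_dim // 2, 2)

# Each (real, imaginary) pair becomes one complex number:
# (B, Seq_Len, H, Head_Dim/2, 2) -> (B, Seq_Len, H, Head_Dim/2)
x_complex = torch.view_as_complex(x_pairs)
print(x_complex.shape)  # torch.Size([2, 8, 4, 8])
```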

Thank you for such a detailed analysis of the architecture and implementation features of the model! You are very good at presenting information!

jflimnl

This is hardcore machine learning engineering!

saima

Very excited for this!!! Weekend is going to be fun!

ravimandliya

Thanks for explaining all of these concepts. Keep up the good work 😎

marshallmcluhan

EXCELLENT! I would like to see the same series with LLaVA.

renanangelodossantos

Umar bhai, your tutorials on transformer architectures and open-source LLMs are truly remarkable. As a Pakistani, seeing your expertise in deep learning is incredibly inspiring. Have you ever considered creating Urdu versions of your content? It could make your valuable knowledge more accessible to a wider audience. Your contributions are invaluable to the global tech community. Keep up the fantastic work! Huge fan of your work. May ALLAH bless you with health and success!

sharjeel_mazhar

Thank you so much for sharing this, it was really well done!

yonistoller

Thanks! I learned a lot from your excellent video.

n.

Can somebody help explain why, when calculating theta, we are not including the -2, e.g., theta = theta ** (-2 * theta_numerator / head_dim)?

jensenlwt
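
On the theta question above: in most RoPE implementations the -2 is already baked in, because the exponent's numerator steps over the even indices 0, 2, 4, ... and the base is inverted, so 1 / base^(2i / head_dim) equals base^(-2i / head_dim). A hedged sketch (variable names are illustrative, not necessarily the video's exact code):

```python
import torch

head_dim = 128
base = 10000.0

# theta_numerator = 2i for i = 0 .. head_dim/2 - 1, i.e. it already carries the "2".
theta_numerator = torch.arange(0, head_dim, 2).float()

# The reciprocal supplies the minus sign: 1 / base^(2i/head_dim) == base^(-2i/head_dim).
theta = 1.0 / (base ** (theta_numerator / head_dim))
```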