DeepSeek R1 Theory Tutorial – Architecture, GRPO, KL Divergence

Learn about DeepSeek R1's architecture from @deeplearningexplained. The course explores how R1 achieves strong reasoning through reinforcement learning, focusing on Group Relative Policy Optimization (GRPO) and how it improves on the traditional PPO approach. You'll also learn the role KL divergence plays in keeping training stable, with practical code examples and clear mathematical explanations.
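
As a rough illustration of the group-relative idea behind GRPO (a minimal sketch, not the course's code): instead of a learned value model as in PPO, each sampled completion's advantage is computed from the reward statistics of its own group of completions for the same prompt. The reward values and group size below are hypothetical placeholders.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Group-relative advantages: normalize each completion's reward by the
    mean and std of its group (one group of sampled completions per prompt).

    rewards: tensor of shape (num_groups, group_size), one scalar reward
    per sampled completion.
    """
    mean = rewards.mean(dim=1, keepdim=True)   # per-group mean
    std = rewards.std(dim=1, keepdim=True)     # per-group std
    return (rewards - mean) / (std + eps)      # no value network needed

# Hypothetical example: 2 prompts, 4 sampled completions each
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(grpo_advantages(rewards))
```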
Contents
⌨️ (0:00:00) Introduction
⌨️ (0:01:49) R1 Overview - Overview
⌨️ (0:03:52) R1 Overview - DeepSeek R1-zero path
⌨️ (0:05:32) R1 Overview - Reinforcement learning setup
⌨️ (0:08:36) R1 Overview - Group Relative Policy Optimization (GRPO)
⌨️ (0:13:04) R1 Overview - DeepSeek R1-zero result
⌨️ (0:16:53) R1 Overview - Cold start supervised fine-tuning
⌨️ (0:17:44) R1 Overview - Consistency reward for CoT
⌨️ (0:18:35) R1 Overview - Supervised fine-tuning data generation
⌨️ (0:21:06) R1 Overview - Reinforcement learning with neural reward model
⌨️ (0:22:53) R1 Overview - Distillation
⌨️ (0:26:16) GRPO - Overview
⌨️ (0:26:55) GRPO - PPO vs GRPO
⌨️ (0:30:25) GRPO - PPO formula overview
⌨️ (0:33:25) GRPO - GRPO formula overview
⌨️ (0:36:48) GRPO - GRPO pseudo code
⌨️ (0:38:56) GRPO - GRPO Trainer code
⌨️ (0:49:24) KL Divergence - Overview
⌨️ (0:49:55) KL Divergence - KL Divergence in GRPO vs PPO
⌨️ (0:51:20) KL Divergence - KL Divergence refresher
⌨️ (0:55:32) KL Divergence - Monte Carlo estimation of KL divergence
⌨️ (0:56:43) KL Divergence - Schulman blog
⌨️ (0:57:38) KL Divergence - k1 = log(q/p)
⌨️ (1:00:01) KL Divergence - k2 = 0.5*log(p/q)^2
⌨️ (1:02:19) KL Divergence - k3 = (p/q - 1) - log(p/q)
⌨️ (1:04:44) KL Divergence - benchmarking
⌨️ (1:07:28) Conclusion
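
For reference, the three KL estimators from Schulman's blog covered in the KL Divergence chapters (k1, k2, k3) can be sketched as a short Monte Carlo experiment. This is a minimal sketch; the two Normal distributions are arbitrary placeholders, not taken from the course.

```python
import torch
import torch.distributions as dist

# Estimate KL(q || p) from samples x ~ q.
q = dist.Normal(0.0, 1.0)
p = dist.Normal(0.1, 1.0)

x = q.sample((100_000,))
logr = p.log_prob(x) - q.log_prob(x)   # log(p(x)/q(x))

k1 = -logr                             # log(q/p): unbiased, high variance
k2 = 0.5 * logr ** 2                   # 0.5*log(p/q)^2: biased, lower variance
k3 = (logr.exp() - 1) - logr           # (p/q - 1) - log(p/q): unbiased, low variance

true_kl = dist.kl_divergence(q, p)
for name, k in [("k1", k1), ("k2", k2), ("k3", k3)]:
    print(name, k.mean().item(), "vs true", true_kl.item())
```

The k3 form is the estimator that appears in the GRPO objective, which is why the course benchmarks all three.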
🎉 Thanks to our Champion and Sponsor supporters:
👾 Drake Milly
👾 Ulises Moralez
👾 Goddard Tan
👾 David MG
👾 Matthew Springman
👾 Claudio
👾 Oscar R.
👾 jedi-or-sith
👾 Nattira Maneerat
👾 Justin Hual
--