DeepSeek R1 Theory Tutorial – Architecture, GRPO, KL Divergence

Learn about DeepSeek R1's innovative AI architecture from @deeplearningexplained. The course explores how R1 achieves exceptional reasoning through reinforcement learning, focusing on Group Relative Policy Optimization (GRPO) and how it improves upon traditional PPO methods. You'll also understand KL divergence's role in model stability, with practical code examples and clear mathematical explanations.
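As a rough illustration of the "group relative" idea in GRPO: several completions are sampled for the same prompt, and each completion's reward is scored against the rest of its group instead of against a learned value function as in PPO. The sketch below is not taken from the video. The function name, the use of the population standard deviation, and the `eps` constant are assumptions made for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one group of sampled completions.

    Hypothetical helper: each advantage is (reward - group mean) / group std.
    The population std and the eps value are illustrative assumptions.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt: two judged correct (1.0), two not (0.0)
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, so no separate critic network is needed to provide a baseline.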

Contents
⌨️ (0:00:00) Introduction
⌨️ (0:01:49) R1 Overview - Overview
⌨️ (0:03:52) R1 Overview - DeepSeek R1-zero path
⌨️ (0:05:32) R1 Overview - Reinforcement learning setup
⌨️ (0:08:36) R1 Overview - Group Relative Policy Optimization (GRPO)
⌨️ (0:13:04) R1 Overview - DeepSeek R1-zero result
⌨️ (0:16:53) R1 Overview - Cold start supervised fine-tuning
⌨️ (0:17:44) R1 Overview - Consistency reward for CoT
⌨️ (0:18:35) R1 Overview - Supervised Fine tuning data generation
⌨️ (0:21:06) R1 Overview - Reinforcement learning with neural reward model
⌨️ (0:22:53) R1 Overview - Distillation
⌨️ (0:26:16) GRPO - Overview
⌨️ (0:26:55) GRPO - PPO vs GRPO
⌨️ (0:30:25) GRPO - PPO formula overview
⌨️ (0:33:25) GRPO - GRPO formula overview
⌨️ (0:36:48) GRPO - GRPO pseudo code
⌨️ (0:38:56) GRPO - GRPO Trainer code
⌨️ (0:49:24) KL Divergence - Overview
⌨️ (0:49:55) KL Divergence - KL Divergence in GRPO vs PPO
⌨️ (0:51:20) KL Divergence - KL Divergence refresher
⌨️ (0:55:32) KL Divergence - Monte Carlo estimation of KL divergence
⌨️ (0:56:43) KL Divergence - Schulman blog
⌨️ (0:57:38) KL Divergence - k1 = log(q/p)
⌨️ (1:00:01) KL Divergence - k2 = 0.5*log(p/q)^2
⌨️ (1:02:19) KL Divergence - k3 = (p/q - 1) - log(p/q)
⌨️ (1:04:44) KL Divergence - benchmarking
⌨️ (1:07:28) Conclusion
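
The three estimators named in the KL Divergence chapters can be compared with a small Monte Carlo experiment. The sketch below is not the code shown in the video. It assumes samples x are drawn from q, with q = N(0, 1) and p = N(mu, 1), so the exact value KL(q || p) = mu^2 / 2 is known. The function name and default parameters are hypothetical.

```python
import math
import random
import statistics


def benchmark_kl_estimators(mu=0.1, n_samples=100_000, seed=0):
    """Compare k1, k2, k3 as estimators of KL(q || p) using samples x ~ q.

    Assumed setup for illustration: q = N(0, 1), p = N(mu, 1).
    """
    rng = random.Random(seed)
    true_kl = mu ** 2 / 2  # closed form for two unit-variance Gaussians
    samples = {"k1": [], "k2": [], "k3": []}
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)             # x ~ q
        log_ratio = mu * x - mu ** 2 / 2    # log(p(x) / q(x)) for this setup
        ratio = math.exp(log_ratio)         # p(x) / q(x)
        samples["k1"].append(-log_ratio)                  # log(q/p)
        samples["k2"].append(0.5 * log_ratio ** 2)        # 0.5 * (log(p/q))^2
        samples["k3"].append((ratio - 1.0) - log_ratio)   # (p/q - 1) - log(p/q)
    stats = {
        name: (statistics.fmean(vals), statistics.stdev(vals))
        for name, vals in samples.items()
    }
    return true_kl, stats


if __name__ == "__main__":
    true_kl, stats = benchmark_kl_estimators()
    print(f"true KL = {true_kl:.5f}")
    for name, (mean, std) in stats.items():
        print(f"{name}: mean = {mean:.5f}, std = {std:.5f}")
```

In this setup k1 and k3 are unbiased, k2 carries a small bias, and k2 and k3 have much lower variance than k1 because each of their samples is non-negative.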

🎉 Thanks to our Champion and Sponsor supporters:
👾 Drake Milly
👾 Ulises Moralez
👾 Goddard Tan
👾 David MG
👾 Matthew Springman
👾 Claudio
👾 Oscar R.
👾 jedi-or-sith
👾 Nattira Maneerat
👾 Justin Hual

--

Comments

Thank you everyone for watching, I hope this technical tutorial was useful.
Don't hesitate to reach out to me if you have any questions about the content or deep learning in general.

Also, do read the paper!
You have more than enough knowledge now to appreciate it fully! :)

Have a great day! 🌹

deeplearningexplained

I've been developing a custom LLM Controller Web Chat UI specifically built around using DeepSeek so this video came at the perfect time!!

NickDoddTV

This is an excellent walkthrough of the research papers, and I hope to see more content like this. Thank you!

jhonasttan

What are the prerequisites for this? I have very basic ML knowledge and have never gone hands-on with ML.

Iammuslim

Anyhow, since this video's description doesn't include the resources and papers he references: they are available across 3 videos on his own page.

mimosveta

I wanted to find the map he shows at 2:00, so I searched Google. It showed me some maps, and then some "red flags" about all the stocks falling on the US stock market ><

mimosveta

How does the implementation of embeddings and positional encoding correspond to the math theory? I'd like some intuition and a visual picture of what is really happening inside all the maths. Thanks for the video anyway. 👍

MdhotshotShotter

Erm erm emm ehh, sorry, but even at 1.5x speed this is annoying to listen to.

MiroKrotky