DeepSeek's GRPO (Group Relative Policy Optimization) | Reinforcement Learning for LLMs
Video Description
In this video, I break down DeepSeek's Group Relative Policy Optimization (GRPO) from first principles, without assuming prior knowledge of Reinforcement Learning. By the end, you’ll understand the core RL building blocks that led to GRPO, including:

🔵 Policy Gradient Methods
🔵 The REINFORCE Algorithm
🔵 Actor-Critic Models
🔵 PPO (Proximal Policy Optimization)
🔵 GRPO (Group Relative Policy Optimization)

Papers:
GRPO paper (DeepSeekMath): https://arxiv.org/pdf/2402.03300
DeepSeek-R1 paper: https://arxiv.org/pdf/2501.12948
PPO paper: https://arxiv.org/pdf/1707.06347
GAE paper: https://arxiv.org/pdf/1506.02438
TRPO paper: https://arxiv.org/pdf/1502.05477
Mother of all RL books (Sutton & Barto): http://incompleteideas.net/book/RLboo...

00:00 Intro
00:53 Where GRPO fits within the LLM training pipeline
04:17 RL fundamentals for LLMs
08:25 Policy Gradient Methods & REINFORCE
11:58 Reward baselines & Actor-Critic Methods
14:10 GRPO
21:42 Wrap-up: PPO vs GRPO
22:32 Research papers are like Instagram