[GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Yannic Kilcher • January 26, 2025

Yannic Kilcher

About

I make videos about machine learning research papers, programming, and issues of the AI community, and the broader impact of AI in society.

Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
BitChute: https://www.bitchute.com/channel/yannic-kilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/2017636191

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Latest Posts

[Video Response] What Cloudflare's code mode misses about MCP and tool calling

[Paper Analysis] On the Theoretical Limitations of Embedding-Based Retrieval (Warning: Rant)

On the Biology of a Large Language Model (Part 1)

Byte Latent Transformer: Patches Scale Better Than Tokens (Paper Explained)

You May Also Like

Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools (Paper Explained)

Mamba | A new idea for more efficient Transformers

Titans: Learning to Memorize at Test Time (Paper Analysis)

GRPO - Group Relative Policy Optimization - How DeepSeek trains reasoning models

If You Don’t Understand QKV, This Video Will Fix It

The Strange Math That Predicts (Almost) Anything

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)

The Code That Revolutionized Orbital Simulation

Something Weird Happens When E=−mc²

Why string theory isn't real physics | Roger Penrose, Brian Greene, and Eric Weinstein

Richard Sutton – Father of RL thinks LLMs are a dead end

How Attention Got So Efficient [GQA/MLA/DSA]

The Day Feynman Realized Students Knew NOTHING (Brazil Lecture, 1952)

Proximal Policy Optimization (PPO) & Group Relative Policy Optimization (GRPO) | Paper Explained

Scalable MatMul-free Language Modeling (Paper Explained)

Every Nation Is in Debt… So Who’s the Lender | Yanis Varoufakis

The Misconception that Almost Stopped AI [How Models Learn Part 1]

The Mathematical Foundations of Intelligence [Professor Yi Ma]

Byte Latent Transformer: Patches Scale Better Than Tokens (Paper Explained)

How does DeepSeek learn? GRPO explained with Triangle Creatures