Proximal Policy Optimization (PPO) - How to train Large Language Models

Serrano.Academy • November 24, 2024

Serrano.Academy

About

Welcome to Serrano.Academy! I'm Luis Serrano and I love demystifying concepts, capturing their essence, and sharing these videos with you. I prefer illustrations, analogies, and cartoons to formulas (although we don't shy away from the math when needed). My main topics are machine learning and mathematics (probability and statistics), but I'm open to many others. If you have a topic you'd like to suggest, feel free to add it in the comments or drop me a line! For more information, check out http://serrano.academy. And also check out my book, Grokking Machine Learning: http://manning.com/books/grokking-machine-learning (40% discount code: serranoyt)

Latest Posts

Will AI help us, or make us dependent? - A Tale of Two Cities

Strengths and Weaknesses of Large Language Models

Keys, Queries, and Values: The celestial mechanics of attention

Model that won the 2024 Physics Nobel Prize - Hopfield Networks

You May Also Like

Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning

A friendly introduction to deep reinforcement learning, Q-networks and policy gradients

Proximal Policy Optimization (PPO) & Group Relative Policy Optimization (GRPO) | Paper Explained

How Attention Got So Efficient [GQA/MLA/DSA]

Reinforcement Learning with Human Feedback (RLHF) - How to train and fine-tune Transformer Models

[Full Workshop] Reinforcement Learning, Kernels, Reasoning, Quantization & Agents — Daniel Han

The Attention Mechanism in Large Language Models

Proximal Policy Optimization (PPO) is Easy With PyTorch | Full PPO Tutorial

Policy Gradient Methods | Reinforcement Learning Part 6

The Code That Revolutionized Orbital Simulation

Reinforcement Learning from Human Feedback: From Zero to chatGPT

GRPO - Group Relative Policy Optimization - How DeepSeek trains reasoning models

Every Nation Is in Debt… So Who’s the Lender | Yanis Varoufakis

Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained

What are Transformer Models and how do they work?

The Strange Math That Predicts (Almost) Anything

Master Reinforcement Learning With These 3 Projects

L4 TRPO and PPO (Foundations of Deep RL Series)

Proximal Policy Optimization (PPO) for LLMs Explained Intuitively

Deep Dive into LLMs like ChatGPT
