Reinforcement Learning with Human Feedback (RLHF), Clearly Explained!!!

StatQuest with Josh Starmer May 5, 2025


@statquest

About

Statistics, Machine Learning, Data Science, and AI seem like very scary topics, but since each technique is really just a combination of small and simple steps, they are actually quite simple. My goal with StatQuest is to break down the major methodologies into easy-to-understand pieces. That said, I don't dumb down the material. Instead, I build up your understanding so that you are smarter. Contact, Video Index, Etc: https://statquest.org

Video Description

Generative Large Language Models, like ChatGPT and DeepSeek, are trained on massive text-based datasets, like all of Wikipedia. However, this training alone fails to teach the models how to generate polite and useful responses to your prompts. Thus, LLMs rely on Supervised Fine-Tuning and Reinforcement Learning with Human Feedback (RLHF) to align the models with how we actually want to use them. This StatQuest explains every step in training an LLM, with special attention to how RLHF is done.

NOTE: This video is based on the original manuscript for InstructGPT: https://arxiv.org/abs/2203.02155

Also, you should check out Serrano Academy if you can: https://www.youtube.com/@SerranoAcademy

If you'd like to support StatQuest, please consider...
Patreon: https://www.patreon.com/statquest
...or...
YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join
...buying a book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
https://statquest.org/statquest-store/
...or just donating to StatQuest!
paypal: https://www.paypal.me/statquest
venmo: @JoshStarmer

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter: https://twitter.com/joshuastarmer

0:00 Awesome song and introduction
2:25 Pre-Training an LLM
5:06 Supervised Fine-Tuning
7:35 Reinforcement Learning with Human Feedback (RLHF)
10:07 RLHF - training the reward model
15:02 RLHF - using the reward model

#StatQuest
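As a rough illustration of the reward-model step the description mentions (this sketch is not from the video itself): in the InstructGPT paper, the reward model is trained on pairs of responses ranked by humans, using a pairwise loss of the form -log(sigmoid(r_chosen - r_rejected)). The function names below are my own illustrative choices.

```python
import math

def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise ranking loss for a reward model (InstructGPT-style).

    r_chosen:   reward score the model gave the human-preferred response
    r_rejected: reward score the model gave the rejected response

    The loss is small when the model scores the preferred response
    well above the rejected one, and large when it gets the order wrong.
    """
    return -math.log(sigmoid(r_chosen - r_rejected))
```

In practice the reward scores come from a neural network and the loss is minimized by gradient descent, but the key intuition is just this: the bigger the margin in favor of the human-preferred response, the smaller the loss.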
