Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning

Serrano.Academy June 23, 2024
