Stanford CS25: V4 I Demystifying Mixtral of Experts
Video Description
April 25, 2024
Speaker: Albert Jiang, Mistral AI / University of Cambridge

Demystifying Mixtral of Experts

In this talk I will introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combines their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. I will go into the architectural details and analyse the expert routing decisions made by the model.

About the speaker: Albert Jiang is an AI scientist at Mistral AI, and a final-year PhD student at the computer science department of Cambridge University. He works on language model pretraining and reasoning at Mistral AI, and language models for mathematics at Cambridge.

More about the course can be found here: https://web.stanford.edu/class/cs25/
View the entire CS25 Transformers United playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNiJRchCzutFw5ItR_Z27CM
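The top-2 routing described in the abstract can be sketched roughly as follows. This is a minimal illustration, not Mistral's implementation: the experts here are toy linear maps standing in for Mixtral's SwiGLU feedforward blocks, and all names and shapes are illustrative. The router scores all experts, keeps the two highest-scoring ones, renormalizes their gate values with a softmax, and returns the gated sum of the selected experts' outputs, so only a fraction of the parameters are active per token.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def moe_layer(x, experts, router, top_k=2):
    """Sparse MoE feedforward for a single token state x.

    x:       (d,) token hidden state
    experts: list of (d, d) matrices (toy linear experts)
    router:  (n_experts, d) gating matrix
    """
    logits = router @ x                    # one score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the top_k experts
    gates = softmax(logits[top])           # renormalize over the selected experts
    # gated sum of the selected experts' outputs; the other experts are never evaluated
    return sum(g * (experts[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 8
x = rng.standard_normal(d)
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((n_experts, d))
y = moe_layer(x, experts, router)
print(y.shape)
```

Because the top-2 selection is recomputed at every layer and every timestep, different tokens can be served by different expert pairs, which is how the model reaches 47B total parameters while keeping only about 13B active per token.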