He Co-Invented the Transformer. Now: Continuous Thought Machines [Llion Jones / Luke Darlow]
Video Description
The Transformer architecture (which powers ChatGPT and nearly all modern AI) might be trapping the industry in a localized rut, preventing us from finding true intelligent reasoning, according to the person who co-invented it. Llion Jones and Luke Darlow, key figures at the research lab Sakana AI, join the show to make this provocative argument, and also introduce new research (CTM) which might lead the way forwards.

We speak about:

"Inventor's Remorse" & The Trap of Success
Despite being one of the original authors of the famous "Attention Is All You Need" paper that gave birth to the Transformer, Llion explains why he has largely stopped working on them. He argues that the industry is suffering from "success capture": because Transformers work so well, everyone is focused on making small tweaks to the same architecture rather than discovering the next big leap.

**SPONSOR MESSAGES START**
Build your ideas with AI Studio from Google - http://ai.studio/build
Tufa AI Labs is hiring ML Research Engineers https://tufalabs.ai/
cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy
Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst
Submit investment deck: https://cyber.fund/contact?utm_source=mlst
**END**

The "Spiral" Problem
Llion uses a striking visual analogy to explain what current AI is missing. If you ask a standard neural network to understand a spiral shape, it solves it by drawing tiny straight lines that just happen to look like a spiral. It "fakes" the shape without understanding the concept of spiraling. They argue that today's AI models are similar: they are incredible at mimicking intelligent answers without having an internal process of "thinking".
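To make the spiral analogy concrete, here is a minimal sketch in plain Python. It is not the experiment discussed in the episode: it is a simplified stand-in that approximates an Archimedean spiral with straight segments, which is the kind of piecewise-linear function a ReLU network computes. The function names, the choice of spiral (r = theta) and the two-turn range are assumptions made for illustration only.

```python
import math

THETA_MAX = 4 * math.pi  # two full turns (an arbitrary choice for this sketch)

def spiral(theta):
    """Point on the Archimedean spiral r = theta."""
    return (theta * math.cos(theta), theta * math.sin(theta))

def piecewise_linear_spiral(theta, n_segments):
    """Approximate the spiral with n_segments straight pieces joined at knots."""
    width = THETA_MAX / n_segments
    # Index of the straight piece containing theta (clamped to the last piece).
    i = min(int(theta / width), n_segments - 1)
    x0, y0 = spiral(i * width)
    x1, y1 = spiral((i + 1) * width)
    t = (theta - i * width) / width  # position within the piece, roughly 0..1
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))

def max_error(n_segments, n_samples=2001):
    """Largest sampled distance between the true spiral and its straight-line stand-in."""
    worst = 0.0
    for k in range(n_samples):
        theta = THETA_MAX * k / (n_samples - 1)
        x, y = spiral(theta)
        px, py = piecewise_linear_spiral(theta, n_segments)
        worst = max(worst, math.hypot(x - px, y - py))
    return worst

if __name__ == "__main__":
    # More pieces give a closer fit, but every piece is still a straight line.
    for n in (8, 32, 128):
        print(n, round(max_error(n), 4))
```

Adding segments shrinks the error, yet no number of segments gives the approximation any notion of "spiraling": it only stores more knots. That is the gap the analogy points at.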
Introducing the Continuous Thought Machine (CTM)
Luke Darlow deep dives into their solution: a biology-inspired model that fundamentally changes how AI processes information.

The Maze Analogy: Luke explains that standard AI tries to solve a maze by staring at the whole image and guessing the entire path instantly. Their new machine "walks" through the maze step-by-step.

Thinking Time: This allows the AI to "ponder." If a problem is hard, the model can naturally spend more time thinking about it before answering, effectively allowing it to correct its own mistakes and backtrack, something current Language Models struggle to do genuinely.

The pair discuss the culture of Sakana AI, which is modeled after the early days of Google Brain/DeepMind. Llion nostalgically recalls that the Transformer wasn't born from a corporate mandate, but from random people talking over lunch about interesting problems.

https://sakana.ai/
https://x.com/YesThisIsLion
https://x.com/LearningLukeD

TRANSCRIPT:
https://app.rescript.info/public/share/crjzQ-Jo2FQsJc97xsBdfzfOIeMONpg0TFBuCgV2Fu8

TOC:
00:00:00 - Stepping Back from Transformers
00:00:43 - Introduction to Continuous Thought Machines (CTM)
00:01:09 - The Changing Atmosphere of AI Research
00:04:13 - Sakana’s Philosophy: Research Freedom
00:07:45 - The Local Minimum of Large Language Models
00:18:30 - Representation Problems: The Spiral Example
00:29:12 - Technical Deep Dive: CTM Architecture
00:36:00 - Adaptive Computation & Maze Solving
00:47:15 - Model Calibration & Uncertainty
01:00:43 - Sudoku Bench: Measuring True Reasoning

REFS:
Why Greatness Cannot Be Planned [Kenneth Stanley]
https://www.amazon.co.uk/Why-Greatness-Cannot-Planned-Objective/dp/3319155237
https://www.youtube.com/watch?v=lhYGXYeMq_E

The Hardware Lottery [Sara Hooker]
https://arxiv.org/abs/2009.06489
https://www.youtube.com/watch?v=sQFxbQ7ade0

Continuous Thought Machines [Luke Darlow et al / Sakana]
https://arxiv.org/abs/2505.05522
https://sakana.ai/ctm/
https://youtu.be/5X9cjGLggv0 (great walkthrough of the algorithm by Yacine Mahdid)

LSTM: The Comeback Story? [Prof. Sepp Hochreiter]
https://www.youtube.com/watch?v=8u2pW2zZLCs

Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis [Kumar/Stanley]
https://arxiv.org/pdf/2505.11581

Intelligent Matrix Exponentiation [Thomas Fischbacher] (Spiral reference)
https://arxiv.org/abs/2008.03936

A Spline Theory of Deep Networks [Randall Balestriero]
https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf
https://www.youtube.com/watch?v=86ib0sfdFtw
https://www.youtube.com/watch?v=l3O2J3LMxqI

On the Biology of a Large Language Model [Anthropic, Jack Lindsey et al]
https://transformer-circuits.pub/2025/attribution-graphs/biology.html

The ARC Prize 2024 Winning Algorithm [Daniel Franzen and Jan Disselhoff] “The ARChitects”
https://www.youtube.com/watch?v=mTX_sAq--zY

Neural Turing Machine [Graves]
https://arxiv.org/pdf/1410.5401

Adaptive Computation Time for Recurrent Neural Networks [Graves]
https://arxiv.org/abs/1603.08983

Sudoku Bench [Sakana]
https://pub.sakana.ai/sudoku/

