The MAD Podcast with Matt Turck
Episode: AI is Already Building AI | Google DeepMind’s Mostafa Dehghani
Date: April 2, 2026
Guest: Mostafa Dehghani, Google DeepMind
Host: Matt Turck
Episode Overview
This episode dives into the rapidly evolving landscape of artificial intelligence building artificial intelligence, focusing on the concepts of recursive self-improvement, looping architectures, continual learning, and the transition from human-driven to autonomous model development. Mostafa Dehghani, a leading researcher at Google DeepMind and a core contributor to Universal Transformers, Vision Transformer, and Gemini, provides expert insights into the current frontier of AI research and development. The conversation covers key technical innovations, philosophical implications, model evaluation, data, and the future of enterprise AI.
Key Discussion Points & Insights
1. The Reality of Recursive Self-Improvement ("AI Building AI")
- Current State:
- AI models are already significantly involved in building next-generation models, using outputs from previous versions to drive progress.
- Human bottlenecks are being progressively minimized, but full automation and long-horizon improvement loops are still in development.
- "Most of the people don't realize that this is like already happening, especially over the past few months. In almost every lab, the new generation of the models are built heavily using the previous generation of the models." – Mostafa (00:00)
- What’s Missing:
- Full automation and long-term, self-contained improvement cycles (closing the self-improvement loop).
- "What is missing right now is long horizon and full automation and we are moving to that direction super, super fast." – Mostafa (00:18)
2. Looping: The Next Paradigm in Model Improvement
- Looping Defined:
- At the inference level: models use recursive processes to improve outputs (e.g., chain of thought, refining answers, negative sparsity).
- At the development level: removing humans from model improvement, letting AI iterate on architecture, data, and training recipes.
- "The self improvement and this loop into development is just the next step in the same direction. The whole point of it is you’re removing the human bottleneck and bias from improving these models." – Mostafa (03:24)
- Technical Mechanisms:
- Increasing test-time compute, reusing model components recursively, verifying solutions, and negative sparsity.
- Models internally reevaluate, adjust, or restart—just as researchers iterate their experiments.
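The inference-level loop described above can be sketched as a draft–verify–refine cycle. The example below is a toy stand-in assumed for this summary, not a real API: `draft`, `verify`, and `refine` are hypothetical placeholders for model calls, and the "problem" is just arithmetic so the loop's behavior is easy to follow.

```python
# Hypothetical sketch of an inference-time improvement loop: draft an
# answer, score it with a verifier, refine until the score stops
# improving or the test-time compute budget runs out. All three helper
# functions are illustrative stand-ins for model calls.

def draft(problem):
    # Stand-in for a model's first-pass answer.
    return problem * 2

def verify(problem, answer):
    # Stand-in verifier: higher score = closer to the target (problem * 10).
    return -abs(problem * 10 - answer)

def refine(problem, answer):
    # Stand-in refinement step: nudge the answer toward the target.
    return answer + (problem * 10 - answer) // 2

def improvement_loop(problem, budget=8):
    answer = draft(problem)
    score = verify(problem, answer)
    for _ in range(budget):           # test-time compute budget
        candidate = refine(problem, answer)
        candidate_score = verify(problem, candidate)
        if candidate_score <= score:  # stop when refinement plateaus
            break
        answer, score = candidate, candidate_score
    return answer

print(improvement_loop(3))  # → 29
```

The key design point mirrored from the discussion: extra answer quality is bought with extra compute at inference time, gated by a verifier rather than by a human.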
3. Recursive Self-Improvement: From Science Fiction to Reality
- From Vision to Practice:
- Once a science fiction concept, recursive self-improvement ("RSI") is now emerging in research and deployment.
- Projects like Karpathy's Auto-Research show AIs contributing to both research and engineering (07:32).
- "Maybe, basically, that kind of golden part of the recipe, a successful recipe that mostly coming from like intuition of a good researcher is coming to kind of these development loops by these models." – Mostafa (07:45)
- Roadblocks:
- Evaluation: Measuring model progress and improvement is hard, especially in complex or non-formal domains (10:14)
- Model collapse: Risk of overfitting to synthetic or self-generated data, leading to loss of generalization (15:01)
- Compute: Providing sufficient resources for increasingly autonomous and demanding models
4. Model Collapse and Generalization
- Definition:
- "When you have some sort of data and environment that these models are interacting with, but those environments and data are designed for example by another model… you become really, really good at this specific part and then suddenly you lose generalization to anything beyond that." – Mostafa (15:06)
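The fully closed loop Mostafa warns about can be illustrated with a toy simulation (an assumption of this summary, not something from the episode): if each generation trains only on the previous generation's own outputs, which over-weights already-likely outputs, modeled here as squaring and renormalizing a distribution, output diversity collapses onto a single mode.

```python
import math

# Toy model of collapse in a completely closed loop: each "generation"
# sharpens the previous generation's output distribution (p -> p^2,
# renormalized), a crude stand-in for training on self-generated data.
# Entropy (diversity) falls toward zero with no external anchor.

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def next_generation(p):
    sharpened = [x * x for x in p]       # favor already-likely outputs
    total = sum(sharpened)
    return [x / total for x in sharpened]

dist = [0.4, 0.3, 0.2, 0.1]              # generation 0: diverse outputs
for gen in range(10):
    dist = next_generation(dist)

print(round(max(dist), 3))               # nearly all mass on one mode
print(round(entropy(dist), 6))           # entropy has collapsed
```

This also shows why the "strong verifier or real reward signal" in the quote above matters: an external anchor breaks the pure self-reinforcement that drives the collapse.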
- Generalization vs. Specialization:
- Specialist models can accelerate progress and serve as stepping stones towards generalist AGI.
- "People don’t care what category their problem falls into... if a human calls something a problem, then AI should be able to solve it. And I think that’s fundamentally a generalist need." – Mostafa (17:45)
5. Data and Signals: The Expanding Meaning
- Beyond Tokens:
- In a world where AI self-improves, "data" evolves from static tokens to encompass any signals—sensory input, environment, interactions.
- Future value shifts towards building environments and richer real-world interaction, not just providing training data (24:09)
- “How can I make smell accessible to these models? … Data becomes like information or anything… right now I’m sitting here, I know how hard is my chair, what is the temperature of this room. All this sensory information…” – Mostafa (24:50)
6. Continual Learning vs. Self-Improvement
- Distinction:
- Self-improvement: The model gets "smarter," improves ability.
- Continual learning: The model stays up-to-date, doesn’t go stale as the world advances.
- "Self improvement is about a model getting smarter over time and improving its capability like the model itself doing it. Continual learning is mostly about a model staying current." – Mostafa (30:06)
- Research Status:
- Active area, not yet production-ready; catastrophic forgetting is a main challenge (31:53).
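Catastrophic forgetting, the main challenge mentioned above, is easy to reproduce in miniature. The following is a deliberately contrived sketch (not the episode's experiment): a one-parameter logistic model fully learns task A, then sequential training on a conflicting task B overwrites the weight, and task-A accuracy collapses.

```python
import numpy as np

# Minimal illustration of catastrophic forgetting: sequential training
# on task B (whose labels conflict with task A) erases task A.

def train(w, x, y, steps=200, lr=0.5):
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-w * x))   # sigmoid predictions
        w -= lr * np.mean((p - y) * x)     # full-batch gradient step
    return w

def accuracy(w, x, y):
    return float(np.mean(((w * x) > 0) == (y == 1)))

x = np.array([-2.0, -1.0, 1.0, 2.0])
task_a = (x > 0).astype(float)   # task A: positive inputs -> class 1
task_b = (x < 0).astype(float)   # task B: the opposite rule

w = train(0.0, x, task_a)
acc_a_before = accuracy(w, x, task_a)   # perfect after learning A

w = train(w, x, task_b)                 # continue training on B only
acc_a_after = accuracy(w, x, task_a)    # task A has been forgotten

print(acc_a_before, acc_a_after)        # 1.0 0.0
```

Real continual-learning research is about updating on task B (staying current) without this destructive interference with what was learned before.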
7. Pre-training and Post-training: Two Modes, Both Important
- Trends:
- Post-training (especially RLHF) currently yields strong short-term gains.
- Pre-training is foundational; “pre-training isn’t dead,” but its techniques evolve rapidly (28:14–28:23)
- Insight:
- "You can never post train your way out of a weak base model... At the end of the day I always expect to circle… between post training and pre training." – Mostafa (26:50)
8. Modalities and Multimodal Models (NanoBanana, Gemini)
- NanoBanana & Vision Transformer:
- Gemini is natively multimodal and can process & generate text, image, and more—breaking down boundaries between modalities.
- "You could apply a transformer architecture to image, wherein in the past you had two different families, you had the CNN world and the transformer world for text. And your breakthrough was to prove that transformer could scale equally well to images… paved the way to a Gemini 3 today." – Matt (42:56)
- Technical Approach:
- Simplicity: images are split into 16x16 patches and fed to a standard Transformer (40:13).
- Not just “translation”—models can now reason and generate imagery in a stepwise, interleaved fashion, leading to more flexible and powerful image generation (47:50).
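The 16x16 patching step has a very short implementation. Here is a minimal NumPy sketch; the random projection matrix is a stand-in for ViT's learned linear embedding, and the 224x224 input and 512-dim embedding are illustrative defaults, not anything specific to Gemini or NanoBanana.

```python
import numpy as np

# Split an (H, W, C) image into non-overlapping 16x16 patches, flatten
# each patch into a vector, and project to the transformer's embedding
# size -- the core of the ViT input pipeline.

def patchify(image, patch=16):
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    # (H, W, C) -> (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    patches = image.reshape(h // patch, patch, w // patch, patch, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # -> (num_patches, patch * patch * C): one "token" per patch
    return patches.reshape(-1, patch * patch * c)

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))         # a common ViT input size
tokens = patchify(image)                  # (196, 768)
embed = tokens @ rng.random((768, 512))   # stand-in learned projection
print(tokens.shape, embed.shape)
```

After this step the image is just a sequence of tokens, which is exactly why the same Transformer machinery built for text applies unchanged.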
- Efficiency Gains:
- Model size, MOE (Mixture of Experts), distillation, and infrastructure all contribute to “flash” speedups in NanoBanana 2 (53:04).
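Mixture of Experts, one of the efficiency levers listed above, can be sketched in a few lines. This is an illustrative top-k routing scheme with made-up sizes, not NanoBanana's or Gemini's actual configuration: a gate scores all experts per token, but only the top-k experts run, so per-token compute stays small even as total parameters grow.

```python
import numpy as np

# Illustrative top-k MoE routing: 16 experts exist, but each token is
# processed by only its 2 highest-scoring experts, weighted by the gate.

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(tokens, gate_w, experts, k=2):
    scores = softmax(tokens @ gate_w)        # (tokens, num_experts)
    out = np.zeros_like(tokens)
    expert_calls = 0
    for t in range(tokens.shape[0]):
        top = np.argsort(scores[t])[-k:]     # top-k experts for this token
        weights = scores[t, top] / scores[t, top].sum()
        for wgt, e in zip(weights, top):
            out[t] += wgt * (tokens[t] @ experts[e])
            expert_calls += 1
    return out, expert_calls

rng = np.random.default_rng(0)
d, num_experts, n_tokens = 8, 16, 4
tokens = rng.standard_normal((n_tokens, d))
gate_w = rng.standard_normal((d, num_experts))
experts = rng.standard_normal((num_experts, d, d))

out, expert_calls = moe_layer(tokens, gate_w, experts)
# 4 tokens x 2 experts = 8 expert calls, instead of 4 x 16 = 64 dense calls.
print(out.shape, expert_calls)
```

The same sparsity idea is why MoE models can grow parameter counts without a proportional increase in inference cost, which complements distillation and infrastructure work for "flash"-style speedups.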
Notable Quotes & Memorable Moments
- "Every time that we removed human judgment from this process, we kind of got over a bottleneck… I would say the self improvement and looping over the development is kind of like doing that at the highest level." — Mostafa (03:23)
- "If I want to put it very simply, it is really just the continuation of the trend that we've been riding for decades… it's the same story, just a new chapter of the same story." — Mostafa (02:19)
- "100%, at the end of the day, you can only improve what you can measure. And then getting evaluation is just hard." — Mostafa on evaluation as a bottleneck (10:14)
- "Model collapse mainly happens when you have a loop that is completely closed, right? … There's a good chance that your model collapses. But if you have a strong verifier or some sort of a real reward signal that anchor this kind of signals… it can be quite powerful." — Mostafa (14:11)
- "Short term, I would say building a specialist model is probably the fastest way to learn what is actually possible. And in many cases these specialized models are becoming stepping stone toward a generalist model, which is super valuable, right?" — Mostafa (16:40)
- "Sometimes I say like, 'Oh, this is going to happen in six months.' Never happened. Sometimes like, 'Oh, this is just so hard… absolutely no chance to solve it.' And then boom, in two months, three months, someone had a brilliant idea and they solve it. So it's like really hard to predict the future." — Mostafa (21:20)
- "Data becomes more about, okay, how can I give access to this specific model to something that we never had? For example… How can I make like smell accessible to these models?" — Mostafa (24:50)
- "Pre training isn’t dead... the way that we used to do pre training maybe like a year ago or two years ago, maybe diminishing return is obvious. But I can see how new ideas are bringing fresh energy into the pre training and suddenly just open a door toward… something exotic." — Mostafa (28:14)
Deep Dives into Must-Listen Segments
- Recursive Self-Improvement Explained [05:29–09:38]
  - How labs are already using AI to engineer next-generation AI.
  - Differences between partial automation and the full, closed self-improvement loop.
- Model Collapse and Generalization [15:01–17:45]
  - What model collapse is, how tight feedback loops can cause it, and striking the right balance between specialization and generalization.
- Continual Learning and Its Industry Impact [30:06–33:42]
  - Core definitions, challenges (e.g., catastrophic forgetting), and why current enterprise AI pipelines (e.g., RAG) may be disrupted by continual learning.
- Technical Origin Stories: Universal Transformer & Vision Transformer [33:55–43:46]
  - The story of choosing to intern with the "random" Transformer team, pioneering recursive deep learning architectures, and transforming multimodal AI.
- NanoBanana/Gemini and Image Generation Breakthroughs [43:46–53:04]
  - How natively multimodal models learn and generate images, and why this is fundamentally different from prior "translate and draw" approaches.
  - Incremental and interleaved image and text generation capabilities.
- Hot Takes and the Future [54:51–63:57]
  - What's being underestimated (jagged intelligence), what's underrated (continual learning), the fate of RAG, the need for new definitions of intelligence, and the hard math of reliable long-horizon automation.
Timestamps for Key Topics
| Time | Segment/Theme |
|-------|------------------------------------------------------------------|
| 00:00 | AI models are already building AI; closing the self-improvement loop |
| 01:33 | What "thinking in loops" means in AI |
| 05:29 | Recursive self-improvement as an emerging reality |
| 07:32 | Karpathy's Auto-Research and AI in research |
| 10:01 | Roadblocks: compute, evaluation, philosophical challenges |
| 14:11 | Model collapse: risks and mitigations |
| 16:40 | Spectrum of generalization vs. specialization |
| 24:09 | Data as signals and environment; future of interaction |
| 26:50 | Pre-training vs. post-training; short vs. long-term gains |
| 30:06 | Continual learning explained and contrasted with self-improvement |
| 33:55 | Mostafa's personal journey: internships, Universal Transformer |
| 40:13 | Vision Transformer (ViT): scaling images with transformers |
| 43:46 | NanoBanana/Gemini: how natively multimodal models work |
| 53:04 | Speed and efficiency innovations in modern image models |
| 54:51 | Hot takes: what the field misses, underrated and overrated trends |
| 60:09 | The hardest and most exciting problems ahead |
Final Thoughts and Takeaways
- Looping and recursion are here to stay and will likely yield dramatic advances as human bottlenecks give way to autonomous improvement cycles.
- Model evaluation and ensuring generalization remain open, philosophically challenging problems with major practical implications.
- Continual learning and new definitions of “data” will disrupt today's enterprise data pipelines and evaluation paradigms.
- Deep multimodality and incremental, agentic reasoning are unlocking new capabilities—models that reason in both images and text, and plan their outputs.
- The future of AI may belong to models that are grounded in the real world, able to update and self-improve as the world changes, and guided by ever more sophisticated evaluation metrics and philosophical clarity.
Closing Quote
"At the end of the day, we really need a systematic way of maybe defining intelligence. That is hard. And again, making progress based on what we have right now is good, but at some point that becomes a little bit more important to really pinpoint what is the target and what is the goal and then push toward that with maximum speed."
— Mostafa (63:26)
