The MAD Podcast with Matt Turck
Episode: DeepMind Gemini 3 Lead: What Comes After "Infinite Data"
Date: December 18, 2025
Guest: Sebastien Bourgeau, Pretraining Lead for Gemini @ Google DeepMind
Host: Matt Turck
Overview of the Episode
This episode dives deeply into the making and implications of Google DeepMind’s Gemini 3, one of the most advanced AI models to date. Sebastien Bourgeau, pretraining lead for Gemini 3, provides inside perspectives on team structure, research trends, the shift from “infinite data” to “data limited” regimes, architectural decisions, synthetic data, future research directions, and his own storied journey into the heart of AI innovation. The conversation is both technical and accessible, reflecting on what makes leading-edge AI work—and where it’s headed next.
Key Discussion Points & Insights
1. The “Secret” Behind Gemini 3’s Leap
Timestamp: 00:58 – 03:04
- Not One Trick, But Many "Knobs":
Bourgeau emphasizes that Gemini 3's improvement over previous models isn't the result of a single radical change, but instead comes from the aggregation of numerous small and medium advancements, all delivered by a large, coordinated team.
"It's really a culmination of many, many changes and many, many things from a very large team that actually makes Gemini 3 so much better than the previous generations…" (01:31)
- From 'Model' to 'System':
The distinction is made between building AI models and now constructing AI systems, reflecting the complexity of modern state-of-the-art work.
2. Measuring Progress: Intelligence or Benchmarks?
Timestamp: 03:04 – 04:36
- Beyond Benchmarks:
While the ability of models to pass increasingly difficult benchmarks matters, internal usage and productivity gains are now a major signal of real capability.
"...the amount of time people spend using the model to make themselves more productive internally is increasing over time. Every new generation...can do new things and help us in our research...much more so than the previous generation." (03:35)
3. The Pace of AI Progress
Timestamp: 04:36 – 06:35
- Faster Than Expected:
Bourgeau openly admits that current progress has exceeded his own and his peers' expectations from a few years ago.
"If I'm being honest with myself, I think we're ahead of where I thought we could go." (04:56)
- Looking Forward:
He anticipates "large scientific discoveries" by AI models within the next few years and is excited about models aiding and accelerating both research and engineering.
4. AI for AI: Automation & Productivity
Timestamp: 06:35 – 07:45
- Not Full Automation, Yet:
The focus is on AI augmenting and accelerating researchers and engineers rather than fully automating their work. Expectations are for even faster iterative cycles thanks to AI assistance at higher levels of problem formulation and experimentation.
5. Industry Race & Differentiation Among Labs
Timestamp: 07:45 – 10:19
- Common and Divergent Paths:
Labs train on similar architectures (e.g., Transformers) but specialize in different research branches. DeepMind, for instance, excels in multimodal and vision capabilities.
- Team Scale:
Building a leading model now requires hundreds of researchers, not just small teams; nevertheless, there's potential for surprising disruption from smaller teams if innovation reduces the resource demand.
6. The Power of Full Stack Integration
Timestamp: 10:51 – 12:01
- Research ↔ Engineering Blur:
At Google DeepMind, the traditional boundaries between research and engineering have dissolved, with large-scale, reliable infrastructure now inseparable from cutting-edge research.
- Own Chips, Own Infra:
Gemini 3 was trained on TPUs, not Nvidia chips, reflecting true vertical integration.
7. Gemini Team Structure & Lead's Role
Timestamp: 12:01 – 13:39 & 25:51 – 26:13
- Large, Highly Coordinated Teams:
150–200 people work on Gemini pretraining alone, across data, models, infrastructure, and evaluation. Bourgeau's job is as much about integration and team enablement as it is about research design.
- Team Structure:
Organization into pretraining, post-training, and alignment teams; internal evaluations are crucial and growing in sophistication.
8. Bourgeau’s Personal Journey
Timestamp: 13:39 – 18:06
- Cross-European Upbringing, Serendipitous DeepMind Entry:
Moved from the Netherlands to Switzerland and Italy; chose Cambridge on a whim; joined DeepMind via a fortuitous referral after university.
- Early Projects:
Started in reinforcement learning but quickly pivoted to work with real-world data and large language models (LLMs), motivated by practicality and impact.
- Representation Learning & Big-Model Scaling:
Worked on Gopher, Chinchilla (the critical finding that data scale matters more than previously thought), and Retro (an architectural innovation enabling retrieval-augmented models); a sketch of the Chinchilla rule of thumb follows this list.
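For context, here is a minimal sketch of the compute-optimal rule of thumb from the published Chinchilla paper (Hoffmann et al., 2022): training FLOPs scale roughly as 6·N·D, and the compute-optimal token count D is roughly 20× the parameter count N. These numbers come from the paper, not from the episode.

```python
# Rough sketch of the Chinchilla compute-optimal rule of thumb
# (Hoffmann et al., 2022): training FLOPs ~ 6 * N * D, with the
# compute-optimal token count D roughly 20x the parameter count N.
# Illustrative only; the episode does not give these numbers.

def chinchilla_optimal(flops_budget: float, tokens_per_param: float = 20.0):
    """Return (params, tokens) that roughly exhaust a FLOPs budget."""
    # Solve 6 * N * (tokens_per_param * N) = flops_budget for N.
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    for budget in (1e21, 1e23, 1e25):
        n, d = chinchilla_optimal(budget)
        print(f"{budget:.0e} FLOPs -> ~{n:.2e} params, ~{d:.2e} tokens")
```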
9. “Research Taste” & The Art of Systems Research
Timestamp: 20:29 – 22:05
- Integration Over Standalone Success:
Research ideas must play well with others; robust progress is driven not by maximizing individual benchmarks but by system cohesion and simplicity.
"Your research has to play well with everyone else's research and has to integrate." (20:40)
- Allergic to Complexity:
Preference for lower-complexity, maintainable improvements even at a small cost to immediate performance, anticipating long-term progress.
10. The Shift: Infinite Data → Data-Limited Regime
Timestamp: 33:07 – 36:16
- Synthetic Data's Limits & a Shifting Paradigm:
While synthetic data is researched heavily, its effectiveness is nuanced. The bigger change is cultural: models are now leaving the "unlimited data" world for a "finite data" regime, which changes research strategies.
"...kind of a shift in paradigm, where before we were kind of scaling in the data unlimited regime and we're kind of shifting more to a data limited regime, which actually changes a lot of the research..." (33:35)
- Model Advances Now Drive Efficient Learning:
Model architecture now aims to either "do more with less" or reach the same quality with less data; nonetheless, there's still a gap between the data hunger of LLMs and how children learn. A back-of-the-envelope illustration of the data-limited constraint follows this list.
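To make the regime shift concrete, here is a back-of-the-envelope sketch. All corpus and budget numbers are hypothetical, not figures from the episode: once the training token budget exceeds the usable corpus, you are repeating data, i.e., in the data-limited regime.

```python
# Back-of-the-envelope sketch of the "data-limited" constraint the episode
# describes. All numbers are hypothetical, not from the episode: assume a
# fixed high-quality corpus and ask how many epochs a given training token
# budget implies over it.

CORPUS_TOKENS = 15e12               # hypothetical usable high-quality corpus
TOKEN_BUDGETS = [1e13, 5e13, 2e14]  # hypothetical training budgets

for budget in TOKEN_BUDGETS:
    epochs = budget / CORPUS_TOKENS
    regime = "data-unlimited" if epochs <= 1.0 else "data-limited (repeats)"
    print(f"budget {budget:.0e} tokens -> {epochs:.1f} epochs -> {regime}")
```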
11. Architectural Details: Mixture-of-Experts (MoE) & Multimodality
Timestamp: 26:50 – 30:03
- MoE at the Core:
Gemini 3 has a transformer-based mixture-of-experts (MoE) architecture, dynamically routing computation across experts for efficiency and power; a minimal routing sketch appears after this list.
- Native Multimodality:
All modalities (text, images, audio, video) are processed by the same model. This increases complexity and compute cost, but the unified approach is deemed critical for advanced capabilities.
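The episode confirms only that Gemini 3 is a transformer-based mixture-of-experts model; everything in the sketch below (sizes, top-k routing, the gating scheme) is a generic illustration of the technique, not Gemini's actual design.

```python
import numpy as np

# Minimal sketch of top-k mixture-of-experts routing, the general technique
# the episode attributes to Gemini 3. Generic illustration only; dimensions,
# expert count, and top_k are arbitrary.

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

router_w = rng.normal(size=(d_model, n_experts))  # router weights
experts = [rng.normal(size=(d_model, d_model)) * 0.02 for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top_k experts and mix their outputs."""
    logits = x @ router_w                          # (tokens, experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        gates = np.exp(chosen - chosen.max())
        gates /= gates.sum()                       # softmax over chosen experts
        for gate, e in zip(gates, top[t]):
            out[t] += gate * (x[t] @ experts[e])   # gated expert output
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_layer(tokens).shape)  # (4, 64): same shape, sparse compute per token
```

The point of the design is that each token only pays for `top_k` experts' compute while the model's total parameter count grows with `n_experts`.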
12. Continued Relevance of Scaling
Timestamp: 30:03 – 32:42
- Scaling Laws Not Dead:
Contrary to "death of scaling" claims, scale still matters in pretraining, but architecture and data innovation increasingly dominate model performance. (The parametric scaling-law fit from the Chinchilla paper is reproduced below for context.)
"Scale is a very important aspect ... but...architecture and data innovation...probably even more so than pure scale these days." (30:42)
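For reference, the parametric loss fit from the published Chinchilla paper (Hoffmann et al., 2022), which formalizes the parameters-versus-data tradeoff; this is background, not something quoted in the episode:

```latex
% Chinchilla parametric loss (Hoffmann et al., 2022); background, not from the episode.
% N = parameters, D = training tokens, E = irreducible loss;
% A, B, \alpha, \beta are fitted constants (the paper reports \alpha \approx 0.34, \beta \approx 0.28).
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```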
13. Key Challenges in Synthetic Data and Evals
Timestamp: 33:07 – 35:39 & 39:21 – 41:48
- Synthetic Data Usefulness and Traps:
Using strong models to generate synthetic data for smaller models is common, but generating synthetic data that improves a future, stronger model remains an open challenge.
- Internal Evals Are Crucial:
External benchmarks quickly become "contaminated" and must be replaced with carefully held-out, internally created evals to maintain reliable measurement. A minimal sketch of a contamination check follows this list.
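The episode only names the contamination problem; one standard way such checks are implemented (an assumption here, not a description of DeepMind's pipeline) is n-gram overlap between training documents and eval items:

```python
# Minimal sketch of an n-gram overlap check for benchmark "contamination"
# in training data. Generic illustration of the idea the episode raises,
# not DeepMind's actual pipeline.

def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """All word-level n-grams of a text, lowercased."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(train_doc: str, eval_item: str, n: int = 8) -> bool:
    """Flag a training document that shares any n-gram with an eval item."""
    return bool(ngrams(train_doc, n) & ngrams(eval_item, n))

eval_q = "What is the capital of France and when was it founded exactly"
doc_clean = "The Loire valley is known for its chateaux and vineyards."
doc_leaky = "Quiz key: what is the capital of france and when was it founded exactly? Paris."
print(is_contaminated(doc_clean, eval_q), is_contaminated(doc_leaky, eval_q))
# -> False True
```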
14. Ongoing Research Themes
Timestamp: 37:18 – 38:40 & 48:17 – 49:32
- Long Context Windows:
Increasing model context length allows handling of more complex tasks (e.g., multi-file codebases).
- Attention Mechanism Advances:
Recent breakthroughs on the attention side are anticipated to shape coming research.
- End-to-End Retrieval and Search:
The future likely lies in making retrieval part of the differentiable pretraining process, not just a post-training add-on: "learning search" within the model, though this is still an emerging area. A minimal sketch of differentiable retrieval follows this list.
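To illustrate what "differentiable retrieval" can mean (a generic sketch, not Gemini's or Retro's actual mechanism): replace a hard top-1 lookup with a softmax over document similarities, so gradients can flow through the retrieval step during pretraining.

```python
import numpy as np

# Minimal sketch of differentiable ("soft") retrieval: the query attends
# softly over document embeddings instead of doing a hard top-1 lookup, so
# the retrieval step is trainable end to end. Generic illustration of the
# direction the episode describes; sizes are arbitrary.

rng = np.random.default_rng(1)
d, n_docs = 32, 100
doc_embeddings = rng.normal(size=(n_docs, d))

def soft_retrieve(query: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Return a softmax-weighted mixture of document embeddings."""
    scores = doc_embeddings @ query / temperature  # similarity per document
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # differentiable "search"
    return weights @ doc_embeddings                # blended retrieved vector

query = rng.normal(size=d)
print(soft_retrieve(query).shape)  # (32,): a soft retrieval result
```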
15. Alignment & Data Curation
Timestamp: 42:28 – 43:32
- Alignment Is Mostly Post-Training:
But some elements affect pretraining, especially data selection:
"At a fundamental level you do need the model to know about those things. So you have to train a bit, at least on those, so that it knows ... to stay away from those." (43:07)
16. Chain-of-Thought, Agentic Models, and Vibe Coding
Timestamp: 43:45 – 46:26
- Model "Thinking" Now Explicit:
New research (e.g., Deep Think, agentic systems) allows models to explicitly generate "thoughts," hypotheses, and tool calls before answering, moving beyond simple token streaming to a more deliberative, agentic mode; a minimal sketch of such a loop follows this list.
- Vibes & Subjective Feel:
The elusive concept of "vibes" (model "feel" or personality) is influenced more by pretraining than post-training, though opinions differ widely.
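A minimal sketch of the think/act/answer control flow described above; `call_model` and `run_tool` are hypothetical stand-ins for a real LLM and tool executor, so the loop structure, not the stubs, is the point.

```python
# Minimal sketch of a deliberative "think, act, answer" agentic loop like the
# behavior described in the episode. `call_model` and `run_tool` are
# hypothetical stand-ins, not a real API.

def call_model(prompt: str) -> str:
    # Hypothetical model call; a real system would query an LLM here.
    return "FINAL: 42" if "observation" in prompt else "TOOL: search(meaning of life)"

def run_tool(request: str) -> str:
    # Hypothetical tool execution (search, code, etc.).
    return "observation: the answer is 42"

def agent(question: str, max_steps: int = 5) -> str:
    transcript = f"question: {question}"
    for _ in range(max_steps):
        step = call_model(transcript)        # model emits a thought/tool call
        if step.startswith("FINAL:"):
            return step.removeprefix("FINAL:").strip()
        transcript += "\n" + run_tool(step)  # feed the observation back in
    return "no answer within budget"

print(agent("What is the meaning of life?"))  # -> 42
```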
17. The Future: Continual Learning & Practical Concerns
Timestamp: 46:26 – 49:32
- Continual Learning:
Desire to update models incrementally as world knowledge shifts; currently best approached via retrieval and long context, with true continual/streaming pretraining as a research frontier.
- Cost & Efficiency at Deployment:
With usage exploding, serving costs and inference efficiency are chief concerns for pretraining teams as well.
18. Startup & Career Advice
Timestamp: 49:32 – 53:35
- System-Minded Researchers Have a "Superpower":
The biggest skill gap is blending research acumen with systems (hardware, infrastructure) knowledge.
- Beware Niche Models; Generalist Models Are Catching Up:
Many tasks that once required specialized models are rapidly being absorbed by ever more capable generalist models.
19. Personal Motivations & Team Dynamics
Timestamp: 53:40 – 54:28
- Collective Brilliance:
Bourgeau stays motivated by daily collaboration with world-class colleagues and the non-stop march of progress; he sees no sign of a slowdown.
Notable Quotes & Memorable Moments
- On Team Collaboration:
"Being able to get progress out of everyone is really what makes us make the most progress, rather than enabling maybe one or two or a small group of 10 people to run ahead of everyone else..." (12:26)
- On Research Taste:
"Being allergic to complexity...we have a certain budget of complexity we can use...so oftentimes, we don't necessarily want to use the best performance version...but rather trade off some of the performance for a slightly lower complexity version because we think that will allow us to do more and more progress in the future." (20:40)
- On AI Acceleration:
"If we had a lot more compute, I think we'd make a lot more progress a lot quicker." (22:05)
- On the Fast-Changing Industry:
"Now, kind of believe that for generalist task or tasks which...don't require extremely specialized models, trying to use a generalist model...the next version might be able to do that." (51:48)
- On the Continuing Journey:
"There are just so many different things that will compound and different things where there's headroom to improve. I'm really curious because right now I don't really see an end in sight..." (53:40)
Timestamps for Major Segments
- 00:58 — Gemini 3’s “secret” and leap over predecessors
- 03:35 — Measuring real intelligence versus benchmarks
- 04:56 — Are we ahead or behind expectations in AI progress?
- 06:59 — AI augmenting AI research; productivity acceleration
- 07:45 — Industry race: differentiation and convergence of labs
- 12:01 — Gemini team structure & pretraining lead responsibilities
- 13:39 — Bourgeau’s personal background and DeepMind journey
- 18:10 — Chinchilla & Retro: critical research on scaling and retrieval
- 20:29 — “Research taste” and the art of productive, integrated research
- 26:50 — Gemini 3 architecture deep dive: mixture-of-experts, multimodality
- 30:42 — Are scaling laws still valid?
- 33:35 — Synthetic data, data-limited paradigms
- 39:21 — The challenge of evals for large models
- 42:28 — Team structure: pretraining, post-training, alignment
- 43:45 — "Deep Think" & agentic, chain-of-thought models
- 46:26 — Continual learning as a future paradigm
- 49:32 — Career advice: the importance of systems skills
- 51:48 — Over-invested areas: generalist models supplanting niche approaches
- 53:40 — What excites Bourgeau about the future—collaboration and compounding progress
Conclusion
This is a must-listen episode for anyone interested in how cutting-edge AI models are actually built. Bourgeau provides rare, grounded insight into the realities of large-team, system-level research, the direction of the broader AI field, and the enduring importance of research taste, simplicity, and systems engineering. Gemini 3 is presented not as a stroke of genius but as a testament to relentless, cumulative progress: a reminder that, in AI, the future will likely be built one carefully considered tradeoff at a time.
