The MAD Podcast with Matt Turck
Episode: Open Source AI Strikes Back — Inside Ai2’s OLMo 3 ‘Thinking’
Date: November 20, 2025
Guests: Nathan Lambert & Luca Soldaini (Allen Institute for AI)
Host: Matt Turck
Episode Overview
In this episode, Matt Turck welcomes Nathan Lambert and Luca Soldeny from the Allen Institute for AI (AI2) to discuss the launch of the Olmo 3 family of open-source AI models. The panel explores the current landscape of open source AI, contrasting US and Chinese efforts, and gives an unprecedented walkthrough of Olmo 3’s full transparency: from open weights to open data, recipes, and intermediate checkpoints. The discussion dives deep into model architecture, training processes, the significance and inner workings of "thinking models,” and the broader context of global AI competition.
Key Topics & Discussion Points
1. The Shift in Open Source AI Leadership
(00:00–00:39)
- Nathan Lambert notes a significant change in the "balance of power" in open source AI:
“There was a big change in leadership at Meta and Llama's future is unknown... there’s this big vacuum of influence which has been absorbed by the likes of Qwen, DeepSeek, Kimi (Moonshot)...” (00:00)
- The emergence and growing influence of Chinese open-source model labs.
2. Olmo 3: What's New & What's Open?
(01:27–03:36)
- Announcement:
“We’re launching the Olmo 3 family today. So this is our latest family of open source models... we’re releasing the entire recipe we follow to get this model. So the data, the intermediate states, the evaluation frameworks, all the details...”
— Luca Soldaini (01:27)
- Model lineup:
- 7B base
- 32B base
- 7B and 32B 'think' (reasoning) models
- 7B instruct (low latency, instruction-following) model
- Transparency:
“It’s the first fully open reasoning model where we show doing RL, base models, and distilling from bigger thinking models. And there’s a lot of discussion within the US that there’s good reason we should own the whole technological stack and that includes open models.”
— Nathan Lambert (00:22)
3. Deep Dive: Model Checkpoints and Their Use
(02:14–03:36)
- Base models prior to instruction fine-tuning—ideal for customization.
- Reasoning (“think”) models capable of sophisticated multi-step inference.
- Instruct model for faster responses and lightweight deployment.
“These are models that can spend compute at inference time to sort of think through a problem and solve it and then give you an answer at the end.”
— Luca Soldaini (02:14)
- True openness means releasing not just final weights, but also:
- All data used
- Evaluation frameworks
- Intermediate training checkpoints
4. What Makes Olmo 3 Stand Out?
(03:36–05:48)
- Benchmarking:
“This base model is similar in quality to the best available, which is like Qwen 2.5-32B... the upside is we have all the data so people can actually do some sort of continued pre-training and... modify and understand the behavior.”
— Nathan Lambert (03:36)
- Claims Olmo 3 7B outperforms Llama 3 8B on certain tasks and stands among the best worldwide in standard size categories.
- All datasets and post-training recipes are open, enabling continued innovation and reproducible research.
5. Data for Training: Dolma 3
(05:48–08:07)
- Dolma 3: custom pre-training dataset:
- 10T-token pool, with 6T tokens sampled using a published open-source algorithm that controls document repetition.
- A mid-training phase leverages focused datasets for better reasoning and math performance.
- A significant collection of long-context documents (PDFs): over 600B tokens from documents longer than 8,000 tokens.
“Historically of the data that is available openly out there for people to build a language model, you don’t have a lot of long document data...these are really good for people to develop other ways for models to understand very long inputs...”
— Luca Soldaini (05:53)
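The episode doesn't describe Dolma 3's sampling algorithm in detail, but the idea of filling a 6T-token budget from a 10T-token pool while controlling repetition can be sketched as a toy quality-weighted sampler with a per-source "epoch" cap (all names and numbers below are illustrative, not from the release):

```python
def sample_with_repetition_caps(sources, budget_tokens, max_epochs=4.0):
    """Toy sketch: fill a token budget from weighted sources, capping how
    many times any single source may be repeated (its 'epoch' count).

    sources: dict of name -> (available_tokens, quality_weight)
    Returns dict of name -> tokens to draw (repeats allowed up to the cap).
    """
    # Initial allocation proportional to quality weight.
    total_weight = sum(w for _, w in sources.values())
    alloc = {name: budget_tokens * w / total_weight
             for name, (_, w) in sources.items()}

    # Clip any source that would be repeated beyond the cap and
    # redistribute the surplus across the remaining sources.
    for _ in range(len(sources)):
        surplus = 0.0
        uncapped_weight = 0.0
        for name, (avail, w) in sources.items():
            cap = avail * max_epochs
            if alloc[name] > cap:
                surplus += alloc[name] - cap
                alloc[name] = cap
            elif alloc[name] < cap:
                uncapped_weight += w
        if surplus == 0 or uncapped_weight == 0:
            break
        for name, (avail, w) in sources.items():
            if alloc[name] < avail * max_epochs:
                alloc[name] += surplus * w / uncapped_weight

    return {name: int(t) for name, t in alloc.items()}

# Hypothetical pool: high-quality math data is small, so it hits the
# repetition cap and its surplus flows back to web text.
pool = {"web": (5_000_000, 1.0), "code": (500_000, 2.0), "math": (100_000, 3.0)}
mix = sample_with_repetition_caps(pool, 6_000_000)
```

The design choice mirrored here is that upweighting small, high-quality sources is bounded: past a few epochs, repeating the same documents stops helping, so the cap pushes the remaining budget to larger sources.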
6. Performance & Efficiency
(08:07–10:25)
- Base models are at parity with leading models like Qwen 2.5 and Gemma 3.
- Instruction-tuned and reasoning models are directly competitive with Qwen 3 and outperform many in specific tasks.
- Open-source enables rapid iteration and encourages others to build on Olmo 3.
7. What Does “Fully Open” Mean? Flavors of Openness
(10:25–12:51)
- Most open-source releases are actually “open weights”—no recipes, data, or intermediate states.
- AI2’s ambition: if it can be released, it is released—including all data, training steps, and intermediate checkpoints.
- These allow deeper research and easier model customization.
“If we can release it, we will release it...if people ask, hey, you described this part of your pipeline, but you haven’t put it out, we’ll release that part as well.”
— Luca Soldaini (10:38)
8. The 2025 Open Source Model Race: US vs China
(12:51–18:32)
- DeepSeek’s 2025 breakthrough sparked a surge of Chinese models.
- “Qwen is widely used in a way that people may not have completely realized.”
— Matt Turck (16:37)
- US companies, wary of using Chinese models for strategic and trust reasons, are seeking homegrown alternatives.
“A meaningful amount of people are trying open models for things and most of them are using Qwen.”
— Nathan Lambert (16:55)
9. Economic & Cultural Ecosystem Differences
(18:32–20:22)
- US companies pay for services and APIs (leading to closed, commercial models).
- Chinese companies more likely to release open models to achieve distribution and influence, knowing US companies are reluctant to buy closed-source APIs from abroad.
- The “DeepSeek Standard” in China: open releases become the norm.
10. The Atom Project and Organized US Response
(20:22–21:45)
- Atom: a US multi-stakeholder initiative for open models—media interest is surging as industry adapts.
11. What Are “Thinking Models”?
(22:11–24:16)
- Trained to use more inference-time compute, resulting in multi-step, highly capable outputs.
- Key for tasks requiring reasoning, math, code generation, and future tool use/agentic behaviors.
- Distinction vs. instruction-tuned models: slower generation, more powerful answers.
“A thinking model is really a way to train the model to exploit [inference time scaling]... the model therefore kind of has a step change where it’s way better at math tasks, coding tasks, agentic tasks.”
— Nathan Lambert (22:33)
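In practice, a thinking model emits its chain of thought in a delimited block before the user-visible answer. Assuming the common `<think>...</think>` convention (an assumption for illustration; the episode doesn't specify Olmo 3's exact format), separating the trace from the answer is a small parsing step:

```python
import re

def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Separate a reasoning model's thinking trace from its final answer.

    Assumes the chain of thought is wrapped in <think>...</think>
    (a common convention, not necessarily Olmo 3's).
    """
    pattern = re.escape(open_tag) + r"(.*?)" + re.escape(close_tag)
    match = re.search(pattern, text, flags=re.DOTALL)
    if not match:
        return "", text.strip()  # no trace found: whole text is the answer
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

trace, answer = split_reasoning(
    "<think>17 * 3 = 51, minus 9 is 42.</think>The answer is 42."
)
# trace  -> "17 * 3 = 51, minus 9 is 42."
# answer -> "The answer is 42."
```

This also shows why thinking models are slower: the tokens inside the trace are generated and paid for at inference time, even though only the final answer is shown to the user.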
12. AI2’s Origins, Mission, and Projects
(24:16–35:55)
- History: Paul Allen founded AI2 in 2014 to build AI systems for science.
- Projects:
- Olmo (language model family),
- Tulu (post-training suite),
- Asta (agentic tools for scientific tasks),
- Molmo (multimodal/robotics),
- AI for the Environment.
- AI2’s size: approx. 200 staff (research, engineering, comms, support).
- Both guests were drawn by the impact and openness of working at AI2 vs. closed, commercial labs.
13. Full Olmo 3 Pipeline Walkthrough
(37:11–78:12)
1. Pre-Training (47:00)
- Massive scale, methodical preparation.
- 6T token corpus, deduplication, domain-specific sampling.
2. Mid-Training (50:26)
- “Tail patching:” targeted data injection to correct knowledge gaps, without leaking test data.
3. Long-Context Extension (52:06)
- Extends model’s context from 4k/8k up to 65k or more—vital for reasoning chains and processing large inputs.
- “If you set up your model wrong, you’re never going to recover it.”
— Luca Soldaini (53:18)
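One widely used mechanism for this kind of context extension is positional interpolation of rotary embeddings (RoPE): compressing positions by a scale factor so long inputs stay within the position range the model saw in training. This is a generic sketch of that idea, not a claim about Olmo 3's specific method:

```python
import math

def rope_frequencies(head_dim, base=10_000.0, scale=1.0):
    """Inverse frequencies for rotary position embeddings (RoPE).

    scale > 1 implements simple positional interpolation: positions are
    compressed by `scale`, so a model trained on 8k tokens can be run at
    8k * scale without positions exceeding the trained range.
    (Illustrative; the episode does not specify Olmo 3's technique.)
    """
    inv_freq = [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    return [f / scale for f in inv_freq]

def rotation_angles(pos, inv_freq):
    """Cos/sin rotation components for one token position."""
    return [(math.cos(pos * f), math.sin(pos * f)) for f in inv_freq]

# Extending an 8k-trained model toward 64k with an 8x interpolation factor:
freqs = rope_frequencies(head_dim=128, scale=8.0)
angles_far = rotation_angles(65_536, freqs)
```

With `scale=8`, position 65,536 produces exactly the rotation angles that position 8,192 produced under the original frequencies, which is why the model "recognizes" the compressed positions; the trade-off is reduced positional resolution between nearby tokens.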
4. Supervised Fine Tuning (SFT) (55:28)
- Distillation from stronger, larger models—especially Chinese models like Qwen and DeepSeek—for reasoning, math, and code expertise.
- Data generated/sourced includes 2.5 million reasoning traces.
5. DPO/Preference Tuning (65:02)
- Direct Preference Optimization: pairs of responses rated to guide model output ranking.
- Data generated by sampling diverse strong models to ensure meaningful deltas.
“Sometimes there’s low hanging fruit and doing the obvious thing yields a lot of results...then this RL stage is extremely hard technical grinding.”
— Nathan Lambert (65:02)
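The DPO objective itself is compact: it pushes the policy to rank the chosen response above the rejected one, measured relative to a frozen reference model. A minimal sketch of the per-pair loss (with sequence log-probabilities assumed to be precomputed elsewhere) could look like:

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    log-probs are summed over response tokens; beta controls how far
    the policy may drift from the reference model.
    """
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# Loss shrinks as the policy widens the gap in favor of the chosen response:
easy = dpo_loss(-5.0, -20.0, -10.0, -10.0)   # policy strongly prefers chosen
hard = dpo_loss(-20.0, -5.0, -10.0, -10.0)   # policy prefers the rejected one
```

This is also why the bullet above stresses "meaningful deltas": if the chosen and rejected responses come from near-identical models, the margins cancel and the gradient signal is weak, so sampling diverse strong models keeps the pairs informative.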
6. Reinforcement Learning with Verifiable Rewards (RLVR) (71:18)
- Applies RL using objective/automatic reward signals (correct answers, passed code tests).
- Solving hard engineering problems around distributed training and high-memory, long-context generation.
“RLVR, verifiable rewards is in the name...the reward you get...is whether or not you got the problem right.”
— Nathan Lambert (73:14)
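As the quote says, the reward is simply whether the model got the problem right, so no learned reward model is needed. A toy pair of verifiable reward functions for math answers and code tests (illustrative, not Olmo 3's actual grader) might be:

```python
def math_reward(model_output, gold_answer):
    """1.0 if the final answer in the output matches the gold answer.

    Toy extraction: treat the last whitespace-separated token as the answer.
    A real grader would parse boxed answers, normalize numbers, etc.
    """
    guess = model_output.strip().split()[-1].rstrip(".")
    return 1.0 if guess == gold_answer else 0.0

def code_reward(program_src, test_src):
    """1.0 if the candidate program passes its unit tests, else 0.0.

    Executes untrusted model output -- a real system would sandbox this.
    """
    scope = {}
    try:
        exec(program_src, scope)   # define the candidate function
        exec(test_src, scope)      # run the assertions against it
        return 1.0
    except Exception:
        return 0.0

# Binary rewards: the model is right or it is not.
r1 = math_reward("The answer is 42.", "42")               # 1.0
r2 = code_reward("def add(a, b):\n    return a + b",
                 "assert add(2, 2) == 4")                 # 1.0
```

The engineering difficulty mentioned above lives around, not inside, this function: generating long reasoning rollouts at scale, sandboxing test execution, and feeding the sparse 0/1 signal back into distributed RL training.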
14. Complexity and the Reality of AGI Progress
(78:12–87:26)
- Despite sensational headlines, true progress is incremental and engineering-intensive.
- Key Quotes:
- “Pre-training is scientific, post-training is the wild west.”
— Paraphrased from Nathan Lambert (78:12)
- “I definitely describe myself as lightly AGI pilled… but I’m very far from believing in any sort of singularity… all these things are plateauing and you 10x the compute and you get a big jump, but you can’t do this forever... There’s going to be some physical constraint that kicks in at some point...”
— Nathan Lambert (80:01)
- “People are going to be disappointed if they want to see a moment where like one day they log into Twitter and AGI is there. Like it's messy and it's fun working on it because it's messy.”
— Luca Soldaini (83:51)
- Consensus: AGI may be reachable within current paradigms, but gradually, not as a singular leap.
Notable Quotes & Moments
- On openness:
“If we can release it, we will release it.”
— Luca Soldaini (10:38)
- On the importance of open data and intermediate checkpoints:
“Now we have intermediate checkpoints during our supervised fine tuning for reasoning and for instruct and then also for our multi-day RL rounds. …this is all there.”
— Nathan Lambert (12:27)
- On growing Chinese influence:
“A meaningful amount of people are trying open models for things and most of them are using Qwen.”
— Nathan Lambert (16:55)
- On the challenging reality of building new AI capabilities:
“You’re building the tracks as the train is going down at incredible speeds, and you have to figure out ways to fix some parts of your pipeline so you can work on the rest.”
— Luca Soldaini (63:24)
- On the future of AGI:
“I have very high probability, barring extreme geopolitical situations, that Big Tech executes on this vision across the two to five years to build 95 to 98% of the way there of what you can do with our physical power constraints and what an LLM’s ability is. ...Whether debating whether or not it’s AGI is kind of secondary to the fact that this is coming and we want people to study and understand what is happening.”
— Nathan Lambert (84:15)
Timeline of Key Segments
| Timestamp | Topic |
|-----------|-------|
| 00:00 | Industry shift, Llama's uncertain future, rise of Chinese open models |
| 01:27 | Olmo 3 launch, openness beyond weights |
| 05:53 | Dolma 3 dataset, emphasis on long-context data |
| 08:07 | Model performance & benchmarking |
| 10:25 | What “open” really means in AI |
| 12:51 | Recap: 2025 and the rise of China's open source surge |
| 16:37 | US/China ecosystem and business differences |
| 18:32 | Economic drivers for open vs. closed model release |
| 20:33 | Atom project and US response |
| 22:11 | What are "thinking models"? |
| 24:51 | AI2 history and mission |
| 37:11 | Full Olmo 3 pipeline: from pre-training to RLVR |
| 53:32 | Long-context extension explained |
| 61:36 | Supervised fine tuning, tracing, and teacher models |
| 65:02 | DPO/preference tuning and its impact |
| 71:18 | Reinforcement learning with verifiable rewards (RLVR) and system challenges |
| 80:01 | The complexity tax of AI, reality vs. AGI hype |
| 83:51 | Gradual progress; why there may never be a sharp AGI "singularity" moment |
| 84:15 | Big Tech's trajectory vs. AGI definitions |
| 85:29 | Importance of broad public engagement, policy, and non-technical contributions to AI progress |
| 87:26 | Closing thoughts: open source, impact, and needing “many more hands” to shape the future |
Conclusion
The episode provides a rare, transparent look into the making and ambitions of a cutting-edge open-source AI project. Olmo 3 is more than just another model release—it embodies a commitment to full openness and scientific reproducibility, opening new possibilities for the global research community. The conversation also brings sober clarity on the realities and limits of AI progress, emphasizing the necessity of community, infrastructure, and open collaboration as the pace of innovation accelerates worldwide.
