Dwarkesh Podcast – Thoughts on AI Progress (Dec 2025)
Host: Dwarkesh Patel
Date: December 23, 2025
Episode Overview
In this episode, Dwarkesh shares an audio narration of his blog post, "Thoughts on AI Progress (Dec 2025)." He critically analyzes recent advances and challenges in AI, especially around reinforcement learning (RL) atop large language models (LLMs), the trajectory toward AGI (Artificial General Intelligence), and the gap between impressive benchmarks and true economic impact. Dwarkesh also assesses common arguments about AGI timelines, discusses current industry practices, and projects plausible future developments in continual learning and generalization.
Key Discussion Points and Insights
1. Short Timelines vs. RL Scaling: The Contradiction
- Many expect rapid AGI development (short timelines) while also investing heavily in RL to teach LLMs practical skills (e.g., using web browsers, Excel).
- Critique:
- If AGI is imminent, training for verifiable, narrow outcomes seems pointless—human-like learners would figure things out themselves, much like people do "on the job."
- The existence of a vast RL "supply chain" suggests that true AGI is not close, as current models require explicitly pre-baked skills.
- Notable quote: "If we're actually close to a human-like learner, then this whole approach of training on verifiable outcomes is doomed." (00:00)
- Reference to Beren Millidge: benchmark improvement is not just computation and algorithmic advances; it's also the product of massive investments in curated data and expert labor.
2. The Limits of RL for Generalization
- Robotics example: If true human-like learning existed, robotics would be mostly solved algorithmically. However, AIs currently require massive data collection for every micro-task (e.g., dishwashing, laundry) because they lack robust generalization.
- Argument from AGI optimists: Building a "superhuman AI researcher" via RL is meant to eventually solve robust, efficient learning from experience—but Dwarkesh finds this notion impractical.
- Labs’ actions (pre-baking consultant skills, PowerPoint expertise, etc.) betray little faith in broad generalization; implies models will continue to struggle unless trained on specific, economically useful skills.
3. Context-Specific Skills and On-the-Job Learning
- Company, task, and even day-to-day specifics matter: most work requires adaptation; humans excel because they don't need hundreds of custom training loops.
- Anecdote: dinner with an AI researcher and a biologist who described a micro-task (identifying macrophages in slides), with the AI researcher suggesting it's solvable via deep learning. Dwarkesh argues this misses the crux: the economic value in human work lies in generalization and context-adapted judgment.
- Notable quote: "It's not net productive to build a custom training pipeline to identify what macrophages look like... for every lab-specific micro-task and so on." (12:20 approx.)
4. AGI Economic Impact: What's Missing?
- Present models' lack of AGI-level generalization and learning capabilities means they cannot provide the firm-wide or economy-wide value that true AGI could.
- Diffusion dynamic: if AGI existed today, adoption would be fast—AI "employees" could onboard instantly and scale immediately. Unlike hiring humans (which faces lemons-market issues), AGI deployment would be low-risk and fast.
- Economic analogy: human knowledge work is worth tens of trillions of dollars annually; AI labs are nowhere near capturing that value, suggesting a significant capability gap remains.
- Notable quote: "If the capabilities were actually at AGI level, people would be willing to spend trillions of dollars a year buying tokens that these models produce." (18:00 approx.)
5. Shifting Goalposts and the Nature of AI Progress
- Past AI criteria (reasoning, few-shot learning, pattern recognition, etc.) have been surpassed, yet "AGI" remains elusive.
- Justification for goalpost shifting: real-world results have revealed intelligence is more complex than expected; new capabilities (X, Y, and Z) are recognized as necessary.
- Notable quote: "I think it's totally reasonable to look at this and say, oh, actually there's much more to intelligence and labor than I previously realized." (22:00 approx.)
- Dwarkesh predicts labs will keep making impressive progress (possibly hundreds of billions in revenue by 2030) but will still fall short of full automation of knowledge work.
6. Scaling Laws: Pretraining vs. RL from Verifiable Reward
- Pretraining (self-supervised scaling) has shown highly predictable improvements, akin to physical laws.
- RL from verifiable reward, on which current bullish projections rely, shows no similarly strong or public scaling trend.
- Reference to Toby Ord: to get RL improvements on par with a single GPT-level leap, a millionfold increase in RL compute may be necessary.
- Notable quote: "People are trying to launder the prestige that pre-training scaling has... to justify bullish predictions about reinforcement learning from verifiable reward, for which we have no well-fit, publicly known trend." (32:00 approx.)
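The contrast above can be made concrete with a toy curve. The episode gives no formula; the functional form below is the standard empirical pretraining scaling law (loss as a power law in compute), and every constant is invented for illustration:

```python
# Toy sketch of why pretraining improvements are so predictable: empirically,
# loss follows a power law in compute, L(C) = A * C**(-alpha) + L_inf.
# The constants A, alpha, and L_inf here are made up for illustration only.
def loss(compute: float, A: float = 10.0, alpha: float = 0.05, L_inf: float = 1.7) -> float:
    return A * compute ** (-alpha) + L_inf

# Each ~100x increase in compute buys a smooth, predictable loss reduction,
# which is what lets labs forecast the next model's quality in advance.
for c in [1e21, 1e23, 1e25]:
    print(f"compute={c:.0e}  loss={loss(c):.3f}")
```

No comparably clean, public curve exists for RL from verifiable reward, which is the asymmetry Dwarkesh is pointing at.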
7. The Real Driver: Continual Learning and Collective Experience
- Speculates that transformative progress will come from continual learning—AI models that, like humans, improve chiefly through relevant experience and sharing that experience across "copies."
- Reference to Beren Millidge and Karpathy:
- Envisions future with specialized agents contributing back to a core "hive mind," distilling collective experience.
- Progress won't be sudden or a "one-and-done" event; continual learning will likely mirror the incremental improvements seen in in-context learning since GPT-3.
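The "hive mind" idea above can be sketched as a toy program: specialized agent copies learn on the job and pool what they learn back into a shared core, so later copies start where earlier ones left off. All class names and mechanics here are illustrative assumptions, not anything described in the episode:

```python
from collections import defaultdict

class HiveMind:
    """Toy model of collective experience: agent copies distill lessons
    back into a shared core that new copies inherit. Illustrative only."""
    def __init__(self):
        self.core_knowledge = defaultdict(list)  # specialty -> pooled lessons

    def spawn_agent(self, specialty: str) -> "Agent":
        return Agent(specialty, hive=self)

    def distill(self, specialty: str, lesson: str) -> None:
        # Stand-in for actual distillation/weight updates: pool the lesson
        # so every future copy starts with it.
        self.core_knowledge[specialty].append(lesson)

class Agent:
    def __init__(self, specialty: str, hive: HiveMind):
        self.specialty = specialty
        self.hive = hive
        # A new copy inherits everything previously distilled for its specialty.
        self.lessons = list(hive.core_knowledge[specialty])

    def work(self, lesson_learned: str) -> None:
        self.lessons.append(lesson_learned)
        self.hive.distill(self.specialty, lesson_learned)

hive = HiveMind()
a1 = hive.spawn_agent("pathology")
a1.work("macrophages in this lab's slides stain unusually dark")
a2 = hive.spawn_agent("pathology")  # a later copy inherits a1's experience
print(a2.lessons)  # the pooled lesson is present from the start
```

The incremental flavor of the prediction shows up here too: the core improves lesson by lesson as copies work, rather than in a single one-and-done jump.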
8. The Competitive Landscape and AI Lab "Flywheel"
- Expectation that breakthroughs (like continual learning) will quickly spread, as competitor labs replicate and iterate.
- Observes a persistent, strong competition among major labs—any would-be runaway advantage is neutralized by reverse engineering, talent poaching, or simple competitive dynamics.
Notable Quotes
- Dwarkesh Patel: "If we're actually close to a human-like learner, then this whole approach of training on verifiable outcomes is doomed." (00:00)
- Beren Millidge (as quoted): "...not just about the increased scale and the clever ML research ideas, but the billions of dollars that are paid to PhDs, MDs and other experts to write questions and provide example answers and reasoning..." (01:30 approx.)
- Dwarkesh Patel: "It's not net productive to build a custom training pipeline to identify what macrophages look like... for every lab-specific micro-task and so on." (12:20 approx.)
- Dwarkesh Patel: "If the capabilities were actually at AGI level, people would be willing to spend trillions of dollars a year buying tokens that these models produce." (18:00 approx.)
- Dwarkesh Patel: "I think it's totally reasonable to look at this and say, oh, actually there's much more to intelligence and labor than I previously realized." (22:00 approx.)
- Toby Ord (as referenced): "...we need something like a million x scale up in total RL compute to give a boost similar to a single GPT level." (34:30 approx.)
- Dwarkesh Patel: "People are trying to launder the prestige that pre-training scaling has... to justify bullish predictions about reinforcement learning from verifiable reward, for which we have no well-fit, publicly known trend." (32:00 approx.)
Timeline of Important Segments
- 00:00 – Introduction: The contradiction in short AGI timelines and RL scaling.
- 03:00 – Beren Millidge's critique: Benchmark improvement and data curation efforts.
- 07:30 – Robotics as a test case of generalization problems in AI.
- 12:00 – Anecdote: AI researcher and biologist debate micro-task learning.
- 16:30 – The true value of human-like on-the-job skill acquisition.
- 18:00 – Why true AGI would instantly transform economic outcomes.
- 21:30 – Goalpost moving: Justified and necessary as AI’s capabilities grow.
- 27:00 – Projected trends for 2030: Impressive progress, partial automation.
- 31:00 – Pre-training’s predictable power law vs. RL’s murky returns.
- 34:30 – Toby Ord's millionfold RL compute estimate.
- 36:00 – The future: Continual learning, specialized agents, and collective experience.
- 40:00 – Competitive AI landscape: No runaway lab victories anticipated.
Episode Takeaways
- Progress toward AGI is limited by models’ current inability to generalize and learn on the job, as humans do.
- Economic transformation from AI will require much more than incremental RL-taught skills or scaling up current LLMs.
- Observed industry behaviors (custom skill pre-training, slow adoption) contradict short AGI timelines.
- True breakthroughs will likely come through continual learning and the collective experience of adaptable, specialized agents.
- Even so, progress will be incremental, with fierce competition ensuring knowledge rapidly diffuses across labs.
For essays and more, visit: www.dwarkesh.com
(Summary covers main content only; intro, ads, and outros are omitted.)
