Podcast Summary

How Foundation Models Evolved: A PhD Journey Through AI's Breakthrough Era

Podcast: AI + a16z
Host(s): Martin Casado (a16z)
Guest: Omar Khattab (Assistant Professor, MIT; Creator of DSPY)
Date: November 18, 2025

Episode Overview

This episode explores the evolution of foundation models within artificial intelligence, delving into how the field has shifted from simply scaling language models to a more nuanced focus on building programmable, composable “AI systems” rather than monolithic “models.” Martin Casado interviews Omar Khattab, who shares insights from his research and development of DSPY, a leading open-source framework for prompt optimization and system abstraction for LLMs. The conversation challenges both the mythology of "model God" and the utility of AGI as a north star, advocating for richer interfaces and abstractions enabling human intent to be specified rigorously, but flexibly, in AI systems.

Key Discussion Points and Insights

1. Why Intelligence Alone is Not Enough

Omar Khattab opens with a contrarian view:
- “Nobody wants intelligence, period. I want something else, right? And that something else is always specific, or at least more specific...if you over engineer intelligence, you regret it because somebody figures out a more general and maybe potentially simpler method that scales better.” [00:00]
- The real problem isn’t simply scaling intelligence, but how to specify what we want the AI to do.
Historical context:
- Early AI (2019-2022) focused on scaling up models (e.g., BERT) with the notion that more parameters and data would bring generality and power.
- The field now recognizes that this “pretraining + scale” playbook is incomplete; new “human-designed, carefully constructed pipelines for post-training” are now common, focusing on retrieval, web search, tool use, agent training, and post-processing (see [00:00]-[01:14]).

2. Reframing the AGI Debate: API over AGI

Khattab is skeptical of AGI as a meaningful goal:
- “AGI is fairly irrelevant. Like it's not the thing I'm interested in. I joke sometimes...I'm interested in API—or artificial programmable Intelligence. And the reason I say this is...fundamentally it’s...a way of improving and expanding the set of software systems we can build..." [09:27]
The need for systems, not just models:
- “...the problem is just not a problem of capabilities, it's a problem of actually we don't necessarily just need models, we want systems." [04:44]
- Rather than building ever-larger oracles, the field should focus on programmability, interpretability, and modularity—characteristics valued in classical engineering and software.

3. The Myth of “God Model” and the Need for Abstractions

Scaling does not solve specification:
- “Scaling language models makes this...bet that anything people want...is just a few keywords away... It's an incredibly limiting abstraction.” [04:56]
Ambiguity as a fundamental limitation of natural language:
- “Natural language is too ambiguous. Code is too rigid. We need something in between, a new abstraction layer that lets us declare intent without drowning in implementation details.” [01:14]
Why “one big model” won’t cut it:
- The “absolutist end-to-end argument”—one giant model to solve it all—has been abandoned, even by major labs, in favor of system-building (pipelines, retrieval, agents).
- “That idea that scaling model parameters and scaling just pre training data is all you need exists nowhere anymore. Nobody thinks that...” [00:00]
- Instead, the need is for systems built from modules using composable, formal abstractions ([04:44]-[09:27]).

4. DSPY: A New Abstraction for LLMs

Motivation and design:
- DSPY borrows the idea of signatures from programming, focusing on abstraction at the level of function signatures—structured inputs/outputs and clear typing, but with flexibility in natural language descriptions [28:07].
- “Our most fundamental idea is...that interactions with language models in order to build AI software should decompose the role of ambiguity, should isolate ambiguity into functions.” [28:07]
Declaring intent:
- “What if we could try to capture your intent in some kind of purer form? And that intent has to go through language...You're trying to defer some decision making. I don't know how this function should exactly behave in every edge case, but please be reasonable is what you're trying to communicate with these types of programs you're building.” [23:27]
DSPY as abstraction vs. optimizer:
- Many see DSPY as just a “prompt optimizer,” but Khattab insists it's more foundational; it's the “C language” for LLMs, defining how to specify, decompose, and compose AI system components, independent of any one model's quirks [39:56], [42:38]:
  - “I want them to express their intent that is less model specific and not worry that they're losing or leaving a lot on the table.” [41:18]
Key features:
- Composability, modularity (functions/modules), formal signatures with typed objects, isolation of ambiguity, support for various optimization/inference strategies, and the ability to evolve optimizer algorithms as models improve, without rewriting abstractions ([28:07]–[36:19]).

5. Prompting, Code, and Specification: The Abstraction Spectrum

Natural language isn’t enough—code isn’t enough either:
- “Programming languages are fundamentally limited in that you have to over specify what you want...No ambiguity is allowed... That forces you to think through things, you know, maybe you don't even know how to do.” [26:34]
- “In the general case, [intent] cannot be reduced below three forms. Some things are really best expressed as code and no amount of automation can remove that... No amount of automation can remove the fact that for some classes of problems you really need a more RL like standpoint...” [32:53]
DSPY integrates all three:
- Code for structure and control flow, natural language for intent and ambiguity, and data/RL for learning & optimization (“three irreducible pieces of an AI system”: signatures, structured control flow, and data/optimization) [47:38].

6. Practical System-Building and the Role of Optimization

Use cases in DSPY are both design-time and run-time:
- Supports both up-front specification (as in prompt engineering) and run-time optimization (as in RL or inference modules) [36:19–38:50].
Signature stability, optimizer churn:
- “We build these algorithms (optimizers) to expire. As models get better, we can actually come up with better algorithms... But the abstractions that we promised, and the systems that people express in those abstractions, remain as unchanged as possible.” [35:46]
A living, open-source software engineering experiment for AI systems:
- “We spend a lot of time, this is why it's a big open source project. We want to see what people actually build and learn from that... what are the AI software engineering practices that we should encourage and support?” [50:15]

7. Philosophical: Human Intent and the Limits of Automation

Will future agents subsume all user intent, or will humans always need to specify?
- “As models get smarter and smarter, I imagine that a lot of the problems people write programs for today could get a lot simpler... But the human...condition is that we will just want more complex things. And once you want these complex things in a repeatable way, you gotta build a system.” [53:43]
- On programming for systems versus simply “prompting and waiting”: “If you're comfortable with it, actually that's endorsed by me. The problem is...most things people want...there is no five word statement that even the best intelligence in the world is going to do for you.” [15:26]

Notable Quotes & Timestamps

“Nobody wants intelligence, period. I want something else. And that something else is always specific, or at least more specific.” — Omar Khattab [00:00]
“AGI is fairly irrelevant. Like, it's not the thing I'm interested in. I joke sometimes...I'm interested in API—or artificial programmable Intelligence.” — Omar Khattab [09:27]
“Natural language is too ambiguous. Code is too rigid. We need something in between, a new abstraction layer that lets us declare intent without drowning in implementation details.” — Host [01:14]
“Scaling model parameters and scaling just pre training data is all you need exists nowhere anymore. Nobody thinks that. People deny they ever thought that at this point.” — Omar Khattab [00:00 & 04:44 repeated for emphasis]
“Interactions with language models in order to build AI software should decompose the role of ambiguity, should isolate ambiguity into functions... And what do you want to specify a function?” — Omar Khattab [28:07]
“It's what a prompt is supposed to be or what a prompt wants to be when it grows up. It really is just a cleaner prompt.” — Omar Khattab [30:32]
“If you know what you want, nothing can express it better than just you saying what you want. The data based optimization is there to smoothen the rough edges.” — Omar Khattab [34:01]
“We build these algorithms to expire. As models get better, we can actually come up with better algorithms... but the abstractions...remain as unchanged as possible.” — Omar Khattab [35:46]
“The human, philosophically, as you say...we will just want more complex things. And once you want these complex things in a repeatable way, you gotta build a system.” — Omar Khattab [53:43]

Key Timestamps for Important Segments

00:00 – 01:14 – Intro, why “intelligence for its own sake” is not enough; the shift to post-training, system-level thinking
04:44 – 09:27 – The limits of scaling, why programmable intelligence matters more than AGI
14:29 – 19:23 – Why “prompt and wait” fails for most real-world use-cases; ambiguity and the irreducibility of specification
23:27 – 28:07 – Deep dive into DSPY’s design: signatures, abstractions, composability
35:46 – 39:56 – Abstraction stability vs. optimizer evolution; DSPY’s vision as “the C for LLMs”
47:38 – 50:15 – Modern optimization strategies and recursion, open-source engineering in practice
53:43 – 56:07 – Philosophical ending: the conditions under which humans will (and must) keep formally specifying intent for AI

Tone and Takeaways

The conversation is technical but highly reflective and philosophical, critical of current AI hype, and consistently focused on what actually enables humans to build meaningful, maintainable, robust AI-powered systems.
Key takeaway: The next leap in AI isn’t making models bigger—it’s creating richer, more ergonomic programmatic abstractions that let humans declare “what they want,” balancing ambiguity and structure, so AI is composable and trustworthy in the software stack.