Columbia CS Professor: Why LLMs Can’t Discover New Science

Podcast Summary

a16z Podcast: Columbia CS Professor — Why LLMs Can’t Discover New Science

Date: October 13, 2025
Host: Andreessen Horowitz
Guests: Prof. Vishal Misra (Columbia University), Martin Casado (a16z General Partner)

Overview: Main Theme and Purpose

This episode examines the capabilities and limitations of Large Language Models (LLMs) with Columbia Professor Vishal Misra, whose formal models offer a rigorous way to understand how LLMs reason and why, despite impressive advances, today’s models are fundamentally constrained. The conversation covers the origins of retrieval-augmented generation (RAG), the mathematical structure of LLM outputs, and the distinction between synthesizing existing knowledge and true scientific discovery, which Misra treats as the mark of AGI. It features detailed technical discussion, real-world analogies, and arguments about the future, and the limits, of the current AI paradigm.

Key Discussion Points and Insights

1. Defining the Limits of LLMs and the Path to AGI

LLMs as Bayesian Manifold Navigators
- LLMs compress complex information spaces into “geometric manifolds” over which they reason, predicting next tokens based on statistical likelihood within a training-derived state space.
- As long as an LLM is “traversing through these manifolds, it is confident and it can produce something which makes sense. The moment it sort of veers away … it starts hallucinating … confident nonsense, but nonsense.” — Vishal Misra [04:18]
LLMs Can’t Make Paradigm-Shifting Discoveries
- “Any LLM that was trained on pre-1915 physics would never have come up with a theory of relativity. Einstein had to sort of reject the Newtonian physics … AGI will be when we are able to create new science, new results, new math.” — Vishal Misra [30:26, 00:00 repeated]
- LLMs can connect known dots (“exploring all sorts of solutions … by doing those steps you arrive at the answer”), but they cannot create new dots — fundamental breakthroughs require going beyond their training data.

2. Origins of Retrieval-Augmented Generation (RAG) and Vishal’s Journey

From Cricket Stats to AI Innovation
- Misra built an early production RAG-style system while trying to fix the complex interface of Statsguru (Cricinfo), anticipating the approach before the term “RAG” existed.
- “In trying to solve this problem, I accidentally invented what’s now called RAG.” — Vishal Misra [13:08]
- He built a system to map user queries to relevant DSL/API calls, using LLM-generated completions to bridge natural language to structured queries.
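
The query-to-DSL mapping described above can be sketched roughly as follows. This is a minimal illustration, not Misra's actual system: the example queries, the DSL strings, the helper names (`retrieve`, `build_prompt`), and the word-overlap similarity are all assumptions made for the sketch. A real system would more likely rank examples by embedding similarity and then send the assembled prompt to an LLM for completion.

```python
import re

# Hypothetical store of (natural-language query, DSL) pairs.
# The DSL syntax here is invented purely for illustration.
EXAMPLES = [
    ("most runs by an Indian batsman in test matches",
     "stat=runs;team=India;format=test;sort=desc"),
    ("most wickets by an Australian bowler in ODIs",
     "stat=wickets;team=Australia;format=odi;sort=desc"),
    ("highest team total in T20 matches",
     "stat=team_total;format=t20;sort=desc"),
]


def tokens(text):
    """Lowercased word set used for a crude similarity measure."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two strings, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(query, k=2):
    """Return the k stored examples most similar to the query."""
    ranked = sorted(EXAMPLES, key=lambda ex: jaccard(query, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(query, k=2):
    """Assemble a few-shot prompt that ends where the LLM should continue."""
    parts = [f"Query: {q}\nDSL: {dsl}" for q, dsl in retrieve(query, k)]
    parts.append(f"Query: {query}\nDSL:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("most wickets by an Indian bowler in test matches"))
```

The retrieved examples act as in-context evidence: the model only has to continue a pattern it has just been shown, rather than produce the DSL from scratch.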

3. Matrix Models, Bayesian Reasoning & In-Context Learning

Matrix Abstraction of LLMs
- Every possible prompt is a row; every possible output token is a column; each cell is the probability of that token following the prompt.
- “Models get trained on certain data … Whenever you give the prompt something new then it’ll try to interpolate with what it has learned.” — Vishal Misra [21:17]
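
As a toy illustration of the matrix view, the sketch below estimates a tiny prompt-by-token table from counts in a made-up corpus. The corpus, the two-word prompt length, and the names `build_matrix` and `row` are assumptions for the example. The real matrix is far too large to store, so a model effectively computes each row on demand.

```python
from collections import Counter, defaultdict

# Invented corpus; every two-word window acts as a "prompt".
CORPUS = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."


def build_matrix(text, context_len=2):
    """Return {prompt: {next_token: probability}} estimated from counts."""
    words = text.split()
    counts = defaultdict(Counter)
    for i in range(len(words) - context_len):
        prompt = " ".join(words[i:i + context_len])
        counts[prompt][words[i + context_len]] += 1
    matrix = {}
    for prompt, ctr in counts.items():
        total = sum(ctr.values())
        matrix[prompt] = {tok: n / total for tok, n in ctr.items()}
    return matrix


MATRIX = build_matrix(CORPUS)


def row(prompt):
    """One row of the matrix: the next-token distribution for a prompt."""
    return MATRIX.get(prompt, {})


if __name__ == "__main__":
    print(row("the cat"))   # a seen prompt has a populated row
    print(row("the bird"))  # an unseen prompt has no row in this toy table
```

The lookup table returns nothing for an unseen prompt. The point of the quote above is that an LLM instead interpolates a row from what it has learned.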
Bayesian Posterior as Core Mechanism
- LLMs update distributions based on context, mimicking Bayesian reasoning—context (“evidence”) adjusts the “posterior” distribution for next-token prediction.
- Formalizes why in-context learning works: “The underlying mechanism is the same whether you give a set of examples … or just give it some prompt for continuation…” [26:15]
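
The Bayesian reading can be made concrete with a textbook update over two hypotheses. This is a sketch of ordinary Bayesian inference, not Misra's formal model: the hypotheses, the prior, and the function names are assumptions, and a transformer does not explicitly enumerate hypotheses like this.

```python
# Two hypothetical "tasks", each a distribution over the next token.
HYPOTHESES = {
    "mostly_A": {"A": 0.9, "B": 0.1},
    "mostly_B": {"A": 0.1, "B": 0.9},
}
PRIOR = {"mostly_A": 0.5, "mostly_B": 0.5}


def posterior(context):
    """P(hypothesis | context), treating context tokens as independent evidence."""
    weights = {}
    for name, dist in HYPOTHESES.items():
        likelihood = 1.0
        for tok in context:
            likelihood *= dist[tok]
        weights[name] = PRIOR[name] * likelihood
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def predictive(context):
    """Next-token distribution: hypotheses averaged by their posterior weight."""
    post = posterior(context)
    return {
        tok: sum(post[h] * HYPOTHESES[h][tok] for h in HYPOTHESES)
        for tok in ("A", "B")
    }


if __name__ == "__main__":
    print(predictive([]))               # no evidence: both tokens equally likely
    print(predictive(["A", "A", "A"]))  # evidence shifts mass toward "A"
```

In-context examples play the role of the `context` list: each one sharpens the posterior, which in turn sharpens the next-token prediction.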

4. Entropy, Token Prediction, and Chain-of-Thought

Entropy as Confidence Gauge
- “The trick is the distribution that is produced, you can measure the entropy of the distribution … A high entropy distribution means that there are many different ways that the LLM can go.” — Vishal Misra [04:43]
- Information-rich prompts and chain-of-thought reduce prediction entropy, channeling the LLM to confident, correct answers.
- Chain-of-thought prompting leverages the model’s ability to break problems into steps — each step has low entropy, increasing confidence.
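
The entropy measure referred to here is the standard Shannon entropy, H(p) = -Σ p_i log2 p_i, applied to the next-token distribution. The sketch below computes it for two invented distributions; the probability values are assumptions chosen only to contrast a confident prediction with an uncertain one.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical next-token distributions over four candidate tokens.
confident = [0.97, 0.01, 0.01, 0.01]  # one continuation dominates
uncertain = [0.25, 0.25, 0.25, 0.25]  # many equally plausible continuations

if __name__ == "__main__":
    print(f"confident: {entropy_bits(confident):.3f} bits")
    print(f"uncertain: {entropy_bits(uncertain):.3f} bits")
```

A uniform distribution over four tokens has the maximum possible entropy of 2 bits, while the peaked distribution is close to zero. This matches the claim that low entropy corresponds to confident prediction.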

5. Why Self-Improvement is Not Possible within LLM Bounds

No Recursive Self-Improvement Without New Data
- “The output of the LLM is the inductive closure of what it has been trained on.”
- Even with multiple LLMs talking, without external new information, models cannot create new knowledge; they only combine and interpolate within known data. [29:19–31:00]
- “That kind of self improvement is not possible with these architectures. They can refine these…where the answer already exists … but creating new dots, I think we need an architectural advance.” — Vishal Misra [32:01]
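
One way to read “inductive closure” is as the fixed point reached by repeatedly applying known inference rules to known facts. The sketch below is an interpretive analogy rather than Misra's formal definition: the facts, the transitivity rule, and the function names are assumptions for illustration.

```python
def closure(facts, rule):
    """Apply the rule until no new facts appear, then return the fixed point."""
    known = set(facts)
    while True:
        new = rule(known) - known
        if not new:
            return known
        known |= new


def transitive(pairs):
    """Rule: from (a, b) and (b, c), derive (a, c)."""
    return {(a, d) for (a, b) in pairs for (c, d) in pairs if b == c}


# Hypothetical "training data": three known relations.
FACTS = {("a", "b"), ("b", "c"), ("c", "d")}
CLOSED = closure(FACTS, transitive)

if __name__ == "__main__":
    print(sorted(CLOSED))
    # Running the process again on its own output adds nothing new.
    print(closure(CLOSED, transitive) == CLOSED)
```

In this analogy, connecting known dots corresponds to deriving pairs such as `("a", "d")`. A fact involving an unseen symbol, such as `("a", "e")`, can never appear without new input, however many times the process is run on its own output.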

6. Computational and Architectural Constraints

Adding More Data Is Not Enough
- “You need a little bit more than that. The way human brains learn with very few examples, that’s not the way transformers learn … There has to be an architectural leap that is able to create these manifolds.” — Vishal Misra [36:46]
Multimodal Inputs, Simulation, and Next Jumps
- Giving models “eyes and ears” will help, but a true leap (“approximate simulation” ability, non-language representation of reasoning) may be needed for AGI. [38:10–39:17]

7. Evaluating Progress & Setting AGI Benchmarks

Coding as a Frontier Task
- “The day an LLM can create a large software project without any babysitting is the day I’ll be a little bit convinced that it’s towards AGI… But again, I don’t think it will create new science.” — Vishal Misra [44:31]
Definitional Markers of AGI
- If an LLM “produces something that’s outside of that distribution, then clearly we’re on a path to learning new things.” — Martin Casado [45:55]
- So far, models only “navigate” the existing manifold, not create new ones. “If LLMs really created this new manifold, then I would be convinced.” — Vishal Misra [47:07]

Notable Quotes and Memorable Moments

On the Ultimate Measure of AGI
- “AGI will be when we are able to create new science, new results, new math. When an AGI comes up with a theory of relativity … it has to go beyond what it has been trained on…” — Vishal Misra [00:00, 33:46]
On LLM Self-Improvement
- “Even in the case of like n number of LLMs … you just aren’t getting any information entropy.” — Martin Casado [29:19]
On the Plateau of Progress
- “It’s like the iPhone … the last seven, eight, nine years it’s maybe the camera got a little bit better … but there has been no fundamental advance in what it’s capable of. You can see a similar thing happening with these LLMs…” — Vishal Misra [17:48]
On RAG’s Accidental Origins
- “In trying to solve this problem, I accidentally invented what’s now called RAG…” — Vishal Misra [13:08]
On Prompt Engineering
- “One term I really dislike is prompt engineering. Engineering used to mean sending a man to the moon… Prompt engineering is prompt twiddling…writing lots of papers, just changing a prompt this way, that way...” — Vishal Misra [43:06]

Important Segment Timestamps

| Timestamp   | Segment / Quote                                                            |
|-------------|----------------------------------------------------------------------------|
| 00:00       | Definition of AGI: “create new science, new results, new math”             |
| 04:18–04:43 | LLM confidence, manifolds, and hallucinations                              |
| 13:08       | Misra’s accidental discovery of RAG via Cricinfo                           |
| 21:17       | Matrix abstraction of LLM reasoning                                        |
| 26:15       | Bayesian mechanism underlying in-context learning                          |
| 29:19–31:00 | Mathematical proof that self-improvement is not possible without new data  |
| 33:46       | Misra’s definition of AGI                                                  |
| 36:46       | Limitation of LLMs and the need for architectural leaps                    |
| 38:10–39:17 | Need for simulation and other modalities                                   |
| 44:31       | Coding as a true AGI benchmark                                             |
| 47:07       | Creation of new manifolds as a marker of true intelligence                 |

Takeaways and Forward-Looking Insights

- LLMs are statistical, predictive engines whose power comes from compressing massive knowledge into navigable “manifolds”, but in Misra’s view they cannot create new scientific paradigms or mathematical frameworks.
- Misra argues that neither recursive self-improvement nor AGI can be achieved by current architectures; true progress likely requires architectural breakthroughs that allow creativity beyond the inductive closure of the training data.
- Current research should aim both to formalize the boundaries of current models and to imagine new architectures, possibly inspired by human perception, simulation, or other non-language forms of cognition.
- Benchmarks for “AGI” should look for novel creation (new “manifolds”), not just more proficient navigation of existing knowledge.

Additional Resources

Token Probe (tool for visualizing model token confidence): Available for public use.
Vishal Misra’s Papers: Highly recommended for those with an information theory or systems background.

Summary compiled in the spirit of the technical depth and candid humility of the conversation. Listeners interested in the mathematical underpinnings, formal boundaries, and future prospects of AI and AGI will find the full episode worth hearing.
