Summary

Dwarkesh Podcast: Grant Sanderson – AI and the Future of Math

Date: June 30, 2026
Host: Dwarkesh Patel
Guest: Grant Sanderson (3Blue1Brown)

Episode Overview

This conversation between Dwarkesh Patel and Grant Sanderson explores the rapid progress of AI in mathematics—one of the domains witnessing the fastest and most visible breakthroughs. The discussion unpacks what recent milestones like solving Math Olympiad problems mean (and don’t mean) for general intelligence, the slippery nature of AI benchmarks in math, and the challenges of evaluating contributions like new definitions or conceptual frameworks. They dive deep into the historical context of math, theory-building, AI's comparative advantages in scalability and parallelization, and speculate on future roles for humans in an AI-dominated mathematical landscape.

Key Discussion Points & Insights

1. Why Does Math Matter as an AI Benchmark?

Spiky Nature of AI Progress: Math is seen as a “spiky frontier,” where AIs have made outsized advances (e.g., almost winning gold at IMO 2024) but with uneven performance across subfields (e.g., geometry solved quickly, combinatorics remains stubborn).
No Single “Aha!” Benchmark for AGI: Getting gold at the IMO was once theorized as an AGI milestone, but Grant reflects that it proves to be just another benchmark—a point of progress, not a fundamental breakthrough.

"There won't be some aha moment when this happens." — Grant [00:34]

2. Creativity, Benchmarks, and the Limitations of Training

Dirty Secret of the IMO: Many problems can be “brute forced” both by students and AIs; combinatorics and true creativity remain elusive.
On Benchmark Creep: As AIs cross one benchmark (e.g., finding connections between fields), we move the goalpost to more creative tasks like definition generation, theory-building, or problem creation.

"The greatest mathematicians come up with definitions." — Grant citing a popular saying [09:33]

3. The Historical Perspective – Galois, Abel, Lagrange

Long Verification Loops: Real mathematical innovation, like Galois’ group theory, takes decades or even centuries to be recognized, let alone become “useful.”
Challenge for AI Evaluation: Coming up with valuable abstractions or definitions isn’t immediately benchmarkable. Utility is often recognized much later.

"There's a 100-year verification loop of why is this a productive concept in the first place." — Dwarkesh [13:31]

"That’s the premium tier mathematician… the definition generator." — Grant [09:15]

4. Human Understanding vs. Automated Proof

Proof vs. Explanation: Even if AI proves a famous conjecture (e.g., Riemann hypothesis), the challenge remains: will humans be able to understand why the proof works?

"There is a difference between proof and explanation." — Grant [32:23]

5. The AI Talent Stack: From Connecting Ideas to Creating Theories

Connecting Ideas (“Lightning Bolts”): AI’s power lies in bridging disparate concepts (e.g., random matrix theory and the Riemann Hypothesis), but the field will truly transform when AIs are also good at creating entirely new conceptual mountains.
Mountain-Building: The process of synthesizing whole new theories (mountains), not just linking fields, remains a critical human trait and the next frontier for AI.

6. Metrics and Training for “Higher Tier” Mathematical Creativity

Hard to Quantify Value: How do you “reward” AI for Galois-like instinct or for creating a fertile abstraction? Typical RL (reinforcement learning) benchmarks can’t capture long-term value or conceptual elegance.
Compression as a Proxy: Short, elegant explanations might signal intelligence:

"The smaller expression that's more predictive feels more intelligent." — Grant [22:31]

7. Process, Parallelization, Context Escape, and Systematic Diversity

AI's Digital Advantages: Unlike lone geniuses, digital minds can scale, parallelize, and systematically explore diverse approaches—trying to prove, disprove, or reframe a problem in ways humans cannot.
Escaping Context: Both AIs and humans can get stuck due to accumulated biases; deliberate context-resetting (e.g., flipping between proving and disproving) could unlock new results.

"Sometimes just being able to say, refresh your thinking, come at it completely differently... actually be one of [AI’s] advantages." — Grant [48:12]

8. Entropy, Heuristics, and Multiple Research Programs

AI’s supposed “collapse to sameness” can be counteracted by engineering prompt-level diversity and parallel ideation, akin to the diversity of human heuristics.

"You actually just need multiple independent research programs with their own heuristics." — Dwarkesh [52:03]

9. Why Is Math Making Progress Faster Than Other Domains?

Verifiability and Grindability: Math and code can be “grinded”—simulated with endless rollouts and clear success/failure criteria. Real-world tasks don’t have this kind of deterministic repeatability, slowing progress.

"Coding and math are exceptions... you can containerize them, parallelize them, and evaluate clearly." — Dwarkesh [55:10]
Process-Based Supervision: Formal systems like Lean provide automated, endlessly scalable checking, which may become even more important for unsupervised theorem exploration.

10. AI as a Mathematical Research Engine

The Mathlib Metaphor: Imagine AI agents endlessly extending a formal math library, exploring logical consequences, inventing definitions, and branching into unknown territories—some trash, some gold.

"That's a very unique thing that math has that nothing else has, where you could press go and then just pour, compute at it and look away for 10 years and then come back and say, what do you have?" — Grant [58:05]

11. AI and Explanation, the Role of Curation

Exposition and Curation: Even if AIs are better at both proving and explaining, the uniquely human role may shift toward curating, motivating, and socially selecting which math is worth focusing on—like an art curator.

"What's left for mathematicians might actually just shift subtly into that curation direction." — Grant [37:19]

12. Learning with AI, Limitations and Best Practices

AI as Super Wikipedia: Currently, LLMs are most useful for surface-level explanations, not for deep conceptual reframing or for knowing when a learner’s thinking is off-kilter.

"LLM explanations feel to me at the moment a lot like Wikipedia... but nevertheless, what's the most useful part? The references at the bottom." — Grant [77:45]
Best Learning Approach: Use LLMs in tandem with curated human resources (books, lectures), relying on LLMs for pruning, clarification, and supplementary explanation.

13. Career Advice for Students in a World Dominated by AI

Understand the Value Chain: Focus on how your role creates value and where your salary actually comes from; teaching and curation could become increasingly vital.
Teaching as a Stable Career: Socially embedded, relational roles are more resistant to automation.

"Teaching is one of the most stable post-AGI jobs that there is because it’s so relational." — Grant [85:55]

Memorable Moments & Notable Quotes

Grant on Proof vs. Explanation:
"There is a difference between proof and explanation." [32:23]
On Shifting Human Roles:
"What's left for mathematicians might actually just shift subtly into that curation direction." [37:19]
Describing the Mathlib-as-Civilization Metaphor:
"It’s just like for millennia humanity is building this corpus of knowledge... at some point the models will just extend that arbitrarily." — Dwarkesh [66:52]
On a Potential Future for Math Jobs:
"If there's any jobs whatsoever, surely distilling what the AIs have learned will be one of them." — Dwarkesh [88:25]

Timestamps for Key Segments

00:00 — Framing the episode: Why AI progress in math matters; revisiting the “AGI by IMO” question
01:11 — What makes IMO problems tough for AIs? Brute force vs. creative domains; spikiness of AI intelligence
07:13 — What’s the next benchmark? Generating conjectures, definitions, and new objects
13:41 — Group theory origin story: Lagrange, Abel, Galois; slow verification loops and hindsight value
24:23 — Hard to quantify the “right” abstraction; compression, concision, and elegance as proxies
27:24 — How “explanation” differs from “proof”; will AI solutions be human-parsable?
33:04 — The incentive for explanation and higher-level distillation
37:19 — The future of human mathematical work: curators, not creators
45:43 — The true edge of digital minds: parallelization, systematic context resets, and agent diversity
54:46 — Why math/coding progress is faster than web/computer use; verifiability vs. “grindability”
58:05 — The unique potential of AI-driven auto-math research
76:45 — Best practices for learning with LLMs
83:06 — Career advice: value chain, the importance of teaching; resilient human niches in an AI world
89:06 — Will a math singularity spill over into other fields? Economic utility & plausible impacts

Final Thoughts

This episode is a must-listen for anyone interested in the cutting edge of AI, the philosophy and sociology of mathematics, and the possible shape of a future where machines do not just solve old problems but begin to reshape the landscape of knowledge itself. The dialogue is thoughtful, historically informed, and balances optimism with humility about the promise and limits of AI in intellectual domains.

Listen or read more at: www.dwarkesh.com

Transcript

[00:00]
A
Today I'm chatting with Grant Sanderson, who runs 3Blue1Brown and is now working on a new project documenting the progress AI is making in math. And I wanted to talk to you about this because AI has been making faster progress in mathematics than in any other field. So whatever is happening here, and whatever way we're seeing AI progress happen or not happen, would tell us about what will happen to the rest of the world as AI gets better and better. So I wanted to start with this question I asked you when I first interviewed you three years ago, and I asked you, once we have AIs that can get gold in the International Math Olympiad, wouldn't that just be AGI? Wouldn't this just be able to do anything any human can do, given how hard these problems are? And you had an answer which, in retrospect, turned out to be very wise and correct, which is like, it'll be another benchmark, like all these other benchmarks that AIs are passing. Obviously, AI has gotten better in general ways since then, but there won't be some aha moment when this happens. First, I think I'd be curious to get your heuristics on why that turned out to be true. And second, I'm curious how long you think this narrowness can continue to be true. So, by the point that AI has solved a Millennium Prize Problem, do you think it's still possible that at that point, there's lots of tasks that humans are doing that AI still can't automate in the economy?
[01:11]
B
It's an interesting question, because it's hard to answer without knowing what the solution looks like ahead of time. I mean, if we take the IMO, that's something where I think the spirit of your question three years ago was in looking at how some of the solutions to these problems really seem to require creativity. And the designers of these problems, they'll try to have them come up with things that you can't train for as easily. I think the dirty secret with the IMO is that you really can train for a lot of them. And so with the whole AI and math project under way, I think, as you point out, one of the reasons it's interesting at all is that there's a spiky frontier to AI. Math is just right there in one of the spikes. But there's kind of a fractal nature to that spikiness, because when you zoom into the specific progress within math, you have some things that are a lot easier than others. So if we just think about the IMO, which is old news at this point, it's kind of like two years ago. They're really doing quite well. They would have gotten a gold in 2024 if not for the following reason. They're very good. They just, like, cold solved geometry, basically. And the IMO has these four categories of problems. That's geometry, number theory, algebra and combinatorics. So, like, geometry just solves in like 19 seconds in 2024 because it's kind of a brute force solver. And the dirty secret is for students, there's also sort of a brute force way that you kind of can go at it. Combinatorics is the one that's the wildcard, with much more, like, playful, puzzle-y seeming problems. And there were two combinatorics problems on that year's test. There's not always. There's four categories, six different problems. So it's kind of a toss-up which one is going to have two questions. Had it been more geometry questions, they would have gotten a gold that year. But it struggles on those combinatorics ones.
And someone who's trying to keep that torch of the last holdout of math for humanity might say, well, those are the ones that require more creativity. Even then, though, I think the spirit of your question, if they're solving a Millennium Prize Problem, does that also service a lot of white collar work? It suggests that whatever the rate limiter is between where we are now and that is the same as the rate limiter for making things better at white collar work. We could maybe paint a couple different ways that, if we focus on, I don't know, the Riemann Hypothesis, what would it look like to solve that? One possibility would be these things are extremely good at a specific domain of knowledge and just knowing it very deeply and then knowing another domain and knowing another domain. And you've pointed this out, it's bizarre to have something with this superhuman breadth that knows all the fields so well, that's not just finding those lightning bolts that connect them. I think we're starting to see sparks of that, of actually finding connections between the things that it's an expert at. I'm sure we'll talk about it. If the nature of the solution to the Riemann Hypothesis was something like that, that feels pretty distinct to me from what's necessary to get good at white collar work. And there's a reason to believe actually that that might be the nature of the solution. I don't know if you know the story of Hugh Montgomery and Freeman Dyson at the IAS. This is a side tangent, but it's just kind of a fun story. I don't know if it was over lunch or something like that. Basically you have this number theorist who is just trying to understand the statistical correlation between pairs of zeros of the Riemann zeta function. So the Riemann Hypothesis is all about, do all these zeros sit on a straight line? And he's finding this quantitative question you could ask about them. And he writes down a formula. It looks like one over sine squared or something like that.
Freeman Dyson, a physicist, is like, I know that expression. That expression comes up in studying the eigenvalues for random Hermitian matrices, which was something that comes up in studying the energy levels of a nucleus. And the idea that the statistics of those two seemingly different things were the same sort of prompted a potential exploration on, hey, are there aspects of random matrix theory that might be relevant to the Riemann zeta function? And I think it's a little bit of an open question, like, is there fruit to be had there? But that kind of bridging together from two different fields, if it turned out that the solution to the Riemann Hypothesis was exploring an idea like that even further, that has this character of kind of how you expect LLMs to be good at math. It's like they're an expert at the quantum physics, they're an expert at the analytic number theory. They should be able to see that similarity in a way that doesn't require Montgomery and Dyson to be having lunch and happening to talk about that. That's totally different from white collar work in terms of the extent to which you maybe have a hard time using an AI as an editor. It's not because they know everything and you just need them to find that lightning bolt. A different possibility would be, what's the right analogy? Maybe if we think of Fermat's Last Theorem. Between the moment of Fermat phrasing the question and then what the solution itself looks like, where ultimately the solution involves such heavy machinery in math. Right. So the beauty of that problem is you can phrase it so simply. You ask about x to the n plus y to the n equals z to the n. Do you have integer solutions for this when n is bigger than three? And it's something you might expect there to be an elementary number theory approach to, but as far as we can tell, there's just not. Whereas the actual solution, maybe there is something simpler, but this might be what it has to be.
There's such a complicated set of ideas that build on centuries of work centered around elliptic curves and then this other mountain of ideas centered around these things called modular forms. And both of those mountains have to be built before you can ask the right question that connects them. So if the solution to the Riemann Hypothesis involved building a new mountain, that's a kind of skill, the ability to come up with the right new ideas, that feels sufficiently different from the character of how they're intelligent right now. It's not like that's what you need from your hired video editor per se, but if it's capable of building mountains that are the correct new theory that crystallizes how we should be thinking about a subject, that's just such a level of intelligence that it would be surprising if that didn't permeate into other aspects of the economy besides just the mountain building for math itself.
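For reference, the two formulas described loosely in this answer can be written out. This is a sketch of the standard textbook statements rather than anything quoted in the episode, and the symbols (u for the normalized spacing between zeros, R_2 for the pair-correlation density, and x, y, z, n for the Fermat equation) are notation chosen here for illustration.

```latex
% Montgomery's pair-correlation conjecture for normalized zeros of the
% Riemann zeta function; the same density appears for eigenvalues of
% large random Hermitian matrices, which is what Dyson recognized.
R_2(u) = 1 - \left( \frac{\sin(\pi u)}{\pi u} \right)^{2}

% Fermat's Last Theorem: the equation below has no solutions in
% positive integers for any exponent n greater than 2.
x^{n} + y^{n} = z^{n}, \qquad x, y, z \in \mathbb{Z}_{>0}, \quad n > 2
```

The standard statement of Fermat's Last Theorem covers every exponent greater than 2, so the case n = 3 is included.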
[07:13]
A
Yeah. Or at the very least, even if it couldn't literally do every single thing white collar humans can do, it would just have transformative effects in the way that getting gold in the IMO did not have transformative effects on the world. First of all, I do want to point out that I'm totally moving the goalpost here, because when I interviewed Dario about two, three years ago, I asked this question about why haven't they been able to use their vast knowledge to connect ideas together and come up with a new discovery that way? That seems like the kind of thing, even if a moderately intelligent person knew this much information, they'd be able to come up with a medical diagnosis from the fact that this drug causes migraines and this other thing, whatever, does this. And maybe it's the same drug that can cure both things. And yeah, I don't know. From an outsider's perspective, mathematics seems clearly like a field where finding this counterexample to the unit distance conjecture was an example of this kind of thing. And so, total goalpost moving. But then we can ask, okay, what is the next benchmark? Now that AIs can do this thing that we should have thought they should be able to do, what is the next thing that would be quite impressive? And there's a couple of candidate ideas here. So one could be coming up with interesting problems in the first place, and the other is coming up with new kinds of objects or conceptualizations that create or unify fields. On the first one, right now we're just training these models to, like, we have these Millennium Prize Problems because, I don't know, mathematicians have noted, Riemann came up with this idea of this Riemann zeta function because he thought that it would have some connection with the density of prime numbers, or that the zeros of this function would have some connection to prime numbers. And so figuring out that, why do we think this is an interesting thing to study in the first place?
Why are we building this object and trying to answer questions about it and answer this particular question about it seems like the kind of thing that would be the next benchmark.
[08:59]
B
I mean, you highlight two pretty good examples there. For anyone curious about the unit distance conjecture, there's this really nice video by a math channel called Polylog where they talk about it. And one of the people in that, because all of these discussions, it causes people to reflect on the process of doing math, they're like, ah, this thing can do this impressive stuff. What does that mean for us? And he highlights this quote: good mathematicians prove theorems, great mathematicians come up with conjectures, and the greatest mathematicians come up with definitions. And that's more or less exactly your framing here on those two. We need the conjecture generator and then the definition generator. That's the premium tier mathematician. I don't understand how exactly you would make that a benchmark, in the sense that usually when I think of the word benchmark, I'm thinking of something where it's a goalpost, the ball is through the goal, or it's not. You can clearly say, yes, this is done, partly to be able to do things like RLVR, but also partly just to be able to know that you haven't moved the goalpost. OpenAI can have their headline on disproving the unit distance conjecture because it's clear, distinct. It's like, it did it, right? Whereas imagine trying to have a headline like, GPT 5.4 came up with a really good conjecture, right? Like, we promise everyone thinks it's a good conjecture. It just doesn't land the same way. But maybe that doesn't negate the fact that that's the right thing to be thinking about. So I would be surprised if it ever took the form of looking like a benchmark, where we have a score saying that it's passed this benchmark because we can quantify how good a conjecture it is.
But probably the nature of what it would take is that you would feel a tone shift in conversations with mathematicians about the way that it's useful to work with. This series that you referenced, that is not at all produced yet and probably won't be for a couple months, takes the form of us interviewing a lot of mathematicians. And what's interesting is we started doing this over a year ago, and it's fun to see a little bit of a tone shift in the way that they talk about AI between mid 2025 and where we are now in 2026. In the real world, that's a very short amount of time. In the AI world, that's eons. And we're able to see over those eons this tone shift. I think the way that you'd measure conjecture generating ability is going to be more subjective, based on that tone shift, where it'll be mathematicians saying they're not just using it to solve their problems, but as they step back and decide what their research field should even be, that a conversation with such and such model was genuinely helpful for that. I don't think it's likely that you'd see it in the form of a headline saying that this was yet another benchmark knocked down.
[11:32]
A
Right. And so it's very interesting. The kinds of things you can't make benchmarks for are also the kinds of things, at least in the current paradigm, you can't easily train for. Right. Because there's really no fundamental difference between a benchmark and a training environment. I think it's very easy to come up with some dichotomy of like, here's a deep reason why AI can't do a certain thing, and then it turns out, well, you're just thinking about it the wrong way and actually AI can do it pretty soon thereafter. But I'm going to come up with.
[11:59]
B
You're going to come up with a couple anyway.
[12:01]
A
And I think it'll probably turn out that there's ways in which we can train AIs to do these kinds of things in the relatively near term, but it seems like it would have to be different from current RLVR training. So the thing I'm curious about, and the thing, it seems to me, that drives a lot of the big progress in mathematics and in science generally, is coming up with a new way to think about a problem or a new way to understand the world that then unifies different fields, spawns entire new fields, solves problems we weren't even thinking we were trying to solve in the first place. The reason Einstein was thinking about GR is not because he wanted to explain why light bends or why black holes exist. These are phenomena he didn't even know needed to be explained in the first place. But in mathematics, it often seems, okay, I'm a total outsider, I don't even know the details of what I'm talking about here. From the outside, it seems like there's often ways to, say, prove a specific problem that can motivate a new conceptualization, one which results in a whole new field, a whole new way of thinking which is immensely productive, and one which doesn't. I think I'd be curious to hear you talk about Galois coming up with group theory, and what distinguishes his solution to the quintic having no formula for the roots from Abel coming up with a different proof a few years earlier that didn't come up with group theory. But then if you wanted to do a verification loop on, is group theory an interesting concept? Was something useful done here? Why is this proof better? Potentially, that verification loop is 100 years long, and it involves cryptography coming around and physics making progress and the ideas in group theory being relevant to understanding symmetries in physics and all those kinds of things. It's like a 100-year verification loop of why is this a productive concept in the first place?
[13:42]
B
Yeah, boy. Yeah, you struck a nerve. Because I had this project about Galois I was going to do in 2022 that I put on the shelf, but I spent a year of my life thinking a lot about what he did. So there's a risk of me accidentally talking too long on the specifics; you can hold me back. It's a perfect example for your case because describing why it was a valuable insight does not come from immediate utility. And so certainly if you're thinking about RLVR environments, it's like, okay, this is gonna be really hard to do. But it's interesting to note how even with human verifiers at the time, like, it took a really long time to recognize it as being useful. Like, I think with Einstein and GR, people sort of felt, you can, like, feel this feels like a good theory right away. What makes Galois theory such an interesting example is you have literally this 100 year segment of, like, an idea that, like, flows through many different people's heads before it, like, settles into something that the math community, like, agrees is good. So to back up a little bit, I mean, do you want the background on the problem at all? All right, well, so we all learn about the quadratic formula in school.
[14:51]
A
I thought you were going to say we all learn about group theory in school. I missed that class.
[14:54]
B
We all learn about group theory, quadratic formula. So this was known in some sense. Like, Greeks could solve quadratics, but they didn't really write things in algebra. And so it's really more like the Arabs that wrote down that formula. There's this delightful story around some dueling Italian mathematicians, not real duels, just like intellectual challenges, who secretively found a formula for the cubic and then very shortly thereafter found a formula for degree four polynomials. So the natural open question for mathematicians is, can you find a formula that solves degree five equations? Now the degree four, it's monstrous. It would be wild to write it down. You usually don't really write it down in full. You break it up as a procedural thing. So you might believe these things have this exponentially increasing complexity. So for many hundreds of years, nobody is really answering that question. Usually we say Abel was first to prove it. He was this young, precocious Norwegian mathematician and he showed it's simply impossible. It's not that you can find a quintic formula. He thought he found one, but he showed it's impossible. I think for the real credit, though, you have to back up a little bit and talk about Lagrange, where Lagrange found the right kind of question to ask about this. I can go into the details if you want, but I'll give it at a very high level. He was studying the question and he recognized that being able to solve these polynomials is actually very related to understanding the way that certain algebraic expressions are symmetric, more or less. So if I write down A plus B plus C plus D, just adding four variables, if I permute those, it doesn't change the value of the expression. Whereas if I write A plus B multiplied by C plus D, some of the permutations don't change it, but some of them do.
And he had this really, really nice insight about how if you can find expressions like this that have four free variables, but over all the permutations take on three distinct values, that had this unexpected relationship with being able to reduce degree four into degree three. So he started approaching the, like, can we find a quintic polynomial, by saying, I wonder if I can extend that? And to extend that method, you would have to have an expression that has five free variables, such that as you permute them over all the five factorial permutations, it takes on only four values or fewer. So that's like, you could put that in a puzzle book. You could put that in a brain teaser that, like, a 12 year old can engage with. And it's not too hard to find yourself feeling like that's an impossible task. And so Lagrange is just sitting here saying, here's a strategy that I'm trying to solve this problem. Can I find a quintic polynomial? This strategy doesn't. It seems like it might be impossible, at least from this strategy. But that was the first time in history that people had the instinct that some kind of question about symmetry was the right way to be studying these polynomials. In his mind it was just a way; it had yet to be discovered that actually there's a tighter connection. And also maybe rather than searching for the formula we should be asking the opposite question. Can you prove that it's impossible? So he sort of planted that seed. Around 50 years later, Abel definitely read Lagrange and was influenced by it. Galois, we know that he loved Lagrange when he was falling in love with math. And so it's very hard to imagine that these two young geniuses, the fact that they both come up with pretty similar insights around that problem, it's not born from Lagrange. But to your question on are you able to verify that this was a good idea? There wasn't any result that Lagrange came to. There's never like he solved the problem.
And therefore we know that that was the right question to ask. He asked it. There's some intrinsically interesting thing. It also wasn't very important for math at the time. Most people were more interested in the applications to physics. This is almost on the side, an almost recreational hobbyist type thing. Like Abel, he started working on quintic stuff, but then he was advised to spend more of his efforts studying elliptic functions. And so more of his work was on that before he died young. He died at 26 from tuberculosis. And then Galois, he pushed both of those ideas in the right direction, where he really understood the nature of abstraction. And so he had this really nice piece that he wrote while he was in prison, actually. We could talk all about his life story. It's pretty wild. But he's like this teenager, he's in prison. He had tried to submit his math papers and they had been rejected. So again it's like verifiable reward. The verifier function that is the academy at that time is rejecting what he wrote because frankly it was not very coherent. It wasn't a complete proof. He wasn't giving a clear thought of what the theory actually was. He was just like a young fledgling mathematician getting his bearings. So it's like the verified reward there is like, eh, no good. But he has some instinct that there's something there. So he's writing this diatribe on the nature of math being something which undergoes these shifts over time. And he talks about the advent of just algebra itself and going from just thinking in terms of numbers to having a certain fluency just with pure algebraic expressions where you're not tied to interpreting those expressions. And he has this instinct that there is another layer of abstraction that seems like what we should be doing, where rather than thinking about the formulas themselves, we are thinking about what symmetries underlie those formulas. But it was still a pretty ill-defined theory.
So if you're trying to say, okay, is the verified reward that he has solved a problem that other people haven't, it's like, well, Abel proved that quintics are unsolvable. And you say, what was Galois doing? Well, in principle, the thing that Galois theory will let you do is take a specific polynomial, and it gives you the rules to say, does that specific polynomial have roots that you could write down? For example, for x to the fifth minus one, you know that a solution is one, or for x to the fifth minus two, you can write down the fifth root of two. So it's not that for every quintic polynomial you can't write down the solution. But could you find a specific one where you prove you can't write the solution using radicals? He also didn't even solve that exactly. What he has is much more abstract. He didn't show for a specific example that you couldn't. So even describing what problem he solved is very tricky. So then he dies. It's this very romantic story: he has this duel. We can get more into it. There's a lot of myth around it. Like, supposedly he writes up all his ideas the night before the duel. Really, he had tried to get them published.
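Editorial aside: the puzzle described above can be checked by brute force. The sketch below is only an illustration, not something from the conversation. It assumes the textbook expression x1·x2 + x3·x4 as the four-variable example (the transcript does not say which expression Lagrange used), and the function names are made up for this note. It counts distinct values numerically at fixed sample points, so a coincidental collision between two values could in principle make the count too low.

```python
from itertools import permutations
from math import prod


def distinct_values(expr, samples):
    """Count the distinct values expr takes over every ordering of samples.

    This is a numeric check at fixed sample points, so it can undercount
    if two genuinely different arrangements happen to give the same number.
    """
    return len({expr(*p) for p in permutations(samples)})


def pair_sum(a, b, c, d):
    # Assumed textbook example of a four-variable expression with few values.
    return a * b + c * d


def difference_product(*xs):
    # Product of (x_i - x_j) over i < j; a permutation only flips its sign.
    return prod(xs[i] - xs[j] for i in range(len(xs)) for j in range(i + 1, len(xs)))


four = (2, 3, 5, 7)
five = (2, 3, 5, 7, 11)

print(distinct_values(pair_sum, four))            # 3 values across 24 permutations
print(distinct_values(difference_product, four))  # 2 values (plus and minus)
print(distinct_values(difference_product, five))  # 2 values across 120 permutations
```

For five variables, the classical result usually credited to Ruffini and Cauchy is that an expression takes either at most two values or at least five under all permutations, which is why the search for one with three or four values feels impossible.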
[20:51]
A
Working on the quintic doesn't seem to be good for your health.
[20:53]
B
It's very bad. Yeah, yeah, yeah, yeah. If you're a young genius, don't work on the quintic. And so he asks his brother and his close friend, like, get these notes to Gauss, get these notes to the important mathematicians of the day, because I think there's something here. Even then it didn't really take. So his brother and his friend tried to get them out. It wasn't for another 20 years that Liouville sees these notes, sees that maybe there's something in them, and tries to clean it up and understand what Galois was getting at. And then, even then, it was another 20 years or so until Jordan actually puts together something like a modern treatment of group theory that attributes it to Galois. You could easily imagine history turning out differently, where these ideas were kind of coming about from other points in math, and Galois could have been forgotten in history if he had been a less florid character. But between the time of Lagrange having this inkling that maybe symmetries of roots is the right way to go, and where it looks at all like modern group theory, you've got this long span. A lot of the time it's not even passing the verified reward of human reviewers, because it gets on someone's desk and they say, I don't really know if there's anything here. It gets on someone else's desk and they don't either. You have to have this one person who sort of recognizes it, and even then it's not really solving practical problems at that point. Like you point out, cryptography and physics and things like that. You have to get into the 20th century before you have Gell-Mann thinking that maybe understanding the nature of how certain groups break down has this relationship with what particles are made out of. And he anticipates quarks based on a purely group-theoretic question.
And that's one of the more interesting applications of group theory: even predicting the existence of quarks is a group-theoretic question. That's so long after Lagrange before you have anything like that. And so you have to ask, what is the way of measuring progress that's not based on solving a problem, and that's somehow capturing what the instinct is inside Galois's mind when he says, I think there's something here? What's the instinct inside Lagrange's mind when he says, I think this is the right way to think about it? What's the instinct inside Liouville's mind when he says, these scattered notes from this long-dead youngster might have something to them? It's so hard to put a finger on that. But I mean, a different series of videos I'm making right now is about the whole compression-is-intelligence idea. And even though this isn't really the angle I'm taking, there is something to the idea that the smaller expression that's more predictive feels more intelligent. And so I wonder about the extent to which you can give some kind of verifiable reward around not just, did you solve it, or what is it solving, but around the smallness of the concepts required to do it. I mean, going back to Riemann hypothesis solutions, what would that look like if an AI solves it? I think a third way that it could happen is it just straight up works harder. Right? In the same way that you could maybe have an elementary proof of Fermat's Last Theorem that's just spelled out over thousands of pages. That would be incoherent. But the cleaner way to view it is with elliptic curves and all that. Maybe there's some thousand-page proof of the Riemann hypothesis where no one's really getting anything out of it. And what you actually want is the succinct, compressed versions of those ideas that would then lend themselves to human understanding. I don't know. Kolmogorov complexity.
Maybe you throw that into your attempt to quantify what you mean by elegance. But I don't think it's easy. But I do think it's something you would have to do in order to reward the Galois-like instinct, rather than just rewarding "have you solved a problem?"
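Editorial aside: Kolmogorov complexity itself is uncomputable, so any practical "smallness" score has to use a stand-in. The sketch below is one such stand-in, assuming that the length of a zlib-compressed string is an acceptable crude upper bound on description length. It is not a proposal from the conversation and says nothing about mathematical elegance; the name `compressed_size` is made up for this note.

```python
import random
import zlib


def compressed_size(text: str) -> int:
    """Bytes needed by zlib at maximum compression: a rough upper bound
    on how short a description of the text can be."""
    return len(zlib.compress(text.encode("utf-8"), 9))


# A highly regular string: "repeat 'ab' 500 times" is a short description.
structured = "ab" * 500

# Same length and alphabet, but with no pattern for the compressor to exploit.
rng = random.Random(0)
noisy = "".join(rng.choice("ab") for _ in range(1000))

print(len(structured), compressed_size(structured))
print(len(noisy), compressed_size(noisy))
```

Both strings are 1,000 characters long, but the regular one compresses to far fewer bytes. A reward built on this kind of proxy would favor the shorter description, which is the intuition behind the "smallness of the concepts" remark.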
[24:23]
A
It's very hard to come up with the heuristic for science, but it's clear humans have been doing this somehow and obviously AIs will do it at some point.
[24:33]
B
Well, it's relevant also, not just in terms of verified reward, but presumably the end goal is understanding, like human understanding. And so even if you do have some thousand page proof of some math thing or some grand new physical theory, the goal is understanding. Maybe if the goal is predictiveness, you can just have automated engineers go off and build rocket ships or something. We're like, we have no idea how these work, but we can get between stars. But there's going to be a lot of people who want to understand. You're still going to want whatever the concision function is that distills down. Here's this complicated way of thinking into the right one, the equivalent of the universal law of gravitation for Newton. You would still want to train AIs to be able to do that and find the compressed representation.
[25:17]
A
I grew up in India till I was eight, and so in addition to English, I also speak Gujarati. And since Google just released Gemini 3.5 Live Translate, I thought it'd be fun to put it to the test in this mid-roll.
[25:29]
B
3.5 Live Translate automatically detects more than 70 different languages and translates them in almost real time into the target language. Live Translate is your original speed and format while speaking, just like it's doing right now.
[25:45]
A
I visited China back in 2024, and I remember thinking at the time that this trip would have been so much more productive if I had been able to live translate the conversations I was having with researchers and random people I met on the street. Now we have that technology. So if you're building an app that needs live translation, you should 100% check out Gemini 3.5 Live Translate. It's available now via the Gemini Live API and in AI Studio. Go to AI Studio Live to get started. So people have this worry about mathematics in particular, that the AIs will prove the Riemann hypothesis and our understanding of mathematics won't be any the better for it. I have a couple of questions about this. The first one is whether this is a thing you should expect. Isn't the reason humans come up with general natural objects and subgoals and whatever, when we're working on a big problem, that this is just useful when you're trying to work on a complicated, important problem? And so we can just think about it theoretically: would this even be a simpler way to solve the Riemann hypothesis, as opposed to just coming up with the natural abstractions that are relevant to thinking about the problem? And then two, empirically, is this what we observe when AIs do make progress on problems today? When the AI came up with that counterexample to the unit distance conjecture, you can just read its chain of thought. And it's not understandable to me, because I don't know anything about mathematics. But it seems that to other mathematicians it was understandable, and it made use of known concepts in mathematics and proved relationships between them, all in natural language, and as a result accelerated our understanding of the connection between this object and this conjecture. So is this even a thing empirically? Is this a thing we should be worried about?
[27:24]
B
I think it depends on the nature of it. Yeah, again, if we sort of break down the three possible ways of solving the Riemann hypothesis. That one, and the other big one from this year, was a certain Erdős problem, numbered like 1196, about these things called primitive sets. But basically it had that character of bringing in an idea from a seemingly different field. As soon as you just present the basic idea to a mathematician, you say, like, what if we use this Markov chain process where we show that this thing is won from the bottom up probabilistically rather than from the top down, and use the von Mangoldt function? If you say that to someone in the know, they'd kind of know how to run with it. So we have this very small idea that has the form of expertise in one field, expertise in another, draw a little lightning bolt between them. Those are going to be very human-parsable. Right. Because all you have to do is just show the start and end point of what those connections are. If the character of it is mountain building, you have to put in a lot more time to understand that new mountain that was built, because it's a new thread that's not just a lightning bolt between them. And if the nature of the progress was just raw hustle, it's just this super long thing. There are no new theories, it's just a long, long, long chain of reasoning and then the answer. Then you would have that worry of, like, okay, there's this whole digestion process. So I don't think there's one clear answer. I think it depends on what the solution there would look like. And on the mountain-building side, that would actually be really interesting to see. Is it by default very human-understandable, the way that we see new theories from great mathematicians? Or is it an alien, different kind of mountain being built, where we even have to reprocess the kinds of abstractions that we engage with? Well, the closest example here would be the attempted solution of the ABC conjecture.
We maybe shouldn't get into that one, but it's probably not a correct solution. But basically it's this whole new way of thinking that this otherwise reputable mathematician in Japan had come up with. And it just took mathematicians a long, long time to even parse what he was saying. But it had the feeling of just an alien bit of mathematics that's theory building. It's not just a long chain of reasoning. He called it inter-universal geometry or something. And so the fear that you would have is that, yeah, it does that. The biggest fear would be that it does that, and then, much like the ABC conjecture, people work for years to go up the mountain and they're like, dang it, this just isn't right. It turns out to be wrong, but it really looked right. But even if it was right, there's just a lot of effort to hike up a new mountain if we end up in that situation.
[29:55]
A
David Bessis had a really great blog post called The Fall of the Theorem Economy where he's talking about this. Historically, as you were saying, mathematics is coming up with these definitions and problems, and it's about proving theorems about them. And really the theorem-proving stuff is what gets all the credit. But it's really a parasite on the coming-up-with-the-definition stuff. And historically it's not even been a problem in terms of credit apportionment, because if you come up with a definition, you're probably going to be the guy who comes up with a theorem. But now we're in a situation where the valuable work is coming up with the insight and an AI just automates the latter part. Okay, imagine a scenario where AI comes up with, like, direct arguments about a bunch of important conjectures in the world, and then we just have these proofs, and now it's up to humans or the future AIs to then consolidate. I mean, again, having no object-level understanding of this argument whatsoever, I'm sure if you had access to it, it would make it easier for you to then think about, well, what is going on here? Is there some deeper way in which you can understand why this proof works that would make it easier to come up with the ideas behind group theory?
[31:11]
B
Yeah, I think it would be hugely helpful. Right, because I mean, so much of trying to discover new math is mostly being wrong. You're trying to solve a problem, it doesn't feel like constantly taking the correct step up the mountain. Mostly it feels like a random drunken walk where you're doing a thing and then, oh, you're wrong, and constantly just going around. So if at the very least you know that trying to digest what you know is ultimately leading to a correct solution, that feels like progress simply because it's providing a sense of knowing that it leads to a solution. And there's plenty of instances in the recent history of math where it feels like the reach has sort of exceeded the grasp, where there's things that are proven long before they're understood. And I mean, one of my favorite openings to a paper, it's not even like a research paper, it's more like an expository one, is from the mathematician named Timothy Chow, who was trying to understand a concept called forcing. And so there's this problem called the Continuum hypothesis that more or less asks, you have a size of infinity for the natural numbers, you have a size of infinity for the real numbers. Is there something in between? And the answer is both yes and no. It depends on your axioms. It's sort of outside the scope of our usual axiom systems, which is an interesting answer. But the method to describe it is just really, really hard to understand. It's the thing called forcing. And in the beginning of this paper he writes like, everyone knows the idea of an unsolved research problem. I want to propose the idea of an unsolved expository problem, where, sure, we've proven it, but we don't really know why it's true. And so then he proposes a partial solution to that expository problem. You can imagine why I loved that framing, because this is my whole life. I don't do research math. It's just wholly about what's the most clear way to understand this. Even if it's proven. 
There is a difference between proof and explanation. And so on that side, I think that you are basically getting to the importance of that distinction.
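Editorial aside: for readers who want the continuum hypothesis written out, the standard formulation is below. The notation is the usual textbook one and is not taken from the conversation.

```latex
\[
  |\mathbb{N}| = \aleph_0 \;<\; |\mathbb{R}| = 2^{\aleph_0},
  \qquad
  \text{CH:}\quad \text{there is no set } S \text{ with } \aleph_0 < |S| < 2^{\aleph_0}.
\]
```

The "both yes and no" answer refers to independence: assuming the usual axioms (ZFC) are consistent, Gödel showed they cannot disprove CH, and Cohen, using forcing, showed they cannot prove it.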
[33:04]
A
Yeah. And the incentive would have to change, not just in mathematics but in other areas of science, from proving things about the world to consolidating proofs into problems or higher-level insights. But we were having a discussion earlier at lunch about a recent talk you were giving about design and how it helps us understand things. And then in the limit, is there really a difference between the conceptualization of an idea and the idea itself? So if you think about special relativity and spacetime diagrams and Minkowski spacetime, it's like, yeah, this is a way in which we illustrate this idea of why there's length contraction and time dilation. But that is the reality. So the exposition does seem to be the explanation in some sense here.
[33:58]
B
Yeah, I mean, there's a couple of interesting things there. One is, it seems like there's a really strong correlation between the people who come up with genuinely novel insights and those who are actually quite clear in their communication of it. You might imagine otherwise, given that the experience of a university student is often that the expert teaching them is not necessarily the best explainer of that topic, because they are so spoiled by their expertise. But what seems to be the case, at least in some cases, is that the people who are really coming up with something quite novel, so you've got Einstein or Claude Shannon or someone, you read their papers, they're really lucid papers. Right. It doesn't feel like, oh, this is just for the experts and you have to chop through it with a machete to get anywhere. They're very good expositors. Feynman has this characteristic too, a very good expositor. And so maybe the same part of the brain that comes up with the correct new way of thinking about it at a research level also has this knack for good explanation. And I think this is pertinent to the AI one, where I kind of used to think that AIs will become these automated theorem provers, but the role of the mathematicians is going to shift towards my job: explain these things. I kind of suspect that actually they'll also be quite good at doing that, and probably just better than most humans are at the explanation half and the distilling half. And so that's actually not what's left for the mathematicians, digesting and explaining what was going on. That's probably the nature of how these things are going. We can talk about ways this might not be it, but probably the same thing that is coming up with the really good new idea that solves some new problem is just also good at explaining it. That's the way I think my beliefs have changed.
[35:38]
A
What's the last thing you think you'll be doing? Both you, and then also what the human mathematical community will be doing.
[35:45]
B
I will probably be doing something like what I am until I die.
[35:53]
A
And if the doomers are right, maybe that'll be the same. Exactly. It'll be for the same reason.
[35:58]
B
Yeah, yeah. Build a man a fire and he's warm for one night, but set a man on fire and he's warm for the rest of his life. So that's where I am with AI. Because some of the function of an explainer or a teacher is to add clarity to a thing that someone's curious about. That's one thing. But some of it is a little bit more relational, and a little bit more about providing motivation, providing a sense of curation. One interesting take that I've heard about what mathematicians will end up being is that it's actually more analogous to art museum curators than anything else. The AIs solved the thing, so the art exists. They even know how to explain it really well. But you still want someone to help you navigate in this nearly infinite space of what ideas are worth engaging with. Someone kind of doing that. And on that one, even if AIs were in some sense better at that, I think we would always still prefer a human that we had a relationship with, because the way that we get motivated to be interested in things is a social phenomenon. If you have some specific technology you're trying to build, that might be different; you need to know. But I think the people listening to this podcast, they sort of trust your curation on what's an interesting topic in the first place. It's not that they're landing here because whatever your next topic is, that's what they in an a priori sense wanted to understand. They're trusting you as a curator.
[37:19]
A
Yeah.
[37:19]
B
So my role, and arguably that of other mathematicians, might actually just shift subtly into that curation direction of what ideas are worth displaying. And that's a lot of my job right now, even now. I think people think a lot of the time for a video goes into the visuals. Like, sure, a little. It is not, like, immediate. But actually a lot of it is just deciding what's worth saying in the first place, or what's worth putting there. I want to engage with that. And I think I have a trust with certain people, and they are curious what I would choose to put forward, even if the AIs are better at that. In the same way that human musicians are always going to have a role because of that social function of the story behind them, even if the objective quality of the MP3 file coming out is better from some model. That's kind of what I see happening to my job.
[38:07]
A
Yeah. I want to go back to this question from earlier. Just as AI has crossed this threshold, this important benchmark of being able to connect existing ideas to come up with a new discovery or prove or disprove something, just as it crosses this threshold, we're like, okay, but what's the next thing? I want to just.
[38:27]
B
There's a lot more to do on that one, by the way. Just because a couple lightning bolts have been. I think there's like, this flourishing future over the next couple years of really connecting.
[38:36]
A
And so in the limit, you could even say, I don't know if this is accurate to say, but potentially a lot of maybe the biggest breakthroughs look like this at some level. It's just general relativity: oh, he's just connecting together Riemannian geometry and special relativity. Right. And so as AIs keep getting better and better at this connection thing, maybe a lot of big breakthroughs are not really of a different qualitative nature. I don't know if you have a take on that.
[39:03]
B
Well, I mean, a lot of the conversation's focus has been on problem solving and that nature of math, like ticking off Erdős problems or something. I would say it's not even a majority of mathematicians who would characterize their work as really targeting the next problem to take down. Are you familiar with the Langlands program?
[39:21]
A
No.
[39:21]
B
Okay. So this is, like, not even a field of math so much as it is a research ethos, where Fermat's Last Theorem is one inkling of this. You had these two different, seemingly disparate things, and a connection between them led to a solution. So Langlands was a mathematician. He has this famous letter now, essentially spelling out how it seems likely that there are a lot more connections like that. And he even got a little bit more specific about the nature of the connections, such that you might imagine this large map, and you've got this valley over here and this mountain over here and this set of plains over there. And there are a lot of mathematicians who would characterize their work as being part of trying to understand the threads on this map and the progress there. It's not even like, here's this one specific problem that we know will be solved by that connection. It's more that there have been enough cases, time and time again, where big problems were knocked down by finding connections, that it's almost preemptively finding the connections.
[40:17]
A
Interesting.
[40:18]
B
And so you could have. Interesting, yeah, it's actually very interesting. Anytime you run into a mathematician, ask them whether the character of their work is more akin to the Langlands program, or if it's more akin to targeting one particular problem, and you get a certain bifurcated split there. But the possibility of AIs being supercharged connectors feels like it might be an amplifying tool in that pursuit. It's hard to measure, though, because this cuts to what we were saying earlier. How do you assign a score to say, yes, you've done it? If it's knocking down a problem, you have a clear way of saying, yes, you've done it. You can write the headline, you can have your PR move as the AI company to say, we did it. Whereas if it feels like that was the right connection drawn, you can write theorems around it. And this is the nature of what the papers in that field look like. But I think it will require a lot more human in the loop to basically say, was it the kind of connection that we're going for? But that's my guess on what most of the useful progress from these models will look like in the next five years: just really filling in that landscape of connections that you can draw if you're an expert in multiple fields. Like you've pointed out, it's kind of surprising we haven't already had this. And I would be curious to know, at a technical level, what causes the unlock there. Because on the one hand, you can kind of paint an explanation in your head for why you could be an expert in all of these things and not be drawing those connections, which is that when the thing is reasoning, the method of reasoning is this autoregressive chain-of-thought phenomenon. Autoregression is actually a really, really weird way to produce stuff, I think, if you think about it. You're an intelligent person.
Imagine I've locked you in a box, and then the only way that you have of interacting with the world is that you receive a slip of paper, and then someone says, can you predict what will come next? And then you predict what will come next, and then your memory's wiped, and then you get another slip of paper and you go, imagine that was done a whole bunch. And then what comes out on the other end? They're like, look at this essay that you wrote. You might look at that and be like, this is awful. That's not the essay that I would have written. Because the process of repeatedly predicting something is just pretty different from how you would think as a writer to compose it and think it through and everything. And in particular, what would probably happen is you're sort of a slave to your context, where you might be answering some question about some particular field. And so you draw on all the context around that and you're going there. The connection that actually is where all the substance is going to come from is by its nature a very unlikely one. And you can do all the RL that you want to try to get better in some way, but what's the thing that's specifically upweighting and incentivizing making these unlikely connections when the vast majority of them aren't the predictable, you know, next token that would come in there? And so it might be the case that you just have this intelligence that's sort of locked in there inside that box, but it's just a weird way of interacting with it. So the thing I'm curious about is, do you ever get any fruit by just questioning the premise of how tokens are generated every now and then in some way? Right. And I don't think it would be as simple as you manipulate the temperature or something like that, but are there any things that you can do that take the existing level of intelligence, but, like, find the right ways of sparking those connections that, like, unlocks these sorts of things that we've seen? 
Or do you need just a little bit more intelligence, such that at the level of prediction, it's kind of predicting that it should be making that lightning bolt to another field?
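Editorial aside: the "slip of paper" picture and the temperature remark can be made concrete with a toy sketch. Everything here is hypothetical: the four-word vocabulary, the scores in `TOY_LOGITS`, and the function names are invented for illustration, and the toy model looks only at the previous token, whereas real language models condition on the whole context. The sketch shows what raising the temperature does: an unlikely continuation (here, "lightning") gets more probability, without the model knowing anything new. That is consistent with the doubt expressed above that temperature alone would be enough.

```python
import math
import random

# Hypothetical toy "model": next-token scores that depend only on the previous token.
TOY_LOGITS = {
    "the": {"proof": 2.0, "group": 1.0, "lightning": -1.0},
    "proof": {"the": 1.5, "group": 0.5, "lightning": -1.0},
    "group": {"the": 1.0, "proof": 1.0, "lightning": -0.5},
    "lightning": {"the": 0.5, "proof": 0.5, "group": 0.5},
}


def next_token_probs(prev, temperature=1.0):
    """Softmax over the toy scores, with each score divided by the temperature."""
    scaled = {tok: score / temperature for tok, score in TOY_LOGITS[prev].items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def generate(start, steps, temperature=1.0, seed=0):
    """Autoregressive loop: sample one token, append it, and repeat."""
    rng = random.Random(seed)
    tokens = [start]
    for _ in range(steps):
        probs = next_token_probs(tokens[-1], temperature)
        choice = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
        tokens.append(choice)
    return tokens


# The unlikely continuation becomes more probable as temperature rises.
print(next_token_probs("the", temperature=1.0)["lightning"])
print(next_token_probs("the", temperature=2.0)["lightning"])
print(generate("the", steps=6, temperature=1.0))
```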
[43:55]
A
I think it's more predictive to reason about data, instead of architecture or even loss function. Like, I don't know, we have diffusion models that do text, and the kinds of things they produce are not of a wholly different character. They've just not been explored as much. I think the more relevant thing is, what is the data that whatever architecture, whatever loss function you have is incentivizing you to produce? And it does seem like they're getting better at, like, okay, forget about math. I mean, we did have a couple of examples of this kind of thing, but just look at why they are getting better at being autonomous agents. I don't know. They're in an environment where autoregressively producing the step that says, let's step back and do a search over the whole code base, and then let's step back and assess my mistake, is the thing that works. I assume what happened in the case of progress in science, or maybe in math, is you have FrontierMath-like problems, which mathematicians have specifically designed because they require connecting together two different fields. And I'm guessing there are all kinds of clever, partially synthetic ways in which to make harder and harder problems like that, that require these kinds of connections. For example, by eliminating assumptions and still requiring the AI to continue to get to the answer. And then it doesn't really end up mattering what the loss function is. It's really about, can you come up with an environment which incentivizes this ability?
[45:29]
B
Yeah, it feels like you should be able to. I certainly can't speak to the correct ways of doing that that unlock all this, but it would just be pretty surprising, don't you think? It would be kind of surprising if over the next three years there's not just a lot more of those lightning bolts.
[45:43]
A
So this, I think, is an important thing to think about, which is that we often think about how smart a single system is, and we don't think about AIs having advantages that are more the result of other facts about them. So in this context, the key fact about them is that we can just parallelize and arbitrarily scale them, so that whatever level of capability they have, it's not just one idiosyncratic genius in the history of mathematics who makes a few connections and then dies in a duel. It's just universally applying the waterline across all problems that are accessible at that level of capability. I feel like this is among the many advantages that digital minds inherently have that we don't think enough about. The other ones being the fact that they can merge all the knowledge together, or at least there will be techniques that allow this to happen, and that you can spawn off copies with identical levels of knowledge. But yeah, I feel like this parallelization is quite an important property. And I'd be curious about your predictions, even if they're not as smart as human mathematicians, given the fact that there are just billions of them, and that, for PR reasons, the AI companies are dumping billions and billions of dollars at this. Quantity has a quality all of its own.
[46:57]
B
That seems in the right direction. I mean, if we take that conversation between Montgomery and Dyson at the IAS, the one that suggests some connection between Riemann hypothesis or Riemann zeta function zeros and random matrices, that feels like the kind of thing that you could try to automate, in that you have agents representing expertise in all these fields. Okay, we all know that an institute is smarter than an individual, and that the reason for having people all in the same geographic location is because you want those serendipitous conversations to happen. What does it look like to sort of engineer those between agents? I mean, it's interesting, because you sort of point out that you can pool all your knowledge. I actually wonder if one of the advantages is that you can do the opposite of that. Sometimes when an AI is failing, it's because it sort of gets into a bad chain of thought and it's really hard to get it out of it, right? So you're like, I'll just start again. Same deal with humans, right? Sometimes you start thinking about it in a certain way, and actually what's required is to just back up. Maybe sometimes in this form: there are stories about people trying to prove something for a long time, and then at some point they say, hang on a second, what if I tried to prove that it's impossible? Like, prove the opposite. And that's like unwinding your own context and going at it with a fresh mind. You could imagine systematizing that, or having multiple different agents deliberately given different pieces of context and trying to compare and contrast there. We don't have the same level of manipulation over our own context. In this AI and math series, the first episode will be about when they solve the IMO. And I want to focus on one specific IMO problem that they failed on, which is one that a lot of very smart students failed on. Terry Tao also failed on it.
And the nature of it is basically that people were very mad at the problem because they called it a troll problem. I almost don't want to spoil it, because I want to construct the episode around leading someone in without knowing that it turns out to have a simple solution, because you can really empathize with what it's like to be a student solving this. Basically there's a really elegant way of going down what you really feel like is going to be the solution based on the context of being the International Math Olympiad problem positioned as it is. The character of the solution is really enticing, but it's kind of hard to prove that it's the best. The reason is that it's not. There's this almost brain dead solution that is the best. And so the relevance of that to the whole AI story is for a human, what's required to answer that question is to escape your context. Escape the context that you're in the IMO, escape the context of the way you've been trained to solve these contest math problems. And if you just approached it like a brain teaser that I throw someone off the street, they'd probably answer well. And you sort of want the same sometimes for human research in other contexts where sometimes just being able to say, refresh your thinking, come at it completely differently. So of all the advantages that digital minds have, that might actually be one of them, like a little bit more of a systematic what does it look like to refresh your thinking? Try answering two separate questions, like spin off two agents, one who's trying to prove it, one who's trying to disprove it, one who tries it this way, one who tries it. And they deliberately have different contexts. I would be curious to see if we're having this conversation three years from now, how many of the significant results that make headlines have that character of basically erasing the context, previously trying a bunch of different things, as opposed to merging the results of a bunch of different.
[50:25]
A
I think this is incredibly interesting because a common concern people have about AIs is this entropy collapse where they all think the same way because they're trained in similar ways. This is why they're bad at writing. They kind of just go down the same path and have similar patterns of speaking and so forth. But maybe actually the key advantage AIs have is that you can systematically. It sounded like one of the reasons the unit distance problem conjecture took so long to be disproven was because people assumed the conjecture was actually true. So mostly they were trying to figure out ways in which to prove it. And so maybe one of the key advantages the AIs will have is actually to increase the entropy by systematically trying out both the negation and trying to prove the positive of any given statement, or being able to systematically give different agents different biases.
[51:16]
B
That's a good point.
[51:17]
A
It seems like an important thing in the history of human science is that Einstein is just really motivated by this bias that things should look the same in different reference frames. And then he had multiple other biases like these. But that is just very formative in his thinking. And you can just systematically survey a bunch of heuristics and see which ones are being productive at a given problem.
[51:36]
B
Yeah. And so you would suggest basically systematically increasing entropy at the prompt level, even though you have this inevitable collapse at the autoregression level. Einstein would be an interesting example because it's like he's got this bias towards things should be relative. He also has a bias towards God should not play dice. And it's almost like you want to make sure that you don't accidentally have all of your LLMs be Einstein, because you might halt on quantum mechanics progress. Right.
[52:04]
A
Which actually goes to show you that there's not a correct heuristic for science. You actually just need multiple independent research programs with their own heuristics.
[52:11]
B
Yeah. And that feels like old school software. Right. As long as you're able to describe that in some way, you have old school software that amplifies that entropy in some way. And if you're able to put a clear ontology to the distinct ways of thinking that you want to prompt, you explore that full ontology. And then each individual one runs off doing what it is. But I think there's a certain design question there on how exactly do you describe the different approaches? The easy one is, are you trying to prove it or disprove it? The harder one would be to say, what are all the tactics that you could take to prove this? And make sure that you're sufficiently applying sufficient breadth to exploring that.
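A minimal sketch of the "old school software" step described above, assuming the ontology can be written as a small grid of goals and tactics. The axis labels, the `build_prompts` helper, and the prompt wording are all hypothetical, chosen only to illustrate covering every cell so that each agent starts from a deliberately different context.

```python
from itertools import product

# Hypothetical ontology: these axes and labels are illustrative assumptions,
# not a list taken from the conversation.
GOALS = ["prove", "disprove"]
TACTICS = [
    "induction",
    "contradiction",
    "explicit construction",
    "probabilistic argument",
]


def build_prompts(statement, goals=GOALS, tactics=TACTICS):
    """Return one prompt per (goal, tactic) pair so the whole grid is explored."""
    prompts = []
    for goal, tactic in product(goals, tactics):
        text = (
            f"Try to {goal} the following statement using {tactic}. "
            "Start from scratch and do not assume it is true or false.\n\n"
            f"{statement}"
        )
        prompts.append({"goal": goal, "tactic": tactic, "prompt": text})
    return prompts


if __name__ == "__main__":
    # Each printed row would be handed to a separate agent with a fresh context.
    for p in build_prompts("Every even integer greater than 2 is a sum of two primes."):
        print(p["goal"], "|", p["tactic"])
```

The easy axis (prove versus disprove) is two entries; the harder design question raised above is what belongs on the second axis and whether a flat list of tactics is broad enough.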
[52:50]
A
I don't think people appreciate the kinds of things that these models can just go handle for you when you equip them with a good harness. Like Cursor, for example. I started publishing my episodes on Bilibili for a hopefully burgeoning Chinese audience. But everything I upload there needs the sponsored segments cut out. Normally, that would have meant that I would have to ask my editors to go back through all the old episodes, cut out the ads, and re-export everything. But in about just as much time as it would have taken me to send them that Slack message, I can just tell Cursor to do it instead and spare them. And for research for the podcast, I have a whole repo that I've set up where I've just put every single book and paper that's been relevant to prepping for any of the recent episodes. And I've been able to hodgepodge everything because the Cursor harness is just extremely good at helping the model figure out exactly what information to pull, whether that's from my repo or from the web, in order to answer the questions I have while I'm doing research. So whatever you happen to be working on right now, just try pointing Cursor at it. Go to cursor.com/dwarkesh to get started. Obviously, AI for math is making a lot faster progress than everything else. And people point to verifiability of the domain as the key reason this is happening. I think that's one of the two important reasons, but I think people really neglect the other one. And I'm outside the labs. I don't know what's actually going on, but this is a totally naive theory. Okay, a tangential question to why AI is making so much progress in math. Why has it been so slow at computer use? Which is, you know, computer use is actually very verifiable. It's like, you know, is my Etsy package coming? Or like, is my event booked? You know, whatever. These are extremely verifiable things to survey. What computer use lacks is grindability. 
So because websites have, like, bot detectors and also it takes a tremendous amount of compute to run parallel rollouts, it's very hard to just run like 1000 parallel rollouts of the same checkout flow on Amazon because you'll get like, shut down by Andy Jassy. Right.
[54:47]
B
Andy Jassy personally presses the red X on the Dwarkesh button.
[54:52]
A
Exactly. And so you could try to build clones of every single website. This is very labor intensive and slows you down. And the reason, by the way, you need to do so many parallel rollouts in order to learn a skill currently with deep learning is that we haven't solved sample efficiency.
[55:08]
B
Sucking supervision through a straw. Yes, what he says.
[55:11]
A
Of course, people are working on many different techniques, but fundamentally there's this big problem and there's this big constraint in the way we train AI. With code also, you can containerize a given level of progress in a repository and then just spin out thousands of parallel containers or hundreds of parallel containers and say, try to implement this feature. And it's totally deterministic. And because it's deterministic, you can solve the credit assignment problem because you know that whatever caused this rollout to succeed and this one to fail, the diff is the thing that worked. And this way you solve the credit assignment problem. If you have situations that are starting off at different starting points, this credit assignment problem becomes much harder to solve. But most of the things in the real world are just very hard to containerize in the same way. Coding and math are exceptions to this rule. But if you're just trying to figure out how do I build a new business that succeeds, how do I go trade in the markets for a day and make money? The fact that you had to interact with the real world and things change day after day means that you can't keep replaying and grinding and farming the simulator. But the math of course is the exception. And I feel like this is actually an important driver of progress in this domain and also in coding. It's not just verifiability, it has to be grindable. The third reason that people point out that AI is making fast progress is they focus a lot on Lean and formalization. Again, I have literally no idea what's going on in the lab. I feel like Lean just doesn't matter that much for the current level of progress in AI. Or why is AI able to solve the unit distance problem? Sorry, disprove the conjecture for the unit distance problem. They released the chain of thought, or at least a rewrite of the chain of thought, and it didn't have any Lean in it. 
I think the process based supervision that Lean provides, where you know each step is correct, seems less relevant than just having this grindable outcome that is verifiable.
[56:59]
B
That's an interesting point. Like grindability mattering more. I guess I will say on the. Yeah, okay. So naively you might think Lean provides something unique for math because you're able to see if it can prove it. You have old school software that can tell you yes or no. You use that as your verifiable reward. I mean, so what would corroborate your point is the idea that like the initial attempts. Again, I'll just circle back to IMO. It's like initially DeepMind basically does that. It's like everything in Lean and then the next year it's all in natural language. So to your point, not needed. I think there is a yet to be explored benefit of that formalization domain, which is at the moment you still need ultimately a human is reviewing that counterexample to the unit distance conjecture to say looks good. And that provides a certain bound on how endlessly explorable things are. If you consider AlphaGo, AlphaZero style stuff where they're just off in their own universe just playing a bunch of Go and exploring themselves, just completely going potentially off the rails of what any human needs to look at, but they still have this automated, verifiable reward. It's not just that, hey, you can do RL on that. It's also you basically never have to check in and you can just pour compute at them, exploring the universe of Go. What stands to be interesting. Maybe this won't pan out, but I think the jury should still be out on whether this will yield anything. With Lean, you could imagine having a basically endlessly running program that's constantly trying to extend Mathlib. So Mathlib, it's this GitHub repository that's basically like all of math written in code. It's very far from all of math, but they want it to be all of math written in code that you can ask like, is this proof correct? It's very labor intensive to write these proofs. There's like a whole sub community around it. But you could imagine what if you just had an AI where you say, simply try to extend Mathlib. 
Maybe it's a fork of it so that it doesn't have trash in it, because people have certain taste for what they want to be in there. So you have your fork of the pure AI Mathlib, and it just goes and it just doesn't stop. It doesn't need anybody to check in on it. It could just keep going. It might come up with its own conjectures, it might come up with its own theories and different definitions. Maybe many of them are useless, but it just has this infinite tree that it can grow out. That's a very unique thing that math has that nothing else has, where you could press go and then just pour compute at it and look away for 10 years and then come back and say, what do you have? And there's going to be something. And then there's a question, is it useful or not? How do you suss that out? That's just an interesting thing to be able to do. It would be very surprising if that didn't yield some sort of interesting mathematical insight from it. So I think that's the real case for. Okay, there's two different ways that Lean is important in this story. That's the first one of them, basically, is how it's like you could let go, not even check in, and progress will be made. You can do that with Go. I don't think you can do that with natural language math.
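As a small illustration of the yes-or-no check described here, the following is a Lean 4 sketch that uses only core library facts (`Nat.add_comm` and `rfl`). The theorem names are made up for this example and are not entries in Mathlib.

```lean
-- Lean 4, core library only (no Mathlib import).
-- If this file compiles, the checker has accepted every step.

-- A statement proved by citing an existing library lemma.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A statement proved by computation: both sides evaluate to the same number.
theorem two_plus_two : 2 + 2 = 4 := rfl

-- Uncommenting the next line makes the file fail to compile,
-- which is the automated "no" answer.
-- theorem bad_claim : 2 + 2 = 5 := rfl
```

An endlessly running system of the kind imagined above would be generating statements and proofs in this form, with the compiler acting as the only reviewer of correctness; whether the accepted statements are interesting is a separate question.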
[59:59]
A
That's very interesting. Did you see Karpathy's autoresearch idea? He wrote this basically one Python file that does basic LLM training and then just had a repo where LLM agents would try to make modifications to the file. If it sped up the speedrun, the modification stays. Eric Jang, who came on to explain how AlphaGo works, did a similar thing when he was trying to build a very strong Go bot, and he had interesting observations about the kinds of... it's really good at just running an experiment and going down that path, but it's bad at stopping at dead ends and just doing extremely parallel things. Anyways, this will change the future. It's very interesting to think about what it looks like in the limit. I mean this is fundamentally what the human institution of mathematical research is, right? It's just like: this is a library, extend it in interesting and useful ways. And this way you don't have any outcome based supervision. No, there's no outcome that you're trying to incentivize, but you have a process. You know the steps are correct. You just don't know if it's going in an interesting direction.
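The keep-it-if-it-helps loop described above can be sketched as plain hill climbing. This is a toy under stated assumptions, not the actual autoresearch code: `benchmark` is a stand-in quadratic rather than a real training run, `propose_change` is a random perturbation standing in for an LLM agent editing the file, and the setting names `lr` and `warmup` are hypothetical.

```python
import random


def benchmark(config):
    """Stand-in for 'time to finish the training speedrun' (lower is better).

    A toy quadratic; a real harness would train a model and time it.
    """
    return (config["lr"] - 0.3) ** 2 + (config["warmup"] - 0.1) ** 2


def propose_change(config, rng):
    """Stand-in for an agent's edit: copy the config and perturb one setting."""
    new = dict(config)
    key = rng.choice(sorted(new))
    new[key] += rng.uniform(-0.05, 0.05)
    return new


def autoresearch(config, steps=200, seed=0):
    """Accept a proposed modification only when it improves the benchmark."""
    rng = random.Random(seed)
    best, best_score = config, benchmark(config)
    for _ in range(steps):
        candidate = propose_change(best, rng)
        score = benchmark(candidate)
        if score < best_score:  # the modification stays only if it helps
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    start = {"lr": 1.0, "warmup": 1.0}
    best, score = autoresearch(start)
    print("start score:", benchmark(start))
    print("final score:", score)
```

Because a single chain like this only ever follows one path, it also illustrates the weakness mentioned above: nothing here notices a dead end or runs many branches in parallel.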
[61:04]
B
But yeah, like if you were doing that, you don't want to completely go off the rails and like do a random walk through the space of logic. You'd probably want some like supervisor model that's trying to provide heuristics on whether it's useful or not. But yeah, something of that character. I mean, you know, people are working on it and like that's one of those, like five years from now. I'd be curious to be able to get the future version of us like talking about whether, like maybe that goes nowhere. But Terry Tao was talking about one research project that's basically try to exhaustively search the space of possible, like algebras. Like you could imagine different like axioms that you apply to algebraic systems. And so when we come up with group theory, there's a certain axiom system that has this flavor of they kind of look like arbitrary rules unless you know the motivation. But it's basically like, what if you tried all of them? Do any of these yield useful things? And the vast majority of them is just trash in some way. It all collapses to no interesting results. But every now and then there would be this little island of a completely different type of axiom system that at the very least seems rich in terms of the number of theorems that can come out of it. And that's like bread and butter for what you would imagine automated provers being good for exploring that space and seeing which one of them turns out to be something. And maybe one of those islands actually turns out to be something you can retroactively put Motivation on to say this is the kind of structure that's trying to get at in the same way that you could imagine looking at the axioms for a group not knowing that it's about symmetry, but retroactively realizing like, wow, this is very relevant to studying symmetry. So you could imagine results of that flavor. But instead of just exploring possible algebra systems, it's like all possible logical consequences of any kind of axiom.
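A toy version of "what if you tried all of them" can be run by brute force on a very small set. The sketch below is an illustration only and is not the research project mentioned above; the choice of three laws and the helper names are assumptions. It enumerates every binary operation on an `n`-element set and counts how many satisfy each law.

```python
from itertools import product


def all_operations(n):
    """Yield every binary operation on {0, ..., n-1} as a table: op[a][b] = a * b."""
    for values in product(range(n), repeat=n * n):
        yield [list(values[a * n:(a + 1) * n]) for a in range(n)]


def is_associative(op, n):
    return all(op[op[a][b]][c] == op[a][op[b][c]]
               for a, b, c in product(range(n), repeat=3))


def is_commutative(op, n):
    return all(op[a][b] == op[b][a] for a, b in product(range(n), repeat=2))


def is_idempotent(op, n):
    return all(op[a][a] == a for a in range(n))


# Hypothetical, hand-picked laws; a real search would generate these systematically.
LAWS = {
    "associative": is_associative,
    "commutative": is_commutative,
    "idempotent": is_idempotent,
}


def count_models(n):
    """Return (number of operations, how many of them satisfy each law)."""
    counts = {name: 0 for name in LAWS}
    total = 0
    for op in all_operations(n):
        total += 1
        for name, law in LAWS.items():
            if law(op, n):
                counts[name] += 1
    return total, counts


if __name__ == "__main__":
    total, counts = count_models(2)
    print(total, counts)
```

On a two-element set there are only 16 operations, 8 of which are associative. The number of tables grows as n to the power n squared, so this approach stops being feasible almost immediately, which is where automated provers reasoning about the axioms themselves would have to take over.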
[62:39]
A
On the point about whether you can provide process based supervision without Lean. So DeepSeek had their DeepSeekMath model and they released a paper on how they trained it and it was quite interesting. So the problem with having natural language proofs is you don't know if it's correct or not. And so they have a verifier and then the verifier is trained by a meta verifier that makes sure that all the problems that they're training this model to solve and the Art of Problem Solving, that the verifier is giving good feedback on that and it works. And so it's this interesting natural language verification with some sort of meta verification that kind of works, or at least seems to work so far in the published literature, and also it seems to work in the published products that we're using. If you look at coding agents, they're getting better and better at writing clean code and refactoring code and stuff like that. And I'm sure that there's process based LLM-as-judge kinds of things which are trying to provide taste and say, hey, is this a clean way to write this function? Are there duplicates of the same kind of modular forms and so forth? I feel like that should also work for mathematics. Right?
[63:46]
B
It doesn't seem, it seems more plausible for math than anything else, even if you're only working in natural language, that you could trust a verifier. I mean, you and I were talking earlier about why they're bad at writing and I was asking why you can't just. They seem to be good judges. If I give them two essays that students write, they'd be able to say which one's more accurate and insightful. So why can't you just have a verifier saying is this a good piece of writing or not? And maybe the ultimate failure there is even if they're good at discriminating between a B essay and an A essay, they're not actually good at discriminating between an A essay and a thing you actually want to read. That would be followable on Substack and insightful and all of that. They actually end up preferring just uninsightful pieces of writing. And so on the math front, I guess the question would be that step to simply know is this a correct proof or not that lends itself to an automated verifier. Even in natural language. You could probably still make a ton of the progress. I still like the sort of tree of logic out of Lean front, just in that you can really go off the rails. There's just no constraint on the previous way that things had been phrased before. In the same way that everyone talks about move 37 in AlphaGo and such, what is the thing that lends itself to just going outside the prior heuristics? It seems productive to have a disconnection from the rest of the world in that exploration as a complementary research pursuit to the natural language math frontier. I mean, the other relevance of Lean there would be like, okay, let's say you have your pure natural language RL environments and you have a pure natural language set of proofs and people have this set, like precede AI mathematicians and they go and they generate like 10 papers a day that produce a bunch of stuff. If the error rate, if there's any error rate to that at all. So Alex Kontorovich has talked about this. 
It becomes insufferable as a mathematician because you would basically be like, every single time I see one of these, I kind of don't know if it's worth my time. Even if 99 out of 100 of them are right, I don't know if it's worth my time to even go through it because it's really labor intensive to find what that error would be. And it's like really frustrating if it turns out you spent all your time on a paper that was trash. And so having anything that's able to give you that green check mark that says even if this is going to be complicated to understand, even if it's going to be a pain, you at the very least know it is correct. Like every other field would kill for that. Right. And math has that if the models are also able to take their natural language proofs and formalize them. And so that seems huge, right? The ability to have that. Every field would love to have something like that. And so I think you are right that Lean is maybe overrated on the side of the importance of it being used as a verifiable-reward environment for any kind of just progress in math generally. But I definitely wouldn't write it out of the story.
[66:43]
A
Yeah, I also love this extension of Mathlib as a metaphor for what's going to happen to our civilization pretty soon.
[66:52]
B
Sure, right.
[66:53]
A
It's just like for millennia humanity is building this corpus of knowledge and understanding and everything that we have now distilled into these models. And at some point the models will just extend that arbitrarily. By the way, on the writing front, I have a theory of why writing is making worse progress than these other domains. So I think one of them is what you said, that they're bad at judging not only A versus B, but they get just totally derailed by B, which is this shitty essay that just hits all the bells and whistles that A is supposed to hit. And then so the reward hack thing just totally goes off the rails. But I think the other important thing is that writing is not modular in the same way that code and math are. You can write a function many different ways and they kind of do the same thing. And of course you want it to be very clean and stuff, but at the end of the day, it works. It works. Same with lemmas and mathematics. And then you can have some end product that is different from the way it is produced. So the code is the thing that produces some end product, and you want a functional end product. Whereas in writing, the end product is directly the thing the AI is producing. And each paragraph, sentence, word matters because that is a thing that is like, that is the substance. It's not some separate thing that is produced out of the writing. And so it can't be slop in the way that code can be slop and still produce some outcome that you want.
[68:25]
B
But you were just pointing out how actually we've gotten much better at agents writing not just functional code, but clean code. Why is it not the case that the same progress that allows you to go from merely functional to clean and a mergeable PR doesn't also result in clearer writing?
[68:42]
A
Yeah, that's a good point. Also, has it not? I agree there's many ways in which they're terrible writers, but for a lot of writing I consume, I find it's better to just copy paste it into an LLM and just say, explain this to me. The explanation will be better than the thing that is produced by the human. So it's funny that we say these are such terrible writers. And also my revealed preference is just like, can I just have an LLM explain it? Even when I'm talking to a human expert live on a call, if it's a piece of knowledge they have that only they have, that's not encoded in the distribution, I want them to explain it to me. But then if in order to understand that I need to understand a more basic concept, I would prefer if it was socially acceptable for me to just be able to say, let's pause there. I'm just going to ask an LLM how that works and then we can come back to your special piece of knowledge.
[69:31]
B
Well, it sounds, I mean that's distillation and explanation. And so if I'm thinking of like quality of you as an essay writer, if it's that I give you a book to read and I want a book report. Right. Then I might believe that, okay, the LLM maybe gives me a better book report. But I think what people are really getting at when they say it's bad at writing, like what is writing? It's not just distillation of pre existing ideas. It's not just like how do you explain? Clearly, because they are good explainers. It's like what is the insight? And this is where it gets like just autoregression is a very weird way to generate stuff because when you're writing you sort of know in order for it to be good you have to have an element of the unpredictable. And it's not just like increasing temperature in your mind or something. Right. It's like knowing exactly the correct point when you want to make an unpredictable move and that that's going to be what's more insightful. And so even if it's better at explaining a pre existing thing, it's like what generated that book that you wanted distilled in the first place. Right? It wasn't an LLM that generated it and you just needed it. It's like some author who through a lot of exploration of ideas in the world and then deciding what aspects of it were interesting and which ways of presenting it were like the coherent, well motivated narrative. It's like they put that all together in some way and if they're a good author, it's probably one that actually you would err on the side of reading their book instead of the distillation. But still what makes it worthwhile to explore at all in the first place? And you're uploading it at all. I think it's all of that side of it that's the like when people cite them being bad at writing. And it's that element of unpredictability, of being deliberately choosing something that's novel that's like very directly contradictory to the way that things are being produced.
[71:13]
A
Yeah, that's a good point. I think they're also really bad at building really good mental models of people, which I think is a very important skill in writing. So Andy Matuschak and another collaborator whose name I'm forgetting right now, did an interesting report where they tried to teach LLMs to write good spaced repetition prompts. And I really like this because even though it seems like a really totally random skill, people are talking about recursive self improvement in a year and we can't get these things to write good flashcards and what's going on there. Right, right. They tried many different kinds of techniques and they're like sophisticated people. They tried to RL open source models. They tried all kinds of including chain of thought in the big prompt they sent to the best closed source model, et cetera. And the key constraint, it seemed to me, was that writing a good card is about projecting somebody's mind in three months and what is the way in which they will associate the question, what kind of answer they will be thinking of in the moment? And is the elicitation one that inspires the detail you actually want to take away from the passage you're trying to make cards about? I think writing also is similar to this where if you're writing something, the reason it's such an enervating process that takes so long is each word you should be thinking or each sentence should be thinking. What is happening in my reader's mind right now? Even if I flip the phrasing around so the end phrase goes to the beginning and this is the first image that comes to your mind before you read the rest of the sentence happens. Maybe autoregression is bad at that kind of. There's maybe a more diffusion like property of considering the whole rather than going sentence by sentence. But also I think that requires a lot of mentalizing, which these models weirdly struggle at.
[72:55]
B
Well, I mean, interesting question. Is it weird that they struggle at that? So I might butcher this. You know how when you cite studies that you once read and it's like maybe the study wasn't real or something? There's one very memorable one on. Okay, so let's say you want to quiz people's eq, you show a flashcard of someone's facial expression and someone's trying to describe what's that emotion? So I think there's really good tests online that'll have a face and then four possible emotions. And it's surprisingly hard to describe exactly the correct emotion. But you also get the sense there really is a correct answer. And if you try this with people in your life, you'll notice that the ones who actually are pretty plugged in socially do really well on it. And the ones who are a little bit more left brain don't. Okay, so that is a kind of test you can do. I vaguely remember an experiment to this effect where they took people who had freshly gotten Botox in some way and they did a pretest and a post test and post test, they were just much worse at reading people's expressions. That feels kind of weird.
[73:58]
A
They got Botox.
[73:59]
B
So the person taking the test, it's like, so you do the test and then you go and you get Botox and your face is all frozen. And now you are worse at understanding the emotions of what you see. Right. And the thought is that part of, part of understanding, like this emotion that you're looking at is doing it yourself.
[74:15]
A
That's crazy.
[74:16]
B
Like at a facial level, like, you know, moving your face muscles and it's like, you see that, you mimic that and you're like, oh yeah, that's anxiety. Right. At some like, very subconscious level. So in that sense, if it is the case that models have bad theory of mind, sure, they know everything because they read what everyone wrote, but at a level of actually able to put themselves in your shoes in the same way that my face muscles are mimicking your face muscles, that's what helps me understand how you feel. Not surprising at all. They don't have face muscles. Their brain works completely different. It's just like an alien trying to empathize. How could it have theory of mind? It would be this very emergent thing to have theory of mind, whereas we can just plug it into our own minds and it's like we've got the ready made hardware to just place it in.
[74:59]
A
That's very interesting.
[75:01]
B
From that lens, it's not that surprising.
[75:03]
A
Okay, Grant, we are both partners with Jane Street. I'm sure over the years you've interacted with a lot of Jane Streeters. What have you found that's unique about them or their culture?
[75:11]
B
I mean, I did this interview with them this year that partly was interesting because they don't usually have anything outward facing. I mean, in the industry they're known as having a pretty wild retention rate. People just stay there. And I think getting an inside view of that. I remember one of the comments someone was saying, even though the people have role titles like, you know, researcher or trader or engineer, they often don't know what their colleague's actual role is because everyone's doing a little bit of everything else. Like even if you're officially a trader, you're doing a lot of research, even if you're officially a researcher, you're doing a lot of coding. And I suspect maybe that's part of why they have the insane retention that they do. Because anyone who wants to be growing, they just have the chance to do a lot of different kinds of things.
[75:50]
A
All right, Grant, I'll do the plug for you this time. If you want to watch this full sit down interview that Grant did with some of the folks there, go to 3b1b.co/JaneStreet. All right, Grant, let's talk more about AI and math. What advice do you have about using LLMs to learn? So as I was describing for a lot of well known concepts, I find them very helpful but often just a couple of further messages down and I'm trying to understand something and they're so confused themselves or confusing me and they don't explain it the right way. And then I know that talking to the right human could clear up my confusion in three minutes. I don't know. And then I feel like more and more we're going to want to use these things. As somebody who's thought a lot about education and representation and stuff, we're going to want to use these things to learn things. Have you noticed the ways to use them more productively to understand concepts?
[76:46]
B
I'm curious to hear your take on this. I mean, I'll give mine. Even pre LLM, I feel like a relevant insight in learning was recognizing that who matters more than what. So advice to any college student when they're choosing what courses to take: care a little bit less about your pre existing interests because they're kind of arbitrary right now and care a little bit more about whether the person teaching it is a good educator and someone you resonate with. I think in choosing what to read, like what books to read, who the author is maybe matters more than if it's a prior interest. So if there's a book you've liked before, read what else that author has written rather than reading another thing on that subject. And I'm getting to LLMs on this, there's a difference in feel for trying to learn something if you look at a Wikipedia page of it versus if you look at, let's say it's a philosophy topic and you go to the Stanford Encyclopedia of Philosophy, or if it's a math topic, you go to the Princeton Companion to Mathematics, where the difference there is the articles are deliberately written by one individual who tries to actually craft a motivation around it and everything. Whereas Wikipedia, it's this local minimum that's reached where basically every sentence has to be correct. And I think in a good exposition, you care a little bit less about correctness on the way, but you can deliberately craft things that are a little bit wrong that you correct along the way. That gets edited out in a crowdsourced environment. So LLM explanations feel to me at the moment a lot like Wikipedia, which is to say amazing, right? Imagine the world before Wikipedia, how long it would take to find and suss out everything. But nevertheless, what's the most useful part of a Wikipedia page? It's often just the references at the bottom. You look at the key references and you go to them and you read them. It's like actually sometimes that gives a much better overview of it. 
Often I like to just ask an LLM, who should I read? And maybe I can even give some specifics on ways I want to learn. I actually got gaslit by this once. I remember trying to learn about semiconductors or something, and I was like, this feels very visual, and this is all text. So I asked, is there any really good, well-visualized math video, or not math, sorry, a well-visualized video explaining the concepts you're getting at? And Claude was like, yeah, here's a couple. And the top one was, here's one from 3Blue1Brown. I'm like, I can guarantee that there's not. It was an actual video, an actual link, but it had just misattributed someone else's video to me. And it was good. I had a much better experience clicking over and watching that video to learn about the thing than trying to proceed forward with questions there. So in that sense, I'm basically using it like a very souped-up version of Google to zero in on the right human-written resource. What about you? You engage with these a lot. What's the best way to engage?
[79:31]
A
I think you put your finger on it. The most productive learning sessions I've had are when there's some artifact that a human has produced, whether it's an article, a book, or a video, that organizes the relevant concepts in the correct way and builds up the motivation for why the next idea would be relevant to solving the next problem you encounter, and the next idea, and the next idea, and then using the LLMs to just do a little bit of pruning around this branch that the book has identified. So I was going through, I think you might have recommended it, Steven Strogatz's textbook
[80:08]
B
on the chaos one.
[80:09]
A
Yeah.
[80:10]
B
Chaos and Nonlinear Dynamics. I love that book.
[80:12]
A
And so I was going through it and it was like bliss. It was like your videos in a book form.
[80:18]
B
He's so good.
[80:19]
A
It was super fun. The way I was learning it, I'd have on one third of the screen his university lecture, on one third of the screen that part of the textbook, and on one third of the screen an LLM. And I was actually thinking, if I was back in college and watching this lecture live, it would just totally go over my head. These kids must be really smart, because I'm pausing and reading the textbook and talking to LLMs and then restarting again. But with him curating: what is the right order to understand concepts? What is the right problem to motivate understanding a concept? Oh, also, another thing LLMs are really bad at, and a thing a really good human can do, is when you ask a question, they say, actually, you're just not really thinking about this topic the correct way. The question you want to be asking, the correct way to organize these concepts, is X. And LLMs just can't really do that.
[81:09]
B
Yeah, it's a little too placating. I mean, this is ultimately the sycophancy. It's very, "Oh, what an insightful question," that kind of thing. You want to strip that down. That's a good point. And I think that cuts to theory of mind a little bit: recognizing that to ask a certain kind of question reveals that the asker's mental structures are at least not the same as what the explainer has. And sometimes people do this to a fault. I think a really good teacher... Let's say you have a middle school math classroom or something. If a student asks a question that suggests they're thinking about it in a different way, it's actually really hard to take it seriously in the moment and ask, hang on, could you get to a right answer with that, before you say, oh, instead of that, let's do this. And the really good teachers are able to jujitsu the creative way that the student was thinking about it and bring it in. I mean, when LLMs don't reframe your question, they aren't doing that; instead, they kind of run off. But at the very least, it feels like there's three levels here. The LLM is at one, a good explainer is at another, and then the A explainer is the one who can jujitsu your way of thinking and say, oh, that's where that's useful. And so maybe there is a certain cycle all the way around where, again, five years from now, the LLMs will still be doing that, but in the better way.
[82:31]
A
What is your recommendation to students, who I'm sure email you this question all the time: look, I'm curious about doing mathematics, I'm really passionate about the subject, but seeing all the progress AIs are making, I don't know if it makes sense for me to pursue this as a career? And this is relevant not only to people in mathematics, but I'm sure to people who are noticing that their field is more and more getting productivity gains or whatever from AI. Coding is very adjacent to this. What advice do you have for people?
[83:06]
B
I wouldn't trust any advice that I give. That would maybe be how I'd couch it. But even pre-AI, if we're talking about a job, and not about being a gentleman scientist who wants to engage with the math world or something, it feels very important to really understand where the money's coming from, what value you're actually adding, and the connection between those two. And I think often a surprisingly small amount of thought is put towards that, especially by students. They're in this environment where they probably want to go into math because they've always been good at it, and they've just been rewarded in life for proceeding through the next hoop correctly, and the next step. And when they think they want to be a mathematician, it's because it's a version of getting to continue to engage with that. It's like, well, where do people get to do this? I'll go there. Rather than thinking, what value am I adding to other people, and to what extent is that the reason that salary is flowing in my direction? Because it's actually quite different in different cases. In some cases it's a very prestigious mathematician, and their presence at a university lends a certain brand value, and that's why the university wants them. In some cases, the NSF grant is given because of this belief we have that basic science is a public good, and you've got this institution around that, and there's going to be this whole bureaucracy trying to act as a proxy for what we think that public good is, and a whole song and dance around how to correctly make them predict that your progress will be in the spirit of that funding. Sometimes it's just straight-up teaching, right? People like to send their kids to an institute that has experts teaching them, and that's what you're doing. You are providing the brand value by being an expert and then the direct value by being a teacher.
So regardless of whether AIs are proving theorems or not, or whether we're talking in 2016 or 2026, that is a thing that not enough students thinking "I want to be a mathematician" think about, and I think it's worth thinking about. For me, I just wasn't necessarily thinking about it, and I kind of stumbled into this career path where basically math exploration can be monetized as entertainment. I'm very grateful that I did, but it was an accident. It wasn't this deliberate thing. And I think I could have avoided relying on serendipity and maybe done that a little bit more by design had I been thinking critically about it. So to your question: say it's the case that you have almost automated theorem proving, and let's say the AIs are also really good explainers, so even the human understanding is covered. Even then, I think a lot of the social role that mathematicians serve actually doesn't change that much, right? You still have a sense that, as a public, we sort of feel like there's value to basic science, and we're trusting in the judgment of mathematicians to determine where their time is best spent. And the prestige comes from within that community. It's other members saying that this was a really good result, more than it is the grant writer really understanding algebraic number theory well enough to see that it's a good result. And so there's going to be some inner culture of what constitutes valuable contributions. Maybe it shifts away from theorem proving and towards good definition writing. Maybe it's that museum curator idea. But you're going to have that same community. And as long as society as a whole is still valuing the premise of basic science, and if we're in the abundance world of what AI brings, probably there's more funding in that direction in some sense, right? And on the side of prestige to institutions for who their lecturers are,
I actually think teaching is one of the most stable post-AGI jobs that there is, because it's so relational. This is where parents want to spend their money if they have an abundance of wealth: on good teaching and good educating. And it goes so far beyond explanations. Even if LLMs are good explainers, the thing that a teacher is doing is such a social, coaching, mentor-type thing that it's probably one of the most stable careers that's going to exist over the next 50 years. And so insofar as a lot of mathematicians' role overlaps with that, you, as the prospective student going into it, could lean into that. Actually, I think a lot more students should think about and give credence to the idea of being just a math educator, and the value that that can serve towards the next generation. So I'll couch this again: I don't think I'm the one to say, here, prospective young mathematician, here's how you should think about the future. Because I'm a YouTuber, right? I'm someone who is not in the institution that they are thinking of going into, so I'm speaking as an outsider looking in. But it feels like generally good, universal advice: know where the money is coming from, and know where you plug into that. And if you're just asking those questions, you're actually already steps ahead of all of the other fledgling prospective mathematicians.
[87:51]
A
Yeah. And in fact, I think in the crazy world, the world where within 5 or 10 years the AIs are coming up with not only solutions to the Millennium Prize problems but totally novel problems to be solving in the first place, novel mathematical fields and objects and stuff, in that world, first of all, there's a ton of abundance. And second, the thing that AI minds will have gone furthest in, where they will have seen furthest beyond our horizons, will be mathematics. And there will be so much demand for: what have the AIs seen? Can you explain it to us? I feel like in that world, if there are any jobs whatsoever, surely distilling what the AIs have learned will be one of them.
[88:36]
B
Also, it's funny, because all of this sort of presumes that it's useless, right? We're not talking about the actual practical applications of the math that is being done. Insofar as there's any economic utility to it, you would imagine that the people who understand it and are able to make the decision of where it should point actually have a lot more economic value. Being able to make that judgment as curator and point this behemoth of new math in a useful direction is suddenly a much more levered move to make than it had been previously.
[89:07]
A
Can I actually ask you about that? Obviously one question for AI for math is not only can it do it, but is it any good, or is it good for anything? You were describing the way group theory arose: we were trying to figure out random facts about the roots of different kinds of functions, and now it has all these different applications that are practical across many different fields. Do you have some sense of whether, if we get to a place where the field of human mathematics is accelerated 10x or 100x, some crazy shit happens, or are we just going to be bottlenecked by other fields?
[89:50]
B
I think there's some fields that probably will. I mean, it's super spiky, right? Progress in algebraic number theory, it feels unlikely that that then unlocks something. But I don't know. I remember talking to this mathematician who does more dynamics and PDE-solving type stuff, and he was saying that his group had some ideas that, let me see if I summarize this right... The way that Boeing would make planes is they would build one, then do a bunch of tests, and then they had to disassemble and reassemble it based on those tests. And his group essentially had some insights on how to do more things in simulation, such that you don't have to deconstruct and rebuild it. And it saved Boeing billions of dollars or something, and then they just started funding that group. It's much more obviously application-adjacent, because PDEs just sort of are that. So progress in that domain, you would imagine, would actually unlock some things. I don't know if it's these step changes; maybe it's more on the side of engine design becoming just a little bit more fluid, or coming up with the right wing shape instead of running a whole bunch of complicated CFD. Or maybe you're able to speed up your CFD simulations because of certain pure math insights that make them more efficient. I bet you'd just see a lot of great incremental improvement there. It seems less likely that massive breakthroughs in math immediately turn into massive economic breakthroughs, like you solve the Navier-Stokes problem and then that unlocks an ability to simulate more things. But you probably will see, at those fringes, some meaningful leakage of the pure math insights into other things. Also, there's a ton of people working on things like AI, engineers, like physical engineers, materials science, and things like that.
You have to imagine that they would be in a good position to look at the AI math insights and decide if they're relevant in some way or not. So it's another one of these things where I'm not going to sit here and put a flag in the sand predicting that there will be. But it would be a little bit disappointing, and a little bit surprising, if over the next five years there weren't economically valuable improvements that were directly attributable to AI progress in math. It would be kind of disappointing if it was just taking down a bunch of Erdős problems and wasn't doing any of the math that directly touches the physical world.
[92:24]
A
Yeah. To your point, a lot of the history of mathematics is about building up these piles of concepts and connections and whatever, and sometimes the piles connect with each other, or you discover an application somewhere else. At the very least, you just build up this huge pile. And then as broader progress in society happens during the singularity, when we get to the industrial part of the singularity, you just have all these different ideas that hopefully are useful in other parts of the world.
[92:55]
B
Like I said, one of the interesting things about what's happening is that it causes people to step back and ask, what is math? And maybe one of the awkward conclusions will be the revelation that, oh man, over the last... it's just become wholly useless. The kinds of questions being asked have become so divorced from things that are physically applicable that that's one of the things mathematicians have to come to terms with, where everyone will look and be like, hang on a second, if there's 10x progress there, why aren't we seeing it over here? And then mathematicians are like, every time we wrote those grant proposals and said, trust us, the elliptic curve progress is going to help with cryptography... It shines a light on the fact that maybe it doesn't. So that's one possibility.
[93:36]
A
Grant, this is super fun. Thanks so much for doing it.
[93:38]
B
Absolutely, my pleasure.