Summary

OpenAI Podcast – Episode 20

How a Reasoning Model Cracked an 80-Year-Old Math Problem

Date: June 4, 2026
Host: Andrew Mayne
Guests: Alexander Wei, Hung Xing Wu, Li Jiechen – members of the OpenAI Reasoning Research Team

Episode Overview

This episode features an in-depth conversation with the OpenAI team behind a math breakthrough where a general-purpose reasoning model solved (disproved) the longstanding Erdos unit distance conjecture—an 80-year-old open problem in combinatorial geometry. The team delves into the evolution of reasoning models at OpenAI, the significance of the breakthrough, the verification and reactions from the mathematics community, and what this advance hints at for the future of AI-human collaboration in research.

Key Discussion Points and Insights

Background: AI and the Benchmark of Mathematical Reasoning

Reasoning as a Grand Challenge in AI
- High-school competitions like the International Math Olympiad (IMO) and the International Olympiad in Informatics (IOI) served as milestones for AI development ([01:28]).
- Alexander Wei notes, “For a long time these were sort of an implicit grand challenge in AI. When would we be able to get models that could perform as well as the best humans on these exams?” ([01:28] – Alexander Wei).
Progression from Immediate to Deliberate Reasoning
- Previous AI models responded instantly, “without thinking.” Introducing more test-time compute allowed models to reflect and try multiple approaches ([02:54]).
- “What inference time compute, test time compute does is you now… give the model a chance to think and improve its answer and try different things before having to finally output something.” ([02:54] – Alexander Wei).
Unexpected Speed of Progress
- Alexander Wei initially thought IMO gold might be achievable “by April.” It took until June to get a really good model, which then won gold at the IMO; looking back, he says this still happened “a lot faster than I expected” ([03:39]).

The Erdos Unit Distance Conjecture: Problem and Proof

Nature of the Problem
- The Erdos unit distance conjecture asks: Given N points in the plane, how many pairs can be exactly unit distance apart, and how does this number scale as N increases? ([06:42] – Alexander Wei).
- The prevailing assumption was that a square grid was optimal; the model disproved this ([07:45]).
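To make the counting question concrete, here is a minimal Python sketch of the square-grid baseline. It is a toy illustration only, not the model's construction (which the episode describes as relying on high-powered number theory). The function name `distance_counts` is hypothetical. The sketch tallies, for a k x k integer grid, how many pairs of points sit at each squared distance; since the grid can be rescaled, any one of those distances can play the role of the "unit".

```python
from collections import Counter
from itertools import combinations


def distance_counts(k):
    """Tally pairs of points in the k x k integer grid by squared distance.

    Hypothetical helper for illustration: brute force over all pairs,
    so it is only practical for small k.
    """
    points = [(x, y) for x in range(k) for y in range(k)]
    counts = Counter()
    for (x1, y1), (x2, y2) in combinations(points, 2):
        counts[(x1 - x2) ** 2 + (y1 - y2) ** 2] += 1
    return counts


if __name__ == "__main__":
    counts = distance_counts(10)  # 100 points
    # Pairs of adjacent grid points (distance 1).
    print("squared distance 1:", counts[1])
    # Pairs at distance sqrt(5), e.g. offsets (1, 2) and (2, 1).
    print("squared distance 5:", counts[5])
```

On a 10 x 10 grid, distance sqrt(5) already occurs between more pairs than distance 1, because 5 can be written as a sum of two squares in more ways. Choosing such a distance and rescaling is the idea behind the classical grid construction that the conjecture treated as essentially optimal.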
How the Model Cracked the Problem
- To test the upper bound of the model's capability, the team ran it on a selected subset of Erdos problems; the unit distance conjecture was one of them ([08:19] – Li Jiechen).
- “We both saw some crack solutions. It was really, really exciting for us.” ([08:46] – Hung Xing Wu).
Verification and Community Response
- The proof was first validated by the model itself, then skeptically reviewed by expert mathematicians in-house. Initial disbelief (“there’s no way this can be true”), then reluctant acceptance after scrutiny ([09:02–09:39]).
- “After just they think about it for a day, they couldn't figure out any mistake. Then they became more convinced... everyone had a hard time sleeping because it's so exciting.” ([09:07–09:39] – Hung Xing Wu).

The Model’s Capabilities and Approach

General-Purpose Reasoning Model
- Not a math-specific model; it’s general-purpose and can “code, look at websites, find information” ([07:45], [17:28]).
- “This is something that can be published in the best journal of math. It is way beyond IMO level.” ([10:46] – Hung Xing Wu).
Model’s Methodology and Grounding
- The model checked definitions (“looked up the word unit in the Cambridge dictionary”) to ensure task comprehension ([17:46–18:03]).
- Frequently restated definitions to validate its understanding ([18:14]).
Proof Creativity and Human Readability
- Applied class field theory, a branch of algebraic number theory, to combinatorial geometry, an unexpected and creative crossover ([16:01] – Alexander Wei).
- “Being able to make the connection requires quite a bit of insight and creativity… to execute the proof is also a very delicate, careful affair.” ([16:01] – Alexander Wei)
- The model’s full solution spanned 125 pages of reasoning steps, some surprisingly inventive even if not all led to the final answer ([29:22]).

Meaning and Reactions in the Broader Community

Community Excitement and Application
- Announcement triggered researchers to send the team their own open problems ([12:13] – Li Jiechen).
- Within a week, human mathematicians used concepts from the AI’s proof to solve further open problems, demonstrating AI-accelerated breakthroughs ([18:38], [35:40]).
Empowerment, Not Intimidation
- “I think it should not be intimidating. I think it should be empowering.” ([18:34] – Li Jiechen).
- Human intuition and long-term theorizing still essential: “Currently, it seems AI cannot build a new theory for math… but with the help of AI, [humans] can just grab all the ideas from distant fields.” ([19:18] – Hung Xing Wu).
Advice for Researchers
- Get a GPT Pro subscription, ask the boldest questions possible, don’t be overly conservative ([22:36] – Li Jiechen).
- Allow AI to surprise you—human decomposition of problems can sometimes limit the solution space ([23:07]).

The Future: Implications for Science and Mathematics

A Virtuous Cycle in Model Improvement
- More test-time compute, better reasoning, more discoveries—paving the way for self-accelerating scientific progress ([32:44]).
- The acceleration seen in coding (with Codex) predicted to soon be mirrored in theoretical research ([26:21]).
Role in Cryptography and Quantum Computing
- Potential for AI to stress-test or validate cryptographic foundations—finding proofs or loopholes ([37:45] – Hung Xing Wu).
- While distinct, AI can accelerate the research and error-correction techniques in quantum computing ([39:21] – Li Jiechen).
Limits and Challenges
- For the hardest mathematical problems (e.g., Collatz conjecture, P vs NP), full solutions may remain distant—forming the next frontier ([27:35]).
- Novel branches of mathematics or theory generation by AI “is still very, very open” and further off ([30:00] – Li Jiechen).

Notable Quotes & Memorable Moments

Surprise and Optimism

“Maybe this is one in a hundred times where it’s too good to be true, but it’s actually true.” ([00:30], [40:25] – Alexander Wei)
“It feels a bit natural that this model would do something amazing… It’s like living your dream.” ([09:44] – Li Jiechen)
“When Codex becomes much better, we can do so much more for you. You would expect you'll work less…but somehow you actually work more because there's way more thing you can do.” ([26:21] – Hung Xing Wu)
“You could be an astronomer and not use a telescope, but you kind of have to ask why.” ([34:23] – Andrew Mayne)

Collaboration

“I think the dream world will be everyone have some access to the top-level reasoning ability…OpenAI will accelerate science a lot because you are empowering every scientist to accelerate science worldwide.” ([21:14] – Hung Xing Wu)

Empowerment

“Model can make good breakthrough on some very hard questions we don’t know how to solve, but then how to digest that idea, how to use that method for other good things. I think human still has a role in this.” ([18:38] – Li Jiechen)

Important Timestamps

00:18 – "Everyone had a hard time sleeping because it's so, so exciting." – Hung Xing Wu
01:28 – Context for IMO and IOI as AI benchmarks – Alexander Wei
02:54 – How test time compute changed model reasoning – Alexander Wei
06:42 – The Erdos conjecture problem and its significance – Alexander Wei
08:46 – Discovery moment: “We both saw some crack solutions.” – Hung Xing Wu
09:39–10:46 – Verification and excitement within the team and community
16:01 – On the creativity and execution of the model’s proof – Alexander Wei
18:34 – On AI as empowerment, not intimidation – Li Jiechen
21:14 – Vision for democratized scientific discovery – Hung Xing Wu
23:07 – Why letting AI directly tackle big questions can be more effective – Li Jiechen
26:21 – “Somehow you actually work more because there’s way more thing you can do.” – Hung Xing Wu
32:44 – Virtuous cycle of scaling models for scientific discovery
35:40 – Human mathematicians rapidly building on AI’s discoveries – Alexander Wei
37:45 – Models and the foundation of cryptography – Hung Xing Wu
39:21 – AI accelerating quantum computing advances – Li Jiechen
40:25 – “One thing you learn very quickly as a researcher is that if your results are too good to be true, you probably have a bug somewhere...but maybe this is that one in 100 times.” – Alexander Wei

Conclusion and Look Ahead

The episode paints a picture of rapid, exhilarating progress in AI-powered mathematical reasoning, underscored by the historic disproof of an 80-year-old conjecture. While the potential is vast, the team emphasizes collaboration, empowerment, and the ongoing importance of human insight—in mathematics, research, and beyond. The next frontiers will be theory-building, broader scientific disciplines, and making these tools accessible to accelerate discovery worldwide.


Transcript

[00:00]
Andrew Main
Hello, I'm Andrew Main and welcome to the OpenAI podcast. On today's episode, we're speaking with Alexander Wei, Hung Xing Wu and Li Jiechen from the reasoning research team behind a recent math breakthrough from an OpenAI model. They'll tell us the story behind the discovery and what stood out to them about the reaction.
[00:18]
Hung Xing Wu
Everyone had a hard time sleeping because it's so, so exciting.
[00:23]
Li Jiechen
Okay, this model is something that's really amazing.
[00:25]
Hung Xing Wu
I mean, this is something that can be published in the best journal of math.
[00:30]
Alexander Wei
Maybe this is one in a hundred times where it's too good to be true, but it's actually true.
[00:38]
Andrew Main
Lijie, tell me what you work on.
[00:40]
Hung Xing Wu
Oh, I work on reasoning with Alex.
[00:42]
Andrew Main
Okay, how did you find your way into reasoning?
[00:45]
Hung Xing Wu
Last summer, Alex had his breakthrough in IOI and IMO. I used to be a participant in IOI, and then I was like, oh, that's crazy. Model can already win medals, gold medals. At that time I was an assistant professor at UC Berkeley. But then I'm thinking, like, maybe I should try to rethink my career. And it seems like making the model smarter will maybe have some bigger impact on the world. And then I just kind of had a conversation with Alex back in last October, and then I got super excited about this thing and eventually I just joined OpenAI.
[01:24]
Andrew Main
We hear IOI and IMO come up a lot. Alex, you want to unpack those for everybody?
[01:29]
Alexander Wei
So IMO and IOI are these two competitions for high schoolers. They stand for the International Math Olympiad and International Olympiad of Informatics, respectively. And these are just devilishly hard math problems. You get two sessions for each of these exams that are like four and a half to five hours and you just have to do three problems. And so for a long time these were sort of an implicit grand challenge in AI. When would we be able to get models that could perform as well as the best humans on these exams?
[02:06]
Andrew Main
That was a pretty interesting starting point, I think, for measuring the success of the model. And we're here to talk about how far things have gone since then, which is pretty incredible. But how did you find your way into reasoning?
[02:16]
Alexander Wei
So I did my PhD in ML and towards the end of my PhD I got excited about this idea of spending more compute at inference time to solve harder and harder reasoning problems. At the time I was playing with GPT 3.5 Turbo in the API and I didn't really get any interesting results. But there was this team at OpenAI that seemed to be doing something pretty similar. And so I got super excited about it and was lucky enough to be able to join.
[02:49]
Andrew Main
So probably the simplest way to describe that is, like, inference-time compute is basically letting the model think longer about it.
[02:55]
Alexander Wei
Yes, that's right. So basically before this era of test time compute, models sort of answered immediately, right off the cuff, without thinking. And what inference time compute, test time compute does is you now give the model a chance to think and improve its answer and try different things before having to finally output something. That obviously just helps make the model smarter, lets it do things that it wouldn't otherwise be able to do instantly.
[03:26]
Andrew Main
When you started to work on reasoning, did you have an idea of where you wanted to see this go? Like what your expectations were? Were you looking at it purely from, hey, this is very cool from an academic point of view or did you have some sort of other vision?
[03:39]
Alexander Wei
I think for me, the draw of reasoning when I first got excited about it was that this was something that models just obviously can't do right now. So this was like end of 2023, start of 2024, models were struggling with grade school math problems. And so at that time it was just like, can we just get these models to do something reasonable on math at all, let alone have them be much, much better than I am at it. I remember my first day at OpenAI, Noam Brown asked me when I thought models would get IMO gold. That was just a benchmark we talked about. I think at the time a lot of people, even within research, thought that IMO gold was out of reach this year, but maybe 2026. I felt like I had an idea that if we just pushed for it, maybe we could do it by April. It took until June to get a really good model. And then IMO rolled around and we were able to get gold. And I think zooming out, this happened a lot faster than I expected. And it's crazy to me that progress since then has kept up at this same sort of blistering pace. It was just 10 months ago, but it feels like the IMO level of problem is far in the rear view mirror of AI today.
[05:18]
Hung Xing Wu
Noam Brown asked me the same question. I mean, not about IMO gold, but about whether model can solve P versus NP. I think P versus NP might be something quite hard, because I think for solving P vs NP you would need to build a new theory. Maybe you have to write many books of new ideas to get there. So currently it seems we are still far from that, but maybe who knows what will happen in the future.
[05:47]
Andrew Main
So yeah, what do you work on?
[05:49]
Li Jiechen
Oh, I was working on theoretical computer science. I was collaborating a lot with Lijie in my PhD. I was at Berkeley, and I remember when o1 came out I was talking to my advisor, saying, oh, there's no barrier in model solving math problems anymore. I think he just smiled, and he knew that he was gonna lose a student.
[06:14]
Andrew Main
Oh wow. So let's talk a bit about that because it's an interesting point because as you said it went from the model would just have a moment to try to figure out the answer, then all of a sudden you've given it the ability to spend longer and to think about it, you know, reasoning and the results have come pretty quickly and I think surprising a lot of people. You had a model that was able to basically disprove one of the Erdos conjectures. Could you explain that just a little bit?
[06:42]
Alexander Wei
Yeah. So our models, last week they were able to produce a proof, or a disproof rather, of the unit distance conjecture due to Erdos. And this was an 80 year old open problem in the field of combinatorial geometry where basically the question concerns: if you have N points, let's say on a piece of paper, how many pairs of them can be exactly one inch apart? And how does this number grow asymptotically with the number of points on the piece of paper?
[07:23]
Andrew Main
This wasn't a trivial problem when Erdos put this together. The idea was to say that it could, I think ideally it had to be only done on a plane or something like this, but there was the idea that maybe there was no better way. And this has been out there because it's a very interesting problem and the fact that a model solved this is pretty profound. And also this model was a general purpose model, correct?
[07:45]
Alexander Wei
Yes, that's right. So Erdos's original conjecture was essentially that the optimal way to have as many pairs of points at distance 1 on the plane was to arrange them in a square grid. And what the model proved was that the square grid was not actually close to optimal at all and that you can do much better with a different construction using a lot of high powered number theory.
[08:17]
Andrew Main
How did you choose these problems?
[08:19]
Li Jiechen
I guess we didn't really choose the problem. What happened was we wanted to test the upper bound of our model's capability. So we just used a selected subset of Erdos problems to test the capability of the model.
[08:33]
Andrew Main
I would love to know: who is the one that hit enter and asked the model the question?
[08:39]
Hung Xing Wu
I guess both of us like and
[08:41]
Andrew Main
Hongxing, you guys at the same time, like, pressed?
[08:45]
Hung Xing Wu
Yeah, maybe.
[08:46]
Li Jiechen
I think what happened was actually we were testing like two slightly different internal models and we both saw some crack solutions. It was really, really exciting for us.
[08:58]
Andrew Main
How did you know that it worked?
[09:00]
Hung Xing Wu
Of course you first asked the model to check it.
[09:02]
Andrew Main
Okay.
[09:02]
Hung Xing Wu
But of course, you know, model, sometimes they are not reliable.
[09:05]
Andrew Main
I got it. It's good. Don't worry about it.
[09:07]
Hung Xing Wu
Yeah. So then, after we check it with the model, it seems plausible. Then we just ask a bunch of our mathematician friends in the company, you know, Mehtaab and Mark Sellke, and at first they were like, oh, there's no way this can be true. It's a major open problem. But after they think about it for a day, they couldn't figure out any mistake. Then they become more convinced. Then eventually they're like, actually this may be correct. Then everyone had a hard time sleeping because it's so exciting.
[09:39]
Andrew Main
What was the conversation like when you started getting, you know, people saying that this was accurate?
[09:45]
Li Jiechen
For me, I was not that surprised, because what happened was first Mehtaab said, this is definitely wrong. But I actually knew that he probably just spent like 5 minutes, 10 minutes looking at it. So in my heart, I don't really believe that. But later he told me it's 50%. I was thinking, okay, if we extrapolate the trend, maybe next night it will be 8, 100%. So, yeah, I guess it's a little bit dreamlike, but it also feels a bit natural that this model would do something amazing. Later it just become more and more real that this might actually be correct. This might actually be a big deal. The first time a model can publish something that would get into top math journals. We knew this day is going to come, but never knew that it's going to become reality so fast. It's like living your dream.
[10:47]
Hung Xing Wu
I mean, this is something that can be published in the best journal of math. It is way beyond, like, IMO level. So I only expected something like this to happen at some point, but maybe not just this May. Yeah.
[11:05]
Andrew Main
One of the things I think that we've seen emphasized at OpenAI is that OpenAI doesn't try to train to specific benchmarks and stuff, that OpenAI tries to build really good general overall models. And I think sometimes people say like, well, we just try to build a generally smart model and we find these things along the way. And when it comes to reasoning it's the same thing. With something that's really good at reasoning overall, you find these capabilities. Does that ring true for you or.
[11:28]
Alexander Wei
Yeah, I think for this model in particular, it's one that I think all of us have also just used in lieu of the current model in Codex, and it works quite well as just a general purpose model, while having the capabilities to do this unit distance result. I think people will be able to do this at home in the near future.
[11:57]
Andrew Main
It's been exciting to see people react to this and pay attention to this. We went from just a very short period of time ago where people said models weren't good at math and now models are doing this. What have been some of the more fun things you've seen online or reactions from people?
[12:13]
Li Jiechen
Ever since we announced the results, my friends in TCS started to ask me to try their open problems including my advisor gave me like two, three open problems to try on. I think the reaction was very positive.
[12:30]
Alexander Wei
I think people really get a sense that the frontier of AI today can really come up with research output that I think many human mathematicians would be proud to achieve. And I think it's really great that we're able to communicate this, that this is the frontier of progress to the rest of the world. I've seen people make these designs of trying to sketch out the model's construction and if you plot it on a grid it's actually this very pretty symmetric geometric design.
[13:07]
Hung Xing Wu
Yeah, I guess we are thinking maybe try to make one of the design, put them in a frame and put them on a desk or something to celebrate this moment.
[13:19]
Andrew Main
Yeah, I think it's going to be fun when we start seeing it. Things like tiling problems and other stuff where we can actually just look at the artifacts that we need. So we've been hearing more about Erdos problems lately and some seemed like they weren't as challenging to solve as perhaps as people thought. They just needed some attention. Yet this one seems to be a little bit more complicated. Where would you rank this Erdos?
[13:41]
Hung Xing Wu
I think he proposed like 1,000 questions or more. Right. So the Erdos problems are just a collection of all the problems he has asked. For some problems he offered some money for a solution; some problems he just noted. And for this problem he offered, I think, $500, which is from last century. So it was a little bit. And also this is one of the central questions in this field of discrete geometry, and it has been heavily discussed by mathematicians in many discrete geometry papers. And so it's kind of one of the questions people have thought about a lot and really want to see the answer. So I would say this is more like a major open problem in a concrete field of mathematics, instead of, like many other questions, something he may have just asked after lunch or something.
[14:40]
Andrew Main
So how do you collect that $500? Did it disappear when he passed away?
[14:46]
Hung Xing Wu
I think there's a special agency for that. But you usually just frame the check.
[14:51]
Andrew Main
Yeah.
[14:52]
Hung Xing Wu
So maybe we'll just frame the check in Sam's office. I don't know.
[14:56]
Andrew Main
How do you feel this proves that reasoning is effective?
[15:01]
Hung Xing Wu
I think the biggest proof is that if you look at the plot in the official blog, if you give model more time to think, the accuracy on its problem grows faster. Like if you give it a lot of time, it can get almost 50% correct. So more thinking, more correctness. I think that's really a proof of reasoning being effective.
[15:20]
Andrew Main
But Alex, to go back to this, this isn't a math model. This is a model that can do a lot of many different things. Do you see a correlation between, as these get better at solving things like mathematics, that it works with other general problems?
[15:33]
Alexander Wei
That's the hypothesis at least is that this model was not trained specifically for math. And we just wanted to, we had this new model how we came about this, we wanted to take it on a test drive essentially. And so we evaluated it on some very challenging math problems and to just see what can it do.
[15:56]
Andrew Main
When you go through the proof and you look at what it came up with, were there things that surprised you, things that you would describe as creative?
[16:02]
Alexander Wei
So for some context, like, the proof is well above my own mathematical pay grade, but just at a high level, my understanding was that this idea of taking class field theory and applying it to problems in combinatorial geometry hadn't really been done before, though some people knew that there could be this bridge between these two fields. Being able to do that and execute it, first of all, to make the connection requires quite a bit of insight and creativity. And then to execute the proof is also a very delicate, careful affair that very few people would be able to do.
[16:48]
Li Jiechen
I think the most surprising thing for me is you tell model to do something and you went to have a lunch and when you come back you'll see that it actually does much better than you thought. And at that moment you feel like, okay, this model is something that's really amazing.
[17:06]
Andrew Main
So going Back to GPT 3.5 Turbo and working with that and looking at a model that was doing automatically instant sort of inference and figuring these things out to now a model that's able to do incredible mathematical proof. Is it using tools, is it using Lean, is it using some other things like that? Or is this doing purely inside the model?
[17:28]
Hung Xing Wu
For this particular case, the model basically is like codex. It can code, it can look at the website and find information. Yeah, so it's Basically a general ChatGPT setup. ChatGPT can also write Python and execute them. But I don't think the model writes anything.
[17:47]
Li Jiechen
I think Lijie has a story about the Cambridge dictionary.
[17:50]
Hung Xing Wu
Oh, okay. So, okay, the first thing the model do when it gets to the website is to check what unit means in the Cambridge dictionary. It's a little bit ridiculous. Yeah.
[18:02]
Andrew Main
So it like looked up the word unit.
[18:04]
Hung Xing Wu
Yeah. You also make sure it has the absolute correct understanding of what is unit.
[18:09]
Andrew Main
Have you seen it do other things like that where you're saying like, oh, it's trying to ground itself to make sure it understands the question.
[18:15]
Li Jiechen
And definitely a lot of time in the model answer, it will actually explain the definition again to show that it actually grounded the definitions.
[18:25]
Andrew Main
As people who are very knowledgeable about computer science, people who know a lot about mathematics, is it intimidating to all of you to see this happen?
[18:35]
Li Jiechen
I think it should not be intimidating. I think it should be empowering.
[18:38]
Andrew Main
Okay.
[18:39]
Li Jiechen
After the proof actually came out, mathematicians have improved on it. First, they improved the bound. And second, they used the intuition and motivation of the construction to knock down other open problems as well. So I think the trend is going to continue. Model can make good breakthrough on some very hard questions we don't know how to solve, but then how to digest that idea, how to use that method for other good things. I think human still has a role in this.
[19:14]
Andrew Main
So what do you think the role of somebody working in mathematics is going to be like five years from now?
[19:18]
Hung Xing Wu
I think there will be a lot of AI and human collaboration, because AI now, they know a lot, they can connect distant ideas. But human can also think for longer. Currently, it seems AI cannot build a new theory for math, for example. But I guess human, once they have the help of AI, they can just grab all the ideas from distant fields of math. I think they can empower human way more.
[19:44]
Andrew Main
Do you see this working into other fields? Are we going to see discoveries in physics?
[19:48]
Alexander Wei
So I can't speak for physics, but I guess we're all researchers in AI, and I think definitely for me my day to day looks kind of completely different than when I first started doing research in this field. I think so much of my work is now done by coding agents. I can just do so much more, and I think that's been a sort of magical feeling, that with AI you're really starting now to feel like you can use AI to build AI faster.
[20:27]
Andrew Main
How much has AI changed the way you do these sorts of things?
[20:29]
Li Jiechen
I think change it completely. Even when I just joined half a year ago, I was hand coding the code, looking up the Slack channels for directions. But now the default is just ask Codex. I ask Codex to do a lot of things and then I just go to lunch, I just go to talk to people. The world completely changes.
[20:56]
Andrew Main
And now you use Codex on your phone and you can check on it. Yeah, it's interesting how much more I want to do things now that you have this sort of tool that can work all the time and do stuff. Lijie, how do you explain this to your friends who are sort of trying to understand what this means and how it's going to impact other fields?
[21:14]
Hung Xing Wu
So I mean I have some mathematician friends and I have some friends in other fields. So I think the way I want to tell them is that, I feel like some may be afraid that AI will replace them, AI will just replace mathematicians. But I think it's really about empowering every theoretical researcher, because AI really has this advantage of knowing so much stuff and connecting things. Currently it seems like the problems hard for humans may not be hard for AI, and that's a really great thing. We can use AI to solve those problems, get new ideas, and then we can digest them and make new discoveries, just like Hongxing said. So I think some of them get very excited about this. And of course one thing is that currently it is only in math, but I believe that because it's a general reasoning model, at least theoretical researchers can benefit a lot from that. I think the dream world will be everyone have some access to the top level reasoning ability, so other researchers can use them to discover whatever they want to discover. And then basically OpenAI will accelerate science a lot, because you are empowering every scientist to accelerate the science worldwide. I think that's our mission.
[22:31]
Andrew Main
So if I was a researcher, how would I get started? What advice would you have to say? Okay, try this first. We'll start with you, Hongxing.
[22:37]
Li Jiechen
Get GPT Pro subscription. It's really, really much better than thinking without Pro, probably, because it thinks longer. And try to ask the boldest question you can ask. I had experience that sometimes I try to decompose a problem into smaller problems and ask the model, and turns out that it was not as good as just directly asking the question, because my decomposition was not the best way.
[23:06]
Andrew Main
Why do you think that was?
[23:07]
Li Jiechen
I think because as human we have all kinds of priors on how problems should be solved and they are very helpful in reducing the thinking time. But very often the prior are wrong and there are blind spots. And AI models, they sometimes just can surprise us with discovering these hidden things.
[23:29]
Andrew Main
When I spoke to Alex Lupsasca, he talked about treating it like a graduate student, not talking down too low but not too high, but at the right level, so you could just understand that it knew the terms that worked for you. Alex, how about for you? What advice would you give somebody who's a researcher who wants to try to figure out how to be more effective with this?
[23:47]
Alexander Wei
Yeah, I think a lot of it is actually, these days, learning to trust the model and figuring out how far you can go in trusting the model, and also learning what's beyond what the model can do. Because if you don't have a sense of that, you don't maximally use the full capabilities of the model. I think Lijie's taught me a lot about how to use these tools better. I feel like I'm sort of a dinosaur in some respects in terms of adoption, because I think I started working at OpenAI well before these tools existed. And so I think I have a lot of old bad habits where I don't trust the models enough. I still think it's like the models of six months ago or something.
[24:31]
Andrew Mayne
That's an interesting paradigm. Okay, so Lijie, what advice would you give?
[24:36]
Hung Xing Wu
Oh, I have this method: every month you double your trust in the model and see when it fails. And if it fails, you just go back. If you do this every month, then you can quickly get to the point where you can maximally trust the model, but also not break your stuff. And apparently for the last five months it's been growing really, you know, exponentially.
[25:02]
Andrew Mayne
Back in the GPT-3 days I had a list of tests and things like this I would do, and I'd watch them sort of incrementally get better, then GPT-4. And then by the time o1 came out, I had to throw it out because those were just toy problems at that point. And I feel like I have to continuously adjust and keep trying bigger and more complicated things to do that with. Do you think that for somebody who's in mathematics or in a related field right now who's feeling a little bit concerned by this, do you think that they should be taking a more optimistic approach?
[25:32]
Li Jiechen
I think it's legit to feel concerned, especially when a lot of the field is problem-solving oriented, because models are going to be really good at problem solving. But mathematics is really much more than problem solving. It's more about understanding the structure and building new theories, like Lijie said. And I think we should try to figure out how to better use the model to help us in solving the problems that we meet, and then try to accelerate the speed at which we build new theory and come up with new understandings. I think that's the more optimistic view.
[26:22]
Hung Xing Wu
When Codex becomes much better, it can do so many more things for you. You would expect you will work less because Codex is good, but somehow you actually work more because there are way more things you can do. So I actually hope this can happen for math as well. The model becomes so good. Imagine you have 10 ideas. You can ask 10 models to try them and see one of them succeed. And mathematicians don't have to do tedious calculations by themselves. So I would imagine maybe what happened with coding can happen to mathematicians.
[26:55]
Andrew Mayne
It's interesting too because when we talk about the Erdos problems, Paul Erdos was a very interesting person who found a lot of things curious and said, oh, this is neat. And we have this category of problems he put together, but there's not a lot of rhyme or reason to them. They were just things that he found curious or worked with other people on. I think that's a big thing that's neat about science in general: often we think that there are these real specific hierarchies, but it literally can just be things we're curious about. That being said, how long before there are no more unsolved Erdos problems?
[27:27]
Hung Xing Wu
Some of them are very, very hard. So yeah, I don't know.
[27:31]
Andrew Mayne
Do you foresee us maybe, Alex, needing to come up with a new category of problems?
[27:35]
Alexander Wei
I think probably the hardest problems on that list, I think that list includes the Collatz conjecture. These are problems that feel very, very far out of reach of the mathematical technology of today, even though many of them are quite simple to state.
[27:50]
Andrew Mayne
So we'll still have some more things to work on and continuously move things through. That's good to know. It's exciting though too to start to think about what happens when you do start applying this to other areas in physics and astronomy and start looking at data sets and stuff, and what kind of discoveries are going to be in store. Do you have any particular area that you're hoping to see?
[28:10]
Li Jiechen
Oh, I hope we just solve P vs NP.
[28:14]
Andrew Mayne
How about you, Alex?
[28:15]
Alexander Wei
I think the next milestone in my head is really AI that can do AI research. I think there are so many unsolved problems here. We are in many ways limited by all the limitations of just our own intelligences. I'm optimistic about just having AI broadly available as a technology, because there's just so much more demand for intelligence in the world than humans can supply.
[28:49]
Hung Xing Wu
Oh, I wanted to say P vs NP too, but Hongxing said it. So I guess beyond that, one concrete thing I'm very interested in is that currently it seems AI is trying to combine ideas from different fields, and of course in a very novel and sophisticated way. But can AI actually generate completely new ideas from scratch? I mean, that's something we haven't really seen concretely in AI, and that's something I maybe want to see happening next. And that can be very cool.
[29:21]
Andrew Mayne
Have you seen traces of that yet?
[29:23]
Hung Xing Wu
I think so, even in this Erdos problem. I think if you look at the chain of thought, which is 125 pages, some of the thoughts are pretty creative, although they didn't work out. I mean, the final idea is more like combining all the stuff, but some of it has some creative thoughts.
[29:42]
Andrew Mayne
But it is interesting. Early on, arguments were like these models weren't creative, but you could give it two ideas that had never been connected before and say what's the relationship? And that would be something very, very new. It felt like something different. And I feel like we'll probably be seeing more of that. Do you see us coming up with new forms of mathematics?
[30:01]
Li Jiechen
I think that actually will be further down the line, beyond next year maybe. I think because models now are very, very good at coming up with some idea to solve a problem. But they're not good at proposing a completely new, different kind of math or proposing new theory. How to get models to do that is still very, very open.
[30:26]
Alexander Wei
How I would think about it is we see this, you know, Moore's law for the time horizon at which these models are effective. And I think you sort of feel that in math, where every few months the amount of time these models can work independently doubles, at least the amount of human-equivalent time. And so for solving problems, if you're really, really good at it, maybe some problems actually have pretty short paths to the solution, and you don't need to take that long. But I think inventing new ways of doing mathematics is much more of a years- or decades-long process. And so I think it'll still take a bit of time for that exponential to get there.
[31:18]
Andrew Mayne
This was done by an internal model that you guys worked on and since then 5.5 has been able to do the same thing. And we've seen other labs have said that they've been able to do this as well, but this was several weeks ago, which is now ancient history. What have we seen since then?
[31:31]
Alexander Wei
I think one difference between the original result and what the follow-up findings have been is that for the original model there was no scaffolding needed. You just asked it to do the problem and then it gave you the answer. You can read the original prompt and response in the note we uploaded on the blog. Whereas I think the follow-up efforts have had a little bit more structure or steering of the models. But I think one interesting data point here is that it's really all about test-time compute scaling. After we initially solved the problem, and this is the plot Lijie brought up earlier, with enough test-time compute budget the model is able to solve the problem around 50% of the time. So it's not surprising that you can get there with other methods as well. But I think what's really important here is just that as you pour in more test-time compute, you get better results.
[32:44]
Andrew Mayne
It seems like it's kind of a virtuous cycle where you take today's model, give it more compute, let it solve for that and you understand how these things can be solved. Next generation models can learn from that and just get more and more efficient and you just have this. Basically it seems like it just scales forever, right? What do you think we're going to see by the end of the year?
[33:03]
Hung Xing Wu
What I want to see is people using our model to discover lots of new stuff, and not only in math, but also in all of science. Of course OpenAI wants to do some cool math stuff, but I think it would be better if everyone can use the model to discover their own science. And I would expect many mathematicians will use the model. I mean, maybe not rely completely on the model, but collaborate with the model to discover a lot more math results. I think that'd be really cool.
[33:32]
Andrew Mayne
I've talked to some mathematicians who are, or who know others who are, very reticent to even try using AI in mathematics. What is the best argument you can give?
[33:43]
Li Jiechen
I think I'll just show them the disproof of the conjecture. I think it's just about productivity. We do math not just to enjoy the pleasure of problem solving, but also to advance the field and to understand the truth that we're looking for. And using AI is going to speed that up by a lot, and it's going to tell us what we are really struggling to find. And it will be hard to resist using AI at some point.
[34:24]
Andrew Mayne
You could be an astronomer and not use a telescope, but you kind of have to ask why.
[34:29]
Li Jiechen
Yeah, exactly.
[34:30]
Andrew Mayne
I know one of the researchers here likes to watch computers play chess against each other, and he feels like he sometimes learns things from that. Do you think that we'll learn to be better mathematicians or researchers or scientists or just thinkers in general by watching the solutions the models come up with?
[34:46]
Li Jiechen
Looking at 125 pages of thinking is probably not very helpful for a mathematician. But just by looking at the answer, you actually do learn some idea that you didn't know before, and that inspired the later mathematical works that knocked down other problems. So I definitely think mathematicians learn something from AI solutions. Some of the mathematicians that we asked to review the proof, together with collaborators, actually used the idea to disprove the sum-product conjecture, but for real numbers. I think that's one very good example. AI can crack important questions and give us ideas that we can apply elsewhere.
[35:40]
Alexander Wei
Yeah. I think it's remarkable that this group of mathematicians has already, just in the span of a week, already used it to disprove this result that I think is maybe of similar importance to the unit distance conjecture. So I think this is a wonderful example of mathematicians seeing this and using it as inspiration and bringing the ideas to bear on a different problem.
[36:10]
Andrew Mayne
What does this mean for the mathematical community?
[36:12]
Alexander Wei
I think for us, when we do these experiments, we want to make sure we empower the academic communities we interact with, where we don't just go to some community and from the outside try to solve a bunch of their problems and give them a bunch of AI slop. What we really want to do is make these tools available to researchers and let them direct all this AI test-time compute at the problems they think are important. I think it really shouldn't be viewed as a race to solve as many Erdos problems as we can, but more like we want to make people aware that the technology is out there, and this is what it can do.
[37:08]
Andrew Mayne
You're not trying to solve every Erdos problem.
[37:10]
Alexander Wei
Yeah, I would not see that as our goal. I think this just happened to be a particularly significant result that we thought would be important to share with the world, to show that this is the capability level of models today. But it's really not the goal to just go through the list as if it were a race.
[37:31]
Andrew Mayne
Do you foresee things applying to cryptography? And there's also some debate too about do these models get so good that we kind of surpass even where quantum computing goes? Which sounds kind of crazy.
[37:45]
Hung Xing Wu
Yeah, I think cryptography is really an important topic these days, because the foundation of cryptography is really about some problems, like factoring, being hard to solve by computers. But basically we only have conjectures; there's no mathematical proof of this fact. And suppose the model gets really good at algorithms. Maybe it will prove some of the cryptography conjectures, saying, okay, those protocols are actually secure, we don't have to conjecture them to be secure. Or maybe it'll find some loophole. And that's also very important. I think we need to make sure the foundation of our security is good, so the model can stress test the foundation of cryptography to make sure we have better security.
[38:34]
Andrew Mayne
What about quantum computing?
[38:37]
Hung Xing Wu
I think that's a very different territory. Right, like quantum computing. Okay, actually I used to study quantum computing. My first paper is on quantum advantage, which shows that for some tasks quantum computers can do better than classical computers. But so far I think the models, I mean, they are just classical computers. They do what humans can do, maybe a bit better. But quantum computers can sometimes do more fancy stuff, like simulating some quantum effects in chemistry, which we probably cannot. Okay, I'm not an expert on that, so it's unclear. It is just two different paradigms. So I'm not super sure how they compare to each other.
[39:21]
Li Jiechen
But I think AI is going to greatly accelerate the pace at which we develop quantum computers. Just in recent years there's been improvement in error correction. You have quantum error-correcting codes that only use simpler types of operations, and that really speeds up the physical implementation. So I expect more of these to come from collaboration with AI: AI can propose new quantum error correction algorithms, and then we can develop quantum computers much faster.
[39:58]
Hung Xing Wu
Once you ask the model to solve a question, you can of course follow up with, how did you solve it? Can you explain this part of the proof to me? And then the model will patiently try to teach you how everything goes, line by line. So it's actually not just one-shot problem solving. You can ask a follow-up question to learn how the proof works. And I really like that.
[40:25]
Alexander Wei
One thing you learn very quickly as a researcher is that if your results are too good to be true, you probably have a bug somewhere. I think every researcher has had an experience where they see amazing numbers from their experiments, and it turns out the experiment was actually wrong. The numbers were wrong. When I first heard about this from Lijie and Hongxing, that was my prior. I was like, oh, I'll wait for them to find the bug. But then I think as the days went on, you sort of had this growing optimism that, oh, maybe this is the one in 100 times where it's too good to be true, but it's actually true.
[41:13]
Andrew Mayne
Gentlemen, thank you very much.
[41:15]
Li Jiechen
Thanks so much.
[41:16]
Alexander Wei
Thank you so much.