
Podcast Summary

Episode Overview

Podcast: OpenAI Podcast
Episode: #10 - How AI Is Accelerating Scientific Discovery Today and What's Ahead
Date: November 20, 2025
Host: Andrew Mayne
Guests:

Kevin Weil, Head of OpenAI for Science
Alex Lupsasca, OpenAI Research Scientist & Professor of Physics at Vanderbilt University

In this episode, Andrew Mayne sits down with Kevin Weil and Alex Lupsasca to discuss how AI, especially cutting-edge models like GPT-5, is transforming scientific research. The guests share firsthand stories and upcoming research, address anxieties about AI in academia, and look ahead at the scientific breakthroughs the next five years may hold as AI becomes an ever more powerful tool for researchers.

Key Discussion Points and Insights

1. The “OpenAI for Science” Initiative

[00:41–02:20]

Mission: Accelerate the pace of scientific discovery by drastically reducing timelines (e.g., doing 25 years’ worth in 5).
Why now? Frontier models like GPT-5 are beginning to produce novel results: not merely repeating known knowledge, but proving things humans have not yet done.
Rapid Progress: “You go very quickly from the model can’t do something, to the model can just barely do something... all of a sudden, you couldn’t imagine doing this thing without AI.” (Kevin Weil, [01:26])

2. Concrete Scientific Use Cases: From Math to Black Holes

[02:20–08:26]

AI is already assisting in diverse areas like mathematics, physics, astronomy, and biology.
Physics Example:
- Alex described sending a friend's o3 Pro account an infinite sum that arose while solving for the magnetic field around a pulsar. The model found a rarely cited mathematical identity from a 1950s Norwegian paper and reached the correct result apart from one stray extra factor, a typo-like slip ([03:33–05:30]).
  - Notable Quote: “I would say that’s a uniquely human ability... and now in 2025, clearly [AI models] are capable of doing things I would consider amazing.” (Alex Lupsasca, [05:25])
Literature Search: GPT-5 excels at “conceptual level” literature searches, making connections across fields and languages (e.g., finding a German PhD thesis with relevant work) ([06:16]).
AI makes it easier to bridge gaps between specialties, allowing researchers to explore adjacent fields more efficiently.

3. Collaboration and Acceleration of Discovery

[08:26–11:18]

AI acts as a tireless collaborator, possessing a breadth of knowledge across disciplines and working without fatigue.
Cross-field Synergy: “Now with GPT-5, I’m going to go back and explore that because I’ve got a coworker... who has read just about every scientific paper out there.” (Kevin Weil, relaying a mathematician's comment, [08:17])

4. Overcoming Skepticism and Measuring Progress

[11:00–14:50]

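Skepticism remains (“it couldn’t spell strawberry last year”), but real-world examples are converting many, like a Lawrence Livermore fusion physicist who posed progressively harder problems, from undergraduate level up to 20-year-veteran level, to o1-preview and watched it answer each one correctly ([11:18]).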
- Notable Quote: “…these are the worst AI models that we will ever use for the rest of our lives.” (Kevin Weil, [14:37])
Access matters: today’s free models are much less capable than Pro versions, which can “think” much longer and tackle more complex problems.

5. Working With AI: Iterative Problem-Solving at the Frontier

[15:13–24:26]

Getting the best results from AI is interactive: “the people that are best… have a sort of patience to go back and forth with them…it’s probably the way you would work with any two people operating at about the limit of their capabilities.” (Kevin Weil, [18:45])
The edge of AI’s knowledge is “jagged,” just as it is for humans; sometimes basic questions fail, while hard ones get brilliant answers ([22:16–24:26]).
- Memorable: “...their edge of knowledge is very jagged in a way that’s different from ours… at the intersection… a lot of interesting things are going to happen.” (Alex Lupsasca, [23:37])

6. The Upcoming Research Paper

[24:33–27:22]

OpenAI and external academics are collaborating on a broad, honest paper about “the state of GPT-5 for science.”
Paper includes real chat transcripts, candid failures, and several non-trivial new mathematical results (some potentially publishable alone).
- Notable Quote: “The goal was not to be hypey… This is what works, this is what doesn’t work. Here’s what I tried.” (Kevin Weil, [25:42])

7. Advice to Students & Early Career Scientists

[27:22–29:53]

AI will not replace scientists—like the telescope, it will empower them.
AI is excellent for prototyping approaches, brainstorming paths, and boosting productivity. Young scientists should experiment with the latest models, which can offer signposts along a research path.
- Quote: “Just having these signposts along the way is so helpful... it’s going to be a boon for everyone.” (Alex Lupsasca, [29:17])

8. The Next Five Years: Predictions and Uncertainties

[29:53–36:43]

Exponential change: The state-of-the-art shifts so quickly, “you look back 12 months and you’re completely embarrassed by where you were...” (Kevin Weil, [30:14])
Within 5 years, expect profound changes in both theoretical and life sciences. Bottlenecks may shift from conceptual breakthroughs to physical/experimental validation due to increased hypotheses from AI ([32:19]).
AGI’s impact will likely be felt most through scientific advances (e.g., personalized medicine, scalable fusion).

9. The Awareness Gap and Model Evolution

[36:43–40:08]

Many scientists underestimate current AI because tools change so quickly (“I tried it 18 months ago”) or only use free versions.
Persistence pays off as capabilities frequently leap ahead with new models and longer compute times.

10. Scientific Benchmarks and Testing the Frontier

[40:31–42:25]

Benchmarks must be continuously updated. On PhD-level scientific Q&A, the latest models score nearly 90% versus roughly 70% for humans working in their own field, yet the hardest frontier questions remain the key test ([41:33]).
New evaluations like GDPval, which measure performance on economically valuable real-world tasks, push models (and their creators) to new limits.

11. Future Hopes and Areas of Potential Acceleration

[42:25–48:06]

Personal Wishes:
- Alex: Accelerate black hole research and dark matter understanding, integrate vast disparate knowledge, design better experiments ([42:26–44:17]).
- Kevin: Solve fusion energy and transform the energy landscape: “If we can make energy 10 times more prevalent, 10 times cheaper, it will change the world.” ([45:48]).
The vision: General purpose AI to empower every scientist for their own breakthroughs.
- “We really want to see 100 scientists win Nobel prizes using AI.” (Kevin Weil, [47:26])
This is not the end but the beginning—“There’s a science 2.0 moment happening, I think.” (Kevin Weil, [48:06])

Memorable Quotes & Moments

| Timestamp | Speaker | Quote |
|-----------|---------|-------|
| 01:26 | Kevin Weil | “You go very quickly from the model can't do something, to the model can just barely do something... all of a sudden, you couldn't imagine doing this thing without AI.” |
| 05:25 | Alex Lupsasca | “I would say that's a uniquely human ability... and now in 2025, clearly they're capable of doing things that I would consider amazing.” |
| 08:17 | Kevin Weil | “I've got a coworker, effectively a collaborator, who has read just about every scientific paper that's out there...” |
| 14:37 | Kevin Weil | “These are the worst AI models that we will ever use for the rest of our lives.” |
| 18:45 | Kevin Weil | “…there's also a very real sense, like when you're giving GPT-5 or any of these AI models a problem that's on the frontier... they tend to still be wrong a lot. Kind of like any human would be, operating at the frontier of their capabilities.” |
| 23:37 | Alex Lupsasca | “Their edge of knowledge is very jagged in a way that's different from ours… where it can go farther than us or we can get ahead of it, that's where a lot of interesting things are going to happen...” |
| 25:42 | Kevin Weil | “The goal was not to be hypey… This is what works, this is what doesn’t work. Here’s what I tried.” |
| 29:17 | Alex Lupsasca | “Just having these signposts along the way is so helpful... it's going to be a boon for everyone.” |
| 30:14 | Kevin Weil | “You look back 12 months and you're completely embarrassed by where you were 12 months ago.” |
| 41:33 | Kevin Weil | “Our latest models are nearly at 90%... surpassing the capability of most humans in their field of scientific study across every field at once...” |
| 45:48 | Kevin Weil | “If we can make energy 10 times more prevalent, 10 times cheaper, it will change the world.” |
| 47:26 | Kevin Weil | “We really want to see 100 scientists win Nobel prizes using AI.” |
| 48:06 | Kevin Weil | “Certainly there's a science 2.0 moment happening, I think.” |

Notable Timestamps for Easy Reference

OpenAI for Science mission: [00:41]
Physics (pulsar) example: [03:33–05:30]
Literature search innovation: [06:16]
Collaborative AI for science: [08:17]
Fusion testing progression anecdote: [11:18–14:50]
Alex’s “AI-pilled” moment with black holes: [15:15–18:13]
Iterative process & patience with AI: [18:13–22:16]
Jagged edge of knowledge: [22:16–24:26]
Upcoming research paper: [24:37–27:22]
Student/early career advice: [27:40–29:53]
Predictions next 5 years: [30:06–36:43]
Awareness gap and progress: [36:43–40:08]
Scientific benchmarks: [40:31–42:25]
Personal/field acceleration wishes: [42:25–48:06]
Closing thoughts (science 2.0): [48:06]

Tone and Language

The tone throughout the episode is upbeat, engaged, and candid. Both guests mix deep technical knowledge with personal anecdotes, and the host draws out both practical advice and big-picture vision. There is a clear emphasis on honest assessment—celebrating breakthroughs while recognizing challenges and current limitations.

Conclusion: Science 2.0 Is Here

AI is rapidly evolving from a novel assistant to a revolutionary engine for scientific progress. While there are challenges—like bridging the “awareness gap” among researchers and refining models to handle low-pass-rate, frontier problems—the overall mood is one of excitement and optimism. The next era in science will not be defined by AI replacing human researchers, but by empowering them to achieve more ambitious, interdisciplinary, and accelerated discoveries together.


Transcript

[00:00]
A
Hello, I'm Andrew Mayne, and this is the OpenAI Podcast. Today, my guests are Kevin Weil, head of OpenAI for Science, and Alex Lupsasca, who is an OpenAI research scientist and professor of physics at Vanderbilt University. We're going to be discussing how AI is impacting science, an upcoming research paper, and where science might be headed in the next five years.
[00:23]
B
Maybe the most profound way that people are going to feel AGI in their lives is through science.
[00:29]
C
With ChatGPT, I can just launch it in that direction, and that direction, and that direction.
[00:33]
B
The acceleration that is going to come from these tools is going to change science.
[00:41]
A
So you're running the OpenAI for Science initiative. Could you explain what that's about?
[00:45]
B
Yeah. The mission of OpenAI for Science is to accelerate science. So the question is, can we help scientists do the next, say, 25 years of scientific research and scientific discovery in five years instead? Science underpins so much of what we do and how we live. And if we can make progress faster by putting our most advanced models into the hands of the best scientists in the world, we should do that. That's what we're trying to do. You could ask, why now? Why didn't we do this a year ago? Why aren't we doing this a year from now? One of the big reasons is we're just starting to see our frontier AI models being able to do novel science. So we're starting to see examples where GPT-5 can actually prove new things. Maybe not yet things that humans could not do, but things that humans have not done. So these are little existence proofs of GPT-5 being able to break out past the frontier of human knowledge and into the unknown. And if there's one thing that I've learned from a year and a half or so at OpenAI, it's that you go very quickly from the model can't do something, to the model can just barely do something. And it's not great at it yet. But you see these early examples, and then six months later, 12 months later, all of a sudden, you couldn't imagine doing this thing without AI. And I think science is in that initial phase where we're seeing real acceleration. Scientists who are using AI are getting novel results: maybe not yet large breakthroughs, call them small breakthroughs. And that just says that there's so much potential in this space.
[02:21]
A
We've seen examples of, let's say, AI helping with mathematical proofs. Could you give me an example of how it might do things in some other areas, like physics? Or whatever kind of things we might see in the short term.
[02:31]
B
Yeah, I mean, we're seeing examples every day, and they're across the range of sort of the scientific frontier. You see examples in mathematics, in physics, astronomy, life sciences, like biology. Alex, I mean, you've worked on some of these. Maybe it's a good time to talk about some of the physics stuff that you've seen.
[02:52]
C
Yeah, I think, coming back to Kevin's point about how this is a special time, that's very much how I feel as well, because I started the year 2025 thinking, yeah, ChatGPT is cool. Like everybody, I used it when it came out, and I thought it's a great chatbot. But I was sure it would take a very long time before it would become really relevant for my own work. So I started the year, I would say, as an AI skeptic, because I like to see evidence before I'm convinced of something. And I saw people using it to help in their writing, and I started to use it for that as well. It's very useful for proofreading, but I thought, oh, it's going to be a while before it gets to do the special stuff that I'm really a specialist at.
[03:33]
A
Black holes.
[03:33]
C
Like black hole physics. Exactly. And I had this experience early this year where I was trying to find this magnetic field solution that describes what happens around a pulsar, which is a rotating star with very powerful magnetic fields. And I was going for this very particular solution. I had to solve a partial differential equation. I was able to identify that solution as an infinite sum over products of special functions called Legendre polynomials. And if you go to physics grad school, this is the kind of thing that you spend a lot of time getting familiar with. And I also like these puzzles. And I was playing around with the sum, and I felt like there should be a simple formula that it evaluates to. And I thought, okay, I have this friend who has ChatGPT with o3 Pro, which I didn't have access to at the time. And I thought, okay, I'm just going to send it to him and see what comes out of it. And he sends me back this output. It thought for 11 minutes, which at the time I'd never seen it do, because I was using the free version, which doesn't think for as long. And it gave this beautiful answer where it was able to understand what the sum was and break it down into pieces that it could tackle. And then it had to go and find this special identity that was published in one paper from the 1950s in the Norwegian Journal of Mathematics. And so it understood what the problem was, and it knew about this random identity that was just the thing for the job. And it used it. And it gave this beautiful output. And at the end, the answer was wrong because it made a silly typo. It added an extra factor in front. It was almost kind of like a human making a silly typo at the end. But it was very easy to check the derivation. And I went through it and I realized, okay, there's this extra factor. But aside from that, it did the work. And that really sent me reeling, because I thought, okay, I would say that's a uniquely human ability. I thought that's something that makes theoretical physicists special. Now in 2025, clearly they're capable of doing things that I would consider amazing.
[05:31]
B
Yeah, that's one of the cool things. So you've got examples like Alex's, where it was probably something that he could have done himself eventually, but GPT was able to do it faster. That's acceleration on its own. And there's something qualitative about that as well. Because if, instead of exploring two paths over the course of a week, you can explore 10 paths in parallel in an hour, all of a sudden there are a lot more ideas that you can try. That's also acceleration. We also see examples in literature search, which you don't think of as maybe deep scientific innovation, but it's really important to be able to understand: has somebody worked on this problem before? And if so, is there something I can learn to speed up my own work? And we've seen interesting examples. I might get the details of this wrong, but we were talking to this researcher, and he was saying he was exploring this particular idea in high-dimensional optimization. And he was like, man, you know, this thing I'm working on, it's interesting, but somebody must have worked on this before. I can't be the first person to have had this idea. I just can't. But I can't find any examples. And then he gave a description of what he was working on to GPT-5. And GPT-5 found an example from, I think it was economics or something, a completely different field that used completely different terminology. So no keyword lookup would have ever worked. GPT-5 did sort of a conceptual-level literature search.
[07:00]
C
Yeah.
[07:00]
B
It found somebody's PhD thesis in German, so also a completely different language. You know, it was basically lost to time. But this person had done really interesting related work that helped him in his research. And so that's another area. So you can talk about the acceleration that comes from novel proofs and GPT-5 being able to do something on its own or guided by an expert. But there are also these examples of acceleration in calculations and literature search, and all of them contribute to accelerating science.
[07:34]
C
Yeah, and the exact same thing happened to me. I was trying to derive this property of black holes, and I got this equation that described the phenomenon I was after, and it had a third-derivative term, which is pretty unusual. And I looked at it and I recognized something called the Schwarzian derivative, which is a special object that appears in math. And I thought, wow, this is really strange that this would show up. And I just copy-pasted the equation into ChatGPT and said, have you seen this before? And it said, oh yes, this is the conformal bridge equation. I had no idea what a conformal bridge was at the time. And it said, oh, just look up this paper. And that was amazing, because it turns out that this equation that showed up in my work had already been studied in other works. And I've heard from a lot of colleagues doing research in physics that there's a lot of that going on. At the forefront of knowledge, everything becomes so niche that it's very hard to know the latest details in neighboring fields. And GPT is an amazing help with that.
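For reference, and assuming the term meant here is the standard Schwarzian derivative (the transcript does not give the black hole equation itself), its textbook definition below shows where a third-derivative term comes from. Its invariance under Möbius transformations ties it closely to conformal symmetry.

```latex
S(f)(x) = \frac{f'''(x)}{f'(x)} - \frac{3}{2}\left(\frac{f''(x)}{f'(x)}\right)^{2},
\qquad
S\!\left(\frac{a f + b}{c f + d}\right) = S(f) \quad (ad - bc \neq 0)
```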
[08:27]
B
Yeah, that's another thing that we've heard from professors and researchers that we've talked to: you have to be so specialized today. And so sometimes it gets hard to explore an area outside of your main area. One particular mathematician we were talking to said, you know, on one of my last papers, I knew there was a direction I wanted to follow, but it wasn't my specialty and it would have taken me a long time. And I just kind of ended up feeling like, you know, maybe that's not the most efficient place for me to spend my time. Now with GPT-5, I'm going to go back and explore that, because I've got a coworker, effectively a collaborator, who has read just about every scientific paper that's out there and is a pretty meaningful expert on just about any topic you want. And I think I'm going to be able to explore these adjacencies in a far better way with ChatGPT than I could have on my own. And so that's also a fascinating new take. It helps everybody. It can help you go deeper, like you were saying, and it can also help you go broader.
[09:29]
A
Literature search is pretty interesting, because one of my weird hobbies is going back to look at when some early scientific discovery was made that didn't get utilized until much later on. A famous one was carbon filaments: Thomas Edison spent all that effort trying to find one, and it had been published something like 20 years before. Of course, the Dewey Decimal System had only just been introduced, so you can't blame him. Other things, like silicon as a semiconductor: you know, if somebody had read the literature, we might have had that five to 10 years earlier. The ability to replicate DNA had been published 10 or 12 years before somebody figured that out. And then the shotgun technique we use for DNA sequencing was first published around 1982, but at that time there weren't supercomputers that could run it, right? And it's exciting just to think of having a really good tool that can search through all of this stuff and pull up these answers.
[10:19]
B
Yeah. And I think some of the most interesting research now happens at the intersections of two fields. And again, it's hard for one person to be an expert in two fields, let alone three or four or five. And sometimes it's tough for humans to collaborate. You don't necessarily find the right person. The person doesn't have infinite patience. And here with GPT you now have the option of a collaborator that will work 24/7, has infinite patience, and has read substantially every scientific paper written in the last however many years. And so it's a new kind of collaboration that is its own form of acceleration.
[11:00]
A
You think about how Claude Shannon's wife was a mathematician and how much that helped what he was able to do. I think we forget how much of a factor collaboration really is. But I would say some people hearing this might go, yeah, but it couldn't spell "strawberry" last year, it couldn't do math. So why are we going to have it do science?
[11:19]
B
Yeah, so actually I don't even know if I've told you this: my own origin story with appreciating what GPT-5 could do. Or in this case, since this was almost a year ago, it was o1-preview, I think. I was meeting with this guy named Brian Spears, who's a physicist at Lawrence Livermore and was in D.C., and we'd never met before. So I didn't know what to expect. I thought maybe I was going to go in and be talking to him about what was new and what he could do with o1-preview and why he should give it a try. Little did I know. I sat down and he immediately took control of the conversation and said, let me tell you what I can do with your models. These are the most amazing things for science, and this is going to change the world. He was like, okay, let me take you through this. And he opened up his laptop, and he works on fusion, right? Lawrence Livermore was the first to do large-scale fusion with positive energy. Super exciting. So he's like, all right, we're going to take a fusion example, and first I'm going to start with the undergrad version of this problem. And so he shows me this conversation, and he's like, all right, so you've got a copper rod and we're going to bombard it with super high pressure waves. What happens? And o1-preview gives a good answer. It's like, okay, cool. So it got the undergrad problem right. And then, now let's ask the graduate version of this. Now what happens inside the rod itself as you're doing this? And what needs to be true in order for it to generate these certain kinds of shock waves? And he goes through and is like, okay, so it got that right. All right, now let's ask the postdoc-level question. All right, now let's ask the... And at this point, despite having a physics background, I'm just following along for the ride, because he's beyond anything I can do. Like, all right, now let's ask the "you just joined Lawrence Livermore" kind of question: you've gone through your postdoc, you're a nuclear physicist. And he keeps going, and o1-preview keeps getting the answer right. And then he's like, all right, now let me ask the "you've worked at Lawrence Livermore for 20 years" question. And it goes and it gets it right. And not only that, but it suggests that the only way to go forward is to use a set of simulation tools that are partially classified or that only Lawrence Livermore has. It says, I don't have access to these, but if you did, you would want to use these tools. And he's like, look, nothing that I just showed you was something that I couldn't do, but it would have taken me days, and certainly not everybody at the lab can do this. The acceleration that is going to come from these tools is going to change science. And so I went from sitting down with this guy, thinking maybe I was going to be talking to him about the value of AI, to him completely blowing my mind about the potential of AI. And this is a year ago. This is o1-preview. We've come leaps and bounds since then. And the thing that I always try to remind everybody: the AI models that we're using today, as good as GPT-5.1 Pro is, are the worst AI models that we will ever use for the rest of our lives. And when you think about that, the fact that we're here just implies that the future is very bright.
[14:50]
A
How have your colleagues been using these tools?
[14:53]
C
Yeah, there are a lot of different usages, I think. Literature search: here's what I'm working on, does it connect to anything else? This is something we spend a lot of time on as scientists, understanding how something new that shows up in our work connects to other things. And then there's my own experience, the one that made me AI-pilled, I think.
[15:13]
A
Is this the reason you came to OpenAI?
[15:15]
C
That was the reason, yeah. When GPT-5 Pro came out, I met Mark Chen, who works here at OpenAI as the chief research officer, and he gave me a challenge. He was very proud. He said, why don't you just give it a hard problem? And I thought, you want a hard problem? Okay. So I gave it this question from quantum gravity, right? I had just found these new symmetries of black holes, which is something that doesn't happen that often. And I'd written up a paper that came out in June on arXiv, and I was very happy about that. And I thought, okay, well, let's see how GPT-5 Pro handles this new question. And so I gave it the equation, and I didn't say that it has some symmetries. I didn't give it a leading question. I just said, what are the symmetries? And it thought for five minutes and it said, no symmetries. And I go, ha, it's not there yet. I'm still better than the AI. And Mark Chen is visibly crestfallen. He goes, okay, well, just give it an easier question then. And so I think, okay, I'm going to give it the warm-up, baby version of the problem, which is: find the symmetries of this equation not in the full black hole spacetime, which is complicated, but in the flat space limit, where the spacetime is empty. I hit enter, and it thinks for nine minutes. It comes back with this beautiful answer: oh, this equation has conformal symmetry, which is the correct thing, and here are the three generators. It was very beautiful. And this version of the equation probably has been, I'm sure has been, studied many times over the decades. So I don't know what it did exactly, but it came up with the answer. I thought, okay, this is very good. This is a great outcome. And then Mark said, okay, well, now that it's been primed on the warm-up example, try the harder problem again in this same chat. And I thought, okay, let's go. And so we give it the hard problem again, hit enter. And it thinks. And it thinks. And that was the first time I saw it think for so long.
I think it took 18 minutes. And it comes out with this beautiful answer that was completely correct. And that blew my mind, because I had been working on this for a very long time, and I would say that calculation is at the edge of my abilities. I think it's something that very few people could have done the way I did it. And so I was really shocked, because you spend years of your life training to be best in class at something, and finding symmetries of black holes and these kinds of equations, that's my jam. And I thought, okay. So I guess that just happened. It really sent my mind reeling, and I was a little bit shell-shocked for a few days; I just couldn't stop thinking about it. And after that I realized, okay, I have to become involved in this. Because to see this capability emerge into the world right now and not be involved with it just seemed crazy to me.
[18:13]
B
I actually think you made a really important point in the middle of that: you gave it the hard question, and it didn't get it right. You gave it an easier question, and it got that right. And then you were able to give it the harder question, and it got it. As excited as we clearly are about the future here, there is still a very real sense that when you're giving GPT-5 or any of these AI models a problem that's on the frontier, at the limit of their capabilities, they still tend to be wrong a lot, kind of like any human would be when operating at the frontier of their capabilities. And it isn't automatic yet. Hopefully in the future it will be: enter any hard question and the model answers it. But today there's a lot of back and forth, and the researchers who are best at getting the most out of the models have a sort of patience to go back and forth with them. I think that's natural. It's probably the way that you would work with any two people operating at about the limit of their capabilities. But I think it's important, especially for folks listening to this who are doing research with the models, to know that it isn't just one shot and it always works. There really is a back and forth and a patience that it takes. And one of the interesting research problems that we're spending a lot of time thinking about is how we help reduce that cognitive load. Because say the model has a 5% pass rate on some problem. Technically the model can get it right once out of 20 times, but it's really at the frontier, so it's not going to get it right anywhere close to every time. If you're sitting inside ChatGPT and just entering this question, you're going to have to enter it, what, 10 times before you have decent odds that it's going to get the right answer.
And most people aren't going to do that. So there's a whole host of problems that the model can solve where people probably try, and after three tries it didn't get it right, so they move on: the model's not good enough yet. And actually it is, but it's very hard to tell low-pass-rate problems apart from problems that are too hard. I think that's a really important thing for us to help researchers and mathematicians get past. Because the most interesting problems right now are going to be the ones where the model has a very low but nonzero pass rate. Those are going to be the hardest problems that the model can solve, the best ways that it can help accelerate science. So that's a really interesting research problem that we're taking on: trying to make that a little more automatic, a little less groundwork. But for now, putting in the time and really going back and forth with the model does yield results.
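The pass-rate arithmetic above can be made concrete with a short sketch. It assumes each attempt is an independent trial with a fixed pass rate p (a simplification: retries in the same chat are not truly independent), so the chance of at least one correct answer in n tries is 1 − (1 − p)^n. The function names are hypothetical and not part of any OpenAI product.

```python
import math


def prob_at_least_one_success(pass_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds."""
    # All attempts fail with probability (1 - pass_rate) ** attempts.
    return 1.0 - (1.0 - pass_rate) ** attempts


def attempts_needed(pass_rate: float, target: float) -> int:
    """Smallest number of independent tries giving at least `target` odds of a success."""
    # Solve 1 - (1 - p)^n >= target for n: n >= log(1 - target) / log(1 - p).
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - pass_rate))


if __name__ == "__main__":
    p = 0.05  # the "5% pass rate" example: one success in 20 tries on average
    for n in (3, 10, 20):
        print(f"{n:>2} tries: {prob_at_least_one_success(p, n):.0%}")
    print("tries for even odds:", attempts_needed(p, 0.5))
    print("tries for 90% odds:", attempts_needed(p, 0.9))
```

Under that assumption, a 5% pass rate gives about a 40% chance of success after 10 tries and needs 14 tries for even odds, which is why three failed attempts (about a 14% chance of success) say little about whether a problem is within the model's reach.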
[21:01]
A
Well, it feels like we're at a moment kind of like when we went from GPT-3.5 to ChatGPT. GPT-3.5 was an extremely capable model, but it was still effectively a base model. I was a prompt engineer at the time, and knowing how to prompt it, I could get great results. But it took all those little tricks to get it to understand the context. Then with ChatGPT the thinking was, okay, we know the kinds of problems people are trying to solve; let's make it a little easier for them to get there without having to do all that. It feels like that's where we're heading in science: now that you have people like Alex explaining the problems they're trying to solve and what they're doing, we may see a big acceleration.
[21:39]
B
I think it's probably just a characteristic of any question that's on the frontier, at the limit of what the models can do. Back with GPT-3.5 and early versions of GPT-4, the questions at the limit of what the models could do were much more basic. Now they're questions of scientific research. But when you're operating at the frontier, the pass rate will be low, so there's value in sticking with it, trying a few different things, and taking the parts it gets right and refining them while telling the model where it got other things wrong.
[22:17]
C
In this example, I mentioned it needed a warmup, but the warmup was the obvious warmup that you would do as a human. Because actually, when I was attacking this problem, I wasn't thinking about the black hole case first. The flat-space limit was the obvious place to start, and that is where I began. So I think the models are actually really good, but we could get better at making them think of the warmup problem themselves so they can go there directly. More generally, there's this thing we have to bear in mind, which is that as scientists, our role is to push the edge of knowledge. There are things that are just beyond the edge, and our goal is to bring them within the edge of knowledge by understanding them. But this edge is very jagged. There are very basic questions about the universe, like why are there three dimensions of space? Or what happened at the Big Bang? These are things everybody wants to know the answer to. And yet, even though they're simple questions, there's really nothing intelligent to say about them. We just don't know. They're actually very hard problems. Meanwhile, there are very hard questions that you would think we wouldn't be able to answer at all, to which we have extremely detailed answers. We can predict the electron's magnetic dipole moment to, I don't know, 12 decimal places, something crazy. So the edge of human knowledge itself is very jagged, and it takes many years of graduate school to learn where the edge is. And what we're finding with these AI models is that the edge of their knowledge is also very jagged. You mentioned there are some basic questions that the models can't answer. That's true. At the same time, there are some very hard questions that they're already very well suited for today. And I think what's exciting is that their edge of knowledge is jagged in a way that's different from ours.
So obviously as time goes on, I think the edge of ability for these models is going to keep expanding. But as long as it expands in a way that is slightly different from our edge, that's also really interesting because at the intersection where it can go farther than us or we can get ahead of it, that's where a lot of interesting things are going to happen, I think.
[24:27]
B
Yeah, exactly. Human and AI together are much more powerful than human alone or AI alone.
[24:34]
A
I want to explore that a little bit more. But first tell me about the research paper.
[24:38]
B
Yeah, so we've talked a bunch about these anecdotal examples, the ones Alex has gotten from the time he's spent with his colleagues and the ones we see coming in across Twitter on a semi-daily basis at this point. We wanted to bring them together and publish something that lays out the current state of GPT-5 with respect to science. So we've got a handful of collaborators from inside OpenAI and, I think, eight or nine academics from beyond our walls, across a bunch of different fields: math, physics, astronomy, computer science, biology, materials science. The paper is something on the order of 12 sections, each one highlighting a different way that GPT-5 is accelerating their work. The goal was not to be hypey and say everything is solved, hoverboards for everybody. It's really to say: this is what works, this is what doesn't work, here's what I tried. In many cases we're sharing the full ChatGPT share links to the conversations, so you can see the back and forth the scientist has with the model. It's meant to be a moment in time that says this is where we are today. I think we'll look back in 6 months, 12 months, and we'll probably be much further along, and that'll be exciting. But even where we are today, we've got a section in the paper with a bunch of different examples around literature search, a section with a bunch of different examples around acceleration, whether it's calculations or other things like that, and then a section where we actually contribute four or five new, non-trivial results in mathematics. A couple of these are small; a couple of them probably could have been papers on their own. So you go from the mundane but very pragmatic and real bits of acceleration to the more profound: GPT-5 actually pushing past the current frontier of human knowledge. We're super excited about this paper, and I think there'll be a lot more to come.
We're not the only lab doing great work, by the way. Google has been doing this for a while, and I have a ton of respect for what Demis and the team have done with AlphaFold and more. I just think we're at a really exciting time. Ideas in science often have their moment, when multiple people come up with the same idea, whether it's quantum mechanics, like Alex was talking about, or the light bulb. Right now, it's very clear that AI is just beginning to change science, and it's going to be an exciting few years.
[27:23]
A
What advice do you have for students and grad students in the sciences? I hear people say, oh, we're not going to need scientists anymore, which sounds absolutely crazy. The telescope didn't get rid of the astronomer; it actually created the astronomer. How do you feel about that, and what advice do you have?
[27:41]
C
Okay, first of all, I think it's important to acknowledge that there's a lot of anxiety in academia right now that is unrelated to AI. It has to do with lots of changes in the way science is organized in this country, and we're still going through those changes. So when I talk to young people, there's a lot of anxiety surrounding this. I actually think AI is a really exciting new tool that's becoming available, and it's going to help a lot because it's going to make everybody so much more efficient. As Kevin was mentioning earlier, when you work on a research project, oftentimes you don't know exactly which way to go. You're here, you want to get there, but there are different possible paths, different lines of attack. The whole point of research is that from the get-go, you don't know which way to go. One of the things that's really fun with GPT is that you can just say, hey, I'm trying to solve this; here are some ideas I have. You can upload some notes or just describe it in a few sentences, and it's very good at getting what you're trying to do. Then you can say, what if I approached it this way? Or what if I were to do it that way? And it can immediately go off and chart a path through the unknown, signposting different potential avenues. That actually saves so much time, because I'm a human: I have limited time and energy. When I'm going to put in the effort to do a calculation, I spend a lot of time trying to prototype it and think ahead about where it's going to take me. With ChatGPT, I can just launch it in this direction, that direction, another direction. It doesn't get everything completely right, but just having these signposts along the way is so helpful, because when you do go down the path yourself, it feels like you have somebody helping you along. And I think that's going to make everybody faster and more productive.
And already the young people that I meet are spending a lot of time experimenting with ChatGPT and figuring out its capabilities. And I think it's going to be a boon for everyone.
[29:54]
A
You mentioned part of the idea of the paper was to say, okay, this is where we are now; let's look again in six months. We're five years on from GPT-3. Five years from now, we're sitting down here again. What are we going to see?
[30:07]
B
Oh, man, the five year question is so hard.
[30:11]
A
It's a great question.
[30:13]
C
Here's a crystal ball.
[30:14]
A
Yeah.
[30:15]
B
I think the exciting thing about this field in general is that you look back 12 months and you're completely embarrassed by where you were 12 months ago. When GPT-3 launched, it was unbelievable. I'll speak for myself: it blew my mind that AI could do any of these things. And then somewhere around GPT-3.5 and GPT-4, the Turing Test, which we had held up for, what, 75 years as the pinnacle of artificial intelligence research (oh, man, the world will be different when an AI can pass the Turing Test), we just went whooshing by, and now we don't talk about the Turing Test anymore. Even if you look back to the beginning of 2025, most engineers were writing all of their own code. Fast forward to now, and the idea that you would do much of anything without leveraging Codex, Claude Code, GitHub Copilot, any of these tools (they're all incredible) is crazy. You're so much more productive with them. So in just 12 months, software engineering has fundamentally changed. I think over the next 12 months we're going to see profound changes in the way science is done, both in the work we can do in silico, in theoretical physics and mathematics and computer science, and, I think, beginning in the life sciences and the physical sciences. And that's over the next 12 months. I mean, five years?
[31:55]
A
So yeah, that's a question I think about a lot because when it comes to mathematical proof, I can kind of go into a computer and I can test that and I can verify that or at least test with it to some extent. The same with some sort of equation for physics. But when you get into talking about the life sciences or material sciences and stuff, are we going to have a bottleneck of way more predictions than ways to test them?
[32:19]
B
Well, there are so many areas where models can help with life science. If you take biology, drug discovery for example, you have a huge search space. Even if you're going to end up with a bunch of physical, real-world experiments to run at the end of the day, the better models get at intelligently pruning that search space, the more rapidly you can converge on the drugs that are likely to work in particular scenarios. Then, for that to have real-world impact, you need to make it all the way through the regulatory process, which is its own process that AI can help speed up, because you end up needing to write these huge papers that bring together tons of different findings and so on. You can take each step of the process: AI can help up front as you prune the search space and try to find candidates that are more likely to meet your needs and your goals, and then as you go through the process of getting this thing out to consumers and making a real-world impact, AI can contribute there too. We have pilots with a number of companies in the space doing that. So it really is fairly broad-based.
[33:47]
A
You started off with an interest in particle physics, you were studying that and then you found other things and now you find yourself back in the sciences. Do you think other people are going to follow that pattern?
[33:57]
B
I mean, it is an absolute privilege for me to get to come back and work on science, and I am nowhere near the scientist that folks like Alex and other people here at OpenAI are. But I don't know. We talk a lot about AGI at OpenAI, artificial general intelligence. I think maybe the most profound way people are going to feel AGI in their lives is through science. ChatGPT is an incredible tool; I use it tons of times every single day, and AGI inside ChatGPT will be able to do lots of things. But personalized medicine, or AI models contributing to science and finding a way to do scalable fusion more quickly: those kinds of things will change all of our lives. And I think these are very real possibilities at the pace we're going. So that's why this is the most exciting thing in the world for me to get to work on.
[35:03]
C
I don't know what AGI will look like, but sometimes, when you give ChatGPT a really hard equation you're working on and it just spits out the answer, to me that certainly feels like something approaching it. I also don't have a crystal ball, and I clearly have a bad track record of predicting where AI is going, given that at the start of the year I didn't think I'd be here. But two things are simultaneously clear to me. One is that the models are definitely going to keep getting better. Sometimes my colleagues ask me, oh, are we reaching a plateau? That is actually something I was wondering about too. Then I joined OpenAI and got to play with some internal models that are even stronger, and I was like, okay, this is definitely going to keep getting really, really good. The second thing is that already with GPT-5 Pro (I think GPT-5.1 Pro today), our best model that's available externally, there's a big gap between what the models can do and what the science community uses them for. One of our goals here at OpenAI for Science is to start bridging that gap, because the models move so fast that unless you're really paying attention, you may not realize how much has changed in just the last few months. So I think these two facts are true and are going to, over the next year, really lead to big changes in science. The models just keep getting better and people are starting to catch on, and that's why we're seeing all this chatter on Twitter and social media. That's only going to accelerate. Where that takes us, I don't know, but I'm excited to find out.
[36:44]
A
I think you've both made a very good point, which is that these models improve at such a rapid pace that people sometimes have a very firm idea of what they are because they tried something six months ago. I've encountered scientists I really respect who say, oh, I tried it. And I'm like, you tried it 18 months ago. They're not used to a tool evolving that quickly.
[37:02]
B
Yeah. Or they're using the free version because, of course, that's how everyone starts, and the free version doesn't think for as long, so it can't solve problems that are as challenging. Yeah, I think that's really real. It's one of the reasons I think the best advice is to just keep trying the problems. Even if you're working on problems and, as you try them on GPT-5, it isn't super helpful, I wouldn't give up. I would keep trying every few months, and I think at some point it's going to start being valuable, if it's not already there today. We talked about thinking time. That's another area we're really excited about. With GPT-5 Pro, I've seen the model think for, what, maybe 40 minutes on some of the hardest problems, but it has a certain compute allowance because we have to serve it to many, many, many people. 40 minutes is certainly not a limit on thinking. The models can think for 2 hours, 6 hours, 12 hours, 24 hours. And one thing we continue to see is that the pass rate on hard problems continues to improve as you give the models more time to think. It's surprising, actually, how often there's a totally reasonable, intuitive human analogy to these things. There are a lot of problems that I can't solve in 20 minutes but that I might be able to solve if you gave me two hours.
[38:33]
A
System one and system two thinking.
[38:34]
B
Yeah, and some that I can't solve in two hours, but if I had a day to really think about it and try different things, I might get there. The models are the same way. And there aren't as many scientists in the world as there are users of ChatGPT. If we could find ways to give scientists who really know how to use the models well just a huge amount of compute, I think that is yet another way we can accelerate science.
[38:56]
A
Yeah, it's a very good point, because you'll hear people say we've hit a wall or whatever. One of the really amazing discoveries, about a year ago, was the reasoning paradigm: the fact that you can take the model of today and just let it think longer. People ask, what would we do with all this compute, all this hyperscaling we're building? Even using today's models and letting them think for a long time, we could probably make some amazing discoveries.
[39:22]
B
Yeah, 100%. I think if model progress stopped today, just the process of driving awareness within the scientific community and giving people more of the best the models can deliver would produce a large amount of scientific acceleration. But of course, progress is not going to stop, as Alex was saying. So when you think about the models being able to think for longer, being able to train them to do harder and harder scientific tasks, and also just getting out into the scientific community and helping people see what the frontier really is and how they can use the models better in the work they're doing, I'm excited to see where this goes over the next six, 12, 24 months.
[40:09]
C
Yeah. I think this is a really unique time in history. It feels like a special moment. And to be clear, we're not telling people, drop whatever you're doing and come do AI. That's not the message. I think what we want to say is keep doing what you're doing. But also there's this great new collaborator, this new tool you get to use that's going to make it even more fun, and it's going to bring new life into a lot of different fields.
[40:32]
A
One of the challenges right now with benchmarks is what we call saturation: models have maxed out a lot of them, so a lot of them just don't seem that impressive anymore. Now it looks like we're moving to the scientific frontier. What do scientific benchmarks look like?
[40:48]
B
Yeah, like with many things, there's an intuitive way to understand this. Benchmarks are just a way of testing the model, in some sense, and as the models get smarter, you need to give them harder and harder tests, because they learn to ace the earlier ones. Take GPQA, which stands for Google-Proof Q&A. It's a scientific benchmark that asks basically PhD-level questions across a range of scientific fields. We thought for a long time it was a very hard benchmark to beat. I think it came out in 2023, and GPT-4 was originally at about 39% on it. Humans, by the way, are at about 70%. But fast forward two years, and our latest models are nearly at 90%.
[41:33]
A
Wow.
[41:34]
B
So they're surpassing the capability of most humans in their own field of scientific study, across every field at once, which is kind of amazing when you think about it. But those aren't the hardest questions in the world, and that's one of the reasons we're focused on new evaluations that ask frontier science and mathematics questions. We released something called GDPval recently, an eval that tests the model's ability to do economically valuable tasks. The smarter the models get, the harder the tests we want to keep giving them, because every gap we see, every place where the model can't answer a certain question, is feedback for us and gives us a way to improve the model further.
[42:15]
A
Curing disease, great. Beyond that, though, what area would you really like to see scientific acceleration in? It could be crazy or weird or odd.
[42:26]
B
You want to go first?
[42:27]
C
Well, I'm very selfish, so I have my own interest. I really like black holes. That's my passion.
[42:33]
A
You want to build a black hole?
[42:35]
C
I think there's a lot of potential for how AI can accelerate black hole research. Of course, I want to see it help with cancer and drug discovery and all these good things, but my first priority is, yeah, more AI helping with black holes. There are a lot of ideas on the table and so much potential. One thing is that there are a lot of theoretical questions that are very thorny, and I think if you could just sit down, understand everything that is known, and integrate that knowledge, a lot of things would fall out of it. That's one of the things we're exploring. Dark matter, for instance, is something we've been talking about, because there's a lot of data on dark matter from various experiments, but we still really have no idea what it is. There's a bunch of theories out there, really interesting ideas. Could it be that by feeding ChatGPT all the experimental data known about dark matter and all the theories, it could rule some of them out already, by combining bits of knowledge that are so disparate it's hard for our human minds to hold them together? I think that's kind of an exciting frontier. And since we were talking about the far future, experimental work is totally not out of the question. Right now we're focused on more theoretical fields because they can be done in silico, but you could totally imagine using AI to design better experiments and maybe run very hard, complicated experiments, including for black hole physics and other fields. I think there's a lot of ground to explore here and very exciting possibilities.
[44:17]
B
And I'll say fusion. We actually have existence proofs of it already: small-scale, I mean large-scale facilities, but small existence proofs. So clearly it can work, and the challenge now is to do it at bigger scale, more reliably. Clearly it's possible; we will figure this out. But if we can accelerate it, great, because a world with fusion is a significantly better place than a world without. We solve a lot of problems if we solve fusion, and I'm excited to see if maybe we can contribute in some way.
[44:55]
A
I think people easily overlook how dependent we are on energy, and what it would unlock if we had the same orders-of-magnitude improvement in energy production that we had over the last 200 years. Think about things that are energy intensive, like desalination or construction, and what happens when you have really, really, really unbounded energy.
[45:21]
B
Yeah, it's incredible. I mean, some groups might be looking to build lots of infrastructure for lots of GPUs, for example. Yeah.
[45:30]
C
Who might want to do that?
[45:31]
A
But even beyond that, I think we're probably going to see the infrastructure build-out push a lot more investment into energy. Much like mobile phones and laptops made electric cars a lot more efficient because of all the money being thrown into battery technology, I think we'll probably see that kind of offshoot.
[45:48]
B
Yeah. And I think anytime you change something by an order of magnitude, the world changes. Look at what we've seen over the past year with the way software engineering has changed: you no longer need to be trained as a software engineer to write meaningful amounts of code. There are, what, 30 million software engineers in the world? Now I think 300 million, maybe 3 billion people can write software, and that's going to fundamentally change things. If we can make energy 10 times more prevalent, 10 times cheaper, it will change the world. And I think it's a really high-potential place for us to apply the intelligence of our models.
[46:32]
C
If I can add something: we have ideas we're excited about in terms of AI's potential to change science, but this is very much not supposed to be a top-down effort where we dictate what AI is going to do in the world. We're actually very excited about building the best general-purpose AI. If we release that into the world, everybody will take it and use it for their own purposes. For me, I'm a black hole physicist, so I want to use AI to further black hole science. But if you're a scientist in another field, it's natural to use it for that. And the nature of research is such that it's very hard to know where the next breakthrough is going to come from. So our vision is to push this out into the world. I think we could see a lot more adoption than we have today, and once that happens, who knows where the next big discovery will come from. But that's how we give ourselves the best chance to accelerate scientific discovery.
[47:27]
B
Yeah, it's such an important point. The frontier, the surface area of science, is massive. This is not about what we can do within OpenAI individually to accelerate science or specific scientific projects. It's about giving scientists all around the world AI so that they can accelerate their work. That's how we move science forward faster. There are pieces I think we will try to do ourselves because they'll help us learn. But for the most part, what we really want is to see 100 scientists win Nobel Prizes using AI.
[48:03]
A
Yeah, it feels like it's not the end of science, it's really the start.
[48:07]
B
Exactly. Certainly there's a science 2.0 moment happening, I think.