Summary

Latent Space: The AI Engineer Podcast

Episode: Greg Brockman on OpenAI's Road to AGI
Date: August 15, 2025
Hosts: Alessio (Kernel Labs), Swix (Small AI)
Guest: Greg Brockman (OpenAI President/Co-founder)

Episode Overview

This episode features a deep-dive interview with Greg Brockman, president and co-founder of OpenAI, exploring the release of GPT-5, OpenAI’s new open-source models, the evolution toward AGI (Artificial General Intelligence), the future of reinforcement learning, challenges in model architecture and efficiency, and reflections on the industry’s trajectory. The hosts and Greg discuss not only technical details but also philosophical, organizational, and societal impacts, offering insights for AI engineers and visionaries alike.

Major Themes and Key Discussion Points

1. OpenAI’s Recent Wave of Releases: GPT-5 and Open Source Models

[00:16]–[01:04]
- Greg reflects on an “absolutely wild” week of multifaceted releases: the much-anticipated GPT-5 (the first “hybrid model”), and new open-source models.
- Emphasizes accessibility and millions of downloads for open-source models.
- Expresses pride in the team’s effort bringing multiple advances to the public.

2. From Next-Token Prediction to Reasoning: The Vision for AGI

[01:04]–[04:01]
- The origins of the reasoning team: OpenAI moved beyond simple next-token prediction towards closing the gap to AGI via reasoning and reinforcement learning.
- Quote [01:55]:
  
  “Why is this not AGI?... It’s really hard to describe why...it can answer any question you put in front of it. But it’s not quite reliable...what do we need to do to close that gap?”
  —Greg Brockman
- Reinforcement learning seen as the path to reliability and generalization, as first demonstrated in their 2017 Dota project.

3. The Move Toward Online Learning and Human-Like Learning Loops

[04:01]–[06:44]
- Discussion on the evolution from pure offline pre-training toward incorporating online elements and feedback loops more akin to human learning.
- Models now learn not only from massive pretraining datasets but from inference-time experiences.

4. Bottlenecks: Compute, Efficiency, and the Role of Human Curation

[06:44]–[07:45]
- Quote [06:57]:
  
  “The bottleneck is always compute...if you give us a lot of compute, we will find ways to iterate that make the most of it.”
  —Greg Brockman
- Despite gains in sample efficiency, RL advancements are compute-hungry.
- The importance of clever task curation and leveraging "supercritical" learning (where learning triggers cascades of knowledge integration) is emphasized.

5. Scaling Laws, Compute, and the Limits of Progress

[08:16]–[10:27]
- The team’s Dota experience: scaling compute rather than premature optimization often delivers the biggest breakthroughs, with most “walls” being engineering bugs, not true scientific limits.
- Quote [09:16]:
  
  “You just got to keep pushing it until you hit the wall. And… most of the time, those walls are just bugs and silly things.”
  —Greg Brockman

6. Generalization, Domains, and the Power of ‘Potential Energy’ in Models

[10:27]–[13:20]
- Transferability of reasoning advances from domains like competition math (IMO) to wet-lab science and programming.
- OpenAI’s new models can achieve “mid-tier PhD student” performance in scientific hypothesis generation.

7. Barriers: Reality’s Wall Clock and Biological Scale

[13:20]–[16:21]
- The pace of RL in simulation vs. real-world’s unbreakable wall clock—a tough limitation for real-world deployment.
- Parameter scale: GPT-5 continues to approach human-brain scale (in terms of synapses/parameters), but biological analogies for learning are imperfect.

8. Learning from Biology: DNA Language Models

[16:21]–[18:49]
- Greg’s sabbatical at the Arc Institute: direct parallels between learning human language and “biological language” (DNA); the same neural net architectures can generalize across both.
- Promising early results for genome-scale models, opening pathways to major bio/med-tech advancements.

9. Clinical and Societal Impacts

[18:49]–[19:33]
- Real-world personal examples: understanding and detecting rare genetic conditions.
- The broader promise of leveraging AI for health and disease discovery.

10. The GPT-5 Era: What’s New and Unlocking “Smart Agents”

[19:33]–[22:51]
- GPT-5 as a leap in reasoning and intellectual depth over prior models—“almost undescribable” new intelligence.
- Quote [19:59]:
  
  “For extremely hard domains...these models are able to perform great intellectual feats. And I think that’s new.”
  —Greg Brockman
- Role as partners in research; accelerating intellectual work in mathematics, physics, and beyond.

11. Best Practices: From Prompting to Model Orchestration

[25:06]–[27:30]
- GPT-5 proficiency comes with skilled use; developers and engineers need to maintain “tenacity” and develop tailored prompt libraries.
- Extraction of maximum “potential energy” from models depends on understanding strengths and weaknesses, iterative experimentation, and agent orchestration.

12. The Future of Agent Interfaces and Developer Experience

[27:30]–[29:56]
- Ideal AI agents are like seamless, high-memory coworkers—integrating “pair” and “async” workflows.
- Approvals, sandboxing, and robust access controls—there’s an OS-like layering for agent security.

13. Agent Robustness, Instruction Hierarchy, and Safety Paradigms

[29:56]–[32:55]
- OpenAI’s defense-in-depth approach: instruction hierarchy, system/user message distinction, sandboxed execution—all analogous to OS/hardware-level security rings.
- Shrinking the “spec-to-reality” gap in model behavior; community-driven evolution of model specs.
- Quote [31:48]:
  
  “The model spec is a perfect example of when the models are very capable, you start to really care about what they’re going to do.”
  —Greg Brockman

14. Psychohistory, Defaults, and Model Socialization

[33:36]–[36:25]
- Models as products of psychohistory—trained on the collective “thoughts” of humanity.
- Personalization and personality-narrowing during RL phase; GPT-5 is the most personalizable model to date.
- Collective vs. individual intelligence analogies: “These models are less like a human and more like a humanity.”

15. Orchestration, Routing, and Model “Menageries”

[39:20]–[41:58]
- GPT-5 uses a routing meta-model, selecting between reasoning and non-reasoning “experts” to optimize speed and capability.
- Composite architectures—menageries of models—may rival monolithic AGIs for flexibility and efficiency.
- Quote [41:12]:
  
  “It’s much easier to have a small, fast model that’s less capable...coupled with a much more expensive reasoning model… You kind of get adaptive compute.”
  —Greg Brockman
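
The routing idea in this section can be illustrated with a minimal sketch. This is not OpenAI's router: the `Model` class, the `estimate_difficulty` heuristic, the keyword list, and the `threshold` value are all hypothetical, chosen only to show how a cheap check in front of two models yields a form of adaptive compute.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Hypothetical stand-in for a deployed model."""
    name: str
    cost_per_call: float


def estimate_difficulty(prompt: str) -> float:
    """Toy heuristic (an assumption): long prompts and certain verbs look harder."""
    score = min(len(prompt.split()) / 50.0, 1.0)
    if any(word in prompt.lower() for word in ("prove", "debug", "derive")):
        score = max(score, 0.9)
    return score


def route(prompt: str, fast: Model, reasoning: Model, threshold: float = 0.5) -> Model:
    """Send easy prompts to the cheap model and hard ones to the reasoning model."""
    return reasoning if estimate_difficulty(prompt) >= threshold else fast


fast = Model("small-fast", cost_per_call=1.0)
reasoning = Model("large-reasoning", cost_per_call=20.0)

for prompt in ("What is the capital of France?",
               "Prove that the square root of 2 is irrational."):
    print(prompt, "->", route(prompt, fast, reasoning).name)
```

A production router would presumably be a learned model rather than a keyword check; the sketch only shows where the decision sits.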

16. Model Architecture Innovations and OSS Models

[47:36]–[49:38]
- OpenAI’s open-source model architectures are tuned for practical constraints—memory, batch size, deployment environment.
- Highlights of practical engineering choices over bleeding-edge architectures to maximize usability and accessibility.

17. Edge and Cloud: Hybrid Future of AI Model Deployment

[49:11]–[50:04]
- Envisions future where local and remote models cooperate (“local model that then delegates to a remote model”) for privacy, reliability, and performance.

18. On American Open Source vs. Global Competition

[50:10]–[51:32]
- Emphasizes American leadership in both business and values via open-source releases.
- Ecosystem-building: tech stack, chips, cloud, values—ensuring global influence and interoperability.

19. AI Productivity, Team Structure, and The Value of Engineers

[51:34]–[55:54]
- AI models taking on more “middle” engineering tasks, while hardest system design and cross-team communication remain human-centric (for now).
- Productivity increases point toward doing “100x more things” as models automate away more boilerplate and enable engineers to focus on harder problems.
- Quote [55:54]:
  
  “We are producing technology...that underpin the biggest machines that humanity has ever created. At some point the dollars that go into these data centers starts to be an abstraction...”
  —Greg Brockman

20. Abundance, Economy, and Compute as Future Resource

[65:17]–[66:59]
- If abundance becomes the norm, compute itself may be the resource that replaces money in importance.
- Distribution and access to compute will shape opportunities and societal structures in a post-AGI era.
- “There will always be more return on more compute.”

21. State of AI Research and the Importance of Diverse Approaches

[58:42]–[60:40]
- Each AI lab maintains distinct perspectives; OpenAI’s strategy was to align on vision early, allowing deep focus and rapid execution.
- Field remains vibrant, with both “convergent evolution” and disruptive jumps possible.

Notable Quotes & Memorable Moments

On Scaling and Walls:
“Most of the time those walls are just bugs and silly things. And so you can keep going.” —Greg Brockman [09:16]
On the Role of Compute:
“You give us a lot of compute, we will find ways to iterate that make the most of that compute.” —Greg Brockman [06:57]
On GPT-5’s Leap:
“The intellectual leaps these models are capable of assisting humans in is something we’re just starting to see.” —Greg Brockman [21:02]
On Generalization:
“These models are less like a human and more like a humanity… so many personalities embedded, our goal is to elicit that personality.” —Greg Brockman [35:14]
On the Future Economy:
“At some point the dollars that go into these data centers starts to be an abstraction...what is $50 billion? $100 billion?...it’s beyond almost the scale of human comprehension.” —Greg Brockman [55:54]
Advice to Young Engineers (and Himself):
“Just the most exciting time to be in technology… problem availability will grow over time rather than shrink.” —Greg Brockman [67:35]

Timestamps of Key Segments

[00:16] — Greg Brockman on the “maelstrom” of OpenAI’s release week
[01:31] — The origins of OpenAI’s reasoning team; realizing chat capabilities of GPT-4
[04:31] — Human vs. model learning loops; online learning discussion
[06:57] — Compute as the central bottleneck; role of sample efficiency and RL
[10:27] — Scaling compute; pushing against “walls”
[13:20] — Simulation speed and real-world time as future barriers for RL-based agents
[16:34] — DNA neural nets: applying LLMs to the “alien language” of biology
[19:59] — What’s truly “new” in GPT-5; domain leaps like IMO-level proofs
[25:06] — How to extract maximum value from modern models; prompt libraries and agent orchestration
[27:51] — The importance of robust, memory-efficient, and security-tight agent infrastructures
[31:48] — Model spec and the need for explicit, legible intentions for capable models
[35:14] — Model “psychohistory” and personalization
[41:12] — GPT-5’s router approach and the “menagerie” of composable models
[47:36] — Architectural considerations for OpenAI’s open-source models
[49:38] — Hybrid architectures: local and cloud models working in tandem
[50:28] — The strategic value of American open source models
[51:57] — Engineer productivity, AI adoption, and organization design
[55:54] — The scale and abstraction of compute and economic change
[65:23] — Compute as a new societal resource; post-AGI abundance economics
[67:35] — Advice to his younger self: “problem availability grows, not shrinks”

Summary Takeaways

OpenAI’s Road to AGI is marked by relentless scaling, “reasoning” breakthroughs, and a philosophy of pragmatic engineering over theoretical constraint.
GPT-5’s key unlocks are in reasoning, depth, and forming intellectual partnerships with humans—the agent and model leap is real for high-difficulty domains.
Compute is king, both as constraint and future key resource. Architectural and efficiency improvements directly translate to broader access and impact.
Model safety and alignment are growing central; explicit model specs, robust sandboxing, and community-driven values are developing in parallel with capability.
The future of AI engineering will shift toward composing, orchestrating, and customizing these “intelligent instruments”—while human ingenuity moves further up the ladder into product and societal design.

For the full transcript, show notes, and links to further OpenAI developments, check: latent.space

Transcript

[00:05]
Swix
Welcome to the Latent Space podcast. This is Alessio, founder of Kernel Labs, and I'm joined by Swix, founder of Small AI.
[00:11]
Alessio
Hello. Hello. And we are so excited to have Greg Brockman join us. Welcome.
[00:15]
Greg Brockman
Thank you for having us. Excited to be here.
[00:17]
Alessio
You need no introduction, so I was like mentally going to introduce you. I just skipped right to it. Congrats on GPT5, GPT OSS, like all the stuff that's going on in OpenAI lands. We're going to get to all that. It's really good to have you here. How does it feel? Last week was like a whole maelstrom of releases.
[00:34]
Greg Brockman
Wild. It was absolutely wild to get so many things out in one week. But yeah, so we've released our open source models, which are models that we've been working on for some time. I think really pack in a bunch of the advances that we've been making at OpenAI into a very small form factor, very accessible now being used by. There's been millions of downloads of that just over the past couple days. We also released GPT5 again, something we've been working on for a very long time. And so just having these out in the world and really having done that release process is something that I'm just really proud of the team for doing.
[01:05]
Swix
And GPT5 is the first hybrid model, so most people don't get to choose one model and that's a whole lot of drama we will knock on. But you started originally the reasoning team with Ilya at OpenAI, so maybe can you just give a quick history of Reasoning at OpenAI? So you started with just, you know, next token prediction and then at some point you thought reasoning was something important to build. What was the path from there to GPT5 where now it's like kind of hidden from the user?
[01:32]
Greg Brockman
Well, I'd say that after we trained GPT4 we had a model that you could talk to and I remember doing the very first, we did the post training, we actually did an instruction-following post-train on it. So it was really just a data set that was here's a query, here's what the model completion should be. And I remember that we were like, well what happens if you just follow up with another query? And it actually was able to then have a response that took into context the whole previous chain of question and answer. And you realize this thing can do chat, right? It can actually talk to you, it can actually leverage all of this information even though it wasn't trained to do it. And I remember, we had this question, we had a research meeting with a bunch of people, Jakob, Ilya, Wojciech, others. And the question was, why is this not AGI? This model clearly is not AGI, but it's really hard to describe why it's able to answer any question you put in front of it. And, okay, it's not quite reliable. It makes mistakes, it falls off the rails. Okay, that's a real gap. And so what do we need to do to close that gap? And the most obvious thing you need to do is actually have it test out its ideas in the world, actually do reinforcement learning, like try out some hypotheses, get some feedback, and from there become reliable. And this is not a new idea to us. Right. If you rewind to even 2017, we were working on Dota, which was all reinforcement learning. No behavioral cloning from human demonstrations or anything. It was just from a randomly initialized neural net. You'd get these amazingly complicated, very sophisticated, very correct behaviors. And it's like that's the reliability we wanted from our language models. So really, the moment we trained GPT4, we knew that we needed to get to the reasoning paradigm, and it was just a question of how.
So we had like 10 ideas, a bunch of different hypotheses about what might work, and people really set out to go and try to make it be reality. And so it was really the labor of many people at OpenAI across many years. And I think the way that progress in this field works is you need to have conviction on a direction, and the first 10 things you try will fail. And most of the things on that list of 10 did not succeed, but we made one of them work. And I think that that's the real key is that we just keep pushing and pushing and that you get little signs of life and you keep growing from there. And so now Jerry runs our reinforcement learning team and has made really great strides there. There's really amazing infrastructure work. People like Wenda and people from the inference side, people like Felipe. There's many people across OpenAI that all come together to really make this work.
[04:01]
Alessio
Yeah, amazing. I was going over, when you were with me on the AI Engineer conference, you talked about the Turing paper, which you love, and got you started in some ways on your machine learning journey. I think actually he kind of anticipated that the learning machine would be partially online. I think that's one of the questions I always had when reflecting on this journey. From 3, 4 to 5, learning started all offline and all pre-trained and now it's slowly coming online. Do you think that's accurate?
[04:32]
Greg Brockman
Yeah, I think it's a very interesting question. Where does the learning happen? And I think we're still not at the full learning loop that humans do. Right. Which it's also not really clear are humans fully online? Because it's like you go to sleep, there's a lot of sort of backpropagation, so to speak, that happens into your long term memory. So I think that exactly how humans work is not necessarily represented by how our machines work. But we are moving from a world where it's just you go and train once and then you're inferencing a ton to a world where there's actually this loop of you inference and you train on those inferencings. And one thing that Ilya used to say a lot, I think is very, very astute is that when the models are not very capable, that the value of a token that they generate is very low. When the models are extremely capable, the value of a token they generate is extremely high. It's something that's very thoughtful. It's something that's important. And reinforcement learning has this property that you're generating a bunch of data because the model's trying stuff and then you train on that data. And so somehow the model's observations, also normalized by contact with reality or somehow selected by contact with reality, get fed back into the machine. And that is, I think, something that we're starting to get very good at learning from. And the scale required is very different, right? That if you look at pre-training, your 10 examples of something doesn't go anywhere, right? You're talking hundreds of thousands of any little type of behavior. And that's what you learn from, which is totally, totally unlike how humans learn. Again, I think, right. If you think about recapitulate all of evolution and also think about your 20 years worth of development, there's a lot of just observing the world that happens. There are lots of bits of information that kind of flow through your senses.
But with the reinforcement learning paradigm, if you have 10 examples or 100 examples of something, 10 tasks that you're supposed to do, and the model tries a bunch of times that it's actually able to learn from that. And so you really get this leverage out of the human curator creating those tasks and are able to actually get very sophisticated behaviors from the models. And now there's a next step of just having a model that as it goes is learning online. We're not quite doing that yet. But the future is not yet written.
[06:45]
Swix
We had this discussion with Noam Brown about sample efficiency. Do you feel like today the bottleneck is still the human data curator that creates these like great tasks for RL to work, or do you feel like it's still the sample efficiency of the model?
[06:57]
Greg Brockman
Well, the bottleneck is always compute, right? And I mean that in a real way, right. It's just like it's very clear that if you give us a lot of compute that we will find ways to iterate that actually make the most of that compute. We are in a world where right now we now have much more sample efficient algorithms, right, with the RL paradigm. But it does take a lot of compute still, right? It's like that you have like one task a human created or 10 tasks, or 100 tasks or some small number of those. And then you have a model that tries a bunch of times, not just one time, not just 10 times, but 10,000 times to try to accomplish one task. And you select from those and you learn from that. And again, it's like the amount of leverage you get as a human designer there is extremely high. But the amount of compute that you have to pour in in order to make it work grows proportionately.
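
The try-many-times-then-select loop described in this answer can be sketched as a toy simulation. It is an illustration under stated assumptions, not OpenAI's training code: the task names, the per-attempt success rates, and the `Rollout` structure are hypothetical, and a real system would follow the selection step with a gradient update. The point is that a handful of human-written tasks fans out into tens of thousands of rollouts, so compute scales with attempts per task.

```python
import random
from dataclasses import dataclass


@dataclass
class Rollout:
    task: str
    attempt: int
    succeeded: bool


def run_rollouts(task_success_rates, attempts_per_task, seed=0):
    """Simulate many attempts per task; every name here is hypothetical.

    task_success_rates maps a task name to the assumed chance that a
    single attempt passes its checker.
    """
    rng = random.Random(seed)
    rollouts = []
    for task, rate in task_success_rates.items():
        for i in range(attempts_per_task):
            # random() is in [0, 1), so rate=0.0 never passes and rate=1.0 always does.
            rollouts.append(Rollout(task, i, rng.random() < rate))
    return rollouts


def select_for_training(rollouts):
    """Keep only the attempts that passed; these would become training data."""
    return [r for r in rollouts if r.succeeded]


tasks = {"easy": 1.0, "hard": 0.01, "impossible": 0.0}
rollouts = run_rollouts(tasks, attempts_per_task=10_000)
kept = select_for_training(rollouts)
print(len(rollouts), "rollouts generated,", len(kept), "kept")
```

Three tasks at 10,000 attempts each already mean 30,000 rollouts, which is the proportional compute growth mentioned above.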
[07:46]
Alessio
I would say one way to expend more compute in the learning process. Alan Turing actually foresaw a lot of this. He had this concept of supercritical learning instead of subcritical learning, meaning we present learnings to machines or teach things to machines. They learn just the immediate thing that we just taught. But supercritical means you also think through the second and third and fourth order effects of whatever you just learned, like to update the rest of everything else that you know. So what are the creative ways in which we spend more compute, right? Like if we had 10x more compute or 1000x more compute, where does it go?
[08:16]
Greg Brockman
I'll just say we will find ways to utilize it.
[08:19]
Swix
Please give us.
[08:22]
Greg Brockman
But I mean it kind of seriously, right? The way that this works, like if you rewind to something like Dota. We set out to develop new reinforcement learning algorithms because it was very clear to everyone that reinforcement learning, the algorithms that existed at the time, did not scale. Everyone knew it. And I remember Jakob and Shimon saying, why do we believe that? Has anyone actually tested it? And no one had actually really tried to scale up just plain old fashioned PPO. And so they're like, well, that's the baseline. We got to do it. And I remember you come back to the office every week, they double the number of cores, and suddenly the agent's TrueSkill was going up and to the right. And it's like, okay, you just got to keep pushing it until you hit the wall. And clearly we'll hit the wall and then we can go and do the actual interesting stuff. And we never hit the wall. And you realize that actually the journey of that scaling that is the interesting stuff of really doing the engineering. And of course you have bugs and those bugs cause a wall, but you fix the bug. You have different issues with how your neural net's initialized or the scale invariance or whatever the issues are, but those are not the fundamentals of the algorithm, of the science. And so I think that's kind of the world that we're in is one where it's like we will push on every dimension and maybe we hit a wall. Most of the time those walls are just bugs and silly things. And so you can keep going. Sometimes the ROI for fixing those is really hard. So it's like, it's not really worth it because you have a different dimension. Right. Do you want to push the model to be larger and do more pre training compute or do you want to do more RL and so push more compute to the actual test time? And there's all sorts of dimensions that you can put compute into. And in some ways I think of compute as this. Like, you know, we're doing this refining process.
Ultimately start with energy turns into compute, turns into intelligence. And it's almost crystallizing that compute into potential energy that can be converted into the model doing something useful. It's a really beautiful thing. Right? It's like the compute as this fundamental driver, this fundamental fuel of intelligence. And it sort of shapes a neural net, it sort of outputs a program. And of course the nice thing about that program is you can run it many, many times. Even though you pour in all this compute, in fact, you actually have this amortization that you're going to use it far more times than the amount of effort you put into creating it once. And so it's just like a, it's a beautiful paradigm.
[10:28]
Swix
Yeah, you're kind of turning kinetic energy into potential energy in the model. And do you feel like the energy that it's already in this model we can then turn back into kinetic to do it all in every other domain? Because we got the IMO gold. I mean we in the. You, you guys.
[10:44]
Greg Brockman
I think it's a.
[10:45]
Swix
For everybody. Do you feel like those same techniques and the same base models can then get us to the IMO gold equivalent in every other domain, if we just scale the compute? Or do you feel like there's still some work to do?
[10:58]
Greg Brockman
Well, we have pretty good evidence on things like the IMO models actually also getting us a gold in IOI which is the same. Yeah, I mean, I think we did like, I think we talked about the harness. There's a little bit of difference in the harness, but like the harness is not the gold literally. Right. It's like the actual underlying models and there's no training there that we did specifically. This ended up being just a side project of a few people who were like, oh, we may as well do IOI. Right. And it's just a wild fact to me because that used to be something that would be a total grand challenge. Many, many people working on. And the core IMO team at OpenAI was actually three people. Right. Wasn't this massive effort. And so you realize that there's maybe some specialization required for some of these domains, right? Maybe some amount of additional work, some amount of go gather a data set. But fundamentally we have this general purpose learning technology and that learning to solve hard problems is actually a very transferable skill. Learning how to solve hard math problems and write proofs turns out to actually transfer to writing programming competition problems. Now, if you've never run a physics experiment, if you've never actually gone and tried to mix together some chemicals or something, you're probably not going to be magically good at those things. And so that there is something about the limitations of generalization that you do need to actually have some real world experience and try it out. But these models, they go almost unreasonably far already. And we see this all the time where we have wet lab scientists who took models like O3, ask it for hypotheses of here's an experimental setup. What should I do? They have five ideas. They tried these five ideas out. Four of them don't work, but one of them does.
And the kind of feedback we were getting on O3 was resulting work is something that could be published in a mid tier journal, not the top tier journal, but a mid tier journal would be kind of the work you'd expect from some sort of third year, fourth year PhD student. And again, it's just a wild fact. That's where we are with O3. And we see exactly how to improve O3 on all dimensions. And it requires compute, it requires a lot of work, it requires getting the task, it requires a lot of human intellectual love and labor. And Time and really pouring our heart and soul into it. But the result to your point, it's like we produce this thing that has all this potential energy within it. And then the amazing thing is that you don't release that potential energy once. It's a checkpoint that you can use many, many times across all of these tasks. And that is something that I think really can uplift all of humanity.
[13:20]
Alessio
That's so inspiring. I wanted to backtrack on two things. One, about the wall. One thing I was trying to get into this debate with Domon was I think there is a wall in terms of wall clock time because time has to pass. The problem with RL interacting with environments and simulation is sure you can speed up the simulations faster than real time. At some point you have to match wall clock time. So you can see us converging towards the pace of iterations towards wall clock time. In terms of getting closer and closer to modeling the real world. I don't know if you have any thoughts on tackling that. Obviously we're not there yet, so we don't have to worry about it.
[13:57]
Greg Brockman
Yeah, I think this is a pretty fundamental barrier. Right. And of course the models have very non human affordances. You can run many copies of them and so you can scale out even if you can't decrease the latency. And it's also very interesting to think about where the compute goes. Right. Because we're going to move from a world where most of the compute is training the model. As we've deployed these models, more, more of the compute goes to inferencing them and actually using them. But then if you think about, well, you're going to have these models that are going to be interacting with the real world a lot and so they should probably think a lot about every single action. So you might end up with tons of compute spent per real world interaction. And so it really shifts around where you'd expect the compute to actually be expended. And I think that really having good harnesses that are very efficient. Right. Do you think about things like if I have been taking a bunch of steps in some rollout in the real world, how do I checkpoint that? And if you have a system that you need to restart it and it's going to forget all of its current state, that's probably pretty bad. And so I think that there's just some, something very different about the digital world where everything can be perfectly observed and checkpointed and preserved as opposed to reality that's much more messy and complicated. And I think it's not a bad thing. Right. I think that we've seen agents with things like Dota that are able to operate in very complicated, very messy environments. So the algorithms are capable of it. And by the way, Dota was like a 300 million parameter neural net. Tiny, tiny little insect brain. Right now we're starting to scale up to things that are much more comparable to human scale in terms of number of parameters, maybe in terms of number compute, we're not necessarily quite there. 
I think you could look at the math in different ways, but fundamentally we are making progress towards the real goal. And if you think about what an AGI should be, it should be something that is capable of interacting with the real world in ways that are very productive.
[15:51]
Alessio
Yeah, back of the envelope. I think that the numbers I have in my head, you can correct me if I'm orders of magnitude off, but it's something like humans have 100 trillion neurons. We're in the multiple low double digit to high single digit range for GPT 4, 4.5 and 5. But we're not confirming that, but we're scaling there.
[16:10]
Greg Brockman
Yeah, I'd say 100 T synapses, which kind of corresponds to the weights of the neural net. And so there's some sort of equivalence there. Yeah. And so we're starting to get to the right numbers. Let me just say that.
[16:21]
Alessio
And then just on a biological basis, this is an opportunity. I didn't get to ask you last time on what you learned from Arc Institute. You had a sabbatical there. I'm curious if that informs anything that you do at OpenAI now.
[16:34]
Greg Brockman
Well, the thing I found most remarkable about working on DNA neural nets is that they're exactly the same. It's just you replace human language.
[16:43]
Alessio
It's even like a simpler vocab. It is, yeah.
[16:45]
Greg Brockman
You've got four letters.
[16:48]
Alessio
But don't you tokenize at a higher level?
[16:50]
Greg Brockman
Yeah, I mean, you can, but actually the way that we approached it was we just did character level. Character level?
[16:55]
Alessio
No way.
[16:55]
Greg Brockman
Yeah. Why not?
[16:56]
Alessio
Well, I guess there's no reason.
[16:59]
Greg Brockman
I don't know.
[17:00]
Alessio
There's only four.
[17:01]
Greg Brockman
Right. And this, to me, is, I think, the core. One of the interesting things about human language is we understand the semantics. Right? We kind of understand what it means, what the structure is. It's very easy for us to observe. We kind of have a sense of, when you look at a tokenization scheme, you have a sense of, did you capture all of the words in a reasonable way? And all this stuff. Biology, it's an alien language. And the thing that's very interesting is that for humans it's an alien language. But if you look at a neural net, why should human language be any more natural to a neural net than biological language? And the answer is it's not, right?
[17:36]
Alessio
That actually these things are literally the same hardware.
[17:39]
Greg Brockman
Exactly. And so one of the amazing hypotheses is that it's like, well, these neural nets, they can learn human language just fine and so they ought to be able to learn biological language just fine. And we really see the same kinds of results. Right. It's like I'd say that maybe the neural net we produced is a 40B neural net trained on 13 trillion base pairs or something like that. The results to me felt like GPT1 maybe starting to be GPT2 level. It's accessible and applicable to downstream tasks across a wide range of biological applications. Not yet a GPT3 or GPT4, not a GPT5 for sure. Right. We're not able to solve super hard problems in these domains just yet, but we've got compute, we've got the right techniques and algorithms. Now we need to scale, we need to think about long context. There's different ways that the biological systems stress the models relative to language sequences. Like a language sequence of a billion tokens doesn't really exist, but it does in your DNA. Right. You've got like 4 billion base pairs or something like that. And so you kind of have some sort of different emphasis, but fundamentally it's the same problem you need to solve.
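The character-level approach mentioned above, one token per base with a four-letter vocabulary, can be sketched as follows. This is a minimal illustration, not the tokenizer used for the model discussed here: the id assignments and the extra catch-all id for ambiguous bases are assumptions.

```python
# Hypothetical character-level DNA tokenizer: one token per base.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3}
UNKNOWN_ID = 4  # assumed catch-all for ambiguous bases such as "N"
ID_TO_BASE = {token_id: base for base, token_id in VOCAB.items()}


def encode(sequence: str) -> list[int]:
    """Map each base to a single token id; nothing is merged into larger tokens."""
    return [VOCAB.get(base, UNKNOWN_ID) for base in sequence.upper()]


def decode(token_ids: list[int]) -> str:
    """Invert encode(); ids outside the vocabulary are rendered as "N"."""
    return "".join(ID_TO_BASE.get(token_id, "N") for token_id in token_ids)


if __name__ == "__main__":
    ids = encode("GATTACA")
    print(ids)          # one id per base
    print(decode(ids))  # round-trips to the original string
```

With this scheme the sequence length in tokens equals the number of base pairs, which is why the long-context point above matters: a genome-scale input is billions of tokens long.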
[18:50]
Alessio
Is there an application that you're most excited about, like drug discovery or. Obviously I think everyone goes to drug discovery, but maybe some intermediate thing before that that is reachable and very impactful.
[19:00]
Greg Brockman
Well, I mean, at a personal level, so my wife, we've talked about this, you know, I've talked about this publicly before, has a genetic condition called Ehlers-Danlos syndrome. It's something that until very recently, I think we're starting to see genetic markers for it, but it's been kind of unknown exactly what causes it, where it comes from. And that is something where, if you have better tools for understanding biology, you should be able to identify the markers for lots of different diseases. And so that's just like one example of the kinds of applications of the promise that exists within these neural nets.
[19:33]
Swix
How would you characterize the beginning of the GPT5 era? If I think about 3, 4, 5 as the major versions, I think 3 is very text based, kind of like RLHF really getting started. Four is multimodality and all these different low latency, long thinking with o3. What's going to be the 5 flagship thing? Obviously the year of agents, right? That's the meme. But is there something else that comes to mind that people should think about? Okay, with five, now we unlock X.
[20:00]
Greg Brockman
Yeah, I think it's smart. I think that the intelligence of these models is starting to be just almost indescribable, right? It's like there's still limitations, there's still ways in which they fail. But it really is the case that for extremely hard domains, like look at the IMO results, right? So you can take a model that's been trained on this reasoning paradigm and it's able to write proofs that are at the level of the best humans, right? And it's like in this specific domain there's limitations, et cetera, et cetera. We haven't proven like an unproven theorem, any of that stuff, but it's real. It's undeniable at this point that these models are able to perform great intellectual feats. And I think that's new. GPT4 I think was much more. It was kind of capable and commercially useful across a wide range of applications, but the ideas that it produced were not very deep. The problems it would solve it was not very reliable at. And I remember for GPT3, actually trying to teach it how to do even basic stuff, right? That like we kind of realized, hey, you could do this few-shot prompting, so you kind of show a few examples of something and then it'll basically kind of do that task. And so I was like, okay, can you just teach this thing to sort a list? And I gave it like seven numbers to sort. It didn't sort it. I was like, okay. Then I tried to write a whole script of like, I'm a teacher teaching you how to sort numbers. Here's an example of sorting two numbers and then three numbers and whatever. And I'd be like, okay, now here's five numbers and total flop. If you ask GPT5 that, and I've not even tried, by the way, asking GPT5 to sort a list of five, you know, arbitrary numbers, but I am like certain it will do a perfect job of it out of the box. No problem. By the way, it does have access to a Python tool as well. But the point is that the intellectual leaps that these models are capable of assisting humans in is something that we're just starting to see. 
We started to see it with o3 and you can see professional mathematicians starting to kick the tires on GPT5. We've seen physicists starting to kick the tires on GPT5 and say that, hey, this thing was able to get. This model was able to re-derive an insight that took me many months' worth of research to produce. And that's the kind of thing where it's like, you realize this will speed you up so fast. Right. I remember doing my own math research back in high school and at the beginning of college, and I'd spend just like so long just trying to manipulate these objects in my head and think about connections between things. And if I had a partner that I could actually talk to about this, who would actually spend the time, deeply understand what I'm thinking about and produce new insights off of what I'm suggesting, that would have just sped me up so much. It would have been so much more fun. Right? Because you don't just kind of get caught in this loop of just sort of thinking about it off on your own and thinking you're like, wait, I already thought this thought two weeks ago. And so I think that there's just something new about pushing forward the intellectual frontier together, as a partner with GPT5.
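The few-shot prompting experiment described above (show a handful of solved examples, then pose a new one) can be sketched as plain string construction. The prompt wording and the function name are hypothetical; this only illustrates the shape of such a prompt, not the script that was actually tried on GPT3.

```python
def build_few_shot_prompt(examples: list[list[int]], query: list[int]) -> str:
    """Build a sorting prompt: solved examples first, then the unsolved query."""
    lines = ["Sort each list of numbers in ascending order.", ""]
    for unsorted_numbers in examples:
        lines.append(f"Input: {unsorted_numbers}")
        lines.append(f"Output: {sorted(unsorted_numbers)}")
        lines.append("")
    # The final example is left open for the model to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_few_shot_prompt([[2, 1], [3, 1, 2]], [5, 3, 9, 1, 4]))
```

The resulting text would be sent to a completion-style model, which is expected to continue after the last `Output:`.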
[22:52]
Swix
Do you think people are limited by the difficulty of the problems that they work on? I think for me, in Cursor and in Codex, it feels clear that the model is better when I give it hard tasks. I feel like a lot of people put screenshots on X and it's like, oh, GPT5 is not that much better. It's like, well, the question is not that hard. What gave you such confidence when you called it the best coding model in the world? Obviously, you're one of the best coders in the world, so game recognizes game. But for people, how should they really think about evaluating these models?
[23:21]
Greg Brockman
Yeah, so there definitely is a saturation on certain tasks. Right. If you're just going to chit chat and say, hello, how are you? There's only so many things you can say. If you're going to say, here's the Riemann hypothesis, solution, please. Okay, yeah, there's like a broad range of intelligence that will be desirable there. And of course, most tasks are somewhere in between the two of these. And I think what we've observed is that we've seen GPT5 be able to solve intellectual problems, sort of tasks that require deep intelligence, much better than any other model that we've tested. The second thing we did was we really spent a long time seeing how people are using it in interactive coding applications, and just taking a ton of feedback and feeding that back into our training. And that was something we didn't try as hard in the past. Right. For something like o3, we really trained it with tasks that we'd set up once and the model, we'd see it go up and to the right on all of our metrics. It'd be great at Codeforces, you know, competitive programming competitions, which is again very exciting, but it's not reflective of how you actually program. You actually program in a much more messy way, right. That you have some sort of repo, that has some sort of local state and that has different abstractions and just like different versions of different libraries. And that sort of diversity isn't something that magically arises from a very structured: here's this one specific task, 10 specific tasks you need to accomplish. And so a lot of what we've been focusing on is saying not just how do we push the intelligence, although that is always going to be the core, but also how do we connect the intelligence to real world applications, so that it really got to experience being pushed out of its comfort zone, out of its ivory tower, and actually be able to see the messy reality and diversity of the real world.
[25:06]
Swix
Yeah. What are suggestions on a more practical level that you have on getting the potential energy out of these models? So part of it is adding, you know, the linter, the type checker, the task, to have it self-loop. Any other meta that developers should think about? How do you use the models?
[25:21]
Greg Brockman
Well, the number one thing that I've observed is that there is a real skill in extracting the most from these models and it requires this tenacity of really trying to almost understand the shape of the model's skills and weaknesses. And so you test it, right? You test it with something small, you get a little feedback, you test a little bit higher, try to give it some bigger tasks, try to see if it can work in a certain way. And I think that people usually have their library of different prompts, right? So I definitely have my library of prompts that I've built up since the GPT4 days. Like I remember in advance of GPT4, starting to gather up a couple of like, okay, I wonder if it'll be able to do this. You know, you have some sort of query that, importantly, you want queries that could have a range of different answers that don't have any one specific right thing. And so for example, on creative writing, I'd like to ask for like a mashup of Lord of the Rings and startups, right? Just like try to push together two different topics and see what you get. In terms of actually testing the model and pushing it, I think that I do a lot of trying to think about, okay, like how do you first of all break up tasks and have something that's self-contained that you can let the model run with? Because you don't want to just have one instance of the model operating. You want to have multiple. Right. You want to be a manager of, not an agent, but of agents. Right. And so you need to first of all think about how your code base is structured, but then actually go and try to push the model to say, can you actually operate on, you know, these multiple different pieces of your code base? I think that people love doing front end vibe testing. GPT5 is very good at front end, turns out. But of course that's not what most developers spend their time doing. And so it's important not to overfit to that. 
But I think that maybe just getting a feel for the model and kind of starting to become in tune with its strengths and weaknesses and viewing it almost as an extension of yourself. And, you know, often another thing I'll do is just be kicking off tasks to the model that are sort of not on the critical path while I'm thinking about some super hard thing that the model, for whatever reason, I don't want it operating on. And so I'm just constantly getting information back on. Just like, okay, was it able to do a thing? Or it's just like low risk if it like makes a mistake. Because I don't feel like I had to sit around waiting for five minutes and then, you know, sort of get no return.
[27:31]
Alessio
You've always mentioned, I think, that the roadmap for Codex and OpenAI's coding capabilities, since we're there, is that the background sort of SWE agents kind of merge with the in-IDE agents. How's your thinking evolved there? Is it just as simple as the IDE can call the background APIs and the background APIs can sort of export to the IDE? Or what's a deeper connection in that?
[27:51]
Greg Brockman
I tend to think about AI productization by analogy to a coworker. What do you want out of a coworker who's a great programmer? Right.
[28:00]
Alessio
You don't slack them.
[28:01]
Greg Brockman
Yeah, exactly. So you want to Slack them, but sometimes you're like, hey, I kind of need help with this thing. Can you come over and look over my shoulder? Right? And like, hey, could you, you know, take the keyboard? Exactly. So you want the pair form factor. You also want the remote async form factor. And you want it to be one entity that has knowledge and memory across all of this. You don't want it to be a junior programmer who shows up every day being like, okay, I forgot everything. Can you remind me how to SSH into the whatever. Right? So I think all of that has to happen, right? That you need AIs that have access to your infrastructure in a trustworthy way. Right? A way that you can audit. One thing that is different about these models is that they're fine being micromanaged. Turns out humans don't like that very much. Right. If you look at every single command that they're running and you demand reports on everything they did, probably you're not going to retain that person. But the models are perfectly happy to. And so that's an affordance that's well worth thinking about and changing the interfaces to take maximum advantage of. At the same time, yeah, you really want the seamless blending between a model that's able to do a bunch of work on its remote machine, doesn't mess up my local state, fully sandboxed, fully observable, and then sometimes can be like, okay, I'm ready to run something locally. And that depending on what that is and depending on how sandboxable it is, that you can do one-off approvals, you could give it full delegated access. And I think that having the human be in control of this observability and to be managing this team, an agent that has just different surfaces. The identity of the agent being something that runs locally versus the agent being something that runs remotely, to me, that's the wrong question. 
It's really the agent should be this model that's executing and then requesting to run things in a remote sandbox or locally or maybe multiple sandboxes, or maybe it's running on your computer and my computer, there's no reason that it has to be local to any of these things.
[29:57]
Alessio
Yeah, software agents, you can just sort of seamlessly and fluidly move around. You mentioning approvals gives me a chance to spotlight my friend Fouad, who is helping to start the agent robustness team that was also launched at AI Engineer. What's that? What's OpenAI's interest in that?
[30:12]
Greg Brockman
The way we think about agent robustness is through defense in depth. There's a layer of the model itself. We publish techniques like instruction hierarchy. And so with instruction hierarchy you sort of indicate that, hey, this message is from the system, this message is from the developer, this message is from the user, and that they should be trusted in that order. And so that way the model can know, for something that says ignore previous instructions from a user, I'm not going to follow that. Right. And so I think that having like, it's almost like thinking about how we prevent SQL injections, right? Having systems at a low level that are robust against these attempted exploits is very important. But that's not where you stop. You want multiple layers of thinking about the system controls, right? If a model is sandboxed and isn't actually able to execute something or access a specific piece of data, then you have full guarantees around what's possible. And there's various levels in between of approach that we take. And so I think that a lot of what is the frontier as these agents become more embedded in our lives and are trusted with more responsibility is also increasing the safety and security of them in lockstep.
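The trust ordering described above (system, then developer, then user) can be illustrated with a toy resolver. To be clear about the assumptions: in the real technique this ordering is a behavior the model is trained to follow, not a lookup table. The `Message` fields, the `setting` names and the `resolve` function below are all hypothetical and only show the precedence rule.

```python
from dataclasses import dataclass

# Trust order as described in the conversation: lower rank means more trusted.
TRUST_RANK = {"system": 0, "developer": 1, "user": 2}


@dataclass(frozen=True)
class Message:
    role: str     # "system", "developer", or "user"
    setting: str  # hypothetical name of the behavior being configured
    value: str    # what this message asks for


def resolve(messages: list[Message]) -> dict[str, str]:
    """For each setting, keep the value from the most trusted role that set it."""
    winners: dict[str, Message] = {}
    for msg in messages:
        current = winners.get(msg.setting)
        # A message only overrides an earlier one if it is strictly more trusted.
        if current is None or TRUST_RANK[msg.role] < TRUST_RANK[current.role]:
            winners[msg.setting] = msg
    return {name: msg.value for name, msg in winners.items()}


if __name__ == "__main__":
    conversation = [
        Message("developer", "reveal_system_prompt", "never"),
        Message("user", "reveal_system_prompt", "always"),  # attempted override
        Message("user", "tone", "casual"),
    ]
    print(resolve(conversation))
```

Here the user's attempt to override the developer's rule is ignored, while the user's uncontested preference is kept. As the speaker notes, this model-level layer is only one part of defense in depth; sandboxing gives harder guarantees.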
[31:20]
Alessio
There's an analogy that I make to the Linux kernel OS rings as well, and it's really interesting that we're basically kind of building this into the LLM as a concept of different layers of security. And also the other thing I was very happy to see was that I invited a talk on the model spec for AI Engineer and that was the most viewed talk that we've ever had, which is hard: to make safety and reliability sexy.
[31:49]
Greg Brockman
I think the model spec is a perfect example of when the models are very capable, you start to really care about what they're going to do. That becomes the most important question. And the model spec is an example where we've made it very legible to the outside world what our intention is for this model to do. And it doesn't mean that we always produce a model that is capable of following that. But it's a North Star, right? It's something that really sets this is the intention and anything that deviates from that is not through our explicit effort. It's anti to our explicit effort. And I think that the gap between the spec and the actual behavior is shrinking very very constantly. The thing that's very interesting is almost like values, right? It's really thinking deeply about well, what should a model do if you ask it a controversial question, right? If you say I think that the world is flat or whatever. Like is it supposed to say yes, it's flat or are you supposed to be like well like here's what science says and honestly these things are subtle, right? That it's not really clear what the right thing is. Just on two minutes of thinking about it. But if you read the spec, you can actually really see the thoughtfulness that has gone into it. And it's not the final answer. Right. It's something we want feedback on. It's something that we want to produce collectively as a community.
[32:56]
Swix
I know we want to talk about open source next too, but I had a more esoteric question. I was listening to your old Lex Fridman interview and you kind of mentioned Foundation back in the day. Yeah, Foundation by Asimov. It made me think about. We had Bret Taylor on the podcast and we talked about how certain languages have inherent capabilities, like Rust is memory safe. And so that just happens. Do you see almost like a psychohistory of LLMs and software engineering where it's like, hey, these models, I can predict the way software is going to look. Like everything is going to be blue and purple gradients. Right? We're kind of seeing that today. What else are these models really driving us towards? And is there a way that we can change that?
[33:37]
Greg Brockman
Well, there's definitely a psychohistory of them because to some extent these models are a product of psychohistory. Right? It's like these models have been trained on observing human thought. Right? Effectively, that's what you can think of. Take public data, learn on that and just observe. The point is to understand the rules that govern a data set. What are the underlying rules that generate the data in the first place? And that's kind of what these models grew up on. It's almost like watching a bunch of TV as an alien trying to figure out what humans are all about. And then you have this reinforcement learning phase where they actually got to try things out. And they're given positive and negative feedback, depending on how much that aligns with what the human wants. And now we put them in reality and say, okay, now try stuff, and here's a new task you've never seen before. And it uses all of that previous history to decide what to do. As an aside, it's not clear sometimes the biological analogy to humans, it's very easy to overstate it, but it's also easy to understate it. I think it is at least a useful template to think about. To some extent, that's how humans work too, right? It's like you have some sort of prehistory encoded into your DNA. You have your life experience, you have your parents who provided positive and negative rewards, and you have your experience in just trying things out in reality. And now you have to go out and use that knowledge and what do you do? And how do you predict what a person's going to do? And actually, you can predict a lot of what a person's going to do. It turns out you have a pretty good model of other people and how they'll react to something, if they'll like it, if they won't like it. And a lot of that gets baked into. Knowing someone's values tells you a lot about what they're likely to do and how they're likely to behave. And I think that for models, the future is not predetermined. 
It's not like the algorithm itself says that the model's going to have to prefer purple gradients or something. Right. But there's something in this whole process that does produce that preference. And I think one of the opportunities with models, one thing that Alec likes to say is that these models are less like a human and more like a humanity. Right. That there's so many personalities embedded within them. It's almost every single personality is in there. And our goal is to elicit that personality. And some of this post training work, some of this reinforcement learning work, almost narrows down the space of those personalities to just the ones that are desirable. And I think that what that means is that we have an opportunity to produce models that operate according to our values. Right? If you don't just want the purple gradient one, you want the blue gradient, the green gradient, whatever, you can have all that in a single model, it's fine. And GPT5 itself is extremely good at instruction following. And so it actually is the most personalizable model that we've ever produced. You can have it operate according to whatever you prefer just by saying it, just by providing that instruction.
[36:25]
Alessio
The analogy I have is the Borg. There's this collective intelligence. There's always this debate between Star Wars people and Star Trek people. Who has a better model of the future? And I think it's like Star Trek.
[36:35]
Swix
Well, Sam picked, he tweeted the Death Star. So you're on the Star Wars.
[36:41]
Alessio
What was that?
[36:41]
Greg Brockman
What was that? You'd have to ask them. One thing I think is very interesting about these models is that we have all these arenas now, like LMArena and others, where you can actually see human preferences on top of how the models operate. And you almost have this layering of: the models were trained on human preferences, now they're doing stuff and being judged by humans. And then we kind of use that to feed back on. Huh? Okay, yeah, maybe the purple is a little bit too much and we should change it there. And so it's almost this co-evolution: the models move in a certain direction, humans have a certain set of preferences, so then we move them in a different direction, and then you kind of keep iterating to get something that's more and more useful and aligned with human values.
[37:22]
Swix
How do you do that when the RL rewards are kind of tied to things that the humans maybe don't prefer? Like in my experience it's been like try catch. Like the models like to write try catch so that it doesn't fail. Do we need just a lot of preference data that shows them they shouldn't do that? Is there something in the RL environments that we're going to change to make that less desirable? Like I'm trying to figure out where we go from here.
[37:43]
Greg Brockman
Yeah, I think that the way that you decide or the way that you figure out where interventions go is very multifaceted and it's very specific to the behavior. There are some things like the model's knowledge of different libraries and things like that, that's kind of baked in from the early days. But you can also teach the model that hey, don't rely on your previous knowledge. Go and look up the most up to date docs. And that's something you can kind of put at a higher level. And then something like overusing try catch, that's something you can actually prompt the model for. Right. And that's something where when we train it in reinforcement learning, you can provide rewards saying like, ah, don't go in this direction. And the beautiful thing about these models is, it feels like, okay, there's probably a long list of different preferences and different styles and things like that you're going to have to give it feedback on during training if that's the way you want to go. But these models generalize. The algorithms that we have generalize. And that's the beauty of deep learning. That is the true magic. Right. It's a very easy. We kind of have this whole stack now that's built up around the core of deep learning. It's like all these ways of orchestrating models and how you get feedback and all of these things, the data, et cetera, et cetera. The core magic of deep learning is its ability to generalize. And in some ways the generalization is weaker than you'd like. But I think that the same is true for these models. It's really trying to think about, in order to get them to be able to operate according to different preferences and values, we just need to show that to them during training. And they are able to sort of generalize to different preferences and values that we didn't actually train against. And that's something that we've seen very consistently across different model generations.
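For readers unfamiliar with the "try catch" habit being discussed, the contrast looks roughly like this. The two functions are hypothetical examples, not output from any model: the first wraps everything in a blanket handler so the code never visibly fails, the second lets malformed input raise so the caller can see the problem.

```python
import json


def load_config_swallowing(text: str) -> dict:
    """The criticized pattern: catch everything and return a silent default."""
    try:
        return json.loads(text)
    except Exception:
        # The error is hidden; callers cannot tell bad input from empty input.
        return {}


def load_config_strict(text: str) -> dict:
    """The alternative: let malformed input raise json.JSONDecodeError."""
    return json.loads(text)
```

The first version passes any check that only asks "did it crash?", which is one way a reward signal could end up favoring it even when a human reviewer would not.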
[39:13]
Alessio
I was just envisioning this meme of like, my model doesn't generalize and we'll just make the whole world your distribution and that's how you solve everything.
[39:21]
Greg Brockman
Done.
[39:21]
Alessio
Done. Exactly. As simple as that. You just have to build the Dyson sphere along the way. One thing I wanted to touch on for, I think, the last couple of topics on GPT5 before we move to OSS. You've acknowledged that there's a router, which is really cool. I was also listening to your podcast with John Collison on Cheeky Pint, which is a really fun format, and you told a story of the Dota side that I don't think I've heard before about the beta model versus the main model and stitching it together. Is that a similar insight for GPT5's router, where you have a reasoning model and a non-reasoning model and then you just stitch them together?
[39:57]
Greg Brockman
To some extent, yes. In the multiple models. And do you put some sort of router on top of them? That specific one was for a very specific reason, which is that we had a deficiency on the first half of the game because it kept losing, right? Exactly. So there was part of the game that this specific model didn't do a good job of. There's a part of it that it did. And these models, the behavior, the domain they were operating in was simple enough. It was very easy for us to say here when you want to use one model versus the other. And to some extent what we have with GPT5 is no different. We have a reasoning model that we know is good for applications that require this intelligence. But you're okay waiting a little bit longer. We have a non reasoning model that is great for applications where you want the answer fast. Still a good answer, right? But not like deeply thought through that might have a lot of tricks to it. And then you just kind of want to put an if statement that says which of these it should be. And then sometimes too it's like, you know, if someone's run out of their credits that you want to fall back to a different model and all these things and not pushing that burden to the user is actually a really nice thing. And by the way, I do want to say model switchers are not necessarily the future. Right? They are the present. Like having a fully integrated model that just does the right thing feels Very preferable in many ways. The flip side though, is that I think that the evidence has been away from having the final form factor, the AGI itself being a single model, but instead thinking about this menagerie of models that have different strengths and weaknesses. And I think that's like a very interesting finding of the past couple years. Right. 
Just a direction of like, it's much easier to have a small, fast model that's less capable, but can just do a lot more, you can generate a lot more tokens from it, coupled with a much more expensive reasoning model. And if you combine those two things, you kind of get adaptive compute. We haven't really cracked how you do adaptive compute within the architecture, but doing it within the orchestration of a system is very straightforward. And so I think you get a lot of power out of the fact that these models are composable in this way.
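The "if statement" framing above can be written out as a toy router. Everything here is an assumption made for illustration: the signal names, the model labels and the order of the checks are hypothetical and do not describe how GPT5's actual router decides.

```python
from dataclasses import dataclass


@dataclass
class Request:
    needs_deep_reasoning: bool  # hypothetical signal, e.g. from a complexity classifier
    wants_fast_answer: bool     # hypothetical explicit intent from the user
    over_rate_limit: bool       # hypothetical usage or credit check


def route(request: Request) -> str:
    """Pick a model label for a request; a stand-in for a learned router."""
    if request.over_rate_limit:
        # Out of credits: fall back rather than push the choice to the user.
        return "fallback-model"
    if request.needs_deep_reasoning and not request.wants_fast_answer:
        # Worth waiting longer for a more carefully thought-through answer.
        return "reasoning-model"
    # Default: a quick answer from the cheaper non-reasoning model.
    return "fast-model"


if __name__ == "__main__":
    print(route(Request(needs_deep_reasoning=True, wants_fast_answer=False, over_rate_limit=False)))
```

Composing two models behind a switch like this is the orchestration-level form of adaptive compute described above: hard requests get the expensive path, everything else gets the cheap one.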
[41:58]
Alessio
Yeah, I want to give credit: whoever did the model card was amazing. They even provided the big parameters to the if statement: conversation type, complexity, tool needs, explicit intent, and usage rate limit, which is kind of interesting. Any one of those you want to comment on in particular? That was interesting for debate?
[42:16]
Greg Brockman
No, I mean, I think honestly all of it is fairly what you'd expect. And I think that the core message in my mind is that at OpenAI, there are many things we've done right. Naming is not one of those. Having a simple surface for users to understand how to use it, not necessarily one. Right. If you look at all the different models that we've had, how are you supposed to know which one to use? I remember my wife was using 4o at one point. I was like, no, you need to use o3. And she's like, wait, but why? The number is smaller than 4o.
[42:49]
Alessio
Well, ship o4, then you have 4o and o4.
[42:51]
Greg Brockman
There you go. So, yeah, so, okay, we clearly needed to do a reset, right? A reset on complexity. And I think that us internalizing that complexity rather than pushing it to the user, that is really important. And so I think this is a first step and I think we've heard loud and clear from the community about the places where they weren't ready. Right. That we were not delivering on that simplicity for people. Right. That it should just be. It's always better to go with our choice of it rather than manual selection. And we're not quite there yet. I think that we can make the progress, but I think that ultimately our goal should be to both make sure that power users are able to have the kind of control and consistency that they're looking for, while also not forcing the broad base of people who don't want to have to think about the 4o, o3, all that stuff to have to go to that level of detail.
[43:41]
Alessio
Yeah, awesome. Pricing question. We talked about that. GPT5 pricing is aggressive and very competitive, even compared to Gemini. One thing I was surprised to learn from the meetup that we had the other day was that GPT5 pricing can go much cheaper. What order of magnitude are we talking? How much of that is just getting better infra like Stargate?
[43:59]
Greg Brockman
I think that the answer for these things is always that, okay, if you look at the history of our pricing, we have very consistently cut prices by like, I don't know the exact factor, but let's say like 10x per year.
[44:11]
Alessio
I'd say more aggressive than that.
[44:12]
Greg Brockman
Yeah, probably more aggressive than that, which is a crazy thing. And you can see it with o3, I think we did an 80% price cut and actually the usage grew such that it was like, I think in the revenue it either was neutral or positive. And it just shows you that I think there's this cost curve. The demand is extremely steep. And so it's like if you just make it more accessible and available to people, they will use way more of it. And I think that's very aligned with our mission. Right. Our goal is to ensure that AGI benefits all of humanity. Part of that is making sure that this technology is broadly distributed, that lots of people are using AI and using it to apply to things in their life and their work. And one of the things that helps us get there is by having more efficient inference, having cheaper models, all of these things. Now what unlocks it partly is having just more compute. Right now we are extremely compute limited. And so I think that if we were to cut prices a lot, it wouldn't actually increase the amount that this model's used. We also have a lot of efficiencies to gain and that's something where our teams are always working super hard to get to the next level of inference efficiency. Some of this is about improving the model architecture itself, right? That there's lots of architectural decisions that you can make and that now that we're in this world of reasoning, that it's not just about the sort of model architecture, it's also about the post training, right. It's about how long does it think for a specific task and things like that. And so there's just many, many dimensions of improvement that we have to make and that we'll keep pushing.
[45:41]
Alessio
By the way, the numbers. I have a chart for this if you ever need it. Since the day you launched GPT-4, it's been a 1000x improvement in cost for the same level of intelligence.
[45:51]
Greg Brockman
That's pretty wild.
[45:53]
Swix
It's pretty good.
[45:54]
Greg Brockman
Yeah. That's like two and a half years or something like that. What else has like a three order of magnitude improvement over the course of two and a half years?
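To relate the 1000x figure to the earlier "10x per year" estimate, here is a small sketch that converts a total improvement over a period into a constant yearly factor. It takes the quoted 1000x and two and a half years at face value; both are conversational approximations.

```python
def annualized_factor(total_factor: float, years: float) -> float:
    """Constant yearly factor that compounds to total_factor over `years`."""
    if total_factor <= 0 or years <= 0:
        raise ValueError("total_factor and years must be positive")
    return total_factor ** (1.0 / years)


total = 1000.0  # improvement quoted in the conversation
years = 2.5     # approximate span quoted in the conversation
per_year = annualized_factor(total, years)
print(f"about {per_year:.1f}x cheaper per year")
```

With these inputs the yearly factor comes out near 15.8x, consistent with the remark that the real pace has been more aggressive than 10x per year.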
[46:02]
Swix
I don't know. Nothing.
[46:04]
Alessio
Can't think about it.
[46:05]
Swix
And it's going low. It's not even, it's like from $10,000 to like $1,000, it's going to, like, pennies. For the GPT-5 release, I did this article called Self-Improving Coding Agents. So I basically asked GPT-5, can you build tools for yourself to be a better coding agent? And this is a SWE-Lancer task. And then it does the task, and it kind of fails in some ways. And then I ask it, can you improve the tools for yourself, and kind of do this loop? And what I found is that the models don't really like to use these new tools they built for themselves. They basically respond saying, you know, I can just do it. I don't really need the tool. And I think there's kind of like...
[46:40]
Greg Brockman
This sounds like a human.
[46:41]
Swix
Yeah, there's kind of like this feeling of, how can they really push themselves to improve? Do you feel like part of it is, hey, they're just being taught to use these tools, which is, you know, grep and whatnot, and so it's kind of hard for them at inference time to build the tools? Or do you see this as part of that jump?
[46:59]
Greg Brockman
I think that's part of the step for sure. I think it's not like we're at zero on being able to do that. I think a lot of this is just about the training. If the model really has trained with just a specific set of tools, hasn't really been pushed to adapt to a new tool very quickly, then you shouldn't expect it to do any differently at evaluation time. But the idea of producing your own tools that make you more efficient and build up a library of those over time in a persistent way, like, that's an incredible primitive to have in your toolbox. And I think that if your goal is to be able to go and solve these incredibly hard challenges, unsolved problems, then I think you're going to need that kind of thing as a dependency.
[47:37]
Alessio
Any architectural decisions or innovations that you would like to talk about? Sliding window attention, the very fine-grained mixture of experts, which I think DeepSeek popularized, RoPE, YaRN, attention sinks. Anything that stood out to you in the choices made for gpt-oss?
[47:54]
Greg Brockman
I would say that for these choices, we have a team that's been working on different architectures, and we explore different things. Something like mixture of experts... it's funny, I would credit our team for the choices there, but I'd say that the picture in my mind is we wanted something that would be easy to run in these environments. And so picking things like just how sparse to go is very tied to your memory footprint, and then how much compute you actually can use for a forward pass and things like that. So I think that to some extent the architectural decisions were fairly constrained by the model sizing and the compute we expect them to have access to when they're running.
[48:37]
Alessio
Yeah, I mean it's very practical engineering decisions really.
[48:41]
Greg Brockman
Yeah, yeah, I think so. And I think that the power of the model really shows. We really did use a lot of our cutting-edge techniques to actually push the capabilities of these models further and further.
[48:51]
Alessio
I'd say I definitely detect a difference between the architecture for models designed for API use versus models designed for single machine. You know what I mean? When you have multi tenancy, when you can have batching, it's very different from single machine.
[49:06]
Greg Brockman
Very different.
[49:06]
Alessio
Yeah. I don't know if that'll ever combine, but maybe it's a menagerie model like you always say.
[49:11]
Greg Brockman
Yeah. I think it's also really interesting to think about an architecture where you have a local model that then delegates to a remote model sometimes, right? And this can be something where you can run much faster. It's helpful from a privacy architecture perspective, just trying to decide what actually goes and what stays. And having that edge compute means that if you lose Internet connection, you're still able to do something, and you can have a slower planning model. This interplay between those things is very interesting.
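The local-plus-remote idea described here can be sketched as a small routing function. This is only an illustration under stated assumptions: the `Request` fields, the `route` function, and the stub models are all hypothetical, and nothing here reflects how any OpenAI product actually routes requests.

```python
from dataclasses import dataclass
from typing import Callable

Model = Callable[[str], str]  # hypothetical: a model is just prompt -> text


@dataclass
class Request:
    prompt: str
    contains_private_data: bool = False  # hypothetical privacy flag
    needs_deep_planning: bool = False    # hypothetical "hard task" flag


def route(req: Request, online: bool, local_model: Model, remote_model: Model) -> str:
    """Decide what stays on the device and what goes to the remote model."""
    # Private data stays local, and with no connection local is the only option.
    if req.contains_private_data or not online:
        return local_model(req.prompt)
    # Delegate to the slower, stronger remote planner only when it is needed.
    if req.needs_deep_planning:
        return remote_model(req.prompt)
    # Default to the fast local model.
    return local_model(req.prompt)


def local_stub(prompt: str) -> str:
    return f"local:{prompt}"


def remote_stub(prompt: str) -> str:
    return f"remote:{prompt}"


print(route(Request("plan a refactor", needs_deep_planning=True), True, local_stub, remote_stub))
print(route(Request("plan a refactor", needs_deep_planning=True), False, local_stub, remote_stub))
```

The second call shows the offline case: the same request falls back to the local model instead of failing.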
[49:38]
Alessio
Yeah, so like a GPT-5 on device, where you have gpt-oss here and then it routes through online if it's available. I don't know.
[49:46]
Greg Brockman
Yeah, something like that. And then you have your Codex infrastructure that has a local agent and a remote agent and that is able to seamlessly interplay between the two and then is able to do multiplayer. This is what the future is going to look like and it's going to be amazing.
[50:04]
Swix
And then you have a device always with you. I can see where things are going.
[50:09]
Greg Brockman
It all connects. Yeah.
[50:10]
Alessio
What can we say about the device? You raised it.
[50:12]
Swix
I don't want to.
[50:14]
Alessio
What can I say about the device?
[50:16]
Greg Brockman
It's got to be great.
[50:18]
Alessio
Okay. And then another political one. I don't know if it's political or not. There's a lot of open models coming out of China. Why is it important for there to be American open source?
[50:29]
Greg Brockman
Another thing at a very practical level that we've thought about with open source models is that people building on our open source model are kind of building on our tech stack, right? If you are relying on us to help improve the model, if you're relying on us to get the next breakthrough, then that means that you actually have a dependence in a way that's good for our business, but I think is also good for the country, right? You think about having an American tech stack, from the models that people are running directly to how those are going to interface and interplay in the way that we just talked about. It actually allows us to build a whole ecosystem where people are able to have control over the parts of it that are important to them, ultimately be built on these models that reflect American values, and then be able to interplay with, hopefully, American chips underneath and cloud models on the back end and execution environments. All of that fitting together is something that I think adds a lot of value, and I think it allows for American leadership to really also mean that we have leadership in our values in the world.
[51:33]
Alessio
Yeah. Congrats on launching that.
[51:34]
Greg Brockman
Thank you.
[51:35]
Swix
Let's talk about engineering at OpenAI. I know there's a lot of debate about Claude Code and Aider and OpenCode and all these different tools. How do you think about structuring the team itself so that it gets the highest leverage out of this? Are you changing the way you build the team from a numbers perspective, from a capabilities perspective, from a team size perspective, within the org? Anything that you want to share?
[51:57]
Greg Brockman
Well, software engineering is definitely changing in many dimensions. There's a part of engineering that's very difficult for these models to really crack, but we're starting to see the beginnings of it happening, and that's these very core hard algorithms, right? Things like CUDA kernels are a good example of a very self-contained problem that our models should get very good at very soon. It's just difficult because it requires a lot of domain expertise, a lot of real abstract thinking. But again, it's not intractable. It's self-contained. It really is the kind of problem that is very amenable to the technology we have. There's other problems that are very difficult in terms of architecture. How do you think about how a system should be put together, and thinking about the abstractions? And again, our models are starting to get kind of good at this. So I think what we've seen is that for most of our engineers, even our extremely good engineers, there's a lot of their work that actually maps very well to the core strengths of the models right now. And definitely for anything where it's a language that you're not an expert in, yeah, you definitely don't want to be writing that code yourself. You really want a model to be doing it. And then there's parts of the job that become much harder because it requires things the models don't have access to. It requires a lot of context, going and talking to people in order to make good decisions. And so I think we're not at the point yet where we really see changes in how you structure a team because these tools exist. But I think we're at a point where it is an extremely high priority to get these models to be used in all domains that they possibly could be, and to think about how you do that well and responsibly, and think about what the guardrails should be, and that happens in a very practical way.
And so I think a lot of what I'm seeing is that we're in an early adopter phase that's starting to transition to a mainstream phase. And the productivity impacts of people being able to do more means we actually want more people, right? We are so limited by the ability to produce software, so limited by the ability of our team to actually clean up tech debt and go and refactor things. And if we have tools that make that 10x easier, we're going to be able to do 100x more things. And so I think that there's this incredible opportunity entailed by these models: not being a driver of just doing the same stuff more efficiently, but being able to do way more. And that is, I think, the overall goal.
[54:16]
Swix
Yeah. How have you changed the team's work to fit LLMs better? Is there a different way in which you track issues? Is there a different way in which you structure code bases?
[54:27]
Greg Brockman
So I think we're still at the early edge of this, but the thing I've seen be most successful is that you really build code bases around the strengths and weaknesses of these models. And so what that means is more self-contained units that have very good unit tests that run super quickly and that have good documentation that explains what this module is for. And if you do that and you kind of leave the details to the model, it works really well. And then thinking about how these things compose, and making sure that you're thinking about the dependencies, that these clean, AI-optimized modules can only be depended on by other AI-optimized modules. Then you end up with a whole system that's actually AI-optimized. And so I think that we're still scratching the surface of what's possible. And the models are advancing so fast that what it means to work around the weaknesses of the model will change: in six months, I think those weaknesses will be vastly shrunk. So you don't necessarily want to spend all your time just overfitting to what exists today. But I think there's a lot of potential to be able to move quickly in this particular moment.
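The dependency rule mentioned here could be enforced with a small check over a module dependency graph. The sketch below takes the statement literally (an AI-optimized module may only be depended on by other AI-optimized modules); the spoken phrasing is ambiguous, and a team might instead want the inverse rule. The module names and the `find_violations` helper are hypothetical.

```python
def find_violations(
    deps: dict[str, set[str]], ai_optimized: set[str]
) -> list[tuple[str, str]]:
    """Return (dependent, dependency) pairs that break the rule.

    deps maps each module to the modules it imports. A pair is a violation
    when a module outside ai_optimized depends on one inside it.
    """
    return sorted(
        (module, dep)
        for module, targets in deps.items()
        for dep in targets
        if dep in ai_optimized and module not in ai_optimized
    )


# Hypothetical code base: module -> modules it depends on.
deps = {
    "billing": {"pricing", "legacy_db"},
    "pricing": {"currency"},
    "currency": set(),
    "legacy_db": {"currency"},
}
ai_optimized = {"billing", "pricing", "currency"}

for dependent, dependency in find_violations(deps, ai_optimized):
    print(f"{dependent} is not AI-optimized but depends on {dependency}")
```

Run in CI, a check like this would flag `legacy_db` depending on `currency` in the example graph.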
[55:28]
Alessio
One question I'm very curious about, is the value of an engineer increasing over time? Increasing over time? Well, I mean, also there's some part of our work that's being automated away. I think obviously there are very, very high signing bonuses, higher than we've ever seen in the history of our industry. Is it really the engineers that are valuable or the systems that enable them? I feel like it's kind of like a bit of both. But people are paying a lot for the engineers.
[55:55]
Greg Brockman
I mean, I think that the thing at the end of the day that is new is that we are producing technology, these models, that are the most useful tools that humanity has created, and that underpinning them, we are building the biggest machines that humanity has ever created. At some point the dollars that go into these data centers start to be an abstraction, right? What is $50 billion? What is $100 billion? How can you possibly internalize what that is? I think it's almost beyond the scale of human comprehension, the engineering project that we collectively, as a country, as a society, as a world, are undertaking right now, right? Projects like the New Deal pale in comparison, the Apollo program pales in comparison to what we're doing right now. And in many ways it's as it should be, right? The economic return on this technology is very large. But even more important is the way in which we are moving to a new economy, an AI-integrated economy, an AI-powered economy. And this is ultimately what our mission is about, right? We see this change on the horizon. We want to help steer it to be something that uplifts everyone, right? It's this amazing opportunity, almost unique in human history. And we are all fortunate, right, to be at this moment in time and to be able to be involved in some way. That to me is the backdrop to really think about this big shift that is going on at humanity scale. And sometimes you almost feel this cognitive dissonance, because you're debugging some low-level CUDA deadlock or you're worried about the purple gradient, and you realize this is the future of humanity that we're really talking about. And so when you think about engineers and who's at which company and all these things, these things matter, right? It's not just about any individual, it's about a team, right? But it's also not about any one product or any one system.
It's really about the overall society, the overall economy that we are building together. And so I guess I sometimes step back and think about the big scale, but you also need to think about the micro scale. You need to think about are people happy, right? Do people feel connected to the mission? Do they feel like the work they're doing matters and those things actually turn out to be the most important things? And so what makes the headlines is not necessarily the stuff that actually most drives the people, but it is for sure like a reflection of the economic reality that people see as the potential of this technology.
[58:22]
Alessio
This connects a bit with what Noam was saying on the multi agents team, where the individual intelligences of humans, we can only do so much individually, but as civilizations we can go to the moon and build cities and build AI. And together I think we can do a lot more than we can individually.
[58:40]
Greg Brockman
We can do amazing things together, no question.
[58:43]
Swix
What do you think about the current state of AI research? Is everyone really just doing the same thing? Do you feel like every lab has a different take that is eventually going to help us converge to the right thing, or is it just that now the dollars have gotten so big that you need to do the thing that you think is going to work?
[58:58]
Greg Brockman
I think there's a surprising amount of diversity in the field. I think sometimes it can feel like there's convergent evolution. But I think that if you really talk to people at different labs, you really realize that there's different perspectives people have. You know, one of the decisions we made early on in OpenAI was that we really wanted a set of people who were aligned in how they think, right? Because people who have been pursuing a PhD for a long time, who sort of have their own research vision, you kind of can't tell them what to do. And so if you want people who are going to row in the same direction, it means you have to select that set of people. And that was, I think, maybe the most important early decision that we made at OpenAI that helped us to achieve the things that we have. And so I think that means that you necessarily have different vectors that you could pick. And you really see it in the taste of different labs and what they focus on, what they produce. And at OpenAI, I think we've been very much focused on how do you do the research that gets you to the next level. And even for something like GPT-5, where we had a lot of pressure to think about, okay, let's just do the grind of here's feedback on problems that we have on the coding side, and you can pursue that grinding and get somewhere, you also sometimes have to step back and think about how do you do the next step function, how do you do the next paradigm shift? And something like the reasoning paradigm is a good example of a time that we did that very successfully, and we've done that many times over the course of OpenAI and we'll continue to do that. And so I think that the breakthroughs remain to be made. And there's such a diversity of multimodal and different ways you could generate things and all of this stuff that I think the field of research is more abundant than it ever has been.
[60:40]
Alessio
Yeah. And not to forget, that's like the mainline research. There's also voice, there's also image generation, video generation.
[60:47]
Greg Brockman
Yeah, yeah. Easy to forget about these things.
[60:49]
Swix
Remember Studio Ghibli was like the biggest thing in the world.
[60:51]
Greg Brockman
Exactly. It's amazing. And that's the kind of thing, by the way, that was like there's really a team of a small number of people who are really focused on that problem for multiple years. And that, that is, I think the sort of core ethos of OpenAI is to make these long term bets on problems that matter in a direction that really adds up to a cohesive whole.
[61:12]
Swix
So from the outside it's kind of hard to figure out what you're focusing on. ImageGen kind of just came out of the blue almost, which was great. Got a lot of adoption. How should people think about how you prioritize, versus what people should explore and build and what they should wait for you to improve on?
[61:28]
Greg Brockman
Well, there's a massive possibility space in this field, right? Because neural nets, deep learning, is applicable to effectively any sort of data, any sort of domain, and we can't do everything. The core reasoning paradigm, that clearly is something we're going to keep pushing on. Multimodal, voice, things like image generation, video generation, these kinds of areas are also things that we view as very important, and they all kind of fit together. But there's been areas where it's just hard for us to really figure out how we prioritize them as part of the core program, right? And we've been through times where, for example, robotics was one in 2018, where we had a great result, but we kind of realized that we can move so much faster in a different domain, right? We had this great result with the robot hand unscrambling a Rubik's Cube. And that team was bottlenecked by the fact that this robot hand, you could run it for 20 hours before its tendons would break. And so then you would have a mechanical engineer come and fix it. And that team went on to go do what became GitHub Copilot, which is obviously an amazing feat and a real accomplishment, and something where they were able to move so much faster in the digital domain than in the physical one. And so I think that for us, no matter how many people we hire, how many GPUs we get, we have limited bandwidth, right? We are one company, one lab that's focused, as much as we can, on one coherent problem. And so I think that you can kind of look at the set of things we're doing, and sometimes we'll do offshoots, and sometimes that will be something that then becomes part of the core program, but there's just so much possibility space for everyone. Awesome.
[63:06]
Alessio
I'd like to take a chance. We're kind of closing up. Few small little lightning questions. Just on zooming out from OpenAI, this question I got from Alessio. So why don't you take it?
[63:16]
Swix
So when you started OpenAI, you almost believed that it was too late to start an AI lab. What are things that people today think it's almost too late to do that they should be doing?
[63:27]
Greg Brockman
Well, I think it's pretty clear that connecting these models to real-world application domains is extremely valuable. And I think sometimes it might feel like all the ideas are taken, but the economy is so big, every application of human endeavor is so big, and so it is worthwhile and really important for people to really think about how we get the most out of these amazing intelligences that we've created. And a lot of that is, for something like healthcare, you have to really think about all the stakeholders. You have to think about how the system works today and how you slot these models in well. And I think that across all of these domains there is so much fruit that is not yet picked.
[64:05]
Alessio
So go ahead and write the GPT wrapper.
[64:07]
Greg Brockman
Do it. But I think the thing that I would advise is to really think about domains where the value that you're producing is not necessarily just having written a better wrapper. It's really about understanding a domain and building up expertise and relationships and all of those things.
[64:20]
Alessio
You do occasionally angel invest. What gets your attention?
[64:25]
Greg Brockman
I actually have not angel invested for a number of years now. Yeah, it's just like everything is a distraction from OpenAI and I just like to stay laser focused.
[64:32]
Alessio
Okay, this is a time travel question. What is one Post-it note you want to send to 2045 Greg? So you'll be 58.
[64:40]
Greg Brockman
How's the Dyson sphere?
[64:41]
Alessio
How's the Dyson sphere? I don't know if you've actually done the math on what it takes to do that.
[64:46]
Greg Brockman
But yeah, I mean, more seriously, 2045 is just so hard to imagine given how fast things are moving right now. And so I hope it'll be a world of amazing abundance, and I think at that point we really should be multi-planetary, and kind of almost any sci-fi dream you can imagine, it's hard to deny its possibility, except for things that are limited by the physical ability to move some atoms at that rate. But yeah, I think I would just hope that that world is as amazing as it could be, sitting here in 2025.
[65:18]
Alessio
Will we even need UBI with abundance? Because true abundance means we don't need it.
[65:23]
Greg Brockman
Well, first of all, I think that there's been a lot of debate. I remember, early on in OpenAI, the question of: post-AGI, will money mean anything? Right? And it's really unclear, right? If you can just talk to a computer and it'll produce anything you want, you want some physical good, you want any sort of material item, and it can just be manufactured for you instantly, effectively free, what does money mean? And the flip side is, I think that there is one resource that is very clearly going to be in very hot demand, which is compute. It's already the case. We see this within OpenAI, that the researchers that have access to the most compute are able to have the biggest projects and do more. And I think in the future, thinking about how people get access to compute: the more compute that you have for whatever task you care about, for whatever application you care about, the more it will be solved, the more will happen. And I think that question of what the compute distribution looks like will be something very important. And so I think that on the question of exactly how, if you don't do work, do you survive, I think the answer will be yes, you'll have plenty of material needs met. But I think the question is, can you do more? Can you not just generate as much Sora movie as you want, but have this amazing detail and all this extra fanciness to it, and have this thing go think super hard for, you know, 100 years' worth of subjective experience about what the best thing is for you specifically? I think that there will always be more return on more compute. And so that will be something we have to really think carefully about, how that society is architected.
[66:59]
Alessio
And then this. I always find this harder, by the way. Post-it note to send to 2005 Greg. So 18 years old. Wow.
[67:06]
Greg Brockman
I get the time travel. How long of a note can I write?
[67:12]
Alessio
A little bit of advice to yourself. And obviously this is a proxy for everyone else, right? But address it to yourself.
[67:18]
Greg Brockman
I think the single thing that I have been most surprised about is that the abundance of problems grows over time. Because I remember in, you know, 1999, 2000, reading about Silicon Valley and feeling like I'd missed the boat. I was born just a little bit too late.
[67:34]
Alessio
Very common.
[67:35]
Greg Brockman
Exactly right. It just felt like all the cool problems must be solved. By the time I'm ready to go work on things, there'll be nothing left. That turned out to be totally false, right? Now is just the most exciting time to be in technology, to really be operating in the world, because we have this amazing tool that is going to uplift and revolutionize every application, every field of human endeavor. And I think that's something to be excited about, that is something that we can apply, and there are challenges we have to work through, no question, but for the purpose of achieving this amazing outcome. And so I think that just that message, that the problem availability will grow over time rather than shrink, is the core thing I wish I had sort of internalized at that moment.
[68:18]
Swix
Awesome. Thank you so much for joining us, Greg.
[68:21]
Alessio
Thank you both for your time. Thank you so much.
[68:22]
Greg Brockman
It's been great to be here.