Thibaut
The first time I showed it to someone, they were like, no way. This is like a fake demo. This cannot be this fast. This will change everything. Especially because it's not yet the fastest that we can actually get it to be.
Host
My experience was trying the app. I didn't really want to go back to a terminal. What I realized is, actually, GUIs are great. IDEs are just the problem. There's something that's a GUI for programming that's not an IDE, and it seems like you're figuring that out, but I don't even know what that's called.
Thibaut
It's called a Codex app.
Dan Shipper
Dan here, and I want to take a second away from the episode to tell you about Granola. Granola is an AI note taker for your meetings, and I use it pretty much every day. That may sound a little bit weird or a little bit creepy, like, transcribe all your meetings? Well, for me, it's actually kind of indispensable. As a leader, Every is about 20 people now, and it's really important to me that I understand how decisions get made, how I'm showing up in meetings, and how I can help my team the best way I can. Granola acts a little bit like a leadership log for me, so I can see how I've done in meetings, what situations came up in a particular week, and how I can do better next time. If you're trying to improve as a leader and scale your company, try Granola as your AI-powered notepad for meetings, head to Granola AI every code every to get three months free. And now back to the episode.
Host
Thibaut, Andrew, welcome to the show.
Thibaut
Hey, thanks for having us.
Andrew
Thanks for having us.
Host
Great. Great to get to chat with you. So for people who don't know, Thibaut, you are the head of Codex at OpenAI. And Andrew, you are a member of the technical staff on the Codex app at OpenAI. And you are the people of the moment. OpenAI just ran a Super Bowl commercial about Codex. How are you feeling?
Thibaut
Yeah, that Super Bowl ad was quite surprising. It really was.
Host
I think the core thing, and the place I want to start this conversation, is that it feels like a strategic shift. You would expect OpenAI to have run a ChatGPT commercial during the Super Bowl. And, especially if you looked at Codex's positioning three or four months ago, which was for professional engineers, you would not expect an ad targeted at a much broader audience. It felt like for a long time there was this divide where Codex was for professional engineers, and if you wanted to do vibe coding, you did that in the ChatGPT app. It seems like that has shifted a lot over the last month or two. Can you tell me about that?
Thibaut
Yeah, I think we can talk especially about last week. Last week, on Monday, we released the Codex app. Immediately, we saw a ton of downloads, more than a million downloads in the first week. And then we knew that we were releasing an extremely strong model, 5.3 Codex, on Thursday. That just made it very visible, I think, that we're here to put incredible experiences out there. We're very committed to Codex, and agents are really starting to work and be able to create these things, even if you're a little bit less technical. I think the app really showed that it's much more inviting for people to just try it and run multiple agents. With our models being very, very good at allowing for multitasking and being reliable for long-running sessions, you can create a lot more. So it just felt like maybe we can inspire more people to build and show that agents are here. It's not just "it's coming." It's going to be mainstream. Why don't you try and create something new? Inspiring people felt like the right thing that we wanted to reinforce.
Andrew
Yeah. While we were designing and developing the app, one of our internal mandates to ourselves the whole time was that we had to make something that we loved to use and that we used for all of our work. And if we couldn't do that, then we weren't going to put this out. That was back when we started. And I think that we surprised ourselves a lot with how fun it was, especially as we started to build this app. We'd also started to build agent skills, and once we paired them together, it became this really rich interactive experience where you could open the browser or you could connect to these various services. So all of a sudden we started to feel this really connected, interactive experience and wanted to share it. I kind of see the ad as a love letter to builders. I have never seen a Linux CD in a Super Bowl ad. So that was really cool to watch.
Host
What was the impact of the ad?
Thibaut
We've still to measure that. We'll see how it plays out over the long term. But we saw a giant surge of traffic, remarkably quickly after 4pm PST, when it aired. There was a surge and our systems were under heavy load. It felt kind of weird to me that people were watching the Super Bowl and then going and installing the app and just trying it out right there and then. But it happened, and a lot of people reached out saying they were really inspired by it and just wanted to build afterwards, which is what we were aiming for.
Host
Coming back. I still want to talk a little bit about the strategic shift. So the Codex app, or Codex in general, moving from something that is really for professional developers to something that has a broader audience, and maybe moving some of the vibe coding from ChatGPT into the Codex app. Tell me about that.
Thibaut
I don't think we're trying to move vibe coding from ChatGPT into the Codex app. Two things are happening. One, we're pushing the frontier on professional software development. 5.3 Codex beats every single other model on the top benchmarks for coding. So it is a very, very capable model. And on speed and cost, too, it is a top performer out there. The second thing is the app does make things more accessible, and so it does appeal to a wider audience. But internally, we're also seeing the app very much used within research and within our own team. The entire Codex team uses the app. It makes people more productive. So it's very much leaning into how we think agents are best used, the patterns that we were seeing that were making people very productive here at the company and outside, and then just going all in on that. It does happen that, at the same time, delegation is finally here. It works, and it's much more accessible. And we're going to try and see how we can package that and actually ship this to a much, much wider audience. But that might not be the Codex app. You use it all day, you just build in there.
Andrew
99% of the code that I write is using the Codex app.
Thibaut
I live in here now.
Andrew
Yeah.
Host
Okay. Well, that's actually really interesting. I definitely want to talk about the app in particular, but I want to go back to the thing you just said. If I'm reading you right, you're saying: we're pushing the frontier, and we're seeing lots of people who are broader than just senior engineers using this. However, the overall idea of who is doing what in which app, maybe you haven't totally figured out yet. It's not as clean a line as "no more vibe coding in ChatGPT, only vibe coding in Codex." You can do it in both, but you haven't figured out exactly which thing you're going to do where.
Thibaut
Yeah, I think Codex is the most powerful experience right there. So you should be fairly technical so that you understand, hey, code is actually getting written. It's going to get executed on your machine. By default, it's executed in the sandbox, but you should probably be able to read code in order to use Codex to its fullest. We will bring a similar experience to ChatGPT at some point, which will have different properties in terms of the sandbox and how concepts are represented. Maybe we won't be showing, hey, this scary terminal command is running and you should probably approve it. Of course you shouldn't do that to someone who is not technical. And Codex is really there to appeal to all coders, builders, and technical people, either technical themselves or technical adjacent, like data science, these kinds of things.
Andrew
Yeah. And if you use the Codex app for any amount of time, you can see the inspirations from chat. The layout's very similar. We auto-name your conversations, we've got contextual actions, but it's pretty clean. The composer looks very similar, and you'll see some of that inspiration back in chat for other types of things. But we still believe that when we set out to make something that was for the professional software developer and for us, it deserved a dedicated experience that could really showcase the power of the models and the way that the models could change the development life cycle. And so we made something very tailored to that. We've had a lot of success internally with research teams and with product teams, and so we'll look beyond. But I think we're really happy with where we've ended up on the tailored approach to this.
Host
Can you tell me about the decision to invest in a GUI over a TUI? I feel like TUIs are so hot right now, and obviously you have one for Codex already. You could have said, okay, we're going to double down and just make the terminal experience even better than it is now and really invest in that. I think making a GUI is a little bit of a counterintuitive or counter-narrative thing to do. So tell me about that decision process.
Thibaut
I don't think it was counterintuitive. Maybe it's just not mainstream. We experiment with a lot of different approaches, and I very much consider that we're still in the experimentation phase. We're responsible primarily for two things. One is building the most powerful entity out there that's capable of coding. Increasingly this will become a multi-agent system, it will become more and more capable, and you will have to figure out how to steer and supervise its outcomes and its behavior. That's one thing that we're building. And then we're also building how you even interact with this. What is the optimal way to have visibility into what this very capable entity or system of entities is doing? How do you steer them, how do you supervise them? We're very much still experimenting with what that is. Sure, you can do it in the TUI, but at some point it starts to feel very limiting, especially on multimodal. The models can draw little diagrams and generate images, or you can talk to it using voice. Maybe you have many of them going in parallel and you start to lose track. So we felt like we needed to start experimenting with something else. And it was only when we saw it become super popular internally that we were like, we have to ship this externally. It had come to a point where it was too good to just keep to ourselves. I mean, that was the journey that you went on, and you're now building in the app. When did you start building in the app? That was actually fairly quickly, like when the app was building itself.
Andrew
Yeah, that was pretty quickly. I was starting with the TUI and with the IDE extension, and my goal personally was, how can I get to fully building the app on the app as fast as possible? It's really easy when building this stuff to slip into the mode of, oh, this will be good for somebody. Somebody will love this. A certain type of person will love this. So we really wanted to get quickly to: I want to be able to build the app on the app. I want it to be able to run itself with skills. I want it to click around on the app that it spawned. And I want this to be part of my workflow as soon as possible. I still use the TUI sometimes when I want to fire off something quick. But I think that there is something about the flexibility of controlling the UI and being able to have some panes be persistent and others be ephemeral. We shipped voice with the app, so you can prompt with voice. We have Mermaid diagrams in the app, we have full image rendering. All of those things, I think, are the tip of the iceberg on what we want to do with a dedicated UI. It's pretty simple, and it's simple intentionally, but I think we're going to do a lot with dynamic stuff there.
Thibaut
I mean, yeah, the ceiling is just much higher.
Host
Yeah, it's interesting. My experience was trying the app. I didn't really want to go back to a terminal. And I had been coding mostly in Claude Code and some Codex in the terminal for the last several months before that. And I think what I realized is, actually, GUIs are great. IDEs are just the problem. There's something that's a GUI for programming that's not an IDE. And it seems like you're kind of in that, figuring that out, but I don't even know what that's called.
Thibaut
It's called a Codex app.
Andrew
There was a moment during the development of this where everybody and their mother was forking the same IDE, and we looked at each other and we were like, hey, should we have done a fork of VS Code as well? Very seriously. I remember exactly which day it was. I don't know if I would say that IDEs are the problem, but I go back to the truck analogy sometimes with them, which is that I will open an IDE here and there. I opened one today. It was something very specific that I wanted to do, and I don't even remember what it was. But then I closed it and I went back to using the Codex app. And I think that there is something there with the Codex app being a great daily driver. Occasionally you need an IDE, or occasionally you need a really complex terminal setup, but this should be your home base. It should be your command center for the agents that are running and a place that you can come back to and track all this stuff. There were a lot of design decisions around whether we allow freeform panels like an IDE. We came to the conclusion that a lot of what these models are great at is knowing what is needed in the moment for what type of task. And so we wanted to have fuller control over what was able to show at what point. You can see that in plan mode, where you're not necessarily getting a composer, you're getting a really quick way to answer questions. You've got your plan and you can edit your plan. And I think we only want to do more with that as we go.
Thibaut
It seems like you were surprised that you didn't want to go back to the TUI after.
Host
I was, yeah.
Thibaut
Is that
Andrew
Greg did an interview, and Greg was like, "I am a power user. I thought I would never leave the terminal."
Thibaut
Greg lives in Emacs.
Andrew
Are you like a...
Host
I was a TUI power user for like six months, starting with when Claude Code first got really good. And I was like, holy shit, this is so much better than being in Cursor or Windsurf or whatever. And now I feel like I speed-ran my TUI era and I'm back in GUIs. I'm kind of going back and forth right now, but I sort of see the light where, especially if you have a bunch of them going at once, the affordances of a GUI just make it much nicer.
Thibaut
Yeah. And there's a lot more to come there. It was a very intentional thing for us. We see that agents will act, and are already acting, on much more than code. So they need to be a companion to every single app and every single thing that you can do on your computer. We integrate with Linear and Slack. Of course you also need to be able to read the code and produce code, but maybe it can do a deploy to Vercel as well. Are you going to do all these things from your IDE? That would feel very odd. So it's this command center for your agent. We optimize the entire experience around the idea that you have a very capable, intelligent entity that you're controlling, steering, and supervising. You never need to go in there and do the things yourself. The thing is very capable of being delegated to. When you accept that that is what we're headed towards, and with 5.2 Codex it just feels like we're getting almost there, then it's the same as with you, right? When I talk to you about a feature idea or something, you get inspired and you go and do it. I don't suddenly jump into your IDE and go and implement it.
Andrew
You could, yeah.
Thibaut
I mean, I think you would find it disturbing, right? So that's the way that everyone will work with agents. You just talk to them.
Host
How has your workflow changed with 5.3 Codex versus 5.2?
Thibaut
I was surprised at how much faster it was, and I had to adjust. I had been optimizing a lot more for long-running multitasking, and I had an expectation of, okay, this type of task will take 10 or 15 minutes, so I'm going to kick off four different things and then come back. Now I'm able to do a little bit less multitasking and be more in the flow. So that felt really good. And it also feels very satisfying now to kick off automations with it using skills. It's a more generally capable model, less super focused on code, and so I find it much more reliable. Going through Twitter replies and summarizing the important themes, or filing bugs in Linear and then coming back to that and using automations so that things are implemented daily: it feels much more robust for these things. But you're really the power user here. I have very vanilla usage of Codex compared to Andrew.
Andrew
No, I mean, well said. I had a series that I had intentions to run for a while, and I only ran it for three days on X/Twitter, which was that I was setting up a prompt to basically add a feature to the Codex app, some random non-shippable feature. I had this long prompt about the quality bar that we had to hit, and once I switched it to 5.3 Codex the results got actually much more interesting. A Subway Surfers panel on the right was one of them. A little Minecraft UI for the sub-agents was another one that we did. I don't know, maybe we'll ship it.
Thibaut
I was like, get back to work. Yeah, yeah. Why do we have Minecraft in the Codex app now?
Andrew
Yeah, but gotta explore. No, I mean, 5.3 Codex is neat, it's fast, it's capable, it's multimodal.
Host
Thibaut says you have a lot of cool use cases. What are the more interesting ways that you're using the Codex app that maybe people should try but haven't thought of yet?
Thibaut
Andrew came up with Automations, and I think that sort of shifts the way you think about these things, when you can just have it run in the background on a specific trigger or at a specific time, and you can sort of program it yourself. Yeah, you're using that a lot.
Andrew
There are a lot of things that I use the app for that are a little bit outside of just coding features. I use it to keep my PRs mergeable with automations. It'll resolve merge conflicts, it'll keep them updated, it will fix build issues, so that basically as soon as they're ready to go, they're ready to go. There's no, oh, hey, somebody merged a big thing and there's a conflict now. So I do that.
Host
So what is the automation trigger? Because I thought automations trigger on a certain time schedule, but it sounds like there are other triggers I didn't know about.
Andrew
Yeah, we're looking at a lot of things. I have it right now just on a time schedule, and I use our GitHub skill and some internal skills for our CI. That runs hourly or every two hours and kind of just cleans everything up.
Host
I see. So if there are any changes on main, it just looks through any PRs and makes sure that they're all up to date, so that whenever you're ready to go, you're never blocked. That's good. I like that.
Andrew
Yeah, it's actually really helpful. It's surprisingly helpful. I have one where every day at like 9am I get sent all of the contributions that have merged to the Codex app over the last day. It'll do a nice report of who merged what, and I have it group them by theme so I can be like, all right, three people worked on this part of the composer, two people worked on Automations, here's what happened. That way I can at least be knowledgeable about what's happening, because things get chaotic right before launch.
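The grouping step Andrew describes can be sketched in a few lines. This is a minimal illustration, not how the Codex automation works: in the real automation the agent decides the themes, whereas here a hypothetical `THEME_BY_PREFIX` table maps file path prefixes to themes, and the input is assumed to be a list of merged PRs with an author, a title, and the paths touched.

```python
from collections import defaultdict

# Hypothetical mapping from path prefix to theme (an assumption for this sketch).
THEME_BY_PREFIX = {
    "composer/": "Composer",
    "automations/": "Automations",
}


def theme_for(path):
    """Return the theme for a file path, or "Other" if no prefix matches."""
    for prefix, theme in THEME_BY_PREFIX.items():
        if path.startswith(prefix):
            return theme
    return "Other"


def daily_report(merged_prs):
    """Group merged PRs by theme and summarize who worked on what.

    merged_prs: list of dicts with 'author', 'title', and 'paths' keys.
    """
    by_theme = defaultdict(list)
    for pr in merged_prs:
        # A PR that touches several areas is listed under each theme once.
        for theme in sorted({theme_for(p) for p in pr["paths"]}):
            by_theme[theme].append(pr)

    lines = []
    for theme in sorted(by_theme):
        prs = by_theme[theme]
        authors = sorted({pr["author"] for pr in prs})
        lines.append(f"{theme}: {len(authors)} people, {len(prs)} PRs")
        for pr in prs:
            lines.append(f"  - {pr['title']} ({pr['author']})")
    return "\n".join(lines)
```

Calling `daily_report` on the last day's merged PRs yields one header line per theme followed by the PRs under it, which is roughly the shape of report described above.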
Thibaut
And yeah, one automation I have, I run multiple times a day, and it's: pick a random file and find and fix a subtle bug. It's kind of funny because it actually does pick a random file. It will run Python's random and it will find a random file and start from there. So every time it explores a new one.
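As a rough sketch of the "pick a random file" step, assuming the automation walks a local checkout: the helper names, the set of file suffixes, and the prompt wording below are all hypothetical, and only Python's standard `random` and `pathlib` modules are used.

```python
import random
from pathlib import Path

# Which files count as "source" is an assumption for this sketch.
SOURCE_SUFFIXES = {".py", ".ts", ".tsx", ".rs"}


def pick_random_file(repo_root, rng=None):
    """Pick one source file uniformly at random from a checkout."""
    if rng is None:
        rng = random.Random()
    candidates = sorted(
        p
        for p in Path(repo_root).rglob("*")
        if p.is_file() and p.suffix in SOURCE_SUFFIXES and ".git" not in p.parts
    )
    if not candidates:
        raise ValueError("no source files found")
    return rng.choice(candidates)


def build_prompt(path):
    """Hypothetical prompt text handed to the agent for the chosen file."""
    return f"Start from {path}. Find and fix one subtle bug, then open a PR."
```

Passing a seeded `random.Random(seed)` as `rng` makes a run reproducible, which helps when you want to revisit the file an earlier run explored.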
Host
Has it caught anything interesting?
Thibaut
Oh yeah. It's often latent bugs that are not actually triggering on the critical path, but they are bugs, and then it's trivial to fix it and merge it. It takes very little time, and it's a thing that I would never have found myself. It found an issue in constrained sampling the other day.
Host
Yeah, that's really cool. Do you have other, other automations that are worth sharing?
Thibaut
Let's see.
Andrew
I feel like I have 60 that are running at all times, some for testing and some for real. Some of the members on the team really like this one that looks at the PRs that you've done in the past day or so and quietly cleans up any bugs you shipped. It looks at a few of the observability platforms and tries to basically ship a fix before anyone's noticed that you shipped a bug.
Host
That's cool.
Thibaut
I have one that is not coding related, which is marketing research. It runs daily, and it's prompted with a specific skill to do deep marketing research, which I've tuned over time. It goes and searches the web for any new things that came up in terms of how users are perceiving and talking about Codex. Then I just receive that little report, and it always makes for an interesting read. We could just go on. These are just examples that we do rely on. They run. Yeah.
Host
Do you have any particular skills that you guys like that are beyond the normal kind of. I have a GitHub skill and that kind of stuff.
Thibaut
I love Andrew's Yeet skill. It just takes the change, does the commit, writes the PR title and body, and publishes the PR as a draft.
Andrew
Yeah, it's very satisfying.
Thibaut
Yeah, it just does everything. That one definitely makes people productive. What are the top used ones for you?
Andrew
Image Gen is a cool one.
Thibaut
Yeah.
Andrew
For silly automation purposes, like, hey, make me an image that characterizes my last day of work. Not my last day of work, my previous day.
Thibaut
Yes, yes, yes. Andrew.
Andrew
The Image Gen skill was actually really cool. I used the Codex app to make a book for my daughters. I put together this prompt teaching it about a script that I wanted written: 24 pages, here are my daughters' ages, here's where we've lived in the past, like, we were in Boston and moved to New York and then moved over here. We went through that and I agreed on the script. Then I said, all right, now it's time to use the Image Gen skill. For every page in the book, based on the script, it prompted for the image, and then it used the PDF skill to put together the book's PDF, and then I printed it. So we've got a super custom book that I read to my kids, and it's really cool.
Thibaut
It's just this awesome thing when you can combine the intelligence of the agent with skills that work in a programmatic way, and then you can just combine them in novel ways. I think the PDF and Image Gen one is a common combo that we see.
Host
It feels like the Codex model has obviously got faster, which makes it much more usable, and it also feels a little more Opus-y. It has a little more emotional intelligence, but it still has a little bit of that "it does exactly what you say" thing, in a way that can be annoying. How are you guys thinking about how you shape the way the model feels and which way you're pushing it?
Thibaut
It's something that we obsess over. We definitely want the model to excel at coding and be really good at instruction following. At the same time, when we optimize a little bit too much in that direction, it can over-index on specific words or misunderstand the intent in ways that humans wouldn't. Sometimes I will just have a typo, and then the typo actually finds its way into the file. And I'm like, obviously I didn't mean the typo. I meant the name of this class. So that's something that we're definitely continuing to push on. But the thing that we're pushing on the most right now is really efficiency, speed, and then also what we now refer to as personalities: how supportive is it? We understand that not everybody has the same preferences there. The previous default was definitely a super blunt, pragmatic personality. Now we've also introduced a more supportive, friendly personality, and you can just pick between those. And for things that don't have a universally accepted answer that everybody should just use, we're probably going to introduce some way for you to make it your own. You should feel like you have your own little personal Codex that works in exactly the way that you want it to work. Do you use the friendly or the pragmatic one?
Andrew
Pragmatic.
Thibaut
Pragmatic, yeah. Okay. I also use pragmatic.
Andrew
Yeah.
Host
Interesting. I think you guys recently put out a model that is so fucking fast. I was testing it before it came out, and I was just like, I can't really keep up with this thing. So I'm curious how that changes how you think about what is now possible with coding with a model like this and also the affordances that you need in order to manage models that are so quick effectively.
Andrew
Yeah, the first time we used this model in the app, we had kind of that same thing happen where all of a sudden there was just, like, this wall of text, and we were at the bottom of the scroll, and we were immediately like, all right, we need to smooth this thing out coming in. And so we actually do slow it down ever so slightly, just so that you can see the words come in, like, a little bit smoother.
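One way to picture the smoothing Andrew mentions is a small pacing layer between the model's output and the screen. The sketch below is an assumption about the general technique, not the Codex app's implementation: bursty chunks are split into short pieces, and a brief delay proportional to each piece's length is inserted between writes. The function names and default rates are hypothetical.

```python
import sys
import time


def smooth(chunks, max_chars=8):
    """Split bursty text chunks into pieces of at most max_chars characters."""
    for chunk in chunks:
        for start in range(0, len(chunk), max_chars):
            yield chunk[start:start + max_chars]


def render(chunks, chars_per_second=400, max_chars=8,
           sleep=time.sleep, write=sys.stdout.write):
    """Write text piece by piece, pausing so output never exceeds the cap."""
    for piece in smooth(chunks, max_chars):
        write(piece)
        # Pause in proportion to the piece length to hold a steady rate.
        sleep(len(piece) / chars_per_second)
```

Injecting `sleep` and `write` keeps the pacing logic testable without real delays. A UI would typically also drop the delay when the buffered backlog grows large, so the display does not fall far behind the model.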
Host
So funny.
Andrew
It's a really funny problem. But this thing has been super fun, and I think what I'm most excited about is what sort of capabilities we can start to add to the app that are really, really dynamic, that we couldn't with a model that wasn't this fast. So, yes, this model is going to allow you to iterate really, really quickly, but it also opens up a lot of new opportunities in how you code and how you interact with the Codex app.
Thibaut
The first time I showed the very first prototype was when we had hooked everything up. Obviously, the model is powered by Cerebras, we've talked about the partnership there, and we're very excited to put out the first model that we're serving through that. It's obviously still very early. It's literally the first time we've hooked it all up, and we're just so excited that we want to share it. But the first time I showed it to someone, they were like, no way. This is a fake demo. This is not real, this cannot be this fast. And then they tried a few prompts, and they were just like, oh, I literally cannot keep up. This is insane. And yeah, I think this will change everything, especially because it's not yet the fastest that we can actually get it to be. With the preview, we're putting it out quite early. We're actually going to layer a number of optimizations on top of it, which should be able to make it maybe 2 to 3x faster than what you have experienced. So that's going to change things. And we're thinking about this also from the point of view of delegation. We think this model has a huge role to play as part of a multi-agent system and as a way to speed up the slower, more intelligent agent as well. So we're going to be experimenting in that way.
Host
And do you expect the same hardware speed ups on the more intelligent agents to come out soon?
Thibaut
A lot of the things that we worked on were interesting distributed systems and infra problems that we uncovered because we were able to sample from the model at unprecedented speeds. If you're getting tokens back this fast, you need to go and optimize the entire set of bottlenecks that you uncover on the critical path of serving. All of those benefit the current GPT-5.3 Codex and all future models. And there's one thing that we've been doing as well, which I'm sure we're going to put in a more detailed blog post at some point: we wrote the entire server stack to be based on WebSockets and a persistent connection, and to do things a lot more incrementally and statefully, and that decreases the overall latency across all models. We haven't shipped it by default yet, but it is something that we are making the default for this new super fast model, and then we're also going to enable it on the other models. It decreases overall turn latency by something like 30 to 40%. We can look into the exact numbers.
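To make "incrementally and statefully" concrete, here is a toy comparison under stated assumptions. It does not reproduce OpenAI's server stack or protocol, and it does no networking. It only counts the bytes a client would send if every request resent the full conversation, versus sending only the new message over a connection whose server side keeps the history. The JSON frame shapes and the `Session` class are hypothetical.

```python
import json


def stateless_bytes(turns):
    """Bytes sent when each request carries the whole history so far."""
    total, history = 0, []
    for msg in turns:
        history.append(msg)
        total += len(json.dumps({"messages": history}).encode())
    return total


class Session:
    """Server-side state kept for the life of one persistent connection."""

    def __init__(self):
        self.history = []

    def receive(self, frame):
        # Only the new message arrives; the server appends it to its state.
        self.history.append(json.loads(frame)["message"])


def stateful_bytes(turns, session):
    """Bytes sent when each frame carries only the new message."""
    total = 0
    for msg in turns:
        frame = json.dumps({"message": msg})
        session.receive(frame)
        total += len(frame.encode())
    return total
```

In this toy model the stateless cost grows roughly quadratically with the number of turns while the stateful cost grows linearly. A persistent connection also avoids repeating connection setup on every turn. Neither effect is a measurement of the 30 to 40% figure mentioned above.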
Host
What are the most surprising things that you've seen using the model internally, in terms of what a speed-up like this enables?
Thibaut
It just allows you to be super, super in the flow, and you're almost sculpting the experience or the code in real time. It's just a very different feel. It's very unsettling at first, and then once you get into it, it's very hard to go back to any other model. That's the feedback that we've seen, and that's what I have felt myself. It takes like five minutes to adapt, and then you sort of know, okay, this is how I'm going to use this thing. Yeah.
Andrew
I also don't think that we've poked at the full extent of what we could do with it.
Thibaut
Yeah, it's true.
Andrew
It's very early. We haven't had it for very long.
Thibaut
Yeah. Someone on the team, Channing, was just showing, like, oh, yeah, it's so fast. And it can actually play Pong, you know, not very well. But the model is able to react to things almost in real time.
Andrew
You start to see how it might replace some deterministic steps. We have a set of Git actions in the Codex app. And as everybody knows with Git, certain configurations or certain states that you can be in can make it really hard to run those without a ton of error handling and all sorts of error messages and guidance. It's really hard to create a good Git experience, which is why nobody ever has. But if you have a model that's almost as fast as running these scripts, then you can imagine a world where these things turn into skills or something like that. You can have your operations run a little bit differently, with some intelligence, and not have the same latency that you have today when you're asking it to go track something down in the codebase. Right. You can kind of vaguely gesture and be like, hey, send this up, and have that be fast enough for a button.
Thibaut
What I'm very excited about is when it's going to come together with one thing that we shipped with 5.3 Codex as well, this thing that we call mid-turn steering, where you start with your prompt, it gets to work, and then you send another prompt while it's still working. And it adapts in real time as well. It will just receive that message, acknowledge it, and then continue its work. If you start to think about, okay, what would this look like with voice? And then with a model that is as fast as the one that we just shipped, then that's a whole other experience that we would be very excited to bring, you know, hopefully very quickly.
Host
Because you can easily interrupt as you're...
Thibaut
Yeah. If you're just talking and engaging with natural language and then doing the mid-turn steers, and then the implementation happens almost instantly because of the speed, it becomes a very pleasant thing to use. Right now you can sort of emulate it with voice dictation, and then send it, and mid-turn steering, and then, you know, watch the model implement, and it's a very cool thing. I think we're going to have a step change in that experience when we really polish it.
Host
If speed as a bottleneck is close to being solved, what do you think is the next bottleneck? What is the next limit on making the thing you want?
Thibaut
The bottleneck that is very apparent is, you know, how fast can you verify that things are correct? We can generate code faster than ever before. We can implement entire features. I saw someone do this just based on a description of the Codex app. If you synthesize that into a plan, just based on screenshots, the models are very much capable of reproducing 95% of the features and just rebuilding the app from scratch. Now, is it going to be bug free? Is everything implemented to perfection in the same way that the actual app is? That still takes a lot of time for a human to go and click and verify and make sure that the designs are consistent, that there are no bugs here or there, that when you click that button in the settings panel, it actually does the thing that you expect. I think verification, you know, definitely becomes a bottleneck. We have people on the team complain that there's too much code to review. That's what we're trying to solve for. I mean, you complain about that, I complain about that.
Andrew
There's so much code to review now, both on your own machine and from another peer. We're going to have to figure that out.
Thibaut
Yeah, you're already reviewing. You're reviewing the code the first time because the agent is just presenting it to you. And then you have to review the code produced by your peers. So there are these two rounds of reviews and...
Andrew
Yeah, yeah, I mean, this is something that we're working on. A lot of us still do have to review code and, you know, we're taking a look at what that experience should look like with the model involved. Right. We've got a review mode in the Codex app that works really nicely and kind of annotates your diffs on the side with findings and stylistic things, and there's lots to do.
Thibaut
Yeah, it's one thing I'm also excited about with making the models faster. This one that we just put out, which is mind-blowingly fast, you can imagine using it as a way to understand code, understand features, help you with code review, help you understand the code that a peer wrote. And it's much more pleasant, because this is something that you want to do. You want to be there in the flow. It's something that has to be synchronous. It's not something that you delegate. You cannot delegate understanding. You're trying to get to understand something, and so speed there is a real advantage. So it also helps offset the fact that models are producing more and more code: speed helps you understand this code faster as well.
Host
Yeah, I mean, I've definitely found this already with this new model: the speed especially helps with end-to-end testing. Because if you're having it do end-to-end testing, like manual integration testing, often there's a toast that pops up. And it pops up for like a second. And if the model's not fast, it's not going to get it. It seems like it's better for that because the cycle times are much shorter. And I definitely find this too. I can produce so much code, but when I see a PR come in or when I make a PR, my first question is, is there evidence that you've actually tested this and this actually works, not just unit tests, like you've gone through it end to end?
Thibaut
How do you handle this?
Andrew
I mean, I've seen a lot of PRs that I have the same question about. It's so easy to code things now, right? Yeah. I mean, we have gotten the Codex app to be pretty good, through some skills that we have, at running itself, clicking around, screenshotting itself for evidence, and uploading it to the PR. There's a lot that's pretty interesting there, especially when we make this more async or when, you know, the models get really fast at this stuff. I don't know exactly what it looks like yet, but there is a lot there around, like, hey, here's a bug fix. This is exactly what it looked like when it was happening. And here's exactly what it looks like now with the same exact click path. And so maybe that's the turning point, that code review becomes less important when you can verify that part instead. So you have to do less through the code as a proxy. But there's definitely more to explore there.
Host
Last couple of questions. I'm curious, what have you guys learned from Anthropic and Claude Code? And how do you think about your positioning in the market versus them? How do you think about the differences?
Thibaut
I think they were first to put something out there, and that was interesting to us because we had been working on similar ideas for a bit. But I think our models at the time were not quite ready. They were not reliable on long-horizon tasks. They were not able to do reliable tool calls and stay on topic. And so as soon as we started to really invest in that, especially with GPT-5, we were like, okay, the models are there, and we know how to make them even better, with even better long-horizon reliability and long-context understanding. And what we were seeing is that Anthropic was, to us, losing a little bit of steam when it came to the model. And we were in this fortunate position where the way that we run Codex is we've got product, we've got engineering, but we've also got research. And we all just work together and sit together and solve problems together. It's a highly creative space where at times we decide to solve problems in the product, in the harness. But at times we're also like, hey, how can we actually improve the model? Let's just talk about it and ideate together. And then research will come and be like, hey, we've got this breakthrough that we're sitting on. Would this be something we can ship? And then we just get excited about that. One of the examples was we had a lot of complaints about compaction. Compaction was something that people felt: whenever you would hit compaction, people would complain that it's losing too much context. And so we solved that end to end. We decided to do end-to-end RL training and introduce compaction within research, and then make the model itself very familiar with the concept of compaction and with delegating optimally to itself across time.
And once we had that and we had solved it at the model level, the harness problem became so much easier because it was just like, oh, just let the model do it, and it's going to be very reliable. So through that collaboration, it just felt like the momentum has been very strong, and we're able to improve models and ship a model roughly on a monthly cadence. And then we took a bit of a different bet and a different approach with the Codex app, which turned out to be an awesome thing to just try and do, which is to not force ourselves into trying to cram everything into a TUI. I mean, it was a great challenge, right? You were like, let's build an app, where do I get started? And then you just got obsessed by it.
Andrew
It's hard not to.
Thibaut
Yeah. I mean, how was it to just build something that was quite contrarian, I suppose?
Andrew
Yeah. I mean, I remember you and I talking about whether or not... like, early on we were like, we don't know if we'll ship this.
Thibaut
Yeah.
Andrew
Like, we will try it out. We'll see if we can get there with something that we love. I remember saying, let's get some PMF internally. Let's get everybody at OpenAI to want to use this thing without being forced to use it. Let's see if we can do it. Right. We did, and it was adopted very quickly. I mean, the minute it was barely usable, the research folks put dev boxes on it. Right. Which was this crazy hack at the time.
Thibaut
Yes, yes.
Andrew
But now they use it, like, for everything.
Thibaut
Yeah, yeah. Including in training, like, 5.3 Codex. And so I feel really good about having hit the point where almost everyone technical at the company uses Codex, but the people who use it the most are actually building Codex and building the models. And so we're just able to improve things at crazy, crazy speeds, and there's no sign of it slowing down.
Host
Amazing. Well, I'm excited for what you ship next. Thank you guys for your time. I really appreciate it.
Thibaut
Thank you. Thank you for having us.
Andrew
Thanks.
Podcast Outro Announcer
Oh, my gosh, folks, you absolutely, positively have to smash that like button and subscribe to AI and I. Why? Because this show is the epitome of awesomeness. It's like finding a treasure chest in your backyard, but instead of gold, it's filled with pure, unadulterated knowledge bombs about ChatGPT. Every episode is a roller coaster of emotions, insights and laughter that will leave you on the edge of your seat, craving more. It's not just a show. It's a journey into the future with Dan Shipper as the captain of the spaceship. So do yourself a favor, hit like, smash subscribe, and strap in for the ride of your life. And now, without any further ado, let me just say, Dan, I'm absolutely, hopelessly in love with you.
Host: Dan Shipper
Guests: Thibaut (Head of Codex, OpenAI), Andrew (Technical Staff, Codex App, OpenAI)
Date: February 18, 2026
This episode dives deep into OpenAI’s Codex app and its significance in revolutionizing the way developers—and even non-developers—interact with code. Dan Shipper interviews Thibaut and Andrew, two key leaders behind Codex, exploring strategic shifts, product decisions, user impact, and the breakthrough in speed that the latest Codex model (5.3) introduces. The conversation touches on the evolving landscape of programming tools, multi-agent capabilities, UX design, and the challenges and opportunities this new era brings.
Super Bowl Ad & Mainstream Outreach
Broader Target Group
Why a GUI?
Breaking from Traditional IDEs
Adoption by Power Users
Drastic Speed Improvements
Speed as a Game Changer
Technological Infrastructure
Evolving Workflow for Developers
Automations & Novel Use Cases
As code and feature generation accelerates, bottlenecks shift to verification and code review.
Speed Also Helps Understanding
For anyone involved in software development, AI tooling, or product strategy, this episode is a goldmine of practical insights and firsthand experiences from the cutting edge of AI-powered software creation.