How OpenAI's Codex Team Builds with Codex (43 Min) | Alex & Romain - Behind the Craft | Wave AI Podcast Notes

How OpenAI's Codex Team Builds with Codex (43 Min) | Alex & Romain

Behind the Craft

Sun Apr 05 2026

Summary

Behind the Craft Podcast: "How OpenAI's Codex Team Builds with Codex" (April 5, 2026)
Guests: Alex (Product Lead, OpenAI Codex), Romain (Developer Experience Lead, OpenAI Codex)
Host: Peter Yang
Duration: 43 minutes

Episode Overview

In this episode, Peter Yang sits down with Alex and Romain from OpenAI's Codex team to explore how the team builds and ships with its own coding agent. The conversation covers live demos, the team's product philosophy, the future of product roles, community-driven development, and the inner workings of Codex.

Key Discussion Points & Insights

1. Live Demos: Building with Codex in Real Time

Speed and Capability:
- Romain demonstrates how Codex can instantly build new app features or edit existing ones with simple prompts. E.g., adding a new screen to an iOS app or live-editing a 2D game (01:05–03:42).
- The Codex Spark model enables near-instant iterations, described as "insane speed."
- Quote: "On the left side you have GPT 5.4...and on the right side you have Codex Spark and boom, you have like 1200 a second on average. This is insane speed." — Romain (02:06)
Workflow Integration:
- Codex allows users to pop out chat conversations, brainstorm, and delegate tasks directly from the chat window.

2. Product Building Philosophy

Almost No Specs—Just Do the Work:
- The team avoids traditional specs; documentation is minimal and highly focused (around 10 bullet points).
- Quote: "We write very, very few specs on the Codex team...the docs that we do write...tend to be incredibly short." — Alex (04:41)
Decentralized Decision-Making:
- Those closest to the problem make decisions. Specs are only used when coordination or complex alignment is required.

3. How Codex Accelerates Product Collaboration

Prompting & Plan Mode:
- Codex facilitates planning through conversational prompts; it'll suggest features or improvements based on project context (05:31–07:07).
- Alex often uses Codex to reason through vague ideas before sharing them with engineers.
Designers & Non-Engineers Shipping More:
- Designers on the team now write more code (with Codex's help) than an engineer did six months ago.
- Quote: "The designers on the Codex team write more code now than was written by an engineer like six months ago. They're absolutely goated." — Alex (07:55)

4. App Simplicity vs. Power User Features

Design Principles:
- Codex app is intentionally simple to use yet highly configurable for advanced users.
- Cutting-edge users often contribute by forking and modifying open-source code.
- Notable: Community feedback drives new features; some advanced functionalities ("skills" and "automations") are discovered organically.
- Quote: "Whenever we're building a feature, we start getting complaints on Twitter...that's like an awesome part of the product." — Alex (11:10)
Discovery-Driven UX:
- Power users discover skills/features over time, making the app feel "like playing a game" (12:50–13:56).

5. Evolution and "Vibe Shifts" in Codex

Key Milestones:
- First big leap: GPT 5 and the IDE/CLI integrations (August; 20–30x growth).
- Second: GPT 5.2 and beyond enabled true "agentic delegation" (multiple agents working in tandem, December/January).
- Quote: "There's two vibe shifts in Codex history...the second shift was around December, January, where we actually could get back to this vision of delegating to models." — Alex (14:46)

6. Product Roadmap and Strategy

No Medium-Term Planning:
- OpenAI plans either short-term (up to 8 weeks) or sets long-term, "vibes"-based directions.
- Quote: "At OpenAI, you either plan near term or long term, but you never plan medium term. It’s just too difficult." — Alex (15:39)
Dissolving the Workspace Paradigm:
- Codex app aims to break from the single-folder workspace, letting users smoothly interact with multiple agents, locally or in the cloud (16:07–17:20).

7. How the Codex Team Works

Team Growth & Structure:
- Team rapidly scaled from ~8 to 50–100+ people (21:54).
"Pirate Ship" Mentality:
- Minimal hierarchy, high autonomy, and little cross-functional alignment required.
- Most features originate from engineers’ and designers’ own needs and user feedback.
- Quote: "We just kind of like view ourselves as like intentionally a bit of a pirate ship like team..." — Alex (24:40)

8. Community & Feedback-Driven Development

Ambassadors and Grassroots Engagement:
- Codex has "ambassadors" around the world running events and hackathons (28:30).
- Codex team members are always online, gathering rapid feedback and iterating launches with the community.
Open Source as a Strategic Lever:
- Transparency and real-user involvement are core advantages.
- Quote: "Because we're open source, we kind of just found ourselves being incredibly open about everything we do. And I think the community really rewards it." — Alex (27:23)

9. The Blurring of Traditional Product Roles

Less Need for PMs, More for Builders:
- Most teams don't need many PMs; roles are blending (32:11–33:11).
- Codex (and tools like it) enable designers and PMs to code and ship, and engineers to participate in design and strategy.
- Quote: "All of the lines between career ladders are blurring, and we're all builders altogether." — Romain (32:14)
Interest and Agency Above All:
- The most vital qualities are curiosity, agency, and being hands-on.
- "The fewer people you need in a room to do anything, just the better that thing goes, the more pure every decision is." — Alex (33:43)
- PM is now a "fill-in-the-gaps" position rather than a traditional leadership role.
Labels Lose Meaning:
- As everyone becomes more "opinionatedly themselves," the boundaries between engineer, designer, and PM blur even further (36:41–38:06).

10. Hiring Philosophy

Qualities Sought:
- High agency, technical skill, and strong product sensibility.
- Preference for people who ship, are active in community spaces and on social media, and have side projects.
- "When someone DMs me...for me, it's like, is there a link? If there's a link, I always click it...I'm much less likely to read...[their] CV...than, like, their ideas and what they built." — Alex (42:14)
- College and credentials are unimportant; demonstrated work and initiative are what matter.

Notable Quotes & Memorable Moments

On Codex’s Impact:
"The designers on the Codex team write more code now than was written by an engineer like six months ago. They're absolutely goated."
— Alex (07:55)
On Simplicity vs. Power:
"We are really careful about, like, what the core primitives of what we're building are... It's not just a vibe coded thing, we're really thoughtful."
— Alex (11:10)
On PM Role:
"I don't actually view PM as a good leadership position. I view it as a fill in the gaps position."
— Alex (38:30)
On Team Structure:
"We view ourselves as like intentionally a bit of a pirate ship like team...there's not too much alignment going on there."
— Alex (24:40)
On Hiring:
"People who do things is like literally the most important thing."
— Alex (39:28)

Timestamps for Key Segments

Live product demo & workflow – 01:05–03:42
Spec writing & decision-making – 04:41–05:21
Planning and brainstorming with Codex – 05:31–07:07
Role of designers and PMs in shipping code – 07:55–09:25
Skills, Automations, and app design philosophy – 09:51–13:56
Major product “vibe shifts” – 14:46–15:21
How planning & roadmaps happen at OpenAI – 15:39–17:20
Codex team structure & daily workflow – 21:54–24:38
Community-driven development & open source – 27:21–28:58
The blurring of product roles & talent stack – 32:11–33:43
Qualities sought when hiring for Codex – 39:28–42:56

Final Thoughts

This episode offers a behind-the-scenes look at how the OpenAI Codex team challenges conventional software development with high agency, minimal process, and deep integration of AI tools. The candid discussion between the host and his two guests sheds light on how teams can work faster and smarter, with traditional roles melting away in favor of an empowered, multidisciplinary builder culture.

Transcript

A (0:00)

We write very, very few specs on the Codex team. We're talking like 10 bullets or something and that's it. The designers on the Codex team write more code now than was written by an engineer like six months ago.

B (0:10)

And I made a quick prompt to create a little 2D game, maybe add

C (0:15)

some more decorations, houses, trees and stuff.

B (0:17)

Could you add some more decoration like trees? And there we go. We have already new trees appearing for a small change.

A (0:23)

It's often faster to send a PR than it is to communicate to someone and get them to prioritize that task when they have 10,000 other things to do. I don't actually view PM as a good leadership position. I view it as a fill in the gaps position. I think the fewer people you need in a room to do anything, just the better that thing goes, the more pure every decision is.

C (0:42)

Okay, welcome everyone. I'm really excited to host Alex and Romain from OpenAI's Codex team today. They're going to demo how they build new features of Codex, what Codex is capable of, and also talk about how the Codex team ships nonstop. So welcome guys.

B (0:57)

Thank you. Thank you for having us.

A (0:59)

Yeah, excited to chat.

C (1:00)

So do you guys want to just quickly show what kind of things Codex can actually build in one shot?

B (1:05)

Yeah, for sure. I mean, let me share my screen to give you a sense. And so there's so much I could show, but maybe a quick glimpse. For instance, here is an iOS app I've been building, and if I want to actually create a new feature for this app, I can simply dictate something that says, hey, can you add a new screen for NASA's Artemis mission return to the moon? And I can send that prompt with GPT 5.4. And sure enough, the model will create a new screen for this particular iPhone app. So here we have this app. It's pretty cool. And it's currently building this new feature, so we should see that in a moment. But we also have the Codex Spark model, which can really help you ideate and iterate in just a few seconds on anything. In fact, let me show you the difference it makes to have a Spark model responding so quickly. On the left side you have GPT 5.4, right? I'm going to give it a head start. And on the right side you have Codex Spark and boom, you have like 1200 a second on average. This is insane speed. And so when you want to build something, let's say a game: right before we started this conversation, I actually went to the Codex app and I made a quick prompt to create Behind the Crossing, a little 2D game where I can start building. What I also love doing with the Codex app when I'm in the flow is taking the Codex app like this and popping the conversation out on top of the screen, right? And so this way, if I'm actually working on this game, I can keep iterating and have more ideas. I don't know what we want to do. Do you have an idea, Peter, for what you would like to change on this game?

A (7:07)

It's funny, I do this a lot and drive them. Yeah, like often I'll. Okay, so there's like various kinds of changes, right? There's like the super simple change, you just go straight in. You just prompt it. Yeah, right. Then there's like sort of a medium complexity change where maybe you'll like reason about how to do it or ask for a specific plan. But something that I actually do is kind of like similar to this where if I have like a vague idea, I might just go into Codex and just ask it to start like thinking about how it might solve a problem. I don't even have a feature in mind. And then like, you know, it'll go explore and like ask me some questions. And like, in my case, I often don't end up even using that thing because maybe this is quite a complex change. There's a digression here, what code do PMs write is an interesting thing to get back to, but for a complex change, I don't actually want to be on the hook for landing and maintaining that change, but I'll still go through the motions of a plan mode and exploring it and then I just have a better mental model of what we need to do. And then not the plan itself, but just the thinking becomes something that I share with an engineer, I feel like.
So to take that digression briefly, the designers on the Codex team write more code now than was written by an engineer like six months ago. They're absolutely goated. But obviously the tool is a massive part of this. And the team was making fun of me for not landing that many PRs in the last year. I'm not going to give you the number, but I'm like, yeah, it should be more. Especially when you consider how many of those were very small tweaks. But I feel like we're at a point now where like it's not about can you generate the code? Like the agent is amazing, you can delegate tasks to it. It starts to be a point of like, what are you deciding to do? That's actually super important. Like, are we aligned on what this thing is becoming? And then on the other side of it, it's like, how are we making sure the thing is really high quality? Like, you know, like some folks will say, proudly say, like, the entire app is vibe coded. Like, in the case of Codex, like, the vast majority of code was generated by an agent, but we still spend a lot of care and attention, like, thinking about the system and making sure it's really high quality. And so that's why, for instance, if there's a really complicated feature, I often will make sure there's like, a more robust stable owner to own it. And I don't. I don't think you want. Part of the value of a PM is they can be like, super distracted and they go around. And so you don't necessarily want PMs owning these systems.

A (11:10)

There's something that's really interesting about building in this space, which is that developers love just automating tools for themselves, building tools themselves, automating parts of the work. And so I feel like a really important part of the product is that it's like super configurable, right? And so, like, for us, Codex, like the harness is open source. Like, you can go deep in, like whenever we're building a feature, we start getting complaints on Twitter that the feature, which by the way is like not enabled in prod, is like broken because people are going in and like changing the code themselves or forking it to like get these new features working. But for me, that's like an awesome part of the product, right? And what that means is like, the cutting edge of your users are just absolutely living in the future with us and pulling us into that future. On the other hand, though, if you only build for that, you end up with this thing that's nearly impossible to understand. And you'd have to spend all day on Twitter, like you were saying. And so we kind of have this view of like, we are really careful about, like, what the core primitives of what we're building are. Like, that's the place where the stuff will be written down. And it's not just like a vibe coded thing. It's like, we're really thoughtful, like, okay, how do we mostly let the product be almost invisible, get out of the way of the model and just let the model, every time the model gets better, just do more and more? Then from there, how do we package this in as configurable a way as possible for power users so they can figure out what it is? For instance, there's an implementation of subagents that's out in the wild right now, and people are using it and experimenting and we're learning a ton from them. Even though we don't actually trigger that proactively in the product. It's just something that users can learn about and go use.
And then we learn from how people are using it and then from there we think about, okay, now how do we make that super simple for everyone else? So like, the Codex app is actually an example of this, where around the time of like, I would say like GPT 5.2 Codex in December, all of a sudden, like it was like incremental steady model progress, but we just kind of cleared this point where you could start delegating way longer tasks to the model and it would just like one shot it anyways. Yeah. And so we started to see like people were already tmuxing, or, you know, for anyone who doesn't know what tmuxing is, people were already like running many parallel sessions in the terminal. But we started seeing like crazy things on social media. Like there's this one picture of like Peter Steinberger, you know, the creator of OpenClaw, with like, I don't know, like 18 terminals across, like, three monitors. So we started seeing people like using Codex in this very advanced way. We were very excited. We kept making sure the delegation worked well in the basic product, like the CLI. But then we were like, okay, like maybe the top 1% of engineers are going to work that way. How do we make this feel really intuitive? And so then we got to the Codex app, which you launch. It just feels super simple. It's just like a chat. It'll do work. But then you start discovering, oh, there's a sidebar. Oh, I can run multiple tasks. Oh, it's like really easy for me to click between them. Okay. Now I'm like being really effective myself. And then it's like, ah, there's a skills tab. Let me like go into here. And so we try to like make it so it's almost like playing a game. You're just like discovering what's next.

A (15:39)

Okay, so it's like, neither. And I got some really good advice from a researcher here called Andre, and his advice for me was that at OpenAI, you either plan near term or long term, but you never plan medium term. It's just too difficult. So near term is like, up to eight weeks from now, eight weeks being the absolute maximum. What is a concrete thing that you can, like, motivate a team to, like, rally together around and get done? And this is something that we're really good at at OpenAI, like, kind of like rallying a team around, like a thing that we want to do. Yeah. The other thing you can do is you can kind of have a vibe that's like, you know, like a year from now, we're going to have models that are way smarter. They're going to be able to do, like, you know, I'm rewinding back a little bit in time now. You know, you can be thinking, because now, like, what I'm about to say is, like, obvious, and it's obviously less than a year from now, but you might be like, yeah, we're going to have models and we're not going to want to lend them our computer to do work, because then we can only do one thing at a time. We're going to want, like, infinitely many models and they're just going to be doing work independently, like validating their own work, maybe even deploying the code themselves and monitoring it themselves. And we shouldn't even have to prompt them necessarily. And so you kind of think ahead to, like, this kind of vibe. Right. And the in between thing is just kind of awkward. So the in between thing is like a product roadmap. We just, we basically don't really have those. We have the combination of like a sort of long term direction and like, things that we think bring us in that direction. So, for instance, in the case of the Codex app, like, one of the strategic goals that we had was to dissociate ourselves from a specific workspace. Okay, so that's a bit abstract.
What I mean is that, like, if you're using an IDE like VS Code, which is my favorite IDE, you open VS Code to a specific workspace.

A (17:21)

A specific checkout of the code. Yeah, a specific folder. Even if you're using git worktrees, you can only open it to one git worktree at a time. You basically can only work on one thing at a time. And the same is true for a CLI as well. Because we have this vision that we want people to be working with agents that they've delegated to in the cloud that are just working independently, we know we need to get to a point where it feels really natural to be talking to multiple agents at a time, or even just one agent that's orchestrating multiple agents for you. However, we've also learned that if you start in cloud, it can be quite hard for the developer to get value because your tools aren't there. You've got to do environment setup. It's a little bit hard to get partial credit for a task because maybe if the model goes halfway, you need to jump in and course correct or just poke at things. So we're like, okay, we need a local experience that is separated from a specific folder, but yet feels super intuitive to work with folders on your computer. And so when we started the app, we had a bunch of this, like, vibes thinking up here, like esoteric vibes thinking. And then we had a bunch of like, prototypes that random engineers had built that were just like, I wish we had an app. And it was like this or that or the other. And there was actually a hack week where like multiple independent people built different versions of apps. You might have even built one, I don't remember. And so the project, when it got started, the only thing that really needed to be written down was why we thought it was a good idea to build an app. Like, there was no specific spec for the app. And, you know, eventually we generated one by, like, through building. But really it was like, quite contentious actually. Like, should we even be building an app? Like, the IDE extension is super popular. Should we just focus on that and improve the quality there? What about CLI?
Like, feels like CLIs are a thing and then if we are building an app, like, what is the point of building an app and where should we go? So that's kind of how these things start.

A (22:16)

Okay, so in my case, I was thinking about this recently because I realized that I don't know how to answer that question. And I think what I realized is that I have these, like, different modes that I operate in. And, you know, this is not advice. This is just me, but I think I have a mode, like before, for example, when we were shipping the app, which is just like straight up execution, you know, obsessing over quality, making sure we're looking around all the corners and, like, landing every little bit of thing. And that mode is like spending a lot of time in Codex, actually, because, you know, we tend to use Codex a lot to, like, understand what's happening. Like, I use Codex a ton to understand, like, what is happening in Slack. Like, what is the feedback we're getting. I'll have Codex just go and like summarize that, follow up to Linear. So there's like a lot of just understanding the state of quality using Codex. Then there's a lot of using Codex to understand, like, just things about the code and then using Codex to make changes. Because nowadays it's like for a small change, that's like not building a new system, which again I try to avoid, but like, you know, taking care of existing systems, it's like often faster to send a PR that is good and you've tested than it is to like communicate to someone and get them to prioritize that task when they have like 10,000 other things to do. Because we're aiming to launch an app in like two weeks. Yeah, yeah. So there is that. And then you know, obviously there's a lot of human side of just like cheerleading, rallying, but also being a critic of what we're building. So that is one mode that I've noticed and actually you can tell if I'm in that mode if I'm on Twitter a lot. As we approach a launch, I tend to get more on Twitter.
And then there is this other mode which is like where for example, like now it's like quite top of mind for me that we are at a stage where we have these amazing models. Like GPT 5.4 is incredible. We also have this app experience that is even more popular than we anticipated and we now have it on all platforms, including Windows. And so now in my mind I'm like, okay, it's time to really get back to cloud and invest more in that. And so when we enter these kinds of phases, I spend much more time thinking about what to do and understanding what is the state of things. And so that's kind of like a coordinationy mode where actually I am spending less time in Codex. Like I tend to be using Codex more for communication and less for writing code. So I have at least those two modes. There are probably more. Yeah.

A (33:11)

Okay, so I think. I don't think I've. I'm trying to remember, like, what have I said? I feel like it's somewhere on the Internet. I said that I think it's a red flag if a startup has a PM when it's like less than like 20 engineers or something. Maybe. Maybe I said that. I think, like, kind of like what you said, like, all these roles are blurring together. Right? Like, a designer can do more engineering, an engineer can do more design, a PM can do more building. But, you know, also engineers, often they need to be focused, right? So a lot of why they aren't, like, I don't know, triaging tasks or doing some other kind of like the project management side of PMing might be because they just need to spend time coding. But now that that's really easy, and you can just ask an agent like Codex to like, go, like, analyze the feedback and prioritize, you have more time. And so I think everyone's able to do everyone else's jobs. And like, Scott Belsky has this idea of like collapsing the talent stack. Yeah, I like that idea. I think it is happening. I think the fewer people you need in a room to do anything, just the better that thing goes, the more pure every decision is. So then the question is like, well, what is left for PMs? And I think that there are many PMs who should actually convert roles, right? Like if you're a PM who kind of just like always wanted to be an engineer, but maybe you just like you were very good at managing people, but you were like not that good at engineering. Like maybe now you should become an engineering manager, you know, and with a coding agent, like that's fine and maybe that's just a cleaner role for you. I think there's an analogous version where like a different PM might just want to be a designer now, you know, just be closer to building. But I think ultimately what it comes down to is interest. I think interest and agency are like the most fundamental qualities that remain important for humans in a world with AGI.
And so for me that's kind of what I end up thinking about. Like if you fundamentally are more interested in writing code and like you just did PM work because, like, someone needed to do it, now you should delete yourself and become an engineer and just do the same thing from an engineering standpoint. Same for design. But if you are like fundamentally like most interested in like spending a lot of time with users, even if it takes you away from building, right. Or like trying to look around corners and understand where the market is going, etc. And if you are on a large enough team where there's already enough engineers, then I think maybe there's still room for a PM there. But yeah, I think it really comes down to like, what are you most interested in? Okay. And maybe I'll add one thing which is like, I still think every problem needs a human that's accountable for the problem area, but I just don't think that that human has to be a PM.