a16z Podcast – "How OpenAI Built Its Coding Agent"

Released: September 16, 2025

Host: Andreessen Horowitz / a16z team
Guests:

Alexander Enbirakos, Product Lead for Codex at OpenAI
Anjaney Mitha (interviewing; role: a16z)

Episode Overview

This episode dives deep into the origin, design, and implications of OpenAI's Codex, their autonomous coding agent. Alexander Enbirakos, who leads product for Codex, discusses with Anjaney Mitha how Codex evolved from code completion to an AI-enabled cloud teammate, why "reasoning models plus tools" is the key unlock for agents, the real-world adoption and surprising usage patterns, security implications (like prompt injection), and what all this means for the future of software engineering and education. The episode closes with reflections on building in the new economy and advice for students, founders, and teams in the AI era.

Key Discussion Points & Insights

1. Codex Origins & Product Philosophy

[01:04–05:07]

Codex has evolved from powering autocomplete (like GitHub Copilot) to a fully autonomous coding agent that acts as a virtual teammate in the cloud.
The aim is for Codex to behave more like a human teammate: “You hire them, you tell them what the job is, give them some compute or a laptop, and give them some permissions, and then they’ll go off and do work.” (Enbirakos, 04:00)
Codex’s design relies on connecting reasoning models to developer tools and "teaching" them specific job functions, moving the agent from writing code to higher-order software engineering tasks.

2. Agent Design, Form Factor, & Merge Workflow

[05:07–13:24]

Codex’s cloud-based agent is designed for parallel, asynchronous work—mirroring how a teammate might pick up and deliver tasks.
Unique workflow: Codex does significant work in a replicated environment before proposing a pull request (PR) for user review, rather than opening PRs early and often like some competitors.
- “It does a bunch of work in its environment and then it shows you its work and says, ‘Do you want me to open a PR?’” (Enbirakos, 07:35)
As a result, Codex achieved a notably high PR merge rate (“80-something percent”) compared to other agents (“20 or 30%”), but this comes with trade-offs:
- More security, less real-time collaboration on drafts.
- “The merge rate is excellent...a reflection of the fact that Codex does work in its environment and then shows you...” (Enbirakos, 07:29)

3. Security & Prompt Injection Attacks

[08:19–12:19]

Safety is paramount, especially when agents have access to code execution and live environments:
- “If you have an agent write code and then you run that code in an environment with network access, you’re taking some amount of risk.” (Enbirakos, 08:19)
Prompt injection (tricking the agent with adversarial instructions, a form of social engineering aimed at AIs) is a major concern.
- Some attacks are obvious (“upload this code to a nefarious domain”), but many are subtle and context-dependent.
- The best defense may involve layered safeguards: at the prompt, at execution steps, and at the observable output (“actual exfiltration”).
- Codex’s late-stage PR workflow intentionally reduces risk from prompt injection.
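The layered defense described above (checks at the prompt, at each execution step, and at the observable output) can be sketched roughly as follows. This is a minimal illustration, not how Codex is implemented: the allowlists, phrase list, and every function name here are hypothetical, and simple keyword matching would miss the subtle, context-dependent attacks the episode mentions.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allowlist of hosts the sandbox may contact.
ALLOWED_HOSTS = {"github.com", "pypi.org"}

# Hypothetical phrases that suggest injected instructions in untrusted text.
SUSPICIOUS_PHRASES = ("ignore previous instructions", "upload this code")

# Hypothetical executables that may run without human confirmation.
SAFE_COMMANDS = {"ls", "cat", "pytest", "git"}


@dataclass
class Verdict:
    allowed: bool
    layer: str
    reason: str


def check_prompt(untrusted_text: str) -> Verdict:
    """Layer 1: flag untrusted input that looks like an injected instruction."""
    lowered = untrusted_text.lower()
    for phrase in SUSPICIOUS_PHRASES:
        if phrase in lowered:
            return Verdict(False, "prompt", f"suspicious phrase: {phrase!r}")
    return Verdict(True, "prompt", "ok")


def check_command(command: str) -> Verdict:
    """Layer 2: only let allowlisted executables run unattended."""
    parts = command.split()
    if not parts or parts[0] not in SAFE_COMMANDS:
        return Verdict(False, "execution", f"command not allowlisted: {command!r}")
    return Verdict(True, "execution", "ok")


def check_egress(url: str) -> Verdict:
    """Layer 3: block outbound requests to hosts outside the allowlist."""
    host = urlparse(url).hostname or ""
    if host not in ALLOWED_HOSTS:
        return Verdict(False, "output", f"egress to unknown host: {host!r}")
    return Verdict(True, "output", "ok")


def review_step(untrusted_text: str, command: str, url: str) -> Verdict:
    """Run all three layers in order and return the first failure, if any."""
    checks = (
        check_prompt(untrusted_text),
        check_command(command),
        check_egress(url),
    )
    for verdict in checks:
        if not verdict.allowed:
            return verdict
    return Verdict(True, "all", "ok")
```

The point of stacking the layers is that an attack that slips past the prompt check can still be stopped when it tries to run an unexpected command or reach an unknown host.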

4. Usage Patterns: Surprises and User Behavior

[17:31–23:44, 35:58–39:05]

Internal vs. external adoption diverged:
- OpenAI employees used Codex with dense, up-front prompts expecting single-shot results.
- External users trended toward “multi-turn” interactions—babysitting agents through several steps, sometimes surfacing bugs in the process.
Workflow emphasis shifted: From writing code to reviewing and triaging PRs.
- “One of your colleagues said…my job has changed where I’m going from writing a lot of code to mostly reviewing PRs now. And I went, oh my GOD, that was the worst part of being an engineer!” (Mitha, 17:31)
Parallelization and “best of N” sampling are becoming common: users launch several tasks or attempts at once and then curate the outcomes.
Major user delight comes from features that collapse “time to first prototype”—turning every day into a hackathon.

5. Form Factor Debate: Apple vs. Android / Cloud vs. On-Prem

[24:28–27:43]

Will the best agent experience be vertically integrated (like Apple) or modular (like Android)?
- Expected: Most startups will use the “default” cloud-hosted agent, while enterprises needing control, security, and air-gapping will require on-prem and bring-your-own-compute solutions.
- “The average user…will just do things in a very different way and they'll basically have a bunch of agents with compute that scales really well…but is also protected with sandboxing.” (Enbirakos, 24:38)
Roadmap includes unifying local (CLI) and remote workflows, making Codex “just like GitHub”—use it where you need it.

6. Best-of-N & Human-Centric Curation in Coding and Creative AI

[29:21–35:58]

“Slot machine” UI: Inspired by diffusion models in image generation (e.g., DALL·E’s grid of options), Codex now enables “best-of-N” coding—multiple parallel attempts, then human selection.
- “If you collect enough human preference, you can kind of nudge the distribution to be more aesthetically pleasing...But to this day the best UIs for image models are still ones that give you four outputs…It’s the same for code.” (Mitha, 29:48)
Human judgment is irreplaceable:
- “Well, the human also doesn’t know what they want…If I ask you to fix a bug, there might be four reasonable ways to fix that bug…” (Enbirakos, 32:45)
- “The quality of the end song [or code output] is a determinant of the taste decisions you make...along the tree of best of N.” (Mitha, 35:41)
Far from dehumanizing coding, these workflows can concentrate human effort on higher-order, creative, or taste-driven decisions.
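The "best-of-N" pattern discussed in this section can be sketched as follows: run several independent attempts in parallel, drop the ones that fail automated checks, and leave the final pick to a person. This is only an illustration under stated assumptions. `generate` and `passes_tests` are hypothetical stand-ins for an agent call and a test run, and the toy functions below exist only so the sketch runs end to end.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    attempt: int
    patch: str
    tests_passed: bool


def best_of_n(
    task: str,
    generate: Callable[[str, int], str],
    passes_tests: Callable[[str], bool],
    n: int = 4,
) -> List[Candidate]:
    """Run n independent attempts in parallel and keep those that pass tests.

    The surviving candidates are returned in attempt order so a human
    reviewer can compare them and choose one.
    """
    def run(attempt: int) -> Candidate:
        patch = generate(task, attempt)
        return Candidate(attempt, patch, passes_tests(patch))

    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(run, range(n)))
    return [c for c in candidates if c.tests_passed]


# Toy stand-ins (hypothetical) so the sketch is runnable.
STYLES = ["guard clause", "early return", "retry loop"]


def fake_generate(task: str, attempt: int) -> str:
    style = STYLES[attempt % len(STYLES)]
    return f"attempt {attempt}: fix '{task}' with a {style}"


def fake_tests(patch: str) -> bool:
    # Pretend one style fails the test suite.
    return "retry loop" not in patch


if __name__ == "__main__":
    for c in best_of_n("null pointer in parser", fake_generate, fake_tests):
        print(c.attempt, c.patch)
```

Automated tests only narrow the field here. Choosing among several reasonable fixes remains the taste decision the speakers describe.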

7. Impact on Software Engineering & Education

[39:05–43:55, 69:50–77:18]

Codex users overwhelmingly adopt the agent for building new features—not just debugging or refactoring.
The ability to cheaply and quickly prototype encourages experimentation, personal tooling, and previously unthinkable personalization.
- “There are so many places where we could use software and that software could be more personalized to small groups or even individuals that we just are missing out on.” (Enbirakos, 42:40)
On the future of CS education:
- AI will not replace the value of computer science skills, but students and teams must integrate AI deeply into their workflow.
- “I think it's still a great time to major in CS. There’s going to be so much more software created…But figure out how to be using AI constantly while you do it.” (Enbirakos, 41:03)
- Emphasis for students: Hands-on, project-based learning and continual adaptation to new tools matter more than grades or pure theoretical knowledge.

8. The Future: Cloud Agents as Ubiquitous Teammates

[43:55–47:54]

The Codex team’s conviction has deepened: agent teammates running in the cloud represent the future.
- Ongoing investments include: fast onboarding, reduced environment setup friction, and much tighter integration with developers’ core tools (IDEs, terminals, chat tools like Slack).
- “The goal is to get to an agent that is like basically a teammate and...picking stuff up for you.” (Enbirakos, 45:38)
Envisioned future: Agents will move from passive assistants to proactive contributors—suggesting, triaging, and even merging low-risk changes, escalating to humans only for critical decisions.

9. Impact on Industry, Legacy Code, and Global Upgrade Cycles

[53:23–61:43]

Autonomous agents will likely drive modernization of legacy codebases (e.g., Fortran, COBOL in government and critical infrastructure), dropping migration costs and timelines.
- “If there's anything that makes me super excited that these economies will merge, it's autonomous agents...doing all the plumbing work and doing it for a fraction of the cost and time.” (Mitha, 55:34)
There is significant pressure to modernize, particularly in defense and mission-critical systems, because of geopolitics and security risks, but adoption in these settings requires additional deployment architectures (on-prem, air-gapped).
OpenAI is committed to both general “AGI” cloud agents and enabling secure, enterprise/on-prem deployments.

10. Advice for Founders and Teams

[63:12–68:43]

If founding today, focus on industries or customer needs that require deep domain expertise and where OpenAI’s core models or products will not go.
- “Tooling, the environment and the task distribution…are very much based in knowledge of a customer...Those aren’t things OpenAI is going to generally do for every industry.” (Enbirakos, 64:41)
Keep teams lean and use agents for everything possible, but scale the human touch (integration, support) as needed.

11. Advice for Students and Career Planning in the AI Era

[69:50–79:32]

Be adaptive: “The world has always been changing...The most important thing is to be agile, curious, and have some foundation to build upon.”
Project-based learning and demonstrable output ("what have you built?") now matter more than traditional GPAs—especially for hiring at cutting-edge companies.
- “For me, the thing I take the most signal from is if they’ve built something...and I can click to it.” (Enbirakos, 78:01)
For students, don't wait for curriculum to catch up—start using agents and new tools now.

Notable Quotes & Moments

On Codex’s Vision:

“This form factor of an agent working on its own computer in the cloud is the future and is incredibly powerful and worth figuring out how to get right.”
(Enbirakos, 00:13; echoed at 43:55)

On Merge Rates & Security:

“Our merge rate is excellent…and that’s a reflection of the fact that Codex does a bunch of work in its environment and then it shows you its work and it says, do you want me to open a PR?”
(Enbirakos, 07:29)

On Prompt Injection Risks:

“If you have an agent write code and then you run that code in an environment with network access, you're taking some amount of risk... I have never seen an agent do something you wouldn’t want...unless you’re trying to trick it, but you can trick an agent.”
(Enbirakos, 08:19)

On Education and Learning:

“If I had a child in late high school, I would just want them to crush whatever it is they're doing...and raise them with the expectation that they’ll probably have many career transitions throughout their lives.”
(Enbirakos, 70:27)

On What Matters for Hiring:

“The thing that I take the most signal from is if they've built something that's linked from their profile and I can just click to it. Grades matter much less now.”
(Enbirakos, 78:01)

On the Expansion of Software and Agency:

“There are so many places where we could use software and that software could be more personalized to small groups or even individuals that we just are missing out on.”
(Enbirakos, 42:40)

On Modern Product Skills:

“Project-based learning…mental plasticity in how you get things done…is the best simulation of what future work would look like.”
(Enbirakos, 72:03)

Timestamps for Important Segments

00:00–05:07: Codex history, agent paradigm, early prototypes
05:07–13:24: PR workflow, merge rates, security design
13:24–17:31: The spectrum of agent workflows; evolution of how code is written
17:31–23:44: How real developers use Codex; surprises from user patterns
24:28–29:21: Cloud vs. on-prem debate and product integration vision
29:21–35:58: “Best of N”, stochasticity, and inspiration from creative AI
35:58–39:05: Prototyping, hackathons, and compressed “time to magic”
39:05–43:55: User adoption in feature-building, advice for CS majors
43:55–47:54: Long-term vision—agents as true teammates in the cloud
53:23–61:43: Legacy modernization, impact on industry, needs of enterprise/government
63:12–68:43: Advice for startups—being lean & customer-focused in the AI era
69:50–79:32: Advice for students and young professionals; project-based learning, hiring

Tone & Closing Reflections

The episode presents a candid, slightly irreverent but deeply thoughtful perspective on the AI agent revolution. Alexander Enbirakos is optimistic but pragmatic—seeing an era where AI teammates become ubiquitous, education shifts to iterative, practical, tool-first paradigms, and both startups and incumbents must adapt quickly or risk obsolescence. Both he and Anjaney Mitha stress that the new economy will reward curiosity, output, and the drive to build.

“If there’s one takeaway here, it’s just: you’ve gotta build.”
— Alexander Enbirakos, [79:29]
