Latent Space: The AI Engineer Podcast
Episode: ChatGPT Codex: The Missing Manual
Date: May 16, 2025
Episode Overview
This episode dives into OpenAI’s ChatGPT Codex—their latest software engineering agent for code generation, automation, and developer workflows. Hosts Alessio and Swyx are joined by Alexander (from the Codex product team) and Josh, both instrumental in building and shaping Codex’s features and philosophy. The conversation dissects Codex’s origins, technical architecture, best practices, the future of AI-powered code engineering, and real-world usage advice direct from those at the frontier.
Key Discussion Points and Insights
1. Origins and Evolution of Codex
- Alexander’s Journey:
- Previously worked on "multi," a native macOS pair-programming tool aimed at human-to-human collaboration.
- With ChatGPT’s rise, shifted to exploring human+AI pairing, leading to joining OpenAI.
“We were...working on human to human collaboration. And…what if...it was like a human pair programming with an AI?” – Alexander [01:59]
- Josh’s Journey:
- Founded Airplane (internal tool dev platform), later moved to Airtable to lead AI engineering.
- Saw the looming “moon landing” moment as AI agents became capable software engineers.
- Joined OpenAI to shape the form factor and purpose of Codex.
“I think we are going to build an agentic software engineer...whether or not I was involved in the next two years.” – Josh [06:58]
- Codex’s Core Concept:
- The breakthrough was giving AI agents access to terminals and computers, making them function as autonomous software engineers rather than just code-completing models.
“We realized...it’s just super valuable to figure out how to give a reasoning model access to, to a terminal. And then now we have to figure out how to make that a useful product and how to make it safe.” – Alexander [04:20]
2. Codex: Model vs. Agent
- Beyond Code Completion:
- Codex reframes the AI’s role from a “coding model” to an “agent” that autonomously executes software engineering tasks end-to-end.
- This required new strategies for integrating tool use, environment management, PR descriptions, and test results.
- Hosted vs. Local:
- Codex available in ChatGPT is essentially a hosted (cloud-based) Codex CLI, but with significant architectural and UX differences.
- Emphasis on form factor flexibility, scalability, and collaboration.
“This is not just a model that's good at coding, but rather this is an agent that is good at independent software engineering work.” – Alexander [09:24]
Memorable Quote:
“The feeling is...it takes a leap of faith. The first few times you’re like...not really sure if this is going to work...but then it comes back and it’s like, wow, this agent went out, wrote a bunch of code, wrote scripts...tested this and it really went through the full end to end.” – Josh [12:47]
3. Best Practices for Using ChatGPT Codex
Advice for Maximum Effectiveness
- Adopt Modern Development Practices
- Use linters and formatters: helps agents verify code in the loop.
- Commit hooks become valuable agent checkpoints, even if humans find them tedious. [14:21]
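The episode doesn't prescribe specific tooling, but as a sketch of the idea, a minimal `.pre-commit-config.yaml` (assuming the open-source pre-commit framework; the hook choices and versions are illustrative, not from the episode) gives the agent exactly this kind of in-loop checkpoint:

```yaml
# Illustrative .pre-commit-config.yaml: each hook is a check the
# agent can run (and re-run) to verify its own changes.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4
    hooks:
      - id: ruff          # linter
      - id: ruff-format   # formatter
```

The same hooks a human might find tedious become cheap, deterministic feedback for the agent on every commit.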
- Invest in AGENTS.md
- Codex reads from AGENTS.md files for agent-specific instructions.
- Hierarchy-aware: can differentiate instructions across subdirectories.
- Start simple, gradually evolve as Codex interacts with your repo.
“A simple agent's MD will get you a long way...We would really like to do is auto generate this...but we figured we ship faster rather than later.” – Josh [15:31]
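To make the "start simple" advice concrete, a starter AGENTS.md might look like this (contents are illustrative, not from the episode; the commands assume a hypothetical Node/TypeScript repo):

```markdown
# AGENTS.md (repo root)

## Setup
- Run `npm ci` once before making changes.

## Checks
- Run `npm run lint` and `npm test` before finishing a task.

## Conventions
- Prefer TypeScript over JavaScript for new files.
- Keep PR descriptions to: what changed, why, and how it was tested.
```

Because Codex reads these files hierarchically, a subdirectory can carry its own AGENTS.md with narrower instructions that apply only there.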
- Codebase Clarity
- Modular and well-structured codebases help Codex (and humans).
- Good PR descriptions and discoverable codebases reduce onboarding friction.
- Choose type-safe languages (TypeScript over JavaScript) and modular architecture for easier AI navigation. [18:40]
- Name things in a way that’s efficient for both humans AND AI agents.
On the Difference Between README and AGENTS.md
“Maybe agents MD ends up being the stuff that you need to tell the agent that it’s not automatically figuring out from the README.” – Alexander [23:12]
“For agents, I don’t think you really had to tell it code style. It looks at your code base and writes code that's consistent to that. Whereas a human’s not going to take...their time to go through the code base and follow all the conventions.” – Josh [25:02]
4. Product Decisions and Model Philosophy
- Prompt-Driven or Deterministic?
- Codex leans on prompting and trusting the model over hard-coding deterministic pipelines.
- The philosophy is to train the model to “learn” good practices instead of enforcing them via scaffolding.
“The model isn’t all the product, but the model is the product.” – Alexander [29:12]
- Delegation Mindset
- Optimal use: fire off many tasks, don’t micromanage or watch step-by-step.
- Speed and parallelization welcomed: “the more you’re running in parallel, actually, I think the happier we are.” – Alexander [39:55]
- Designed for an abundance mindset, not a resource-constrained one.
- Long-running, Fully Autonomous Tasks
- Codex is built for tasks that may take up to an hour (currently capped).
- Focus is on one-shot, high-value outputs versus continuous human-in-the-loop or multi-stage workflows.
“What we see as the role of Codex...is to really push frontier on that sort of single shot. Autonomous software engineering.” – Josh [46:10]
5. Technical Details & Safety
- Environment & Compute Platform
- Agents get their own isolated compute environments with customizable setup scripts (often to install dependencies).
- No external internet access while agents run (to avoid exfiltration and ensure safety).
- Humans can interactively tweak environments (limited REPL available).
“We would like to give humans and agents alike as much access as possible within safety and security constraints...But once the agent starts running...we'll cut off internet access...For now.” – Josh [41:49, 42:57]
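As a sketch of what such a setup script might look like (hypothetical; the exact hooks and base image Codex provides may differ), the key constraint is that all dependencies must be fetched up front, since network access is cut off once the agent starts:

```shell
#!/usr/bin/env bash
# Hypothetical Codex environment setup script. It runs once, while
# the network is still reachable; after it finishes, the agent
# begins work with internet access disabled.
set -eu

status="incomplete"

# Fetch dependencies only for the stacks actually present in the repo.
if [ -f package.json ]; then
  npm ci
fi
if [ -f requirements.txt ]; then
  pip install -r requirements.txt
fi

status="ready"
echo "environment $status"
```

Anything the agent will need at runtime (packages, test fixtures, compilers) has to be installed here, because there is no second chance to download it later.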
- Concurrency & Rate Limits
- Rate limits: currently 60 runs per hour per user.
- Average run time is well under 30 minutes; hour-long tasks are rare.
- Parallel tasks and rapid iteration are encouraged.
6. Real-World Use, Feedback, and Vision
- Workflow Change
- Codex isn’t meant to be used like a chat or paired-interactive coding session. Instead, fire-and-forget multiple tasks and collect outputs.
“I found out I was using it wrong. I was using it like Cursor. Like...I had my chat window open and I watched it code...you guys are just firing the things off and...going on about your day.” – Swyx [40:13]
- Iterative Experimentation
- Codex is in research preview; user feedback will shape everything from environment setup to pricing.
- OpenAI’s philosophy is to “iterate, release quickly, and generalize successful agent behaviors to larger models.”
- Future Features
- Multimodal inputs, more network access, tighter IDE integration, and even easier environment customization (Docker, Dev Containers) are top priorities for moving out of research preview.
“Some of the items...are top of mind for me are multimodal inputs...just giving it a little bit more access to the world...UI...not the final form...” – Alexander [47:42]
Notable Quotes & Memorable Moments
- On Shifting Developer Mindset:
“You must have an abundance mindset and you must think of it as not using your time to explore things...The more you’re running in parallel, actually, I think the happier we are.” – Alexander [39:55]
- On Building for the Future:
“We really want to get as much of that complexity, as much of that state machine as possible pushed into the model.” – Alexander [30:58]
- On the Difference Between Human and Agent Practice:
“For agents, I don't think you really had to tell it code style...Whereas a human’s not going to take its time...to go through the code base and follow all the conventions.” – Josh [25:02]
- On Feedback and Environment Customization:
“How do you get close that last 30, 40%?...Would love feedback from folks on how they would like to see their environment customized. Do they want to just ship us a Docker image?...still very much an open question.” – Josh [50:53]
- On the AI Hiring Analogy:
“If you start with a base reasoning model...you have this...weirdly spikily intelligent...college grad...What we’ve done with Codex 1 is basically give it its first few years of job experience.” – Alexander [16:47]
Timestamps for Major Segments
- 01:59 — Alexander’s journey to OpenAI & Codex origins
- 06:58 — Josh’s path and vision for agentic software engineering
- 09:00 — Hosted vs. local Codex: What’s the difference?
- 12:47 — Why Codex “feels different” from other coding assistants
- 14:21 — Best practices: linters, commit hooks & modular code
- 15:31 — Using and evolving AGENTS.md files
- 18:40 — Codebase clarity, naming, and agent ergonomics
- 23:12 — Distinguishing README and AGENTS.md
- 29:12 — Codex’s model-driven product philosophy
- 39:55 — Rate limits, concurrency, and productive workflow advice
- 41:49 — Compute platform, safety, and environment management
- 46:10 — Codex’s role and ambition: one-shot autonomous engineering
- 47:42 — What’s left for Codex post-research preview?
- 50:53 — Call for feedback on environment customization
Closing Notes & Calls to Action
- Try Codex now in ChatGPT Labs! Generous research preview limits—try all workflows and provide feedback, especially on how to best customize your agent’s environment.
- Feedback sought: OpenAI wants to hear how Codex works in your stack, which setups/environments you’d want supported, and any pain points.
- Vision: OpenAI’s focus is not just developers—it’s taking notes from this experiment for a more general future of agent-powered work across all disciplines.
Visit latent.space for show notes, follow-up links, and more.
Summary compiled in the spirit and language of the episode’s lively, technical, and open-dialogue tone, to provide both actionable insights for practitioners and context for non-listeners.
