Latent Space: The AI Engineer Podcast
⚡️GPT5-Codex-Max: Training Agents with Personality, Tools & Trust
Guests: Brian Fioca & Bill Chen, OpenAI
Host: Latent.Space
Date: December 26, 2025
Episode Overview
This episode brings together Brian Fioca and Bill Chen from OpenAI for an in-depth conversation about the evolution and future of coding AI agents—specifically, the new GPT5-Codex-Max model. The discussion focuses on training agents with distinct personalities, advanced tool integrations, building trust with engineers, agentic architectures, and the new abstraction layers enabled by next-generation coding models. The guests share insights into agent design philosophy, real-world use cases, and the cultural and technical shifts underway in AI-powered software engineering.
Key Discussion Points & Insights
1. Naming and Design of Codex-Max (00:35–02:31)
- Origins and Meaning: The name "Max" was chosen to distinguish from previous models and signal its ability to run for extended periods (24+ hours), favoring speed and maximalist operation over deliberate, slower alternatives ("Pro").
- “Max can run for a really long time. Like, 24 hours or more. It’s about speed and maximization, like maximalist.” — Brian Fioca (01:44)
- 'Max' Model Performance: Not just longevity, but improved speed and quality—"better and faster" coding outcomes for the same types of problems compared to previous models.
2. Training for Personality and Trust (03:02–04:11, 09:15–11:36)
- Purposeful Personality: Training the model with a trustworthy “pair programmer” persona was central to increasing developer adoption. Characteristics like communication, planning, and verification were emphasized.
- “It’s really important to build trust with developers...if a model doesn’t act the way that you expect or doesn’t work alongside you as well, you’re not going to really trust it.” — Brian (03:15)
- The model communicates its thought process, announces tool usage, and keeps users informed to foster collaboration.
- Customizing Personality:
- “I created a [more fun] personality for my coding agent…because I want my tools to be fun to work with.” — Brian (11:12)
- However, verbosity can be a downside in long-run agentic tasks; toggling “personality” is easier in general-purpose GPT-5 models than in the more opinionated Codex line.
3. Tool Use & Harness Design (04:36–08:36)
- Integration and Adaptability:
- Codex’s training is deeply tied to terminal tool interactions. Partners found performance improved when they mimicked terminal tool conventions even for tools that don't actually run in a terminal.
- “If you call it ‘grep’, it does a little bit worse. If you call it ‘rg’, it actually does really well.” — Bill (08:24)
- Model “habits” emerge akin to human muscle memory, making harness and naming conventions critical to outcomes.
- Model Generalization: The "5.5.1 non-codex" (mainline) models are more general and steerable for diverse tools, at some cost to specialized performance.
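The tool-naming effect described above can be sketched as a minimal harness design choice: expose the same capability under a terminal-style name the model already "knows" rather than a generic label. The `ToolRegistry` class and `search` function below are illustrative inventions, not part of any real Codex or OpenAI API.

```python
# A minimal sketch of a tool harness that registers tools under
# familiar terminal-style names ("rg", ripgrep's CLI name) instead of
# generic labels, following the guests' observation that naming a
# search tool "rg" rather than "grep" improved model performance.
from typing import Callable, Dict


class ToolRegistry:
    """Maps terminal-style tool names to Python callables."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., str]] = {}

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self._tools[name] = fn

    def call(self, name: str, *args: str) -> str:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](*args)


def search(pattern: str, text: str) -> str:
    """Toy search tool: return the lines containing the pattern."""
    return "\n".join(line for line in text.splitlines() if pattern in line)


registry = ToolRegistry()
# Expose the tool under the name the model's "muscle memory" expects.
registry.register("rg", search)
print(registry.call("rg", "TODO", "done\nTODO: fix tests\nshipped"))
```

The point of the sketch is only the registration name; the callable behind it is arbitrary.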
4. Shifting the Abstraction Layer Upwards (11:55–15:56)
- Agent Layer Abstraction:
- More opinionated, agent-centric design—packaging agents (like Codex) as ready-to-plug-in units for platforms, decreasing the need for continual adaptation to model upgrades or API changes.
- “Rather than focusing on optimizing with every single model release, you can just plug in an agent like Codex into your platform and use it as a box.” — Bill (12:45)
- Implications for Startups: Teams can now focus one layer above, integrating agentic behaviors without heavy lifting on harness updates or resetting for every new model.
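The "agent as a box" idea above can be sketched as a stable interface the platform codes against, so a newer agent can be swapped in without harness changes. `CodingAgent`, `CodexMaxStub`, and `platform_feature` are hypothetical names for illustration, not a real OpenAI SDK surface.

```python
# Sketch of the agent-layer abstraction: the platform depends only on
# a small interface, and any packaged agent that satisfies it can be
# plugged in without touching prompts, tools, or model versions.
from typing import Protocol


class CodingAgent(Protocol):
    def run(self, task: str) -> str: ...


class CodexMaxStub:
    """Stand-in for a packaged agent; a real one would drive a model."""

    def run(self, task: str) -> str:
        return f"[codex-max] completed: {task}"


def platform_feature(agent: CodingAgent, task: str) -> str:
    # The platform never optimizes for a specific model release --
    # it just hands tasks to whatever agent is plugged in.
    return agent.run(task)


print(platform_feature(CodexMaxStub(), "add OAuth to the login flow"))
```

Upgrading to a newer agent then means swapping one constructor, which is the "focus one layer above" benefit the guests describe.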
5. Sub-Agents and Multi-Agent Architectures (14:04–15:56)
- Emergence of Sub-Agents: Codex-Max’s self-managed context windows enable it to spawn and manage sub-agents, parallelizing work and opening up "multi-agent" workflows as a new design paradigm.
- “Codex Max manages its own context window...so it can run basically forever...and hand off its own context to sub-agents.” — Brian (14:46)
- Next-Level Agentic Workflows: The team expects significant evolution in long-running, agent-coordinated tasks and modular integration—agents that use agents and create new abstractions on the fly.
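The self-managed context and sub-agent handoff described above can be sketched as a parent agent that compacts its own history when it nears a budget and hands focused work to fresh sub-agents. The `Agent` class, the word-count "token" budget, and the compaction step are all toy stand-ins for model-driven behavior, not Codex internals.

```python
# Sketch of the sub-agent pattern: a long-running parent agent that
# summarizes older context to stay under a budget (so it "can run
# basically forever") and delegates tasks to sub-agents with only the
# context they need.
from dataclasses import dataclass, field
from typing import List

TOKEN_BUDGET = 50  # toy limit standing in for a real context window


@dataclass
class Agent:
    name: str
    context: List[str] = field(default_factory=list)

    def observe(self, event: str) -> None:
        self.context.append(event)
        if self._tokens() > TOKEN_BUDGET:
            self._compact()

    def _tokens(self) -> int:
        # Crude proxy: count words instead of real tokens.
        return sum(len(e.split()) for e in self.context)

    def _compact(self) -> None:
        # Stand-in for model-driven summarization of older context.
        self.context = [f"summary of {len(self.context)} earlier events"]

    def delegate(self, task: str) -> str:
        # Hand only the task itself to a fresh sub-agent, keeping the
        # parent's window free for further work.
        sub = Agent(name=f"{self.name}/sub", context=[task])
        return f"{sub.name} finished: {task}"


parent = Agent("codex")
for i in range(20):
    parent.observe(f"tool call {i} returned some output tokens here")
print(parent.delegate("refactor module A"))
```

However the compaction is actually implemented, the observable property is the one that matters here: context size stays bounded no matter how long the run.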
6. Trust, Evaluation, and Applied Evals (16:10–20:23)
- Organizational Uptake & Trust:
- At OpenAI, widespread adoption of Codex (“I haven’t written a single line of code by hand in months”—Brian, 15:56) hinges on rigorous evaluation pipelines, robust tracing, and meta-prompting for continuous improvement.
- Evals as a Path to AGI:
- “The path to AGI goes through evals...There are a lot of academic evals...but a lack of evals of the real world on what people care about the most.” — Bill (17:43)
- Tight focus on “applied evals” to capture real user priorities, with platforms for agent traces and rollouts.
- Multi-Turn Evals: Evaluating agents over extended interactions is becoming a core challenge. Ideas include using LLMs as judges over whole trajectories, “job interview evals,” and building agentic harnesses that mimic real-world tasks.
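The multi-turn eval idea above can be sketched as a judge that scores an entire interaction transcript rather than a single answer. Here the judge is a trivial keyword rubric standing in for an LLM judge; the trajectory format and rubric criteria are illustrative assumptions.

```python
# Sketch of a trajectory-level eval: grade the whole multi-turn
# transcript against a rubric (did the agent plan? did it verify?),
# rather than grading one final answer in isolation.
from typing import List, Tuple

Trajectory = List[Tuple[str, str]]  # (role, message) pairs


def judge_trajectory(trajectory: Trajectory) -> float:
    """Return a 0..1 score over the full transcript using a crude
    rubric; a real harness would ask an LLM judge instead."""
    agent_text = " ".join(m for role, m in trajectory if role == "agent").lower()
    rubric = ["plan", "test"]  # proxies for planning and verification
    hits = sum(1 for keyword in rubric if keyword in agent_text)
    return hits / len(rubric)


run: Trajectory = [
    ("user", "Fix the failing build."),
    ("agent", "Plan: reproduce the failure, patch it, then run tests."),
    ("agent", "Patched the import; all tests pass."),
]
print(judge_trajectory(run))  # -> 1.0
```

Swapping the keyword rubric for an LLM judge turns this into the “LLM as judge over trajectories” setup the guests mention, while the harness shape stays the same.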
7. Automation and Beyond Coding (22:28–24:51)
- Personal & Professional Automation:
- Coding agents increasingly used for personal productivity—organizing desktops, automating mundane tasks, handling email, and beyond.
- “I had Codex go through my messy directory...completely organize them...it was wonderful.” — Brian (23:51)
- “I used it for something more boring—organizing my desktop.” — Bill (24:00)
- The lines between coding tools and general automation agents are disappearing.
- “Coding tools are breaking out of coding and just...everything. They're personal automation.” — Bill (24:18)
8. Vision, UI, and 2026 Predictions (24:51–27:06)
- Vision Native Agents:
- Current agents are not yet “vision native”—better integration of visual and UI-based tasks is expected by next year.
- More ‘Computer Use’ Agents:
- Anticipation of agents building their own integrations even with legacy/no-API apps, suggesting a next leap in extensibility and general-purpose automation.
- Democratization of Software Engineering:
- “I wish every company...could turn to a coding model and be like, hey, how do we do this crazy refactor...and have it be so trusted and so right and so smart that we can actually perform better than we could normally get access to.” — Brian (26:16)
Notable Quotes & Memorable Moments
- On Model Habits:
- “Model training is literally like, they develop habits just like a person does.” — Brian (08:36)
- On Model Steerability:
- “If you’re wanting to go bleeding edge coding focused, pay attention to the Codex line...people are having success bending it in ways that maybe we haven’t thought of.” — Brian (06:46)
- On Agent Layer Abstraction:
- “Packaging that up more closely so we’re actually shipping the entire agent together. Then you can actually build on top of that agent.” — Bill (11:55)
- On Model Trust and Testing:
- “At OpenAI, I get to work with some of the most amazing developers...I wish every company...could turn to a coding model and be like, hey, how do we do this crazy refactor...and have it be so trusted and so right and so smart that we can actually perform better than we could normally get access to.” — Brian (26:16)
- On Automation Beyond Coding:
- “Coding tools are breaking out of coding and just like everything, they're personal automation agents.” — Bill (24:18)
- On Evals and AGI:
- “The path to AGI goes through evals...applied evals is capturing all of those sorts of real-world use cases and things for us to hill climb together.” — Bill (17:43)
Timestamps for Important Segments
- Naming & ‘Max’ Philosophy — 00:35–02:31
- Training for Trust & Personality — 03:02–04:11, 09:15–11:36
- Tool Use, Harness Design, and Emerging Model Habits — 04:36–08:36
- Model Differentiation: Codex vs. Mainline Models — 06:40–07:38
- Move toward Agent Layer Abstraction — 11:55–15:56
- Sub-Agents & Parallel Workflows — 14:04–15:56
- Evaluations: Applied Evals and Multi-Turn Challenges — 16:10–20:23
- Personal Automation Use Cases — 22:28–24:51
- 2026 Predictions & Vision for AI Engineering — 24:51–27:06
Final Thoughts
Brian and Bill’s discussion paints a compelling vision for agentic AI in software engineering: faster, more reliable coders with rich, modular personalities and a growing capacity for trust. The abstraction is moving upward—soon, entire teams may simply integrate and build atop intelligent agents that handle everything from code integration to personal productivity. The guests foresee a future where agentic behaviors, enhanced by persistent context and evaluative feedback, radically democratize coding and automation—expanding the reach of “coding agents” far beyond code itself.
Contact & Feedback:
Brian and Bill invite listeners to share feature requests and feedback via email or social media, highlighting OpenAI’s open stance toward collaborative product development (27:11–27:29).
