Latent Space: The AI Engineer Podcast
Episode: DevDay 2025: Apps SDK, Agent Kit, MCP, Codex and why Prompting is More Important than Ever
Date: October 7, 2025
Host: Latent.Space (Alessio and Swyx)
Guests: Sherwin & Christina, OpenAI Platform Team
Overview
This episode covers OpenAI’s third annual DevDay, focusing on the major launches and trends shaping AI engineering in 2025. Guests Sherwin and Christina from the OpenAI Platform Team discuss the debut of the Apps SDK and Agent Kit, integration of the Model Context Protocol (MCP), improvements in code generation and agent building, and the continuously evolving role of prompt engineering. The conversation offers practical insights, memorable moments from DevDay, product philosophy, and candid perspectives from the people designing these core tools for millions of developers.
Key Discussion Points and Insights
1. DevDay: Growth, Community, and Culture
Timestamps: 00:05 – 02:53
- Growth of DevDay and Developer Community:
- DevDay has matured, growing from a small first event in 2023 to its third annual edition, with OpenAI's platform now serving over 4 million developers (01:17).
- Improved organization, including a dedicated podcast studio—a sign of OpenAI’s commitment to developer engagement.
- Community feedback directly influences event logistics and experience.
- OpenAI’s Mission Extended:
- "Our mission at OpenAI is one, build AGI, which we're trying to do, and then... is to bring the benefits of that to the entire world. We really need to rely on developers, other third parties, to be able to do this." – Sherwin (03:22)
- Product & Event Evolution:
- Iterative approach to DevDay and platform features, influenced by real developer use and feedback (02:06, 05:24).
2. Apps SDK: A New Developer Paradigm
Timestamps: 02:59 – 08:54
- Apps SDK as Extension of API Philosophy:
- Expands from exposing endpoints to empowering developers to leverage OpenAI’s massive user base (04:31).
- "I view this [the Apps SDK] as a natural extension...engaged with developers as a way for us to bring the benefits of AGI to the rest of the world." – Sherwin (04:39)
- Builds upon evolutions from plugins and GPTs, with an increasing focus on giving third parties more control over UI and experience (08:06).
- MCP Integration:
- Adoption of the open, general-purpose Model Context Protocol (MCP, originated at Anthropic) as a backbone for agents and tools, facilitating cross-ecosystem interoperability (05:24 – 07:18).
- Collaboration with other AI companies and standards bodies.
- "It's a great protocol. It's very general...very easy for us to integrate because of how streamlined and how simple it was." – Sherwin (07:07)
- Notable OpenAI team contribution: Nick Cooper sits on the MCP steering committee (06:19).
- UI and Experience Inversion:
- The interface model has shifted: ChatGPT becomes the top layer, integrating other experiences inside it, rather than acting as just a plug-in to existing apps (07:18).
- Noteworthy demo: Canva embedded in ChatGPT with rich context.
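For readers new to MCP, it may help to see the shape of a message. MCP frames tool use as JSON-RPC 2.0 requests, and a tool invocation uses the `tools/call` method. The sketch below only builds such a request with the Python standard library; the tool name `search_designs` and its arguments are hypothetical and are not taken from the episode or from any real app.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; any unique value per request works


def make_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP-style JSON-RPC 2.0 request asking a server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool and arguments, for illustration only.
request = make_tool_call("search_designs", {"query": "DevDay poster"})
wire = json.dumps(request)  # the string that would be sent over the transport
print(wire)
```

Transport, initialization, and tool discovery are left out here; the point is only that one small, uniform message format is what makes the protocol easy to adopt across ecosystems.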
3. Agent Kit: From Building Blocks to Production-Ready Agents
Timestamps: 08:54 – 15:06
- What is Agent Kit?
- "A full set of solutions to build, deploy and optimize agents...it all takes a lot of expertise. So [we’re] packaging those learnings into a set of tools that make it a lot easier..." – Christina (09:26)
- Includes the Agents SDK, Agent Builder (visual), Connector Registry, ChatKit, eval tools, and more (10:14).
- Demo Recap and Capabilities:
- Christina’s 8-minute live build of a DevDay website agent drew attention for its speed, completeness, and customizability (08:54).
- Visual workflow: includes advanced nodes such as user approval and state management, which support complex human-in-the-loop applications (11:35).
- Templates and Playbooks:
- Released a library of common agent templates, e.g. “Customer service,” “Data enrichment,” “Document comparison,” based on field experiences (14:33).
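The user-approval and state-management nodes mentioned above can be pictured as ordinary steps in a pipeline that share one state object. The following is a minimal sketch of that idea in plain Python. It is not Agent Builder's exported code or the Agents SDK API; every name in it (`WorkflowState`, `make_approval_node`, the refund example) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WorkflowState:
    """Shared state passed from node to node."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


Node = Callable[[WorkflowState], bool]  # a node returns False to halt the workflow


def draft_refund(state: WorkflowState) -> bool:
    # Stand-in for an agent step that proposes an action.
    state.data["refund"] = {"order_id": "A-1", "amount": 25.0}
    state.log.append("drafted")
    return True


def make_approval_node(approve: Callable[[dict], bool]) -> Node:
    """Human-in-the-loop gate: the callback stands in for a person's decision."""
    def node(state: WorkflowState) -> bool:
        ok = approve(state.data["refund"])
        state.log.append("approved" if ok else "rejected")
        return ok
    return node


def issue_refund(state: WorkflowState) -> bool:
    state.data["issued"] = True
    state.log.append("issued")
    return True


def run(nodes: list, state: WorkflowState) -> WorkflowState:
    """Execute nodes in order, stopping at the first one that returns False."""
    for node in nodes:
        if not node(state):
            break
    return state


approved = run([draft_refund, make_approval_node(lambda r: r["amount"] <= 50), issue_refund], WorkflowState())
rejected = run([draft_refund, make_approval_node(lambda r: False), issue_refund], WorkflowState())
print(approved.log)
print(rejected.log)
```

In a real deployment the approval callback would pause the run and wait for a person; a synchronous callback keeps the sketch self-contained.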
4. Agent Builder: Entry Points, Export, and Round-Trip Design
Timestamps: 12:22 – 15:06
- Dual Usage:
- Agent Builder intended both as a playground for non-coders and as a professional tool with a code export path (13:33).
- "You can export it and run it in your own systems...or get all the benefits of us deploying that for you too." – Christina (13:33)
- Long-Term Vision:
- Future support for bringing external changes back into the builder, running code in Agent Builder, and highly flexible integration paths (13:42).
5. Ecosystem Interoperability & Agent Standardization
Timestamps: 15:06 – 17:25
- Protocol Interop and Open Standards:
- OpenAI is considering promoting protocols for agent workflows and stateful APIs, building on lessons from MCP and the Responses API (15:23).
- Evals now support third-party and open-source models via OpenRouter, aiming for ecosystem-wide benchmarking and seamless multi-model setups (16:54).
- "Would be great if that could be a standard...developers don't need to build three different stateful API integrations if they want to use different models." – Sherwin (16:18)
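To make Sherwin's point about stateful APIs concrete: if providers agreed on one interface, application code could be written once against it. The sketch below is only an illustration of that design idea. `StatefulModelAPI`, `EchoBackend`, and the `respond` signature are invented for this example and do not correspond to any proposed or existing standard.

```python
from typing import Protocol


class StatefulModelAPI(Protocol):
    """Hypothetical shared interface: the provider keeps conversation state."""

    def respond(self, conversation_id: str, user_input: str) -> str: ...


class EchoBackend:
    """Stub provider that stores history on its side, keyed by conversation id."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.history: dict = {}

    def respond(self, conversation_id: str, user_input: str) -> str:
        turns = self.history.setdefault(conversation_id, [])
        turns.append(user_input)
        return f"[{self.name}] turn {len(turns)}: {user_input}"


def ask_twice(api: StatefulModelAPI) -> list:
    """Application code: written once, works with any conforming backend."""
    return [api.respond("c1", "hello"), api.respond("c1", "and again")]


for backend in (EchoBackend("provider-a"), EchoBackend("provider-b")):
    print(ask_twice(backend))
```

Without a shared interface, `ask_twice` would need a separate code path per provider, which is the duplication the quote describes.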
6. Evals: Grading and Improving Sophisticated Agents
Timestamps: 17:25 – 21:16
- Agent Evals and Trace-Based Scoring:
- Current support for grading entire agent traces—a first step toward granular measurement and optimization of long-running, complex workflows (17:44).
- Roadmap includes breaking down traces and integrating more human-in-the-loop evaluations.
- Increasing support for multi-modal and agentic evaluation.
- Prompt Optimization and Rubrics:
- Automated prompt optimization, run in a loop with evals, is a core part of how OpenAI sees the future.
- "It's a really cool time right now in prompt optimization...Not only are there a lot of products gearing around this, but also interesting research [e.g. Databricks]." – Sherwin (20:05)
- Prompt engineering is more vital than ever, even though many expected its importance to diminish (20:51).
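The two ideas in this section, grading a whole agent trace and using eval scores to improve a prompt, combine naturally. Below is a toy sketch under heavy assumptions: `fake_agent` is a deterministic stand-in for a real agent, the rubric is made up, and the "optimizer" simply picks the best of a fixed list of candidate prompts. It is not OpenAI's evals product or its prompt optimizer.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """Everything one agent run did: the steps taken and the final answer."""
    steps: list
    answer: str


def fake_agent(prompt: str, question: str) -> Trace:
    """Stand-in for a real agent; its behavior depends on the prompt wording."""
    steps = ["lookup"] if "look up" in prompt else []
    if "cite" in prompt:
        steps.append("cite")
    return Trace(steps=steps, answer=f"answer to {question}")


def grade_trace(trace: Trace) -> float:
    """Hypothetical rubric scored over the whole trace, not just the answer."""
    score = 0.0
    score += 0.5 if "lookup" in trace.steps else 0.0
    score += 0.5 if "cite" in trace.steps else 0.0
    return score


def evaluate(prompt: str, questions: list) -> float:
    """Mean trace grade for one prompt across the eval set."""
    grades = [grade_trace(fake_agent(prompt, q)) for q in questions]
    return sum(grades) / len(grades)


candidates = [
    "Answer the question.",
    "First look up the facts, then answer.",
    "First look up the facts, then answer and cite sources.",
]
questions = ["When is DevDay?", "What is ChatKit?"]

scores = {p: evaluate(p, questions) for p in candidates}
best = max(scores, key=scores.get)
print(best, scores[best])
```

A real loop would generate new candidate prompts from the failures rather than choosing from a fixed list, but the structure (run, grade the trace, compare, keep the winner) is the same.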
7. Codex: Power User Tips and Internal Usage
Timestamps: 39:12 – 42:01
- Letting Codex Do More:
- Power tip: Trust Codex to take on full features, not just snippets; sometimes the “intern” AI can one-shot entire tasks (39:24).
- Codex is now a core internal tool; teams frequently start PRs and even code reviews with Codex’s help, which makes context switching faster and raises productivity.
- Memorable Quote:
- "Really lean into, push yourself to trust the model to do more and more...a lot of times it can actually do stuff that surprises me and then I have to like readjust my priors." – Sherwin (39:24)
- Codex Self-reviews:
- Codex-based code reviews are trusted internally for their quality, not treated as a novelty (41:02).
8. Service Health Dashboard: Transparency in Reliability
Timestamps: 42:23 – 44:03
- What It Is:
- A personalized dashboard that tracks the health of your OpenAI API integration, including real-time SLOs, token usage, and error codes.
- "We spent so much time on [reliability] and we feel confident enough to have it behind a product now." – Sherwin (42:27)
- Reliability target: four nines (99.99%), with five nines (99.999%) as the longer-term aim.
- Context:
- Responding to previous outages with robust infrastructure upgrades and increased transparency.
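To put the "nines" in perspective, here is the arithmetic as a short sketch. It reads each target as availability over time, which is an assumption: the summary does not say whether OpenAI measures the target by time or by request success rate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year for N nines of availability (4 means 99.99%)."""
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


for n in (4, 5):
    print(f"{n} nines: {downtime_budget_minutes(n):.2f} minutes/year")
```

Under this reading, four nines leaves about 52.6 minutes of downtime per year and five nines about 5.3 minutes, so each extra nine cuts the budget by a factor of ten.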
Notable Quotes & Memorable Moments
- On Prompt Engineering’s Endurance (20:51):
- “Two years ago people were like, oh, at some point prompting is going to be dead...if anything it's only gotten more important.” – Sherwin
- On the Ecosystem Shifting (07:18):
- “It’s kind of like inverted where there's ChatGPT at the top layer and then your website is embedded inside of it. That is an experience I've never seen.” – Host
- On Internal Workflows and the Future (32:45):
- “A lot of companies have built some version of an agent builder internally...We really wanted to build a platform so that [this] is not, you know, an area that every company needs to invest in and rebuild from scratch.” – Christina
- On Leaning Into Codex (39:24):
- "...Push yourself to trust the model to do more and more. Before I feel like I was in this...safe space, I'm just giving this thing like a tiny bit of rope. And because of that I was kind of limiting myself..." – Sherwin
Additional Highlights and Forward-Looking Themes
- MCP Registry & Connector Ecosystem:
- OpenAI is experimenting with supporting both 1P (first-party) and 3P (third-party) connectors, balancing deep integration quality and long-tail ecosystem diversity (27:34).
- Multimodality and API Ecosystem Growth:
- Strong focus on supporting new modalities, external API expansion, voice, and visual-rich widgets in the ChatKit UI (25:02, 37:34).
- Open vs. Evergreen UX:
- ChatKit runs as an embeddable iframe, ensuring a constantly updated experience and drawing a parallel to Stripe’s “elements/checkout” approach.
- Cost Controls and “Bring Your Own Key”:
- Discussion of demand for letting end users bear infrastructure costs (for example, by supplying their own API key) and for flexible cost responsibility in public agent deployments (31:07).
- Developer Feedback Loops:
- Continuous improvements come from direct developer and enterprise user input, shaping everything from workflow node design to evaluation tooling.
Episode Flow: Reference Timestamps
- Intro & DevDay Retrospective: 00:05 – 02:53
- Apps SDK, MCP, Mission: 03:22 – 08:54
- Agent Kit, Builder, Templates: 08:54 – 15:06
- Interop & Third-Party Models: 15:06 – 17:25
- Evals & Prompt Optimization: 17:25 – 22:10
- Codex Usage Tips: 39:12 – 42:01
- Service Health Dashboard: 42:23 – 44:03
Conclusion
This episode paints a rich, dynamic picture of where AI engineering sits as of late 2025:
- The shift from narrow APIs to complete, production-ready agent platforms with strong visual tools
- The rise of interoperability and protocol standardization
- Prompt engineering as an evolving discipline that is growing, not shrinking
- OpenAI’s focus on robust, consumer-grade developer tools drawing from both internal and user experiences
- A collaborative vision for the next wave of agents, code generation, and multi-modal AI
A must-listen for anyone building with (or competing against) OpenAI in 2025.
