Summary6 min read

Podcast Summary: Next in AI – Open Responses: An Interoperable LLM Interface Specification

Date: January 17, 2026
Host: Next in AI
Episode Focus: A deep dive on the OpenResponses specification – an open standard for interoperable AI agent interfaces, examining its technical innovations and their implications for developers, industry, and users.

Episode Overview

The hosts explore the "vibe shift" in AI, moving from chatbots to agentic AIs—systems that don’t just converse, but act on users’ behalf. They break down the pain points of current AI integrations, introduce OpenResponses as a new open-source standard (analogous to USB-C in hardware), and analyze its technical and philosophical impact.

Key Discussion Points

I. The Era Shift: From Chatbots to Agents (00:00 – 01:32)

AI has evolved from simple chatbots (text-in, text-out) to complex agents that execute actions in the real world.
Current agent development is fragmented; every provider (OpenAI, Anthropic, Gemini, Mistral) has unique "plumbing" for their models.
Quote – “It feels quaint, doesn't it? Like remembering the days of dial up or T9 texting.” (B, 00:25)
Developers face high friction swapping between providers; workflows brittle due to API mismatches.
“It's like trying to build a skyscraper using LEGO bricks... if you sneeze, the whole thing falls over.” (B, 01:14)

II. OpenResponses: Defining an Open Standard (01:32 – 02:26)

OpenResponses is not a product or service; it’s a specification—“a standardized ecosystem. Think of it like USB C for AI.” (B, 01:45)
Goal: plug-and-play interoperability, removing proprietary silos.
Review of source documents: GitHub spec, technical charter, and a retrospective blog post, “Why We Built the Responses API.”

III. Why the Old Ways Broke Down (API Evolution History) (02:26 – 05:20)

Three Legacy Phases:
1. Completions Era: Primitive, only completed text strings.
  - “Sophisticated autocorrect.” (A, 03:10)
2. Chat Completions: Introduced roles (system, user, assistant); “built in a single weekend” to support RLHF (B, 03:40)
3. Assistance API: Added tools, but became “clunky and proprietary... debugging was a nightmare.” (B, 05:20)
Major Flaw: Models lose their reasoning context between turns (“Detective analogy”).
- A detective who loses their notes (internal monologue) after every session.
- “They have amnesia about their internal process... the why is gone.” (A/B, 04:53)
- Result: inefficiency, wasted compute and cost.

IV. What OpenResponses Changes (05:20 – 09:44)

Key Principle: Models now retain their internal state and reasoning across turns (“the detective keeps the notebook”).
- Benchmarks: 5% increase on Taobench—a major gain in tool use and reasoning. (A/B, 05:47–06:04)
- 40–80% better cache utilization—massive cost improvements. (A, 06:17)
  - “That's the difference between being profitable and going bankrupt.” (A, 06:27)

a) The Agentic Loop: Perceive, Reason, Act, Reflect (06:35 – 07:39)

Old paradigm: serial, text exchanges; new: agentic loops—models can perceive, reason, act, reflect before giving results.
Technical Shift: Messages replaced by Items
- “Items replace messages. The key word: items are polymorphic.” (B, 07:05)
- Items can be messages, function calls, images, etc., with strict typing.

b) Stateful Items and Semantic Streaming (07:39 – 09:44)

Items as State Machines: Each has a lifecycle (in-progress, completed, failed).
Enables UIs to immediately reflect model state (“alive feeling”).
Semantic Streaming: Structured, event-based updates let UIs switch contexts (from text box to code widget) seamlessly.
- “It was like parsing the matrix in real time...” (A, 09:09)
- “Now, because we have these stateful items, streaming uses semantic events.” (B, 09:11)

Notable Quotes & Memorable Moments

“So when you ask the follow up, they just pick up right where they left off.” (A, 05:43)
“If you're waiting for the model to finish its entire thought process before showing anything, the app feels broken. By exposing the state, you can build a UI that reacts instantly.” (B, 08:13)
“Your UI can immediately switch from showing a text box to showing a code widget, or a map. It enables these really rich, distinct interfaces.” (B, 09:28)
“It really feels like they listened to the complaints developers have had for the last couple of years.” (A, 13:05)

Philosophical and Security Implications (09:44 – 10:53; 16:09 – 17:08)

The Hidden Chain of Thought

By default, the internal reasoning of the model can be encrypted, hidden from the user or even the application owner.
- ”The spec explicitly says that the model's inner monologue can be hidden from the user... as a user, I'm not sure I like that.” (A, 09:59)
- Two safety reasons:
  1. Prevent users from seeing/copying hallucinated errors.
  2. Keep alignment processes hidden (e.g. if the model contemplates an unethical action).
- Rational: “If you, the user, see that thought, you completely lose trust in the system.” (B, 10:35)
Providers can supply a summary of reasoning, but the full internal state can be fully private to the AI.

Who Owns the Standard? Governance and Trust (13:31 – 15:25)

Not vendor controlled: Spec charter mandates no vendor can have a majority of core maintainers—prevents OpenAI, Google, etc., from controlling the standard.
“If this was proprietary, it would fail....” (B, 13:44)
Ensures any LLM provider—commercial or open source—can implement the spec.
Extensibility via Slugs: New features (e.g. “Acme ReasoningGraph”) can be added as custom items, ensuring innovation is not blocked but also doesn’t break other clients.

Developer-Centric Innovations

Allowed Tools: Filter tool access per turn with almost zero cache impact—a huge optimization.
- “With allowed tools, you keep the big list... but you just pass a lightweight filter saying, for this turn, strictly limit yourself to this one tool.” (B, 12:34)
Previous Response Save: Pass an ID instead of entire history each turn—huge boost for latency and cost.
Strict Separation: User content (chaotic, untrusted), model content (strict, validated)—improves security.

The Broader Impact: AI Enters the "Stateful Collaboration" Age

“We're entering the stateful collaboration phase. And the OpenResponses spec provides the infrastructure for that.” (B, 15:31)
Enables LLM competition on “intelligence and price, not on trapping you in their ecosystem.” (A, 14:27)
OpenResponses provides the “rails” for reliable, interoperable, agentic AI applications.

Thought-Provoking Closing (16:09 – 17:08)

Hosts ponder the philosophical ramifications of AIs with “private thoughts”—memories and reasoning steps inaccessible to users.
- “If our AI agents have private thoughts and memories... how does that change the dynamic of trust?” (A, 16:34)
- “We're used to software being transparent. You know, code is code. But here we're saying we want the results of the intelligence, but we're kind of afraid of the process of the intelligence.” (B, 16:58)
- “Are we really ready for software that keeps secrets from us for our own safety?” (A, 16:58)

Timestamps for Important Segments

00:00 – Agent era vs. chatbot era: state of the industry
01:32 – OpenResponses standard introduction
03:05 – History: Completions, Chat Completions, Assistance API
04:23 – Detective analogy for reasoning loss
05:47 – Benchmarks and efficiency gains
06:47 – Agentic loop & Items vs. Messages
07:45 – Stateful items and semantic streaming
09:57 – Hidden reasoning and user trust
12:01 – External vs. hosted tools; tool selection
13:31 – Governance, lock-in, extensibility (slugs)
15:31 – Broader implications: stack maturation
16:09 – Rethinking trust: AIs with secret internal states

Conclusion

This nuanced, fast-paced discussion illuminates how OpenResponses could transform how AI agents are built, deployed, and trusted. The hosts highlight impressive technical advances—stateful reasoning, efficient streaming, true interoperability—as well as the deeper human questions about trusting AI with persistent, secret memories. For developers and founders, this episode offers a clear map of the future AI application stack and the challenges still ahead.

Loading summary

Transcript129 lines

[00:00]
A
Welcome back to the Deep Dive. It is early 2026, and I think we need to have a very honest conversation about the sort of vibe shift that's happened in tech recently. If you have been building with AI or even just using it heavily, you know that the chatbot era, that novelty of just, you know, typing into a
[00:21]
B
box and getting text back, it feels very 2024.
[00:24]
A
It feels very 22.
[00:25]
B
It feels quaint, doesn't it? Like remembering the days of dial up or T9 texting. It got the job done, but you back. You wonder how we ever had the patience for it.
[00:34]
A
Exactly. We are firmly in the era of agents now. We don't want AI that just chats. We want AI that does things, that takes action. We want to book flights, debug our code, manage our schedules. But here's the reality check, and this is the core problem we're tackling today. Building these agents has been an absolute nightmare.
[00:51]
B
It's a huge fragmentation problem. You've got OpenAI, you've got anthropic, Gemini, Mistral. They all have these models that are smart enough to be agents, but the plumbing is different for every single one.
[01:01]
A
Right. So if I'm a developer and I want to swap out, say, GPT5 for Claude in my application, you're rewriting everything. I essentially have to rewrite my entire backend. Different ways of handling tools, different streams, different error codes. It's just messy.
[01:14]
B
It's worse than messy, it's brittle. I mean, we've been trying to build these complex agentic workflows on top of API primitives that were frankly, designed for simple chat. Yeah, it's like trying to build a skyscraper using LEGO bricks. You can do it, but it's going to be wobbly and if you sneeze, the whole thing falls over.
[01:32]
A
So today we are unpacking the solution that promises to fix all of this. We're doing a deep dive into OpenResponses right now. To be clear, for everyone listening, this isn't a product you can buy. You don't sign up for a subscription to OpenResponsys.
[01:45]
B
No. And that's the most important part. OpenResponses is an open source specification. It's a standardized ecosystem. Think of it like USB C for AI.
[01:54]
A
Okay, I like that.
[01:55]
B
Before USB C, we had that drawer full of proprietary cables. Right now, for the most part, things just plug in and work. That is the goal here.
[02:03]
A
We've gone through a serious stack of documents for this one. We've got the official GitHub specification the technical charter and a really fascinating retrospective blog post called why We Built the Responses API.
[02:16]
B
Yeah, that one gives a great look under the hood at the shift to GPT5.
[02:20]
A
And our mission today is really to figure out if this standard actually solves the headache or if it's just another layer of complexity.
[02:26]
B
Well, the core promise here is a fundamental shift in how humans and machines communicate. We're moving away from, you know, turn based chat.
[02:35]
A
You say something, I say something.
[02:36]
B
Exactly. Into what the spec calls a stateful agentic loop.
[02:42]
A
Stateful agentic loop. That does sound a little bit like marketing jargon.
[02:46]
B
It does, but it's actually a very precise engineering term. And to understand why it matters, you have to look at how broke broken the old way was. The blog post we read does a great job breaking down the history into three phases.
[02:57]
A
Yeah, I really like this breakdown. Contextualizes the pain we've all been feeling. It starts with phase one, the the V1 completions era.
[03:05]
B
The Stone Age. I mean, this was simple. Text in, text out. You gave the model a prompt, it predicted the next few words.
[03:11]
A
Sophisticated autocorrect.
[03:12]
B
That's all it was. It didn't know who you were, it didn't know what a conversation was, it
[03:16]
A
just finished your sentence, which was cool for 2022. But then we got phase 2v1 chat completions. This is the API that powered ChatGPT and, well, the entire generative AI, boom, the whole thing. But there's a detail in the source material that absolutely blew my mind. This API, the industry standard that billions of dollars of software was built on, was famously built in a single weekend.
[03:41]
B
A single weekend. It was a rush job. They needed a way to support RLHF reinforcement learning from human feedback, which needs convers.
[03:49]
A
Right.
[03:50]
B
So they needed the model to understand the difference between a user asking a question and the system giving an instruction. So they hacked together the structure of roles. You have system, user and assistant.
[04:00]
A
And it worked. Obviously, it got us this far.
[04:02]
B
It worked for chat, it was designed for humans talking to machines. But as soon as we tried to make these things reason or use tools, the agent stuff, the cracks just showed up immediately.
[04:13]
A
And the biggest flaw, which the blog post really highlights, is that reasoning got lost between the turns.
[04:19]
B
Right. And the blog uses this detective analogy that I think is just perfect for visualizing this.
[04:24]
A
Okay, let's hear it.
[04:25]
B
So imagine you hire a detective in the Chat Completions era. You ask the detective a question, they sit there, they think, they scribble some notes in their notebook, maybe draw a Diagram and then they give you an answer, then they walk out of the room.
[04:40]
A
Okay, standard procedure so far, but here's the catch.
[04:43]
B
When you call them back in for a follow up question, they have amnesia about their internal process. They remember what they said to you, the final answer. But they've lost the notebook, so they've
[04:53]
A
lost the scratch pad, the logic, the why.
[04:56]
B
The why is gone. They have to reduce everything from scratch just based on the transcript of what was said out loud.
[05:03]
A
That sounds incredibly inefficient. You're just burning compute and money to get the AI back to a mental state it was already in a minute ago.
[05:11]
B
Exactly. And that inefficiency just kills complex agents. Phase three, the Assistance API. Tried to fix this with hosted tools, but it was clunky and proprietary and
[05:21]
A
nobody wanted to be locked into one vendor's black box.
[05:23]
B
Nobody. Debugging was a nightmare. So that brings us to the hero of our story. Phase 4v1 responses the unification.
[05:32]
A
So going back to the detective, under
[05:34]
B
the responses spec, the detective keeps the notebook. It's really that simple. When the turn ends, the reasoning state, the internal monologue, the intermediate steps, it's preserved.
[05:44]
A
So when you ask the follow up, they just pick up right where they left off.
[05:47]
B
Exactly.
[05:48]
A
And this isn't just theoretical fluff. The sources cite actual benchmarks. They saw a 5% increase on Taobench,
[05:54]
B
which is a massive jump in the AI world. I mean, for those who don't follow the benchmarks, Taobench measures complex tool use and reasoning. Gaining 5% just by changing the API structure is huge.
[06:05]
A
It means the model isn't wasting brainpower just trying to remember what it was doing. Right. But the metric that really caught my eye was the efficiency. They report 40 to 80% better cash utilization.
[06:18]
B
That's the money stat. Literally, because you aren't resending and recalculating all that context every single time. You save huge amounts of bandwidth and
[06:28]
A
processing power for a startup burning through VC money on tokens. That's the difference between being profitable and going bankrupt.
[06:35]
B
It really is.
[06:36]
A
Okay, so that's the why. It's cheaper, faster, smarter. Now I want to get into the how because the architecture here is quite different. The spec talks about this agentic loop. How is that different from just a back and forth chat?
[06:47]
B
The whole philosophy changes. It goes from talk to me to perceive, reason, act, reflect. The API is designed to let the model do multiple things and then yield a result. In the old days, you sent a
[07:00]
A
message which was just a blob of
[07:01]
B
text, a Blob of text. Maybe some JSON, if you were lucky. And now we have items, right?
[07:06]
A
Items.
[07:06]
B
This is the biggest technical shift in the spec. Items replace messages. And the key word here is that items are polymorphic.
[07:14]
A
Let's unpack that. Polymorphic means they can change their shape.
[07:17]
B
Exactly. An agent does more than send messages, right? It calls a function, it generates an image, it searches a file. So in open responses, an item changes its structure based on its purpose. A message item looks completely different from a function call item.
[07:32]
A
So you don't have to, like, parse a string to figure out if the bot is trying to call the weather tool. The API just hands you a function call object.
[07:40]
B
Precisely. It's strictly typed, but it goes even deeper. These items are also state machines.
[07:45]
A
State machines. Okay, this feels like we're getting into some deep computer science territory, but stick with us, because this really matters for the user experience.
[07:53]
B
It's actually very practical. An item isn't just a static thing. It has a lifecycle. When the model starts generating, the item is in progress. Once it's done, it moves to completed. If the model crashes or runs out of tokens, it might be incomplete or failed.
[08:08]
A
And why does a developer need to know that state? Can't they just wait for the text to pop up?
[08:14]
B
Because in a real time, agent latency feels awful. If you're waiting for the model to finish its entire thought process before showing anything, the app feels broken. By exposing the state, you can build a UI that reacts instantly.
[08:29]
A
So you can show a spinner when the state is in progress, or a little error icon if it hits failed.
[08:34]
B
Yes. It creates that alive feeling in the application. And the spec makes a crucial distinction here. This state is ephemeral. It's about the flow of the response. It doesn't mean you have to save it to a database forever. It's about flow control.
[08:49]
A
And this ties directly into semantic streaming. I think this is my favorite quality of life improvement in the whole spec. In the old days, streaming was just raw text hitting the screen, raw text deltas.
[08:59]
B
It was so dumb. You'd get a chunk of text, then a space, then another word. If the model decided to call a tool, your code had to basically guess that was happening by scanning the stream with regex.
[09:09]
A
It was like parsing the matrix in real time.
[09:11]
B
It was a total nightmare. Now, because we have these stateful items, Streaming uses semantic events. The API sends explicit signals like response output item added or response contentpart done.
[09:27]
A
So the UI never has to guess.
[09:29]
B
Never. The API literally tells you hey, I'm stopping text generation now, and I'm initializing a tool call. Your UI can immediately switch from showing a text box to showing a code, widget, or a map. It enables these really rich, distinct interfaces.
[09:44]
A
That's a huge leap forward for user experience. Now I want to circle back to something we touched on earlier. The Detectives Notebook. The reasoning. The spec has a whole section on how to handle reasoning. And this is where I think things get a little weird. A little controversial, maybe.
[09:58]
B
You're talking about the hidden chain of thought.
[10:00]
A
Yes. The spec explicitly says that the model's inner monologue can be hidden from the user. As a user, I'm not sure I like that. I mean, don't I want to see exactly how it got to its conclusion?
[10:10]
B
You'd think so, but the source material argues that exposing raw reasoning is actually pretty dangerous.
[10:15]
A
Dangerous how?
[10:16]
B
Two main reasons. First, hallucinations. The model might explore a wrong path, state a false fact, and then correct itself in the final answer. If you see the draft, you might believe the lie.
[10:29]
A
Okay, I can see that. And the second reason?
[10:31]
B
Alignment.
[10:32]
A
Meaning the model might consider doing something bad, but then decides not to.
[10:36]
B
Exactly. A model might think, I could just lie to the user to make them happy, but my instructions say I must be honest. If you, the user, see that thought, you completely lose trust in the system.
[10:48]
A
Right.
[10:49]
B
Or worse, the model might reveal biases it's trying to filter out.
[10:53]
A
So the OpenResponses spec allows providers to just encrypt this reasoning?
[10:57]
B
Yes. They can send encrypted content. It's totally opaque to you, the client. You can't read it, but you can pass it back to the model in the next turn. So it remembers its own thought process.
[11:06]
A
That is wild. So the AI has memories that I, the owner of the application, cannot access.
[11:13]
B
It's framed as a safety feature. Alternatively, the spec allows for a summary, a sanitized explanation of the reasoning that's safe to show the user.
[11:20]
A
That creates a fascinating asymmetry. And speaking of asymmetry, the spec also differentiates between user content and model content.
[11:29]
B
This is really a security and validation choice. User content is total chaos. Users upload broken images, weird binary data, malformed text, anything and everything.
[11:40]
A
Right.
[11:41]
B
Model content, on the other hand, is very strict. It's usually just serializable UTF8 text or specific tool calls. By separating them, validation becomes so much easier. You know, if it came from the model, it follows the rules.
[11:55]
A
Okay, let's talk about the acting part of the loop tools. The spec distinguishes between external and Hosted tools, right.
[12:02]
B
External tools are what we're used to. Functions. The model says get stock price and your server runs the code. Hosted tools are things like a code interpreter or file search that run entirely on the provider's infrastructure.
[12:12]
A
But the real power move for developers here seems to be tool choice. It's like they've finally given us the steering wheel.
[12:18]
B
They have. In the past, getting a model to stop using a tool or to use a specific one was hit or miss. Now you have strict modes, auto required, none. But the real gem is a field called Allowed tools.
[12:31]
A
How's that different from just listing the tools you want it to use?
[12:34]
B
It's all about optimization. Imagine you have this massive system prompt with 50 different tools defined. Sales tools, HR tools, whatever. That's a huge amount of context to cache if you want to restrict the model to only use the sales report tool for just one turn. In the old days, you had to send a whole new prompt, which breaks your cache and costs you money. With allowed tools, you keep the big list in the main definition so the cache stays hot. But you just pass a lightweight filter saying, for this turn, strictly limit yourself to this one tool that is smart.
[13:06]
A
It really feels like they listened to the complaints developers have had for the last couple of years.
[13:10]
B
It's very pragmatic. And that pragmatism extends to the context window itself with something called previous response save.
[13:17]
A
This is for resuming the conversation.
[13:19]
B
Yes. Instead of re uploading the entire chat history megabytes of text every single time, you just pass the ID of the last response. The server says, oh, I have that state in memory, and just logically appends the new input.
[13:31]
A
It solves the latency, it solves the cost, it solves the structure. But here's the elephant in the room. Who actually owns this? Because if this is just OpenAI or Google trying to force their standard on everyone, isn't it risky to adopt?
[13:45]
B
That is the critical question. If this was proprietary, it would fail. But the technical charter addresses this head on. OpenResponses is a community governed specification. It is explicitly designed not to be an OpenAI product.
[13:59]
A
But how do they actually enforce that?
[14:00]
B
Through the governance structure. You have contributors, maintainers, core maintainers, and there's one hard rule written into the charter. No single vendor may control a majority of core maintainer seats.
[14:10]
A
So OpenAI can't just vote to change the spec to Favor Some new GPT6 feature that breaks everything else.
[14:17]
B
Exactly. It prevents lock in. It means anthropic Gemini or Even someone running LLAMA 4 locally can implement this spec. If everyone speaks the same language, you can swap providers easily.
[14:27]
A
It forces them to compete on intelligence and price, not on trapping you in their ecosystem. Right, but what about innovation? If I'm a startup and I invent a totally new type of AI feature, say a reasoning graph, and it's not in the spec, am I just stuck?
[14:42]
B
This is where the extensibility comes in. They use a system called slugs. Slugs like the garden, pests like a namespace URL slug. If you're a company called Acme and you have a special tool, you can add an item type called Acme Reasoning Grab.
[14:56]
A
Ah, I see. So it's a prefix.
[14:58]
B
Right? And the spec mandates that clients have to handle these gracefully. If a generic chat app sees Acme ReasoningGraph and doesn't know what it is, it's required to just ignore it without crashing.
[15:09]
A
But your own specialized Acme client can render it.
[15:13]
B
So you get the stability of a universal standard, but with the flexibility to ship custom features precisely, it prevents the standard from becoming the lowest common denominator, which is often what kills these kinds of open standards.
[15:26]
A
So, pulling this all together, it really feels like we're witnessing the maturation of the AI stack.
[15:31]
B
We are. We're moving out of the experimental phase. We spent the last few years just marveling that a computer could write a poem or pass the bar exam. Now we're entering the stateful collaboration phase. And the open responses spec provides the infrastructure for that. It handles the reasoning, the tool use the streaming, and it lets everyone compete on a level playing field.
[15:54]
A
It feels like the Rails are finally being laid down for the trains to actually run on time.
[15:58]
B
And for the developers listening, it means you can stop fighting the plumbing, you can stop writing regex parsers for stream deltas and start building the actual applications that use this intelligence.
[16:10]
A
I love that. But before we wrap up, I want to leave our listeners with a thought that's been nagging me since we talked about the encryption part.
[16:16]
B
The hidden thoughts.
[16:17]
A
Yes, the source material justifies how hiding the chain of thought for safety and alignment. And, you know, I get that. But think about the implication. We're building agents that will manage our calendars, our money, maybe even our healthcare,
[16:31]
B
and we're effectively granting them a private inner life.
[16:34]
A
Exactly. If our AI agents have private thoughts and memories that persist across conversations, thoughts that we, the users, are explicitly barred from reading, how does that change the dynamic of trust?
[16:48]
B
It's a profound shift. We're used to software being transparent. You know, code is code. But here we're saying we want the results of the intelligence, but we're kind of afraid of the process of the intelligence.
[16:59]
A
Are we really ready for software that keeps secrets from us for our own safety? It's something to mull over before you ask your next agent to handle your email.
[17:06]
B
Indeed. It's a brave new world.
[17:08]
A
That's it for this deep dive. Thanks for listening, and we'll catch you in the next one.