OpenAI Podcast – Episode 11: Shaping Model Behavior in GPT-5.1
Date: December 2, 2025
Host: Andrew Mayne
Guests: Christina Kim (Research Lead, Post-Training, OpenAI), Laurentia Romaniuk (Product Manager, Model Behavior, OpenAI)
Episode Overview
This episode centers on the evolution and behavioral shaping of OpenAI’s GPT-5.1, focusing on reasoning capabilities, model personality, and user customization. Andrew interviews Christina and Laurentia about how user feedback, technical advancements, and philosophical tradeoffs drive the ongoing refinement of GPT models—especially the novel steerability and warmth offered in 5.1.
Key Discussion Points & Insights
1. Reasoning Models and the GPT-5.1 Transition
- All Chat Models Are Reasoning Models
- Christina: “For the first time ever, all of the models in chat are reasoning models.” (00:23)
- GPT-5.1 brings reasoning to all chats, not just select use cases.
- The model decides how deeply to reason depending on the prompt’s complexity. For basic greetings, minimal thought is used; for complex queries, in-depth reasoning is triggered.
- Christina: "It gives it time to refine its answer and work through things, call tools if necessary..." (01:01)
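The "decide how much to think" idea can be pictured as a router over prompt complexity. The heuristic below is purely illustrative (not OpenAI's actual routing logic): trivial greetings get a fast reply, while long or analytically demanding prompts trigger deeper reasoning.

```python
# Toy sketch of complexity-based reasoning depth, as described in the episode.
# The thresholds and marker words are invented for illustration.

def pick_reasoning_depth(prompt: str) -> str:
    """Return 'none', 'light', or 'deep' based on crude complexity signals."""
    text = prompt.strip().lower()
    greetings = {"hi", "hello", "hey", "thanks", "thank you"}
    if text in greetings or len(text.split()) <= 3:
        return "none"   # fast, System-1-style reply
    # Signals of a harder task: multi-step asks, debugging, long prompts
    hard_markers = ("prove", "debug", "optimize", "step by step", "why does")
    if len(text.split()) > 40 or any(m in text for m in hard_markers):
        return "deep"   # slow, System-2-style deliberation
    return "light"

print(pick_reasoning_depth("hi"))  # → none
print(pick_reasoning_depth("Why does my regex fail on unicode input?"))  # → deep
```

In production this decision is made by the model system itself, not a keyword heuristic; the sketch only captures the shape of the tradeoff Christina describes.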
- System One and System Two Thinking
- Andrew: “Kind of what Daniel Kahneman calls like system one and system two thinking.” (01:43)
- The model mimics fast (intuitive) and slow (deliberate) reasoning processes.
2. Addressing Community Feedback and Model Warmth
- Laurentia: “With the ChatGPT 5 launch, one of the things we heard was that the model felt like it had weaker intuition and that it was less warm.” (02:23)
- Users felt prior versions lost track of important context, which could lead to cold or insensitive responses.
- GPT-5.1 addressed this by improving context retention and refining the way custom instructions are followed.
- Auto Switcher & Jarring Transitions
- Previous versions switched between chat and reasoning models automatically, sometimes making conversations feel clinical or inconsistent.
- GPT-5.1 smooths these transitions for a more seamless and warmer experience.
- Custom Instructions & Personality Features
- Enhanced ability for users to provide lasting custom instructions.
- New "personality" and style features allow users more control over tone and response style.
3. The System of Models and Product Implications
- Multiple models, tools, and switchers now function together as a system, not a monolith.
- Christina: “It's really just like, yeah, this reasoning model, this lighter reasoning model, this auto switcher... it’s all of these different things.” (06:08)
- This system unlocks more interesting use cases and user experiences.
4. User Feedback & Data-Driven Iteration
- With 800 million users, making sense of feedback involves analyzing user-shared conversation links and experimenting with UI for model selection and switching.
- Laurentia: “A lot of times when we can actually see the conversations users are having, we're able to see exactly what happened… and start dissecting things.” (07:25)
- Product development incorporates both quantitative evals and qualitative user signals.
5. Measuring Intelligence and Emotional Intelligence (EQ)
- Technical improvements are tracked via benchmarks (IQ), while warmth, context retention, and empathy (EQ) are tougher to quantify.
- Christina: “EQ... only gets better with smarter models because it's really trying to understand what does the user want, what is the context…” (08:39)
- Laurentia: “When I think of what makes a human with high EQ, it's their ability to listen, remember what you've been saying, to pick up on subtle signals…” (09:21)
6. Personality: Features vs. Experience
- Laurentia: “There's what we call the personality feature... I would call that response style or style and tone… But personality... for most of our users... is something much larger and it's the whole experience of the model.” (10:08)
- The “personality” setting mostly concerns tone, brevity, format, emoji use, etc., but user-perceived personality includes responsiveness, context memory, app design (the “harness”), and more.
7. The Art of Shaping Model Behavior
- Adjusting model behavior is a subtle art, especially given conflicting goals: more personality vs. steerability/customization.
- Example: Avoid training out quirks (e.g., EM dashes) which some users appreciate.
- Laurentia: “Part of the art here is figuring out how to pull out these quirks of the model that can come across as personality without breaking steerability, which is what users ultimately want.” (12:21)
8. Safety, Steerability & User Freedom
- Safety systems have shifted from hard refusals to “safe completions.”
- Laurentia: “Now... if you ask the model to do something that trips the safety boundary, it's still going to try in earnest to resolve your request without doing the thing that's actually harmful.” (14:59)
- The goal is to maximize user freedom while minimizing harm—the technology is evolving to negotiate this balance with more nuance.
9. Handling Bias & Expanding Creativity
- Laurentia: “Something that we’re really watching for in our models is how they handle subjective domains. And we want to make sure that our models can express uncertainty…” (17:27)
- GPT-5.1 has enhanced expressive range and can adopt a wider array of writing styles and tones.
10. Steerability and the Future of Personalization
- Customization is becoming more granular, with personality, memory, and inferred context improving.
- Christina: "I just think there’s no way that one model personality … can actually be what can service all those people." (19:45)
- The goal is “intelligence too cheap to meter” (25:49), unlocking new use cases and interfaces.
11. Memory and Persistent Context
- Memory allows models to store and recall facts about users across sessions, making experiences feel more personal and reducing repetition.
- Christina: “Memory is basically the model will write down things it knows about you based on its conversations with you for it to refer to later.” (22:51)
- Andrew: Shares how Pulse uses memory to provide personalized updates (23:22).
- Users can manage memory (delete, toggle) for control and transparency.
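Christina's description of memory — the model writes down facts about you and refers to them later, and you can delete them — maps onto a simple store-and-recall pattern. The class below is a minimal sketch of that idea only; OpenAI's actual memory system is far more involved, and all names here are invented.

```python
# Illustrative sketch of cross-session memory: store facts, render them as
# context for a later prompt, and let the user delete entries for control.

class UserMemory:
    def __init__(self):
        self._facts: dict[str, str] = {}

    def remember(self, key: str, fact: str) -> None:
        """Store or update a fact learned mid-conversation."""
        self._facts[key] = fact

    def forget(self, key: str) -> None:
        """User-controlled deletion, for transparency and control."""
        self._facts.pop(key, None)

    def recall(self) -> str:
        """Render stored facts as context for a later session."""
        if not self._facts:
            return ""
        lines = "; ".join(f"{k}: {v}" for k, v in self._facts.items())
        return f"Known about user: {lines}"

memory = UserMemory()
memory.remember("diet", "vegetarian")
memory.remember("city", "Lisbon")
print(memory.recall())  # → Known about user: diet: vegetarian; city: Lisbon
memory.forget("city")   # the user toggles a memory off
```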
12. Eliciting and Applying Feedback
- Most actionable feedback comes from rich context (shared conversations).
- Laurentia: “The hardest feedback is … an anecdote. And the next hardest feedback is a screenshot of a chat because none of that metadata is really attached to tell us where things have gone wrong.” (24:42)
13. Advice for Getting the Most from ChatGPT
- Laurentia: “Try [to] have your super hard questions, things you know really well… pressure test the model on that to see how it's changing and improving.” (26:31)
- Keep experimenting; improvements and new features roll out constantly.
- Christina: Ask the model for prompt guidance: “You can also ask the model to help you come up with a better prompt.” (27:06)
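Christina's tip — ask the model itself to improve your prompt — amounts to wrapping a rough draft in a meta-request. The helper below only builds a Chat-style message list; the function name and system-prompt wording are assumptions, and the actual API call is left out so the sketch stays self-contained.

```python
# Sketch of the "ask the model for a better prompt" tip. The messages use the
# conventional system/user roles; wording of the instruction is illustrative.

def build_prompt_improvement_request(draft_prompt: str) -> list[dict[str, str]]:
    """Wrap a rough draft in a request for the model to rewrite it."""
    return [
        {"role": "system",
         "content": "You are a prompt-engineering assistant. Rewrite the "
                    "user's draft prompt to be clearer and more specific."},
        {"role": "user",
         "content": f"Improve this prompt for me:\n\n{draft_prompt}"},
    ]

messages = build_prompt_improvement_request("write blog post about AI")
print(messages[1]["content"].splitlines()[0])  # → Improve this prompt for me:
```

The returned improved prompt can then be pasted back in as the real request.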
Notable Quotes & Memorable Moments
- On Model Reasoning:
- "The model right now can decide to think is kind of what we say... it'll decide how much it wants to think based on a prompt." – Christina Kim (01:01)
- On Model Personality:
- “Personality, though, for most of our users, I think is something much larger and it's the whole experience of the model.” – Laurentia Romaniuk (10:08)
- On Balancing Safety and Usability:
- “If you want to make the safest model in the world, you would just have something that just outright refuses to do anything. But that's not what we actually want.” – Christina Kim (13:41)
- On Creativity and Control:
- “The model just proposed something that my lab just broke through with two weeks ago, but hasn't published yet.” – Laurentia Romaniuk, recounting a scientist's reaction to the model's advanced reasoning (20:12)
- On Product Evolution:
- “We are just gonna have such incredibly smart models out for people... with these smart models, there's so many things that could be possible.” – Christina Kim (25:49)
- On Getting the Best Results:
- “Keep at it, keep playing, keep trying. That's the best way to get the most out of these models.” – Laurentia Romaniuk (26:31)
- “You can also ask the model to help you come up with a better prompt.” – Christina Kim (27:06)
Timestamps for Important Segments
- [00:23] – All models in chat are now reasoning models (Christina)
- [02:23] – Addressing community feedback and warmth (Laurentia)
- [05:12] – System of models and auto switchers (Laurentia)
- [08:39] – Measuring intelligence vs. emotional intelligence (Christina)
- [10:08] – Defining model personality (Laurentia)
- [12:21] – Art of balancing quirks with steerability (Laurentia)
- [14:59] – Evolution of safety and steerability (Laurentia)
- [17:27] – Bias handling and creative range (Laurentia)
- [19:45] – Customization for 800 million users (Christina)
- [22:51] – How memory works in GPT-5.1 (Christina)
- [24:42] – User feedback and context (Laurentia)
- [26:31] – Advice for getting best results (Laurentia, Christina)
- [27:38] – Host and guests share their personal style/personality choices for ChatGPT
Tone and Language
The speakers are candid, enthusiastic, and pragmatic. They balance technical transparency with practical anecdotes, aiming to demystify complex changes for the vast user base. Jargon is explained or contextualized, with a sense of ongoing excitement and humility about the work’s progress.
Summary Takeaway
GPT-5.1 is a landmark in the evolution of conversational AI, making advanced reasoning default, significantly enhancing warmth and contextual memory, and giving users greater control over the model’s response style. The ongoing challenge lies in blending safety, personalization, and creative freedom—a process propelled by continuous user feedback and a philosophy of maximizing user empowerment. As customization and memory deepen, OpenAI believes nearly everyone will soon be able to craft the model experience they want—and new uses and forms will flourish as intelligence becomes “too cheap to meter.”
