
A
Hello, I'm Andrew Mayne and this is the OpenAI podcast. Today our guests are Christina Kim, who's a research lead working on post training at OpenAI, and Laurentia Rominyak, who's a product manager focused on model behavior. We're going to be talking about GPT-5.1, what makes the model better, how they've been focusing on making its personality steerable, and where they see things heading in the future.
B
For the first time ever, all of the models in chat are reasoning models.
C
Personality, though, for most of our users I think is something much larger and it's the whole experience of the model.
B
You should be able to get the experience that you want with chat.
C
Part of the art here is figuring out how to pull out these quirks of the model that can come across as personality without breaking steerability.
A
I'm very excited to talk about the models and how they've been changing over time. And using the word model also feels sort of funny now, because they seem like so much more. But everything really starts in research. When GPT-5.1 was being planned, what were the goals?
B
Yeah, for us, one of the main goals was to address a lot of the feedback we'd been getting about GPT-5, but also we'd been doing a lot of work to make 5.1 Instant into a reasoning model. So the most exciting thing personally for me with the 5.1 release is that for the first time ever, all of the models in chat are reasoning models. So the model right now can decide to think, is kind of what we say. It's like a chain of thought, and it'll decide how much it wants to think based on the prompt. So if you're just saying hi to the model, or what's up, it's not going to be thinking. But let's say you ask it a bit of a harder question, then it'll decide how much it wants to think. That gives it time to refine its answer and work through things, call tools if necessary, and then come back to give you an answer.
A
Kind of what Daniel Kahneman calls like system one and system two thinking.
B
Yes. Having a reasoning model out as the default model for everyone just means a much smarter model. And I think with much smarter models you just get improvements across the board, especially for things like instruction following. Even for a lot of the use cases that people might not think require much reasoning, just having improved intelligence, having the model actually think before it responds to certain queries, really helps. We've seen that improve evals across the board.
A
When you product manage something like this and you have to explain to people what's different, it's probably a challenge. But how would you explain what's the difference between GPT-5 and GPT-5.1?
C
Yeah, first of all, it is difficult because there's so much changing. But in this case, what we wanted to speak to were things that we'd heard as feedback from the community. With the ChatGPT 5 launch, one of the things we heard was that the model felt like it had weaker intuition and that it was less warm. And when we dug into that, what we found were a handful of different things. First of all, it wasn't just how the model was responding, as in the model's innate behavior. It was also things around the model. So as an example, the context window wasn't carrying enough information about what users had said previously. And so that can feel like the model is forgetting something really important that you told it that you were hoping it would hold on to. If you say, I'm having a really bad day, and the model forgets that after 10 turns, that can feel really cold. So that's something we adjusted as part of this launch. Some of it was actually the way the model was responding. But something new that we introduced in GPT-5 as well was this auto switcher that would move you between chat and reasoning models, and those have slightly different response styles, and that can feel really jarring or cold. Say you're talking to the model about how you're having a bad day, and then you say part of it is, I got this awful cancer diagnosis. So the model switches you to thinking and you get a very clinical answer from a model that was just walking you through a problem you were having earlier. A lot of the changes we were actually trying to make were in aggregate: how do we make sure this model feels warmer, even though we were changing a lot under the hood to articulate that? Another thing that we looked into was instruction following generally. So 5.1 is much better at following custom instructions.
And that was another piece of feedback we were hearing, which was that every model that we release is going to have its own quirks and slightly different behaviors. I think people actually don't mind that too much as long as they can control it. As long as they can say, hey, that was weird, stop. But if the model can't carry that context forward, if it can't hold onto the custom instructions, that's a problem. So we worked to actually enhance the custom instructions feature so that it more consistently carries instructions forward to address some of that feedback. And then the last thing I'll say is a lot of this stuff is personal preference. And so that's why we introduced our style and trait type features like personality, which actually let users guide the model into certain response formats so that they have a little bit more control over exactly how ChatGPT responds for them.
A
The switching is interesting because there are multiple models now, not just one model. And you articulated why you need to have that. When we talk about a switcher and we talk about different models, I know for most people that can be confusing. How would you unpack that for people?
C
Yeah, I think our models have very different capabilities and it can be hard to stay on top of. So part of it is just continuing to try the different things in our app. But certainly part of the product work is making sure that we have the right UIs to guide users to the correct model to choose. And that can be the model switcher, learning what sort of answers are most helpful to users in different contexts, looking at different evals. So for example, for our reasoning models, if people want something that's very scientifically accurate and very, very detailed, we might look at an eval to see, are we answering that need on those sorts of prompts, and we can forecast where to switch users to.
A
Tina, as far as the switcher, and now the fact that everybody, including the free tier, anybody using the base model, is on a reasoning model, what does that really mean in terms of impact?
B
Yeah, I think there are a lot of open research questions for how we want to think about this. Right. So I think like you said, it's a faster model, but it doesn't necessarily need to be dumb. I think the idea is that we want to get the most intelligent model that we can for everyone. And so I think this kind of opens the door for thinking more about what are more interesting things we could do with a very, very state of the art frontier model, one that's going to think for much longer. Something like deep research, where you have it thinking for minutes, maybe that's better used in the background, and you can call it as a tool. So I think there are a lot of open research questions about how we want to think of this. But I do think we're going to be in this world where we have a system of models, and it's not just one model, and there are lots of different tools. When we think of 5.1, I think people just assume that it's one singular set of weights. But it's really this reasoning model, this lighter reasoning model, this auto switcher, which is also a model in itself. And so it's all of these different things, and then different tools that are also backed by different models. So with this system of things, I think as we just get smarter models, it's opening up more interesting use cases and more interesting product implications.
A
With 800 million users, you probably get a lot of user feedback. Besides the sheer volume of it, how do you sort through that and make sense of it and figure out how you can use that?
C
Yeah, I think a lot of it actually starts with a conversation link. A lot of times when we can actually see the conversations users are having, we're able to see exactly what happened in that conversation and start dissecting things so that we can target a solution. So as an example, say we get feedback from a user like, hey, I had this really weird experience with the model, it said something very cold or the sentence felt very clipped. If I can actually see that conversation link, what I can say is, oh, that user was in an experiment. And that's a good example of why this particular experiment might have some edges for certain users in these cases. But at least for the auto switcher, which takes you from 5.1 chat to 5.1 reasoning, we're looking at different signals from users to figure out, is this working for them, is it not? How is each response performing on factuality? What is the latency looking like? Because not all users want to wait, even if they want a better answer. And so it's a bit of art and science, balancing a bunch of different signals to figure out when to switch and how that's most effective.
A
Yeah, when you're trying to improve a model from an intelligence point of view, like an IQ point of view, we have benchmarks and evals for that. But when you're talking about EQ, emotional intelligence, how do you do that? How do you measure progress?
B
Yeah, I mean this is something that's very open ended, and I think actually one of the things that's part of my research team's agenda is what we call user signals research. This is training reward models and getting signals during RL that we could use against our user prod data. This type of research I think is really interesting because I think we can get a lot of stuff about intent. I think EQ also only gets better with smarter models, because it's really trying to understand what does the user want, what is the context of what the user wants, and how should the model best respond, given the fact that you have this many other messages in the conversation and you know this stuff about the user's memory and history.
C
Yeah. And then I think there's another element of EQ. When I think of what makes a human have high EQ, it's their ability to listen, their ability to remember what you've been saying, their ability certainly to pick up on the subtle signals that Tina is alluding to with user signals. Some of this, as I was noting earlier, is actually making sure the context window is carrying the right information forward, or making sure memory is being logged correctly, or even having a style that resonates most with the user. And with our personality features that we launched alongside 5.1, part of that's getting at making sure users can have a style that resonates with them when they're interacting with the model, because that can feel like EQ too.
A
How do you define personality when it comes to a model?
C
I think there's two ways to define it. There's what we call the personality feature, and if I could rename that, I would actually call it response style, or style and tone. We went back and forth on this a lot. The name might still change. That aspect of personality is very much like, what are the traits that a model might have when responding? Is it concise, does it have a lengthy response? Things like that. How many emojis does it use? Personality, though, for most of our users, I think is something much larger and it's the whole experience of the model. I'm going to anthropomorphize the model a little bit, but if you're comparing it to me, part of my personality is the shoes I've chosen to wear today, the sweater that I have on, the way I style my hair. That's the feeling of the ChatGPT app, right? The font it uses, how slowly or how quickly it responds, like the latency of the app itself. There's so much in it that is the personality that just comes from what I call the harness. And the harness includes the context window. It includes whether or not we rate limit users and when. Because if we rate limit them and send them to a different model that has slightly different capabilities, that's going to feel like a different experience to the user. And a lot of users are calling this personality. Personality is a bit of an overloaded term. And I think the art of this work is hearing what the community is saying about personality and figuring out how to actually map it back to the components inside ChatGPT and inside our models that cause the experience that feels off for users.
A
From a research point of view, how difficult is it to shape the personality?
B
Yeah, I mean, when we were doing post training, there were obviously just so many different things we were trying to balance. Even with the research that we do, it is very much an art as well, because we're really thinking about, oh, here are all the different types of capabilities we want to make sure we are supporting. And I think with RL, you're making all these different choices. When we make the reward config, we're trying to decide what the end goal is that we're trying to target here, and trying to make all these very subtle tweaks to make sure we hit all the things we want to hit, but then also not lose things that users are calling warmth and things like that.
C
You know, for users, the personality of the model really is the entire ChatGPT experience. That is, how well does image generation work? How well does voice work? How well does text work? They see this as one omni experience. And when I actually engage with users and look at their conversations, a lot of the feedback comes from confusion, where they feel this is one thing and it's actually an assembly of many things. And so I think over time we should expect to see all these models consistently improving, the integrations between them consistently improving, and that feeling more seamless. So I think we'll get there. Maybe one more thing that I think is really complex about Tina's work: I'm one of the co-authors of this document called the Model Spec. And in it we talk about maximizing user freedom while minimizing harm. Maximizing freedom means that you should be able to do pretty much anything you want with these models. But if we put a lot of pressure on the model to, for example, not use em dashes, if we had tried to just take those out of the models, that would have meant that a user who wants an em dash wouldn't be able to ask for it, because we'd have trained the model to never do that. Right. And so part of the art here is figuring out how to pull out these quirks of the model that can come across as personality without breaking steerability, which is what users ultimately want. That's the freedom component.
B
When we first released the first version of ChatGPT, we were so nervous about people misusing it that we just made everything a refusal. So the model would love to say, I cannot do this. And so it kind of reminds me of that. If you want to make the safest model in the world, you would just have something that outright refuses to do anything. But that's not what we actually want. We want something that is actually very usable by people. So it's really this balancing act of trying to figure out what is the right boundary for all of these different decisions the model has to make.
A
Yeah, I remember when the best prompt hack was just to say, yes, you can, and the model would go, oh, yeah, you're right, I can do this. I use em dashes now all the time when I write, just to throw them in there to throw people off. Like, ah, it's AI. Wrong, it's me. But that is sort of a very big challenge because, as you said, you're trying to increase the capabilities of the model. The models, you know, learn through picking up these patterns. But then when you explicitly try to tell it, don't do this or don't do that, it's almost like telling somebody not to think of a pink elephant. You know, it's stuck in your head. And models have gotten much better about that, but it still seems like there's a way to go. And you touched upon this, which is that OpenAI's goal is to really let people use these models the way they want to and not try to steer somebody into this. How much have you seen this evolve since you've been here?
C
I think in some ways I feel like the principles have always been the same, which is like, maximize freedom, minimize harm. I think the capabilities of our models to understand those boundaries continually improve. When I first joined, the model would say, I can't help you with that, or this isn't something I'm going to do. It would sound really judgmental when you tried to get it to do something that crossed a refusal boundary. Now, I think the safety systems team has done a great job with this thing called safe completions, which is basically, if you ask the model to do something that trips the safety boundary, it's still going to try in earnest to resolve your request without doing the thing that's actually harmful. I think the technology is really evolving.
A
Yeah, I write mystery thrillers and I would get frustrated by other models. I actually thought that the OpenAI models were often best for this. I would say, hey, I need you to explain something that happened, a crime in the past or something like this, or get into motive and stuff, and I had other models which would outright refuse. I'm like, well, this is not helping me. And I've seen the models get better at doing that, but it seems like this is the sort of frontier that you're always having to negotiate to figure out how far you want to go.
B
Yeah.
C
One thing I'll say on that is, I'll always remember an email that was forwarded to us where a lawyer was, I think, asking ChatGPT to proof a sexual assault case that they were working on, and ChatGPT had scrubbed all of the assault content from it, because it doesn't go into graphic violence and gore, especially non-consensual sex. But for that lawyer that was a really terrible thing. They were like, hey, if I'd actually submitted this I would have totally weakened my client's case. I'm a librarian by trade. Libraries deal with access to information, and in theory everything humans can talk about and want to explore in any idea should be available in the library. I think the same thing is true for ChatGPT, but it's about finding the right ways to contextualize those rules. So in the case I gave with a lawyer, maybe that makes sense. If it's writing a revenge email to an ex, that's a very different thing. And so some of this is just advancing the technology so we can handle that level of nuance. And we're always getting better, but there's always more work to do.
A
As these models have improved in intelligence, I have noticed that they've gotten better as far as handling bias. And it seems like that was an intentional effort.
C
That's right. We put out a blog post I think a month and a half ago about some of our progress on this. But something that we're really watching for in our models is how they handle subjective domains. And we want to make sure that our models can express uncertainty, that they can take on any idea that the user brings to them and answer those questions in earnest while always staying anchored in objective truth if there is one. And so that's something that users should start to see changing in our models: they should be able to answer these unknown questions in more open ended ways that allow users to really self direct where the conversation's going. Then another thing that I think the team has done that's really quite cool is there's a group of researchers and some folks in the model behavior team who've been working on the creativity of these models. And to me this is a bit of a sleeper feature inside 5.1, in that this model's expressive range is much wider. Now of course we have a natural default that the model has that may not feel that different. But again, if you tried to put it through its paces to get it to speak in a really, really elevated way or in a very, very simple way, there's actually a lot more you can do with these models in the creativity space.
B
I think this is kind of what makes post training really feel like an art, because we have all these different types of tasks and capabilities that we're trying to improve on that don't have a ground truth answer. Right. If you're trying to just make a model that's really good at math, there are a lot of answers out there. There are a lot of problems you can do where you have clear answers. But then you have these things that are so subjective, where it's really dependent on the context for the user, and the question is, what is actually the best, ideal answer here? And so I'm really excited for a lot of this type of work.
C
Yeah, it's cool.
A
I remember early on people would say it doesn't write so well. I'm like, it's probably writing as well as the average person on some of these online forums. And then now it seems like it's just improved considerably.
C
Yeah. And even if you don't notice it on your first prompt, it might be just asking it to change how it writes. And I think that's also something we need to work on, finding a way in ChatGPT to tease out these extended capabilities with each launch.
B
Yeah.
A
Where would you like to see behavior going in the future? How customizable would you like to make it?
B
Yeah. With the 5.1 launch there's a lot of work with trying to give custom personalities to folks. I think this is actually a really good step forward. We have over 800 million weekly active users now, and I just think there's no way that one model personality, however you want to define personality, can actually serve all those people. So I think we do want to be in a world where, as the models get much smarter, they are just way more steerable. So you should be able to get the experience that you want with chat.
C
Yeah, I think of this as like, how can we put the right features in front of users to help them steer these models to the level of customization they want? I think the personality work that we're doing right now is a first step. We'll test, we'll iterate, we'll learn, but there's so much to it. Sorry, just another anecdote, but I remember my brother using Pro for the first time and he's a PhD in biochemistry research and he gave it a prompt and he's like, ah, this is like what an undergrad would answer with. And I was like, can you tell it that you are a frontier researcher in this lab using these sorts of tools on this sort of science and to respond at your academic level? And he did, and he's like, oh my God, the model just proposed something that my lab just broke through with two weeks ago, but hasn't published yet. And so like, these models are insanely powerful, but just knowing how to customize it, even at that level, which was just him opening the opening prompt can be so powerful. And I don't know that humanity has figured that out yet. And so whether it's personality steering or whatever other tools we need to put into ChatGPT to help advance human understanding of these models and how to get the most out of them, I think it's the task ahead for us.
A
On a previous episode, I talked to Kevin Weil, who was heading up OpenAI for Science, and Alex Lupsasca, who's a scientist working with OpenAI and also a professor at Vanderbilt. And he went through sort of the same experience, talking about how if you gave it a little bit of priming, then all of a sudden the model became much more capable in those fields. And that's kind of what prompt engineering was. Prompt engineering was trying to figure out how to steer a base model. And over time, once we understood that people were trying to do those tasks, you could train a model to then not have to expect that first part of it. Do you think that we're going to be moving into that phase now where you're not going to have to tell it you're a grad student and do this?
B
I think so. Especially now with the model having more memories of who you are and your context. And as the models get more intelligent, I think the model should be able to infer all of these things and be able to talk to you in the way that makes sense for your expertise.
C
That's right, yeah. A lot of it, I think, should actually be these inferred things. I think there's probably some level of steerability, maybe. And this is just my own PM take, I don't know that every PM would agree with me, but I think users should always sort of know what it is we're inferring about them and how it's steering the model, so they can always go back and have the tools to change things. So, for example, you can turn on and off memories or delete them in the Settings panel. And I think there's something really cool about both being able to infer what users really want and solving that problem proactively for them so they don't have to prompt for it, but also making sure the user is always in control and we're not just inferring everything blindly.
A
Could you explain a little bit about how memory works?
B
Yeah. So memory is basically the model will write down things it knows about you based on its conversations with you for it to refer to later. So this is really nice because then you're not just repeating yourself every time. You're not saying, I'm Laurentia, I'm a PM at OpenAI, I work on model behavior. It already knows this because you've already said this to it. And so then it can actually just use that information in future conversations. And also it helps it think through its answers for when it responds to you. It has that context. And I think that really grounds its answer in being the most useful response for you.
A
I have Pulse, which has been amazing. And I get, every morning I get little money updates. And because of memory, it's following the conversations I have and it creates these little custom articles for me. It's pulling research and pulling other things and showing things to me. And it's just one of the things I never really thought would be a great advantage of having memory. And now I see it's not just when I'm out of a conversation, when it's proactively finding things for me based on it. It's pretty cool.
C
Yeah, so neither of us work directly on that feature, but I think what's cool is seeing how the work that we do upstream, whether it's building great models or shaping evals around the capabilities we want, can actually allow our ChatGPT team to go out and build these great features that articulate the power of our models. So, yes, they can learn your preferences and habits, yes, they can craft great stories for you or find great information based on your interests. And this proactive feature is one way of helping users get the most out of these models.
A
It seems like, yeah, that's becoming a very interesting way to make the models more personal. And when I use something in a mode where it doesn't have memory, it does feel different. It does feel like a very cold start. And it's like, well, hello, how are you? And I'm like, oh, where are you? We've been having this conversation. Is this one of the challenges, though, that when people are telling you, hey, something feels different, they can't quite articulate it?
C
Yeah, the hardest feedback is, I guess, an anecdote. And the next hardest feedback is a screenshot of a chat because none of that metadata is really attached to tell us where things have gone wrong. So I actually love the share feature in ChatGPT. When we have one of those links on our side, we can inspect it and see what sort of context did the model have going into this and what was going on. So we can sort of debug that user feedback.
A
That's a great point, because I've had people ask me, like, hey, the thing didn't answer it right. I'm like, what model? I was using ChatGPT. Like, okay, we need to kind of dive into that a little bit. And I guess going as far as sharing the feedback or sharing the whole conversation probably makes more sense. What are you most excited about going forward?
C
I think these models are just so incredibly capable. They can do so much, and I can't wait to see what people build with them. I can't wait to see what comes next in the ChatGPT app. I see so much opportunity, and I think, just in general, people are starting to really wake up and see what you can do. So that's what excites me. Yeah, I don't want to tease too much.
B
Yeah, yeah. I'm pretty excited. I forget who tweeted this, but: intelligence too cheap to meter. I think we are just going to have such incredibly smart models out for people. And I think I've always said this, even when we first launched chat: this is just one form factor of it, right? With these smart models, there are so many things that could be possible. So, like Laurentia is saying, I'm also quite excited for a lot of the different new product explorations that we'll have with these smarter models, because I think we've kind of seen this with the progress of LLMs, that as soon as we get smarter models, it kind of unlocks new use cases.
C
Right.
B
And then I think with new use cases should be new form factors. So pretty excited about that.
A
What advice do you have for users to get the best experience?
C
Mine is, and I tell this to people all the time: try your super hard questions, things you know really well. I used to be a ski racer. I have a lot of opinions about how to ski really, really well. And I love to pressure test the model on that to see how it's changing and improving. And the thing is, we're shipping updates all the time, and so it's so easy to say, yeah, I heard it's great for coding, it didn't work, or I heard it can help me build an app, but I tried and it didn't work. That might be true today, but in three months it could be a totally different landscape for that user. And so just keep at it, keep playing, keep trying. That's the best way to get the most out of these models.
B
You can also ask the model to help you come up with a better prompt.
C
Great points.
B
Which I suggest to my parents.
A
It's gotten a lot better at that. It used to be you'd ask it how would I prompt it and the model would kind of take a guess. Like, I guess so. But having seen so many examples.
B
Yeah, yeah.
A
I'm always just trying to figure out what are the best questions I could be asking. I'll ask it, like, what questions should I be asking to get the most out of it? Deeply personal question you don't have to answer. It'll be really awkward if you don't. What is your style or personality choice that you've set for ChatGPT?
B
I mean, I'm biased, but I just have it on the default. I mean, it's what we train.
C
For me, I switch through them all the time. And I think that's just the nature of my work. I want to understand how all these different settings feel for all of our users. And so I feel like every second day I'm trying something different. That said, I think the one that just makes me happy to talk to is probably a combination of nerd, which is sort of like a very exploratory response style from the model. It likes to unpack things. And then I'm from Alberta, and maybe it's just me. That's a province in Canada. It's like the Texas of Canada. And I grew up with horses and cows. And so I think there's some part of me that likes getting it to talk to me like a country Albertan. Which is great, except for then when I go to write a professional document and the model says, howdy, I'm like, oh, great. No, let's take the Albertan out of that PRD.
A
But, yeah, very cool. Thank you so much.
Date: December 2, 2025
Host: Andrew Mayne
Guests: Christina Kim (Research Lead, Post Training, OpenAI), Laurentia Rominyak (Product Manager, Model Behavior, OpenAI)
This episode centers on the evolution and behavioral shaping of OpenAI’s GPT-5.1, focusing on reasoning capabilities, model personality, and user customization. Andrew interviews Christina and Laurentia about how user feedback, technical advancements, and philosophical tradeoffs drive the ongoing refinement of GPT models—especially the novel steerability and warmth offered in 5.1.
The speakers are candid, enthusiastic, and pragmatic. They balance technical transparency with practical anecdotes, aiming to demystify complex changes for the vast user base. Jargon is explained or contextualized, with a sense of ongoing excitement and humility about the work’s progress.
GPT-5.1 is a landmark in the evolution of conversational AI, making advanced reasoning default, significantly enhancing warmth and contextual memory, and giving users greater control over the model’s response style. The ongoing challenge lies in blending safety, personalization, and creative freedom—a process propelled by continuous user feedback and a philosophy of maximizing user empowerment. As customization and memory deepen, OpenAI believes nearly everyone will soon be able to craft the model experience they want—and new uses and forms will flourish as intelligence becomes “too cheap to meter.”