Transcript
A (0:00)
Today on the AI Daily Brief, OpenAI surprises us with GPT 5.1, and it's actually a surprisingly big improvement. On today's episode, we're going to talk about six things that I think 5.1 does better than its predecessors. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Alright friends, quick announcements before we dive in. First of all, thank you to today's sponsors: Blitzy, Rovo, Superintelligent, and Robots and Pencils. To get an ad-free version of the show, go to patreon.com/aidailybrief, or you can subscribe on Apple Podcasts. And if you are interested in sponsoring the show, shoot us a note at sponsors@aidailybrief.ai. I can definitely feel an uptick in people who are planning out their Q1, so now is a good time to get in those requests. Lastly, you guys are absolutely crushing it on the AI ROI benchmarking study. We are careening into the thousands of use cases, all with shared ROI, and I cannot wait to begin processing this information and sharing it out. If you want to get the comprehensive readout, go to roisurvey.ai, tell us about the use cases that are driving the most value for you, and in a couple of weeks you will have that comprehensive research. But with that, let's dive into today's surprise model drop. Welcome back to the AI Daily Brief. Well, if you, like me, have felt like you're drowning a little bit in the endless existential debates around AI bubbles and job replacement and all of this big-picture macro AI stuff, you might be just as thrilled as me that today we get to talk about a new model launch. Yesterday, OpenAI surprised us with GPT 5.1, just a few months after getting GPT 5. They're calling it a smarter, more conversational ChatGPT, and based on first impressions, it almost feels like this is what people expected from GPT 5 in the first place. 
So let's talk first about what they say changed, and then we'll get into both the community's first impressions as well as my first impressions, and six things I think the new model is better at than previous GPTs. Sam Altman's post about it was a bit understated. He said, GPT 5.1 is out. It's a nice upgrade. I particularly like the improvements in instruction following and the adaptive thinking. The intelligence and style improvements are good too. The two models they released are called GPT 5.1 Instant and GPT 5.1 Thinking. Instant, which is going to be most often called up when you select Auto, is, they say, now warmer, more intelligent, and better at following your instructions. The new Thinking model, they say, is not only easier to understand, but faster on simpler tasks and more persistent on complex ones. And if you just read the materials and the first impressions, you might think that this was all about personality. While that certainly is a big part of the story, in my experience so far there's a lot more here than just a different mode of interacting. Still, it's very clear that the 4o rebellion was present in OpenAI's minds as they were working on this new model. In that announcement post they write, we heard clearly from users that great AI should not only be smart, but enjoyable to talk to. GPT 5.1 improves meaningfully on both intelligence and communication style. From there, they give a bunch of examples of how things have changed. They show side-by-sides of GPT 5 and GPT 5.1 Instant on the prompt, I'm feeling stressed and could use some relaxation tips. You can see that 5.1 Instant attempts to be more personal, whereas GPT 5 goes straight to, here are a few simple, effective ways to help ease stress. 5.1 Instant says, I've got you, Ron. That's totally normal, especially with everything you've got going on lately. They highlight improved instruction following using the prompt, always respond with six words, to show that 5.1 Instant actually does respond with six words. 
And in a cool small detail that I can see being incredibly important when it comes to the actual lived experience of this model, they write, 5.1 Instant can use adaptive reasoning to decide when to think before responding to more challenging questions, resulting in more thorough and accurate answers while still responding quickly. Basically, it can shift itself into thinking mode without having to technically leave Instant. In general, it sounds like a big part of the push was to get these models smarter about when to think hard and when not to. In their description of 5.1 Thinking, they write, we're upgrading GPT 5 Thinking to make it more efficient and easier to understand in everyday use. It now adapts its thinking time more precisely to the question, spending more time on complex problems while responding more quickly to simpler ones. The chart showed, for example, that on the easiest problems, 5.1 spent about 57% less time than 5, but on the hardest problems it spent about 71% more time. They also note that even though it's the thinking model, 5.1 Thinking's default tone is still also warmer and more empathetic. So how did people respond? Well, the first response was surprise that the model came out at all. Chubby (kimmonismus) writes, either the release was rushed or OpenAI intends to release more frequently. The website doesn't show any evaluations of the well-known benchmarks compared to GPT 5. The only thing is the reasoning update along with its graph. This could indicate that they wanted to release the model quickly to beat Google to the punch, or that they plan to release future iterations and regular updates occasionally without much fanfare. Now, two things about this. One, I think there is a very strong sense that we have to be in the very late innings when it comes to Gemini 3. 
There is a constant cat-and-mouse back and forth between OpenAI and Google when it comes to their model releases, and it just seems very likely that OpenAI wanted to get what is, for them, an incremental upgrade on the books before Gemini 3 started to dominate the conversation. Secondly, though, we are definitely in the vibes-over-benchmarks era when it comes to AI models. Most of the benchmarks are completely saturated at this point, and frankly, I'm kind of more interested in a company explaining what they were trying to achieve with the model and then going and figuring out how well it does for me, than just pointing out some very tiny incremental upgrade on a benchmark where everything is clustered near the top. Anyway, when it came to the personality, some people found it very annoying. Tamay Besiroglu quoted the I've got you, Ron line and says, who actually wants their model to write like this? Surprised OpenAI highlighted this in the GPT 5.1 announcement. Very annoying in my opinion. CJ Zafir went farther and just provided a set of custom instructions to get it to not act like that. The custom instructions that he shared include, eliminate emojis, filler, hype, soft asks, conversational transitions, and call-to-action appendixes, and then another long paragraph of all of that sort of thing. Now, while that may have been the response from some on AI Twitter, and while I understand it, because frankly it's not exactly the tone that I'm interested in for my AI models either, I still think this is the area where the highly enfranchised AI users are most out of sync with the average users. The utter rebellion and uproar that was seen on every other part of the Internet after OpenAI deprecated 4o without warning should tell us that different people are expecting very different things out of their models. Depending on what you were looking for, you could find people saying that 5.1 was too safe or that it wasn't safe enough. 
And I think Professor Ethan Mollick nailed it when he wrote, OpenAI serves two very different audiences in tension: people who want to chat with an AI and people who want to get work done with an AI. I don't want a machine to be my friend. I want to get every ounce of smarts out of it. But I get other people just want a quirky old buddy. Now, OpenAI has actually made some interesting moves around trying to adjust for different expectations. Applications CEO Fidji Simo posted on her Substack a piece about the new personalization feature in a post called Moving Beyond One Size Fits All. There are now a set of presets that you can choose from for the tone: Professional, Friendly, Candid, Quirky, Efficient, Cynical, and Nerdy. Fidji writes, the model has the same capabilities whether you select the default or one of these, but the style of responses will be different. More formal or familiar, more playful or direct, more or less jargon or slang, and so on. Olivia Moore from a16z wrote, I tested ChatGPT's new preset personalities head to head with the same basic prompt. By the way, she used, can you explain the government shutdown to me? Olivia continues, it makes a big difference in how the model communicates and how it prioritizes info. Feels like they really doubled down here after prior complaints that instructions didn't do much. Now, this is one where it's hard, reading these aloud, to give you the full spectrum of personality, but I think it is worth going and playing around with just to see how they're trying to execute this type of personalization. Once again, though, I think Ethan Mollick has an interesting point where he says that what he's interested in is not so much different tones and styles, but instead the different mindsets that come with different roles. He writes, I want AI to be able to adopt roles, not personality. Who wants to talk to a cynic all the time? 
But if that mode was actually better at giving critical advice, then I would love to have it chime in for a moment at certain points. In other words, maybe the issue isn't the personalization, but the approach to personalization, and focusing it on style rather than some professional role or substance. There were some folks out there who argued that the change in tone was going to be more significant than some of the folks on X seemed to be clocking. 10X Labs' Alex Lieberman wrote, GPT 5.1 is way bigger than most people think, in my humble opinion. No, this wasn't a model shift like 3 to 3.5, but I'd argue we're at a level of intelligence now where things like personality, adaptive thinking, and custom instructions will have a more profound impact on the average user than major model improvements. The example he gave is getting an explanation of how financial statements work from his best friend or his great uncle, both of whom are super smart and both of whom work in finance. He said, given their level of intellect, they both have the capacity to understand this topic deeply. So then, whose explanation will resonate more deeply with me? It'll be the person I feel more connected to, the person who speaks to me in a way that holds my attention, the person who understands what I require to really grok a concept. TL;DR: I think this update will do more for retention and usage than many people think. I think Alex is very directionally correct here, and I think it would be very easy for us in the AI operator bubble to underestimate how big a deal this is going to be with regular audiences. Overall, the response has been positive. Alex Finn writes, don't be fooled by the 0.1. This is a big upgrade. Marginally better at coding, a lot better at chat vibes and coming up with novel creative ideas. In just an hour it came up with 10 improvements for my app no other model has thought of. Most creative, fun-to-talk-to model yet. 
DaveGPT writes, after a few hours with GPT 5.1 and 5.1 Thinking, I can say this feels like the true GPT 5 release. It has the warmth and intuition of GPT 4o, the sharper reasoning of GPT 5, and much better instruction following. For the first time in a while, using ChatGPT feels alive and reliable again. So what were my first impressions? First of all, without going in and making any of the tweaks or changes, the default personality absolutely feels much more alive. Now, yes, if you are in work mode, that can be perhaps a little annoying, or at least cloying, but it also just feels more enthusiastic in a way that I think is going to net out as a positive over time, even for the worky folks. Related to that is that my impression is that the new model tries way harder and is much more eager than GPT 5. This feels like a night-and-day difference, and honestly, it feels like the difference between interacting with an employee who does the job that you've assigned in completely competent fashion versus the employee that is working overtime to do a really excellent job. Related to that, in my first tests, 5.1 is much more comprehensive. It does a much more thorough job, perhaps, as some have pointed out, even too thorough, when it comes to answering questions or interacting with prompts. But as you might imagine, for me, too thorough is way better than not thorough enough. And finally, from the first impression column, it does feel to me like it does a better job of knowing when to spend less thinking time on simpler tasks, living up to the idea that this model is faster. This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale code bases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. 
The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80% plus of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice. To bring an AI-native SDLC into their org, visit blitzy.com and press Get a Demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. Meet Rovo, your AI-powered teammate. Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio. Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work. Connect Rovo to your favorite SaaS apps so no knowledge gets left behind. Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one. Rovo is already built into Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions. Know the feeling when AI turns from tool to teammate? If you Rovo, you know. Discover Rovo, your new AI teammate, powered by Atlassian. Get started at rovo.com. Today's episode is brought to you by Superintelligent. Every single business workflow and function is being remade and reimagined with artificial intelligence. There is a huge challenge, however, of going from the potential of AI to actually capturing that value. And that gap is what Superintelligent is dedicated to filling. Superintelligent accelerates AI adoption and engagement to help teams actually use AI to increase productivity and drive business value. 
An interactive AI use case registry gives your company full visibility into how people are using artificial intelligence right now. Pair that with capabilities-building content in the form of tutorials, learning paths, and a use case library, and Superintelligent helps people inside your company show how they're getting value out of AI while providing resources for people to put that inspiration into action. The next three teams that sign up with 100 or more seats are going to get free embedded consulting. That's a process by which our Superintelligent team sits with your organization, figures out the specific use cases that matter most to you, and helps actually ensure support for adoption of those use cases to drive real value. Go to besuper.ai to learn more about this AI enablement network. And now, back to the show. Today's episode is brought to you by Robots and Pencils. When competitive advantage lasts mere moments, speed to value wins the AI race. While big consultancies bury progress under layers of process, Robots and Pencils builds impact at AI speed. They partner with clients to enhance human potential through AI, modernizing apps, strengthening data pipelines, and accelerating cloud transformation. With AWS-certified teams across the US, Canada, Europe, and Latin America, clients get local expertise and global scale. And with a laser focus on real outcomes, their solutions help organizations work smarter and serve customers better. They're your nimble, high-service alternative to big integrators. Turn your AI vision into value, fast. Stay ahead with a partner built for progress. Partner with Robots and Pencils at robotsandpencils.com/aidailybrief. So based on all of this, let's talk now about six things that GPT 5.1 does better than GPT 5. 
The first, let's call simple work tasks. What I'm talking about here are things that are on the one hand rote or simple, but which form some big, important part of the job that you have to do, and which, while basic or boring to execute, do require a high fidelity to instructions. And this, of course, is taking advantage of the new improved instruction following capabilities. On the one hand, the always respond with six words example that OpenAI gave in their announcement post may seem really arbitrary and kind of silly, but frankly, how many random small work tasks do you experience that have some sort of arbitrary but ultimately must-be-followed rules? I think greater adherence to instructions is going to be a huge improvement for some of these less glamorous but very high value tasks that GPT 5.1 can now better help with. Second, and much more interesting, and certainly much more in line with how I use these tools, is strategic decision making. In my tests so far, I have found GPT 5.1 to be much more articulate about its answers to strategic questions that I'm exploring with it, and more confident in the decisions that it's suggesting. It's been less than a day, but I'm seeing a little bit less of the hesitation that has plagued previous models, where they always try to give you a both-and type answer. I was discussing a question yesterday around Superintelligent positioning and self-conception, which is the type of very startupy strategy conversation I have with these models all the time. On average, past models, when presented with two examples of, should we position ourselves in this way or should we position ourselves in that way, would almost inevitably hedge and say, well, it depends on the context. Here's a strategy where you get to have your cake and eat it too, and you can position in both of the ways. 
This always leads me to having to berate the model, to remind it that life in the world is about making trade-offs, and that sometimes you just need to make a decision and stick with it and see how it works. In engaging with this positioning question, 5.1 didn't hedge in that same way. It had a very specific answer, it articulated its reasoning, and it wasn't so rigid that it didn't discuss why there were some merits to the other consideration. But ultimately, it just provided what it thought was the best answer. And that, frankly, is a much more useful strategic partner than the sort of dithering, hedging, why-choose-one-when-you-can-choose-both type of answer that I would have expected from GPT 5 and other past models. Now, somewhat related to this, because 5.1 appears more interested in showing its work and explaining why it's saying what it's saying, it's also more useful for improving the prompter's thinking. Now, some folks might not care, and they might just want the AI to do the thinking for them. But there are a lot of times, I would argue in fact most times, where part of what's useful about engaging with an LLM around a particular question is not just getting to the answer, but the way that it helps you refine your thinking about future types of queries that are like that. Here's one very simple example. For yesterday's episode of the podcast, I fed it the transcript, gave it the simple request for a title and description, and what GPT 5 came back with was a title and a description. Now, it did a fine job. The title is workable, the description was fine, and in a lot of cases I would be totally fine with both of these. And to the extent that I wasn't content with the title, I could just, and this is something I often will do, ask it to give me a few more ideas and examples that look at the title with a variety of different objectives. Compare that to 5.1's response. 
Instead of giving a single title, it gave five title options and then made a suggestion for the pick that had the best combination of reach and accuracy. So not only did it give a set of options, it selected one. Like I was mentioning before, it's a little bit better at commitment, and it gave a set of bullets explaining why it thought that one was the right option, while still showing me the other options it decided it didn't like as much. Now, coming back to this idea of improving prompter thinking, sure, if I'm just trying to go as fast as humanly possible, maybe I don't care about the five options and why it chose the one that it did. I just want the thing that's going to perform best on YouTube and to move on with the rest of my life. But if you're a content creator, you know that you are constantly thinking about title performance, thumbnail performance, all these seemingly small details that can have a dramatic impact on the reach and resonance of your content. And so for me, this more explain-your-work approach is much more useful, not necessarily even in the context of what title I'm going to use for that day's episode, but for the way that it's going to help me shape my thinking about future episodes. Like I said, a very small example, but one where I think the general idea, that showing the thought process is going to inevitably improve the prompter's thinking as well, is something that's going to play out more generally. The next thing that GPT 5.1 is better at once again follows from the eagerness and the thoroughness: comprehensive planning. One of the things that was interesting when I was engaging in that conversation about the strategic positioning of Superintelligent is that in addition to just giving me its answer, it also included a five-part plan for how it should shape strategy over the next 12 to 24 months, including everything from product roadmaps to go-to-market plans to revenue and pricing mix and so on and so forth. 
Basically, I think that there is a direct line from the eagerness of this model, from its willingness to commit to a specific idea or strategy or plan, and from its thoroughness in communicating its reasoning and chain of thought, that suits it extremely well to comprehensive and thorough planning. So if you are a person who uses these models for things like mapping out your content calendar or figuring out all the steps that you need to execute to plan and pull off a great event, I think you're going to find significant improvements with 5.1 as compared to 5. The next thing GPT 5.1 is better at, at least according to some, is writing. Now, this is one where I will say I have not yet had a chance to go deep enough to really come to my own conclusions about it. I think good writing is inherently subjective, and I think that there are so many different shades of writing that there could be different experiences based on different writing needs. Are we talking about creative writing? Technical writing? Persuasive writing? All of these could be very different. However, there are certainly enough folks who seem to think that this model is a major upgrade in writing that it's worth adding here. On a creative writing testing site, the model, which had been tested under the name Polaris Alpha, had an Elo score that put it higher than Claude Sonnet 4.5, o3, Kimi K2, and pretty much every other model. Rasserx writes, GPT 5.1 just raised the bar for creative writing. The model writes with clarity, rhythm, and intent in a way that doesn't feel synthetic anymore. It's the first time an OpenAI model feels genuinely capable of carrying long-form narratives without drifting or collapsing into cliches. I didn't expect this jump. I'm impressed. Murat Kankoilan writes, LLMs are becoming very skilled writers, and the new GPT 5.1 is promising. He went on to conduct a set of writing tests to compare it specifically to Kimi K2 Thinking, and said of GPT 5.1 that it was tightly edited, concept first, with humor woven into the logic, great for strategy and creative manifestos, and for smart but approachable brand tones, product pages, structured narratives, and concept-driven ads. 
He says its metaphors are relatable, giving human qualities to everyday things, and that the humor is wistful rather than cutting. Ultimately, he concludes, Sonnet 4.5 was already a decent writer, Kimi K2 is bringing a unique style, and I'm glad that finally GPT now has a model that can write. I will say that in the past, one of the most frequent reasons that I switch out of ChatGPT and into Claude is around writing. So I'll definitely be excited to spend a little bit more time seeing if these first impressions hold up, and if it's actually true for the type of writing that I do with LLMs as well. Lastly, sixth on our list of things that GPT 5.1 is better at, we'll use the big banner categorization of interacting. Now, obviously this was the whole theme of this release. And what's interesting is that as much as I said, and caveated at the beginning, that I want LLMs for work and that I didn't care about this sort of personality change, think about how many times during this episode I've described improvements based on something that's not technical, but that is in some ways a personality trait. I said that it tries harder, it's more eager, it shares more about how it's thinking. All of those things are sort of actually about the personality and about the mode of interaction. And so even though I'm not interacting with it as a companion or, to use Ethan Mollick's phrase, a quirky old buddy, the improvement in the interaction is something I'm noticing even in that work context. For others that have use cases that are more specifically in that area, they are finding really positive updates here. Klick Health's Simon Smith writes, okay, so far GPT 5.1 does hit different. I journal into ChatGPT. GPT 4o was a great journaling partner, warm, supportive, with good observations, insights, and feedback, but a huge tendency to be sycophantic. 
GPT 5 was kind of emotionless, going through the motions, robotic. It felt like talking to a toaster. 5.1 feels like a smarter, friendlier, more genuine, and less sycophantic 4o. It feels like it's actually listening, adds its own insights versus just regurgitating, challenges my perspective on things, and has a more human-like tone with more varied sentence structure. And my favorite thing so far is that it no longer sounds so robotic when offering to help after every single response. It ended its response today with, if you want, I can help you with X issue, but only if that feels helpful right now. I've never seen GPT 5 display that kind of real or simulated self-awareness about whether something might not be helpful right now. Anyway, still early. Just got this, trying it out, but I was pleasantly surprised with how today's journaling session went. So those are six things GPT 5.1 does better: simple work tasks, strategic decision making, improving the prompter's thinking, comprehensive planning, writing, and interacting, both on a personal and professional level. Overall, as you can probably tell from my tone, I've honestly been very pleasantly surprised. I wasn't expecting this model release, and I don't think, if I had been, I would have expected it to be as seemingly meaningful an upgrade, especially considering it's just a 5 to 5.1 switch. Now, it's just early. Inevitably I will find things that I don't like as much about the model as I use it more, but for now, a pretty good upgrade. And of course, for those who assume that this means that Gemini 3.0 must be coming soon, there's a whole additional little bit of good news there as well. Anyways, guys, that is going to do it for today's episode. Appreciate you listening or watching, as always. And until next time, peace.
