Summary

The AI Podcast – "OpenAI Launches ChatGPT 5.4"

Date: March 6, 2026
Host: The AI Podcast

Episode Overview

This episode of The AI Podcast dives into the freshly announced release of OpenAI’s ChatGPT 5.4, cutting through the marketing hype to examine its real-world capabilities, new features, and its significance for professionals and everyday users. The host offers firsthand insights, benchmarks, and personal anecdotes to highlight where GPT-5.4 stands in the current AI landscape, especially in the context of competition with Anthropic and other leading players.

Key Discussion Points & Insights

1. What’s New in GPT-5.4?

OpenAI positions GPT-5.4 (and its Pro variant) as high-performance models targeting professional tasks — coding, complex analysis, and workflows across software tools.
The rollout coincides with OpenAI’s business-focused push, signing deals to integrate ChatGPT into more corporate environments.
Quote:
- Host (02:27): “They're kind of dubbing this as like their professional work tool... trying to get into the hands of more working professionals.”

2. Technical Improvements

Massive Context Window: Up to 1 million tokens in the API — ideal for handling massive documents, datasets, and entire codebases.
- Host (03:20): “A huge benefit is going to be coding, where you can look at bigger code bases to actually work with.”
Token Efficiency: GPT-5.4 claims to solve tasks using fewer tokens, reducing costs and increasing speed, especially compared to version 5.2.
- Host (03:49): “The costs come down and the speed goes up. And so yeah, for me this is something I'm actually excited about.”
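
As a rough illustration of why fewer tokens per task means lower cost, here is a minimal Python sketch. The token counts and the per-million-token price are hypothetical placeholders chosen only to make the arithmetic concrete; they are not figures from OpenAI or from the episode.

```python
def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of a request at a flat per-million-token price."""
    return tokens / 1_000_000 * usd_per_million


def relative_savings(old_tokens: int, new_tokens: int) -> float:
    """Fraction of tokens (and cost, at a fixed price) saved."""
    return (old_tokens - new_tokens) / old_tokens


# Hypothetical values, for illustration only.
OLD_TOKENS = 40_000      # tokens an older model spends on a task
NEW_TOKENS = 30_000      # tokens a more efficient model spends on the same task
PRICE_PER_MILLION = 10.0 # assumed flat price in USD per million tokens

old_cost = cost_usd(OLD_TOKENS, PRICE_PER_MILLION)
new_cost = cost_usd(NEW_TOKENS, PRICE_PER_MILLION)
savings = relative_savings(OLD_TOKENS, NEW_TOKENS)

print(f"old: ${old_cost:.2f}, new: ${new_cost:.2f}, saved: {savings:.0%}")
```

With these placeholder numbers the task cost drops from $0.40 to $0.30, a 25% saving. The real saving depends on actual pricing and on how many fewer tokens a given task needs.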

3. Performance Benchmarks

Knowledge Work:
- On OpenAI’s GDPval benchmark, which covers tasks from 44 occupations, GPT-5.4 exceeds industry professionals in 83% of comparisons (up from 71% with 5.2).
- Host (06:00): “It is exceeding industry professionals in 83% of comparisons... a really big jump from achieving about 71% that GPT 5.2 is getting.”
Coding:
- Slight improvement on the SWE-Bench Pro benchmark, but more significant gains in speed.
- Real-world anecdote highlights how developers using AI for long code runs will appreciate faster completion times.
- Host (07:06): “He has it go for like three and a half hours doing a task... when this model gets faster, I'm excited because hopefully that three and a half hours gets cut down.”
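
In the episode the host describes the move from 71% to 83% as a "10% jump" or "12% jump". A small Python sketch, using only those two figures, separates the percentage-point gain from the relative gain. The function name is illustrative.

```python
def compare_win_rates(old_pct: float, new_pct: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent)."""
    point_gain = new_pct - old_pct
    relative_gain = point_gain / old_pct * 100
    return point_gain, relative_gain


# Figures quoted in the episode: GPT-5.2 at 71%, GPT-5.4 at 83%.
points, relative = compare_win_rates(71.0, 83.0)
print(f"{points:.0f} percentage points, {relative:.1f}% relative improvement")
```

So the jump is 12 percentage points, or roughly a 16.9% relative improvement over the 5.2 result.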

4. Advances in Computer Use (Agents & Automation)

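On OSWorld-Verified, which evaluates how well an AI can operate a desktop environment using screenshots plus keyboard and mouse commands, GPT-5.4 reaches about a 75% success rate (up from GPT-5.2, though the host still thinks Anthropic is doing better here).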
- Host (08:04): “I've used ChatGPT agents... I wish I could use it more. I think Anthropic is doing better in this, but 75% success rate, like they are improving.”
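The host's own example of practical computer use for non-developers, having an agent click through a Google Cloud setup, came from Claude's Chrome browser extension rather than from ChatGPT.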

5. Professional Deliverables

Notable improvements in generating spreadsheets, financial models, legal analysis, and presentations.
Junior investment banker test: GPT-5.4 scored 87% vs. 68% for 5.2; human evaluators preferred GPT-5.4’s outputs 68% of the time for visuals and structure.
- Host (09:03): “They had one performed by a junior investment banker analyst. It got 87% compared to 68% that GPT 5.2 got.”

6. Key New Features

Steerability:
- Users can now interrupt or mid-course correct ChatGPT’s reasoning during its response, guiding it more naturally.
- Also live in the API.
- Host (09:54): “You can do mid-response prompts and it's going to take that into account and change its prompting and give you better prompt mid response... I think they did a couple clever things here.”
Advanced Web Research:
- GPT-5.4 can comb multiple web sources simultaneously and follow information “leads,” allowing for truly deep research on complex, multi-sourced questions.
- Host (11:04): “It's going to go and search just like a ton at the same time across the web and then it's going to follow leads across different pages... it's doing deeper research, if that's a thing.”

7. Limitations and Censorship

OpenAI says the model is less likely to refuse to answer questions, but sensitive or regulated topics (medical, legal) can still trigger refusals, including answers that disappear mid-reply.
- Example: A question about whether an air bubble in an IV can be fatal produced a response that was typed out and then retracted, similar to behavior the host describes in DeepSeek.
Host comments on proposed legislation in New York that would restrict AI models from answering questions on regulated topics.
- Host (12:34): “Pretty... bummed about that legislation and people seriously considering that.”
The trade-off between accessibility and regulatory compliance remains; some competing models (like Grok) are more permissive but have other trade-offs.

Notable Quotes & Memorable Moments

On OpenAI’s perpetual marketing claims:
“If it wasn't [the most capable model], I mean, what would they even be making an update for?” (02:03)
On Anthropic’s edge in computer use:
“I think Anthropic is really crushing it with computer use. Basically, you know, it can look at everything on your computer and go click on stuff and get stuff done for you.” (04:20)
On steerability and user control:
“If in the middle of waiting we're reading its line of reasoning and we're giving it more input and more feedback, it feels like we did a lot less waiting.” (10:41)
Regarding new legal restrictions:
“They're saying, hey, AI models can't answer any questions about medical, health, legal… Basically all of the different industries with regulatory capture, they just don't want people to be able to get the answers for free.” (12:05)

Timestamps for Important Segments

[01:00] – Introduction to the ChatGPT 5.4 launch, bypassing the marketing to dig into substance
[03:20] – API changes and significance of token and performance upgrades
[04:15] – Professional market focus, competition with Anthropic and others
[05:50] – Benchmarks: knowledge work, coding, and industry outperforming stats
[06:45] – Real-world coding use cases and improvements in efficiency
[08:04] – Desktop agents and operating system interaction progress
[09:03] – Deliverable quality and junior banker test results
[09:54] – Steerability and real-time user intervention
[11:04] – Enhanced web research capabilities
[11:56] – Model’s reduced refusal rate; censorship, regulatory issues, and comparison with Grok

Conclusion

This episode provides an enthusiastic but critical look at OpenAI’s ChatGPT 5.4 through usability, performance, and professional utility lenses. The newly minted features—particularly steerability, token efficiency, and deep research—stand out as potentially transformative for power users and professionals. However, regulatory turbulence and certain feature gaps ensure competition remains fierce, with Anthropic and Grok offering alternative strengths.

The host closes by underscoring the incremental but meaningful progress and invites feedback and engagement from listeners.

For those wanting a full rundown without the marketing noise, this episode hits the mark—outlining not just what’s new, but why it matters and where the model fits in a fast-changing landscape.


Transcript

[00:00]
A
You know that wellness goal you set at the start of the year? It's not too late to stick with it and make your future self proud. Especially with the All-in-One Nutrition Shake from Kachava with 25 grams of protein, 6 grams of fiber, greens, adaptogens and more. No fillers, no nonsense, just the highest quality ingredients. Stick with your wellness goals. Go to kachava.com and use code SHAKE for 15% off. That's K-A-C-H-A-V-A dot com, code SHAKE.
[00:31]
B
Home is your favorite place. Spring is when you let it feel a little more transportive. Discover vibrant scents inspired by place. Bright citrus, fresh florals, clean air energy designed to refresh your space without adding complexity. Just plug it in, choose your scent and let the season unfold room by room. Explore the new Spring collection, now available at pura.com.
[01:00]
C
OpenAI has just rolled out ChatGPT 5.4. There's actually a couple cool features in here that I'm really excited about that I've been wishing ChatGPT has been able to do in the past and they finally launched it. And of course if you look at all of their marketing, it's going to just basically be them saying this is our most capable model yet. And of course it's the most capable model. If it wasn't, I mean, what would they even be making an update for? So I'm just going to get past all of the hype and all of the buzz from what they said in their launch and I'm going to tell you some really interesting use cases and some ways that I actually think this is useful in GPT 5.4.
Before we get into all of that, if you want to try all of the latest models, go check out my startup, AIBox.ai. We have the latest models from the top 15 different AI companies. Everything from Grok to Gemini to Anthropic to OpenAI, ElevenLabs for audio. Tons of cool image generation models. I think there's over 50 models on the platform total. You could try all of them side by side and it's only $8.99 a month. So much cheaper than ChatGPT. But you get way more models and of course you can also use it to automatically create AI workflows that can complete tasks for you that are automated. So there's a ton of cool stuff going on. But go check out AIBox.ai if you want to get access to all of the top models for only $8.99 a month and it's 20% off if you get an annual plan as well. So there's a lot of cool stuff there.
All right, let's get into what's going on. The first thing I want to mention here is that this is called GPT 5.4 Thinking. They have a higher performance variant that is known as GPT 5.4 Pro. But both of these together are designed to kind of handle everything from some complex analysis. They do a lot of coding, a lot of long running workflows across a lot of different professional software tools. And they're kind of dubbing this as like their professional work tool.
They're trying to get into, you know, into the hands of more working professionals. And this is coming right on the backs of them signing a whole bunch of deals with a bunch of different consulting firms that are going to allegedly get ChatGPT into more businesses and kind of the professional environment. And at the same time they're having kind of this, you know, they're locked in a battle with, even Google's in this right now, but really with Anthropic. For Anthropic's Claude Code, their Codex tool, they're really trying to push forward in kind of how software is using AI models and how computer use is going on. So this is where they're really focusing.
One of the biggest changes basically about this is the scale. So in the API, GPT 5.4 has a context window of up to a million tokens, which basically lets them work with huge documents, really big conversations, big data sets and really, I mean if you think about this, a huge benefit is going to be coding where you can look at bigger code, you know, code bases to actually work with. So this was something that Anthropic was really crushing at and OpenAI is trying to get into this. OpenAI also says that their model is specifically more, what they're saying is token efficient, which, this is actually one thing that I'm excited about. Basically it can solve the same problems using a lot less tokens than GPT 5.2. So your costs are going to come down. It's actually kind of cool if you already had 5.2 running in a software, which even if you don't, a lot of the software you use will. The costs come down a lot for that. And it also gets a lot faster. So the costs come down and the speed goes up. And so yeah, for me this is something I'm actually excited about.
So as far as how the benchmarks look, I know, you know, I'm not trying to like sit here and nitpick the benchmark percentages, but I did want to talk about some interesting use cases and reasons why these are, why they're good.
Specifically, it's kind of leading on a bunch of the better known benchmarks. One of those is for coding. Of course, we know why that's important right now. But also computer use, and this is something I'm excited about right now. I feel like Anthropic is really crushing it with computer use. Basically, you know, it can look at everything on your computer and go click on stuff and get stuff done for you. This is a use case that I've been using a lot with Claude's Anthropic browser, the Claude Chrome browser extension. Basically it's a button you click, it opens a side chat bar. I go to really complex UI or complex websites. I'm not a developer, but if I'm going into like, for example, recently I had to do some stuff on Google Cloud to set up a tool that I was vibe building on Lovable and I needed to beef up my back end so it could, you know, do some extra fancy stuff. I didn't really understand anything that Lovable was telling me I needed to be able to do. So I opened up the Claude sidebar, told it, look, I'm on my, you know, my Google Cloud account, go, and here's the instructions from Lovable, and it clicked around and set up some stuff for me.
Now, should I have a real developer look over this? I mean, we're going to throw caution to the wind for the time being and I hear all the developers screaming into their headphones right now. But at the end of the day, it got it done and my software is now functioning and I did not have to watch a whole bunch of long YouTube tutorials on how to set up some complex, I mean, for me complex, because I have no idea how to code, Google Cloud stuff. So this is a really incredible use case for a lot of reasons. And I think OpenAI beefing up their capabilities in computer use is really exciting because they're going to start competing more directly with OpenAI, I mean, er, with Anthropic.
It's not like Anthropic is kind of like the only one working on this. OpenAI has been doing this for a long time with agents, but it feels like it's getting a lot better.
Okay. The other one that I'm excited for is they're getting a lot better at knowledge work. And so, I mean, these are kind of things that I think everybody uses it for. So this is something we're just going to see some incremental improvements on. On OpenAI's GDPval benchmark, which basically checks tasks, it has like 44 different occupations. So it's kind of like showing you how you can use this for different professionals. It is exceeding industry professionals in 83% of comparisons. So they're like, look, these are all the tasks that people in all of these different professional industries are doing. It is better than 83% or it's, you know, it's beating what an industry professional might give you in 83% of these cases specifically, I think for knowledge work. And it has a really big jump from achieving about 71% that GPT 5.2 is getting. So upgrading this to now GPT 5.4, we're getting from 71% to 83%. It just basically is going to be a lot better for knowledge work. I mean, and by a lot better, I mean we're seeing, you know, a 10% jump here or, you know, 12% jump here, which is pretty significant.
On some of the coding benchmarks, so SWE-Bench Pro, this is, you know, Software Engineering Bench Pro, the model is getting slightly better than the last version. So I mean this is good but, you know, beyond just getting slightly better, it is actually a bit, quite a bit faster. So if anybody has used a lot of these software tools, specifically we use Claude Code at AI Box. My developer sends me screenshots of like, because of these really long elaborate tasks that it's doing on our backend, our code base, and I swear it's like a goal for him to see how long he can get Claude Code to run continuously without stopping on a project he gives it.
It's funny because I'm, you know, vibe coding stuff on Lovable and I usually get a Lovable response back to me in like, you know, a minute or two. He has it go for like three and a half hours doing a task. So when this model gets faster, I'm excited because hopefully that three and a half hours gets cut down on, you know, some of the stuff that we're working on.
I think one of the things that it's also very good at is for real computer interaction. There's OSWorld-Verified. It basically evaluates how well an AI can operate a desktop environment. It, you know, pretty much just like takes a screenshot and then it uses the keyboard and mouse commands to go and click stuff. Right now it has about a 75% success rate. I've used ChatGPT agents. It's not perfect. It's actually not my go to. I don't use it that much. I wish I could use it more. I think Anthropic is doing better in this, but 75% success rate, like they are improving. Their success rate is up a bit. It's better than GPT 5.2. I still don't think it's the best.
There's a major focus on kind of how it is being used professionally. OpenAI says their model right now is significantly better at basically giving the kind of deliverables that people use in real work. So things like spreadsheets, presentations, financial models, legal analysis, all of those. They've done a bunch of different tasks and they had one performed by a junior investment banker analyst. It got 87% compared to 68% that GPT 5.2 got. Some human evaluators also preferred it about 68% of the time. They said it had better visuals and better structure.
So there's some cool stuff, okay, cool features that you might actually use today. This is the one I'm very excited about. It has what they're calling steerability. But basically when you're talking to ChatGPT, it's available in the API too, which is, I think, crazy, but it's on ChatGPT. If you're talking to ChatGPT and you can kind of see its reasoning, right?
Like it's thinking through some stuff and it puts a couple steps down, you realize it's going in the wrong direction. You know, maybe you're like, hey, I'm trying to visit like the best beach for surfing. And it's like, okay, looking at beaches in Kauai. And you're like, oh crap, like I'm in California, I don't want to see Kauai. And then you can type a message, like, "specifically in California," mid, like, prompt mid-response. It actually takes into account what you just said and is, you know, steerability. It's going to go and incorporate that into what it's looking at and into its reasoning and give you an updated response. So basically you can do mid response prompts and it's going to take that into account and change its prompting and give you better prompt mid response.
So it's kind of interesting because I think they did a couple clever things here, but one of them is like when you ask a question, you have to wait for it to think, you have to wait for it to reply. You sit there and you wait. We all hate waiting. And so if in the middle of waiting we're reading its line of reasoning and we're giving it more input and more feedback, it feels like we did a lot less waiting. We're really just kind of reading and trying to throw something in and it can get it done faster and better rather than having to wait for it to spit out the whole thing. And you'd be like, okay, this is wrong and here's why it's wrong and here's what you should do instead. Like, you could do that in the middle of the chat conversation response, which is really cool in my opinion.
Something else they've kind of focused on right now is online research. Apparently it can search across like a greater number of sources on the web. So it can kind of, instead of just like, okay, we're looking at this website, getting some data, now we're going to look at this website.
It's going to go and search just like a ton at the same time across the web and then it's going to follow leads across different pages. So it might get an idea from something it's reading on one article. It's going to go follow that to another article, bounce around a lot more. So it's kind of doing like, I know we have had deep research for a while, but it's doing deeper research, if that's a thing. And it's going to combine all the information that it gets into one coherent answer. So basically this is going to be more useful for some of the more complex questions where the information is kind of scattered across a lot of different sites instead of sitting in one place. Now, not every question you ask, this is going to be relevant, but sometimes when you have a complex answer question, it's going to be able to go get you a more coherent answer quicker. So this is great.
They have all this like kind of, I don't know, fluff in their launch about how it hallucinates less and it has less, you know, it has more, it has less factual errors and all this kind of stuff. I don't think that's super important. One thing that we also heard about it is that it's going to turn you down less. So like, if you ask a question and they're like, hey, you know, I don't know, you ask a question, it's less likely allegedly, according to Sam Altman, to like not answer. However, our good friend Conor Grennan, who hosts the AI Applied podcast with myself, he tested it. I saw a post he made on LinkedIn where he asked it, is it true that an air bubble inside of an IV can, you know, could kill me? And apparently it typed out the whole response to him, kind of. And just like we saw with DeepSeek and the Chinese censored model, if you asked anything about Tiananmen Square to DeepSeek, it like types it out and then it disappears and it's like, sorry, cannot. It's just, apparently ChatGPT said the exact same thing.
And also, this is kind of a tricky moment because we're seeing New York right now is trying to pass some legislation where they're saying, hey, we don't, like they're basically trying to pass legislation saying AI models can't answer any questions about medical, health, legal. Like they have all of these different areas. I think even hairstylists they're trying to put in there. It's basically all of the different industries with regulatory capture, they just don't want people to be able to get the answers for free. So pretty, I don't know, kind of bummed about that legislation and people like seriously considering that. However, so it doesn't seem like it's that much better, but maybe it's moving in a good direction. I'm not 100% sure. It still feels like there's other models that are more of the adult in the room, but you also get pros and cons with those models. Grok famously is going to answer any question you have about basically any of those topics, but you know, there might be some other cons with Grok, so pros and cons to all of the models.
Thank you so much for tuning into the podcast today, guys. If you enjoyed the episode, it would really help the show a ton if you left it a rating and review wherever you listen to your podcasts. Just drop me a note. Say if you enjoy it. You know, say where you're from, say what topics are interesting to you. I read all the reviews and all the comments. It helps a ton. Also, make sure you go check out AIBox.ai if you want access to all of these latest models in one place so you don't have to pay a $20 subscription to 10 different platforms. It's $8.99 a month and you get access to over 40 different AI models. So go check it out. Link in the description, AIBox.ai. I'll catch you guys all in the next episode.
[13:40]
A
You know that wellness goal you set at the start of the year? It's not too late to stick with it and make your future self proud. Especially with the All-in-One Nutrition Shake from Kachava with 25 grams of protein, 6 grams of fiber, greens, adaptogens and more. No fillers, no nonsense, just the highest quality ingredients. Stick with your wellness goals. Go to kachava.com and use code SHAKE for 15% off. That's K-A-C-H-A-V-A dot com, code SHAKE.
[14:10]
D
If you're a maintenance supervisor at a manufacturing facility and your machinery isn't working right, Grainger knows you need to understand what's wrong as soon as possible. So when a conveyor motor falters, Grainger offers diagnostic tools like calibration kits and multimeters to help you identify and fix the problem. With Grainger, you can be confident you have everything you need to keep your facility running smoothly. Call 1-800-GRAINGER, click grainger.com, or just stop by. Grainger, for the ones who get it done.