Transcript
A (0:00)
OpenAI has just launched, or kind of showed off, their new GPT 5.2. This is their newest frontier AI model, and it's going head to head with Google right now, because Google has been gaining a lot of ground in the AI usage market with their Gemini product. So there's a lot of pressure on ChatGPT right now. You can see the charts where ChatGPT's market share is shrinking, their usage seems to have stalled a little bit, and it seems like they're worried. So instead of waiting longer periods of time and doing these big updates, it seems like they're doing very short cycles, every month or every two months, with these little incremental updates to the model so that on the benchmarks they can always be just a little bit ahead. There's a meme in AI, you see it a lot on X, where every two or three months one of the big frontier companies, OpenAI, Google, xAI, or Anthropic, will release their newest model, and it's like, oh my gosh, this is the best model, it beats everyone on the benchmarks. Three months later the next one comes out and the cycle keeps going. Now it seems like OpenAI is trying to break that cycle by releasing updates every month or every six weeks. Whenever they have a little bit of an edge and the benchmarks are a little more favorable, they release it, so that they're technically better than whatever the latest model out of one of the other major competitors is. So it's an interesting strategy that's been playing out. Right now they're positioning GPT 5.2 as, in their words, our most capable model to date. I don't know why every single company says that every single time; obviously it's your most capable model to date. Why would you make something worse than your last model?
The thing I think they're specifically positioning this for is developers and pro users. When they released this on X, their official OpenAI account had a tweet showing where it ranked on the benchmarks, and the number one benchmark at the very top of the graphic was SWE-bench, the software engineering benchmark. That's interesting because your average everyday user doesn't really care how good it is at writing software if they're not a developer, but developers really do care. And it feels like Anthropic's Claude Opus really had an edge with developers, where it's basically the preferred model. I think OpenAI is really trying to fight that because it's a big market, a popular market, a high-usage market, right? A developer might spend $200 in credits a day because they're getting a ton of coding done, while your average user is going to spend $20 a month. So you can see this is a lucrative and interesting market, and it appears they're really trying to chase it with all of this. It is rolling out to all paid subscribers and API customers. There are three main tiers to this new model. They have Instant, which is essentially optimized to be really fast and do everyday tasks like writing, translation, and search. The second tier is called Thinking, which is designed for really complex reasoning, so all of the coding, math, document analysis, and planning. Then there's a third tier called Pro, a super reliable version aimed at really demanding, high-stakes workloads. All of this is built in when you use the model; typically it's going to pick whichever of these three tiers it thinks is best. This is what they said about it: "We designed 5.2 to unlock even more economic value for people."
That's from OpenAI's Chief Product Officer, Fidji Simo. She also said the model has a lot of improvements they've made with spreadsheet creation, and apparently it's really good at presentation building, which is kind of a funny feature to highlight on a new update. It's really good at code generation, obviously that's important, and it's good at integrations, image perception, long-context reasoning, and multi-step tool use. All of this is really useful. I think the launch lands right in the middle of a really escalating war between Google's Gemini 3, which is currently leading the LMArena rankings on a lot of different benchmarks (this new model has helped push them ahead), and Anthropic's Claude Opus 4.5, which is still the dominant tool used for coding. And earlier this month The Information was reporting that OpenAI's CEO Sam Altman had this "code red" memo, which we've all heard about, that he sent internally to everyone inside OpenAI saying that ChatGPT traffic was declining and their consumer market share was slipping to Google. The memo reportedly told the team to really refocus on improving ChatGPT's experience instead of trying to push forward on a bunch of other initiatives. He just said, look, ChatGPT needs to be our main focus. So right now it feels like 5.2 is OpenAI's attempt to put themselves back at the top. The release is coming despite internal concerns, with some employees saying they're pushing this too soon and it doesn't have all of the launch polish they would like it to have. I think OpenAI also said they're going to be focusing on a lot of consumer personalization with this, and 5.2 really strengthens the company's enterprise and developer positioning.
I mean, if you just look at the benchmarks, the way they've improved there makes it a lot more competitive. So OpenAI right now is really leaning heavily into this whole tooling ecosystem. They're trying to become the default infrastructure layer for most AI-powered applications. And because of this, OpenAI just this week released data showing that enterprise adoption of its AI products is up over the past year. When they release that kind of data, they're trying to say, look, we're growing in this sector; if you're an enterprise company, make sure you're using us. I think a lot of that comes as Google is deepening their Gemini integrations across all of their different products. We see this every single day. I mean, just today Google announced that Gemini was going to be powering Google Translate, so it's better at doing translations and not so awkward as it moves between different languages. So Google's rolling a lot of new things into Gemini and having Gemini power a lot of new things, and this is essentially taking market share away from OpenAI, who was previously kind of the leader on a lot of this stuff. As we see something like Gemini getting rolled into Google Translate, you might imagine other translation companies that were previously using OpenAI are now thinking, well, if Gemini is powering something that big, maybe it's good enough for us. And so we're seeing a lot of people grab Gemini who weren't thinking about it before, and OpenAI is a little bit worried about that. This week, Google also launched a managed MCP server that makes services like Maps and BigQuery a lot easier for AI agents to connect with. So Google's doing a lot of really interesting and innovative things, and it seems like OpenAI is concerned about some of them, to say the least.
OpenAI says the new GPT 5.2 sets a new benchmark across science, coding, math, vision, basically everything, right? I think a lot of the improvements feel small, but they definitely are improvements, and I think they make agents trying to do workflows a lot more reliable, which is quite useful. They say the systems can now operate across large data sets and real-world environments. So they've got other buzzwords in there, but basically all of this puts them in really close, direct competition with Google's Gemini 3 Deep Think mode, which also just got an update. So now it feels like we really have this big battle between all of them. The research lead over at OpenAI, Aidan Clark, said that stronger math performance reflects more than just equation solving. He described mathematical reasoning as a proxy for a model's ability to maintain consistency, follow multi-step logic, and avoid subtle compounding errors. Basically, this is the stuff we've been seeing since the early days of ChatGPT, where you'd ask it to solve a math problem and these little things would throw it off, and people would say, oh, don't use it for math, it hallucinates. Today I think it's much better, it can do math and it's a lot better at it, but for bigger things and more complex topics it was still struggling, and they seem to have fixed a lot of those problems. This is what he said specifically, quote: "These are all properties that really matter across a wide range of workloads." He's pointing to use cases like financial modeling, forecasting, and data analysis, where you have to get these things perfectly accurate. There's no "oops, you forgot to move something over."
There's no "you made a common mistake but followed through anyway"; it has to be perfect. For people in finance and a lot of these other areas, this is important data, important work with big implications, and you've got to get this stuff right. The product lead over there, Max Schwarzer, said it has some substantial improvements in code generation and debugging, again a really big area. A coding startup called Charlie Code, and also Windsurf, which of course everyone knows, have both reported that their new coding agents using GPT 5.2 have shown really big gains in multi-step workflows, so anything where it's doing more than one thing. And I think the stakes are really high right now. OpenAI has committed up to $1.4 trillion towards AI infrastructure over the coming years. So they're making these massive investments, but all of those investments are calculated and forecast on them being the number one AI model and their usage not just staying stagnant but growing year over year. And if Google's Gemini is eating away at their market share, that puts into question a lot of the bets they're making on data centers, chips, and a lot of other things. So they need to continue to grow, and it feels like putting out these really fast, small updates to models is keeping them ahead of the curve and able to do that. So overall I think it's a good strategy that we're probably going to see them keep playing out. But I'd also be curious to see if Google, Anthropic, xAI, and all the other players start doing the same thing, and then we get basically weekly or monthly updates from all of the top companies on all of their models, which would kind of be madness, but I mean, you'd love to see the updates come faster. So it's not a bad thing, I think, for the consumer.
All right, thank you so much for tuning into the podcast today. If you enjoyed the episode, make sure you leave a rating or review wherever you get your shows. On Spotify you can hit the About tab to leave a review, and on Apple, drop some stars and leave a comment. I read them all and I really do appreciate them. I like the feedback and I like hearing what you guys think of the show, and I hope it's helpful for you as we're all learning about all the craziness happening in the AI industry. Thanks so much for tuning in, make sure to check out AI Box AI, and I'll see you in the next episode.
