
Gael Breton
ChatGPT just got a massive upgrade for all users with the release of the GPT-5 family of models. I just spent 12 hours extensively testing the models from the perspective of an online business owner and have seven tests to show you, with some really surprising results. Now, this release is also a massive deal in the API space, as OpenAI has made these models very affordable while beating pretty much anything else. But here's the twist. This model still hallucinates a lot more than I thought it would, and I'll show you some exact examples where it literally tried to deceive me. So if you want to know everything you need to know about GPT-5, stay tuned. Hey everyone, welcome to the Authority Hacker podcast. Today I'm alone because Mark decided to have a baby when we were supposed to record this podcast. But don't worry, I still nerded out on GPT-5 as much as I could. That's why I probably have a little bit of tired eyes here. And before I say bad things about the model, because everyone's saying such good things and I want to also highlight the limitations, I want to say this is going to be a bit of a game changer for most people who use ChatGPT in a very normal way and are not power users. And one of the main reasons for that is that, as you can see now in the ChatGPT app, you don't have the insane model selector anymore. You literally just have GPT-5, and then if you're paying, you have GPT-5 Thinking. But to be honest, you don't even need to use the thinking mode, because I'll show you this discussion here. When you ask a simple question, GPT-5 does not reason, it just uses the normal GPT-5 and gives you a quick answer. And honestly, the answers are just the right length and the tone is a little bit more empathetic. And it's said to have more EQ as well. But when you actually start asking more complex questions, like for example my follow-up chat here, which said, okay, I want to code with GPT-5, I use Claude Code. What should I do?
It's a bit more of an open question. It needs to think a bit more about how we do that. You see, I did not really do anything and it triggered the reasoning mode and it started thinking deeper and then tried to give me a better answer, basically. So that's really powerful, because now you can just use ChatGPT without thinking, do I need o4-mini, or do I need o3, or do I need 4o, and every single time you're going to get the best possible answer. And if you really want to force the thinking, you can actually just select the thinking model and you can use that. Now, in terms of the limits, for the normal GPT-5, you get 80 messages per hour. So for most people that is going to be enough. And for the thinking mode, you only have 200 messages per week. So make sure that you don't just keep this on, because you're going to run out and it's not going to be able to trigger when you actually need it. My recommendation would be just stay on GPT-5. Another big piece of news is that the GPT-5 model is actually available to absolutely everyone, even if you don't pay for ChatGPT. The caveat to that is that you get a smaller context window and you don't get the thinking mode, so you just get the basic mode, and that's good, but it's not as good as the o3 model was or something like this. Now, in terms of how smart the models are, I mean, so far in the benchmarks that came out, GPT-5 is basically crushing everyone, including Gemini 2.5 Pro, which was at the top of LMArena, which is a benchmark website here. So you can see for text, it's the best, for web dev, it's the best, for vision, it's the best. And it doesn't really have the rest of the benchmarks done yet because it just released, but you also can see on Artificial Analysis that it's topping even Grok 4, and that even the medium version of GPT-5 is literally above everything else. So it's very, very, very smart. The speed is decent when the API is not slowed down, as I will say later.
And yeah, it's just the best model out there in terms of benchmark smartness. Now, benchmarks are one thing, but the vibe is another thing. It's like, how nice is the model to use? Like, for example, Grok has very good benchmarks, but it kind of sucks to use. Actually, in my opinion, this model feels very good because it does exactly what you want without doing something else. It just does what you want and it follows the instructions very well, which feels very good to use as a user, especially when you do complex prompts. Now, one thing I want to talk about though is that OpenAI claims that it has a lot fewer hallucinations. And across literally like five chats that I had this morning, two of them had pretty bad hallucination cases. So I'm going to show you some examples here. I actually messed up my initial prompt, which is, can I connect MCPs to a ChatGPT Plus account? Because I want to use GPT-5, but I want to use my MCPs as well. So I asked ChatGPT, can I do this? And I said MCP. It understood what MCP was. So even though the hallucination came later, you can see it knew what I was talking about. And it says, okay, you can do it on the desktop app. You need to configure a file called MCP JSON, blah blah blah. And it's like, do you want me to show you how to do it on Mac? I'm like, yep, let's do it. And it just tells me a bunch of stuff. And then none of that stuff actually exists. So after it wrote the whole guide, which, by the way, this step-by-step guide looks very good, I really like it, I asked it, can you confirm this is actually real with web search? And it did the web search and it's like, actually MCP does not exist on the ChatGPT app, there's no native integration, I was wrong, etc. And that's literally one of two or three chats that I had this morning where it did that.
So I would not be surprised if we have a bit of a hallucination-gate coming in the next few days, because I think while it's a very smart model, hallucinations are still a problem with it. On the topic of integrations though, while we're here, I really like ChatGPT with GPT-5 right now. The problem is that it does not connect to many things. Like you can connect it to OneDrive or to Google Drive or something. It's a bit shit. Basically, they just announced that you'll be able to connect it to Gmail and Google Calendar very soon, but it still feels a lot like a walled garden. And after you've tasted something like Claude Code connected to literally every single system you use, even with the increase in smartness, GPT-5 does feel a bit limiting, because you're essentially becoming a copy-paste machine between whatever app you're using and ChatGPT to do anything. This reminds me a little bit of the Apple ecosystem, where you're essentially, for example, locked into Siri and it cannot connect to other apps and it cannot do the things that you want, mostly because the company has decided to keep that walled garden. I'm a bit afraid OpenAI is going in that direction, and while the experience of ChatGPT is very delightful, the same way the iPhone is very delightful, once you've tasted open environments, like Claude, for example, it's quite difficult to come back to this. And I'm not sure I will stick to using ChatGPT as my chatbot. I think I will stick to my Claude Code terminal. Now, another thing that GPT-5 is supposedly pretty good at is writing. And I think that's a use case a lot of you guys have, whether that's writing emails, and we'll see an example of that in my tests, or helping you craft better social posts, that kind of stuff. And so I've been testing it a little bit and here's an example from my social writer. You can see it has a bunch of documents that it actually uses. And here's the post that it wrote for me.
The first thing that I noticed is literally the fifth word still has an em dash, so it still uses em dashes a lot. You can see they're like literally everywhere here. It really likes them. But let's read the copy a little bit at least. The beginning: GPT-5 just landed and it's a huge deal for way more than just tech nerds. I just spent the last 12 hours hammering it with seven different tests from the perspective of an online business owner who actually runs stuff with AI every day. Some blew me away, some not so much. And then the good stuff, the not so good, what I actually tested. I think it's okay. But to be honest, as a social post, I think there's a lot of bullets. I don't think I could post this on LinkedIn, for example. So I think it's better than 4o was. It looks like I even asked it to make it authentic. Like the first version was even more kind of robotic. But I still think it has a lot of what I call AI-isms, a lot of stuff that would make you tell it's written by AI. So without very specific instructions, it might not be great. But I think with very good instructions, it might come close to something that you might want to use on your site. This is with very little instructions. Now, that's the ChatGPT side of things. So for people who just use ChatGPT, really the thing that changes is, you don't have to fight with the models. And overall it's a big upgrade from 4o, no question, even though I can kind of nitpick here and there. However, for anyone who uses things like n8n or who uses the API on TypingMind, for example, one very big deal is that the GPT-5 API is actually very affordable. It's priced exactly at the level of Gemini 2.5 Pro, except it actually is priced at this price for the entire context window, which is now 400,000 tokens, whereas before it was 128,000. Whereas Gemini actually gets more expensive after 200,000. So it's actually cheaper than Gemini 2.5 Pro.
And if you look at the intelligence, you can see that 2.5 Pro is still pretty up there, but actually GPT-5 high and medium are both beating it by a fair amount. And not only that, but the real main event, in my opinion, is not the main GPT-5 model that you get in ChatGPT, but actually the GPT-5 mini model. Why? Because it's very cheap and it's only a bit less smart than GPT-5. We're talking like 20% on some of the benchmarks. I saw it was just slightly below Gemini 2.5 Pro. But again, look at the price difference. We're talking five times less for input, and for output, we're talking also five times less. So it's a model that's very cheap. And even when you compare to Google models like, for example, 2.5 Flash, which would be the direct competitor, you can see that the input price is actually also cheaper. Here you can see that we're at $0.30 for Google versus $0.25 for GPT-5 mini on input, and then $2.50 for Google versus $2 for GPT-5 mini on output. So you get a model that's much smarter than Google's models for less money, actually. And when you compare to Anthropic, we're not even close here. So Opus 4.1 was just released and you can see the price: $15, $75. We're like miles ahead. That's seven and a half times higher on output than the normal GPT-5. And even Sonnet doesn't come close. All right, now that we've talked about the API and ChatGPT, we want to actually test the model and see how well it performs. So I've done a few tests that I want to show you. I've compared GPT-5 with GPT-5 mini, with Opus, with Sonnet, with Gemini 2.5 Pro, etc., to see how much better it is at real-world tasks, rather than just whatever benchmarks we see. So here's the first one that I did, which is essentially a difficult-situation email to respond to. So it's a pretty long prompt. But the point is that you receive an email from a frustrated client that says, I spent $8,000 three weeks ago for this marketing automation system.
My team keeps asking when it's ready, and it's not ready, and we have our meetings, et cetera, and what the fuck are you going to do about this, basically? So the point was, who's going to write the best answer email with the best tone. It's quite important for this kind of high-stakes email with tension and everything. So the first one I got was the 2.5 Flash write-up. I'm not even going to read it. But look at the word blocks here. There's no way I would send this regardless of how it's been written. Then next to it we have GPT-5. I think the fact that it's not saying, like, Dear, for example, is already a big one for me. In this case there were not so many em dashes actually, unlike in ChatGPT. I understand the board pressure you're facing and the optics. Here's the technical reality and the path forward to hit Thursday: ship the core automation, provide, na na na na, and it just goes into the problems. But it actually talks in a pretty good business way, in the sense that you're not being overly deferential to the person that you're talking to even though they're your client. But at the same time it remains professional, it doesn't go too far, it even says, hey, I'm going to give you this or whatever. And it actually prepared a call for me and made me an analysis of the situation. So for example, it said that it is first validating the status quo and de-escalating immediately, then talking about the stakeholders, then talking about the legal navigation. Overall, pretty good email. Then next to that I put GPT-5 mini, because again I think mini is going to be underrated, but I think that em dash here already kind of kills it for me. And again, big word blocks here. Not my favorite. I did nano, it looked terrible. So don't use nano for writing. Then I used Opus, the very expensive model from Anthropic. I think it's okay. But again this feels a lot more AI-written than GPT-5 felt, and I think that's where they talk about the improved writing.
I think there's nothing wrong with this email, but if I receive this I will suspect AI wrote it. Whereas if I receive the GPT-5 email here, I might not suspect this is AI-written. The fact that there's a lot of variation in paragraph length, the fact that there are bullets, but not too many bullets. And yeah, it just says goodwill, for example. It doesn't say, as a sign of goodwill. In my opinion, this feels a lot better. The writing style is actually pretty good. I think about the one I want to compare it to, now that Opus kind of lost on this. Sonnet is probably going to be quite similar. Yep. But 2.5 Pro was my favorite writing model until yesterday. And if I check the email, I think it's probably the second best. But again, you know what I was talking about when I was saying it's a little bit too deferential to the client. When you work with companies, people want you to speak directly to them, not overly respect them. And the way it does the sentences here: To ensure we hit this critical deadline, we must focus all resources on the core engine to support your board presentation. It just feels a little bit too distant for me. When you're working with a team, I would not talk like that. I would talk more like I'm talking to you right now. So this is still pretty good. I put DeepSeek on the right as well. It's not bad, but too many bullets. I think GPT-5 wins that tone one, and by quite a margin. In my opinion, provided the information is correct, I could almost send this email as is. So that's very good. The second test that I did was actually an ad analysis. So I uploaded a bunch of data from our Meta ads from a past launch. Essentially the goal for the model is to identify what's winning, what should I do more of. And right off you'll notice that Opus did not even process it. Why? Because Opus, it's not written here, but it's limited to 200,000 tokens. So when you upload big CSVs, all that kind of stuff, it does not work.
And so the only two high-end models that could compete here were Gemini 2.5 Pro and GPT-5. And surprisingly, Gemini 2.5 Pro decided to do a web search to translate the dollars into GBP, basically, for some reason. And now in terms of the actual output, Gemini 2.5 Pro again feels very deferential and very expected. It feels like AI, the way it writes. Whereas GPT-5 just gets to the point without overly writing things. And actually in the API, in GPT-5, you can control how verbose you want it to be. But in this case I did not change it. So I guess that's the default setting. But you can see it just tells me, like, hey, Image 5 New Price is the best creative. You made 190 to 205 sales for £7,700 spent, very good ROI, etc. Essentially, which markets are the best, and then who is the best demographic: 25 to 44 years old, 55 plus is expensive, blah, blah, blah. So, honestly, if I'm a CEO, I just want this. Basically, the Gemini one feels like it gives me less information. You can see there are fewer numbers and it's almost overly formatted, trying to be too nice. So I really feel like GPT-5 is more like an actual coworker and the way a coworker would talk to me. Whereas Gemini 2.5 Pro is kind of like the LinkedIn or the stock-image professional that you might imagine working with. It's not bad. It's still a very good model. It still was my favorite until yesterday, but I think for that kind of work, I will give it to GPT-5, because as you see, everything's backed by the numbers that are in the spreadsheets. Whereas Gemini talks a lot without giving me any backup numbers. Like the budget bleed, for example. Like where? How? Like, large number of countries. Which countries? That doesn't tell me what to do. Whereas every single number is backed by data on GPT-5. The next test I want to show you is actually the vibe coding test, because that's very visual and I think a lot of you guys are going to like that.
So what I did is I asked the three best models, so 2.5 Pro, Opus and GPT-5, to create a retirement planner. Like, it's pretty basic and it's all the same prompt. And I tried to ask for a pretty complicated one where it does, like, Monte Carlo simulations and all of that. And here are the calculators I got. So the first one here is Gemini 2.5 Pro. It's functional, but it's very obviously generated by AI, basically. And so it works. Kind of. Does it? Yeah, actually, let's see, 5.7%. And then if I run it, I'm not sure the code actually works. Yeah, the code does not change the predictions here. So actually this calculator did not work. To be fair, I know Gemini 2.5 Pro can make a working project like that, but I think what's most important is the way this looks. I think it could make it work, but it looks pretty basic. Now let's look at what Opus did. Opus was by far the best coding model until GPT-5. And here is the retirement calculator it did for me. Let's see if it works. Let's see the numbers. Let's see. I put a lot in my Roth IRA. Yeah, it actually calculates everything and gives me some kind of projection over time. But you can see that projection is actually not so great. I wanted to see my success rate, which is here. But normally this Monte Carlo thing just shows me some kind of graph where I see the upper end, the lower end, the average, et cetera. And I see it tried to do it here, but it's not super good. So classic Claude, purple everything. Yeah. So it's okay, but it's not super great. Now let's see what GPT-5 did, so we can see what the canvas looks like. And you can see this one looks a notch above. There are still some glitches, like you can see. But, like, all my buttons have effects and stuff. Like, again, remember what Gemini 2.5 Pro did? But the question is, does it work? So let's change the numbers. Oh, the numbers change live, right? Oh, yeah, they do, actually.
So if I move the sliders or if I'm like, oh, I want to be aggressive, it does all of that. And here I kind of have my projection with the upper end, the lower end, success probability, and the distribution. Now, again, it's not perfect. You can see there's a little bit of random gradients here and there, and I wish I could see the numbers when I hover here. It's also cut off here a little bit, but I think the base is there. OpenAI wasn't a good coding company. Nobody would consider OpenAI for coding before yesterday, and now this looks actually on par, I would say, with Opus. I'm not sure it's much better. Let's look at the Opus one. Yeah. I mean, in terms of design, I prefer the OpenAI one, but there are lots of glitches also on the Opus one, so I would argue that it might even be slightly better. And 2.5 Pro is outdistanced. This is so much more basic than what OpenAI and Anthropic have done. So Google is behind now for coding, actually. So overall, pretty impressive that this model can both write the best copy and also vibe code, like, literally potentially the best UI. Like, I would argue the UI is the best out of all of these, even though there are bugs, quite a few. Now let's look at the next test. The next test was actually a product launch plan, and so this prompt is essentially me asking AI to plan a product launch. So we imagine that we launch a cohort program where we train people for six weeks on implementing automation in their company. Very, very appropriate for us right now. And then I asked GPT-5, Claude Opus, Claude Sonnet (because Sonnet is used by a lot of people and Opus is so expensive in the API that almost nobody's going to use it), and Gemini 2.5 Pro to make a launch strategy. And so the prompt asks for a specific format and so on. And essentially each model came back with their thing, with essentially a six-week timeline and a marketing channel plan. So for email, what are the sequences of emails that you send?
You can see a teaser failure playbook, launch optimization, launch plus application case study, roundup, blah blah blah. It actually did all the structure for every single channel, which is pretty good, and gave me some success metrics. Now, it's going to be very hard to evaluate this on the podcast. So what I did is I actually gave these answers back to GPT-5 and asked it to grade them for me. And so it did a scorecard and it claimed GPT-5 was the best, which I can believe. I think it's a pretty good answer when I read that one. It was pretty good, with Opus behind. But again, Opus costs 7.5 times more on the API and it's still not as good. Gemini 2.5 Pro is just behind and Sonnet is the last one. It's pretty accurate, I think, in terms of the quality of the models, so I'm not too surprised by that result. But yeah, it's a good planner. I've read the GPT-5 thing and I quite like that. It's quite specific, like last chance plus decision checklist plus five-minute fit call slot. This is actually something I could do on a launch. For example, I would actually pick ideas from GPT-5, whereas usually when you check the other ones, if we check the week by week, for example, you can see that Opus is just like, last chance messaging, 48 hours deadline. It doesn't really give you the very specific stuff that GPT-5 gave you. GPT-5 gives you an angle for the thing, whereas this is the most generic thing, like last chance, 48 hours. So I think GPT-5 has more personality and makes things feel more human in a way. Okay, the next test that I've done is actually the LinkedIn post test. I think this one is okay because it's easy to evaluate, basically. Again, the goal was to brag about some client success on LinkedIn and see how cringy the models are going to make it. And I gave it some examples of my recent social posts so that it kind of takes some stuff from them, and again I put DeepSeek in there, but DeepSeek in general feels very AI-written at this point.
I just wanted to see what a good open source model does. Not so good anymore. So let's check the GPT-5 one, and I love it because it actually did the cringy AI bro stuff where it says, if you're juggling three to five tools and losing leads, comment blueprint or DM me blueprint and I'll share the exact five-step checklist I used to cut the response time by 50%, no fluff. It's like, this is cool because it actually feels real-life. Whereas if you check, let's not check GPT-5 mini, let's check Opus, which was probably the one that was considered the best writer. And you can see it wrote a lot more. And I think the storytelling at the beginning might be a bit better. Like, One of my B2B SaaS clients was bleeding leads through four different tools. Marketing was manually uploading CSVs. Sales was chasing ghosts. Everyone was pointing fingers. I would argue this is actually a bit better. The CEO told me, we're working harder than ever but closing fewer deals. Sound familiar? Sound familiar is very AI. Here's what we did in eight weeks. Honestly, the body of the post I would argue might be better on Opus here for this very specific case. So it's like, I think GPT closed the gap, but it doesn't always win. Now again, Opus costs 7.5 times more to write, and then if you check Sonnet, which is the one that is maybe more comparable in price, it's not super good. Then I would pick the GPT-5 one, actually. So yeah, who's going to use Opus in their pipeline? If we check 2.5 Pro though, again: 100k new pipeline in 90 days. That's what happened after one of my B2B tech clients stopped drowning in four different marketing tools and a sea of manual tasks. The problem was clear: leads were falling through the cracks between marketing and sales. The handoff was slow, costing them 25 hours a week in wasted effort and, worse, losing deals. Here's what we did differently. Honestly, not too bad. I would say 2.5 Pro puts up a fight here.
I would probably put it maybe on the same level as GPT-5. I think GPT-5 was okay here. I think the two bullet lists are not ideal, but it does feel more straight to the point, which I like actually. So overall, it's good, it's competitive, but I would argue some other models can actually give it a run for its money for kind of like social media writing. And finally, the last test that I did was to give it a meeting transcript from a meeting that I had a few weeks ago and essentially extract all the important data so that we could use that to update our task list, et cetera. And so for this, I would argue GPT-5 is probably actually overkill. The models I'm looking at here are more like GPT-5 mini, because it's cheap, and Gemini 2.5 Flash, mostly because it's stupid to pay so much in API for these kinds of tasks. And so what I like on GPT-5 mini over Gemini 2.5 Flash is that it's a lot less verbose. It just goes directly into the key decisions and agreements without writing a whole paragraph, like Gemini did before. In terms of action items, it looks like 2.5 Flash got five of them and GPT-5 mini actually got six of them, so it maybe caught a little bit more as well. Now, in terms of the timeline, I would argue Gemini did a better job because it actually broke it down in three phases, whereas 5 mini just hit phase one and then stopped there. And so, yeah, next steps, I would say they're quite similar. And the follow-up email: they each wrote an email. And just looking at this, this does not match the tone of the call at all on 2.5 Flash. It's much more verbose and not very well written. Whereas I would say 5 mini is good enough to write emails for me, with the right guidelines, whereas 2.5 Flash is a bit shit. So, yeah, overall I would say GPT-5 mini looks a lot better than 2.5 Flash for less money, actually, and quite a bit less money because the cached tokens are actually cheaper.
So overall, I think this is the most slept-on model. GPT-5 mini is going to be my workhorse for automation, for API, et cetera. And then GPT-5, I'm going to use it a bit for coding and so on. But overall it's an excellent update from OpenAI. Whether you're just using ChatGPT for work, that's going to be a big upgrade, or whether you're using the API to do a bunch of stuff, it's also a big upgrade, mostly because the price is very good. And the best part about this is that everyone gets to benefit from it, because it's also available to free users. So if you're not paying for a chatbot, or even if you are paying for another one like Gemini or Claude, it's probably worth reinstalling ChatGPT right now just to use GPT-5, because it's a good model. Those are my messy notes and tests on GPT-5. I hope you enjoyed it. Let me know in the comments if you enjoy this kind of format and what you think of GPT-5. Is it going to change the way you use AI, or is it just a minor upgrade for you? Don't forget to like and subscribe and I'll see you again in two weeks.
Episode: GPT-5 vs Claude vs Gemini: 7 Brutal Real-life Tests
Release Date: August 9, 2025
Host: Gael Breton
Gael Breton begins the episode by announcing the release of GPT-5, highlighting its significance for online business owners and marketers. He shares his personal experience, stating, "I just spent 12 hours extensively testing the models from the perspective of an online business owner and have seven tests to show you that show really surprising results" [00:00]. Gael emphasizes the affordability and competitive edge of GPT-5 in the API space, noting its potential to revolutionize how businesses utilize AI without overwhelming tech complexities.
Gael discusses the streamlined user experience with GPT-5, comparing it to previous versions:
Despite the advancements, Gael points out that GPT-5 still hallucinates. He reports that across roughly five chats he had that morning, two contained pretty bad hallucinations, including one where the model wrote a detailed step-by-step guide for connecting MCPs to the ChatGPT desktop app and then, once asked to confirm it with web search, admitted that no such native integration exists. This highlights that while GPT-5 is highly intelligent, users should remain cautious and verify critical information.
Gael conducts a thorough comparison of GPT-5 with Claude and Gemini, focusing on several real-world tasks:
Email Response Test:
Ad Analysis:
Coding Test (Retirement Planner):
Product Launch Plan:
LinkedIn Post Creation:
Meeting Transcript Analysis:
Gael highlights the cost-effectiveness of GPT-5's API:
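The price comparison from the episode can be sketched as a small calculation. This is only an illustration under stated assumptions: the figures are taken to be USD per million tokens, the GPT-5 mini, Gemini 2.5 Flash and Opus 4.1 numbers are the ones Gael reads out, and the GPT-5 numbers are inferred from his "five times less" and "seven and a half times" remarks rather than quoted directly. The dictionary keys are plain labels chosen for this sketch, not official API model identifiers, and the token counts in the example job are made up.

```python
# Rough cost comparison based on the prices mentioned in the episode.
# Assumption: all prices are USD per 1,000,000 tokens.
PRICES = {
    # label: (input price, output price)
    "gpt-5": (1.25, 10.00),            # inferred: 5x GPT-5 mini, Opus output / 7.5
    "gpt-5-mini": (0.25, 2.00),        # quoted in the episode
    "gemini-2.5-flash": (0.30, 2.50),  # quoted in the episode
    "claude-opus-4.1": (15.00, 75.00), # quoted in the episode
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request for the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def output_price_ratio(model_a: str, model_b: str) -> float:
    """Return how many times more expensive model_a's output tokens are than model_b's."""
    return PRICES[model_a][1] / PRICES[model_b][1]


if __name__ == "__main__":
    # Hypothetical job: a 50,000-token transcript in, a 2,000-token summary out.
    for model in PRICES:
        print(f"{model}: ${job_cost(model, 50_000, 2_000):.4f}")
    ratio = output_price_ratio("claude-opus-4.1", "gpt-5")
    print(f"Opus 4.1 output vs GPT-5 output: {ratio:.1f}x")
```

Swapping in your own token counts gives a quick feel for why Gael treats GPT-5 mini as the workhorse for automation tasks; check the providers' current price lists before relying on any of these numbers.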
While GPT-5 excels in many areas, Gael expresses concerns about its integration capabilities:
Gael concludes that GPT-5 represents a significant leap forward for both general users and developers:
Overall, Gael Breton praises GPT-5 for its intelligence, versatility, and cost-effectiveness, positioning it as a game-changer for small businesses and marketers. However, he advises potential users to be aware of its limitations, particularly regarding hallucinations and integration flexibility. Gael encourages listeners to experiment with GPT-5 and share their experiences, fostering a community-driven understanding of its impact on AI-driven business operations.
Notable Quotes:
This comprehensive summary encapsulates the key discussions, insights, and evaluations presented by Gael Breton in the episode, providing depth and clarity for listeners and those who haven’t tuned in.