Transcript
A (0:00)
It's been a busy week because ChatGPT has been throwing out some updates, Google's been throwing out some updates, and everybody's talking about moving to Claude right now, even though both the big powerhouses are throwing down. So today we're going to talk about those changes and whether it's actually worth moving everything from ChatGPT to Claude. So welcome back to Bot Bros, where we cover the news and separate the help from the hype that's going on all the time in the AI news cycle, just for marketers. I'm Dan Sanchez and I'm joined by my brother, Travis Sanchez. Hey.
B (0:37)
Hey.
A (0:38)
And today the big leading story is ChatGPT 5.4. Now, this is 5.4. It's interesting. They didn't do an across-the-line upgrade. 5.4 is specifically the thinking model, the more powerful of the ones you have in your typical ChatGPT account. And it was a little bit of a spec boost. Like, if you look at it, you know, every time anybody launches a new model, they compare it to the other models and they're like, look, it's this much better than the leading models, and they have all these evals. They did that, and it was definitely better generally across the board. And the one that caught my attention, there's one eval, it's called GDPval or the GDP eval or something like that. And it is a measure of how well ChatGPT performs against experts. So it does expert tasks, and they compare its work to human experts' work, and then a panel of other human experts in that particular field grades it: did the AI win or did the person win?
B (1:37)
Wow.
A (1:38)
So it's competing against actual experts in a bunch of different fields. And ChatGPT is now winning a total of 82% of the time on expert-level work tasks. The reason that eval matters more to me is because we're in marketing. Like, marketing tasks are part of that evaluation, even though they're measuring it across multiple fields. And AI is just winning across the board. Now, it still can't do cross browser tasks and all that, so you still need a human in the loop on a lot of this stuff, because AI still messes up a lot, but so do humans. So it's good in a lot of the evals. But let's talk about what else it got good at. One is it's gotten better at web research, so it's going to be cleaner, and it's got fewer hallucinations. So that's something that, you know, hallucinations become less and less. Every time they update the model, it hallucinates less. Remember, that used to be a big deal. Like, that used to be the main thing we were talking about a year ago, two years ago. It was like, oh, it hallucinates. Oh, it hallucinates. Do you hear people talk about that anymore?
B (2:38)
