Transcript
Host (0:00)
OpenAI and Google have both announced that they have received gold medal scores in the International Math Olympiad, the IMO, this year in 2025. Now, I want to break down exactly what this competition is, because it might not be as, I guess, intense as some people would think it is. And I'll also preface this by saying this is really impressive, but I'll break down exactly what it is and how they've achieved this. And, I think most hilariously, I cannot do this podcast episode without covering all of the drama that is going on right now between OpenAI and Google. Google is throwing some massive shade over at OpenAI for the way that they announced their results, which technically seems to have been against some rules. So anyways, we're going to get into all of that. Overall, I think the big takeaway is that these AI models are progressing very fast. But I do need to explain this for people to fully understand the context of the situation here.
But before we get into that, if you want to try all of the latest models out of OpenAI and out of Google, I would love for you to try my own platform, which is AI Box AI. I actually have the top 40 AI models: we have text, we have image, and we have audio models. You can chat with all of them in one chat thread. So it's like ChatGPT, but you can switch between Anthropic, Cohere, DeepSeek, Google, Meta, OpenAI, xAI, and tons of other AI companies, including a bunch of image generators you may never have tried. And the thing that I love to use it for is when I'm having a conversation, I'll sometimes use ChatGPT to do some sort of analytical breakdown of something, but then I like the tone that Claude comes up with. So right in the same chat, you just switch the model to Claude, and all of a sudden Claude is talking with you, and it will look at the context of what you talked about before and finish the conversation. You can switch between tons of different models.
You can also get it to regenerate the response. So if you're like, hey, I want Anthropic, Cohere, DeepSeek, and Google all to answer this question, you can regenerate the response with all of the models and see which one you like the best. You can compare them side by side. So, a lot of really cool features. I'll leave a link in the description if you want to try it out. It's $20 a month, so you save a ton of money instead of having to have subscriptions to all of these platforms. It's currently in beta, and I would love to hear your thoughts on it.
All right, let's get into everything going on between OpenAI and Google. So this kind of all started with a tweet. Well, actually, backing up: almost a week ago, last week, OpenAI announced that they had achieved a gold medal score. So basically that means they got four out of five questions correct on the Math Olympiad this year, and four out of five is a gold medal score. These are really complex problems, and this is usually competed in by high school students. So people at the end of high school are going to go and compete in this.
Okay, so that was last week, and then we just got the news that Google also got a gold medal. And when I heard this, I was like, oh, that's funny, Google must have heard OpenAI did it and then tried to compete. I wasn't super aware of the competition and didn't realize they were both competing at the same time in the same competition. OpenAI, I guess, just wanted to get a leg up on Google and announce the results first, because it made big headlines, and now it almost feels like a tag-on that Google's like, hey, we did it too. Google evidently is not happy to feel like they are a tag-on to this. So this is Demis Hassabis. He is a Nobel laureate, and he's a co-founder and the CEO of Google DeepMind. You probably know him. This is what he said on Twitter. He said, "Official results are in."
"Gemini achieved gold medal level in the International Mathematical Olympiad. An advanced version was able to solve five out of six problems." Okay, sorry, not four out of five: five out of six problems. "Incredible progress. Huge congrats to @lmthang and the team." He also said, "We achieved this year's impressive result using an advanced version of Gemini Deep Think." So "advanced" means not everybody has access to this.
If you're going and using it... you know what? This is exactly like when car companies show off. You know, Mercedes will have this absolutely insane car that they built, and it goes in races and sets all sorts of records at that, what is it? People are going to hate me for this, but it's that famous racetrack in Germany. The Nombre... I know that's not the name, so y'all can hate me for forgetting the name of that. But at the famous racetrack in Germany that everyone sets the records at, they do not send production cars. They make all sorts of insane modifications. And I remember when Porsche and Tesla were having this big competition there a few years ago, and Tesla was like, look, we can make this electric car beat all the records. But they sent this Tesla Model S Plaid with a whole bunch of customizations that were not in your standard Teslas, and it went and set a bunch of records. But the thing is, they show off the absolute fastest their car could be, but they never actually put that car into production, right? And so it's kind of annoying for consumers.
This is exactly like these AI models. They're like, there was an advanced version of Gemini Deep Think. It's like, basically, people don't get it. We just gave it tons of compute and tons of money to spend on these questions. And if we give it unlimited money and compute, it can solve really cool questions.
But for the Gemini version you're using, you do not get any of these features. That was kind of funny. They said, "Our model operated end to end in natural language, producing rigorous mathematical proofs directly from the official problem description, all within the 4.5-hour competition time limit. We'll be making a version of this Deep Think model available to a set of trusted testers, including mathematicians, before rolling it out to Google AI Ultra subscribers." Okay, so unlike the cars, they will be giving some people this. Now, Google AI Ultra: go look at the price tiers on that. It's not cheap, so most people are definitely not getting this, but kudos to them for making it available.
Okay, now we get to the drama. So basically, these models are getting way better. And I will bring up that these models have actually attempted this test before. This isn't the first time they've taken the test. I believe Google got a silver medal score last time it took this. But here's the caveat. The last time OpenAI and Google took this test, which I think was probably last year, they essentially allowed the researchers to take the math questions and convert them into something that the AI model could understand, right? So if there was a picture of a really complex math problem, the researchers could write it down in a computerized text format that's easy to understand. This time they did not do that. These models, with their vision, with everything that they had, were able to look at the problems and solve them to this level. So this is actually a really big step, I think, in that obviously we're not going to have mathematicians sitting there breaking down questions to make it easier for AI models to solve. That definitely feels like cheating. So this is really cool. They were able to see the problem, and they solved it better than before, from silver to gold. And it did the whole thing itself.
Okay, so I am impressed. Consider me very impressed on the progress. But now we get into the drama with OpenAI. Why did OpenAI release theirs a week before Google? What happened here? Demis says, "By the way, as an aside, we didn't announce on Friday because we respected the IMO board's original request that all AI labs" (cough, cough, OpenAI; he didn't say that, I did) "share their results only after the official results had been verified by independent experts and the students had rightly received the acclamation they deserved. We've now been given permission to share our results and are pleased to have been part of the inaugural cohort to have our model results officially graded and certified by IMO coordinators and experts, receiving the first official gold-level performance grading for an AI system."
So he said "receiving the first official gold-level performance," but OpenAI also got one. I mean, the first, and also the second, the tie with OpenAI. You both got gold medal status. But whatever.
What I did think was interesting here: basically, what happened was, I think TechCrunch reached out to OpenAI and was like, what the heck happened? How come you guys announced your results early? They weren't supposed to. Basically, the IMO was like, hey, come take the test with all the students, and then let's give the students their flowers, so this whole thing isn't just a big celebration about AI model progress. We'll announce whoever wins, and then you can do it. Well, OpenAI didn't really want to wait for that. So apparently what happened was, after it took the test, OpenAI hired their own researchers to go and look at the results and grade them themselves. Typically, you would wait for the IMO to have their officials go look at the content and then grade it.
So apparently OpenAI hired third-party evaluators, three former IMO medalists who understood the grading system. So to be fair to OpenAI, they did a really good job. They probably paid a lot of money. They found people that were former medalists, and they used them to grade the AI model's performance. After they learned of their gold medal score, the company reached out to the IMO, which then told them to wait to announce until after the Friday night ceremony. Did they do that? No, they didn't. They just announced it anyways, because they were excited or whatever. So, yeah, Google's not really wrong here. They probably should have waited.
One thing that I will say is interesting is that it feels like OpenAI used to have a huge lead in the industry, and it feels like it's a lot closer now. Google has roared right up next to them. We have xAI, according to the benchmarks. People get mad at me all the time; go read some YouTube comments. I mentioned recently that xAI was as good as a PhD researcher on a lot of topics, which is basically what their benchmark results showed, and people got mad at me. I understand all the controversy around xAI. I think they do a lot of dumb things. I think they're doing some smart things. Whatever. Don't get mad at me, don't shoot the messenger. Their benchmarks are really good. They have the best benchmarks. It doesn't mean it's the best in the real world. You all can argue that, but it has the best benchmarks. So there's a bunch of these top companies, and it feels like they're a lot closer.
Now, I will give the caveat that OpenAI allegedly is going to release GPT-5 soon, but I don't put too much weight in the hearsay of who's going to release what next, because I like to just play with the cards that are on the table. I feel like I've been disappointed by a bunch of big announcements of, oh my gosh, this latest model, GPT-4.5, is going to be insane.
And it wasn't as crazy as you might have thought or hoped for, or it was too slow to be useful, or whatever. There are all these different things. I'm not just pointing fingers at OpenAI; it's all the companies. So I will say it feels like the industry is quite evenly matched right now, and every company that releases the latest model has the best model. We just had xAI. Before that it was Google, before that it was Anthropic, before that it was OpenAI. So it seems to just kind of go in a circle. So we'll have more, but in any case, it feels like it's pretty evenly matched right now. No one is leaps and bounds ahead of anyone else, and you would hope not.
The only one I feel like is leaps and bounds behind is Apple, which is like a million miles under the sea, 10,000 leagues under the sea. And then we have, of course, Meta, who feels like they're lagging behind, but Zuckerberg is spending billions of dollars. The latest is that he offered an AI researcher from OpenAI $1.2 billion over four years to come join Meta's superintelligence team, and apparently they turned it down. Now, not all of them have turned it down. Many have actually left OpenAI to go over to Meta. Basically, Meta has to look at the value of the OpenAI shares that person has, because if they leave, they'll lose the vesting. It's the golden handcuffs, right? They're like, I can't leave OpenAI, because I have $500 million worth of shares that are going to vest over the next four years. And Zuckerberg's like, that's fine, we'll just pay you for whatever those are worth. Just come over, get started now, and you'll double your money and get paid for those shares. So it's been pretty crazy. And so I expect them to catch up quickly based off of the money that they're spending on it, but I feel like they're definitely not at the head of the pack right now.
So it feels like a lot of these companies are pretty close. And while it doesn't really matter who announced it first, OpenAI or Google, in the same week, the models basically had the same results. So they both have gold medals.
Now, beyond all the drama and who's best, whatever, I think this is a really good point to just say these models have progressed so much faster than I actually anticipated in the last year. It is very impressive that on this particular test they went from silver to gold and were able to do it all by themselves. We're going to see some amazing advancements. I'm really excited.
Hey, if you enjoyed the episode today and if you learned anything new about these AI models and what's going on, I would love for you to leave a review of the podcast. It's the number one way you can say thank you if you got anything out of it. I make these podcasts, and it's a ton of fun, but it really helps me stay motivated to keep making them when I see your amazing reviews. And subscribe over on YouTube if you're watching it there as well. So thanks so much for tuning into the podcast. Make sure to check out AI Box AI for the top 40 AI models on one platform for one price. You don't have to pay for subscriptions for every single AI model. And I will catch you in the next episode.
