Transcript
A (0:00)
What can 160 years of experience teach you about the future? When it comes to protecting what matters, Pacific Life provides life insurance, retirement income and employee benefits for people and businesses building a more confident tomorrow. Strategies rooted in strength and backed by experience. Ask a financial professional how Pacific Life can help you today. Pacific Life Insurance Company, Omaha, Nebraska and in New York, Pacific Life and Annuity, Phoenix, Arizona.
B (0:29)
Welcome to the podcast. I'm your host, Jayden Schafer. Today on the show we're talking about Gemini 3, because Google has just launched it. It has a new coding app and record-breaking scores on a bunch of top benchmarks. But beyond just benchmarks, it has some really cool use cases that have now been enabled, so I'm going to dive into all of that on the podcast today. Before we do, I wanted to say: if you want to get access to all of the top AI models that I talk about on this show and you don't want to have to sign up for different subscriptions, go check out AIBox.ai. That's my own startup. You get access to the top 40 different AI models from Google, OpenAI, and Anthropic. You get ElevenLabs for audio and a bunch of cool image models, all for $20 a month. There's a link in the description to AIBox.ai.
All right, let's talk about Gemini 3. So this is fresh off the press, and it's actually coming just seven months after Gemini 2.5 was released, which as everyone knows was also a really impressive model at the time it launched. The new model, Gemini 3, is of course the most capable LLM that Google has built, and it is an immediate contender for, I think, basically the best AI tool on the market today. It's competing directly with OpenAI's GPT-5.1. What I will say is that in most of the benchmarks Gemini 3 puts itself up against, to be completely fair, it's pitting itself against OpenAI's GPT-5, because evidently they were probably running a lot of these benchmarks before 5.1 came out, which was just recently. Or maybe they realized that 5.1 wasn't quite it. Maybe it beat Gemini 3, frankly. So we're seeing it compared to GPT-5. In any case, it is a really incredible model, and it's getting rolled out to billions of people around the world.
Sundar Pichai actually had a personal note in his announcement where he said that because they've rolled out Gemini to power the snippets at the top of Google Search, they now have over 2 billion users every month who are seeing and using those AI Overviews, and that the Gemini app itself has 650 million users per month. We know that OpenAI's ChatGPT has about 800 million per week. So it's gaining ground quickly and catching up with ChatGPT. More than 70% of Google's cloud customers use Gemini, and 13 million developers have built with Gemini models. So it is getting a lot of traction. It's not, you know, like Meta AI, which I feel like is probably struggling in this regard. It's getting a lot of traction and seeing a lot of use.
I think a more research-intensive version of the model, called Gemini 3 Deep Think, is also going to come out to Google AI Ultra subscribers. I am one of those, and I've been able to access and test this out, and it's quite impressive, I will say. It's also able to generate video right inside of Gemini, which, I'm going to be honest, did lead to a little accident when I was testing it out. I had it write me a bunch of implications of something that had just come out, and I was like, make a video script for me with this. And then instead of making a video script, it literally generated a video of all of these personas standing around these charts, talking in some marketing lingo. It was crazy for me to see an AI model just spit out a video for me. That was cool, but it also didn't do exactly what I wanted, so that was kind of funny.
What I will say is that this is coming less than a week after OpenAI released GPT-5.1, and only two months after Anthropic released Sonnet 4.5. So we are seeing these frontier models come out at a very crazy pace.
One thing that Tulsi Doshi, who's Google's head of product for the Gemini model, said about this: "With Gemini 3 we're seeing this massive jump in reasoning. It's responding with a level of depth and nuance that we haven't seen before." I mean, obviously they're talking their own book, but it is showing up significantly higher on a lot of the benchmarks, and some of that reasoning power shows up on independent benchmarks. They have a score of 37.4% on the Humanity's Last Exam benchmark. This is a famous benchmark because people are like, when an AI model can get 100% on this, it's, you know, humanity's last exam, obviously. They just make these crazy names.
Humanity's Last Exam, for those that don't know, is a really impressive benchmark, though. It essentially has, I think, like 60 to 100 questions that are super, super in depth. Now I know people are going to roast me in the comments, like, it's not 60 questions, it's 72. Okay. I don't know the exact number of questions. I'll be fully transparent with you on that. It has a number of questions which I believe is over 50, and they cover things that are very highly technical in very specific niches. So it's like if you are an expert on how to, you know, use wood filler with a certain type of tree when you're trying to build a table, right? Just really nuanced bits of information. Or perhaps you're a physicist with a PhD, and there's a specific field that you study quite in depth. You might know the answer to one specific question, but you're not going to know the rest, and most people are never going to hear about that or even know what you're talking about. Or let's say you're a historian and you've studied a specific area in Greece from a certain time period, and a certain artist and a certain style of art that most people and most AI models do not know the answer to.
These are the kind of questions that are on Humanity's Last Exam: very niche, very specific questions where, if you're an expert on something, you might be able to get one of the questions on this exam. Right? But you'll never be able to get a hundred percent. Basically no human can. And so the fact that they were able to get 37% on that is impressive. And I will also say that I believe they got that without using any tools, so it's just the raw model answering the questions. If they add tools like calculators and different, you know, physics modeling tools, it could do a lot better. But without any tools, this is impressive. The previous high score was held by GPT-5 Pro, and it was 31.64. So to go from 31 to 37, that's a big jump. You know, almost six points is a good jump.
They also hit the top of the leaderboard for LM Arena. LM Arena is awesome. It's basically a human benchmark that measures user satisfaction. What that means is it'll give you a question, and then it will have the response generated by Gemini 3 and by GPT-5 or another model, and it will show them side by side. It's a blind test, so you just look at the responses and you pick which one you like the best. And then it will say, okay, this is the chosen model. And it will compare your model to all the other models across a whole bunch of people. So getting a good score on that basically means that in a blind test, people prefer your model. They were able to score a high score there, which was really impressive.
And according to Google, the Gemini app is picking up a ton of steam. Like I mentioned earlier, they're getting a ton of new users. I mean, over 650 million monthly active users is a lot. Besides the base model, Google also announced a Gemini-powered coding interface called Google Antigravity, which is going to allow you to have multi-pane agentic coding.
So this is going to be similar to agentic IDEs like Warp or Cursor 2.0. Specifically, Antigravity is going to combine a ChatGPT-style, or I guess Google Gemini-style, prompt interface with a command-line interface and a browser window that can show what you're actually generating. So you're chatting with it and it's, you know, writing code, but then also displaying what you have built. I've seen some really cool demos where people are creating 3D models and images. Google itself had an interesting demo, just to give you an idea, where they wanted it to create a nuclear fusion simulator. It was not just making this cool shape as the nuclear fusion reactor was turning up its power, it was also making the pitch, the frequency that the reactor was creating, and the pitch was going up as the dial turned up. So anyways, I bring this up to say it was creating these 3D objects and it was also creating sound with them. It was just making these really complex things you don't see in other places. The CTO of DeepMind, Koray Kavukcuoglu, said, quote, "the agent can work with your editor, across your terminal, across your browser to make sure it helps you build that application in the best way possible."
Overall, I think this is honestly a very impressive tool. I highly recommend you try it out, and you've got to get the Gemini Ultra Pro tier in order to do that. But it is a fantastic model and, like I mentioned, it can do a lot more than just respond with text. It's creating 3D models, it's creating videos, and of course there are amazing image models. Gemini is coming out swinging, so I'm really excited to follow some of the advancements and what they're able to roll out. A lot of exciting stuff coming down the pipe.
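For anyone curious how the blind side-by-side voting described for LM Arena can turn into a leaderboard, here is a minimal sketch. It assumes an Elo-style rating update, which may differ from what LM Arena actually computes. The function name `elo_ratings`, the `k` and `base` parameters, and the model names and votes are all hypothetical placeholders, not real results.

```python
from collections import defaultdict


def elo_ratings(votes, k=32.0, base=1000.0):
    """Turn blind pairwise votes into Elo-style ratings (illustrative only).

    votes: iterable of (model_a, model_b, winner), where winner is
    model_a, model_b, or None for a tie.
    """
    ratings = defaultdict(lambda: base)
    for a, b, winner in votes:
        ra, rb = ratings[a], ratings[b]
        # Expected score of model a against model b under the Elo model.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        # Actual score: 1 for a win, 0 for a loss, 0.5 for a tie.
        score_a = 0.5 if winner is None else (1.0 if winner == a else 0.0)
        # Zero-sum update: whatever a gains, b loses.
        delta = k * (score_a - expected_a)
        ratings[a] = ra + delta
        ratings[b] = rb - delta
    return dict(ratings)


# Hypothetical votes; the model names are placeholders.
votes = [
    ("model_x", "model_y", "model_x"),
    ("model_x", "model_z", "model_x"),
    ("model_y", "model_z", None),
    ("model_y", "model_x", "model_x"),
]
ratings = elo_ratings(votes)
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)
```

The model that wins the most blind matchups ends up at the top of `leaderboard`, which is the sense in which a high score means people prefer that model's responses.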
Hey, thank you so much for tuning into the podcast today. If you enjoyed this episode, make sure to leave us a rating and review wherever you get your podcasts. And also make sure to go check out AIBox.ai if you want to get all of the top AI models in one place, so you don't have to pay subscriptions for everything. You could just go test out the latest from everyone over there. Thanks so much, and I will catch you in the next episode.
