Google's New Gemini Omni AI Is Too Much. All The Good Stuff From Google I/O. - AI For Humans: Weekly AI News, Tools & Trends

Summary

Podcast Summary: AI For Humans – Google's New Gemini Omni AI Is Too Much. All The Good Stuff From Google I/O (May 20, 2026)

Episode Overview

In this lively episode, hosts Kevin Pereira and Gavin Purcell recap the bombshell announcements from Google I/O 2026—focusing on the emergence of Gemini Omni, Google’s increasingly powerful AI, and other exciting releases (like Gemini 3.5 Flash, Genie 3, Ask YouTube, Docs Live, and AI glasses). The duo share hands-on experiences, demo highlights, and critical perspectives on the technical and societal impact of these next-gen AI tools, peppered with their trademark wit and banter.

Key Topics & Discussion Points

1. Google’s Gemini Omni & Video Model Evolution

Gemini Omni Unveiled: Google's upgraded AI model merges world model concepts (understanding/simulating the world) with remarkable video, physics, and reasoning abilities (00:00–03:13).
Editing in AI Video: Omni's advanced editing enables fluid transformation of video elements, characters, and backgrounds—undeniably pushing AI video forward (03:13–07:12).
- Gavin's Demos: Role-shifting a scientist explaining flatulence into a Viking, then a Viking in a bee costume with a TikTok-dancing Viking wife, even animating a cartoon fart cloud. Omni keeps character consistency impressively well.
- Quote:
  
  "If you're audio only, you're missing out on the fact that the character consistency is there... it's keeping elements of the original video as you swap to the next one." — Kevin Pereira [07:12]
Cameo Feature: Borrowed from Sora—users can upload their own face for inclusion in AI videos. Gavin finds it visually accurate but the audio “a little funky” compared to competitors (08:01–09:45).
Stylized Explainers: Omni excels at generating varied educational video formats (claymation, comic, stop motion), suggesting a future where creating motion graphics could be as easy as issuing a prompt.
Physics (& Its Limits): While Omni claims better understanding of physics, there are clear limits, e.g., capybara skateboarding prompts yield broken physics and inconsistency (09:45–11:00).

2. Guardrails & Deepfake Prevention

Content Credentials Verification: Google doubles down on watermarking for AI-generated content, adding invisible “content credentials” to both generated and AI-edited videos (11:24–12:47).
- Quote:
  
  "I want to be excited for this...I’d love just very basic for every social network to automatically flag AI generated content the way they claim they do, but they really don’t." — Kevin Pereira [12:06]
Some skepticism remains, as even subtle watermark patterns have reportedly been reverse-engineered (12:06–12:47).

3. Interactive World Models: Genie 3 & Maps

Genie 3 Launches in Google Maps: Users can now prompt real-time experiences (e.g., F1 cars in Vegas, raccoons biking) within Street View, powered by AI world models (13:04–13:27).
While whimsical, these features offer tools for both entertainment and new forms of interaction with real landscapes.

4. Practical AI Tools: Ask YouTube & Docs Live

Ask YouTube: An AI-powered search and navigation tool for YouTube, which delivers step-by-step video guides and jump-to-chapter features (14:07–14:46).
- Quote:
  
  “If you ask a question of Ask YouTube...it can make you a step-by-step kind of video carousel...I just know that it’s going to do nothing for our view count. But I’m very excited for it, Gav.” — Kevin Pereira [14:31]
- Raises flags about AI search eroding traditional content creators’ traffic, as AI explainer agents (like TikTok’s “TACO”) become the new gatekeepers (14:46–15:44).
Docs Live: Combines real-time voice dictation with access to Google Drive/Docs, allowing users to narrate a document, extract info from emails, generate tables, and even build slide decks on the fly (15:44–17:03).

5. New Gemini 3.5 Flash Model

Not the Flagship—But Fast!: Gemini 3.5 Flash is a lightweight, high-speed foundational model, available before the much-anticipated 3.5 Pro (17:34–19:20).
- Quote:
  
  "The real unlock was the speed...swarms of these capable models delivering results far faster than the other models. I’m super impressed by it." — Kevin Pereira [18:17]
Price, Speed & Accessibility: Discussing the arms race of “cheaper-faster-better,” Gavin argues Google could wield its resources for global accessibility by slashing AI prices (19:20–20:52).
Potential Hesitancy: Speculation that “flagship” model releases (and their supreme capabilities) may be slowed out of caution or technical gating, referencing the Mythos rollout and recursive self-learning (20:52–21:03).

6. Gemini Spark: Always-On AI Agents

Persistent, Cloud-Based AI Agents: Gemini Spark runs agentic workflows in the background, staying live across devices and enabling hands-off automation for personal or business tasks (21:03–22:29).
- Allows for “vibe coding” your calendar, Drive, email—telling the agent to research, email, even plan your life, then return later (like a hands-free “do it for me” AI butler).
Real-world benchmarking shows growing but spotty strengths: ChatGPT may hallucinate, Claude may underdeliver, but new Gemini integration with Google Maps impresses on local recommendations and planning (22:29–24:46).

7. AI Glasses & Wearables

On Your Face, Not Just Your Phone: Google teases new AI glasses partnerships, positioning wearables as the next “face” for personal AI once models are truly robust. The hosts joke about chomp detection and real-time food analysis (24:46–25:35).

8. Breaking News: Karpathy Joins Anthropic

Andrej Karpathy Move: The AI community is rocked as star researcher and educator Andrej Karpathy decamps for Anthropic, signaling major innovations—possibly in recursive self-learning and smaller, self-teaching models (25:54–27:15).
- Quote:
  
  “Karpathy going to Anthropic...for the insiders of the AI world, is like a giant thunderbolt...hard to imagine one person makes a big difference, but he does.” — Gavin Purcell [26:54]
  “He is the Bob Ross of AI, as I like to say.” — Kevin Pereira [26:58]
Hosts suggest the next wave of cutting-edge LLMs may be led by OpenAI and Anthropic, echoing previous shifts in video AI innovation (27:15–27:44).

Notable Quotes & Moments

On AI Video Editing (Character Consistency):
- “It edited the room to have torches on the wall...the animated character has its own style. The face paint remains consistent. The beard styling...You can start with a single source image or video and have it generate a whole storyboard and bring those to life.”
  — Kevin Pereira [07:12]
On AI Search Eroding Video Traffic:
- “There’s a real movement right now where AI is absorbing the what used to be video search...this is a little bit about that same thing about these AI companies kind of wrenching back traffic, which is an interesting thing as well too, but probably more useful ultimately to the normal person.”
  — Gavin Purcell [14:46]
On Deepfake Defense:
- “You just gotta not trust your eyes anymore. We’re in a world where by default, if you see a capybara skateboarding, maybe that’s fake.”
  — Gavin Purcell [12:47]
On AI Assistants:
- “The real win here is to tell something, to go out and do something and then not have to worry about it again...if they can actually find a way that that works, using these different agents to kind of talk to each other. Yes, that’s great.”
  — Gavin Purcell [22:29]

Timestamps for Important Segments

| Timestamp | Topic |
|-----------|-------------------|
| 00:00 – 03:13 | Google’s Omni AI & Video Model Advancements |
| 03:13 – 07:12 | Live Demo: Editing AI Video (Scientist → Viking → Bee → Fart Cloud) |
| 08:01 – 09:45 | Cameo Feature & Pros/Cons vs. Sora |
| 11:24 – 12:47 | Content Credentials Verification & Deepfake Technology |
| 13:04 – 13:27 | Genie 3 in Maps: AI-Generated Interactive Worlds |
| 14:07 – 14:46 | Ask YouTube: AI-Assisted Content Navigation |
| 15:44 – 17:03 | Docs Live: Real-Time Voice-Narrated Documents |
| 17:34 – 19:20 | Gemini 3.5 Flash: Speed, Benchmarks, and Release Strategy |
| 21:03 – 22:29 | Gemini Spark & Practical Agentic Workflows |
| 24:46 – 25:35 | AI Glasses: Next Step in Wearable AI |
| 25:54 – 27:15 | Andrej Karpathy Joins Anthropic – Industry Ripple Effect |

Episode Tone and Takeaways

Kevin and Gavin blend deep technical appreciation with skepticism and practical consumer concerns. Their playful rapport (“that’s a lot of tokens!”) is mixed with critical evaluation and real-world anecdotes. They frame Google’s advances as major but not always flawless—balancing excitement about creative tools and new agents with wariness about authenticity, AI content abuse, and the shifting dynamics for creators.

In Short:
Google is redrawing AI boundaries with Gemini Omni and friends, but the AI race is far from over—watch for next waves from Anthropic and OpenAI, and be ready for a world where what you see (or hear, or prompt) may not be what you get.


Transcript

[00:00]
Gavin Purcell
Google's Gemini Omni model has landed. It was unveiled at I/O 2026 and it's got better editing, better generation, better everything.
[00:09]
Kevin Pereira
Sure, yeah. We've got hands-on experiences with Omni to share, some astonishing hits and some surprising misses. Last year I outlined our vision of
[00:19]
Gavin Purcell
extending Gemini's incredible multimodal capabilities to become
[00:24]
Kevin Pereira
a world model AI that can understand and simulate the world.
[00:28]
Gavin Purcell
But the Google hits kept coming with all sorts of releases and teases of releases, kind of forming a plan.
[00:35]
Kevin Pereira
Oh, a concepts of a plan.
[00:36]
Gavin Purcell
Yes.
[00:37]
Kevin Pereira
An AI powered Ask YouTube feature which is gonna streamline learning and discovering that's on the way. Docs Live will deliver a real time writing assistant. And I guess. Well, I think they also announced something else that might have been kind of big.
[00:50]
Gavin Purcell
Gemini 3.5 Flash. Kevin, that's right. A new model is here, but it is not Gemini 3.5 Pro. We will get into that. The latest lightweight foundational model is actually punching above its class and matching up with Opus 4.7 and GPT 5.5 and it's much, much faster.
[01:07]
Kevin Pereira
That's a lot of tokens.
[01:09]
Gavin Purcell
Plus, Gemini Spark is your always on AI agent. It's vibe coding for you, it's vibe coding for your family, it's vibe coding for your life.
[01:16]
Kevin Pereira
But that's a lot of tokens.
[01:18]
Gavin Purcell
You okay over there? There's a ton of news to cover, including surprising news from Anthropic that might give them a major leg up in the overall race. This is AI for tokens.
[01:29]
Kevin Pereira
It's a. That's a lot of tokens.
[01:31]
Gavin Purcell
Humans. It's AI for humans, everybody.
[01:33]
Kevin Pereira
I'm a human.
[01:39]
Gavin Purcell
Welcome everybody to AI for Humans, your twice a week guide to the wonderful world of AI. My name is Gavin Purcell. That is Kevin Pereira. And Kevin, we have big, big news today. Google dropped their IO. I won't call it a bomb because it is a very exciting moment. So they dropped their IO unicorn like surprise on us, which is pretty exciting.
[01:56]
Kevin Pereira
That's delightful.
[01:58]
Gavin Purcell
Delightful. What they dropped mostly was a brand new model which we're going to get into, Gemini 3.5 Flash. But Kevin, more. They dropped an update to their Veo model, which they are now just calling Google Omni. And this is based on the world model foundations that we have heard people say many times we've talked about in the show. It is more than just LLMs. This is a video model that allows you to have better physics, it has reasoning, but maybe more importantly, you can edit really, really well with it. So have you taken a look at some of these videos so far?
[02:29]
Kevin Pereira
Definitely have seen the examples. Watch the portion of the keynote where I mean, look, the too long didn't read is this is Nano Banana for video, which means something to a giant portion of our audience and sounds like complete gibberish to the new people. Another portion, yeah, so we can get into that. But this is. This is a world model that has understanding of physics and can do complex reasoning and can express itself in any medium. They're just sort of starting with video, which is interesting here, right? So theoretically in the future, this thing can ingest and output audio and text and video and maybe 3D worlds and all of those other things as well. But let's just get to the goods. Like, is it any good?
[03:14]
Gavin Purcell
I've seen a lot of really interesting examples people have posted online. There was a great one from Fofr who we often shout out here, who actually showed how good it is at representing London. So there's a shot where you see the London skyline. Bilawal Sidhu also had a situation where he was playing around with Genie 3, which is still only available to Ultra members, where he was able to use. They've just updated it so that real world locations, you can run around real world locations. But more importantly. So my experience with this so far has been good. I will say this, we'll drop this in, but there's a couple things that have been announced that are really important. One is it's better at generation than Veo 3 was. First of all, we'll talk about that. The editing is a really big thing to talk about. But then also they have kind of dropped a new feature that's a lot like Sora's cameo. And I think that's a place where it may not be doing as well. But Kev, I do want to first talk about just the rough generations I did. So I want you to play this clip I made where I asked it. I asked it to create a scientist who was explaining flatulence basically in a very reasonable way and to do it kind of in a very scientific way. So what you'll also see though is then I edited over the course of this clip to change what the person saying it was and then what was happening in the background. So let's play this whole thing and then we can talk about it.
[04:26]
AI Generated Voice / Demo Character
We all produce gas. It is a natural byproduct of digestion in your gut. But the truly foul smell comes from sulfur compounds. Blame the bacteria for that.
[04:36]
Kevin Pereira
So I'll pause there for a second. Sorry to the audio only kids, but you are seeing, like gorgeous animated call outs and diagrams of digestion system, gastrointestinal tract, hydrogen sulfide molecules popping up. And then a title card which says, make the scientist a Viking who's just been through a battle and is forced to explain flatulence without warning.
[04:58]
AI Generated Voice / Demo Character
We all produce gas. It is a natural byproduct of digestion in your gut. But the truly foul smell comes from sulfur compounds. Blame the bacteria for that.
[05:08]
Gavin Purcell
So I want to explain again, if you're not watching this, what you're seeing here is a very well rendered Viking with battle paint on, basically saying the exact same lines because I just asked it to change the Viking, he's now in a Viking background, but there's still a monitor behind him where he's kind of describing some of this stuff. So really, really remarkable video to video editing. And then say what the next title says here.
[05:30]
Kevin Pereira
Yeah, it is put the Viking in a bee costume and have his Viking wife pop up on the screen behind him doing a TikTok, which it is
[05:37]
AI Generated Voice / Demo Character
a natural byproduct of digestion in your gut.
[05:40]
Kevin Pereira
Nailed it. But the truth.
[05:41]
Gavin Purcell
Nailed it. Right, so. So in this video, what you're seeing is the bite. The bee costume is fantastic. The Viking is down a bee costume and behind him on the video screen is his wife. And again in a Viking setup, and the wife is Viking doing a dance. And so this is a really formative point for AI video because you and I both know trying to do edits within a system can sometimes be tricky. The other thing that was really interesting here. Now, the audio we can talk a little bit about, but the audio stayed consistent throughout. It didn't change. The voice didn't change, which is a really, really useful thing, I will say. I pushed this a little bit further. I went five. I went five generations deep. Play this last one and you'll just see how it kind of breaks a little bit. I'm always lurking. I'm always lurking. Smell you later, suckers.
[06:30]
Kevin Pereira
So.
[06:31]
Gavin Purcell
So I'm going to describe what that was. My prompt was I wanted to have a cartoon animated, like kind of fart cloud. I'm sorry about this, everybody. Yes, I am 10 years old in my brain. Walk onto screen and say all those lines. Now, what it did is if you're not watching this is it had the wife say with the first line and then kind of all three characters said the last three lines. So it did break eventually. But again, this is a remarkable step up just from that standpoint, which is pretty cool.
[06:57]
Kevin Pereira
And you have to love that during the animate on the fart cloud is also farting its own little cloud. That's a nice little touch. Look, we kind of dove into some hyper specific examples here. And again, if you're audio only, you're missing out on the fact that the character consistency is there.
[07:12]
Gavin Purcell
Yes.
[07:13]
Kevin Pereira
When. When Gavin swapped from a scientist in a lab, he didn't say, change the background to be thematic with the character. It made the decision. It edited the room to have torches on the wall. The flames from the torches are kind of accurately reflecting against the cobblestone of the wall. The little animated character has its own style. The face paint remains consistent. The beard styling, the braid in the hair. It's keeping elements of that original video as you swap to the next one. And now you're starting to get the kind of holodeck ramifications of this, where you can start with a single source image or start with your own video, which is an important thing that they demoed as well, that you can take source video and just tell it how you want it to transfer the style or give it a single start image and have it generate a whole storyboard for you and bring those to life.
[08:01]
Gavin Purcell
Yeah. So all of this is within Flow right now. I think people who are listening to the show for a while know our feelings on Flow as a system. It's not that solid, but it is about 80 credits per generation, which is cheaper than Veo 3 and 3.1 were. So that's good. This is Omni Flash right now. So we assume there's going to be an Omni Pro model. And then, Kev, before we move on from this, I do want to talk about. There is a version of this. They kind of launched a cameo feature like Sora has, where you can upload your own face in your own video. And it does the same thing you actually do the turn your head to the right and left and you count the numbers down. I found this. Okay, if you watch this video, we'll play this video now of me, maybe listen to a little bit of it. Take a listen to this. This is me.
[08:41]
AI Generated Voice / Demo Character
It's the Jevons Paradox.
[08:43]
Gavin Purcell
Increases in efficiency.
[08:44]
AI Generated Voice / Demo Character
Like this place.
[08:45]
Gavin Purcell
It never ends.
[08:45]
Kevin Pereira
No.
[08:46]
Gavin Purcell
Stay away from me. So that was.
[08:49]
Kevin Pereira
If you sent me that audio sample, I would not say that is my BFF gp. Yes.
[08:54]
Gavin Purcell
So it's a little funky of how it sounds. I think Sora did a better job of this. But just if you're. If you're just listening. That was me explaining the Jevons Paradox and kind of getting sucked into a computer in kind of a back rooms world. It got the back rooms really well. So visually it's doing really well. I think that maybe feels a little half baked, but I also should say I had a hard time getting that generated. So I know one of the tricky things with Google always is there are going to be really hardcore guardrails on this maybe versus what Sora was, especially in the beginning. So whether or not this takes off, I'm not sure. I do think this is a big deal for like education content or for explainers. One of the things I saw somebody do and Google did was like kind of explaining something in claymation, like a pretty complicated concept. And you could see a world where instead of doing a motion graphics pass on something, you could just say, hey, create me a graph that explains this. And this I would see that would be super useful and I think it would be capable of it.
[09:45]
Kevin Pereira
Yeah, look, this thing has sort of built in styles that it already works really well with. They're fairly proven. The explainer thing, not only can you swap like the template for the video, but swapping the style as well. So claymation explainer to stop motion, to comic book, to whatever you want. Those examples seem to be the most impactful to me. Like the physics stuff is interesting, but I don't know if you saw the volleyball demo. The serve starts to go over the net and then just immediately flies back. The character consistency goes out the window. Now we know that these things are kind of like slot machines and you pull the reel and maybe this was just a particularly bad generation, but there are some broken physics and broken consistency.
[10:28]
Gavin Purcell
I actually, before I jumped on here, I tried to generate a capybara realistic capybara doing a 900 out of a half pipe and dropping in. And you'll notice in this video it's very similar. It's a bad physics, which is funny because it doesn't seem to me like it gets this at all. Now it's possible that there's some sort of thing in this where I kind of push the wrong button and I use the wrong model, but I don't think so. You can see if you're watching this, like you'll see it like it's not really doing a 900. Some of the physics are also a little wonky. So I think we'll have to play around and see where the physics actually work best.
[11:01]
Kevin Pereira
Yeah, yeah, look. Okay, but. Okay, let's take a step back from this like very cool thing. It's out now. It's Maybe the smaller version of it. So in a few weeks time we get another bigger, probably more expensive, but maybe slightly more capable model. Gavin, what about the potential for abuse here? Because if my mom sends me another dancing pug video thinking it's real or more bunnies on trampolines, like I am going to uninstall the Internet.
[11:24]
Gavin Purcell
We got you solved, Kevin. They actually introduced yet another piece. Yes, they introduced another piece of protective software called Content Credentials Verification. This is adding on to their kind of like watermarking system. Google has been very, very hardcore about watermarking. In fact, one of the most frustrating things about both Nano Banana and Veo has been like there's always a watermark on the outputs and ChatGPT is playing a little looser with that in some ways. But I do think this new, this new system is designed to be able to know whether not only is your thing AI, but was it edited with AI? Because a lot of times something's edited with AI, it may not be carrying the same sort of thing. So this is a big deal for those people that are concerned about deep fakes or wanting to make sure you understand what is AI content, What is it?
[12:06]
Kevin Pereira
I want to be excited for this, I really do, but I don't know if you've seen that like both the watermarks for Google's Image Gen and ChatGPT were exposed recently. Like you can see.
[12:16]
Gavin Purcell
Oh no, I didn't know that. Wow, that's interesting.
[12:18]
Kevin Pereira
They add these subtle patterns that we can't detect in the final image, but they're pretty easy for a computer to pull out. And it's. They each have their own texture, their own patterns. It kind of on its own. If you generate like on a blank canvas, it looks like a magic eye sort of thing, but like they found those fingerprints already. So I hope this takes it the next step. I'd love just very basic for every social media type network to automatically flag AI generated content the way they claim they do, but they really don't. Like, that would be a nice step in the right direction.
[12:47]
Gavin Purcell
Yeah, I mean X is trying to do this in some form and we know they've had mixed results before, but I also think we've said this before in the show, like you just gotta not trust your eyes anymore. We're in a world where by default, if you see a capybara skateboarding, maybe that's fake. Let's just keep that in mind. Keep that part in mind.
[13:04]
Kevin Pereira
What if I see like a raccoon riding a motorcycle through Google Maps or an elephant on a motorbike or an F1 car going through Vegas streets.
[13:12]
Gavin Purcell
Probably real. Me and my buddy race through the streets of Vegas all the time. So you might have just caught us doing that. But Kevin, I think you're referring to this thing we mentioned before which is within Google Maps. Now you can generate Genie 3 outputs, which is a very, very cool thing.
[13:27]
Kevin Pereira
Yeah, Genie 3 is another one of their world models. But this is sort of a real time interactive model. Think of it like an AI gaming engine. And now you can. Yeah. Prompt whatever you want to be going around street view like. I think that's cool. Look, some people will hear this and go, I don't need this for my life or my business or my whatever else. And that's why I wanted to quickly shout out two things which were just like tiny little nuggets which may or may not come to light. They might be unceremoniously taken behind the Google farm.
[13:56]
Gavin Purcell
Oh, these are cool. I think these are interesting for sure. Yeah.
[13:58]
Kevin Pereira
But Google has a barn. Like Google has a barn and they bring all sorts of cool tech back there and they underlying all the time.
[14:05]
Gavin Purcell
I bet Bard is in the barn.
[14:06]
Kevin Pereira
Exactly.
[14:07]
Gavin Purcell
They're running the barn.
[14:08]
Kevin Pereira
Bard is in the barn right now. But okay, first is Ask YouTube. And one of the most useful things in a Google AI search for me lately is when I'm searching something, how do I do this? Or what is common with that? It surfaces YouTube videos and suggests different chapters or portions to jump into it. This is that but really amped up. So if you ask a question of Ask YouTube, it does what you think it surfaces a bunch of YouTube videos can make you a step by step kind of video carousel of start here, click here takes you right to that portion of the video. And I just know that it's going to do nothing for our view count. But I'm very excited for it, Gav.
[14:46]
Gavin Purcell
I'm very excited to see if I can Google can what videos Bard failed at. And we could make that the Bard failed behind the barn setup. I think it'll be very fun. It's a very cool idea. I do like. I do like Ask YouTube. One thing, Kevin, I think that's really important people understand is there's a real movement right now where AI is absorbing the what used to be video search. Right. And this was a way that a lot of people got views on their videos or on social videos because they would explain something like, you know, how do you replace, you know, a carburetor in a certain thing. This is also another thing that, that people have to be aware of. I just saw there's on TikTok, there's now an agent called Taco T A K which allows you. Instead of. It used to be when you click on the search button you would get a bunch of people's videos and to explain it now there is an AI explainer that goes there which stops views from happening. So this is a little bit about that same thing about these AI companies kind of wrenching back traffic, which is an interesting thing as well too, but probably more useful ultimately to the normal person. Right.
[15:45]
Kevin Pereira
By the way, killedbygoogle.com, you can see Firebase Studio, Dark Web Reports, Google Jamboard. Oh RIP Chromecast, VPN by Google, Dropcam. So there's. There's a lot going on behind the Bard Barn. I digress. Lastly, I want to quickly shout out Docs Live. Very cool demo using real time voice, which you and I have said we think will be pretty, pretty prominent in future interfaces. But you can launch a live doc, narrate to it and.
[16:15]
Gavin Purcell
Live doc.
[16:15]
Kevin Pereira
That sounds like a live.
[16:16]
Gavin Purcell
That's a great phrase. That's a really good phrase. Live docs.
[16:20]
Kevin Pereira
I like live docs.
[16:21]
Gavin Purcell
Yeah, sorry, it was a quick interruption to talk about live docs, everybody. That's right, keep going, Kevin.
[16:27]
Kevin Pereira
Great.
[16:27]
Gavin Purcell
I've lost my mind. Lost my mind. I'm moving.
[16:30]
Kevin Pereira
You launch a live doc and you just start talking and it will dynamically generate a document of whatever it is you're saying. But then the next step is you can say, hey, go grab that PDF that's in my email and make a table of what information is needed and then format it in this way for me. Give me like slides that I can go through and then craft, craft it in this voice and whatever. And it will, because it's connected to your Google Drive or Google Doc ecosystem, it can pull from all those threads to generate that document live on your phone or in the cloud.
[17:04]
Gavin Purcell
We have a huge new flagship model we have to talk about too. But first, Kevin, we have to tell everybody out there if you want to know more about these flagship models, if you want to know more about Google Omni, you got to click that little subscribe button right down below here. You got to click subscribe. You got to like. Thank you everybody for hyping last week's video. It did very well and we do appreciate that it kind of grew over the course of the week, which is a very nice thing to see. We have a Patreon, we have a newsletter, all that stuff. It's all the ways you can get the two of us into your face. If you're not getting us enough, please like and subscribe now, y'all.
[17:35]
Kevin Pereira
And if you got us into any other parts of your bodies, you might need to visit a live doc.
[17:40]
Gavin Purcell
Yeah, live doc. Live doc. All right, let's distract it. Let's jump into the big other news today, which was Gemini 3.5 Flash. Now, this is a foundational model. It's an update to their foundational model, but it is not a flagship model. I think the important thing for everybody to know, which it was kind of surprising to me, is that, like, Google 3.5 Pro is coming in a few months, which feels like a miss for Google a little bit. But we can talk about what 3.5 Flash is and why it's different and maybe why Google is not racing against Anthropic and OpenAI to try to kind of push the frontier forward. But Kev, what was your first take when you heard the news about 3.5 Flash?
[18:17]
Kevin Pereira
Look, I, you know, I was not. Some people seem to be disappointed by it not being this, gee whiz, foundational something. Look, if we want to get into the benchmarks, and I know the boys want to come off with a benchmark, Those benchmark boys want to come off those bleachers. Like, the numbers are good. The numbers are great for a Flash model, assuming a pro version is on the horizon. Like, it's benchmarking just fine for agentic tool calling for solving visual puzzles in the ARC challenge for computer use. But to me, the real unlock was the speed. The fact that this thing in certain benchmarks is going four times faster than those other models means even if it's not like the peak, peak, peak performance, I will want to run this as the daily driver and run it with all of my parallel agents so that they can have swarms of these capable models delivering results far faster than the other models. I'm super impressed by it. I wish it was a little cheaper, but it's still amazing.
[19:21]
Gavin Purcell
Yeah. You know, one of the things that's so interesting right now, we've talked about, like, the different levels that AI is going to improve, and one of the biggest levels is going to be price and speed. Right. Because even last night I was working on a project for a friend of mine and I get very frustrated even using 5.5 Fast with, with high in Codex, where I'm waiting for the thing to come back and forth. And again, we have said this a bajillion times, but, like, it does stop you if the speed is slow. And we know that Google has this kind of giant fleet of TPUs, their tensor processing units, which is their very specific AI processing hardware. And I hope that that is a place that Google can deploy. One of the things I always think is so weird, Kevin, about cost with Google is this is a company that is literally one of the richest in the world. And yes, they're spending a ton of money on building out AI, but you would think that they could actually almost, like, buy their way into this, into a lead, if they cut costs way down. Like, it feels like they're one of the only companies that could say, like, hey, costs don't matter that much to us. Let's onboard a bunch of people. But overall, I still think this is a big step. I do think it's a weird thing that they waited for 3.5 Pro, and maybe this is some sort of Mythos-like thing. They didn't say that. They didn't say that it was too capable, but maybe they're in that world now where they're worried that if they launch something that's the strongest version, that it's going to break something in the world. Because Mythos continues to kind of, like, open doors for people. There was a story this week that people who had Mythos were now sharing it with other people to make sure that they could solve all the problems they have. So maybe we're in that weird in-between mode where everybody's just taking a breath for a second before they release the next big thing. Yeah.
[20:53]
Kevin Pereira
I think they also probably wanted to do more than just kind of release it as a, "Now you can go use it on studio AI lab Google,"
[21:04]
Gavin Purcell
They got a new logo, but
[21:06]
Kevin Pereira
I think, like, look, they're plugging it into Gemini Spark, for example, which we can, we can talk about. It's available in Antigravity. It's not just releasing the model. It's making sure that there's harnesses and tooling available to take advantage of that. Gemini Spark, really quickly. We talked about it actually last week when it was kind of a little baby rumor, but here it is. It is an agent that runs for you in the Google Cloud. So they made a big deal out of the fact that you could close your laptop, Gavin, so you can run long-horizon tasks in Spark. These are things that could take hours, days, potentially even weeks. Close your laptop, walk away, access the same project and the same things with your phone. And I know sometimes people might like kind of gloss over just because our presentation is lackluster, but also because they're like, hey, I don't code. I'm not interested in that. What I thought was really nice was that they showed off, and you can do some of these things in Cowork, but I don't want to like hedge too soon. They showed examples of people vibe coding their life, essentially using these agents running in the cloud that are connected to their calendar, their drive, their email and giving it tasks. Hey, go research this thing and give me five great answers in a draft email so I can send it to somebody else. Like, it's not just about building a website or vibe coding a game or whatever. It's about having these agents do your bidding and assist you in your real-world tasks.
[22:30]
Gavin Purcell
And that is the big thing we're all waiting for, right? I think I'm waiting for. I mean, you and I spent a lot of time doing agentic websites or agentic projects or all sorts of stuff. But like the real win here is to tell something to go out and do something and then not have to worry about it again. Right? Like if you could book a rest... I know everybody's used restaurants as a good example of like how these things work, but most of the time they're not doing everything. And most of the time, like, if I could set up a system where I could just say to my thing, hey, my wife and I want to go to a new restaurant, make sure it's whatever, blah, blah, blah. Up until now that has been a pretty big fail. And if they can actually find a way that that works, using these different agents to kind of talk to each other, yes, that's great. I think we are going to have to see what that pathway looks like and how it comes to the consumer. But Google might be best served to do it.
[23:13]
Kevin Pereira
So some real-world, like, anecdotal benchmarking, by the way, Gav, using Claude, Google and ChatGPT. A test that I've been using a lot because I'm away, I'm in a foreign country, and I say like, hey, I'm in [insert area here]. I want to go eat at some sort of place. Go find me the, like, the top three places. I want you to use TripAdvisor, Reddit, blogs, Google reviews, Yelp reviews, everything at your disposal. Go, go list the top three, take their menus and translate them for me and segment it out by like vegetarian, meat, or whatever. And by the way, like, they're shockingly capable. Now they can go off and do this just within the chatbot app, just within the Claude app. More often than not Claude will say that it can do something, spend a long time downloading and trying to figure out and then not do it. ChatGPT though has hallucinated full menus for me based on the most thought.
[24:07]
Gavin Purcell
These are the biggest things.
[24:09]
Kevin Pereira
Yeah, yeah, exactly. So like it's hit or miss, but when it works it does feel magical. And the also-ran sort of shout-out, because I don't use Gemini as a daily driver, but Google Maps with the new Gemini feature in it is surprisingly capable when you say like, hey, find me a restaurant that is this, that, the other. But I said the other day, plan me a walking path that would let me see X, Y and Z, or find me a good place to jog or whatever. And it did it. It sourced reviews and it put it on the map for me directly within the Google Maps app. So like, I know, a bit of a tangent, but like real-world usage that I was happy with. I wish it were on my face though, Gavin.
[24:47]
Gavin Purcell
Well, to wrap up the big thing with Google AI, there are now new fancy AI glasses that are on the way, and they kind of didn't make as big a deal about this, but like, I mean, AI glasses are something that Google has talked about, they are partnering. Obviously we have Meta's glasses that they have announced. There is a push towards this thing, and I don't know if we know for sure that people will use this, but once all this AI gets really good, they will hopefully be on your face in a cool way. And you mentioned that idea of if the agent's going to go and tell you where to get and where to go. Maybe you look over at one side and you see somebody sitting in a restaurant. You're like, wow, what's that person eating? You look and it kind of tells you what the food is, and then it can also tell you that the person's very mad at you because it'll
[25:25]
Kevin Pereira
give you a prompt. It'll say, "Open mouth." So you have to go and physically grab their mouth and look in with the glasses. Yeah, yeah, it has chomp detection. It's going to look and then tell you exactly what they're eating.
[25:36]
Gavin Purcell
There is so much stuff coming out of this. There's also, weirdly, Kevin just mentioned before we jumped on, there's a rumor that GPT 5.6 is going to drop in surprise on Wednesday or Thursday because of some sort of, like...
[25:46]
Kevin Pereira
Yes, yes, yes. And we didn't even talk about the new, like, Google Search, because the search box is changing, plus agentic shopping, plus everything else.
[25:55]
Gavin Purcell
But there's so much stuff happening. We will have more on the next show. I do. We do have one big piece of news that is a very insider baseball news. But if you've been listening to this show for a while, this is a big deal. And this gets to more of the frontier models. Andrej Karpathy, who Kevin and I both believe is one of the best educators on AI, formerly worked at Tesla, was an OpenAI employee very early on, has agreed to join Anthropic. And if you're still listening to this podcast, you probably know what that means. But just in case, Andrej Karpathy is kind of seen as one of the best AI researchers. And what's very cool about this, Kevin, is that he's joining Anthropic and there is a rumor out there, in fact he even kind of says it in his blog post or his Twitter post, that he's going to be working on recursive self-learning. If you remember, a couple of months ago, we covered a very nerdy story where Karpathy was working on very small models and came out with something called auto research, which was this idea that, you know, you could help the model learn itself. Karpathy going to Anthropic is like, for the AI world, for the insiders of the AI world, is like a giant thunderbolt that comes and shakes everything up. Because it's hard to imagine one person. Yeah, one person makes a big difference. But he does.
[26:58]
Kevin Pereira
Yeah, it's big signal. He is the Bob Ross of AI, as I like to say.
[27:02]
Gavin Purcell
Bob Ross, but also Einstein. Right. He's like Einstein Ross. Now that's a visual that you'd like to have in Google Omni.
[27:10]
Kevin Pereira
And let's generate it with Omni right now. And there we go. And it's done. We are nerds.
[27:15]
Gavin Purcell
There we go. Anyway, it's a very big deal. We're going to be tracking that because he just dropped that this morning. But I do think when you think about the foundational models, Kevin, in this Google announcement today, it does start to feel like OpenAI and Anthropic are the ones that are really pushing hard. We have yet to see a couple other places drop. But remember how we talked a while ago, how like Seedance and Veo seem to be kind of taking the lead in AI video? It's starting to feel like that might be happening on the LLM side as well.
[27:45]
Kevin Pereira
Tough time to quit AI, Gavin, but I had a good run. I'll see you.
[27:48]
Gavin Purcell
You did. Yeah. Okay.
[27:50]
Kevin Pereira
Bye.
[27:50]
Gavin Purcell
Bye. See you later. We'll see you guys all. We'll see you all on Friday. Thanks for tuning in. A lot more stuff coming.
[27:54]
Kevin Pereira
Bye.
[27:54]
Gavin Purcell
Bye, everybody.