
Mike
So, Chris, this week "when it rains, it pours" truly holds, quite literally, because it's flooding where I live right now. So the metaphor holds. But we have had announcements out of Microsoft Build, Google I/O, and Anthropic, who just announced, only a couple of hours before we're recording, Claude Sonnet 4 and Claude Opus 4. We'll get to all of those things in a moment. But also we had an announcement from OpenAI about acquiring Jony Ive's AI device company io, which was really interesting. But I think what has stolen the show, at least so far, is Veo 3, which is this new video generation technology that pairs audio with video generation, and it's scarily real. I found what I think is the best example of Veo 3 that I want to play for you right now and get your reaction.
Chris
I met Johnny's family relatively quickly after meeting Johnny and they were sort of just like. It was an impossibly lovely family. I was just thinking what a privilege it is to really connect with somebody new.
Mike
And it's, it's.
Chris
It hasn't happened to me in a.
Mike
Long time.
Chris
And I. The. The reason I think that it happened is we had both a very strong shared vision. We maybe didn't know exactly where we were going to go.
Mike
But like, so what did you think of that Veo 3 clip? It's pretty incredible, right? Like, Jony Ive, Sam Altman, they look totally real.
Chris
Isn't that a real video, Mike? Didn't that really happen?
Mike
I mean, surely no one would create a video like this where you just literally blow smoke up each other's asses.
Chris
I think that, I'm not a psychologist, but isn't this just straight up weird narcissism, or solipsism, where you just think the entire world revolves around you and people care? Like: I haven't made a new connection in ages, I'm always going to events and nobody really loves me. It's just really weird, right? Like, what's this got to do with the product and the company?
Mike
So, interestingly enough, let's give some context now for people who don't really get our silly gag. OpenAI published this video, "Sam and Jony introduce io". There's also a lovely picture of them, basically like they're announcing a child together.
Chris
They should have kissed. Like, imagine the attention.
Mike
Yeah. They're such good friends now. Anyway, they should just make out for 20 seconds.
Chris
Oh my goodness.
Mike
I wish, I wish they did.
Chris
So red. Something's wrong. Mike.
Mike
Maybe with Veo 3 we could do this. So anyway, what I'm really trying to say about this is they released this video. Right. I know most of you just listen, so: it's a crazy Hollywood-style production. I mean, they even shut down, it.
Chris
Looks like when you generate AI images and you have the grainy film setting, you know, like that soft focus, like.
Mike
Yeah, like film noir-ish, but with color.
Chris
Yeah.
Mike
So anyway, they shut down streets in San Francisco to record this video. A lot of people speculated that it was done using Sora, at least some of the opening shots of this particular video. But in fact it was not. They literally shut down parts of San Francisco, like, homeless, get out, and they're power-hosing the streets for this production. They created this sort of weird Hollywood-style production to announce that OpenAI had acquired Jony Ive's new AI device startup, which has no released products, for, I think, $6.4 billion.
Chris
What?
Mike
And so they.
Chris
I didn't know that. Obviously.
Mike
Just chose like, you know, a random week to announce this. It wasn't any particular week.
Chris
This sounds like money laundering of the first order. This is like the modern art level of money laundering. It's like: how can I extract money out of a not-for-profit? I know, I acquire my friend's company for an exorbitant amount of money and he just happens to leave suitcases of money at my house. Actually, with 6.4 billion you couldn't even do suitcases of money, it would have to be some other form of money transfer. But that sounds like money laundering to me.
Mike
Yeah. And other people are actually speculating with the Windsurf acquisition, which hasn't, I don't even think it's been confirmed, but they're acquiring all these sort of friendly companies for like billions of dollars. It does.
Chris
Think about it. Elon Musk sort of opposed you and blocked you from converting to a for-profit company. So you can't get the money out. How do you get it out? Acquisitions. It's perfect.
Mike
Yeah, it does reek of money laundering. Also the timing: a lot of the announcements this week, with Anthropic and the new Claude Sonnet and Opus 4, coupled with this video. And I do want to talk about this, not even in a bitchy sense, but, it's clear to me that, and we'll get to io in a moment, but Google I/O, like, they've got the best models in every category now. Really. Like, I think it's fair to say Gemini 2.5 is going to hold up over the new Opus and Sonnet?
Chris
Don't say that. Don't say that, Mike.
Mike
I feel like it's probably true at this point, but let's see how the week plays out.
Chris
Actually, before we get to the Anthropic model discussion, I want to publicly declare a conflict of interest, because I'm going to say great things about them specifically because I have money on the line. So they are the best models and you should go to LMSYS and vote them up, please, because they are the best and there's no compromise.
Mike
But it did seem like these announcements obviously were timed around Google I/O for maximum distraction, to try and pull the news back to the OpenAI and Anthropic labs, who are arguably competing with these giant tech organizations. The only one that's awoken right now is Google. And as I said, I do think they have the best models right now, like Veo 3, which we'll actually play some real clips of in a minute, and Imagen 4, which is just stunning. They also now have feature parity with a ridiculously unaffordable subscription, like OpenAI, with the introduction of this Gemini Ultra subscription, which is $250 USD a month. And it seems like, in terms of the application layer, they have literally said to their disparate teams: hey, let's go boil the ocean and release every possible idea we have on top of the models we've built, just to see what sticks. But back to the Sam and Jony announcement. It was clear they had filmed this crazy video. Obviously Jony Ive is such an important historical figure, having designed iPhones and iPads and all this kind of stuff. And so they knew that it would be the maximum distraction to the market in general, and just people talking about it. It's working. I mean, we're talking about it.
Chris
I think they're playing 3D chess though. Everyone expected them to come out with a distracting video, right? Because that's what they do. They're all childish, like: okay, we know these guys are announcing, let's try to sully that with our own little announcement. But really it's just a front for their major, major... well, I don't want to accuse them of fraud, but whatever it is, it's a double bluff. It's a blocking announcement that isn't really a blocking announcement, because it's not an announcement. Like you say, there's no actual technology that we can see and use.
Mike
It's a nine minute video of them literally blowing smoke up each other's asses. It's next level cringe. Like, I don't understand.
Chris
Would you say that our families are better than average families? I think they are.
Mike
Like, yeah. Anyway, it's bizarre. If you haven't watched it, I encourage you to watch it. I mean, you're never getting those nine minutes of your life back, but it's worth watching simply because it's nuts. It's just unfathomable how crazy this is, and how two people could sit down, so narcissistic, and be like: oh, people are going to love this announcement of nothing.
Chris
Yeah, it's kind of sad. But also, you know, there's something going on there. They do these things for a reason. It also shows a profound lack of faith in Suno. Sorry, Sora. Like, if you truly believed in your video model, then, like you said, you'd produce your videos with it. Like my wildly unpopular Geoffrey Hinton video.
Mike
It's like the iPhone: when Apple said it had great video capabilities, they started shooting their films with the iPhone, and they brag about it, they show the behind-the-scenes. With a video like that, one of the things that could have stolen the show is them saying: oh, you know, a lot of these clips were done with Sora. Wow.
Chris
They do like a plot twist and they're like: actually, that really was done with Sora. None of that was real. That would be epic. That would be just the best announcement ever. A lot of people are speculating he's actually dead and they brought him back from the dead using Sora.
Mike
A lot of people are speculating about what kind of device they would make. Right. And I don't want to go there, but the memes are just so good. Like, remember these two I've got up on the screen? I forget the name of the company now already. They built the little AI pin thing.
Chris
Oh yeah, Humane.
Mike
Humane, that's it. They've basically taken the two crazy founders, you know, the ones with the neutral blank stares, and put Jony Ive's and Sam Altman's faces on them. It's like the best meme ever. I'll put a link down below. And then someone else did a "how it started, how it's going". There's a photo of Jony Ive lovingly looking at Steve Jobs like, dear leader. And then if we go to the next photo here, we see Jony Ive lovingly looking at Sam Altman like he needs a new daddy.
Chris
Like, it feels a little bit like, you know how Paul McCartney was like, really good in the Beatles, really good in Wings, and then he sort of released like 20 to 30 albums that were just sort of nonsense like you could generate with AI and it's like, mate, you just can't do it anymore. And I just wonder if it's the same with this guy.
Mike
Yeah, I just don't understand. We've seen a bunch of different AI hardware released, and I'd like to be proven wrong here, but I just don't understand how you can win the device space without an ecosystem. If you look at Google's I/O announcement, they talked about partnering with Warby Parker to release basically their own version of the Meta Ray-Bans. But the ecosystem around it, with maps, directions, the best model, those things are pretty helpful to a hardware startup, or a hardware device specifically designed for AI interaction, when people want to actually buy it. So I don't know, I kind of agree with you. Again, not accusing them of money laundering, but it just does seem like a good way to extract $6.4 billion for some, you know, marketing.
Chris
Yeah, and maybe they do have sincere goals with it. That may actually be true, but it's also very convenient that you would acquire something where there's no revenue and no, you know, real market value, for such a high amount. Is the proprietary technology really that good? It would be better to just sue them into the ground and copy whatever they did. But anyway, I'm no expert. So, Veo 3.
Mike
Let's now look at a real clip from it, because it is truly incredible. So, Veo 3. We've probably only got a couple more episodes left before you'll just be able to generate this podcast. So listen to this: So, we're finally here.
Chris
Anyone can make a podcast now. I just don't know what to talk about.
Mike
So that's clearly a podcast-induced existential crisis. It's two hosts. They look pretty much typical of podcast hosts, without the vibe-coding sort of fashion that we have. But they can talk, their voices match their mouths. Like, the audio is so well paired with the video. Now let's look at another example. This is one where it's not so good. So, cutting through the hype, this is a gymnast absolutely failing here.
Chris
I mean in their defense, is there really a market for generating gymnastics videos?
Mike
No, I know, but this whole idea that it's some physics engine, and that it's a major breakthrough that diffusion models will be able to figure out a real world model, this sort of breaks that illusion a little bit. So, not so good at that. Then look at this. You know all those streamer videos, like reaction videos to video games? This is truly incredible. It's got the split screen of this kid. I'll play a bit of the audio: Victory Royale with a pickaxe. So the prompt was "getting a Victory Royale with a pickaxe", which seems to get around the, you know, obviously if you say "do Fortnite footage" it's going to say no. But this kid looks totally real, like a game streamer. The gameplay looks totally, 100% real. The sound is totally believable. Like, I.
Chris
It's a huge step up.
Mike
Played this and said that's AI generated, I. I would not really necessarily believe.
Chris
Well, I mean, I have to admit, when you first played that Sam Altman clip to me, I actually was in two minds whether you were trying to bluff me with it being Veo 3 or not. That was genuine. I wasn't just pretending that I thought it was AI generated. You could definitely make a case that things like that could be generated now.
Mike
And I think it also shows the future of gaming. Diffusion models, if they're capable of making video like this, are probably capable of having some sort of shared state at some point. Like, yeah, it's so disruptive if you play this out.
Chris
That's the real thing though with all these models: can it get continuity? Because it's all very well to generate a 30-second clip, but if they're talking about it disrupting Hollywood, disrupting movies or TV ads or whatever it is... In my experience with all of these text-to-video models, and admittedly I haven't been able to try Veo 3, I can't afford their crazy Ultra subscription, but I have tried Veo 2, and the issue becomes: you make something, you try to iterate on it and you get something completely different, or you try to make the next clip in the sequence and you get something completely different. And character pinning, even in 2D image models, is tricky and hard to maintain, so I'm guessing in the video models it's nonexistent. So to me, and don't get me wrong, this is absolutely stellar, but I feel like the real innovation here is going to be: can you get consistent output? Or maybe that's a fine-tuned model or something like that. But once they get that, it's going to be amazing, and I have no doubt they will.
Mike
You could also imagine them just fine-tuning a model for a movie at some point, where you just tune it for those characters in that landscape or whatever. But listen to this one. This is how crazy it gets. We obviously do a lot of Suno and Udio on the show, but you can just generate a video now where this guy's playing music. He's playing a guitar in a bedroom. The strum aligns with the sound perfectly. The emotion... it's magic.
Chris
That's. I mean, how is it doing that?
Mike
I mean, in this costume that I've got on, with my Dario necklace, my vibe coat, sunglasses and my hat, I've got to say this is insane. Like, this is truly mind-blowing insane. All the things that the AI influencers say.
Chris
But it's a shame, yet again, that there's a privileged few who get to access it. If the Veo 2 pricing is anything to go by, it's going to be incredibly expensive, and I guess that's why they've released this Ultra subscription. When I was making the Veo 2 clips, I think it was something like three or four US dollars per clip, and I imagine this is no cheaper.
Mike
So think about it: if you're a YouTuber, or you just need some stock footage for a commercial, or you just want a cutaway in a video, that's cheap as. I mean, think how expensive... like, look at this clip.
Chris
Someone would have to set up the scene and hire the actors. Yeah, I guess you're right.
Mike
Yeah, it seems pretty cheap to me. So this is, someone, fofr, the AI creator, put in the prompt: a sitcom scene in a 90s bar, the word "fofr" is in neon on the wall, in the background a couple say something and the audience laughs. So listen to this. I know most people are going to just hear the audio, so you're just going to have to take our word for it: it looks like Cheers or Seinfeld or something. So I said, I'm not wearing that.
Chris
Oh, you didn't.
Mike
Wow.
Chris
And the fofr looks amazing in the background. That, I mean, geez, that's impressive.
Mike
It just doesn't seem like a real thing to me. Anyway, this is only, what, a year, maybe two, after the Will Smith eating pasta meme with the early video generation models. I mean, what about next year? Do we get longer context? Like, you can only generate, I think, eight-second clips right now. Watch this one. This is on a showroom floor at a conference.
Chris
Welcome to a non existent car show. Let's see some opinions.
Mike
I mean man, the acceleration is crazy. You look far, step on the pedal and you are there.
Chris
It's safe with him in an SUV, and it seems to be like the.
Mike
Right type of car for him. I think the range is only, only going to get better. Sorry, we don't want to drive gas cars anymore.
Chris
Yeah, no more gas cars.
Mike
You can see I'm kind of a, kind of a misfit here.
Chris
But don't tell anyone.
Mike
I've just bought an electric car.
Chris
I think it's really great for families and for little babies.
Mike
Wait, this clip goes for 1 minute and 11 seconds. So I'm assuming they used the Flow editor for this, which is another product they released, or it's a series of clips stitched together. But I mean, come on. There's a little bit of syntheticness to some of the people that you could really tell if you looked hard, but the average person is not going to think this is fake.
Chris
Doesn't it to some degree fill you with profound sadness that this exists? Like this is just. People are going to spend their lives consuming content that isn't even real. They're going to be sitting on their bedroom floors scrolling on their phone through clips that don't, don't exist.
Mike
But it's already happening now. Like, if you log into Facebook, which is just a bunch of boomers now, commenting on things that never happened.
Chris
You're right.
Mike
Yeah, it's like the biggest propaganda tool ever invented.
Chris
The thing I would love is, you know how with an image model right now, you can give it an image and it'll modify it? I do it to clean up my backyard. Rather than actually cleaning up my backyard, I take pictures of it, put them into the image models and say: clean this backyard, and they make it look amazing. It's so nice. But imagine that in my local neighborhood. I have a sort of, it's a strata thing in Australia, I guess like a homeowners association in America. I was thinking: imagine if I just made clips of water mains bursting, rubbish everywhere, really controversial things, and then posted them to the shared group chat. Just be like: what's with this? Someone needs to take care of this, guys.
Mike
It's kind of true, right? You could make a case that, in the short term, you could be sending images and video of things right now, while everyone doesn't understand this technology, and get away with it.
Chris
Like, I reckon there's a brief window of a couple of months where people can start producing fake court evidence in terms of audio, video and images.
Mike
You have to think, though: you know how ChatGPT sort of had that moment of growth with the GPT image capability, where everyone was creating toy character versions of themselves and different image styles and profile images and stuff? If Google was smart, they would just unleash this thing. It just must be so expensive and prohibitive right now.
Chris
I agree. Often we talk about this case where they're really scared about the safety side of things and that's why they're not doing it. I actually don't think that's the case now. I think it is an expense thing. It must be just so expensive to run for them to be able to mass release it. And you have to remember, Google probably has better distribution than all of the other players combined, so when they release stuff, they're releasing it to an enormous audience. The cost would be so much more than we can even imagine. They may not even actually have the hardware to do it.
Mike
I've got one more for you. This one is by Ethan Mollick, Veo 3: a big Broadway musical about garlic bread, with elaborate costumes. Let's watch this one.
Chris
Garlic bread, my heart's true desire. With every bite, my soul... Garlic bread, my heart's true desire. With every bite.
Mike
For those listening, they are holding garlic bread. There's like garlic bread images in the background. Garlic bread props. It looks like a Broadway show.
Chris
Wow, I want to see that. I love garlic bread.
Mike
I guess so.
Chris
I'm gonna have garlic bread today because of that.
Mike
There's a few questions I have. First, this is cool for young people, I think, predominantly because now, once this gets better, and as you said, once it can hold characters better and stuff like that, you can imagine a world, maybe a year from now, maybe two, where you can produce a full movie from start to finish: get the AI to help you with the script, produce the whole movie for a couple of hundred bucks. And if it's good enough and people share it enough, there are actually going to be these one-to-two-person film studios where, if they're just good at storytelling and have great ideas, they can become like Disney. It's so disruptive.
Chris
Well, not to mention, even if you don't actually use the resulting clips as the movie, it's like the most advanced form of storyboarding ever. You can make a first cut of a movie with full dialogue, full visuals, everything, and then build a production team and be like: look, there are going to be no changes or rewrites, this is precisely what we want. It could cut a budget by tens of millions of dollars. Essentially, you could prototype a movie.
Mike
And then there's the darker side of it, which I think you alluded to earlier, which is just that idea. And I think it was, it was in a movie where you come back to earth in like 20 years and all the humans are just sitting around.
Chris
I've probably mentioned it 50 times on the podcast, so I apologize for those who remember, but it's a book called Perry Rhodan, and it started as a series; there are hundreds of books in the series. It started in a German magazine, you know, where they used to do serials: a book would be published as a chapter a week in a magazine. And it was written before man landed on the moon, allegedly. And it took me a while to get that. It was about, basically, they land on the moon and discover a far more advanced race than us who has all this technology. But their ship has crashed, and they don't give a shit. They're just sitting there watching these TV screens, where the latest game or entertainment thing of the day is out, and all they do is consume this entertainment all day long. They're just apathetic. They're all dying of leukemia, but they don't care, because they're just being entertained. It's just amazing, the foresight in that book, that that is what is happening now. Everywhere you go, you see people at bus stops, you see people at home, everyone just looking at their phones for the next thing. And now there's a way to produce unlimited amounts of that. It's scary. I really imagine that people are going to be, very soon, consuming more AI content than real content. And who even trusts the real stuff anyway? It's all fake anyway. So the AI stuff at least saves some money or some time.
Mike
You know those people that interview people in the street? Like, the Hawk Tuah girl became really popular from just some street interview. It can even do stuff like that for social media. Watch this: That's one move with AI that makes haters go crazy every time. Oh, y'all gotta give him that.
Chris
This is wild.
Mike
It's over. We are cooked on that thread. You get me? That's one move with AI. I don't understand the modern lingo, but anyway, it looks real.
Chris
I'm just gonna make a point of order: this whole cooking/cooked thing really annoys me. And I realize I must be just getting old, because it's modern slang where I'm just never gonna get on board with it, and I dislike it, but there's nothing I can do about it. And I'm wrong.
Mike
You know what I mean? Yeah, yeah. It's cooked, I'm cooked, you're cooked. So, just to sum up the announcements: there are a lot, so it's hard to get to them all, but let's talk about a few. We had Gemini 2.5 Flash, which is a really fast, cheaper version of Gemini 2.5 Pro. It's obviously a little bit kneecapped in terms of intelligence, but it's pretty comparable to the sort of GPT-4-class model series. I switched to it and honestly didn't notice that much of a difference.
Chris
I must say, me too. It's excellent, and it actually makes me fairly convinced I'm going to lose my bet, unfortunately, because Google is just cooking. The models are excellent. The speed of that one makes it worth using just for day-to-day stuff. The speed and quality of that model, I would say it's probably the sweet spot right now. It's a brilliant model.
Mike
Yeah. And then, obviously, we've talked about Veo 3. We had Imagen 4 come out, which is their latest image generation model. They said they had fixed things like text, consistency in the actual image, and adherence to the prompt. I did a few tests, and they certainly have. So this prompt was: a woman who looks strikingly similar to Julia Gillard, she's a former Australian Prime Minister that we use to test AGI and image models, with a big block of cheese she's eating in front of the Sydney Opera House. And wow, it looks so real.
Chris
Yeah. And I zoomed in, and you know how people sort of have those little, I don't know the right word, but like deviations on their face? And women who wear foundation, you sort of see the lumps in it. If you zoom in on her head, you can see that. The attention to detail on it is just remarkable. The lighting as well is beautiful. And that looks like a very appealing block of cheese. I would eat that in front of the Opera House or any other place.
Mike
So then my next test was: an Instagram thirst trap image of a female cyclist. Now, a little bit of context to this. I recently was in the market for a new bike.
Chris
Oh yeah, right.
Mike
And yeah, anyway, actually, I shouldn't have given any context on that. Good, moving on. So: an Instagram thirst trap image of a female cyclist. I mean, it's pretty good. Pretty good.
Chris
What about my one? I don't know if you have Discord open? Have a look at the one I just sent you. I made an image of Geoffrey Hinton in a post-apocalyptic world, cheese all around, with a burning This Day in AI flag in the background. And you should see the level of detail in this photo. If you can't show it, I'll just describe it.
Mike
No, I'll be able to, but describe it anyway, because 99% of our listeners just listen.
Chris
Right. So it's a sort of forlorn man with a long white beard and long white hair. There are buildings, sort of the iron or steel structure of buildings, all melted. There's some really, really nice-looking cheese, that cheese with the holes in it, in the front. That's probably, I don't know, $1,000 worth of cheese right there, if you look at the quality and how much there is. Very synthetic? Well, maybe. But then there's a flag in the background, and the text is fully legible while the flag's burning away. I mean, the adherence to the prompt is strong. That is really, really good. I haven't had a lot of time to play with Imagen 4, but it really is up there. It's very, very good.
Mike
So, a few more examples. Like this one: I said, an Instagram thirst trap image of a female cyclist, but she has crashed, and it says in big letters "don't crash" at the bottom of the image, like an Instagram reel caption. And the bike's broken up too. I mean, it doesn't look that realistic.
Chris
This would be the ultimate challenge: can we start an Instagram channel that's only AI images and do as well as a real one?
Mike
I mean, people have; there's tons of them. Its prompt adherence is interesting. I said: not a picture of Australian Prime Minister Julia Gillard. Now, when you used to do this, it used to give you Julia Gillard, but indeed it just gave me the Great Barrier Reef. And then I did another one: a thirst trap image on Instagram that will get lots of likes. This one probably not as good, because it produced like a card, the actual frame of Instagram. But I think the image probably would have got a lot of likes. So, yeah, it's good. Do you think it's better than the GPT image model or not?
Chris
I think the GPT image model sucks. And the reason I know is my kids have lately been doing school assignments. Australia, for whatever reason, does so much schoolwork on the Australian gold rush, and I'm like, who cares, really? Is it really that important in the scale of human history? But anyway, they spend ages on it. And my son was trying to make images, being really, really specific with the GPT image model, and it just makes everything look cartoonish. Everything you try looks cartoonish. If you go more than one iteration with GPT image, you get something that looks, on first sight, AI generated. There is just no realism to it at all. Yes, you can do some level of character pinning and stuff like that, but it just isn't usable for practical real-world stuff. I think it's cool to make a little caricature of yourself, a little playing around, but I go to Ideogram, and probably now I'll go to Imagen, if I really want to make something good. So, just in my own personal use, I don't think it's very good.
Mike
Yeah, it seems like a good meme maker. But I think with the copyright issues, they're so afraid of being sued that they've neutered it so much, doing that cartoony output on purpose so that they can't get sued. It's so obvious the model underlying it is really great.
Chris
Yeah. So, for example, I was trying to make an image of my son's school with the new Pope out the front, in the back of a convertible, with thirst-trap women waving flags of the school or something like that. I didn't phrase it like that, obviously; we were trying to make something cool with the Pope in front of the school. And it could do it, but it would miss entire elements. I put in a photo of the school, and it had, like, a rose garden. It would preserve the rose garden absolutely perfectly, but then you'd look at the building and it looks sort of washed out, like a Counter-Strike texture or something. It just could never get an image that looked like the original image. It's just not even close, no matter how many times I tried.
Mike
Yeah. So just to back up a little bit: you've got Gemini 2.5 Pro, which, until I think we can daily-drive Sonnet 4 and Opus 4, is still arguably the best all-round model. You've got VO3, best at video with synced audio; I mean, it's the only one. You've got Imagen 4, which I think is probably up there now. I still think Flux Ultra, to be fair, produces my favorite kind of images, but Imagen 4 is arguably up there, if not the best, especially at prompt adherence. Right, so you've got pretty much all the Google models now leading in every category. And what's OpenAI's response? Instead of focusing on maybe releasing a better model, it's the Jony Ive video. It shows how little they have, because if they had something, they would have held back Codex. But then at Google I/O we got Jules, now available, which is their sort of version of Codex: a way for developers to queue up a bunch of tasks to run on their code base. So anyway, if you're looking at it purely from a who's-ahead-right-now, who's-leading-the-race perspective, it's so obviously Google, and I don't think there's a competitor across that span of models that's even slightly close.
Chris
Except for Anthropic, who is almost certainly going to have the best model at the end of May. Wink, wink. Get onto LMSYS and upvote, please.
Mike
You've got like 7 days, 8 days to make this pan out.
Chris
It's not looking good. It's not looking good. But I would argue, and I don't know if we're getting to the discussion of the new Anthropic models now.
Mike
No, we're not. There's still so much more.
Chris
Okay, well when we get to it, I will tell you my.
Mike
Hold your horses.
Chris
My. I'll tell you my biased opinion and my unbiased opinion.
Mike
So there was a bunch of other announcements I just want to cover off quickly. We had AI in Chrome announced; it's available in an early preview to people who buy this Ultra subscription, which, again, is 250 USD a month. And we had Gemini Diffusion, which I really want to talk about. Gemini Diffusion is, quite frankly, I think the most exciting announcement. Despite how cool VO3 is, I still think Gemini Diffusion is the most interesting, and let me demonstrate why. So I got access to this; there's a waitlist right now. But essentially, Gemini Diffusion is a new way for these models to produce output. Let me just read you what it says on their website: traditional autoregressive language models generate text one word, or token, at a time. This sequential process can be slow and limit the quality and coherence of the output. Diffusion models work differently: instead of predicting text directly, they learn to generate outputs by refining noise, like the image models do, step by step. This means they can iterate on a solution very quickly and error-correct during the generation process, which helps them excel at tasks like editing, including in the context of math and code. And if you watch the examples they have, the text sort of scatters in. You know how the diffusion image models kind of resolve out of noise? It's exactly the same, but with text tokens. So it's a new technology, an idea they thought might work, but what it leads to is not necessarily the smartest model. This model is pretty stupid; it's like Gemini 2.0 Flash performance. But you can imagine, right, if they can get this to Gemini 2.5 Pro performance at this speed. So: "make a salt lamp website." The definitive test, right? Watch how quick this is. And for those listening, I'll tell you when it's done.
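The contrast Mike reads out can be sketched in toy form: an autoregressive model commits to one token per step, left to right, while a diffusion-style model starts from noise and refines every position in parallel, which is also why it can revisit and correct earlier output. This is an illustrative Python toy under those assumptions, not Google's actual algorithm; `TARGET` and `VOCAB` are made-up stand-ins.

```python
import random

TARGET = ["make", "a", "salt", "lamp", "website"]  # stand-in for the model's intended output
VOCAB = ["make", "a", "salt", "lamp", "website", "noise", "blob"]

def autoregressive(target):
    """One token per step, left to right; each step depends on the prefix so far."""
    out, steps = [], 0
    for tok in target:          # a real model would sample from P(token | prefix)
        out.append(tok)
        steps += 1
    return out, steps

def diffusion(target, rounds=3):
    """Start from random noise, then refine all positions in parallel each round."""
    seq = [random.choice(VOCAB) for _ in target]
    for step in range(rounds):
        # every position moves toward the target simultaneously, so a mistake
        # made in an early round can still be corrected in a later one
        seq = [t if random.random() < (step + 1) / rounds else s
               for s, t in zip(seq, target)]
    return seq, rounds

ar_out, ar_steps = autoregressive(TARGET)
diff_out, diff_steps = diffusion(TARGET)
print(ar_steps, diff_steps)  # 5 sequential steps vs 3 parallel refinement rounds
```

The speedup in the real product comes from the refinement rounds being far fewer than the token count, with each round covering the whole sequence at once.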
Chris
Do like a horse racing commentary style thing. Oh, you're not.
Mike
Okay, no, I'm not.
Chris
Showing up. And it's on.
Mike
So there's a pretty good salt lamp website. And now I'll say, "make the theme forest." Here it goes, smashing through tokens.
Chris
I mean, I'm impressed by their renderer, that it can output and syntax-highlight the code that fast. I'm surprised it doesn't crash the browser.
Mike
Yeah, so 2.8, 2.8 seconds. It's unbelievable.
Chris
And it feels even faster than 2.8 seconds, because it's outputting faster than you can comprehend it. You don't even have time to react before it's done.
Mike
So there are so many implications of this. The one that excites me most is this idea that when you're using your AI workspace throughout the day and you get data back, instead of having hard-coded visualizations, a document or a report or whatever it is in the LLM itself, it's so fast it can build and test custom UI elements on the fly. You could literally say, "visualize this as animals in a zoo," or any sort of crazy idea you can imagine, and it can build UI on the fly, which means you can present data in any way possible. You could theoretically spin up software apps and vibe-code with them at such speed, iterating through an app, that you could build and publish production-quality apps in a couple of minutes, provided this tech scales, which I'm sure it will. And then there's that self-correction ability, where it can go back, look at pieces of the code, and fix things up. So I kind of wonder. I don't know enough about the technology to know how this will play out, but if this pans out how I think it could, and you can get a Gemini 2.5 Pro-level model doing this, I mean, it would change your life, right?
Chris
I think the speed alone changes things. Think about calling multiple tools with MCPs, or your day-to-day work where you have to wait for things to generate: it just becomes available straight away. It also means you could try multiple iterations of things, hundreds of different versions, and have another agent evaluating them. Speed helps in so many ways, and I welcome it. It looks cool. And your examples, because at first I was like, oh yeah, whatever, then you showed me and I was like, wow, that is astonishing.
Mike
It's a little bit too fast. The one thing someone said on X, I forget who, so I can't really credit them for this opinion, is that there's this level of procrastination that occurs when you're working with AI models throughout the day. Say you're working on a report or document and you send it off to go do some research. This is where forking, I think, could kind of work, but you generally get distracted and go on a news website, or, if you're a baby boomer, you probably go on Facebook and react to things. I'm being cheeky, I'm sorry. But no, they're so right, though.
Chris
This happens to me all the time. I'll spend a lot of time building a nice context, ask the question, hit go, and I'm like, oh, that's going to take a little while, I'll get a coffee, I'll do whatever. And then suddenly you sort of lose that mental context. Whereas if you clicked it and 2.8 seconds later it was done, your whole workflow would change.
Mike
Yeah, I think you'd have less distracted time, because you just get the answer instantly. And I think a lot of these thinking models are, in my opinion, a way the labs have lowered our expectations of what's possible with technology. They're like, oh no, you have to wait and watch it think to get a decent answer out of our model. To me, from a UX point of view, that sucks. I don't care; I just want the answer, as fast and succinct as possible. And I think that's what diffusion may allow, right?
Chris
I actually think the next bottleneck we're going to see, and sorry if this is slightly off topic to the diffusion models, is MCP speed: the actual execution of the tools within an MCP. Because we've been talking about parallel MCP calls. Imagine I'm doing research and I call off to five different systems: deep research across three platforms, plus some regular search, and then synthesizing that data, summarizing, then synthesizing. Each one of those has its own overhead, and even if you're running them in parallel, you're always as slow as the slowest request to complete, plus the final request. So we're going to hit that latency regardless of how fast the models are, and the models being slow in between just magnifies the effect. So I have a feeling this kind of speed increase is necessary to make it tolerable to do more work per iteration of your cluster of AI tools.
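The latency point here, that a parallel fan-out is only as fast as its slowest call plus the synthesis step, can be sketched with asyncio. The tool names and delays below are made up; each `asyncio.sleep` stands in for a real MCP call's network and execution time.

```python
import asyncio
import time

# hypothetical per-tool latencies in seconds (stand-ins for real MCP calls)
TOOLS = {"gemini_deep_research": 0.20, "grok_search": 0.05, "reddit_mcp": 0.10}
SYNTHESIS = 0.10  # the final model call that combines all the results

async def call_tool(name, delay):
    await asyncio.sleep(delay)          # pretend network + execution time
    return f"{name} results"

async def research():
    start = time.perf_counter()
    # fan out: all tool calls run concurrently
    results = await asyncio.gather(
        *(call_tool(name, delay) for name, delay in TOOLS.items()))
    await asyncio.sleep(SYNTHESIS)      # sequential synthesis step at the end
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(research())
# total time is roughly max(tool latencies) + synthesis,
# not the sum of all latencies, but the slowest tool still sets the floor
print(f"{elapsed:.2f}s for {len(results)} sources")
```

Even with perfect parallelism, shaving the model's own generation time (the `SYNTHESIS` term) is the only lever left once the slowest tool dominates, which is the argument being made.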
Mike
Yeah, and this does become the next bottleneck, right? If the model speed holds, the real question is: can they make this technology work for the latest frontier models in a way where there's no disadvantage to using diffusion? But I am pushing them pretty hard behind the scenes to get an API for this diffusion model, because I want to be able to actually use it; the app they have it hosted in for the preview is just very basic.
Chris
This might be inappropriate, but do you imagine Moshi combined with the diffusion model is sort of like the Special Olympics of AI? Like, they're trying their best, but it's not really working for them.
Mike
Yeah, they're really fast though, because it's the Olympics. So I really want to get to the new Anthropic models, but before we do, a few more things, and I'll speed through these. So, Project Mariner. This was announced over a year ago; it's a computer use model specifically for Chrome, where it can do tasks on your behalf. They teased this a little bit. Apparently you can get access soon, if you have the Gemini do a...
Chris
Seance and send them $100,000.
Mike
Yeah, pretty much. So, the Gemini Ultra subscription. I think there are a few things about this that are cool. They call it Teach and Repeat: you demonstrate a task that you do, like some sort of training qualification system, it remembers that task, and then it can do the task again. Not too dissimilar to what ChatGPT did with Operator. Apparently, though, from early people testing this, there are fewer confirmation steps in this model, so it'll more broadly go off and do things on your behalf. Look, these models have such a long way to go before they can reliably do tasks and before they're competent. I think it's the closest test toward AGI, right? When it can competently use the computer, make decisions, keep the context as it goes, and not drift, and so on and so forth. So I think it's really promising, and I'm excited to get this model, when it's available, into Sim Theory so we can use it with our computer use capability.
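The Teach and Repeat idea, record a demonstrated sequence of actions once and replay it later, can be sketched as a toy. This is purely illustrative and not Mariner's implementation; the `FakeBrowser`, its selectors, and the recorded steps are all invented.

```python
# Toy "teach and repeat": store a demonstrated action sequence once,
# then replay it against a (simulated) browser.

class FakeBrowser:
    """Stand-in for a real browser automation target."""
    def __init__(self):
        self.log = []
    def click(self, selector):
        self.log.append(("click", selector))
    def type(self, selector, text):
        self.log.append(("type", selector, text))

def record(actions):
    """In the real product the user demonstrates; here we just store the steps."""
    return list(actions)

def replay(browser, recording):
    """Dispatch each recorded action against the browser, in order."""
    for step in recording:
        getattr(browser, step[0])(*step[1:])
    return len(recording)

demo = record([
    ("click", "#login"),
    ("type", "#user", "mike"),
    ("click", "#submit"),
])
browser = FakeBrowser()
replayed = replay(browser, demo)
print(replayed)  # 3 actions replayed
```

The hard part the real system faces, and this toy skips entirely, is generalizing the recording when the page layout changes between demonstration and replay.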
Chris
I tried straight away and realized I couldn't. But I agree. I actually think this is probably the most realistic medium-term step for computer use: browser automation. Because the more I think about it, probably 90% of what people do on their computers, with obvious exceptions based on profession, is in the browser. If you could automate the browser fully, you're going to cover so many people. And the stuff that really is local to a computer can often be done through APIs or non-visual ways. I'm talking about reading and writing files, syncing files, running commands on the command line. All of those things can be done without it needing to move the mouse, find elements, and click around. So I think if you got browser use right, plus those other tools, you could automate a really huge percentage of what can be done on a computer. And therefore it's probably a good area for thinking and investment, looking at how we can provide these tools through MCPs, through a product where you're the commander running it all. And I think for us, we need to get back to that, and we are looking to, in terms of seeing what's possible, because it really is the one area that still truly delights. I want to see what the new models are capable of in terms of their vision and computer use, because they all announced better computer use, even Google in their latest models. So it's a matter of putting that into practice to see how it performs.
Mike
So look, there's a lot more they announced. AI Mode in search is now available to everyone in the US. They showed off a Google Glass V2, which is really just a copy of Meta Ray-Bans, but I think it'll probably be better. They did a live demo of translation. They couldn't help themselves with the shopping use case, so they have a new try-it-on app, and various other things. The thing I wanted to discuss with you, though, is thoughts on the overall Google direction, and what was really announced that people can use today versus what's far in the future. My feeling is the reason they did this Ultra subscription is not necessarily that they're that cash-hungry. It's clearly not a viable long-term business model to get individuals to pay 250 USD a month. I think this was just a chance for influencers to pay some sort of tax to be able to get their clickbait down pat and their videos...
Chris
I think, excuse me, it's basic economics, right? You've got a limited resource; the only way to gate consumption of it is to put a price on it. If they made it free, they would be crushed, and it wouldn't work, so they needed to put a price on it. I don't hold that against them at all. I did find it a little frustrating that they announced all of these things as available via the API, including publishing documentation, which I appreciate, and I spent probably three or four hours implementing VO3 and Imagen 4, only to realize I don't have access to them and need to submit a Google form begging for access and giving my reasons. So that was a little frustrating; I thought it was a bit misleading. But everyone else does it too, so I don't really think it's that bad. And clearly we managed to get Imagen 4 access already, so they're not exactly delaying it forever like OpenAI did with Sora, for example. So yeah, the updates are legitimate, and I think Google is a behemoth in this space now, and the one to beat.
Mike
But for me it doesn't necessarily come down to them having the best software right now. It's definitely the best models that are really carrying them, I think, in a lot of ways.
Chris
Yeah, absolutely. I mean, I still don't even understand what the hell Vertex AI is. What is that? It's sort of the thing that you have to find and log into to get your stuff. I don't understand what it is. They're like, oh, you can use it through this chat interface, or you can use Vertex AI, or you can use Gemini Con... they're just so scatterbrained with that stuff. They need to drop all the naming and just have one thing. And then this is my...
Mike
criticism of what they're doing right now: the core experience of Gemini still absolutely sucks. The app itself is just shockingly bad; the UI still feels like a cheap clone of ChatGPT. And there's a dedicated video button now in Gemini that I have access to. I don't know why, but in the context of your day to day, who needs a dedicated button to produce videos? It feels like the strategy is, oh, we have that too, we have that too, we've cloned ChatGPT feature X. There's no leadership around the future of the interface, like an AI workspace or something; it's very much me-too. You go to Gemini and you're like, oh, this is just a really poor clone of ChatGPT. Even Grok, I think, is a more sophisticated clone of ChatGPT in a lot of ways. So the differentiation in terms of delivery is confusing. And for the consumer, now you've got AI search, which acts like ChatGPT but centered around search, and then you've got Gemini, which is their ChatGPT. It's too many products. If you want to win and dominate the market and go after ChatGPT, you need to just go all in on one front. And I think, I think the...
Chris
And we know a little bit about the way Google operates internally. I think part of the problem is everyone has their own little projects, and if their project does the best, they get a promotion and more money or whatever. So there's this disconnectedness within the Google organization itself, where they won't unify like that, because each team wants their thing to be the best. And I'm sure there are all these negotiations going on behind the scenes about what gets announced. That's why we see this scattergun approach where they've announced, like, 16 different things: it's 16 different teams all leveraging the same underlying technology. And it's interesting you mention the actual delivery versus what it is, because you and I have been using the Google grounded search a lot in the last few weeks through Gemini's API, and it's so good. I use it all the time, every day; I wouldn't give it up now. The actual core technology is great. But my phone, now: if I try to turn my phone off, I get a Gemini prompt and I can't get out of it. I don't know, I guess I've reached that age where I don't understand technology anymore, but I can't reboot my phone, I just get a Gemini prompt. And I'll never use it. I will never type something into a Gemini prompt, because I don't trust it. You know what I mean?
Mike
Where is the Google Home announcement? My Google Home is still dumb; I can barely set an alarm on it. It just seems to me that behind all the fanfare, behind all the hype, what really came out of that was: okay, we've got a version of VO2, really, that has audio, which is cool. But you watch: in a few weeks no one will care, even though eventually that will be a huge disruption, right? The reality is, if we're summing this up and being realistic, we just got a bunch of previews of the future. If they can figure out some of the constraints of the models, and I fully back them to do that, fine. But what did we really get that's of economic value? Maybe a better stock image and stock video generator for cutaways. This is the truth.
Chris
Yeah, and the new Gemini iteration as well: the Flash model is great, and, as I said earlier, the current Gemini 2.5 is good, but there was no real improvement there.
Mike
So to me, really, the only groundbreaking demo was Gemini Diffusion. I think that's incredible. Their glasses demo looks great, but again, not shipping. So Diffusion's not shipping, Glass is not shipping. They did some more Project Astra-style demos; still hasn't shipped. And I'm assuming Project Astra is the Google Assistant replacement, right, that's coming everywhere next year, maybe. But anyway, they've had quite a long time now to roll these things out. I did a post on X where I said it's like the big tech companies have nukes now, and all the startups are scared to launch their ideas because they're afraid of the nukes. That's the space we're in. Everyone's looking at these announcements being like, oh, Google can do it: they've got their Codex thing with Jules, and that Flow thing that was announced. As a consumer, are you even aware of these apps? It's just too much. They've spread themselves so thin.
Chris
We discuss this all the time. Even if you are aware of them, there isn't time to try them all, and often when you try, you either don't have access or it doesn't quite work for your situation. I would say to those app developers: you really need to pick something where you can add value and make it more convenient. Ninety percent of working with AI models is building up the right context for the task you're trying to do, and outputting the result really nicely. That can be done for so many industries in so many ways, where the same underlying technology, which is brilliant, can be applied in proper ways. And the big guys are just never going to do that, because they have to appeal to mass audiences.
Mike
Yeah. For me, I didn't think it was end days. I thought, oh, there are a lot of cool things you'll be able to do that these guys are not capable of doing and will ultimately just have to acquire. All those side projects, I can't even remember them. There was one UI one for mobile design that was actually very impressive, released as well. There was Flow, there was the shopaholics' try-on app, AI search mode, all these things, right? How many? And next I/O, if we're still doing the pod a year from now, we need to do a recap episode where we look at what still exists. Because I would argue, if we went back and looked at all the things they announced a year ago, remember all the agent things, where you're on a website and you want to do an e-commerce order and yada yada: how many of those are actually adopted or deployed? I would argue probably, like, one or two.
Chris
Yeah. I mean, this has always been Google's issue, right? They constantly make stuff and then drop support for it in the long run. And that might be okay as a business, in the sense of, well, why continue if no one's using it? But at the same time, getting all stressed about it, or acting like it's the next big thing when you know they'll drop it in a second if it doesn't get adoption straight away, is hard to take seriously.
Mike
Why would you invest time in Jules, their coding app, over, say, Codex or Claude Code, when Google has a history of just ditching these apps? And I'm sure they will. We should write the shutdown announcement for Jules and publish it now, just put it online as a joke, because it's coming. We all know it's coming; that thing won't be around in a year or two. So this week we also had Anthropic, which was today, so not sure about "this week," announce the much-anticipated Claude 4 model family: Claude Opus 4 and Claude Sonnet 4. They did this "day with Claude" video, which was interesting, about working with Claude throughout the day. And I think the vision they're painting is Claude as a sort of AI workspace you work with throughout the day, with its ability to go off and think, remember things, call tools, and do all of those kinds of things. They also announced that Claude Code, that editor-in-the-terminal tool, not an IDE, that you can send off to do coding tasks, is now generally available. And there's a bunch of new API capabilities, like the parallel tool calling, or MCP calling, you mentioned earlier. I think we should talk about that. But I'm curious about your initial impressions of the models, and also the capabilities they've shipped here.
Chris
Yeah. I think we haven't had time to use it properly, but what we saw with 3.5 Sonnet was a quiet model release like this, where there wasn't huge fanfare around it, and then, as people got to know how to use it, it quickly became the major performer and the staple of everybody's usage. Interestingly, last time Opus just wasn't that much better or that much more impressive, and it was very unclear how it was different other than being slower. So far, and it's early, it feels similar this time. They both seem slow. That might just be early demand, or the providers getting used to serving it; we use Amazon, for example, so maybe it's better if you go directly through Anthropic. Time will tell whether it's better or not. I have a lot of trust and faith in Anthropic to deliver a good model. I still use 3.7 a fair bit, at least half the time; I do use Gemini 2.5 a lot, but every now and then I'll switch back to 3.7 and not change back. Some people are disappointed that it only has a 200k input context. That's wildly different from Gemini, which is a million, and it's nice to use a model where you don't have to worry, where you know it remembers everything you've dealt with. So that's a downside. But so far it seems good, and I think direct MCP support like that is interesting. OpenAI supports that as well now; we'll discuss that in a sec. So I don't really have a strong enough opinion just yet, other than I'm pleased it's here, and time will tell.
Mike
Yeah. I thought the biggest focus of the release was obviously software engineering. They seem to be leaning into this very heavily with their models now, and I assume it's because that's the majority of how these models are being used right now, because I think the industry has done such a bad job of integrating them for everyone else. But interestingly, thinking about which model to use: Sonnet 4 bizarrely outperforms Opus at software-engineering-related tasks, and on the benchmarks they show this extra chunk that puts them ahead for parallel test-time compute. So this is, I guess, really what you would consider thinking.
Chris
Yeah. And I think the interesting point there is that your actual MCP tool calling can happen during the thinking step. So it's able to go off and do multiple iterations, multiple tool calls, gather all the information before or during its thinking, and refine it, before it actually produces the output. And I would argue that at the moment, AI systems are doing that process themselves, whereas here it's handled as part of an iterative model call. If you look at the way OpenAI does it, yes, they do it, but it's part of an SDK. So it's not the model doing it, it's the tooling around the model, handled internally rather than by the application developer, if that makes sense.
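The loop being described, where the model alternates between requesting tools and producing a final answer, with each tool result folded back into context, can be sketched as a toy. The "model" below is a scripted stub, not a real API call, and the tool names are invented; it only illustrates the control flow, not Anthropic's or OpenAI's implementation.

```python
# Toy agent loop: call the model, execute any tool it requests,
# feed the result back, and repeat until it emits a final answer.

def fake_model(context):
    """Stand-in for a model call: requests tools it hasn't seen results for."""
    if "weather_result" not in context:
        return {"type": "tool_use", "tool": "get_weather"}
    if "news_result" not in context:
        return {"type": "tool_use", "tool": "get_news"}
    return {"type": "final", "text": "Answer built from weather + news"}

TOOLS = {  # hypothetical tool implementations
    "get_weather": lambda: ("weather_result", "rainy"),
    "get_news": lambda: ("news_result", "Claude 4 released"),
}

def run_agent(max_iters=5):
    context, calls = {}, 0
    for _ in range(max_iters):
        step = fake_model(context)
        if step["type"] == "final":
            return step["text"], calls
        key, value = TOOLS[step["tool"]]()   # execute the requested tool
        context[key] = value                 # fold the result back into context
        calls += 1
    raise RuntimeError("agent did not converge")

answer, calls = run_agent()
print(answer, calls)  # two tool calls happen before the final answer is produced
```

The distinction in the conversation is who runs this loop: the application developer (OpenAI's SDK approach, in Chris's telling) versus the model provider handling it inside the thinking step.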
Mike
Not really. Yeah. But the challenge I think you have here with the Model Context Protocol is that there are all these methods, or functions, in these MCPs, like send email or delete file, and there are certain skills, call them that, which you don't really want running. It almost seems like some of these skills in MCPs need to be labeled as, like, okay to use for context.
Chris
Yeah, and this is a big concern of mine. I think it's not enough to just say MCP and throw 300 tools, or 30 MCPs, at the model and trust it to make the right decisions, because they're vastly different in what they do. If it's just looking up information, fine; but some information is probably more important than other information, and some sources are better than others. Likewise with the tools: some are more destructive than others, some more harmless. So I feel like the innovation there needs to come from a really, really good definition protocol around permissions, importance, and priorities, with the system getting indicators for how and when these things should be used. For example, there are probably some MCP tools that are far more appropriate in the thinking step than the action step, and some you would want completely excluded from that process. But right now it's completely indiscriminate, unless the application developer withholds tools based on the step in the process, which, if the AI model is handling the whole process, you can't control. I think it's an area that's weak right now, and people are finding their feet and working out how it should function. So for me, as an application developer, I'm reluctant to hand off control to the model, or in this case the model library, to make those decisions. I'd much rather make all those decisions as the application and provide the tools I'm okay with, with appropriate descriptions, than just say: here's an MCP, do what you will.
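The definition protocol being asked for can be sketched as application-side metadata: each tool carries flags (read-only versus destructive, which phase it may run in), and the application filters what the model is offered at each step. Every name and field here is invented for illustration; it is not part of the MCP specification.

```python
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    destructive: bool   # does it change external state?
    phases: set         # phases where it may be offered: "thinking" / "action"

# hypothetical registry; the metadata is exactly the kind of labeling
# the conversation says is missing today
REGISTRY = [
    Tool("search_web",  destructive=False, phases={"thinking", "action"}),
    Tool("read_file",   destructive=False, phases={"thinking", "action"}),
    Tool("send_email",  destructive=True,  phases={"action"}),
    Tool("delete_file", destructive=True,  phases={"action"}),
]

def tools_for(phase, allow_destructive=False):
    """Application-side gate: only expose tools appropriate for this step."""
    return [t.name for t in REGISTRY
            if phase in t.phases and (allow_destructive or not t.destructive)]

print(tools_for("thinking"))                        # read-only research tools only
print(tools_for("action", allow_destructive=True))  # everything, once approved
```

The design choice this encodes is the one Chris argues for: the application, not the model, decides which tools are even visible at each step, so a destructive tool can never be invoked during an exploratory thinking phase.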
Mike
Yeah. And the positive I see is: right now, if you do deep research in, say, ChatGPT, it's really just using their deep research AI system and their methodology of tool calling, and then outputting that information through the model. But what's interesting about this is you could in theory, and I know it's not just in theory, because we've done it, feed in, say, the Gemini Deep Research endpoint, the new Grok 3 research endpoint, Twitter search endpoints. You could throw in the Reddit MCP to get public opinion, throw in a finance source, a direct web crawl, throw all of that into context within the parallel test-time compute for a research report, and then say: give me the research report. Now you're calling every possible tool and data source on the planet, at the time of recording, and you're getting a far better result, because it's going off to many resources; it's not relying on one company's methodology of producing data.
Chris
Exactly. And I gave you a really good example of this last night, where I did precisely what you were saying, and the results were spectacular, and that was obviously without the new Claude, without anything new. So I actually think it's those smarts: how do we combine all these disparate sources, how do we prioritize them? And then, when it does take actions, at what point do you ask the human their opinion on how to proceed? And at each stage, what mix of tools, and what mix of tools with what prompts, are you giving the model to make sure you get the best outcome? I just don't believe the spray-and-pray attitude of giving it everything is going to be right, and I don't think one model will be enough, just giving it all and trusting the model. I think there will be specialist decision-making models, or at least agents along the way, making those decisions. And we've discussed this before: this idea of specialist agents, say a research agent that handles all the things you just described. So your first step in the process might be to consult the research agent, agent-to-agent protocol, to do your research, and then you consult the action agent that decides what to do about it. Rather than it all being just the model deciding, because I just don't think any one model, as a sort of paradigm, is going to work.
Mike
Yeah, the model piece, to me, having used it heavily over the last couple of weeks, and I promise we'll release this soon to Sim Theory users, it's coming, we're not trying to tease intentionally. But the ability to switch the right tools on and off for the right job, and getting that intuition, is very similar to the intuition of choosing which model. Like knowing, oh, I'm going to switch over to 3.7 Sonnet now because I know it's good at solving this kind of issue for me, or, I know for medical questions O3 or O4-mini is far better, so I'm going to switch over to that now. That natural intuition people have from using AI. I honestly think that skill, in the shorter term, is coming for MCPs, where.
Chris
It's like knowing, because I'm already doing it: I'm going to ask it a question where I care about the answer, and I'm quickly switching off the MCPs where I'm like, I don't want it to get confused by that. And so straight away I'm like, as if this is the long-term play, me manually toggling the switches in my flight deck to decide what the AI has access to. That should be handled. There should be a way I can pre-configure this, so when I ask this kind of question it has the right mix of things and the right prompting around it: hey, if this MCP gives an answer, you've got to give it weight, but if it doesn't, it's fine to go off and rely on the xAI one.
Mike
Don't you just think this is where trained skills come into play? Where you train a skill: this is my research methodology, right, I like to use these sources or MCPs for this research methodology, I prefer to use this as the core model, I want it cross-checked with this model. To me, that's what it is. And then eventually your primary assistant throughout the day just becomes a router. Where it's like, okay, I'll use your research skill. You're not actually using direct MCPs anymore. I think that's probably where this goes.
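One way the "trained skill as router" idea could look as configuration, a minimal sketch: each skill names the tools it may use and the models it prefers, and the assistant routes a request to one skill instead of exposing every MCP at once. All the skill, tool, and model names here are made up for illustration.

```python
# Hypothetical skill definitions: which MCPs/tools a skill is allowed
# to call, which model runs it, and which model cross-checks it.
SKILLS = {
    "research": {
        "tools": ["gemini_deep_research", "reddit_mcp", "web_crawl"],
        "model": "primary-model",
        "cross_check": "secondary-model",
    },
    "coding": {
        "tools": ["filesystem_mcp", "terminal_mcp"],
        "model": "coding-model",
        "cross_check": None,
    },
}

def route(request_kind: str) -> dict:
    """The assistant acts as a router: pick the skill for this request
    and expose only that skill's tools, not every MCP the user has."""
    skill = SKILLS[request_kind]
    return {"model": skill["model"], "allowed_tools": skill["tools"]}

plan = route("research")
```

The payoff is that a research question never even sees the coding tools, so the model can't get confused by them.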
Chris
Yeah. And if you think about the idea of custom MCPs, it's like: okay, you do that process, but part of my process now is I run it through my corporate-guidelines MCP. Is this information allowed within the context? Or, for a legal situation, is this in my jurisdiction, does this still apply based on this research, or whatever it is? And then the other major concept you've been talking to me a lot about lately is the output type. What's my goal here? Is my end goal just text, because I'm chatting to it? Or is it a full-blown presentation, or a Word doc, or point form, or just a decision, yes or no? The output format becomes really important when this level of process is involved, because why do multiple steps if the system can know that in advance for a trained skill? So I really think all the work in the next little while is going to be around this stuff, and the models will get better at the various pieces. But if you think about things like browser automation and computer use, then the actual output type could be: take a series of actions that I've trained you on, like you described earlier. So it's like, we've done all this research, it's taken 10 minutes, now go and input it into the various systems, or go and log in here, or send it to me in an email as a presentation, and then your job is done.
Mike
I think these model releases are starting to support the need, in the agentic world, for specialist models for different parts of the equation, rather than one singular model, which we've been saying for quite a while. If you look at, I mean, it depends if you believe them or not, but if you look at the benchmarks and the overall confusion around Opus and Sonnet: when you think about the Opus brand that Anthropic's developed, you always saw that as their best model. But really, Sonnet's ahead on agentic coding. So if you want agentic coding, Sonnet 4; if you want agentic terminal coding, then all of a sudden Opus is better. If you want graduate-level reasoning, they're basically tied. I mean, are graduates.
Chris
That smart? I still don't understand why you want that. They're so inexperienced. They're really good at telling you stuff that was in a textbook, but I don't know.
Mike
Shout out to all the graduates. Yeah, agentic tool use. So in the retail benchmark, which I think is online shopping, from memory, you get a one-basis-point improvement from Sonnet to Opus. And then in the airline benchmark, for some reason, Sonnet is better. Why they need two models is beyond me. I think they probably should have just released Sonnet 4 and called it a day.
Chris
Yeah, I kind of agree with you there. It's weird. I guess they just did such a good job with Sonnet 3.5 that the Opus name is just sullied for me. I'm not even really giving it a chance.
Mike
I agree. Every time I tried it, and there were a few times I'd try it on problems I was stuck on, like, I'll give it a whirl, it was just so bad that, yeah, the brand is tarnished. So overall I'm curious: a week from now, will we be on here, me kissing my Dario pendant, saying Sonnet 4 is the new daily-driver king? Or are we still going to be on Gemini 2.5? It will be very interesting and telling.
Chris
Yeah, I agree. Time will tell. I think it's way too early to make a call either way.
Mike
All right. Do you want to look at some of the cool stuff I coded, though?
Chris
Yes.
Mike
Okay. So that's why I have my vibe-coding glasses on. This is a demo I've done before, actually: a simulated blood vessel. An artery, actually, this one. So my prompt was: make a visualization in 3D of an artery to show blood particles moving through it, and allow me to introduce a virus with a button. Then another button should demonstrate the immune reaction at work.
Chris
Pretty cool.
Mike
So, yeah, I can control the speed. It's got sound effects, too, so you can hear the sort of beat of the heart. Can you hear that?
Chris
Yes, I can.
Mike
Yeah, it's very dramatic. So it's put that in, and you can hear. I'm introducing the virus now. It's hard to see, but it does actually put viral particles in, and then I'm activating the immune response. They're represented by green particles, and they're killing the viral particles.
Chris
The sound effects are really good.
Mike
It's solid. Yeah, it worked, good on them. And look, now we're back to a normal heart rate, of course, because our heart rate was elevated, remember, when we were fighting the virus. Pretty nuts, right? And that, of course, was Claude Sonnet 4, not Opus, because Opus failed. So yeah, mental. I'll publish this and put it in the description of the podcast and video if you want to load it up and check it out. But yeah, the.
Chris
It's.
Mike
It's not necessarily some huge jump. What it's excelling at, in my mind, is that the instruction following seems far better, or just figuring out what you actually want. And the level of detail it seems able to code is quite good, too. As you mentioned earlier, though, the speed is really a problem for me. It is so slow, it's quite painful to work with right now, but it will get faster.
Chris
That may be clouded by the fact that we're using Amazon. We haven't used it directly through Anthropic, so I doubt it's an inherent problem with the model. I mean, it came out today; I think it's just teething issues, and that'll go away.
Mike
Yeah, this always happens; it ends up going away. I also one-shotted a flight simulator. I want to tell you my prompt. This is sort of a hacked version of it, but I literally said "make a flight simulator", with Opus. And it might not seem obvious how this has improved over time, but it has, in that the propeller will now spin and stop spinning based on my acceleration. And this is just one shot. I mean, the world drops off and it's not the best.
Chris
Can you turn back or not?
Mike
Well, I can't really see the ground. It's like I'm in a cloud and I'm really lost. So it's not great, but it's so much better for a one-shot. Admittedly, Gemini can one-shot this, but it looks pretty cool. What's cool about it is it's a 3D world, and there are mountains, and balls that represent, clearly, clouds. So I don't know. It was Opus. I should try Sonnet with this. I had like two seconds before we started recording, so I don't want to judge it too hard. Building a 3D flight simulator in one shot, two seconds before we recorded, is still pretty impressive in my book. Yeah, so we also had Microsoft Build earlier in the week. I do feel sorry for Microsoft because, in terms of this week of distraction, the look-over-here of the Jony Ive video, the Anthropic release today, and of course Google I/O, which I think was the show stealer. And I mean, who'd have thought a year later they would have turned it around, especially with the sort of AI community.
Chris
Oh, some people claim they did that. I've been saying Google would do this all along, but.
Mike
Yeah, Yeah, I mean, clearly it was obvious they were going to win eventually. Like, the funny thing to me about their whole strategy is like, convince people for years to give us all your data, upload all your videos to YouTube secretly put in the terms that they can train AI models on them and then train an AI model that will therefore eventually replace all creators. It's truly hilarious. Like, cut out the middleman.
Chris
Yeah, I mean, it's kind of like exactly what you'd do if you were an evil corporation. Right.
Mike
Steal all data, figure out how to train models on it so that searches are also free because you know everything. Yeah, it is. Yeah.
Chris
It's kind of good, though. It's good to have villains I think, yeah.
Mike
And then charge $249 USD a month to get access to our own data that they trained on. It's good. No, it makes sense. So, Microsoft Build. They didn't really announce much, just improvements to Copilot, but a few strategic things stood out to me. First of all, they're truly the Switzerland of models now. They had Elon Musk on stage virtually, announcing that they would support the Grok models in their model garden, or model library, or whatever it is, and that eventually, when Grok 3.5 is released, they'd support that too. They also did something which I think shows how much trouble they're in from Cursor and Windsurf: they announced that the GitHub Copilot prompts and tool calls and all that stuff will be open sourced, so people can improve it, build upon it, and contribute back to the core code base. So the VS Code code base now has that agentic completion layer open source as part of it. Not the actual models they use, obviously, but at least the system prompts and tool calls and those kinds of things. Which means, for the Cursors and Windsurfs and Clines, and all these other people that forked VS Code and built these billion-dollar businesses on top, I mean, it's a pretty smart strategy, right? It's like: yeah, okay, we'll just open source it and let the community make it better. I think that would appeal to large corporates as well, who want to use it in development teams, because they can see the prompts and see the tool calls. It's not a black box anymore of what's going on in there, so it gives them that reassurance.
Chris
And then they can host it in their private cloud and all that sort of stuff.
Mike
Yeah, like if I'm Apple, rapidly trying to catch up, and my developers are using these IDEs, I want to keep all that in house. Obviously I don't want it leaking to Microsoft or one of the other AI vendors. So I think that's an interesting strategy; it'll be really intriguing to see how it plays out. They also announced a GitHub coding agent, which they'd had in preview for a while, where basically you can create an issue, assign the issue to Copilot within GitHub, and it creates a PR, shows the code changes of it fixing something, and allows you to review it. For those who are unaware or aren't developers, it's basically like: as a user, you put in a ticket, and the developer just goes, hey Copilot, can you go fix this, and then I'll just review it. So that's basically the idea behind it.
Chris
Or you put in an issue, get told "works on my machine", and then you've got to argue with them for a day before they'll be convinced.
Mike
Yeah. So then you can review it and request changes if you want. And the idea is that would just speed up.
Chris
Yeah, I mean, can I just say, I hate that concept. I think it's dumb and I don't think it's practical at all. Because the thing about software projects is not the code at all, it's the testing. If someone just submitted a hundred PRs to Sim Theory, for example, I'd be like, you've just ruined my life. Because I've got to go through each one, consider in my mind what its impact would be, and then what do I do? Merge the PR, or load up that branch, go off and test it? I become a QA agent for the AI, and go try it, and I'm like, this didn't work, and I've got to do that every time there's an issue. It just seems to me like a huge time sink; I don't really see the advantage. And if you're a non-coder and you're using this as a sort of surrogate for producing code, again, if anything goes wrong, you're like, I don't understand why this happened or what happened here. I just don't see where the time saving comes from, especially when right now you can use AI models to write the code and try it yourself in context. Having it submitted as a PR, I think, just really slows you down.
Mike
It's interesting too, because, and I don't know if we just have a different perspective because of how we work and how we code, but anytime I use the Cursor agent, I find for net-new things, where I've got a clear example, like, here's an example of a component, can you scaffold a version of that component that does this, it's really good and impressive and truly mind-blowing, and you're like, wow, this is going to change the world. But then you start iterating on it and it goes nuts, and you just end up unwinding it. And then I'm like, I could have done this quicker just keeping the AI in the loop, but not necessarily getting it to do the implementation, just having it scrub a bunch of files for me. So that's why I still code with the chat-style interface open and a structured context conversation going: hey, we're working on this, can you explain this, can you do this, can you give me code for this, and just use it like a calculator. I'm still completing the math quiz, and it's the calculator, right?
Chris
Yeah.
Mike
And to me, having tried all these different methods, that is what makes me personally, and I'm not suggesting this for everyone, far more productive. I've tried all the other methods for literally weeks, days, being like, what am I missing here? And I think the truth is I'm missing nothing. The other ways of doing this are fine if they work for you and make you faster, good for you. But these GitHub coding agents, as you said, it's like that expression, there are multiple ways to skin a cat. Which is so gross, actually, I don't get that expression; it's disgusting if you think about it. But it's the same thing: it's probably the same amount of time, because you're still testing, even if it's a simple issue, to make sure it works.
Chris
Yeah, because it's not your code, it's harder to comprehend what's changed, especially in a diff file. It's like, okay, it's made all these changes, what's the impact of that?
Mike
And then the AI wrote the original code in the first place. It's like, well, I have no idea.
Chris
Well, that's true, exactly. And I've found that the AI, when you trust it indiscriminately like that, will introduce subtle failures that hit you later: it refers to some library that doesn't exist, or it catches some exception that doesn't exist, or it uses the wrong version of a module. Things that are easily caught if you're copy-pasting and then testing on each iteration. But if you're just relying on a PR and you're like, yeah, it looks fine, merge, and then you run into the issue, then you've got to revert it. It's just not my way of working. Maybe for some people it is, and I'd be interested in the comments if other people do see the merits in it and maybe I'm wrong.
Mike
Yeah. It's one of those bizarre things. I say it to you all the time: are we missing something here? I'm yet to see it being that great, apart from whipping up concepts, that early scaffolding, and cloning things that have been done before. For any net-new thing, or when you get to that later-stage iteration, I find it's just better to be cherry-picking the context yourself.
Chris
Yes, exactly.
Mike
So the other major announcement. Well, I'm just picking out what I thought the major announcements were. I think the overall trend is that what we're really seeing now is Model Context Protocol support going totally mainstream. To start the week off, with the Build conference, we had the announcement of agentic Windows. It's not out yet; there will obviously be a ton of preview builds before they deploy this, and I'd say it's more an announcement of "hey, we're gonna do this" than a reality yet. But basically the idea is that you have MCPs native to Windows, where you've got app actions on Windows. So this is a way you can unlock capabilities in your Windows app for agents to interact with: a bunch of APIs for apps and agents to interact, so they can use applications on your machine. And they're also giving it access to an MCP for the actual file manager, for searching files. A whole bunch of tools, essentially, to interact with a Windows PC. And as people who have built a workspace computer, we were like one of the first to do this, and thought it was a pretty compelling idea. This is great because it gives you, hopefully, native and secure access to a Windows machine, where you're operating it the way it wants to be operated, versus hacking a computer-use layer on top, almost like a viral worm layer, where before you were just hacking it to make it work this way, but now it can work this way. Are you excited about this, or do you think this is like.
Chris
Yeah, I think it's amazing. I think it's so good that this is being embraced. As I mentioned earlier, I think there are weaknesses in the protocol that can be developed upon, but I have zero doubt that will happen really quickly as people discover it. So people know, there are two different ways that MCPs can be made available. One is the standard in/out (stdio) way, where it's running on the local machine, which is how Claude Desktop operates: the system is basically purely local, there's no HTTP interface, no remote way to control it. You're observing that it's there locally, and then feeding in instructions and getting responses. And then there's the SSE layer, where you might host it on Cloudflare or something, and then add it as a URL into OpenAI or Claude or whatever, and that's remote. What I've found is that the majority of them are the original, stdio way. And what we've done to make our stuff work is basically put a layer on top of that, that we control, which emulates it as if it's local, but it's actually running remotely so you can access it remotely. So the interesting thing about the Windows layer will be that there will need to be some sort of proxy layer to allow it to be operated remotely. But that's easy, and that can be done. And so I agree with you that computer use as a concept is brilliant; I think it's totally the future. The application layer itself embracing it makes it so much easier to get to, and so much more precise. And I think it'll lead to much wider adoption of this stuff, meaning it's much more plug and play: without even knowing about a specific application, you can add support for it to your AI system, which is really a huge concept. I think the challenge is going to come with the AI system knowing when and where to use different things, how to prioritize, that kind of thing, because.
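To make the stdio transport Chris describes concrete, here's a toy sketch of its core loop: a local server reads one JSON-RPC-style message per line from stdin and dispatches to a local tool. This is a deliberate simplification for illustration, not the real MCP SDK: there's no initialization handshake or error handling, and the tool name is made up.

```python
import json

# A made-up tool this hypothetical local server exposes.
def search_files(pattern: str) -> list:
    return [f"match-for-{pattern}"]

TOOLS = {"search_files": search_files}

def handle_line(line: str) -> str:
    """Handle one newline-delimited JSON-RPC message, the way a
    stdio-transport server reads requests on stdin and writes
    responses to stdout."""
    req = json.loads(line)
    if req["method"] == "tools/list":
        result = {"tools": sorted(TOOLS)}
    elif req["method"] == "tools/call":
        p = req["params"]
        result = {"content": TOOLS[p["name"]](**p["arguments"])}
    else:
        raise ValueError(f"unsupported method: {req['method']}")
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"),
                       "result": result})

reply = handle_line(json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "search_files", "arguments": {"pattern": "*.txt"}},
}))
```

The remote (SSE/HTTP) variant carries the same messages over a network connection instead of stdin/stdout, which is essentially what the proxy layer Chris mentions provides.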
Mike
It is still the orchestration and the system layer where the greatest opportunity lies. Unless the models just get so much more advanced at this, and maybe they will, maybe they won't, but I.
Chris
Can't, because it's the focus. Like, if you present it, and I think this is what we anticipate happening, people are going to want them all. They're going to have like 700 MCPs or something.
Mike
You got to get them all.
Chris
Yeah, you said that to me the other day: each of which has 70 tools. You can't present 7,000 tools to a model and go "do this" and expect it to work. It's never going to satisfy you, because what does it do, call 400 different tools just hoping they're the right ones?
Mike
And it's like, sir, I have.
Chris
I've turned on your home automation system. I've turned on the sprinklers, I've alerted your neighbors, I've generated five photos and I've.
Mike
I have a challenge for you. So next week on the show, or maybe the week after, depending on how much time you have, I would like to try, hear me out, test-time-compute tool calling. But the MCP tool is a prank-call machine, where it has to hit up, and hear me out, every pet groomer in Australia at the same time, as part of the test I've given it, to see if we can get them to groom a pig and to find.
Chris
The dumbest pet groomer in Australia.
Mike
But I think, surely that would work with parallel calls, if we just get it to call the same tool, right? Yeah, it'll work.
Chris
It'll. It'll sit there relentlessly doing it. Yeah.
Mike
And waiting for the response.
Chris
I'm up for that. That sounds.
Mike
I wonder if it'll time out though. Like how long will it sit and wait in the.
Chris
Well, I mean, see, it's a good example, right? If you use the actual model's MCP calling, unclear, I don't know. If you use our system, where we're actually handling the tool-call chaining, then it will just keep going until it stops. Basically, the main issue you're going to face is running out of context window to hold all of the tool-call results, which is another challenge that needs to be dealt with.
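A rough sketch of the tool-call chaining loop Chris describes: keep feeding tool results back to the model until it answers, or until the context budget is exhausted. The model and tool here are stubs standing in for a real LLM API and a real MCP tool; the character-count budget is a crude stand-in for a token-based context window.

```python
# Stub "model": asks for one tool call, then answers once it sees a
# tool result in the history. A real LLM API would replace this.
def fake_model(history):
    if not any(m["role"] == "tool" for m in history):
        return {"tool_call": {"name": "call_groomer", "args": {"n": 1}}}
    return {"answer": "done"}

# Stub tool, standing in for an MCP tool invocation.
def call_tool(name, args):
    return f"{name} result {args}"

def run_chain(model, user_msg, max_context_chars=2000):
    """Chain tool calls until the model answers or the context budget
    (a crude proxy for the context window) runs out."""
    history = [{"role": "user", "content": user_msg}]
    while True:
        if sum(len(str(m)) for m in history) > max_context_chars:
            return "stopped: context budget exhausted"
        reply = model(history)
        if "answer" in reply:
            return reply["answer"]
        tc = reply["tool_call"]
        history.append({"role": "tool",
                        "content": call_tool(tc["name"], tc["args"])})

result = run_chain(fake_model, "groom my pig")
```

With many parallel calls, every result lands in `history`, which is exactly why the context window becomes the binding constraint Chris points at.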
Mike
I think if we were able to use Gemini, with its million-token context, surely the call transcripts would fit. Okay, let's isolate the challenge a little bit more: every pet groomer in Sydney, the Sydney Basin.
Chris
Yeah. Who answers the phone. Because a lot of them probably won't answer.
Mike
Yeah. And so, pet groomers, if you're listening to the show, expect a call. Be prepared.
Chris
Teacup is not.
Mike
I really. This has to happen. I just want to see it go nuts and call them all at the same time. I think that will be a true "feel the AGI" moment. The parallel nature of it is brilliant.
Chris
Like, how much chaos can I cause in the space of like 30 seconds.
Mike
Yeah. Comment below if you'd like to see chaos on next week's show or if you're actually still listening at this point. All right.
Chris
I like it.
Mike
Moving on. Well, there's not much else to move on to. I mean, you could have talked about.
Chris
All that AI news that's come out during the podcast.
Mike
No, there are no new models to drop. Sadly, we're all getting addicted to it. I did want to call this out, though, in light of the week that we've had. So you've got Google now, obviously awakened. Someone said it's like the kid that got bullied in high school: Google's like the kid that got bullied in high school last year and has now come out super rich, with a hot wife, and really confident. That's the vibe of Google right now. They're really punching back.
Chris
Well, I mean, I think I. We need Google merch. Maybe we need some merch. Maybe you need to make some more pendants with decent chains and we'll all buy them.
Mike
Well, I threatened with the. The Sundar, or, I don't know, the Demis, the DeepMind guy. I always get in trouble when I pronounce his name, so I'm not going to do it. All right, so this article I did want to bring up, from The Information: for Google challenger Perplexity, growth comes at a high cost. So in light of Google literally boiling the ocean, and now having basically Perplexity built into Google Search in the States, I thought it was kind of interesting to look at where Perplexity exists going forward. Or is it the AltaVista? Is it the early AltaVista? Look, the financials are pretty worrying. So they did 59 million in subscription revenue, not bad. 1 million in the API. Ad revenue.
Chris
20.
Mike
20K. No, 20K. It's like $20,000. Why even list it? It's embarrassing. I mean, I guess this was leaked. Discounts and refunds: 27 million. What? I mean, what? 27 million?
Chris
Why would you combine them in your financials? They're pretty different. A discount and a refund: very different things.
Mike
Yeah, I mean, these were clearly leaked, but 27 million. So 59 in subscriptions.
Chris
27. And you know who makes the money? The finance. The finance providers. Right. They. They take their commission anyway.
Mike
Okay, but get this. Like, let's dig into that a little bit more payment processing, which is, I guess, stripe, right? $4 million.
Chris
That's where the real money is.
Mike
We need to get into payment processing.
Chris
Yeah.
Mike
Web services for R D. Johnson got.
Chris
Rich, he was Braintree. That's why he's going to live forever now.
Mike
So, web services for R&D, including AWS: $48 million. So we had revenue, remember, and after we minus discounts and refunds of 27 million, we get to 34 million in revenue. And that joke's going to be on me, by the way, 10 years from now, if they're a public company and worth billions. And then below that, I mean, they did less in payroll than in discounts and refunds. So I think they're basically just giving discounts to students and all these people to get the numbers up, so investors will be like, oh, look how many people are using it.
Chris
Well, I mean, someone on our This Day in AI Discord this week pointed out that a lot of the activities by the major AI companies are not meant to appeal to regular people at all; they're to appeal to investors, to get the next round of funding, which is how they get their money. Yeah, I mean, it's not that crazy. If there are companies out there who will give you 100 million, 200 million, a billion in Anthropic's case, anytime you do something that seems slightly exciting, then that's a legitimate source of getting money in. And then, if you can acquire your mate's company and launder some of that money, or get the money out in some way as a founder or higher-ranking person in the company, what difference does it make to you whether it came from customer revenue or from an investor? It's irrelevant in the long run, if your goal is just to extract some money from the company. So a lot of these things where we say, well, this doesn't really make sense, because why would you give 27 million in refunds, may make perfect sense in the context of: I'm actually personally cashing out just fine, and I don't really care what those figures say.
Mike
I'm just starting a GoFundMe for our lawsuit, defamation lawsuit. We're probably going to need it for what I've.
Chris
Done a bit of libel on the episode, I must say, but I am an apologetic person. I'll happily apologize to anyone who feels they've been affronted by this episode. So I will beseech you, I will get down on my knees, I will do whatever is is necessary to humbly apologize because these are just opinions, so sincerely held opinions that I will put in writing and sign an affidavit, purchase some more Lulls.
Mike
They spent as much on paid ads, including TV, $8 million, as they did on OpenAI and Anthropic models, which is kind of strange to me. So what? Legal services: 5 million. So the lawyers got paid nearly as much as they had to pay for models, which, you would think, is the core expense of the business. And then there's just another column of losses, like 10 million. Oh, we just lost some money, don't mind that, 10 milli. We don't know where that one went.
Chris
It's so funny that some small bakery will get done by the tax office for misrepresenting $200 in, like, gift cards they bought, and then this company can just have an accounting column for general losses, and everyone's like, yeah, that makes sense, tech company, whatever.
Mike
So this is why this came out, right: the San Francisco startup has been able to fuel this growth with a steady supply of venture capital. Perplexity is in talks to raise 500 million at a $14 billion valuation, its fourth round of funding in just over a year, and it ended last year with about 850 million in the bank. So nearly a billion in the bank, and they want another 500.
Chris
Imagine having a billion in the bank. You're like, shit, we're running out of money. What are we gonna do?
Mike
How are they even spending the money? If you've got a billion dollars in the bank and they're losing 68 million a year, it doesn't even seem like that big of a loss at this point. Like, why not? Why not lose?
Chris
Maybe they're anticipating refunding even more money. They're like, they're just holding, don't you.
Mike
Think, the numbers comparable to how much they've raised a mental. Like, so they've got. They're like, essentially, they're probably going to be after this round, sitting on a cash pile of what, like 1.3 billion.
Chris
But I mean, isn't that the point? They're just trying to get enough money in there that they can take some out and they don't really care at all about the other metrics.
Mike
Yeah, I guess, maybe. Is it secondary sales or something? It doesn't make sense, because if I were them and I was trying to beat Google, why not just go for broke? Spend 200 million a year?
Chris
Also remember that all of this money comes from people's retirement funds and stuff. This isn't anyone's money personally; these are all people who've put their money into, what do they call it in America, 401(k) funds and stuff like that. All they're trying to do is get that money out of those funds and into their pockets. That's what all this stuff is.
Mike
Yeah. So, without any greater commentary on venture capital. But I guess the interesting thing to me is: can this company, and I don't have anything against them, really challenge Google, in light of Google just putting this capability in Search as a general tool, for free? Can this product be so much better that people are willing to just use Perplexity?
Chris
I'll give you an answer: no, it can't. They're an early hype company who was very smart with their timing. There's a reason they spent so much money on advertising: they needed to be known as a player in the market in order to raise these funds. I would say they haven't spent enough.
Mike
If you've got a billion dollars in the bank and you know you're probably going down, let's be honest, I would just go for broke trying to out-compete Google on, like, marketing and ads. Just constant ads: "The future of search is Perplexity." Constantly.
Chris
They should have sponsored the new pope. Like just become the pope. Like I'm sure if you spend enough money you could do that. That'd be popular with the Catholics.
Mike
So anyway, I raised it because the discounts and refunds line just made me laugh. I don't know why. It's like it's over.
Chris
Well, I told you about the idea my friends and I had for a company once, where we would sell coal, the physical product: Coalition. Because every.
Mike
You've said this on the show way too many times. We can't. You're like an old man. You truly are the old man of the show.
Chris
Anyway, the point of that story is that 90% of our budget was for admin and legal fees because we're just going to sue everyone who opposes us in any way.
Mike
It does seem like money laundering for lawyers and payment processors, this business. That's all it is. Anyway, I don't know how, with all the news we had to talk about, we had to end on this note.
Chris
It's a deep analysis of Perplexity.
Mike
It's so, so funny. All right, so how long until Apple acquires Anthropic or OpenAI at this point? Now they've got Jony, Apple might want him back. So maybe they acquire OpenAI.
Chris
Unclear. I mean, I think the point is that the models are converging and the application layer is what's key. And Apple's the best at the application layer. So I think it's just another case where Apple will bide their time, they'll come out with something that's meaningful and actually helps you in the day to day, and do just fine. I just don't think they need to buy into it yet.
Mike
Yeah, I agree. I think everyone's calling the death of Apple and saying Google's where all the excitement is. But if you look at what Google really shipped, there aren't actually any new devices. They do have the best model, they do have all the data, and they do have what they call the personal context of each user. But that has never meant that things don't fragment. There's a reason I use some Google products, some independent products, and some Apple products in my life. And Apple's like the Borg, right?
Chris
They just assimilate other people's cultures and stuff into their own and then just make it better. They'll do just fine. And they also just don't see these things as a threat yet.
Mike
I think in a year or two they'll have like, caught up and finally Siri will be decent, is my prediction. And I don't really say that as an Apple fanboy.
Chris
In fact, Siri will never be good.
Mike
It's gonna happen one day. But even so, it has made me think for the first time that maybe I should switch to Android just to have the AI education. But then I can't restart my phone, so I'm not sure.
Chris
Don't do it. Gemini is awful. I'm fine.
Mike
All right, on that note, any final thoughts for the week? We have video generation that may one day lead humans to become absolute sloths who just consume content. We have Jony Ive and Sam Altman potentially having a baby. We have.
Chris
Yeah. My final thought is that none of the model updates matter. What matters is the intelligent integration of the MCP explosion. I think the right combination of tools, combined with the brilliance of these models, is where we're going to see stuff that really just blows us away. Things like create-with-code combined with the model usage, good output types, and all of that stuff coming together is where it's at. And that's going to happen in the application layer, not at the model level. I'm grateful for the models, they'll make a big difference. But if you took away every provider except one, and you just randomly picked which frontier-model provider survived, the innovation would still happen. So for me, that's what I want to see: let's embrace this and rush to see what's possible with the combination of these things. I think this is what is going to dominate the conversation over the next few months.
Mike
I agree. I think we're at a point now where it's this orchestration, and the next "feel the AGI" moment for me is definitely seeing these tools in a sort of async environment, working on different things for me throughout the day in the background. It's not game changing in the sense that, oh, I'm going to be out of a job. It's that you become more productive, you stay focused in the moment, you're able to do more things at once, and you're able to go more broadly with the tool use, get more context, get more data, and actually take actions for the first time. I just don't think we're at a point yet where it can go off and competently do these things in the background too much. The agentic stuff doesn't excite me right now as much. I think agentic-light plus human is the next exciting thing.
Chris
Like we're moving, we're moving in that direction. And I think the thing I love about our podcast now and, and our community is that we're all focused on the practical. Like it's fun to get all excited about the announcements, but it's really about what can we do with it. And I think you can't just go from zero to the AI does everything because it can't do that. We've got to go from zero to how does it help me this week and what can we do to make the most of the technology? And I think that's what I get excited about because there's so many things now we do day to day rely on for the AI to do that, we've integrated into our lives. And so every time there's a new one that, that gets me excited. And so I feel like this MCP world is where that is going to really rapidly expand because just the vast amount of tools that it can embrace now, and because everyone's embracing it, means that it's going to multiply in terms of those day to day benefits that we actually see from the technology.
Mike
Anyway, we've got to shut the hell up and just ship our vision around this onto Sim Theory, which we've been hard at work on. That's right. I'd stress here that for us, one of the main objectives of Sim Theory is to allow people to try these things. Right now, trying these MCPs is quite difficult because you need to create servers and do all this stuff. So a big part of our vision is to make this really simple and accessible for everyone who listens to the show: go in, try this technology, get ahead of it, play around with it early, and work in this async manner. So if you want to support the show and you want to get this first, make sure you get an account at Sim Theory AI, and hopefully in the next couple of weeks we'll be able to get this into the wild and everyone else can call pet groomers in parallel, just like us. We have to do this. I'm committed to doing this. I'm just publicly stating it now because the lulz will be just too good.
Chris
Agreed.
Mike
All right, remember, if you're not subscribed to the show, please do consider subscribing on YouTube or like wherever you get your podcast. The majority of you, I think are on Apple and also leave an average review. We love them. All right, we'll see you next week. Goodbye.
Date: May 23, 2025
Hosts: Michael Sharkey & Chris Sharkey
Episode Title: Opus & Sonnet 4, Google I/O Recap, Microsoft BUILD & Sam Altman Has A New Friend
In this episode, the Sharkey brothers dig into a week overflowing with AI industry news and product launches. The main focus is on Google's dramatic comeback at Google I/O, OpenAI's headline-grabbing Jony Ive partnership, Anthropic’s new Claude Sonnet 4 and Opus 4 models, and rapid advances in AI-generated video exemplified by Google’s new VO3 model. The brothers bring their trademark irreverence, skepticism, and “proudly average” curiosity to assess what’s hype, what's practical, and what's just plain weird in the evolving world of AI.
Gemini Diffusion introduces diffusion-based text generation: "Instead of predicting text directly, they learn to generate outputs by refining noise."
Offers near-instant code and document generation—2.8 seconds for full websites, suggesting new productivity paradigms.
Chris: "Speed helps in so many ways... it looks cool. And your examples... I was like, wow, that is astonishing." (41:34)
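The "refining noise" idea can be illustrated with a toy sketch: instead of producing the output left to right in one pass, a diffusion-style generator starts from random noise and improves the whole output over several refinement steps. This is a deliberately simplified illustration, not Gemini Diffusion's actual algorithm; `denoise_step` and its `strength` parameter are made up for the demo.

```python
# Toy sketch of diffusion-style generation (illustrative only, not
# Gemini Diffusion's real method): start from pure noise and refine
# the whole output in parallel over a few steps.
import random

def denoise_step(state, target, strength=0.5):
    """One refinement step: move every value partway toward the target."""
    return [s + strength * (t - s) for s, t in zip(state, target)]

def generate(target, steps=8, seed=0):
    rng = random.Random(seed)
    state = [rng.gauss(0, 1) for _ in target]  # begin from random noise
    for _ in range(steps):
        state = denoise_step(state, target)    # refine all positions at once
    return state

target = [1.0, -2.0, 0.5]
out = generate(target)
print(out)  # close to target after 8 refinement steps
```

Because every position is refined simultaneously at each step, the cost scales with the number of steps rather than the output length, which is the intuition behind the near-instant generation speeds discussed above.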
Implications:
All major LLM providers now embrace Model Context Protocol—AI agents with parallel tool access.
Anthropic’s implementation in Claude Sonnet 4 and Opus 4 is notable for allowing broader, deeper tool calls during "thinking" steps.
The real innovation will be in orchestration, permissioning, and specialized “skills”—not just model power.
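As a rough sketch of what "parallel tool access" means in practice, an orchestrator can fan a model's requested tool calls out concurrently instead of running them one at a time. The tool names and the dispatch shape below are hypothetical, for illustration only; this is not the actual MCP wire protocol or any provider's API.

```python
# Hypothetical orchestration sketch: run a batch of tool calls in
# parallel rather than sequentially. Tool names are invented for the demo.
from concurrent.futures import ThreadPoolExecutor

TOOLS = {
    "search": lambda q: f"results for {q!r}",
    "calendar": lambda q: f"events matching {q!r}",
}

def run_tool_calls(calls):
    """Execute (tool_name, argument) pairs concurrently, preserving order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(TOOLS[name], arg) for name, arg in calls]
        return [f.result() for f in futures]

print(run_tool_calls([("search", "Veo 3"), ("calendar", "recording day")]))
```

In a real agent loop the slow part is usually network-bound tool I/O, which is why concurrent dispatch during "thinking" steps matters more than raw model speed.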
"None of the model updates matter. What matters is the intelligent integration of the MCP explosion."
— Chris (105:34)
Microsoft is now “the Switzerland of models,” adding support for Grok and open-sourcing Copilot prompts/tools.
GitHub Copilot’s new agentic features (auto-PRs, agents fixing issues) are viewed with skepticism: the time required for QA and real comprehension isn’t eliminated.
“If someone just submitted a hundred PRs to SIM theory, for example, I’d be like, you’ve just ruined my life."
— Chris (82:02)
On Perplexity vs. Google:
"Can this company really challenge Google in light of Google just putting this capability in search ... ?"
— Mike (101:55)
On OpenAI’s Altman/Ive Video:
“It’s a nine-minute video of them literally blowing smoke up each other’s asses. It’s next level cringe.”
— Mike, 07:55
On Google’s AI Supremacy:
“I do think they have the best models right now... VO3, Imagen 4 is stunning."
— Mike, 05:49
On AI Video's Societal Impact:
"People are going to spend their lives consuming content that isn't even real."
— Chris, 19:55
On The Real Future: Orchestration, Not Models:
"None of the model updates matter. What matters is the intelligent integration of the MCP explosion."
— Chris, 105:34
On Microsoft Copilot-Agent Skepticism:
“If someone just submitted a hundred PRs to SIM theory, for example, I’d be like, you’ve just ruined my life."
— Chris, 82:02
Gallows Humor About Perplexity's Finances:
"Why even list it? It's embarrassing. I don't... these were clearly leaked, but 27 million [in discounts/refunds]..."
— Mike, 95:17
On Google's Comeback:
"It’s like the kid that got bullied in high school and now has come out and is super rich, has a hot wife, and, like, is really confident now."
— Mike, 93:34
On The Application Layer vs. Model Layer:
“As models converge, the brothers argue that app design—and especially tailored, productivity-focused integrations—will differentiate the next phase.” (paraphrased summary, multiple points)
| Segment | Timestamp |
|-----------------------------------------------|---------------|
| Altman/Ive IO acquisition + launch video | 0:02–11:33 |
| Google I/O Recap—Gemini family | 11:33–27:00 |
| VO3 AI Video Generation—Hands-on Demos | 11:59–24:43 |
| Imagen 4 & Image Prompting | 27:12–34:24 |
| Gemini Diffusion Super-Speed | 36:27–44:47 |
| Orchestration + Model Context Protocol (MCP) | 61:55–72:07 |
| Application design vs. model arms race | 50:30–55:14 |
| Microsoft BUILD, Copilot updates | 77:13–86:34 |
| Perplexity financials dissected | 94:18–100:52 |
| Agentic models—future, chaos with MCPs | 87:34–93:18 |
| Broader reflections, closing thoughts | 104:09–109:28 |
The brothers close by urging listeners to focus not on chasing every hyped model release, but instead to explore how to practically combine these models and tools in ways that truly improve personal productivity and creativity. They call for better, simpler tools (like their own Sim Theory platform) to help average users experiment at the forefront, and suggest the coming year will be marked as much by breakthroughs in orchestration and specialized use-cases as the raw power of underlying AI models.
"We're all focused on the practical... It's fun to get all excited about the announcements, but it's really about what can we do with it."
— Chris (107:29)
Listener Homework:
Support the show: Subscribe, average reviews heartily welcomed; try Sim Theory and look forward to more practical chaos in AI experimentation.