Summary10 min read

This Day in AI, EP99.07-06-05 (June 6, 2025)

Hosts: Michael Sharkey & Chris Sharkey
Episode Title: AGI Reality Check, Gemini 2.5 Update, Are Your AI Chats Safe & Fun with Veo3

Brief Overview

In Episode 99, Michael and Chris come together in-person for a wide-ranging, irreverent, and self-deprecating discussion of recent AI headlines and hands-on experiments. The brothers review Google Veo3’s video generation, break down the Gemini 2.5 update, debate the realities and risks of AGI, analyze model lock-in and access drama, and voice concerns about AI chat privacy in the wake of new legal demands. Listeners are treated to their signature blend of average hot takes, practical anecdotes, and comedic asides as they navigate an ever-shifting AI landscape where everyone is just trying to keep up.

Key Discussion Points & Insights

1. Experimenting with Google Veo3: Costs, Quality, and Use Cases

Immediate Take: Veo3’s new video model impresses in some ways but is prohibitively expensive for hobbyists.
- $3.75 for 5 seconds via Foul, 78 cents/second on Replicate. Quick humor over accidentally prompting “Tech Cage” instead of “Tech Cafe,” leading to a caged man with cats ranting about the podcast. (01:10–02:16)
- “It sort of does sum up the person who probably wrote that.” – Chris (03:12)
- Quality Assessment:
  - Highly adherent to prompts, even adding small requested details.
  - Some oddities: teeth, uncanny eyes, audio leveling issues needing post-editing.
  - “It’s 4K as well… I think, using them as cutaways for $3.75, if you just want some sort of establishing shot… I think it would totally work for that.” – Danny (03:53)
  - Caution: Early social media demos were “cherry picked.” Random outputs still trip up.
  - “It’s not something you want to just play around with unless you’ve got some real commercial use…” – Danny (04:26)
Potential for Marketers: Marketing materials and stock video generation seem realistic uses even today.

2. DeepSeek (Deep Seq) R1 Model Allegations & Model “Train-Stealing”

Rumors swirl about DeepSeek’s latest model being trained on Gemini outputs, following earlier speculation around OpenAI and Anthropic’s Claude. (05:28)
- “I think this is the new benchmark, right? Deep See—whichever model they’re training on is clearly the…” – Danny (06:09)
DeepSeek’s current performance highlighted: creates impressive, functional code and websites, not to be underestimated. (06:29–06:48)
Benchmarking through “emotional intelligence” and repetition analysis suggests possible synthetic data use.

3. Google Gemini 2.5 Pro Update & “Thinking Budgets”

New Tune: “A new tune of Gemini 2.5 Pro dropped into general availability… not a lot of data yet apart from them claiming it’s now number one on a bunch of benchmarks.” – Danny (07:10)
Key Features:
- More creative, better-formatted responses.
- Improved instruction-following, e.g., for comments in docs/code.
- Ability to specify up to 32,000 tokens for pre-answer “thinking budget.”
  - “This one can use 32,000 (tokens) just for the thinking, let alone… final content…” – Chris (08:20)
Tune Versions: Users can now toggle between March and new June tunes (March version proving popular for some).
Frequent Updates: The slow-drip release of “tunes” is smart marketing even if it confuses users. (09:34)

4. Personal Model Rankings & Comparison

Model Variety: Much debate about Gemini, Claude, GPT-4/4.1/4.5, and open source options.
- “I feel like even if we were cut off for everything except one modern AI model, you could still get a lot out of it…” – Chris (33:09)
Model Performance:
- Gemini 2.5 is a go-to for Danny (“gun to my head… it would still be 2.5.” – 31:31).
- Claude Sonnet 4 is Chris’s daily driver for its parallel tool calling, decent speed, and reliability.
- GPT-4.1 remains suprisingly good for conversational and vision tasks, but “just one of the most middling, disappointing, mediocre models out there.” – Chris (29:04)
Agentic Capabilities: Claude’s ability to handle tool calls in parallel is noted as a “profound” development, possibly game-changing. (14:22–15:48)
- “It did 10 at once… and synthesized those results back… astonishing.” – Chris (14:22)
Model Switching Fatigue: Frequent retuning and updating causes both excitement and skepticism among users.

5. Infinite Context & AI System Layer

Context Size Wars:
- Gemini’s large context (theoretical “infinite” context) stands out in practice:
  - “You can build up the benefit of all these tool calls and… take advantage of that full context…” – Chris (17:04–18:19)
  - Danny: “[The] large context right now is what keeps me going back… I just feel really reassured that it’s taking into account larger chunks of data…” (18:19)
Challenges Remain:
- Even with large context, “context drift” still happens without reminders to refocus.
- The need for goal lists and mechanisms to reunite context from branch-tangent conversations is growing.

6. AI Model Access, Lock-In, and the Windsurf/Anthropic Split

The Drama:
- Windsurf (a Claude-powered coding client, alleged OpenAI acquisition) loses Claude API access. Anthropic responds: “It would be odd for us to sell Claude to OpenAI… just trying to enable our customers who are… working with us in the future.” (24:03–25:17)
- Anthropic’s move seen as the “first rug pull,” establishing a new precedent for strategic model denial.
  - “We predicted… our first episode ever of the podcast that what if they take the models away… that was our fear right at the start…” – Chris (26:31)
- Attempts to workaround by procuring Claude via AWS or Google Cloud; likely to be recurring industry issue.
- Theme of “what happens when the model you love is suddenly taken away?”

7. AGI Reality Check: Scaling LLMs & The Limits of Today’s Tech

Yann LeCun’s View (Meta/Facebook):
- Clip played at [37:08]: “We are not going to get to human-level AI by just scaling up LLMs. This is just not going to happen. Okay?”
- “It’s not a PhD you have next to you, it’s a system with gigantic memory and retrieval ability, not a system that can invent solutions to new problems…”
Hosts’ Reaction:
- Respect for practical (if unsexy) take: these are amazing general tools but missing true “intelligence.”
  - “My point is this: so what? It can still do heaps of really useful things across many industries…” – Chris (39:15)
- “We can now build AI systems that will change the world… change everything and are changing everything already.” – Danny (40:13)
- Concern that AGI hype overshadows real-world deployment and utility.
- Noted trend: less public “AI doomsday” fear, more talk about practical adoption.

8. AI Safety Rhetoric vs. Product Reality

Anthropic’s Alarmism:
- Seen as coinciding with need for fundraising: “every time I’ve noticed they need to fundraise, they just come out and spread doom porn like there’s no tomorrow…” – Danny (46:07)
Sundar Pichai (Google) Approach:
- More measured: acknowledges possibility of progress plateau, focuses on pragmatic challenges (context, memory).
- “He’s talking very practical… like, these are the next problems we need to solve: infinite contexts and better memory…” – Danny (45:14)
OpenAI’s World Tour Parallels:
- Safety panic as a recurring marketing technique, not necessarily an honest reflection of risk. (46:26)

9. Court Orders & AI Chat Privacy

Court Order: OpenAI directed to preserve all chat logs, including deleted and API chats, in ongoing media lawsuits (e.g., NYT).
- “Like deleted chats, private chats, their logs, like all this stuff…” – Danny (48:12)
Privacy Implications:
- Brings up unsettling scenarios: would-be confidential chats now potentially retrievable for legal scrutiny.
- “It’s an area where I think the GDPR laws should be enforced… The right to know what data they have stored about you…” – Chris (50:19)
- Especially important as people connect more personal data and organizational data to AI tools.
- “It’s like, everything I ever see, I’m going to keep…it’s an incredibly violating thing…” – Chris (52:13)
Reality: Free AI access often comes at the cost of data privacy and consent; proactive legislation and user education are vital for long-term control.

10. The Future: AI Workspaces, Job Disruption, and What’s Next

Job Automation:
- Sober reflection on job impact—especially procedural roles. Example: AI taking drive-thru orders isn’t “intelligence,” just automation. (57:33)
Adoption Curve:
- “People...underestimate how long it takes society to adapt to these technologies.” – Danny (59:43)
Long-term Vision:
- Shift from “software stacks” to “model/MCP stacks”—curated setups of MCPS, agentic tools, models.
- Belief that those who adopt and learn AI now will simply be more productive and fulfilled, not replaced.

Notable Quotes & Timestamps

“The show neither impresses me nor disappoints me.” – AI-generated review via Veo3 (02:51)
“We are not going to get to human-level AI by just scaling up LLMs. This is just not going to happen. Okay?” – Yann LeCun, guest clip (37:08)
“If you had a gun to my head and said, you can live the rest of your life at this moment with one model, it would still be 2.5.” – Danny (31:31)
“It did 10 at once… and synthesized those results back. The amount of thinking… was astonishing.” – Chris re: Claude Sonnet’s parallel tool calling (14:22)
“We predicted...what if they take the models away… that was our fear right at the start…” – Chris on Windsurf/Anthropic (26:31)
“My point is this: so what? It can still do heaps of really useful things across many industries…” – Chris (39:15)
“If you’re going to talk about any form of legislation or anything, it’s just consumer protections around the right to not have your data trained on, even if the products [are] provided for free. That should be a human right with AI, really.” – Danny (51:05)
“This is the first rug pull really that we’ve seen...” – Danny on locking out Windsurf/Coding clients (26:51)
“It’s not going to be your software stack anymore. It’s going to be your MCP stack or your model stack.” – Chris (59:28)
“Yeah, it’s not going to be like robots with lasers on their head walking down the street killing people for thought crimes.” – Chris (60:41)
“When I get that [AI dishwashing/kitchen bot], I’m just done. That’s AGI for me.” – Danny (60:48)

Segment Timestamps

| Time | Segment/Topic | |------------|--------------------------------------------------| | 00:00–03:19| In-person banter, Veo3 first impressions | | 03:19–05:18| Veo3 prompt results, pros and cons | | 05:28–08:20| DeepSeek R1 training rumors & demo experiences | | 08:20–11:54| Gemini 2.5 Pro new tune & thinking budget | | 11:54–14:22| Benchmarks, model variety, Claude tool calling | | 14:22–19:41| Parallel tool calls, agentic workflows | | 19:41–24:03| Context size, infinite context, workflow issues | | 24:03–29:52| Windsurf/Claude rift, model lock-in discussion | | 29:52–34:48| Model rankings, resilience, and alternatives | | 34:48–37:08| Open/closed model ecosystems, open source | | 37:08–41:38| AGI, Yann LeCun interview/clip, hosts’ reactions | | 41:38–47:33| Safety rhetoric, OpenAI/Anthropic/Sundar Pichai | | 47:33–54:42| OpenAI lawsuit, chat privacy, legal demands | | 54:42–59:43| AI workspace future, job disruption, adaptation | | 59:43–64:16| Model stacks, societal change, kitchen AI dreams | | 64:16–end | Closing banter, daily driver choices, humor |

Memorable Moments

Accidentally generating a “tech cage” video in Veo3.
Griping over repeated “tune”/model versioning—“None of these labs have a healthy relationship with naming things.” – Danny (12:58)
Lively digression about 90s CPUs as a parallel to context size progress (23:11–23:40)
The “first ever rug pull” by Anthropic when they cut Windsurf’s Claude access and its portents.
Yann LeCun’s no-AGI-with-LLMs quote, met with both agreement and a ‘so what?’ attitude.
Proposed AI time-keeping/clock metaphors for tool-calling self-regulation (63:02).
Strong feelings on privacy: right to have chat data hard-deleted (“The basic person is not going to assume, oh, it’s a soft delete…” – Chris, 50:37)
Mundane but relatable AGI wish: “When I get that, I’m just done. That’s AGI for me.” – Danny, (60:48)

Tone and Style

Conversational, skeptical yet enthusiastic, with frequent self-deprecating humor. The Sharkeys routinely poke fun at their perceived mediocrity, lean into average-guy takes, and approach AI advancements with a critical but practical eye. The episode is balanced, mixing hands-on technical discussion with irreverence and accessible analogies—a podcast for people figuring out AI as they go.

For Listeners/Newcomers

Curious about AI’s actual impact on work and privacy? This episode covers firsthand experiments and lively, down-to-earth debate.
Worried about AGI hype or model access rug-pulls? The Sharkeys ground their discussion in the world as it is, not as doomsayers or marketers pitch it.
Thinking of using AI tools in depth? Insights here about prompt quality, context management, and privacy risks are practical and immediate.
Like your tech with a side of self-aware humor and skepticism? This episode delivers that in spades.

Closing

The in-person format adds new comedic energy, and, despite poking fun at their own knowledge, the Sharkeys give listeners a timely and nuanced look at the realities of AI development, deployment, and user risk in 2025.

“We don’t often touch. This is the most affection…”

“If you like the show, please do leave a comment, like, and all things. What other. Oh, wait, this is how you be a TikTok influencer…” (64:27)

Loading summary

Transcript163 lines

[00:01]
Danny
So, Chris, this week on the show, I did court with a firm in Delarin.
[00:05]
Chris
Wait, we're together?
[00:10]
Danny
So, Chris, this week we're together.
[00:13]
Chris
Yes, very close. Uncomfortably close for the AI podcast.
[00:17]
Danny
Also the colors in the background. And of course, the majority of our listeners listen because they're called listeners, but we are together in the same room with the sort of AI lighting in the background. It's very important.
[00:31]
Chris
Yeah. Finally, I'll stop being called the ugly one because I have proper lighting on my face now.
[00:35]
Danny
You've got good lighting, good HD resolution, maybe even the thumbnail. This week people will be like, wow, he's. He. He was the. The hot one after all.
[00:45]
Chris
Yeah, that's my main goal of the podcast, is how people think I look.
[00:49]
Danny
Yeah, that's why we talk about AI. So we have been spending quite a bit, quite the. A lot of money on VO3, because instead of having to be in the US and pay about $500 of our Pacific Peso currency to get the Gemini Ultra or whatever it is, subscription Max Ultra plus to use VO3. This is, of course, Google's new video model with audio as well. So we've been playing around with it. It's now available on this service called Foul, but it costs $3.75 US for five seconds, and that's actually cheap.
[01:27]
Chris
There's another platform called Replicate that does it as well, and it's just flat 70. I think it's 75 or 78 cents per second of video. So for an eight second video, that's nearly $8, $7.
[01:38]
Danny
I know last week I tried to convince you that it was, you know, like, pretty cheap, but I don't know, after using it and hearing the audio quality and then how many iterations we had to go through to just get a few joke clips out of it, I would surmise have probably spent, you know, close to $100.
[01:57]
Chris
Yeah, it's expensive and especially when you need multiple iterations or you make a mistake like I did. One of the clips we made was meant to be a man. I wrote in a modern tech cafe, but instead of a tech cafe, I actually wrote Tech Cage. So it's a man in a cage with cats ranting about our podcast. Waste of $3.75 there.
[02:17]
Danny
So we thought a funny test would be to put together a series of these clips. And for those people listening, of course, you can hear the audio coming from the. The various clips to make up your own mind. But what we did was we fed in some of the most hilarious reviews, including like serious negative ones into the Google VO3 and got people to do all sorts of things like perform them, do interview styles. And I will play a few of those clips right now. Prepare yourself for a journey into the.
[02:45]
Chris
Heart of the most mind bogglingly average content on artificial intelligence you've ever heard.
[02:50]
Guest Expert
This day in AI.
[02:51]
Danny
The show neither impresses me nor disappoints me.
[02:54]
Chris
It's just average. I like the pig grooming. Peace. This podcast sometimes has an AI guest that interrupts the hosts. The hosts are average, to be honest, five stars.
[03:05]
Danny
They are unimpressed by and critical of virtually every AI announcement.
[03:10]
Chris
This can be valuable, but tends to.
[03:11]
Danny
Get annoying after a while. You know, I know you were worried about the cage, but it sort of does sum up the person who probably wrote that.
[03:19]
Chris
Well, that's true. His eyes are a bit freaky and I've noticed it doesn't do teeth well. It seems to like have like one tooth, you know, one big curved tooth in a lot of them. But the quality is pretty amazing and the adherence to prompt is very strong. Like it's quite amazing. Like in that with the lady, she has the Dario shirt on, it has the cheese in the background, which I asked for and it's able to do it quite well. And I was saying if you think about for marketing materials and advertising and things like that, it's already at a level that you could absolutely use these clips. It's 4K as well.
[03:53]
Danny
Yeah. I think using them as like cutaways for 3.75. If you just want some sort of establishing shot or you know, like different. What we probably should do if we edited our podcast correctly, like you have some gutter eyes occasionally to mix it up. So I think it would totally work for that. I do think though that a lot of the cherry picked examples we saw early on, you know, maybe like they definitely were cherry picked a lot. Like people were only showing off on X, the most brilliant things. And then some of the generations I've seen, you know, it trips up and there's a lot of different problems. So it's definitely progressing in the right direction and it's exciting, but it's an expensive hobby. Yeah, it's not something you want to just play around with unless you've got some real commercial use. But for people that are using stock video right now and do want some establishing shots and things like that, I think it's, it's more than enough. The only thing I did notice is I had to put the video clips into Premiere Pro and then Rework the audio to be able to play them because they, for some reason it doesn't do the levels. Like everything maxes out on the audio leveling. So if I played it naturally. Not that we've really ever edit the audio on this podcast, but if I played it naturally, it would like blow. You know, it has that annoying hiss, like where it peaks. I'm no audio expert, but it doesn't seem. It seems like that's something that would need to improve as well, but still incredible.
[05:18]
Chris
Yeah, my favorite one was the moshi one where it's like there's a. There's a cat guest who interrupts the host. That one's really funny. And the audio on that was cool. It sounded like he was outside. It sounded natural.
[05:29]
Danny
So, moving on. This week there were allegations again against Chinese lab Deep Seek, who, you know, updated a version of R1, the R1 reasoning model. This was the. The one that sort of blew everyone away. And everyone was like, oh no, you know, China's catching up and so on and so forth. And so there's been all this speculation that the new Deep Sea has been trained on Google Gemini. And I got thinking about this and, you know, the, the Chinese, if they are doing this at Deep seq, are clearly just maybe training on the best model. You know how it was speculated last time they were training on Claude, but then OpenAI was like, oh no, they're definitely training with us. It's sort of like the bragging rights now for the US Agency is all about, you know, who's actually training on which model. So I think this is the new benchmark, right? Deep Sea, Whichever model they're training on is clearly, that's the.
[06:29]
Chris
And I actually use Deep SEQ today as part of a demo to try Create with code mode. And I tell you what, it's very good now. Its ability to create a full. I made like a landing page website with very specific details and it didn't. Excellent job. Like, the model should not be underestimated. It's very strong.
[06:48]
Danny
And keep in mind these are just allegations by actually an Australian guy, a developer based in Melbourne, who analyzed the emotional intelligence of the model and was looking for, like, repetitive words and things like that and found that the like, top repetitive words are very similar now to Gemini, where I think he alleged in the past they were similar to OpenAI. So they're clearly using these models for synthetic data. If I worked at one of these labs, I'd be flattered that they're now, you know, picking my model. It Truly shows that I think I need to get the Sundar pendant going because clearly Gemini is on top. But they weren't just happy with the lead we had just this morning, really. A new tune of Gemini 2.5 Pro dropped into general availability. There's not a lot of data yet around this apart from them claiming it's now number one on a bunch of benchmarks. Again, it already was either number one or pretty high up on a lot of the different benchmarks. But interestingly enough for me, they talk about the tune of it. They've it says here it can be more creative with better formatted responses and they've addressed a lot of the feedback. Now I haven't had enough time to probably, you know, test the model, but I am hoping it does stuff like just better instruction, following around, you know, leaving comments in documents and code and you know, how like mixes its thinking and thoughts basically into, you know, into whatever it's out.
[08:21]
Chris
We also had some people throughout the week on, on Sim Theory talking about how the the current tune wasn't as good as the March tune and can they go back and so what we've done now is put both the March one back plus this new iteration. The other major update on this iteration is the ability to specify a thinking budget. So that's the amount of tokens that you make available for the model in order to think before it starts to answer. And so you can now go all the way up to 32,000 tokens of thinking budget. So if you think about models, there's some models still that only have 8,000 tokens output. This one can use 32,000 just for the thinking, let alone how much you can actually output in final content once it goes. And I think that what we're seeing in the MCP world and the tool call world is actually being able to think with the results of some of the tool calls is leading to far better outcomes and indeed better decision making about what tools to call next. And so having that thinking budget available shouldn't be underestimated in the next iteration of what we're going to see from AI models. So it's exciting and like you, I haven't really played with it yet, but we're ready to and I think it's a good, good update.
[09:34]
Danny
I also think Google strategy right now with the Gemini tunes, I mean they did announce it at IO and said that they were going to introduce the thinking budget. So it's right on time, right on schedule as they said. But I also think this strategy of this Constant drumbeat of like different tunes. While we criticize a lot, a lot of the time, because it's hard to, you know, say to someone consuming these models, oh, this is a preview or this is a, you know, an early release and it's never actually the final version, but it's kind of smart from a marketing and keeping in the, in the mainstream right now where they're just trickling it out. They seem to have learned that tactic from all the countering a year ago that OpenAI was doing where they dropped something and then it'd be trumped. Now they're just slowly dribbling these, these announcements out. And I like that they've clearly got a good model with Gemini 2.5 Pro and they're not really afraid to take the feedback on board and then tune it differently. Even if maybe that doesn't impress everyone time to time. Personally, I didn't notice any different between the first Gemini 2.5 Pro that I really liked. To that middle tune where some people wanted to go back.
[10:45]
Chris
I personally, I noticed only insofar as I've been using Claud all week. I noticed I found myself moving away from Gemini and then it wasn't until someone called it to my attention that it got worse that I, I noticed. But one advantage, I guess is that Gemini do keep their models separate. So often when OpenAI releases a point update, it just overwrites the previous one. You don't have a choice, you don't have the ability to compare. But in this case you do.
[11:10]
Danny
It's funny though that these tunes, like people are so sensitive to using them and the labs still don't understand fundamentally, you know, the difference. Like when they announce these, like the, the different tune, they really, they have no idea because it's impossible, given the scope of these things, to test every aspect of it. They just like, oh, yeah, it's like vibing better for us. And so as a result, you get the, oh, you know, it's scoring more ELO points, which really means nothing to no one. It's really about your own personal feeling.
[11:41]
Chris
And it constantly reminds you of that fear that these things that are so useful for you today may just be taken away on a whim. Like they can just disappear. And then all of a sudden you don't have the thing you were relying on to do all of your work.
[11:54]
Danny
Because that's the other thing too. Like they've got to have the server capability and the resourcing to spin this up, right, and host these models. And so, you know, that Some of these tunes are going to go away. I guess they do have the opportunity to revert back to the older tunes if they, if they want. But.
[12:11]
Chris
Yeah, but if you're in, if you're in love with March Gemini, well, it's going away. It's going to be there for a bit and then it'll be gone.
[12:17]
Danny
Yeah, it's like the summer's coming and you're going to get a different, a different model tune. But it, it is, it's, it's pretty interesting. Over on LM LM arena they have all these different leaderboard breakdowns now in a much like a much nicer formatted way. And just in the last sort of 12 hour scope or updates you're seeing that latest tune of Gemini 2.5 Pro number one for tags and then number two is the other tune just slightly below it. Web development, of course.
[12:50]
Chris
Do you think they also deliberately did 0605 and 0506 just to mess with people like me who need to keep these things separate?
[12:58]
Danny
Yeah, it's. They don't have a health. None of these labs have like a healthy relationship with naming things. I, I truly don't understand. But if you look at these benchmarks now. Vision, Vision Grounding search deep seat 2.5 is on this code. I don't really understand this. Copilot 1 I also think text to image GBT image 1. My experience through the week using Flux Con context. Yeah. And just the base Flux models. I think Flux context is better than GPT.
[13:31]
Chris
I agree. I think GPT image if you want. Maybe they're judging it based on prompt following because it's probably better at prompt following. But the images are cartoonish, they're unrealistic. They're not really what you're trying to do with AI images. In my opinion Flux context is much better at that in terms of following it, preserving character and producing images. And then I think that things like Imagen are really good at like creating unique interesting images that aren't based necessarily in reality.
[14:00]
Danny
But you were talking about in the week how, because you were working with MCPS and playing around with them, you were finding Claude, the Claude o person sonnet. Like you were sticking to those simply because there was some sort of agentic capability like you know, they were able to call multiple MCPs at once and stuff like that.
[14:22]
Chris
Yeah, like one of the most profound things I've ever witnessed because I was working on testing parallel tool calls for mcp. So the idea being if you're doing say research or something like that and you ask a query you might want, well, you definitely want to query multiple sources. So you might want to do a perplexity deep research, a Google deep research, a finance research and all these different tools. You want to gather the information together and make an assessment. Now most models, at least up to this point, will do that sequentially. So they'll call one tool, get the result, realize they need to do more work, call the next one. But some models are capable of parallel tool calling, including the Claude series. So I was testing on 3.5 and 3.7 but they would tend to, they would sometimes do parallel, but they would tend to do things sequentially. Then I flicked over to Claude Sonnet and suddenly it did 10 at once. It's just like I'm going to do 10 and called them all, synthesized those results back and said okay, now I'm going to do three more. And the results were just fascinating and comprehensive. Like the amount of thinking and time and work that went into this synthesis single question was astonishing. And I thought maybe these models, while we're just comparing them on the same basis we have of all time, once they're empowered with all of the tools, maybe they are profoundly better. We just haven't put them in the right circumstances to see that yet.
[15:48]
Danny
Yeah, and it makes me wonder too like if you know, because it's moving to this AI system layer where you know, you're starting to build this portfolio of MCPs or connections into your world that are like your sources or actions that can take on behalf of you if the model tune that you start to preference comes down to those agentic capabilities. And of course Anthropic did announce this when they announced the, the, the 4 Series model, saying that they were specifically tuned towards these agentic tars and like to give anthropic their credit, like 3.5 sonnet still has this special place in my heart, right. Like it was the first model that I thought was just truly tuned so well that it was just hard not to use it all the time. And I think the Sonnet and Opus, they could be remembered at least the four series is the first tuned in that agentic workflow where people do become reliant on their tool calling ability and that, that way of working. For me though, the question is given how quickly Google responds with the Gemini 2.5 Pro series, like, you know, once that has the like, who knows, there might be a tune that now spits out concurrent tool calling.
[17:05]
Chris
Well, and the other major, major advantage of Gemini is the context size because something that we were discussing just last night was when you do involve say 10 tool calls, there is a lot, an absolute wealth of data that can come out of correct research tool calling or any tool calling really. And, and in German, and sorry, in Claude, you've got to shove that into to a 200k context window. So suddenly you do all this work and it's able to use the tools brilliantly, but it isn't able to benefit from all of that work without summarization and other intervening steps where you lose that fidelity, like lossy sort of steps in between. Whereas with Gemini it's the exact opposite. I mean this thing can handle 3,000 files in a single request. So as it builds up the benefit of all of these tool calls and it's able to then take advantage of that built context. And I think one of the things we're definitely converging on is realizing that building a massive context for solving a problem is crucial to getting the best possible answer out of a model. So if the model itself can build that context with correct tool calling and then it's able to utilize that full context, it's going to be a lot more powerful, no matter how good the other one is, because it simply lacks that ability.
[18:20]
Danny
To me too, the context size. Like I think Sundar, the CEO of Google, was on interview this week and said something like one of their goals is infinite context next. And I think that at least in Gemini 2.5 Pro, that large context right now is what keeps me going back to it because I just feel really reassured that it's taking into account, you know, larger chunks of data that I'm working with. But I do find sometimes, unless you keep reminding it, even though it's got the big context and in theory it can, it still suffers from that context drift where unless I sort of re remind and be like, no, no, no, like look at this stuff, you know, versus just like, I don't know, I still have this problem where I'm like, I don't trust it. If, if all that content was pasted in like a kilometer ago in the chat history, does it still really focus in on that? So a lot of what I'm doing when I'm doing follow up prompts is at least trying to like refocus the model and say like, look at, you know, look at this again, this is like really important. And I sort of wonder with that infinite context, if it doesn't necessarily become about context length anymore, it's around how to, how does an AI system get into Focus what actually matters to you at that given point. Even if it's infinite, it's still going to matter.
[19:41]
Chris
And we've also been talking about this idea of going off on, on side, side tangents where you work with the AI on a sub part of the problem and you're like, let's solve this little bit. But you need to reunite that with your core purpose of the project you're working on. And we think that this is going to lead to needing a sort of goal list or like, here's what we're trying to do from this interaction and allow the AI to keep bringing things back to that and then extracting from those subcontexts what matters to the overall goal. So it's discarding the things that become irrelevant as they're solved and keeping the things that bring you back to that point and maybe going back, like you said this morning, and reevaluating all your recent interactions with it to work out what's relevant to the current task at hand.
[20:29]
Danny
Yeah, and I think it's, it's sort of in line with how the models think before answering and they're really just running inference on themselves. Right. To extract better, more valuable answers. And in a way, the idea of threading a chat session is saying, I want to go down this path now and ask you questions about this to get to some meaning and then just like bring that back into the core context. Like a lot of these problems weren't. We're sort of playing around, trying to, trying to solve right now. I think we've made a pretty big breakthrough in terms of just being able to fork out and, you know, like latch onto a historical context. But I do think that re figuring out how to reunite in a way that doesn't interrupt the main flow of like, what project or tasks that you're trying to accomplish is probably the next big thing. And of course, this isn't like a discussion around some form of agentic capability where you just want it doing that. It's like where you still want to be human in the loop.
[21:28]
Chris
Yeah, exactly. And I think we're definitely, as we speak to people who use AI more and more, everybody is coming to the realization that these, these chat contexts you build up through working with the AI are valuable. They're actual items of ip. Like if you're working for your job and you're building up this massive chat context, the knowledge contained in that is actually extremely valuable for accomplishing tasks. And, and then you've got to think about like, do I own this Data Do I, do I want the big model providers to have access to it and how can I preserve it to make sure I'm, I'm benefiting from it in the long run?
[22:05]
Danny
Yeah, like, because we see it all the time. People get these chat sessions or histories and they just want to cling onto it. And I do think at some point it does make sense. Like I know in Claude code, one of the techniques when that promptly was this idea of compaction which people have been doing for a while where you sort of, once the context reaches its limit in their case because it's quite small, like comparable to others right now, they will compact it down. So they'll be like, take what we've been talking about and basically, you know, synthesize it into this chunk and that forms the new context. But you know, I still think the larger context and the rolling context window where it somewhat infinite that Google's probably working towards, it's just so, so competitive. It makes me wonder like do anthropic secretly. Are they gonna drop like, oh, opus now supports a million. Or if they just trained on that, that size context that it's just too hard to.
[23:02]
Chris
Especially because they were the first to go from whatever the context was to like a hundred thousand and then 200,000. They were the ones who were like associated with that.
[23:12]
Danny
Talking about Pentium models, like, remember the days of 32k and MMX was very.
[23:17]
Chris
Good when that came, wasn't it?
[23:19]
Danny
Like 6. I think when we first started using AI models it was like 4k output or even less maybe. And then, you know, I remember thinking 32K. I'm like, oh, and this is, it feels like like a penny a model. I forget. Or 486 or whatever we used to count back then. There was something before 486-386-2286.
[23:39]
Chris
Yeah.
[23:40]
Danny
So yeah, but the, I guess the difference is that was like a timeline of years. Like you could sit on your 486Six for like, you know, a couple of good, good years. You had a few good years with it and now it's like, you know, a few good weeks. Maybe one week. Yeah. Gemini's last June was only like I think three weeks ago. And we're like, oh, I'm not happy. I want to go.
[24:00]
Chris
It really was. You're right. Look at the dates. It's very confusing.
[24:04]
Danny
So something like pretty alarming to me this week and I, I, we actually were talking about it last night. So earlier in the week, the, one of the founders of, of Wind Surf, which is a competitor to cursor the. It's like the, you know, Vibe coding client, let's call it. And so, so they came out and basically said that the. They've been cut off from the Claude models. So the three series and the four series they no longer have access to. Now we did talk on the show about the fact that these guys had allegedly been acquired by OpenAI but there's still no announcement or evidence that that's occurred, but everyone's just assuming they have and I'll continue to assume that until told otherwise. So we had this, this whole kind of blob about it and the Anthropic co founder finally responded. There's an article in, in TechCrunch that I've got up on the screen now. Anthropic co founder on cutting access to Windserve. It would be odd for us to sell Claude to Open AI. And they that his comment was pretty vague. We really are just trying to enable our customers who are going to sus. Going to sustainably be working with us in the future.
[25:18]
Chris
So it's revenge, basically.
[25:19]
Danny
Yeah, it's I think pettiness for Vibe coding. Most developers prefer the Claude models, I think that's fair to say. I would say there's a growing cohort that now are in the Gemini Pro camp. Like they've. I would include myself in that. Like I just think you get better output, especially on SIM theory with create with code like, like Gemini 2.5 is just consistently better, especially for 3D visualizations or anything I think quite meaningful. But there are a lot of people, you know, with MCPS and tool calls and other capabilities in these IDEs now that have become reliant on them or.
[25:55]
Chris
Are just simply used to working with it. Like I, you know, I got very familiar with working with the claw models and I still go back to it on a job semi daily basis. So I totally understand the need to have those models available in a tool like that.
[26:08]
Danny
And so I guess because OpenAI have acquired them, they've said well you can't use Claude anymore because that's a competitor. Which I think, you know, in any other normal world would make total sense to me. The competitors just like why would we give you access to our models? But then it gets me sort of thinking. It's like do these labs now take this as a sign because this is the first time it's happened. It won't be the last.
[26:31]
Chris
We predicted, I think on our first episode ever of this podcast that what if they take the models away like, that was our fear right at the start that these things are so amazing and so much potential. They may just realize that we don't want everyone to have access to this. Only we do or our special in a group does. And at any time they can cut off that tap.
[26:52]
Danny
Yeah. And that they wouldn't be open and give you access to these models. So this is the first time. Time we've seen this now happen. And I think, I think this is, this is a historical moment in terms of AI labs and AI models because it's, it's the first rug pull really that we've seen. And I'm sure a lot of people out there will defend it. Like, oh, well, why? Like, I just tried to, like, why would they give their models to a competitor? But to me, why wouldn't they as well? Like, it's almost making the mockery of OpenAI by being like, oh, you know, you're users of that thing you acquired for a few Billy, which is a fork of VS code. Their users prefer the code models. And if I was OpenAI, I would sort of see it as like a challenge. Like, hey, I want to learn what they're doing. And I guess this is what Anthropic's thinking as well, is like if Windserve might have access to the code or the outputs. Now they're like our friends over at Deep Sea being able to train on Anthropic models and really extract that secret. So Source. Yeah. From the model.
[27:58]
Chris
It's hard to know the motivation, but yeah, I guess it's just like the. You don't want who you perceive as your biggest enemy having your stuff. I mean, it seems, seems fair. But I also think that the Windsurf people, while no public acquisition has been announced, must also feel like, hang on, our whole value here has just been taken away overnight. Like, that's the other thing. They didn't exactly give them notice.
[28:24]
Danny
No. And they, I think they gave them a few days notice and then they Windsurf responded by basically discounting the Gemini 2.45 model. So they were taking a hit of like 25 on that model. But isn't it a sign of the times when a company OpenAI bought bought, they're like, ashamed. Like, oh, we still have Gemini for GPT 4.1, guys. Like, if you might want to consider using it, like, like, I know I'll probably cop a lot of stuff for this because there's a lot of fanboys out there, but the reality is like, most developers do not work. I don't know anyone that works with.
[29:04]
Chris
Gemini 4.1 has to be one of the most GPT. GPT. Sorry, GPT 4.1 just has to be one of the most middling, disappointing, mediocre models out there. And it's not that it's bad, it's just that there's so many better alternatives. It never gets any attention.
[29:23]
Danny
The thing I don't get about GPT 4.1 is I have this like sneaky little secret that I use it quite a bit and I don't use it for coding. I use it. I think it's a great conversational model. It's really fast. It's also, its vision capability is so far ahead, especially of the new Claude 4 models. I mean for vision they're so bad. Like the worst of any model. Like if you're doing any vision related task, I would avoid sonic.
[29:53]
Chris
And yet they rank one and two on that thing you had up earlier.
[29:57]
Danny
Yeah, I want to see that again.
[29:58]
Chris
Yeah, they were definitely one and two on Vision.
[30:00]
Danny
Oh, Gemini. No, they weren't even close. They're not even on the board.
[30:04]
Chris
My apologies.
[30:05]
Danny
Yeah, right. So, but, so GPT 4.5 was up there, but they've discontinued that I think now officially. So 4.1, if you look at it kind of high up there, but I just realized Gemini 2.5 Flash is even better and I do enjoy that for the speed. But if I want some speed and I want some real talk, I do, I do like a good sesh with GPT 4.1.
[30:27]
Chris
Isn't it crazy for all the hype around 4.5 and how expensive that was?
[30:32]
Danny
Well, that was GPT 5 failed training run and then, and then there's all these people out there that really liked it. But I, I just felt rich people. Yeah, it was like too hard to use it. I mean even on ChatGPT they limited you to like 10 messages or something. It's like, how did you get a feel for it in 10 messages? I'm exaggerating obviously. And then I think O3, I think the only reason that Stack ranks up there is I think the OpenAI models have always been exceptional at vision. So that doesn't surprise me. But also the, the on the text it being like tied to with the other tune of. Of Gemini, whatever that means. It's just its tool calling ability that because the tool calling so well integrated into its thinking step, it performs really well. But as a RAW model, it's not so good.
[31:23]
Chris
Yeah, that's right. And that's why it's never going to operate in something like Windsurf because no one's going to sit around to let it think for 30 seconds to make a small update to your code.
[31:32]
Danny
Yeah. For me personally, though, right now, I like for fast sort of conversational things. It's GPT 4.1. Weirdly, I don't know why. I'll never explain it. I. It just feels good. It's Sonnet 4. I've. I don't know why I think it's the Sonnet brand. Can't really tell a difference between Opus and Sonnet. I know some people say they can. I don't know about it, but GBT, sorry, Gemini 2.5 is like, if you had a gun to my head and said, you can live the rest of your life at this moment with one model, it would. It would still be 2.5.
[32:05]
Chris
Yeah.
[32:06]
Danny
So Sundar pendant is incoming. I'm gonna. I'm gonna work on it this weekend. That's gonna be my thing. But this. This back to this Windserve thing, like, is it keeping you up at night? Like. Because I guess my point here could be this, right? You base some, like, really good app as a developer, you're building on Sonnet, right? You're like. Because they're like, oh, you know, develop. They've got their dev relations guy. I don't know what his name is. I forget it. But out there being like, time to build, and all these, like, statements they go on with. And then you do go on build, which the Windsurf team did, and then they're just like, oh, no.
[32:42]
Chris
You know what it reminds me of? It's like how casinos are your friend and will give you free drinks and free food and look after you, but then someone's like a card counter, and they start winning and they kick them out publicly in front of everyone, bash them up, drag them out, whatever they do. And suddenly you're like, oh, hang on a sec. They are a big, evil corporation who don't care about people. It's just that I haven't. I haven't annoyed them. Like, I haven't crossed them. So they don't care. And I think that's what it is. It's like this reality check, this alarming check where you're like, geez, okay. They do have that power and they've exercised it, which, as you said, hasn't happened before. However, I don't actually fear being cut off because I think there's a lot of value in all the models we see around. And I feel like even if we were cut off for everything except one modern AI model, you could still get a lot out of it in terms of what we're trying to do. I think spending more time with one model, you'd probably be able to learn its strengths and weaknesses better and take advantage of those.
[33:44]
Danny
I never thought I'd say this. It makes me appreciate what the Chinese are doing with AI because even though, yeah, maybe they're training on the outputs and it's not like ethical to the guys that invested all the money scraping the web and stealing our data to profit from models, it's now unethical because, you know, the Chinese are scraping the actual model instead of just the web. But I guess having that available as an open source model and some of these open source models, when this hit, I was like, okay, the reality is I could probably just tune R1 to feel like I won at some point if this, this goes nuclear over time where they just start ripping them and they, they're like, we're going to control all products on top of our OS or.
[34:32]
Chris
Yeah. And also this is one major provider out of three and you've got some in the wings like Grok and stuff which are, you know, up there. So I think there's just enough competition there that you'd never going to be cut off completely as Windsurf isn't.
[34:48]
Danny
I just don't understand why they did this and maybe it's because they're worried about OpenAI training on their outputs, which is just like sick sad. Surely it's not that. And I get, from a business standpoint it might make sense, but from goodwill of developers. Yeah, they've just lost my goodwill and like I said, I don't trust.
[35:04]
Chris
It's like bashing up the guy at the casino. It's like, it's not that, it's not that that will affect anyone's business, it's just that you suddenly are just aware of the reality of what's going on.
[35:14]
Danny
Yeah, I, I think they should reverse this decision. It sounds like what the Windsurf team was able to do because there was a recent update posted is go to, I'm assuming Amazon and consume the models there. So they are saying they're going to get capacity and bring them.
[35:29]
Chris
Oh, and Anthropic can't block that?
[35:31]
Danny
No, it appears not. Because they must have some deal in place or they could go to Google Cloud and just. They also host those, you know, so I, I get the feeling that they will be back. But it's like the Anthropic directly have banned them from their API. Anyway, I think this is a huge. I mean, if anyone from Anthropic listens, don't ban us. Buy a pendant. Buy the Dario pendant. That would be my first advice. But yeah, I just think this is a terrible move. And I, like, they've lost a lot of trust over it. I think it's going to be like a bigger problem than they think. All right, so a bit of a reality. Time for a reality check. I'm interested to get your response to this because this really triggers people that are like, you know, AGI believers, so feel like we should, we should trigger them a little bit. So I'm going to butcher the name Jan Lecun Lee Kung. I won't go there. But anyway, the French dude from Facebook that he's. He's sort of got a reputation now of saying things that then later worked. People alleged, like, sorry, saying things that would never work and then did work and various other things. But he's been a bit poo pooing on the LLMs lately. He's been poo pooing them, saying, you know, if you just infinitely scale up these LLMs, they're not going to reach AGI. And I don't know if I'm just getting older, but some of the stuff he's saying I really agree with. Let's hear. We'll play the clip and then, and then we can talk about it.
[37:08]
Yann LeCun
We are not going to get to human level AI by just scaling up LLMs. This is just not going to happen. Okay?
[37:16]
Danny
That's your perspective.
[37:17]
Yann LeCun
There's no way. Okay? Absolutely no way. And whatever you can hear from some of my more adventurous colleagues, it's not going to happen within the next two years. There's absolutely no way in hell to pardon my French, the idea that we're going to have a country of genius in the data center, that's complete bs. There's absolutely no way. What we're going to have maybe is systems that are trained on sufficiently large amounts of data that any question that any reasonable person may ask will find an answer through those systems. And it would feel like you have a PhD sitting next to you, but it's not a PhD you have next to you. It's a system with gigantic memory and retrieval ability, not a system that can invent solutions to new problems, which is really what a PhD is.
[38:11]
Danny
Okay, all right, so what did you say?
[38:14]
Chris
He's saying, like, it can do all the things a PhD can do. It can answer questions that Any reasonable person might ask it, but it's not intelligent is what he's saying. I, what, what I don't understand is, I guess he's saying it lacks agency or I don't really understand the point he's trying to make. Like, I think don't get too excited because it's not that good, even though it can do all these good things.
[38:39]
Danny
But I think what he's saying is the LLM, at least in its current formation, won't scale up to the point where, you know, there's that whole counter thing people say at the moment where, you know, a teenager can learn to drive in like 20 hours or whatever on average. Yet you know, we've had to train on like billions of parameters and we still don't have cars everywhere driving themselves. So I think, you know, he has some, maybe like points there and often I'll be using the LLMs. I'm like, God, this thing's so stupid. And then other days I'm like, this is a super God. All hail super God.
[39:15]
Chris
I think the thing for me is, and he, maybe he is right, he knows a lot more about it than I do. But my point is this. So what? It can still do heaps of really useful things across many industries and it is going to impact people's lives and jobs and the way we do things in many industries too. Regardless of whether it meets some definition of intelligence, it still can do a lot of really useful things across extreme amount of domains. And it's getting better all the time and it's getting better in unexpected development. Like, you know, as you just said, it's rapidly getting better. So just because it doesn't meet some arbitrary definition of intelligence doesn't mean it's not hyper relevant and really going to impact the economy of countries, the way people work, the way people live. Like there's going to be, there is already real world impact from this technology. So I just don't understand why people are sitting around.
[40:13]
Danny
But I think he's more speaking to at least the side of the clip. Like we're not going to achieve some AGI super intelligence God with this technology, which I 100% agree with, like there's going to have to be an evolution or other breakthroughs that happen along the way. Yeah, but as you said, I think even with the progression we've had so far, we can now build AI systems that will change the world. Like will fundamentally change everything and are changing everything already.
[40:39]
Chris
Well, yes, and also while it may not meet the definition for some super intelligence, we are going to have AI systems that are making decisions. Like, we're going to have systems there that have the ability to take actions through tool calls, make decisions through research and their own decision making, and those decisions will have actual impact in the real world. Like, regardless of what anyone says, people are going to put these in businesses where they might be deciding if you get a loan or not, deciding if we're going to hire someone or not. These are, these systems are being developed and they're going to exist. So whether or not it's actual intelligence, it is going to have a sort of intelligent effect on the world with all its flaws. So I guess I don't. I misunderstand probably the point he's trying to make. I agree it probably isn't going to lead to AGI, but there is intelligence there and the intelligence is going to be used by people and therefore we need to consider the impacts of that.
[41:39]
Danny
Yeah, I'm just of the opinion that and, and like, you know, we'll see over time how quickly I'm proven wrong. But I do think we could see a situation where the LLM capability sort of does peak. Right. It just, it reaches a peak. And I think we've seen a little bit of that where it's getting better and refined. Like, the VO3 is a good example, and I'm sure that will get so much better. And in another year we'll be like, oh, I can't believe we thought VO3 was good.
[42:09]
Chris
Yeah.
[42:10]
Danny
And I think with the models, probably the same thing, but I don't think they're on this exponential path right now where I worry about the future of humanity. No, okay, I agree with you. I don't. I think that eventually people will just finally realize that, oh, this is just a new tool in my toolkit. I can be way more productive. I can create a business on my own. I can do all these other things. I can learn anything. Like, they'll truly understand the value of what it is that's being created. And I think that, you know, that whole analogy like, we've created fire and we just don't realize how good this.
[42:43]
Chris
And definitely, I mean, we haven't had a doomsday episode for a while. But when I talk to people about AI now, it's all practical. It's all talking about what they're going to use it for in their industry and their job, or how people they know are using, or how their kids are using it at school, the different ways they're going to have to change the way People are evaluated in their jobs and education and those kind of things. They're. The conversations people are having. They're not going, well, what's the point? Like, the going on, I'm just going to sit at home and wait until I die? You know, Like, I think there's less of that. And I think people like to joke about that and just say, oh, well, you know, who cares? Why even bother learning? But I don't think that's what's actually happening. I think generally people like the technology.
[43:27]
Danny
No, I mean, there's a lot less fear porn out there in general right now. Unless you're the CEO of Anthropic, the safety sex cult CEO himself. And you need to raise more money because then what you do is you go on mainstream media news and say things like, we need to be raising the alarm arms, just like our friend.
[43:52]
Guest Expert
People have adapted to past technological changes. But I'll say again, everyone I've talked to have said this technological change looks different. It looks faster, it looks harder to adapt to. It's broader. The pace of progress keeps catching people off guard. And, and so I don't know exactly how fast, you know, the, the, you know, the job concerns are going to come. I don't know how fast people are going to adapt. It's possible it'll be, it'll, it'll, it'll, it'll all be okay. But I think that's, I think that's too sanguine an approach. I think we do need to be raising the alarm. I think we do need to be concerned about it. I think policymakers do need to worry about it. If they do worry and they do act, then maybe we can prevent it. But we're not going to prevent it just by saying everything's going to be okay. In terms of inequality.
[44:39]
Chris
Yeah, I'm a little.
[44:41]
Danny
Little.
[44:42]
Chris
Yeah. And it's funny because now I'm sort of the other direction where I'm like, that is so hyperbolic. It's like to, to act like, oh, we better do like, guys, you better stop us. Like, hold me back, hold me back. I'm gonna accidentally destroy the world with what I'm creating.
[44:57]
Danny
Maybe you can talk once he gets to a 1 milli. Context size. Like, yeah, call us back, bro.
[45:04]
Chris
Well, and it's like he's also doing it now as the guy who doesn't have the top model anymore. You know, like, it's like it's catching us off guard, but it's like, we didn't catch ourselves off guard this time. Time someone else did.
[45:14]
Danny
No. I really appreciated the commentary of Sundar at Google throughout the week. Now he's actually being real and going on, you know, podcasts and various other channels and talking real talk. Right. And he's sort of joking around like he's in a position where the guy is like, obviously like in command of Google filthy rich, like doesn't have any. Like, you know, he wants to succeed and he wants Google to succeed, clearly. But he's also, you can just tell he's able to call BS on a lot of this stuff. And he said, you know, he says like this, he's honest, he's like this. We might hit. They don't see some sort of peak yet of them, you know, hitting a top, but we might. And we might, it might, you know, we might flatline for a few years and then keep going and need to find a new approach. So I think he's just talking like really honest and he, he's saying like, these are the next problems we need to solve, like infinite contexts and better memory and all this stuff we go on about. And he's talking very practical. And it used to be, we would say anthropic with the adults in the room, but any time I've noticed they need to fundraise, they just come out and spread doom porn like there's no tomorrow, you know. And I do, I think it's reactionary to not being on top anymore.
[46:27]
Chris
Well, I mean, look at OpenAI did the same thing. Remember they did that world tour where he's going around telling everyone, oh, if you've seen what we've got behind the scenes, you wouldn't believe you better make laws, you better do a this stuff. And, and, and what came of it, they can't even get close to the top model anymore. And so I think that it is just a sort of marketing technique if anything. And it's not real. I think if they, if they really did have that stuff, I think we'd either hear nothing at all and they're hiding it or, or they would actually be demonstrating why you should fear them.
[47:01]
Danny
Yeah, and I like, then you think about, you know, are they just trying to like force legislation to, you know, ingrain those major model providers and create a moat that way. And, and, but then you also see their product releases like now, today we're releasing this new button in Claude and you're like, well, hang on, is it AGI and end of world scenario or is it like an integration with SharePoint? Like which one is it, guys? Because it's, it doesn't really align.
[47:30]
Chris
Yeah. The future of the world is not like a zapier like product, I hope.
[47:34]
Danny
Yeah. Anyway, so in, in other news and the sort of final thing I wanted to cover today in the episode that we prepared three minutes before I'm going to do. Yeah. Is, is this court order. Very well researched segment, this court order. OpenAI slams court order to save all chat GPT logs including deleted chat. So I'll give it my best summary of this. Essentially there's a large amount of news organizations trying to sue open AI for training on their data. This is the New York Times is sort of the, the main plaintiff, I believe the original.
[48:12]
Chris
We're still relevant kind of people.
[48:15]
Danny
Yeah. And so they, they have basically there's been this, I don't know what you call it in, in court, but they've said that they've got the court filing or the judges ruled they need to retain all these records.
[48:28]
Chris
Like an injunction or something.
[48:29]
Danny
Yeah, like deleted chats, private chats, their logs, like all this stuff that and through the API as well. So if you consume directly through their.
[48:40]
Chris
API, that's, that's very scary because I think a lot of people for a long time have reassured their customers, hey, we're using the API, don't worry because they don't store that stuff. But seems like they might.
[48:52]
Danny
Yeah. And this, they've, they've obviously not, I don't believe they've complied. They've come back and argued this and said, you know, As a result, OpenAI is forced to jettison its commitment to allow users to control when and how their chat GBT conversation data is used and whether it is retained. So people having like private conversations, like.
[49:11]
Chris
Think of the depth.
[49:12]
Danny
This isn't just like your search history where you might be like, you know, x website but you're really looking for Twitter.
[49:20]
Chris
Yeah, yeah.
[49:21]
Danny
But anyway, you know what I mean, it's not like your search history where you can infer a lot of data and like providers like Google made a business out of selling that tracking data to advertisers so they kind of could infer purchasing decisions ahead of time.
[49:34]
Chris
And then, yeah, like you mentioned, you mentioned that you want to buy a fridge and then a few days later all you see is ads for fridges.
[49:40]
Danny
But this is so much deeper, like the depth. Like, you know, you might have a, an assistant that's a psychologist. Right. And you're telling it you're inner feelings or you know, you've got an AI doctor and you're putting in your medical records because you live in a regional area where you don't have access to, to good health care. And so you're trying to get some insight or maybe you can't afford it. All this stuff is going in there and I don't know that. Like, I think with AI, it's the technology that we should be fighting very hard for, for individual rights around privacy and, you know, security and just encryption. Because to me it's such a personal technology.
[50:20]
Chris
Well, it's an area where I think the GDPR laws should be enforced the strongest. So the right to know what data they have stored about you. Because clearly here they're talking about giving access to deleted chats. I would think it's a fair assumption that most people assume if you delete a chat, it's deleted.
[50:36]
Danny
Deleted, like hard deleted. Yeah.
[50:38]
Chris
So I think the basic person is not going to assume, oh, it's a soft delete, they keep the data anyway. Like that's not a normal assumption. And then the right to say, if I ask you to delete my stuff, I need to be sure it's actually gone. And it's clear that they're not providing that. And so it really is an area where I think things like the GDPR need to be extremely hard enforced to force the companies to take it seriously.
[51:05]
Danny
Yeah, like if you're going to talk about any form of legislation or anything, it's just consumer protections around the right to not have your data trained on, even if the products provided for free, I think is a fair, A fair, like it should be a human right with AI, really. And then, and then just privacy encryption, like the right to understand where this data is stored. To me, at a base level with the depth of conversations, it's going to sound like I've told the AI, like killed someone at this time. But I do, I really do think that of all things, is something early in this era, if it's fought for in the battles won now, that protects everyone moving forward into the future. Whereas if it's not, we end up in a situation again where the best, the smartest and brightest people of our era go into like engagement and ad selling because that's the market that's created. Whereas I think the market should be. No, like we're not going to let you sell ads, we're not going to let you harvest this data. It's just going to be legislated away. So that can't be the business model. And yeah, you're going to have to pay a fee to access AI. But then you're not compromising your privacy.
[52:13]
Chris
And stop me if I've mentioned this before, because I might have, but once you start to connect MCP servers and you connect to your email, your calendar, potentially your desktop computer and your hard drive, you connect it to your accounting software, you connect it to your bank account, people are going to do this. Like, regardless of how seriously someone takes personal privacy, there is a group of people who will just connect everything. Just free go for it. Like, just connect everything. Yeah, because it's great, it's convenient. And I said this this morning. It's like, if you've got a system that will solve all your problems and do all your work for you, you want that thing like, you want it, and you're just going to give it whatever powers it needs. Now, if the company providing that service, then is like, okay, well, I'm just going to. Everything I ever see, I'm going to keep. And then if someone orders me to, I'm going to give it all to them, no questions asked. And you have no control over. And even if you change your mind later, I didn't actually delete it. Sorry. I've got it saved on disk. All your emails ever. I've got a backup copy saved and I'm going to send it to the government or whoever asked me. Like, that's an incredibly violating thing that could potentially happen to you over clicking a couple of buttons.
[53:26]
Danny
But to me, the bigger issue is still the issue that exists today is like, organizations that haven't got their own AI systems or workspaces or the, you know, these structures set up, right? And their staff, I know this firsthand, they go off and they will put personal data or any forms of data, any documents into these free AI systems which knowingly are then extracting that data and training on it. And right now, under the terms and conditions, they're allowed to do that because that's the exchange for free access to these models. And so it's like if you don't provide a solution, right, they're going to do it anyway. And so, like, there's all. I just think there's just so many issues and you're just handing over all of this personalized data or company data or methodologies for how you do things, and then they're able to go train on that data just like they scrape the whole web and steal all this content. So, like, I'm all for the progress of the technology and quite frankly, anything that's put on the web, I think should be able to Be seen, scraped is my honest opinion. But at the same time, I do just think that people right now want access to the technology because it's so good.
[54:42]
Chris
Yeah.
[54:43]
Danny
But they're just not aware that this is now all like. Like, you know, you put in you. You upload to, say, a free account of ChatGPT, like information about your kids or something, and. Or, you know, you paste in an API key and then that accidentally gets trained on because you're just not.
[55:02]
Chris
And that's the thing. I think people need to be aware that they need, like, to provide. Say it's an organization. You need to provide your team with an alternative that doesn't have that. That risk associated with it because they're going to go find something if you don't provide it. I think that's the real critical thing. This is not a problem people can ignore because everybody uses this stuff. It's just a matter of which one they're going to use. And do you really want to be using the ones that are embroiled in this stuff?
[55:32]
Danny
Well, to be fair, I like mad respect OpenAI like, they're fighting this. It's not their fault they're being summoned by a judge. But I think the judge in this case is doing wrong by the users, like exposing their details for a court case. I think that's just there needs to be, like, basic human rights stuff around access to this technology. And if there's not, then. And I don't. Look, I'm not that optimistic. I don't think they will. But we're going to end up in this, like, weird future where, like, it's sort of like last week we did the whole Claude calling the cops on you. Right. And it starts to make you wonder. It's like, are they going to scan your chat histories? And then, like, thought, you know, thought crime. Like, oh, he's going to commit a crime. Like, he's going to delete the files of his wife's computer. We've got to send the cops, like, right now. And I don't like, that's the world I don't want to be in. I want to be in the world where, you know, we're super productive. We have more time to work on things we want to and we use this technology for good and our rights, like our right to privacy. And, you know, you would hope, you would hope someone leads from the front here, but then you've got Dario, someone who, I mean, I have a pendant necklace, so I'm so. Clearly, I respect the guy. Yeah. And he Creates the finest models. But come on, like what are they doing? Like be real here. Like if you have something in your lab that you think is going to end the world soon, you know, prove it.
[57:04]
Chris
Like I agree. I think at this point they need to prove it rather than just going on CNN and something and just sharing doom and gloom to raise money.
[57:13]
Danny
Yeah. And look, I know, I think there is some small job disruption coming around, like maybe customer support and you know, various other cost savings initiatives that come through it, but really we know from working with it that's going through like process mapping and automation. It's not necessarily coming through some sentient.
[57:34]
Chris
Everyone, everyone shared. In Australia we have like Burger King, but it's called Hungry Jack's here for reasons I won't go into. But they shared this video where it was like a drive through and the drive through was a text to speech, speech to text model that would take someone's order and they're like, oh my God, a billion teenagers just lost their jobs. Or 17,000 teenagers just lost their jobs. I'm like anyone who couldn't see that one coming was an absolute idiot. Like McDonald's has had automated ordering for ages inside the store. It was a matter of time before they had a basic voice model doing it outside. It's like, but this isn't really artificial intelligence. It's like pattern recognition and pattern matching. This could have been done anyway. This could have been done 10 years ago with technology that existed then. There's no intelligence.
[58:22]
Danny
Yeah, it's now just shareholders.
[58:23]
Chris
It's not suggesting like ideas for burgers. It's like, hey, you know, if you order a few extra patties and some extra pickles, you can do the pickle challenge or whatever it is. It's like that, you know, not that that's intelligent thing to do, but you know what I mean, it's a, there's no creativity, it's just, it's just following a system. So yeah, like I think there's a big difference between jobs that require real intelligence and just following procedures.
[58:47]
Danny
Yeah, I just think that's the era we're entering now where like increasingly these, these AI workspaces or these consoles, the way you'll access most of the endpoints in your day to day, like your personal and professional life. So that becomes the interface point and you know, and then off that I'm assuming the next layer will be, you know, these purpose built, very boutique and bespoke applications that like we might look at today and say, oh, these are throwaway apps. But they might become pretty vital to the organizations. And I think that does fundamentally start to disrupt, like software development and, you know, like software tools in general. I think that's just obvious.
[59:29]
Chris
It's not going to be your, it's not going to be your software stack anymore. Like, what software stack do you use at your company? It's going to be your MCP stack or your model stack. Like, there's going to be people who like curate this stuff and go, here's, here's the most productive way to work. Mix these things.
[59:44]
Danny
How is that disruption any different in reality to when like mobile came along or social media came along and everyone's like, you got to be mobile and social now. To me, this is just the next thing. And I'm not underestimating the technology because I think it's more like it's a bigger, it's a bigger shift. But I also think to ground a little bit in reality here of what's going to happen is people, at least these people on X. And a lot of these CEOs seem to underestimate how long it takes society to adapt to these technologies. So we got, I think we're going to spend years doing basic level automation because it's not easy with AI software will start being replaced. Yes, some jobs will get discontinued, but I think for the most part, again, like I always say, most people who adopt this, understand it, learn it now, will just be more productive and more satisfied, learn more things, be smarter and just feel like just life will be better.
[60:42]
Chris
Yeah, it's not going to be like robots with lasers on their head walking down the street killing people for thought crimes. Yeah, yeah.
[60:48]
Danny
We do want that dishwashing sort of kitchen one, though.
[60:52]
Chris
Yes, exactly.
[60:54]
Danny
When I get that, I, I'm, I'm just done. That's. That. That's AGI for me.
[60:58]
Chris
They're gonna come out, but they're gonna be like 200,000US dollars and you know, you gotta be on a waiting list.
[61:04]
Danny
Wonder how, like if they're really slow and clunky and like your dog knocks it over and like, can you pick up the robot? You know, it's like the Roomba and it just keeps running into the wall like we have now. All right, so one final question before we go. What is your actual daily driver at the moment?
[61:24]
Chris
Well, Claude Sonnet 4. I'm using Claude.
[61:26]
Danny
Why not Opus, though? Here's the.
[61:28]
Chris
Like, I'm just curious because still the. Just the capacity on tokens is just not there. I just can't get it working reliably enough. So I'm just, I'm just reticent to use it Now. I know that's not. I'm sure like other people have the allocation to be able to do that. So it's not a criticism of the model. It's simply that Sonnet works for me every time. The speed isn't too bad, pretty fast. And because now most of the work I do with AI now involves MCP calls. Like not just because I'm testing it, but anything outside of actual coding. I'm genuinely using MCPS a lot now. And therefore its ability to call them is great, especially the parallel. Like the ability to do the two calls in parallel makes it achievable to work with it on a regular basis. Whereas if you're waiting, you can yeah, do things in the background and wait for them to complete. But it's still better if it's a bit faster. Sonnet just seems to get it when it comes to that calling. So that's why I'm using it. Gemini can do it too, don't get me wrong. But I just find at the moment that's working for me.
[62:27]
Danny
I think as 3.5 was sort of the vibe code revolutionizer. I think just like anthropic always they have, they have purpose behind the models. And the purpose of the four series is clearly these agent long form tasks where instead of having like a loop or a clock, the clock is like weirdly in the model. To me that's a concept no one really has mentioned or ever talks about. Is that this thing? Actually that's the first sign of thinking after saying HDI is far away. Like it's actually. It's got its own clock speed does.
[63:03]
Chris
Yes. And it's very interesting when it decides to stop calling tools and decides to just answer your question. Like it really is self regulating in that sense. And we've actually discussed what the best way to handle this is and actually I know it's late in the pod, but if you've got comments about this I'd be curious because how much do people want to control how much it thinks? Because you can encourage, you can force it to continue thinking. You can also cut it off and just remove its access to the tools at some point to stop it. Or do you always want to let the model decide and just go naturally? Because so far natural seems very good. It seems to really get the significance of the question you're asking and self regulate.
[63:43]
Danny
But I think that's the thing they said in the model. It does somewhat decide on its level of thinking as well. And maybe that's what we're alluding to with this clock. Clock. That's the only way I can explain it. Like in gaming they have, you know, in a game you build, you have a clock speed, which is sort of the tick rate of the game. And it starting to feel like in these models there's, there is an internal clock. And to me, going more down that path is exciting. I don't think though, like, I'm fully contradicting everything I just said on the episode.
[64:16]
Chris
That's. That is our style.
[64:17]
Danny
Yeah. So anyway, I've hedged, I've hedged. I've said both. So we're fine. All right. Well, that was our first ever show in person. I can like, it's.
[64:27]
Chris
Should we touch?
[64:28]
Danny
We don't often.
[64:28]
Chris
We don't often touch.
[64:29]
Danny
This is the most affection. Yeah, it's weird like, because you're like, do I look here? Do I look there?
[64:35]
Chris
Yeah, I think we spent most of the time just like looking at the desk.
[64:38]
Danny
Yeah, I was definitely just looking at the, the colors on the little soundboard here. So hope you enjoyed the in person episode. Exclusive in person episode. If you like the show, please do leave a comment, like, and all things. What other. Oh, wait, this is how you be a tick tock influencer. So, Chris, what did you think of this show?
[64:57]
Chris
It. Oh, sorry.
[64:59]
Danny
Definitely needed to watch.
[65:00]
Chris
It was average or below average.
[65:02]
Danny
All right, we'll see you next week. Goodbye.