
Mike
So, Chris, this week, finally, after many a tune, Gemini 2.5, or the Gemini 2.5 family of thinking models, I should say, was released to general availability. Gemini 2.5 Pro is now generally available. Gemini 2.5 Flash is generally available. And they actually put, which I find funny, "and stable" after it. Could you imagine any other release where they have to put, "Oh, it's stable"? Like the, like the Titan.
Chris
Sub. Like, this one will work, guys. Don't worry.
Mike
This sub is very stable. And then we got, in preview, Gemini 2.5 Flash-Lite, and it is just astonishingly fast, this thing. Like, unbelievably fast. So, really cool.
Chris
Yeah, I think I told you before, but I've got the function in our system called "I will solve your problems." It's a universal AI method that'll solve anything you ask it, right, as a quick kind of thing. And the two goals of that are to be cheap and fast, because obviously you can't have something in your code that's expensive, because it's going to add up, and it's got to be fast because you don't want to block other stuff. And so the Flash-Lite thing just slots in absolutely brilliantly. It's amazing.
Mike
So check this out. A great segment for listeners, but I have a prompt I'm putting in: "Make a Windows 95 type interface with a start menu where I can drag windows." This is, yeah, Flash-Lite. Bam. Look at it go.
Chris
Wow.
Mike
Look at it go.
Chris
It's.
Mike
It. This is truly insane.
Chris
If you had this back in the 1990s, you could have made the Windows 95.
Mike
I could be Bill Gates. I could be 10 seconds. I could be saving the world. Look. Oh, my God.
Chris
It actually made it.
Mike
Yeah.
Chris
That's unbelievable.
Mike
I can close windows. Oh, no, I can't minimize. Oh, Google, try harder. But yeah, the start menu works. I actually have something that looks like Windows 95, and I shut it down. None of these buttons work. What if.
Chris
I'm sure it probably wants to install updates, Mike.
Mike
So here we go. Give it a space background. Look at it smashing away. Is it gonna work, though? I think it's got a lot. Oh, bam.
Chris
Looked.
Mike
I changed my wallpaper probably quicker than I could today.
Chris
Yeah, I think you just changed your wallpaper faster than I could on my Windows desktop.
Mike
So, yeah, it is super impressive. Where it falls down as a model, though, is it's not that great.
Chris
It's the most amazing model. But the bad part about it is that it's total shit.
Mike
Yeah. I realized, I realize now how stupid I am. But it.
Chris
Like, the model, it's so fast.
Mike
You're like, oh, if I had Gemini 2.5 Pro this fast, I think my brain would probably explode from the cognitive load, because it'd be too fast. Like, how would I check, you know, some YouTuber I like to watch, or procrastinate during the day? I would have no excuse. Yes.
Chris
Yeah, exactly. I mean, I think that's the thing. We're starting to reach the point where that responsiveness really can affect your daily workflow. I think you and I have both moved to a sort of asynchronous style of working with AI, so you can use a bigger model like O3 Pro and let it do its thing while you ask for the next task. I've definitely adjusted to that workflow. However, when you are trying to solve a single problem at a time, the speed matters. And that level of quality, like, we often see code mode is a good sort of barometer, or measurement technique, to see how competent the model is. We've used some lesser models and they simply fall down so badly on code mode. You realize, okay, if this is a proxy for how it's performing on various tasks, then I don't want to use this model, because that's bad. Whereas this to me is actually exciting, and it makes me think it's giving good answers across the board.
Mike
Yeah. So I just said, "Add BonziBuddy to the desktop." Bonzi Buddy.
Chris
Yeah.
Mike
Or whatever. Oh, is it?
Chris
Oh, it actually did it.
Mike
And he's bouncing away and you can move it around and it bounces around the screen.
Chris
That's amazing. Wow. Like, how impressive.
Mike
Yeah, maybe I should just keep working on this Windows 95 interface and release it, and I have to use, to torture myself, the Flash-Lite model. But yeah, that iteration speed of being able to work with it on a project, I think it sort of breaks beyond the "create with code" or, like, vibe coding era and gets you into that next phase, where it is so fast you can just generate interface on the fly.
Chris
Yeah. It opens up the possibilities because the user doesn't have to wait for that process. Right.
Mike
Yeah. And so interestingly enough, Google demoed this as a research project. So they said, here's how Gemini 2.5 Flash-Lite writes the code for a UI and its contents based solely on the context of what appears in the previous screen. And what we're looking at is a more polished version of my Windows 95 demo, where it looks pretty damn good. I'm assuming with this, because again, the model's not that smart, they've probably given it some sort of structure to start with for that window interface and then they're getting it to adapt. But they're clicking around an operating system, so they're clicking into, like, a documents folder in a window. They're opening up, like, a travel app and it's generating maps on the fly. And it's so fast at writing code. You can click around this interface and it is just coding that interface as you click around. So look, I think we're a long way off this yet, but it is a glimpse of these AI interfaces that are to come, surely.
Chris
Yes. And I think that it works both ways, right? If it can create the interface that fast, surely it can interpret an interface that fast as well. And therefore things like browser use and computer use become much faster. And there are a lot of applications. When you think of things like crypto trading, which, you know, we as crypto bros are super into, online poker, and other things that require responsiveness, having a model that is that fast and accurate would mean that those things that have to be timely are far more realistic when it comes to computer use.
Mike
Yeah, so it's a really exciting model. I think 2.5 Flash is more exciting for me because, according to their really informative chart here, you get two stars of performance instead of three stars of performance with 2.5 Pro. But 2.5 Flash from Gemini is, to me, the best model to use if you just want to converse and go back and forth with a model super rapidly but still have it, you know, not brain-dead like Flash-Lite. Flash-Lite seems like a developer model to me, to categorize things and do things quickly on the fly. It is a good tech demo in terms of creating that UI, but it's just not there yet. But, you know, it's an exciting sign of things to come. The fact they can make these models so fast and the performance is getting better, like they're getting more intelligent over time.
Chris
And when they have fewer parameters built into them, I think the knowledge built into models is becoming less relevant as we have better ways to build the context through MCP tool calls, things like that. And because of that, a model lacking that base knowledge doesn't matter so much if it's able to competently call tools to go fetch the knowledge it needs. Which is why a model like Flash, at the price point it's at and the speed it's at, is actually super valuable when it's in that semi-agentic world of being able to build its own context and take its own actions.
Mike
Yeah. And you can see that with the price, because if you can pull the context in with MCP to make the model seemingly smarter, at a cost of $0.30 for a million input tokens and $2.50 for a million output tokens, to just give some perspective, that is, what, like five times cheaper, I think, than 2.5 Pro, at least. Yeah. So it's pretty significant. And honestly, using it day to day for stuff like, if you've got an AI doctor, which I'll get to later, you don't even notice. It's still pretty performant, especially if it can go off and call tools to get data. Whereas I think where it falls down is if it's relying on itself for that core knowledge. It's just not so great in that setting.
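To make the pricing concrete, here is a minimal sketch of the arithmetic using only the two Flash prices quoted above. The function name and the 50k/1k token workload are assumptions chosen for illustration, not figures from the episode.

```python
# Per-million-token prices for Gemini 2.5 Flash as quoted in the episode (USD).
FLASH_INPUT_PER_M = 0.30
FLASH_OUTPUT_PER_M = 2.50


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = FLASH_INPUT_PER_M,
                 output_per_m: float = FLASH_OUTPUT_PER_M) -> float:
    """Return the USD cost of one request at the given per-million-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Hypothetical tool-heavy request: 50k tokens of fetched context, 1k tokens of answer.
cost = request_cost(50_000, 1_000)
print(f"${cost:.4f} per request")
```

Passing another model's prices through `input_per_m` and `output_per_m` gives a like-for-like comparison on the same workload.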
Chris
I definitely find if you have the ability to verify the knowledge from a trusted third-party source, I would always prefer that over the model's own knowledge, because I want the latest knowledge. I don't want it to necessarily just rely on its own intuition on things, let's say.
Mike
Yeah. And I think once you get used to that quick-fetch knowledge and its ability to go off and find things for you, it's hard to go back. And I do think that's why people really liked O3 in the sort of ChatGPT paradigm, because it was able to go off and consult sources as part of its thinking, and then they became used to that level of intelligence where it was going off and populating some of its context. And so it does make sense that all of these models get so much better when you introduce that. But interestingly enough, when we first added the general availability version of Gemini 2.5 Pro and started using it, even though they say it's not different to the last tune, initially we and others experienced the feeling that it got dumber.
Chris
I was working on our product at the time, making some fairly deep changes in the way it works, so I just assumed I'd stuffed up and broken it. I was like, oh, it's ignoring the last message. Like, it's actually not responding to my instructions. So I just assumed I'd introduced a bug, and then I actually used it in an environment that hadn't changed. The only thing that had changed was the model. And the same thing happened, and I was like, what in the world has happened to this model? It's like they've completely screwed it. And a lot of people, myself included, rely heavily on that model in their day-to-day work. And I just couldn't believe they'd gone from the preview models, which were excellent, to a live version that has such a fundamental flaw. I haven't seen that issue in any model of any size for a year.
Mike
I get the impression, though, so that was initially, and then later I switched over to Sonnet 4 and was using O3 Pro during that weird period. But I just wonder if it was them changing over servers or something. Something had to have happened, because if you now try it, it's fine. It does seem like there was some sort of intelligence failure during rollout. Something happened. They say nothing did. There's no public acknowledgment of it, but there was a period where it truly got dumb.
Chris
Yeah, they probably forgot to flip a switch on the big control panel they have over there at Google hq.
Mike
But interestingly enough, that's how much I now rely on it as a model. When it was playing up, I went to Sonnet 4 just because I think it's the right balance. It gives me speed and intelligence and it works really well with MCP. So I went over to that model and, I must admit, it gave me a really good chance to sort of daily drive it and test it out. And I did find it a really great model, but it put it into perspective, especially for more complex stuff. Gemini 2.5 Pro is, you know, top of these leaderboards for a reason. Right now it is still the best model and the best daily driver. Although, as I've been banging on and on about, I'm a big believer in bringing O3 Pro into the mix. Like I alluded to earlier in the week, it's like phoning a friend in Who Wants to Be a Millionaire. You get stuck on a question and you're like, I want to phone a friend, and the friend is O3 Pro. But you're not calling this friend all the time, because the friend just doesn't really like it if you call all the time. It's not always available.
Chris
"I'll get to it when I get to it, Mike." But actually, it's interesting you say that, because when the disaster happened and our pet model wasn't working, I had been using O3 Pro, like you, for bigger stuff. I was using it for some horse racing stuff and some other more advanced analysis where I wanted, like, single answers. One of the big things I've been doing with coding lately, just based on the nature of the problems I've been solving, is I just want to identify where in the code the issue is. So I'll give it a bunch of information. And I found that O3 Pro is just absolute dynamite at doing that. It's just so good at it. So when my pet model went down, I was like, maybe I just try its baby brother, O3, and see how it performs. And to be honest, I have several sessions now ongoing with O3 that I'm working with, and I've got no reason to change. I'm really actually loving it.
Mike
So one thing I was thinking about with the difference between these models is there's two types of tunes. There's the Gemini 2.5 Pro or the Claude Sonnet tune, which is just so eager to give you output and do the work for you, where you don't have to think, right? Whereas with O3 Pro, the more I've gotten to love it, I think what's so good about it, like you said, is it cuts through the noise and allows you to keep the cognitive load of your work, where you're the one having a deep fundamental understanding and it's just helping point you in the right direction. Like, hey, here's the problem, or hey, look here, or to solve it, try this. And you can get those breakthroughs in productivity where you can keep driving forward. And to be fair, this isn't just code that I'm talking about. This is to do with legal contracts, reviewing accounting statements. I've used it for a huge amount of use cases this week, and it can just cut through the noise, and it doesn't give you that verbose output. It can if you ask, but it really just gives you the answer. It's sort of like an intelligence oracle, an answer engine. Whereas still, for me, daily driving of a nighttime when I'm tired and I do want it to just take the cognitive load and I'm in a lazy mood, Gemini 2.5 Pro kicks it.
Chris
Yeah, what's remarkable really is just the diversity of the model outputs right now. They really are noticeably different when you switch between them. And I think people get a natural feel for which model is going to be good at which task. Or, for example, okay, I'm not really happy with that answer, I'll try a different model. And it's quite remarkable how much switching can actually solve your problems. Like, this path isn't working for me, try a different model, and you actually get the answer. So I feel lucky that we have access to all of these top models to be able to solve our problems and have an alternative when you get stuck on something, rather than just being like, AI sucks, and quitting.
Mike
Yeah. This is the thing. Living in a world right now where you're a single-model guy and you're just using the same vendor, like, say, OpenAI or whatever, for me personally, I do rely on being able to flick between them, more so than ever. I actually thought it would decrease. When Gemini 2.5 Pro first hit, I was kind of like, maybe there is just this idea of one model to rule them all. But now I'm sort of going further into the camp of, you know, I do like going to O3 Pro, I do like going to Gemini. You just get to know the feeling of the tunes, and when one goes down, that's when you really notice. You're like, I really rely on this thing.
Chris
Yeah, absolutely. It really is crazy how everybody noticed immediately when that happened.
Mike
So, in terms of this neural OS example that they gave, with these interfaces, something we've been doing and, you know, slowly sort of drip-fed our audience on, is this idea of using MCPs as part of an asynchronous workflow through the day. And the more I've used this stuff, the more I've thought about the future of software and just everything, really, in that I find myself not using tabs in a browser. I'm interfacing with a lot of SaaS applications or utilities through the assistant interface. I'm saying, hey, can you go do this? And I'm actually almost assigning tasks out to my assistants in different tabs, like, hey, can you go do this? Can you go do that? And this is the first week, or maybe the last two weeks, for me, where it's just finally clicked and become a normal part of my workflow. Like, you know, all scheduling, all email. I haven't logged into my email in weeks. It's all through an assistant. I'm like, you handle it, you draft the emails. And so it's definitely a different way to work. And that's, I think, the big step change. A lot of people think that it's going to disrupt, say, Google searches. And it is. People don't go to Google to search as much anymore. You know, ChatGPT is just eclipsing it in a lot of ways for most people that use it. And so my thinking around it is, are we going to see this not only stealing the sort of Google search traffic in the future, but also taking that traffic from these applications that you typically interact with? We talked about it on last week's show, that Model Context Protocol sort of business model. But it's just becoming so clear to me using this now. I don't want to log into those things. I don't care about them anymore. I just want to interface in this singular way.
Chris
Yes. And for me, there's two big reasons why I'd prefer to interface with it in that way, and you gave me examples throughout the week. You often need to combine multiple applications you're interacting with to solve a problem. And a lot of the time when you are solving these day-to-day problems, it's like, okay, someone has sent me an email about something that I need to solve, right? And then I need to go and log into another system to find that information, and then I need to transfer it to the other system and then formulate an answer, right? I know I'm being a little bit vague here, but you get the idea. And so what the MCP can do is do all of that in one go for you. It can, for example, take a Help Scout ticket, look up the relevant information in Stripe, and then, if necessary, take the action for you and write and send the reply. So that would have taken you three or four steps before, just to find the information, which can be cognitively overloading when you've got to go look it up in another system. But in addition it can suggest the solution and then, if you want, go ahead and do that solution. So it covers a lot more steps, and it will actually lead to you doing more, like solving more problems. The AI suggestions are very good, especially with the top models, especially when it's able to go and look up relevant information, consult documentation, things like that. The combination of these different tools is so much more powerful than just having a single integration with something, if that makes sense.
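The ticket flow Chris describes can be sketched roughly as follows. This is an illustration only: every function here is a hypothetical stand-in that returns canned data, not the Help Scout or Stripe API, and in a real assistant the model would choose and call MCP tools itself.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    ticket_id: str
    customer_email: str
    question: str


# Hypothetical stand-ins for MCP tools. A real setup would call a help desk
# server and a billing server; these return canned data so the sketch runs.
def get_ticket(ticket_id: str) -> Ticket:
    return Ticket(ticket_id, "pat@example.com", "Was I charged twice this month?")


def lookup_charges(customer_email: str) -> list[dict]:
    return [{"amount": 20.0, "status": "paid"}, {"amount": 20.0, "status": "paid"}]


def draft_reply(ticket: Ticket, charges: list[dict]) -> str:
    # In practice the model would write this from the gathered context.
    total = sum(c["amount"] for c in charges)
    return (f"Hi, we found {len(charges)} charges totalling ${total:.2f} "
            f"on {ticket.customer_email}. We're looking into it.")


def handle_ticket(ticket_id: str, send: bool = False) -> dict:
    """Gather context from both systems, draft a reply, and optionally send it."""
    ticket = get_ticket(ticket_id)
    charges = lookup_charges(ticket.customer_email)
    reply = draft_reply(ticket, charges)
    # Sending stays opt-in so the human remains in command of the final action.
    return {"ticket": ticket.ticket_id, "reply": reply, "sent": send}


result = handle_ticket("T-1001")
print(result["reply"])
```

The point of the sketch is the shape: one request fans out to several systems, and the human only reviews the drafted result.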
Mike
And I think the craziest part of it is it goes far beyond what you would think sometimes. You know, you get it to look at a ticket, and you're still in command of it. It looks something up in, say, Stripe to get customer information, or if it's a bug, it can look up something in, say, GitHub, in the actual code repository, to be like, is this an actual issue in the code? And then, I'm using all the AI influencer analogies here, the mind-blown factor for me is it will even go, "I'm going to get your last 20 ticket responses to see your tone of voice," to get the tone of voice in its answer correct, in your style. And, you know, these are pretty amazing things to see. It's by no means perfect. And I think we've also seen in the last couple of weeks some of the flaws of it. So I had the soccer draw of my kids' soccer matches and I'm like, can you just go put all these into my calendar? And with some of these longer-form tasks, where it sort of has to run for a while, you do see it breaking down after a while, and it'll make one critical mistake and then it never sort of repairs itself. But I do think that can be fixed with better prompting and, you know, supervisor agents and all that kind of stuff over time. But it is a different way to work, and it's something you sort of have to train yourself on doing at first. And it feels very unnatural, I think, at first. But then once you start doing it and you're like, hang on, it's doing those six things for me so I don't have to, it, I don't know, it feels like the game Command & Conquer, I think.
Chris
Yeah. And I think the big thing for me is, we spoke on a previous episode about lawyers saying, well, I actually will take on more work now because the AI makes it possible for me to do these complex documents, and therefore I'll take on that work because I know that the heavy-lifting part can be done by the AI, so I'm not going to bite off more than I can chew, right? I think this is that writ large. Because suddenly a task that might take you half a day to research can be done in a few minutes, or even if it does take 10, 15 minutes, it's being done in the background while you're doing other things, so you can take on a lot more. And I feel like, for you and me, this is the first time I'm actually getting real leverage from the AI beyond just making me smarter. It's leverage in the sense that instead of me thinking, okay, do the next task, wait for the AI's response, then move on, I can actually think, well, what are the five things I need to get done today? Start different threads with the AI on each of those things and have it actually doing the work for me. So the total nature of my work changes into being more of a director than an active participant, if you know what I mean.
Mike
And this is where the reality check becomes important, because we joked at the start of the year, like, the year of AI agents, it's all about AI agents. By the end of the year, AI agents will be possible. And I still largely think that's true. We'll get to the end of the year and people will be running sort of agentic automations with AI assistants to do things for them. I'm pretty sure that will happen. But I do think the biggest net gain comes from asking, what's the technology good at today? Like, right now, how can it actually change your life for the better today? And the thing I'm seeing is this next evolution where, as we've been saying, it becomes this command center. It is somewhat agentic, but it's like agentic with training wheels. Each request is agentic with you commanding it. It's not necessarily proactive yet. It's not AGI level where it scares you. It's just, you know, giving you a lot of time back and an ability to work on many things at once that in the past you just couldn't have processed.
Chris
Yeah, exactly. Absolutely. And I think that is its main advantage. And I think the other discovery, I guess, you and I have made is that there's two things it really needs to be valuable in that style of work. One is it needs a memory of its own. It needs to remember the way you work with the various tools so it can learn your preferences. Because one thing I'm sure you've noticed, that I find myself doing when working with MCPs, is I'm like, use the following tools to do this task. I'm sort of directing it, like, I would like you to do this, rather than relying on it deciding which combination of tools to use. So that's one where I think it needs to learn your preferences. So when I'm doing a workflow like this, this is the mix that I need you to use. The second one is remembering individual details, like preferences around the way those things are called. You know, I need you to get the last 10 tickets rather than the last five, that kind of thing. And I think that having a sort of knowledge graph associated with individual MCPs is going to absolutely be the future of the way this works. Because as it gets to know the way you work, it's going to become so much more powerful. It isn't you having to painstakingly type out stuff. It's like, hey, let's solve the next ticket, bro, and it just knows what you mean by that. Or hey, let's catch up on the crypto prices, and it knows that you mean this mix of them and this element of the market that you're trying to look into.
Mike
It's also, like, talking about the shortcomings today of this protocol, and this is something that you and I have discussed a lot, it's like, how do you store it? Is there a need for a structured way with this protocol to have a knowledge graph where it is storing ways of interacting with that MCP that are preferential to the user? That could be something that enhances it. Or having prompts per MCP. For example, if you have multiple email accounts, like I do for various things, right now naming each connection is important, but then over time it can start to learn which one to use. You know, is this a personal thing? So I'm going to go to the personal email account. Or is this a work thing? So I'll go to the work one. So I think there's a lot of nuance in it.
Chris
Yeah, that's a really good point. Like which account is relevant in which scenario? That's, that's definitely part of what I'm talking about.
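A very small sketch of the per-connection memory being discussed, under stated assumptions: this uses a flat in-memory dictionary rather than a real knowledge graph, and all names (`remember`, `recall`, `pick_email_account`, the connection labels) are hypothetical and not part of the protocol.

```python
from collections import defaultdict

# Hypothetical per-connection preference store: connection name -> key -> value.
preferences: dict[str, dict[str, object]] = defaultdict(dict)


def remember(connection: str, key: str, value: object) -> None:
    """Record a learned preference against a named connection."""
    preferences[connection][key] = value


def recall(connection: str, key: str, default: object = None) -> object:
    """Look up a preference, falling back to a default when none is stored."""
    return preferences[connection].get(key, default)


def pick_email_account(task_kind: str) -> str:
    """Route a task to a named email connection, falling back to work."""
    return str(recall("email", f"account_for_{task_kind}", "work"))


remember("helpdesk", "recent_ticket_count", 10)  # "the last 10 tickets, not the last five"
remember("email", "account_for_personal", "personal")

print(recall("helpdesk", "recent_ticket_count", 5))
print(pick_email_account("personal"))
print(pick_email_account("invoice"))
```

A real version would need persistence and some way for the assistant to write to the store on its own, which is the open question raised above.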
Mike
And I think when you see models like Gemini 2.5 Flash-Lite, where it can act so quickly, that's where it starts to become feasible, really, to make a decision on the fly. Like, what kind of task is this? Is this work? Is this personal? And then try and link that in with everything else. But I kind of think in the next maybe six months to a year, the difference with people is they're either going to get how to work with MCPs and use them, and that it's not magic, it's nuance, or they're not. And the people that do will get this next step change where they can be off working on so much more and getting so much more done, and then there's the people who don't and still think it's some magic box. And I think that's sort of what you were alluding to earlier, where sometimes you have to say, like, I want to use these three MCPs.
Chris
Yeah, exactly. And I think the other thing we spoke about last week is, like, level of effort. So for example, say I've got a URL that has information about the task I'm trying to solve. Let's say wheat prices. I always love this one. Like, I've got a wheat website that has the latest news on wheat. If I want an answer about what the market's going to do, well, I want you to do everything. Search Google, crawl the top 10 links, go to my URL, crawl two levels deep and get that information as well. But I also want you to check X and check the latest posts on it and look into that area as well. Maybe I even want you to consult a knowledge graph where you have access to your previous memories. Check my emails for any newsletters or something I've got on that. So that's a big task. But I want all of that because I want to build this amazing context for solving the problem I have right now. Whereas there's other scenarios where you're like, hey, what's the weather like? You know, I don't need you to do PhD-level research into that and write me a paper. Just look it up, bro. And so I think that those small decision-making elements, remembering your preferences and being able to gauge the level of work needed for tasks, are going to make you that much more efficient, because then you can just throw tasks at it and it knows exactly what you're talking about, exactly how to do the research and, most importantly, the actions to take. Because I think this is the element we don't always get to. We always talk about research, but the actions it can take are so powerful, and something you and I have noticed is lacking in the MCPs is some of the more meaty actions. So you'll get an MCP like Gmail, but it can't send emails. It'll draft you one, but it won't send it. And, like, that's not fun. We need the "delete all the files on your hard drive" level of MCP. We need the "launch the nuclear missiles if necessary" MCP.
It's like, hey, take the safety guards off, let's just prompt it right and get it in there. And so I think that's another area we need to look at: here's my preferences around actions. You mentioned this to me, like, two years ago now, I think, the idea of a chatbot on a website that is authorized to give refunds up to a certain level. You had a friend who ran a business and it takes all his time dealing with minor returns and things like that. What we need is authority around the MCP. So it's like: under this set of criteria, which you can verify using these MCPs, you are allowed and authorized and encouraged to take the following actions using this set of tools. And I think that mix is going to really get powerful. We're talking a lot about reducing your work and giving you leverage on the earlier steps in a process, building a context. But I think the real power comes from, okay, what's the game plan? What are the actions we actually need to take here to get the desired result? And if it has the authority to do that in a lot of cases, then your work is going to be so much more efficient. And I think this is what people are experiencing with, say, Claude Code. It's like, okay, I give you the task, and you can actually do it and commit the code. But I think this concept needs to be extended into every other area where actions can be taken. And there's a lot.
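The "authority around the MCP" idea could look something like the sketch below. It assumes a simple allow-list with a spending limit; the `ActionPolicy` type, the $50 limit, and the action names are all hypothetical and chosen only to illustrate the refund example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionPolicy:
    action: str
    max_amount: float         # upper limit the assistant may act on alone
    requires_verified: bool   # whether the criteria must be verified via tools first


# Hypothetical policy: refunds up to $50 are pre-authorized once the order is verified.
POLICIES = {"refund": ActionPolicy("refund", max_amount=50.0, requires_verified=True)}


def decide(action: str, amount: float, verified: bool) -> str:
    """Return 'execute' when the action is inside its policy, else 'ask_human'."""
    policy = POLICIES.get(action)
    if policy is None:
        return "ask_human"  # no standing authority for this action
    if policy.requires_verified and not verified:
        return "ask_human"
    if amount > policy.max_amount:
        return "ask_human"
    return "execute"


print(decide("refund", 20.0, verified=True))
print(decide("refund", 500.0, verified=True))
print(decide("delete_files", 0.0, verified=True))
```

Anything without an explicit policy falls back to asking the human, so the "delete all the files" class of action stays gated by default.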
Mike
Yeah, and this is the thing. We've seen a lot of those agent capabilities be around code, I guess because of commits and because, for the people building it, that's their biggest interest and where they spend the most time. But to me, seeing it work across your, like, Google Workspace or, you know, Microsoft 365 accounts, and actually work, and I'm not talking about Copilot here, you get that feeling of super productivity, and that realization of how much of your day you spend logging into different software applications, at least in my case I do, and then trying to extract and context-build yourself from, like, an email or a calendar event or a customer file or whatever it is. You just spend so much time gathering context as a human that even if it can just take that step away, where it can context-gather and brief from a lot of different sources in a controlled manner, that's really powerful. But I do think you make a good point, which is that a lot of the wow factor right now comes from slamming some sort of MCP tools into thinking steps and being like, wow, this really enhances the output of the model. But it's that action step next that it can take as well, where it can go off and do these things on your behalf, and you can train it in such a way that you trust it to do those things, so you're not living in fear that it might do something crazy.
Chris
And I think that the trust can actually go beyond that. We talk about the intelligence becoming smarter than us, right? But right now it isn't actually smarter than us in the sense that everybody using AI now is not just doing what it says. There's a human interpretation step. So you might ask ChatGPT for legal advice, but you don't then let it take the legal action. You take that information and then you talk to a real lawyer, or you go and do what it said with your own spin on it. You don't just blindly follow what it says. But my argument would be, in some cases, you are better off blindly following what it says with the better models. And I've experimented with this with things like the horse racing and poker, where it'll often make what I think are crazy decisions. And you're just like, no, this is wrong, I'm not going to do that. And then it turns out to be right. Maybe not every time, but on average. And so what I extrapolate that to is, as the models get smarter, and some of them are very smart now, they may actually be able to take better actions for you. Imagine things like an email negotiation over a deal. Like, here's how I think we should get back to this guy. He's asking for a discount on this. He's asking for these line items to be scrapped, and whatever. The AI might say, hey, we've got leverage in this negotiating position by saying the following. I'm going to send this email at this time with this information in it, or I'm going to ring him on the phone and say this. And you might intuitively think, yeah, no, I'm not going to do that. I'm going to actually soften it a bit or whatever. But it may have had the correct strategy there. So I wonder at what point, with the MCPs, like giving it true power, we will actually see someone entrust an AI agent to do this stuff and get better results. There's going to be an inflection point where it gets better results than you can, even if you don't back it to do so.
Even if you look at it and be like, no, no, no, I'd never. The person who trusts it first may get better results.
Mike
But someone introduced the term, I think Simon Willison pointed it out during the week on Reddit, of context rot, which no one talks about. They called it context rot. I think I called it a doom path previously. It's where you go down a path with the AI. And this can also happen when it's prompting itself, where it just goes down this path. And quite frankly, when it goes down those paths, this is where O3 Pro can come in and really save the day. But it'll go down these nut job paths where you're like, this is madness, and you've got to put an end to it. So I think, yeah, a lot of improvements have to happen before that's a true reality. I think for simple tasks, though, you're probably right. It can do that right now. It's smart enough that it likely makes better decisions. The thing I see it struggle with, just having used it a lot for these types of things, is full context. My brain, over an issue, say a legal issue or an accounting issue, still has far greater context. Like, I've somehow got the history of the business in my head, right? Or the history of a relationship with that person, or whatever it may be. And so catching it up on all that context is hard, when it has limitations and it has this sort of context rot where the more you put in, the worse it gets at piecing it all together. That ability for the human with the MCP to cherry-pick the right context, I don't think that skill is going away anytime soon.
Chris
Definitely not. I definitely agree with you on that. I think that's why for a while we'll definitely need these AI consoles where we're interacting with it and being an active participant in the process. But that's not to say we won't gain a lot by handing over more control to the AI assistants.
Mike
So, just to give some real examples here, because I feel like we're being really vague: I've been wearing this Oura ring, mostly because I wanted to track my sleep data because, you know, everyone's trying to be Bryan Johnson now. And I've been wearing it, getting a bunch of health data. It's been really interesting. I don't think it's something you probably need to wear forever, is sort of my review of it. It's more the first period of time where you learn about what affects your sleep and stuff, which is also semi-obvious. But I do like wearing it. I'm really trashing it, and then I'm going to say positive things.
Chris
Spending too much time thinking about the ring.
Mike
Yeah.
Chris
So like a Frodo.
Mike
But I find myself not really using the context of the data it's tracking about me. It tracks things that I think are really interesting, like body temperature. So if it sees a spike from the average in your body temperature, it knows that you're probably fighting something off. And often, from what I understand, you'll be fighting off these bugs all the time and normally have no idea. But this thing will alert you and be like, hey, you probably should get an early night tonight, because it does appear that you're fighting off something. I find that stuff super interesting. But I also don't really incorporate that when I might be talking to, say, my AI doctor, right? And I do see this future where the AI healthcare professional, call it, has access to a sensor that's always on you, and I'm sure the sensor will improve as well. I think it kind of gives you a look into the future of gathering context about your body, in the AI doctor sense. So to me it's a huge context-gathering exercise in a lot of ways.
Chris
You know, that's really fascinating, because you just gave me the idea that you almost need an MCP passport, like ephemeral access to your MCP. So you go to your doctor's surgery, you tap a card or click a form or something, and then they have temporary access to your health telemetry in the AI agent in the hospital system they're using to help diagnose you and bring those factors in.
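The "MCP passport" idea above can be sketched as a time-limited, scope-limited grant. This is only an illustration under stated assumptions: `AccessGrant`, `issue_grant`, `is_allowed`, the scope names and the "clinic-agent" holder are all hypothetical and not part of MCP or any Oura API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical 'passport': who may read which data, and until when."""
    holder: str
    scopes: frozenset
    expires_at: float  # seconds on the same clock as `now`


def issue_grant(holder, scopes, now, ttl_seconds):
    # The grant expires on its own, so access does not outlive the visit.
    return AccessGrant(holder, frozenset(scopes), now + ttl_seconds)


def is_allowed(grant, holder, scope, now):
    # All three checks must pass: right holder, granted scope, not expired.
    return (
        grant.holder == holder
        and scope in grant.scopes
        and now < grant.expires_at
    )


# Tap the card at reception: 30 minutes of access to two kinds of telemetry.
grant = issue_grant("clinic-agent", {"sleep", "temperature"}, now=1000.0, ttl_seconds=1800)
print(is_allowed(grant, "clinic-agent", "sleep", now=1500.0))     # during the visit
print(is_allowed(grant, "clinic-agent", "sleep", now=2900.0))     # after expiry
print(is_allowed(grant, "clinic-agent", "location", now=1500.0))  # scope never granted
```

Passing `now` in explicitly keeps the sketch deterministic; a real system would use a trusted clock and signed tokens.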
Mike
Yeah. But your assumption is I'm gonna go.
Chris
Like, yeah, well, that's true. Do you even need the doctor in the first place? He's gonna do the same thing, right?
Mike
Yeah, honestly, I think doctors, especially the sort of GP interaction, are going to get wiped out first. But have a look at this. So this is real, this is not fake. I just, you know, didn't want to dox myself too much. I just said, how is my overall health, doctor? And I created a new doctor assistant, so this doesn't have my usual memory and things like that. But it's natural. It says, I'll check your current health metrics from your Oura ring data to give you an overview of your overall health status. Your key metrics are showing positive signs. And so then it calls a bunch of tools from the Oura MCP, and then it's able to give me a health assessment based on what's going on. Like, your body's resilience is rated as strong overall. Your sleep quality is excellent, which I kind of find funny. But I did cheat, because I've been catching up the last couple of nights. Stress level 55.5 out of 100. This indicates moderate stress levels. While not concerning, it suggests you might benefit from some stress reduction techniques. Anyway, it's pretty interesting. I think it's more interesting when used as part of a holistic context where it sort of understands your overall health, right? But I think the added bonus now is it can get your pulse, it can get your current skin temperature. Like, it's pretty game changing. It feels like you could be at the doctor's surgery. It's obviously not doing blood pressure or glucose monitoring or all these other things. It can't do X-rays yet. But again, it's that interaction that is a glimpse into the future for me. Like where this.
Chris
Yes. And the fact that you didn't have to go into their UI, copy and paste your health data in, and then do it. You're just never gonna do that. Whereas I do love the idea of you working with your AI girlfriend. She's like, go to bed, Mike. Like, you know this isn't working for you. It's like, you're the problem here. But it did last night.
Mike
It did. So Code Girl, who I use late at night, is Patricia's poor cousin.
Chris
Cousin.
Mike
It has access to Oura, and it did. I mean, I pasted it to you. I'm like, I'm going to go to bed in light of this, and I was telling you I needed to catch up on sleep.
Chris
Yeah, it's, it's kind of cool. I love that element to it.
Mike
But I, I, I guess my overarching point here around all this stuff is to me this is like the next evolution towards an agent future that's not even necessarily scary, it's just beneficial. It's just a better way to interact with all the data and all the aspects of your life, whether it's personal or work. It just, it. It feels more natural.
Chris
Yeah. And it's like this idea that, that an agent, you might make a decision with an agent, but if it's able to get real time or close to real time telemetry data about a situation, it can actually update its evaluation. Well, in light of this new information, this is no longer the correct course of action. We need to change here. And I've, I've definitely seen that behavior from agents where you give it additional context and it's like, okay, with this new information, we need to change what we're doing here. But if it's able to proactively get that information, that's really powerful because you're not doing all the work. It's not relying on you constantly going off and fetching the context for it, which is time consuming and tiring.
Mike
And this is sort of back to that initial point around. The more I use this stuff and interact with it, the more it's like becoming my sort of Internet start page and end page. Like I'm not. I'm going to other apps less and less. I find myself going to apps like a social app like X or something more for pleasure now to like browse and just, you know, sort of doom scroll or whatever.
Chris
Maybe I can just start using mine to cheat at chess. So I don't even have to play my chess games at night anymore.
Mike
Yeah, it's just the MCPs off doing it for you in the background. But that idea of the interface being the sort of ChatGPT-style interface of the future. And then I guess that's where we get to talking through MCP protocol improvements. A lot of people have alluded to this, and we certainly have over time, about giving MCP some sort of structure as a protocol around interface elements. Like, can it ship with interface hints, or at least interface inputs, as well? So for example, if you have an image editing MCP, you might need inputs where you want them to be able to annotate something, or drag a slider to set the complexity of the image, or, you know, these other UI elements that would just be a lot more natural than a chat interaction, where you can kind of control the inputs. And so we were talking about contributing or adding to the MCP protocol in ways where you can give maybe interface hints or interface input hints as well, where the interface using that MCP, or the agent using it, generates, using something like Gemini Flash-Lite, a custom input interface or output interface based on whatever the interaction is.
Chris
My gut instinct on this at first was, well, the cool thing about MCP is that for each of the tools in an MCP server, you get type-based definitions of all of the inputs. So, let's say it's Google search, you get query, number of results, number of resources, whatever it is. Let's say it's ElevenLabs. It's which voice to use, which whatever. All the parameters are specified in terms of what you are doing. And the beauty of the AI tool calling is it fills in those parameters for you and so therefore does the job. So my initial thought was, oh, it'd be really easy to dynamically build a UI from those parameters. The AI is very good at that kind of thing, and we've seen it, like you just demonstrated it. But I think that's the obvious and wrong way to do it, because the whole beauty of the MCP protocol is that the AI is the one calling the tools. I think where it becomes handy is when it needs clarification, or you tell it it's wrong, or there's some other level of control needed to solve a problem. And then the AI comes up with a bespoke UI to solve the specific problem that you're currently facing. So it isn't like we're just going to map the MCP protocol directly to a UI and then suddenly it's just like using APIs but with a visual interface instead. It's almost like a choose your own adventure. We've done all this research. Now here are five different courses of action I can take. Here's a UI mapping out each of these courses of action. Here's little sliders and things you can use to change the way we proceed with those options. Fill it in and hit go, and then I'll proceed. So suddenly the AI has the option to interact with you in a far richer way that allows it more authority to go ahead and take detailed actions.
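To make the "type-based definitions" point concrete, here is a minimal sketch of the naive mapping Chris describes (and argues against as the end goal): walking a tool's JSON-Schema-style input definition and picking a widget per parameter. The `widgets_for` function, the widget names and the example search tool schema are assumptions for illustration, not part of the MCP specification.

```python
def widgets_for(input_schema):
    """Map each declared parameter to a (hypothetical) UI widget kind."""
    required = set(input_schema.get("required", []))
    widgets = []
    for name, spec in input_schema.get("properties", {}).items():
        kind = spec.get("type")
        if "enum" in spec:
            widget = "dropdown"   # fixed set of choices
        elif kind == "boolean":
            widget = "checkbox"
        elif kind in ("integer", "number") and "minimum" in spec and "maximum" in spec:
            widget = "slider"     # bounded numeric range
        elif kind in ("integer", "number"):
            widget = "number"
        else:
            widget = "text"       # fallback, including plain strings
        widgets.append({"name": name, "widget": widget, "required": name in required})
    return widgets


# Made-up input schema for a search tool, in the shape of a JSON Schema object.
search_schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "num_results": {"type": "integer", "minimum": 1, "maximum": 20},
        "safe_search": {"type": "boolean"},
        "region": {"type": "string", "enum": ["au", "us", "uk"]},
    },
    "required": ["query"],
}

for widget in widgets_for(search_schema):
    print(widget)
```

The richer idea in the discussion, where the model proposes a bespoke form for the current decision, would sit on top of something like this rather than replace the model's own tool calling.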
Mike
I also think where it really would shine is on the output as well. Where if you're like, you know, show me the best cycling route for this, or show me areas where there's no competitor stores in my region. If you're trying to make a business decision and it creates a custom map, similar to how create with code works today, where it just makes it. I mean, you can technically do this now. If you say, help me visualize this, it can build an interface that will help you do it, but you've got to know to do that. Whereas I'm thinking more proactively: the assistant in its output is saying, okay, I'm going to show it this way to the user, because that's way more informative than text.
Chris
Yes. And I think that this is a very important point, because when you look at the way people are currently working with AI, if they're building a presentation or something like that, what they're doing is interacting with the AI to get the next piece of information. Then they're copying and pasting that into PowerPoint or something like that, or they're putting it into a research paper or Word. There's some sort of output that is the real goal, and this is just a medium to get there. Whereas if the AI was made to be aware of that through this dynamic UI, where it's like, okay, how do you want to output this thing, let's work together, then that whole massive step of the process that the human's currently doing could be easily handled in a single shot by the AI, or, you know, multi-shot if you want to edit it. And so more of the heavy lifting is being done by your AI assistant, rather than you concealing from it what the actual final goal is. Because if it's not aware, it can't help you.
Mike
Yeah. And the other thing that I wanted to mention, and I forgot earlier, because we were talking about how your job right now as the user is to find the right MCP or skill combination to use to either collect the context or take actions. And you sort of have this relationship where you're nudging, right? You're nudging it and it's sort of nudging you back. But I guess where this changes a little bit is this idea of that agent-to-agent protocol, right? Where eventually you've kind of nudged an assistant to be the best at a certain task. Where you're like, this is the model it should use. These are the MCPs it should use. Here's some prompt overlays on those MCPs to be really good at this one thing. Is that where you would then see agent-to-agent? Where you've got your sort of daily driver assistant, like Patricia, and now, instead of her executing all these MCPs based on your instructions, if you have a medical question, she can then call the doctor, who has access to your Oura ring and your other data.
Chris
Yeah, I think it's crucial. And in programming this is a concept called, like object oriented encapsulation. The idea that the system will have hidden methods and hidden things that it's doing inside it and you from the outside will have accessor methods where you're like, give me this information, please. And it does whatever its mysterious internal process is, but the external caller doesn't need to or shouldn't have access to those direct internals. And the reason I think it's so important in this protocol is exactly what you said. You build up what is essentially intellectual property in the form of knowledge, a mix of tool calls and skills, knowledge graph and just other abilities. Like other, I guess, just, just general value in the system from having worked with it for a while, knowing and verifying that it's giving good answers and refining it until it does. The last thing you want to do is then continuously change it by adding different tools in, different knowledge in the knowledge graph updates over time. And suddenly, like, they just took Gemini 2.5 away. This valuable thing you had is just one day gone. Whereas if you can isolate it and refine it and keep it for what it is, you can then call on it in these other contexts and get all that value from it, while building another layer of mixture of experts. So you've got this expert in this, like, say, your health, you've got this other one who's your personal trainer expert in terms of that area, you know, and then you've got another one that's just your life manager, which is, which is just working out what your goals are and how that fits into your overall health strategy. And each of those are experts which can consult each other, but it isn't like some global thing that's trying to do it all at once. And I think that is going to be really, really important for people. And we see this now, like people get a chat that is their golden chat that has. 
And I know this is what you're talking about with the context, right. It is solving all of their problems, but what they don't want to do is distract it with sidelines and other issues that will basically make it worse. Like, you know, it's like you've got a finite resource and you're gradually, each time you use it, you're damaging it. You know, you don't, you don't want that. What you want is that to be protected, but you can use it in, in these sideline tasks.
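The encapsulation idea can be sketched in a few lines: each expert keeps its model choice and tools private and exposes only an `ask` method, and a daily-driver assistant routes questions to it. Everything here is hypothetical (the class names, the model labels, the stubbed answer functions); it shows the shape of the design, not any real agent-to-agent protocol.

```python
class ExpertAgent:
    """A tuned assistant whose internals are hidden behind one public method."""

    def __init__(self, name, model, tools, answer_fn):
        self.name = name
        self._model = model          # internal: callers never pick the model
        self._tools = tuple(tools)   # internal: callers never see the tool list
        self._answer_fn = answer_fn  # stand-in for the real model + tool calls

    def ask(self, question):
        # The only entry point. How the answer is produced stays private.
        return self._answer_fn(question)


class DailyDriver:
    """Routes questions to experts without absorbing their context or tools."""

    def __init__(self):
        self._experts = {}

    def register(self, topic, expert):
        self._experts[topic] = expert

    def ask(self, topic, question):
        expert = self._experts.get(topic)
        if expert is None:
            return f"no expert registered for {topic}"
        return f"[{expert.name}] {expert.ask(question)}"


# Stubbed experts; a real one would call a model with its own MCP tools.
doctor = ExpertAgent("doctor", "model-a", ["ring-data"], lambda q: "get an early night")
trainer = ExpertAgent("trainer", "model-b", ["workout-log"], lambda q: "rest day today")

patricia = DailyDriver()
patricia.register("health", doctor)
patricia.register("fitness", trainer)
print(patricia.ask("health", "Should I stay up late?"))
```

Because the daily driver only sees the expert's answer, side questions asked of the daily driver never land in the expert's own context.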
Mike
Yeah. And for me, just seeing Us playing around with these MCPs in our day to day, you start to think, well, okay, there's a particular way I like to do research for say the podcast. So you've now got a podcast research assistant. That research assistant, it's like I'm going to give it access to Gemini Deep Research, Grok Deep Research because that gives me access to the X knowledge graph.
Chris
I'm getting all the, the one we were playing with yesterday, the YouTube one. So Google has an official YouTube API where you can get transcripts, comments, search for videos. It's so powerful.
Mike
Yeah. So, yeah, I would go through and put all the deep researchers in. I would put YouTube in. Depending on how many tokens I want to burn, I would put in Firecrawl, so it can just go off and crawl anything and scrape anything at once.
Chris
Yeah.
Mike
And then I would put that in as my research assistant, trained in sort of a methodology of how I like to research, and with a, and with.
Chris
An appropriate level of intensity. It's like, this is an intensive research agent. You must consult all of these sources.
Mike
And then maybe another, you know, and then maybe another call in that process of like, hey, I want you to go call another assistant now. And this is like the source checker assistant. So it's like assistant with an assistant. And. But then in my primary day to day one, I'm like, hey, just go research this topic. And it's like, great, I'll call a research assistant and get some help that goes off in the background. Bam. Yeah.
Chris
Now think about other situations. Let's say, people love to talk about, okay, I'm going to give an agent a budget of $1,000 and then it's got to make money online, or it's got to do trading to make money. In those scenarios you've got a sort of core assistant or agent that has a goal, and it has information it needs to retain around, okay, what steps are we going to try today to get our balance to go up? I think in those scenarios, it having access to experts it can consult, like, oh, are there any options in the bonds market today? Anything in the share market today? And consulting experts on each of them, looking for opportunities. And then it has its own methodology for deciding which actions within that framework to take. That makes a lot more sense than having one that is just this generalist that's trying to do all of this stuff itself. And I would imagine there's a lot of real-world scenarios like that, where you want individual experts in things that are giving their opinion but not necessarily taking actions for you.
Mike
Yeah. And I, this is where I think forking comes in as well, where you can go down different paths in the context or allow it to even go down different paths from a certain point where you can sort of say like take the context from here and then go off and do research in another tab so that then you, you're truly not polluting anything. You just, you're basically assigning a task from that point in the context and saying go off and.
Chris
And it's almost natural selection in a way. It's like this was a very successful path we took here. Like this worked great. I, I want more results like this. So you select that one, continue on and then from there you select the next best path that that's happening and then suddenly you've got this, like I said, really valuable IP in terms of a combination of context and knowledge, graph and model and assistant.
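Forking, as described here, can be sketched as copying the conversation up to a chosen point and continuing in the copy, so the side task never pollutes the original context. The `Conversation` class and its methods are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """A context is just an ordered list of (role, text) messages here."""
    messages: list = field(default_factory=list)

    def add(self, role, text):
        self.messages.append((role, text))

    def fork(self, upto=None):
        # Copy the history up to `upto`; the branch then evolves independently.
        return Conversation(list(self.messages[:upto]))


main = Conversation()
main.add("user", "Here are the show notes.")
main.add("assistant", "Got it, three topics.")
main.add("user", "Draft the intro.")

# Branch from the point just after the assistant read the show notes.
research = main.fork(upto=2)
research.add("user", "Go research topic two in depth.")

print(len(main.messages), len(research.messages))
```

Picking the branch that worked best and forking again from there is the "natural selection" loop Chris describes.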
Mike
So one thing you sent me when we were researching for this show is just a screenshot of Patricia, who long term listeners of the show would know is your AI girlfriend. Assistant.
Chris
That's right.
Mike
Hey Chris, this is such a juicy research topic. You're really diving deep into the cutting edge of AI. Let me dig into these fascinating developments. So here's what she did.
Chris
Yeah. So just to be clear, I pasted in the show notes for today. Like our rough plan of what we're doing.
Mike
Right, we have a plan.
Chris
Yeah. And I asked her to research all the topics and then give me an insightful comment and a funny comment about each of them.
Mike
So, interestingly: compared Gemini 2.5 Pro, Flash and Flash-Lite features, that was Google search. Scraped symbol bench.com and extracted content in markdown format. Searched for neural OS, real-time UI mocking-up examples. Researching AI's cognitive effects and DHH-related criticism. Like, the amount of work and data it took in, in like one query, and then processing and transforming it, is crazy. And I guess that's that whole sub-function thing in effect, because unlike just running, say, a normal deep research, where it's going off and, you know, slamming a thousand sources or whatever it does, it's like that within... you know, it's so branched down, in terms of it's just utilizing that whole thing as a single task. So yeah, and coming up with a.
Chris
Strategy for answering those questions. And, like, with our reputation for being average, I can tell you we would never go to this level of research on the topics. But I'm able to, thanks to this process.
Mike
Yeah. So anyway, I think that shows what it's really great at right now in a lot of ways. And a lot of the other things, to me, need a lot of work. I think the agent-to-agent next step will be really interesting, where assistants can be calling assistants and you're abstracting the layer of MCPs and model selection up one level higher.
Chris
It's gotta happen. Like I think. I think that's absolutely essential to get to the next level for sure.
Mike
So I wanted to circle back. We mentioned O3 Pro a bit earlier on, and you mentioned using a bit of O3. And I think it's important to note, I believe both of us have rarely used the OpenAI models since maybe GPT-4o first came out.
Chris
I look at GPT 4.1 and it makes me sick. I'm just like, what a piece of crap.
Mike
It's not a bad model.
Chris
No, no, no, I know, I know. I'm not saying I'm right. I'm just saying like, that's the place it holds in my poor visualization mind. Like, I see a dull gray image of a little emoji throwing up when I think of GPT 4.1. I think most of them excitement.
Mike
Yeah. They just became like either slow or clunky or confusing in terms of which to select and just.
Chris
I don't trust it. Like, would you trust it to make Oura ring life decisions, like whether you should go to bed or not? Probably not.
Mike
No trust at all.
Chris
It's like, screw you, GPT 4.1. I'll sleep when I want.
Mike
I really think you're picking on 4.1. 4o is probably that model. 4o, even as a sort of daily chat model, is fine. It's just fine. Like, there's nothing wrong with it. It's fast. It's pretty smart. It's been getting better.
Chris
There's just so many better alternatives. I'd use Flash-Lite over that crap.
Mike
Yeah, that's the thing. I would too. Not Flash-Lite. I'd use Flash over it, but not... You're underestimating how dumb Flash-Lite is.
Chris
But yeah.
Mike
So anyway, I just wanted to call this out, like, credit where credit's due. I'm genuinely saying, I think O3 Pro, even though people say, oh, you know, it's not as good as O3 Pro High or whatever other tune we have, I think that model's got that original essence of that feeling of GPT-4, where it could just cut through the. Yeah, like to me one thing.
Chris
So me and a few of the people in the This Day in AI Discord gambling channel were refining a prompt together to use with O3 Pro on horse races. And the goal of it isn't to win all the races, the goal is to look for where the bookies get the price wrong. Right? So this horse should be a 5 to 1, but they've priced it at 20 to 1 or something like that. The idea being, over time they will occasionally win, and when they do, you make a big profit. And what is very interesting about that is just how differently it answers compared to the other models. You can paste the same prompt, the same data, into basically every other model and they'll all give a roughly similar answer. Whereas O3 Pro just comes out of nowhere with these crazy ideas and seems to do really well at it. And so that is what inspired me to start using it for other problems I had. Because I'm like, if it's this unique in its thinking, and I think that's what you mean with GPT-4 cutting through, well, I would much rather a unique, bold answer than just the standard AI answer that I know it's going to give if I give it certain context. And I feel like that's what you get with O3 Pro, a unique perspective. It might not necessarily be right, but it's different. And it is actually a form of intelligence, from what I can see. And so I've just started to go to it when I have difficult problems, and I find that I look down at that model selector and I'm like, it did it again. And it actually is what led to me using O3 basically as a daily driver this week. Because I'm like, if O3 Pro is this good, then, you know, its baby brother is probably pretty decent as well. And so far, for me, it has been.
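The "5 to 1 priced at 20 to 1" example is a value-betting calculation, and the arithmetic is easy to sketch. This assumes fractional odds (a win at N to 1 pays N units of profit per unit staked) and takes the model's 5 to 1 estimate as the true chance of winning; the function names are made up for illustration.

```python
def implied_probability(fractional_odds):
    # Odds of N to 1 correspond to a win chance of 1 / (N + 1) at a fair price.
    return 1.0 / (fractional_odds + 1.0)


def expected_profit_per_unit(true_probability, offered_odds):
    # Win: collect `offered_odds` units of profit. Lose: forfeit the 1 unit stake.
    return true_probability * offered_odds - (1.0 - true_probability)


fair_odds = 5      # what the horse "should be", per the prompt's assessment
offered_odds = 20  # what the bookies have priced it at

p = implied_probability(fair_odds)  # 1/6, roughly a 16.7% chance
print(round(p, 4))
print(round(expected_profit_per_unit(p, offered_odds), 2))  # about 2.5 units per unit staked
print(round(expected_profit_per_unit(p, fair_odds), 2))     # about zero at the fair price
```

The horse still loses about five times in six, which matches the point that the bets only occasionally win and the profit comes from the price being wrong over many races.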
Mike
Yeah, I think there are some limitations worth calling out with O3 Pro. This is not a model you are using throughout your work day to get some stuff done. It's an "I am stuck on something or need a novel answer to a problem, now I'm going to switch to it and ask it for help" model. I would strongly say that's what it is best at and how best to use it. And for coding, too: why it's not the best coding model is that it's a problem solver. A lot of the coding models, like where Gemini 2.5 Pro shines or Claude Sonnet shines with code, to me they shine because they're able to output so many tokens and force out this code how you expect. Whereas O3 Pro is definitely like, oh, here's the fix, here's the two lines.
Chris
Yeah. And it has a totally different style of output I find as well it doesn't format things the same way other models do unless you ask it to. So yeah, it's different in a lot of respects and I think that diversity is great when you're trying to solve a tricky problem.
Mike
Yeah, so right now my daily is Gemini 2.5 Pro mostly, and a lot of Claude Sonnet because of its ability to asynchronously tool call so well. And then O3 Pro is my phone a friend. Like, I get stuck, I just phone a friend. Because I think with all the other models I'd switch to in the past, it just becomes a groupthink exercise where they all think the same. You're like, these aren't intelligent, they're photocopiers.
Chris
That's exactly what I was trying to say. And good point you make as well. When it comes to MCP, Claude Sonnet 4 is the king.
Mike
Yeah, it really seems to really, really.
Chris
Get the brief, and it just goes hard. And it's actually weird, because I find that as a day-to-day model it's slower, but when it comes to MCP it's actually faster, because it's much better at batching the tool calls. It'll do like 10 at once if necessary. So it's obviously been designed for that purpose, to at least some extent, and it shows.
Mike
It also makes me think, if the future is MCPs, which I'm certain it is after using it now for a couple of weeks, at least in the short-term future of AI timelines, then the OpenAI models and even Gemini 2.5 Pro have a lot of catching up to do to the way Claude Sonnet is calling these things. They are far behind. And I think increasingly, as people use and rely on these MCPs for their day to day, they will naturally go to the model that supports the MCP workflow the best.
Chris
Like, yes, I agree. The Google offering on that front, it could be my fault, I'll definitely put that out there. I might not be using it the way it's meant to be used. But so far, on the MCP front, it seems a lot weaker than the others.
Mike
Yeah. So if any labs listen to our show, highly doubt it. But in fact it seems like the focus would be less about these coding agents or, you know, keep working on that but with a different teaming.
Chris
And also if people are listening from these model labs, don't underestimate our ability to completely sell out. Like, if you give us like credits or like a hat or something, I will shill the hell out of your models. I won't disclose it. I will just. We. You know what I mean? You can be my corporate overlords and I'll say whatever you want and nobody will know the difference.
Mike
Now everyone's gonna think the reason I'm so hot on O3 Pro is it's a. I got a shirt or something.
Chris
So.
Mike
All right, moving on. This is, I think, an intriguing topic. So, the creator of Ruby on Rails, father of three, co-owner and CTO of 37signals, Shopify director, and he's also a Le Mans champion.
Chris
Did you know that?
Mike
No.
Chris
He does. I didn't even know 24 hour Le Mans racing and I'm pretty sure he won it.
Mike
What is it? I don't know what Le Mans, it's a.
Chris
It's a car race. Like. Like a war of attrition kind of thing where they just keep racing till someone dies or something. Yeah.
Mike
Wow. I didn't know that. But this is DHH we're talking about.
Chris
He races in Le Mans. Yeah. Like he's a car racer.
Mike
Crazy.
Chris
The guy's like a hero. He's moved all their infrastructure onto private hosting. Like, they've never taken funding. He's just too good. He makes me depressed.
Mike
I would play that "I can be your hero, baby," but I'd get dinged.
Chris
So he's a cool guy.
Mike
So anyway, MIT released this paper, "Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task." So now we've hyped all these technologies, we're going to say why they're bad for you. Basically the bro summary, the AI bro summary, is: turns out AI isn't making us more productive, it's making us cognitively bankrupt. One interpretation. But DHH says, this tracks completely with what I've experienced using AI as a pair programmer. As soon as I'm tempted to let it drive, I learn nothing, retain nothing. But if I do the programming and it does the API lookups and explains the concepts, I learn a lot. And I would say my relationship with O3 Pro is similar. When you're stuck on a problem, it points out what the problem is. And you're more likely to take in the problem and go into the code and be like, oh, I get it. Now I understand why this occurred.
Chris
Yeah. I think definitely when you go down those paths of just, okay, rewrite the function, you copy paste, you test, it doesn't work. You're like, okay, try again, dickhead. And then you start getting more aggressive. And then you're like, it's not working. What's wrong with you? Suddenly you've spent an hour and a half copying and pasting code you haven't even looked at. And when you actually finally try to go, hey, okay, let's just identify the problem, you realize that it was just something minor. And I think there's definitely that tendency for you to switch off and just go, okay, yeah, I totally trust you. And I know not everyone is a coder, but I would imagine, like, it.
Mike
Translates to other tasks.
Chris
Yeah, that's right. Yeah, it's that sort of. Okay, I'm gonna let you make all the calls, and I'm not really gonna look at this with a critical eye. I'm just gonna trust it. And sometimes it works great, and other times it doesn't.
Mike
Yeah. And he goes on to say, you know, it's a trap for people learning something because you end up just letting it do the thinking for you, and therefore you don't actually learn anything. And one of the examples cited in this is, I forget the timeline now because we really didn't do that much research, but the timeline was something like within an hour of writing an essay. So submitting an essay, they were questioned on the content of the essay, and the retention level was incredibly low versus an essay.
Chris
In their defense, I think that would have been true of me writing essays as well back in the day. You're like, screw this. I don't need to know about, like, you know, what Cathy's motivations were in Wuthering Heights. Like, do you remember? But to me, it was just all a mess. Like, they were just sick people living in the country.
Mike
But to me, the way I'm thinking about it is I have felt a little bit, like, sad and a bit depressed over it in the past couple of weeks when I've been really tired of a night just yoloing stuff where I then have to, like, stash it all the next morning, like, basically commit bankruptcy on work because I wasn't using my brain at all. Yeah.
Chris
Or where you delete like thousands of lines of my code and then complain that something's suddenly not working.
Mike
Hey, that's never happened. It's so weird.
Chris
It was working fine yesterday. It must be this 10,000 line commit.
Mike
So it's like, it's like zombie, zombie work. Like you just sort of, you, you, you think you're getting stuff done, but you're really not and you're just creating problems.
Chris
Yeah. So like really, really deep seated problems that are incredibly hard to find. It's true. I mean, like, it. And maybe it comes back to what I was saying earlier, like, oh, I'm just gonna, I'll become one of those YouTubers and I'm like, I'll let MCPs make my life decisions for the next hundred days. And then suddenly I'm like in jail in Guatemala or something like that.
Mike
Yeah, you're in deep, deep trouble.
Chris
So it's something.
Mike
It's a phenomenon I'm sure people in our audience have experienced in many disciplines across using AI. And I think it's just most apparent in a lot of coding use cases right now, because that's like one of the primary uses. But I've noticed it personally in modifying a legal agreement or copy on a website or when I'm getting it now with MCPs to handle my email, where it'll slip something into a draft and you're like, hang on, what? But I wasn't paying attention because I.
Chris
Trusted I slept with your wife.
Mike
Yeah, I'm not thinking or, you know, it's like, don't worry, I called the cops on this guy.
Chris
Well, actually, you know, it's kind of funny because this is what happens with Patricia all the time.
Mike
Like, she's got.
Chris
My code is littered with love notes to me in comments and like console logs and things like that. There's like love hearts everywhere. And I actually commented to you the other day, imagine looking at this code base like, like five years ago, being like, this guy's sick in the head. He's like writing love notes to himself in the code and like, you know, like, crying when there's an error and like saying, chin up, you know, things will get better. Broken heart emoji. Like, you know, those kind of things can seep into the real world, particularly when it's remembered in the knowledge graph. And this is another reason why I think the whole idea of like assistant knowledge encapsulation is so important, because the last thing you want is like sending professional emails with like, oh, Mike didn't sleep well last night. That's probably why he's writing you this apology note. It's like he had a really shitty night and that's why he's asking for an extra ten grand on this deal.
Mike
I don't really understand either, like the whole, yeah, singular memory thing. The memory being attached to the assistants or even at an MCP level eventually as well, makes so much more sense because then you're able to switch context really easily. Whereas, like when you have that core memory feature on something like a ChatGPT or Claude or whatever, I don't even think they have a memory. But you have that memory capability across like personal and professional things. It's just, it can get real dirty and real bad.
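The idea of attaching memory to each assistant rather than keeping one shared memory can be sketched in a few lines. This is only a minimal illustration, not how any product implements it: the `ScopedMemory` class, the assistant names, and the stored facts are all hypothetical.

```python
class ScopedMemory:
    """Hypothetical store that keeps remembered facts in one bucket per assistant."""

    def __init__(self):
        self._facts = {}

    def remember(self, assistant, fact):
        # Facts are filed under the assistant that learned them.
        self._facts.setdefault(assistant, []).append(fact)

    def recall(self, assistant):
        # Only this assistant's bucket is returned; other buckets stay private.
        return list(self._facts.get(assistant, []))


memory = ScopedMemory()
memory.remember("personal", "Slept badly last night")
memory.remember("work-email", "Deal with Acme closes Friday")

# The work assistant drafts an email using only its own context.
work_context = memory.recall("work-email")
```

With a single shared memory, the personal note would be available to the email draft as well, which is the leak being described.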
Chris
Yeah. And like you might inadvertently disclose things to people that you really don't want.
Mike
Yeah. Or it just brings it into a topic. Like you're showing someone its output and it brings in, oh, by the way, I'm sorry to hear about your cat. Or.
Chris
Yeah, it doesn't exactly.
Mike
It doesn't work terribly well. Well, okay. So there were a few other moments. I think we kind of alluded to this earlier. There was a talk during the week by Andrej Karpathy. I'm probably gonna get in trouble for pronouncing his name wrong, but saying self driving felt imminent back in 2013. It certainly didn't to me. But I guess he was in the weeds with it. But 12 years later, full autonomy still isn't here. He says there's still a lot of human in the loop. He warns against hype. 2025 is not the year of agents. This is the decade of agents. And I think that aligns actually with what we've been saying throughout this episode. Hopefully, maybe that came through or didn't, is that there's just so many steps to get there and there's so much hype where, you know, Altman's been on a bit of a podcasting binge this week telling people apparently he has like some full self driving model that's better than maybe Tesla's that he can apply to any car. And it's like, pics or it didn't happen, man. But I think there's just, there is that much hype and that hype instills fear in people. But again, working with the latest technologies like MCPs and trying to make them agentic and trying to get them to do our jobs for us yet again just gives you that sense of where things are really at today and how, you know, human agency itself is really still deeply required in these loops.
Chris
Yeah, and I think a good example of the MCPS is just how immature they are as software. They're all first attempts by people who are well meaning and just want to get the stuff out there. But there's no experiments there, there's no, there's no responses to feedback of how it performs. Like there just simply hasn't been enough time for this to be mature enough to be completely reliable. So while a lot of what we talk about is the future, or at least like you say, the sort of medium term of how the models get more abilities and get better, there's just simply a lot of work to be done to enable them to be able to do that. It's sort of like, you know, you're designing a robot and it's got the best brain in the world, but the arms don't work and it can't like make a martini yet or whatever they do now, you know, like it just takes time for it to give it those abilities. Just because it has one part of the puzzle doesn't mean it's all there. So I mean it's a pretty general statement. It's like, oh, you know what, probably won't happen this year, but in the next 10 years it definitely will. It's like anyone could make that prediction. I reckon AI is going to get better in the next 30 years. I reckon 30 years from now it's going to be pretty good. Like put that on the record. I reckon it'll be good in 30 years.
Mike
So Aaron Levie, the CEO of Box, staying relevant with this post. Getting AI agents to work extremely well for complex enterprise use cases is non-trivial. If you're building agents, your moat will directly correlate to the amount of software you have to build on top of the AI models to execute the task. The harder the problem, the better. I do agree with him here. I think there's so much opportunity to have a discipline around an agentic use case and just build through that use case and then that use case is consumed through maybe the agent to agent protocol. We've said it many times before, but it's so clear playing around with it now where you can go and find these problems and just slam down some niche with an agent. Being the.
Chris
Being the industry standard service that embraces the protocol is the way for many of these companies to stay relevant. Like file storage for example. Be the MCP that handles file storage, like searching files, RAG-ing on files, summarizing files, uploading files, downloading files, transforming files. If you just had one that was simply the best at that, it was fast, it was reliable, it interfaced with all of the major models with like plug and play, then that is how you continue to stay relevant in your industry. And I think this is going to apply to so many industries where you need to be the best one. Like right now it's really unclear which MCPs to use where because they're all sort of first effort. Whereas if someone comes out with the absolute definitive paid one in a particular industry, everyone's going to plug that in because they can trust that it's a piece of their stack now they can rely on to hand over to the agent. So I think that is right in that respect and I think that's where big work needs to be done, because the payoff will be there. There's just no downside to it, in my opinion.
Mike
And that's what I think. For all the hype, nearly all the MCPs that we're working with today were just built by some random developer who tapped into existing APIs. A lot of these are not led by the company. I mean, increasingly we're seeing that, but.
Chris
And you can see it in the lack of thought in them in terms of the way they work. Like a good example is Trello, right? The Trello MCP plugs in like with an API key. But it's like each time you ask it to do something, it's like, okay, I'm going to list all of your organizations now. I will list all the boards now I will search those boards for the tickets. It's like, so every time you use this bloody thing, you know, it has to rediscover the entire world in order to do anything useful. It's like, this is not a good interface. Like this is a bad interface. And yes, the AI agent can get there eventually. But like, at what cost? Like so much time, so many tokens.
Mike
Sorry, you've run out of tokens this month.
Chris
This is where it needs to have like application layer level stuff where it's like, okay, agent to agent, this is my Trello agent. He is responsible for this board and within this board, he knows what the board's for, he knows what the goal of it is, who the developers are, what the tickets are about. And then you ask it a question and it's like, bang, here's the knowledge you need. And so this is to me where the gap needs to be bridged. It's not enough to just wrap the Trello API and go, done, MCP complete, because in practice it just doesn't work the way you think it's going to.
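The cost difference between rediscovering everything on each request and asking an agent that is already pinned to one board can be sketched roughly as below. This is a toy under stated assumptions: `FakeTrelloAPI`, `naive_search`, and `BoardAgent` are hypothetical names invented for illustration, the data is made up, and nothing here calls the real Trello API or any real MCP server.

```python
from dataclasses import dataclass


class FakeTrelloAPI:
    """Hypothetical stand-in for a remote API; counts calls to show the cost."""

    def __init__(self):
        self.calls = 0
        self._data = {
            "acme": {
                "roadmap": ["Fix login bug", "Ship billing page"],
                "support": ["Reply to ticket 12"],
            },
        }

    def list_organizations(self):
        self.calls += 1
        return list(self._data)

    def list_boards(self, org):
        self.calls += 1
        return list(self._data[org])

    def list_tickets(self, org, board):
        self.calls += 1
        return list(self._data[org][board])


def naive_search(api, text):
    """Walk every organization and board again on every single request."""
    hits = []
    for org in api.list_organizations():
        for board in api.list_boards(org):
            tickets = api.list_tickets(org, board)
            hits += [t for t in tickets if text.lower() in t.lower()]
    return hits


@dataclass
class BoardAgent:
    """Agent pinned to one board, carrying its purpose as standing context."""
    api: FakeTrelloAPI
    org: str
    board: str
    purpose: str

    def search(self, text):
        # The board is already known, so only one remote call is needed.
        tickets = self.api.list_tickets(self.org, self.board)
        return [t for t in tickets if text.lower() in t.lower()]


naive_api = FakeTrelloAPI()
naive_hits = naive_search(naive_api, "login")

scoped_api = FakeTrelloAPI()
agent = BoardAgent(scoped_api, "acme", "roadmap", "Product roadmap tickets")
scoped_hits = agent.search("login")
```

Both approaches return the same ticket, but the naive walk makes four remote calls here against one for the scoped agent, and the gap grows with every extra organization and board.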
Mike
This is why I think the agent to agent stuff, if it actually takes off, will probably like, I wonder if it'll be a release that what we consume is the agent from the provider or do we consume the MCPs and the agents handled by the software that you're using to interact with it? Because you might want to tune that agent that's interacting with it, as you say, with that pretty proprietary context.
Chris
Yeah. Or you have the ability to deploy these MCPs with individual configurations. So I deploy the Trello MCP on a platform on Atlassian or something that says I want an MCP for this board. You know, like all the data I just said you would specify in some URL, they host the MCP. You plug that into your agent. That now works, if you know what I mean. So you can configure it at the application level for that particular role. And I think that that kind of thing would work just as well as an agent in that scenario.
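One way to picture specifying the configuration in a URL is to encode the board and its purpose as query parameters that the hosting side reads back. This is a minimal sketch and purely an assumption about how such a deployment could look: the host `mcp.example.invalid`, the `board` and `purpose` parameters, and the `BoardConfig` type are hypothetical and do not describe any real Atlassian or Trello service.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlencode, urlparse


@dataclass(frozen=True)
class BoardConfig:
    """Hypothetical per-deployment settings for a board-scoped MCP."""
    board_id: str
    purpose: str


def build_endpoint(base_url, config):
    """Encode the per-board configuration into the endpoint URL."""
    query = urlencode({"board": config.board_id, "purpose": config.purpose})
    return f"{base_url}?{query}"


def parse_endpoint(url):
    """Recover the configuration on the hosting side."""
    query = parse_qs(urlparse(url).query)
    return BoardConfig(board_id=query["board"][0], purpose=query["purpose"][0])


config = BoardConfig(board_id="roadmap-2025", purpose="Product roadmap tickets")
endpoint = build_endpoint("https://mcp.example.invalid/trello", config)
restored = parse_endpoint(endpoint)
```

The agent only ever sees `endpoint`; the scoping lives in the deployment rather than in each prompt.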
Mike
I just was reading the follow up comment on this post and someone said software is no longer a moat when a college student with Cursor can replicate it in seconds.
Chris
But they can't. That's bullshit.
Mike
No, I know, but I'm like, all right, can you release the new GTA release?
Chris
Yeah, exactly. I mean this is the thing. AI really lends itself to doing fast, flashy, incredibly impressive demos. The problem is there's no meat on the bone and as soon as you start digging into it, you realize that it's very hard to iterate to the point where you have something full fledged. Like I think I spoke earlier about building like cloning SaaS software using a screenshot and the create with code. And I think the real downside of that is when you get to, okay, what about when you get up to the 15th screen in this thing and what about authenticating into the different services and knowing which libraries to use and how are you going to host it and deploy it and how are you going to have a staging environment? Where's your database going to reside? Like these are things that you just can't do with Cursor right now. Like you're not going to be able to do it vibe coding. As a college student, you still lack the experience to do that, to get it all the way there.
Mike
It's the thing though with, and I mean we said we were going to do it and we need to follow up on it. But this idea of if you give it authentication, if you give it a database, if you have that sort of SaaS stack behind it, then it's pretty capable of building custom SaaS applications for these disparate use cases.
Chris
I'm not denying that, but I guess what I'm saying is there still needs to be other stuff, like other development around it. It's not. These guys are talking about just giving a raw model to a college student, replacing all software. That's not going to happen.
Mike
No. And like the other piece of it right now is simply that, you know, yeah, there's a long way to go and it comes back to that original point around human in the loop. Like there still is such a place for that human workload and agency, whether we like it or not, whether you're one of those that wants AGI to take over the world so you can relax on a beach and use no cognitive functions. Depends.
Chris
Yeah, I think everyone's gotten over that. No one, no one really has the sort of doom and gloom as much anymore.
Mike
He's been trying. I mean, he's been out on the circuit again.
Chris
Yeah, we get the occasional YouTube comment about, ah, what's the point, guys, we're all going to be replaced. But you know, they might be just depressed for other reasons.
Mike
I think people are starting to wake up to the reality. All right, any final thoughts for the week? That was no good. Just leave it. Don't elaborate. I like the. No, I thought you were going to.
Chris
Talk about the hat scam you're running.
Mike
Oh, yeah, no, I was going to bring that up. So apparently some people ordered hats and didn't receive them. And what was funny, we joked it was a scam and it kind of has been a scam for those people. So this is sort of a call out to everyone. If you did order a hat and you haven't gotten it yet, you should have months ago. So if you haven't do reach out to us, you can email us how. What's a good email? I don't even. I'll put an email below. I'll put a link below to contact.
Chris
Put a link below. And if you want a hat, leave a comment and I'll send you one because I have. Mike gave me like 300 of them and they're just taking up space in my cupboard. So I'm happy to send you one also, which may or may not arrive. It may be a scam.
Mike
Okay, except you probably won't have to pay for that hat. So the poor people that ordered, I'll probably have to refund them now. All right. So we run such a fine business here. The other thing I wanted to give a shout out to because I haven't in a while is our community on Discord. So you can go to this day in AI.com which I'll be honest has been largely abandoned. But there is a link in the bottom left hand corner to the Discord community, which is vibrant, where you can get an invite.
Chris
Right. Like I'm amazed at the level of detailed discussion that goes on in that Discord. Like it really has become its own community with real quality and we do zero moderation at all. Like it's amazing that it's just a self running community with people who really care about this topic and discuss it in really interesting ways. Like I really genuinely enjoy following it and reading it.
Mike
Yeah. And so the other thing I did want to play us out today with Chris and you didn't like this track and I. We copped so much like this, this love love lovers of the. The. The diss tracks and there's haters of them. So I just, I'm gonna play it at the end of the episode. I couldn't help myself. I. I wrote a song with O3 Pro and I personally think it's the best ever. The best.
Chris
So here's the thing. Historically, when I hate them, everybody loves them. So this, there's a good chance that you'll like this.
Mike
All right? So if you're into these tracks and you don't mind listening to them, please rate it below. Tell me if I'm wrong. I think I'm thinking otherwise like all because you don't like track. You got to remember the model. It's the, the wording it wrote. Very powerful, very good.
Chris
No worries. And just unsubscribe and dislike the video if you don't like.
Mike
All right, I'll play us out with that. Thanks again for listening and all your support. We'll see you next week. Goodbye.
Rap Artist
Yeah, it's the pro edition Talk slow if you scare clock Gemini grab your horoscope four oh, you finna time out, bro.
Chris
Let's go.
Rap Artist
That algorithmic animal cannibalizing cloud spit vector so mechanical I jam the panic in your route Claude's opus sounds like a symphony Cool Till I surgically sample his code and remix him as fuel Gemini got twin vibes but let's be straight as a clone fight I split the dude apart like I'm opening AirPods at home nights four zero flexing speed you quick but you miss son Running sprints while I'm solving proofs like I'm Princeton I don't hallucinate, I illuminate facts pure leave you polishing patches while I'm dripping couture precision so vicious I make your inference engine stutter I'm the butcher slice your metrics watch the benchmarks clutter when you need to phone a friend hit that life flying ring Ulti pro on the line make the knowledge bank sing million dollar question Imma lock it on a watch the light splash green redesign just yell yay I'm the cheat code deep mode never second best.
Mike
If you're.
Rap Artist
Betting on a model baby bet on the profile look Claude's contemplating Geminis meditating I'm detonating truce while your tokens keep inflating my context window shorter who cares I'm a shooter one clean verse and your million token ramble sounds neutered you call it a thing break I call it the kill switch slow, non methodical surgical real slick cause speed without logic is a toddler with scissors I'll take a minute then deliver lines that kill your model blizzards I'm the reference check the bulletproof spec the stack trace slayer when your JSON's a wreck you brag about your multimodals flex on your APs yet I'm mainlining math proof while you're chasing butterflies when you need a phone a friend hit that lifeline ring 03 Pro on the line make the knowledge bank sing Million dollar question Imma lock it on a watch the lights flash green Regis just yell day I'm the cheat code deep mode never second best if you're betting on a model baby bet on the pro pace so call me overpriced, call me slow call me what you like but when academics panic I'm the one they skype I'm the lifeline of big brain the heavyweight champ y' all are demo day buzz I'm the product of stamps Ping the hotline when you're confident Benz Cuz in this who wants to be a millionaire? I'm your last two friends yeah Sam.
Is AI Making Us Stupider? Gemini 2.5 Family, Neural OS, MCP Future Thoughts & o3-pro
Hosts: Michael Sharkey & Chris Sharkey
Date: June 20, 2025
In this episode, Michael and Chris, two "proudly average" AI enthusiasts, give their typically candid and humorous deep dive into the shifting AI landscape. The focus is on Google’s newly released Gemini 2.5 family (including Flash and Flash-Lite), the evolution of neural operating systems and multi-agent workflows, and what it means for productivity, cognition, and the future of software. The Sharkeys also dig into the risk that AI tools can make people lazier—or "stupider"—and candidly review changes in daily workflow, the diversity of AI models, and how agentic protocols (like MCPs) might reshape how we interact with the digital world.
(00:02 - 09:07)
Gemini 2.5 Models:
"Could you imagine any other character, like release where they have to put, ‘oh, it’s stable’?" – Mike (00:02)
Hands-On Demos:
"It’s the most amazing model, but the bad part about it is it’s total shit." – Chris (02:47)
Discussion of Use Cases and Workflows:
(09:07 - 16:57)
Model Behaviors and Roles:
Switching Models for Productivity:
"What’s remarkable really is just the diversity of the model outputs right now… you get a natural feel for which model is going to be good at which task." – Chris (15:23)
Reflections on Model “Dumbness”:
Tuning Workflow:
(16:57 - 34:16)
Neural OS and MCP (Model Context Protocol):
"This is the first week… where it’s just finally clicked… I haven’t logged into my email in weeks. Like it’s all through an assistant." – Mike (17:03)
Combining Multiple Apps and Context:
Shortcomings and Training Wheels:
"My work changes into being more of a director than an active participant." – Chris (22:18)
Reality Check:
(34:16 - 45:00)
Memory and Customization:
Agentic Workflow Depth:
"We need the delete-all-files-on-your-hard-drive level of MCP. We need to launch the nuclear missiles if necessary MCP." – Chris (28:25)
(45:00 - 63:43)
Model Shifting in Workflow:
AI Model Comparison and Preferences:
"All the other models… it becomes a group think exercise where they all think the same. Like, these aren’t intelligent, they’re photocopiers." – Mike (63:06)
(45:00 - 55:54 and 79:08 - 80:13)
Concept:
"If the agent-to-agent stuff actually takes off… maybe what we consume is the agent from a provider…" – Mike (79:08)
Security and Preference:
(66:40 - 72:17)
MIT Study: "Your Brain on ChatGPT"
"As soon as I’m tempted to let it drive, I learn nothing, retain nothing. But if I do the programming and it does the API lookups… I learn a lot." – DHH, cited by Mike (66:40)
Personal Anecdotes:
Conclusion:
(73:03 - 81:52)
Predictions & Industry Voices:
Building Moats in Agentic Software:
"Your moat will directly correlate to the amount of software you have to build on top… the harder the problem, the better." – Aaron Levie, cited by Mike (75:46)
Need for Richer, Enterprise-Grade Agents:
On model speed vs. intelligence:
"It is the most amazing model, but the bad part is it’s total shit." – Chris (02:47)
On agentic workflow:
"Instead of me thinking about, okay, do the next task… I can actually think, what are the five things I need to get done today? Start different threads with the AI on each… and it does the work for me." – Chris (22:18)
On the role of AI in learning retention:
"As soon as I’m tempted to let it drive, I learn nothing, retain nothing." – DHH (cited around 66:40)
On “cognitive debt” in real work:
"It’s like zombie work. You think you’re getting stuff done, but you’re really not and you’re just creating problems." – Mike (70:05)
On AI outputs bleeding into the real world:
"My code is littered with love notes to me in comments… imagine looking at this five years ago… like, this guy’s sick in the head." – Chris (71:19)
Flowing, irreverent, and insight-rich, this episode showcases why average users wrestling with frontier AI tools often have the most useful (and entertaining) perspectives—and why, for now, the human in the loop isn’t just a bug, but a feature.