
A
So Chris, this week many people were suspecting that you would have fixed the bookshelf. But no, you've gone a level worse. You've got a beard. You look like absolute. And here we are. So huge news this week. Absolutely huge news. There's a lot of speculation about a new model coming out. Maybe next week, maybe next week. Merch packs have arrived. Gemini 3 could be launching next week and the Internet is going absolutely wild. People are using, I think it's in AI Studio. You can basically go in and there's like an A/B test going on. A sneaky A/B test where, we believe, they're getting access to Gemini 3. And what that has led people to do, for some reason this is the new benchmark, is one-shotting the ability to recreate the Mac OS X desktop experience or the Windows desktop experience in a single shot, in like one clean HTML file. And this model is really delivering. Like, check this out. For people that obviously can't see, because most people listen to the show, it looks fully like Mac OS X. I can open text files, I can open the web browser here and I'm on the Wikipedia website. I can resize it and move it around. It's pretty phenomenal. And it even has like a bouncy ball effect at the bottom of the screen. There's a full-on terminal, so you can even do commands like ls to see the folders available. I don't know if I can go into them.
B
Let's try removing all files.
A
I can't do that. So it, you know, there's a bit of a foul. I can change the background color, but anyway it's pretty good. Here's another one and this one is just a comparison to Gemini 2.5 Pro because people have been getting really overly excited about this and I thought just as a bit of a reality check, someone also put out there another version. This version was done in Gemini 2.5 Pro and I think, you know, looks pretty nice. It's still pretty amazing, right? Like it's still really good. You can browse around and you can even open Python files in a code editor. I would sort of say functionality wise this one's better, but it might be to do with the prompt. But anyway, I think if this is an indication of the polish, the level of polish and detail and output that this model can deliver, assuming it's releasing next week, it's going to be a really exciting week. And I thought given the speculation around it, maybe we could just put out our final Gemini 3 wishlist quickly.
B
Gemini 2.5 is already pretty good. Like, it's kind of hard for me to think of what else you'd really want. I mean, I don't know, will they make it a 2 million context window, perhaps? Cheaper would be really nice. I think that's probably the biggest one for me. Make it cheaper.
A
I think the tool calling needs a lot of work, but I would say either we have improved it in SIM theory or they have been sneakily releasing minor updates and not saying anything. But the simultaneous tool calling and the agentic flow of the model could use a lot of work right now. I think it's truly a turn by turn, like a turn based model, instead of having that feeling like a Claude Sonnet or the new Claude Haiku, which we'll get to in a minute, where it has that internal clock and it feels just agentic at its core.
B
Yeah, I agree. I think I tend to use Gemini more for coding and I use Sonnet more for agentic stuff. So maybe just. I'm naturally going that way for that reason and Gemini 3 might fix that.
A
So anyway, we'll report back, obviously. I'm assuming it's next week. That's what everyone's saying. But let's move on now to a brand new model. Brand new model, hot off the press. It only came out, I think, eight hours ago. Now introducing Claude Haiku 4.5. And this is the younger sibling, of course, to Claude Sonnet 4.5, which was released I think two or three weeks ago now. And so it's a smaller model, it's cheaper, we'll get to pricing in a minute. But in terms of where it sits, in terms of the software engineering benchmark they've got up here, it's somewhere between Sonnet 4 and Sonnet 4.5. So it seems to be weirdly performant. I mean, I never really trust benchmarks. But to say that with agentic coding it's on par with, say, a GPT-5, and ahead of Gemini 2.5, is kind of crazy for the size, price and speed.
B
I used it this morning quite extensively to actually debug MCPs live. So what I'll do is, when I'm adding a new MCP and there's an error, I'll actually ask the model, hey, what was the error? And then I'll say, here's the code, here's your code. Can you just explain why this is broken and fix it, please? And it was really competent at that. Like, really good. And then because I was working with MCPs, I get these really long requests of, like, testing every single tool in every single combination, and I give it to the model to do. And Haiku was able to handle that perfectly. And it's really fast as well, so it was quite refreshing and good to use.
A
Yeah, I mean, we haven't, as you say, had that long to play with it, so it's hard to know just yet. But I would say, if you compare it, I think the closest comparable for me, not necessarily on price, but just on where it sort of sits in the food chain, would be Gemini 2.5 Flash, especially that newer tune of it. And I think that while it's a really good model, it seems to struggle sometimes in the feeling of intelligence. Like, it seems really dumb sometimes. Whereas just in my limited tests with Claude Haiku 4.5, it never felt stupid to me. It never felt like a lesser model. In fact, I think it kind of makes me feel as though, with the performance and price gains of some of these models when they come down, things could be really good. Like, you compare it to GPT-5, say, in terms of tool calling, and it's so much faster. It's like, if you're working with MCPs, why would you bother with GPT-5?
B
I agree. I think when you work with MCPs extensively, the speed is a massive factor. Like, having to wait for every step to take so long just to get to the point where it's actually doing the work, it's not worth it. And when you're doing, like, operational things where context is everything and just calling the tools is important, the speed is what actually makes you more productive, over it being slightly more intelligent but taking ages. Like, the whole idea here is productivity gains with using these things. So speed actually does matter. It is actually as crucial as the other stuff, not to mention price. Like, if you're working with huge amounts of data, you don't want to be constantly stressing about how many tokens you're using. And with this model, you don't have to stress at all.
A
So the pricing is not, I mean, not where I think it should be if they want to be really competitive, but it's still pretty good. So it's a dollar per million input tokens and $5 per million output tokens. So I think it does have that Anthropic Claude premium, like the Sonnet and Opus models. But I still think it's pretty reasonable for how performant it is. And I think naturally you do want to compare it to a Gemini Flash, but I mean, these are sort of somewhat worlds apart in terms of cost now. So it's another really good option in the toolkit. And I think, as you say, for MCPs and agentic use cases, I don't see why you wouldn't use this over, like, say, Claude Sonnet 4.5.
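To make the quoted pricing concrete, here is a minimal sketch of the per-request arithmetic, using the $1 per million input tokens and $5 per million output tokens mentioned in the episode. The function name and the example token counts are hypothetical and only there for illustration.

```python
# Prices as quoted in the episode for Claude Haiku 4.5 (USD per million tokens).
INPUT_PRICE_PER_MILLION = 1.00
OUTPUT_PRICE_PER_MILLION = 5.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_MILLION
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical workload: a long tool-heavy prompt with a modest reply.
cost = estimate_cost(input_tokens=150_000, output_tokens=8_000)
print(f"Estimated cost: ${cost:.4f}")
```

Because output tokens are priced at five times input tokens here, long generated outputs move the bill more than long prompts of the same size.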
B
The one downside I see is it doesn't have the beta 1 million context window that Sonnet has. So it's only 250... 200,000, sorry, context. It does have 64,000 output, which is very high, but only 200,000 input. So that is one limitation that Gemini Flash doesn't have, because it's a million, and Claude Sonnet doesn't have, because it's a million. So depending on what you're doing, that may be a blocker for you, but if you can deal with that, it's worth it.
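As a rough illustration of that trade-off, the sketch below checks which models could accept a prompt of a given size, using only the context figures stated in the episode (200,000 for Haiku, a million for Sonnet's beta window and for Gemini Flash). The dictionary keys are descriptive labels, not real API model identifiers, and the helper name is hypothetical.

```python
# Input context limits as stated in the episode (tokens).
# Keys are descriptive labels, not official API model identifiers.
CONTEXT_LIMITS = {
    "Claude Haiku 4.5": 200_000,
    "Claude Sonnet 4.5 (beta long context)": 1_000_000,
    "Gemini Flash": 1_000_000,
}


def models_that_fit(input_tokens: int) -> list[str]:
    """Return the labels of models whose context limit covers the prompt."""
    return [name for name, limit in CONTEXT_LIMITS.items() if input_tokens <= limit]


# Hypothetical prompt sizes: one that fits everywhere, one that rules out Haiku.
print(models_that_fit(150_000))
print(models_that_fit(500_000))
```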
A
So my feeling though is it's just a more, like, optimized version of Claude Sonnet 4.5. Like, that's what it feels like. I cannot tell the difference so far. So that's a pretty good sign. And for people who do use SIM theory, we have made this available as non-frontier. So you can essentially use it as much as you like and it won't affect your token limits at all. So to have a model this good, I mean, essentially somewhat for free, is pretty mind blowing. Now I do want to talk also about its ability, because I think this is like the new benchmark, right? Its ability to one-shot a macOS-style operating system. So I put it to the test. I said, make a Mac-style operating system where I can open a notepad app and draw in a paint-style app. And I just wanted to get a comparison here: if this is Gemini 3, if it really is indeed that, what would Haiku, a pretty cheap new model, do? And look, it's not as stylish, but it's still pretty good. Here, I've got it up on the screen. I can open my notepad, I can take a note here. I can move the windows around. I've got my paint application. I would actually say its paint application is better. So I don't know if this sort of puts, you know, water on the flames of the hype around Gemini 3, because you can see, quite easily, I did this and it did it. Honestly, not the best benchmark.
B
Given that they can all just do it.
A
Yeah, it's all.
B
Yep, no worries.
A
Well, I think people, like me, like to look at the taste, you know, the tune, and you can really see the best. I mean, this is unbelievable, and the detail and the icons, it's very, very impressive. So we'll wait and see. But anyway, Haiku passed the new OS one-shot benchmark. The other one I looked at was its ability to just, like, do the simultaneous tool calling, and that's where it'll go off and call, like, multiple tools at once. In this case, because I had my local setup, I didn't have that many MCPs installed, so Google was its only option. But it still called Google multiple times, looking at various angles around the AI news in general. It was very, very fast. It formats things really nicely as well. I even got it to create a document. It created a document and was able to put the sources in, in the format I wanted. So, like, I'm really impressed. I'm going to just daily drive it for as long as I can possibly stand, outside of probably harder problems.
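The idea of simultaneous tool calling, where several tool calls are dispatched at once instead of one per turn, can be sketched with plain asyncio. This is only an illustration of the concurrency pattern, not how any particular model or MCP client implements it: `fake_search` is a hypothetical stand-in for a real search tool, and the queries are made up.

```python
import asyncio


async def fake_search(query: str) -> str:
    """Hypothetical stand-in for a search tool call (no network access)."""
    await asyncio.sleep(0.1)  # simulate tool latency
    return f"results for {query!r}"


async def run_tool_calls(queries: list[str]) -> list[str]:
    # asyncio.gather runs the calls concurrently and returns results
    # in the same order as the inputs, so total time is roughly one
    # call's latency rather than the sum of all of them.
    return await asyncio.gather(*(fake_search(q) for q in queries))


queries = ["AI news this week", "new model releases", "video model updates"]
results = asyncio.run(run_tool_calls(queries))
for line in results:
    print(line)
```

A turn-by-turn model would instead await each call before issuing the next, which is where the extra waiting described above comes from.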
B
So to give you an example of what I was doing, I was testing all the Microsoft MCPs, so like SharePoint, Outlook, Calendar, those kind of things, Planner and To Do and stuff like that. And what I do is I upload a file to my OneDrive, and then I say, can you download that file and then email it to me with sort of a summary? And what it is, it's my electricity bill. And then the AI sort of admonishes me for how much money I spend on electricity and gives me all these analogies and, like, big red letters, like, you need to do something about this. And Haiku did it just as well as the other models. It was great. And it did it all with simultaneous tool calls as well.
A
Yeah, so that's the thing, like, when you're dealing with those kinds of important use cases, like getting critiqued on how much electricity you use, it's great. It works tremendously well. So I also have been playing around with Veo 3.1. Last week we got Sora and Sora 2 Pro, and those models, I feel like, and I'm sure maybe, you know, put your hate in the comments below if you disagree, but I think they sort of got somewhat revealed for what they were when they released the API without the watermarking and without the tune of, like, the sort of TikTok hilarious video, which I'm not discounting at all. I actually think those videos were really good until the copyright restrictions hit. And so then I just started using it against Veo 3, which I tried to show on last week's show. But unfortunately OpenAI's new agency computer like.
B
Melted down or blew up or something.
A
It fully crashed my computer. So I wasn't able to show you. But yeah, the comparisons were pretty meh. Like, Veo 3 in general, even though it's a little bit pricier, but not much, just far outperformed it. So Google of course pushed out Veo 3.1 and you might think, oh, it's just a slight improvement. But no. So there's some new features, and the new features are: now you can have a start and end frame, so you can give it where you want it to start, where you want it to end, and it'll, like, fill in the bits. And so I was able to produce a video, which I showed you earlier, where I gave it two images. Me with sunglasses on, sitting at this exact desk, and then without. And I asked it to create a video where I, like, put the glasses on. So here's the... this is Veo 3.1 running on SIM theory with first and last frame. This is Veo 3, so you can.
B
See.
A
Does pretty well. This is the. Keep in mind this is the lower quality model like version of it because it's cheaper and I'm cheap. But yeah, so it goes from the first frame to the last frame and the last frame obviously was me putting on my sunglasses. So it's not perfect yet, but you can see how this could be used for online advertising or just like e commerce websites where it retains the product. Really?
B
Yeah. And I think that the, the only weaknesses for me are things like your teeth look a bit weird and you know, just some of the stuff around the talking. But you can tell that they'll also solve those problems in no time. The ability to keep your character, the ability for it to know how to transition to the last frame is incredible. Like it's a really major advancement. You've got so much more control over what happens in that.
A
So then the other thing they added, which is really cool, is the ability to basically, like, give it a series of images and join them together as part of a scene. So what I did was I uploaded my photo, like just my face, and I used the image tool first of all with character reference to create an image of me in an astronaut suit on Mars, right? So you can see that on the screen now if you watch. And then I also asked it to create two more images. So you want to look closely at these. A cinematic image of the Mars landscape and then an alien. And I thought, there's no way this is going to work. And then I said, okay, now use all three images for reference to video. So reference the videos, the skill. I just wanted to be precise in Veo 3 to make a video where I'm walking along a scene on Mars that looks like the Mars image. Then this alien appears. So it's a terrible prompt. I don't deserve the output I got. And here is what I got. Very cinematic. Here we go. Like, how crazy.
B
That's the actual landscape, the same alien, and your face in the spacesuit.
A
And I mean, like, it's close enough to me that you'd believe it's me, right? Like, I. I mean it. That is just so cool. Once this gets, like, obviously we talked about this with sort of like, longer output and more control and better tooling around it. Like, I would even just say better prompting, better tooling, you could probably put together, like, someone could easily build, I reckon, like, a really fun video editor where you could make your own, like, stories and stuff now.
B
Yeah, I mean, it definitely seems like maybe a bit of a marketing problem for Google, in respect that, like, if this was OpenAI, they would have had a keynote, they would have brought everyone out, they would have had, you know, hype everywhere on X and all this sort of stuff. Whereas with Veo 3, the only reason I even knew it came out is because you told me, because you're into that stuff and you actually tried it, but you don't hear much about it. Like, Google isn't hyping it up the way the others are.
A
No, I don't think it got a lot of pickup, because there was a lot of initial excitement around Sora 2 and that Sora app, but then that's, you know, fallen off a cliff. And I think in a way, you know, the sort of consumer of AI is a bit exhausted. Like, this is like, oh, wow, cool. But the wows are getting less and less, even though I think the models are improving. I think people are just getting so used to the technology now and so used to what it can do. Nothing really shocks or surprises them anymore. And therefore, you know, people either are pretty dismissive of it or, the reality, which I think is probably closer, is a lot of these tools just aren't ready for prime time yet. Like, they're just simply not good enough for any commercial, or not any, I shouldn't say any commercial use, but like obvious commercial uses. So you just dismiss them and move on. But if you think about stitching this stuff together, like with ElevenLabs, training your own voice, like the pro training voice version, being able to produce audio of different characters in a video, putting yourself or just specific characters in a video and then compiling that, like, you could do it. Like, you definitely could build this stuff. And I'm so tempted to upgrade Video Maker to be able to do, like, AI video film clips and, like, actual short films and stuff. But the problem then comes to, and this is again another problem with these video models, they're just too expensive. Like, this video cost me $4 US. 4 bucks. Yeah, like, am I really gonna pay this to laugh?
B
And I definitely think like when you think about the advertising industry, like making short ads, making you know, individual clips and stuff, the value probably is there for certain people, depending on what they're doing, but it's not there for like what we want to do, which is muck around with it and experiment with it. Because you're gonna need some sort of return in order to justify spending, you know, $4 for 10 seconds or whatever it is. And that's even if you get it right the first time.
A
This is the problem for developers as well, right? The only way you can figure out if there's utility in these applications, especially as, like, an indie developer, is to play around with them. Or even as an end user. Like, if you've got a subscription to SIM Theory and you've got Veo and Sora and these audio tools and you're thinking, like, maybe I could, you know, play around with these to get an idea if they would have practical utility for something in my business. Like, you really would have to spend, like, a couple of hundred bucks, like I do each time when I'm working on them. And I just think it makes them inaccessible. And to me, and maybe it's already in place, but I think Google or OpenAI should have some sort of, like, video developer allocation where they're giving away credits and sort of taking the hit, or a loss leader, on this stuff so that devs can afford to play around with it. Because honestly, if I was doing this for longer, you know, to build that Video Maker, I think I spent like 250 USD to build that. Now if I had to keep iterating on that day after day after day, like, that becomes very inaccessible very quickly.
B
I wonder if they could just have a mode that, that just has like a heavy watermark across the top and so you can have it in development mode to work with it, get an idea of what you want to get to, then if you want to publish it, then you've got to pay. I understand then that'd be taking the loss. But to get the adoption there and get your model being the one that everyone wants to use, it might be an idea.
A
The fact they charge so much says to me that they're probably already making a loss. You know, like I wouldn't be surprised if they lose money on each generation and, or like, I mean OpenAI definitely was with Sora and still is. And so it's like how long can they keep like just absolutely pissing away cash on this stuff. So I think, but to the haiku thing, I think the optimization, as you said, the speed and efficiency of these things is just, has never been more important because if they can get them down, then a lot of these use cases start to become a reality. Like maybe it's like we don't need better tools, we need you to work on optimization. Like please.
B
Exactly. I think that's the thing. Like, you were asking what would we want in a Gemini 3, and it's, like, cheapness and speed, because if you've got those things, the models are already very, very capable. And I think some people are artificially constraining their use cases or constraining the adoption of the models because it's too expensive. So if you think about, like, the mass rollout of quality agentic systems, the only way that's going to be possible is with affordable models. Which means that when people are doing the larger scale-outs, they probably are using something more like a Haiku or a Flash, because you just can't use a frontier model for that stuff. The payback just isn't there. So therefore I really feel like in some ways those models are better because of that trade-off.
A
Yeah. And I increasingly think they're getting to a point, at least the optimized models. And I don't want to speak too soon about Haiku, because I tried to use, after we talked about it, the new Gemini Flash tune, because it was so much better at tool calling. But I just got the sense after a while, I'm like, oh, this thing's just way too dumb and just doesn't interpret my dumb prompting. Like, if you put detailed prompts in, I'm sure it would be really performant, but I'm lazy and I do bad prompting, so I rely on the more intelligent models to basically fix it for me. And so after a while I'm like, there's no way I can stick with this. So I'm really interested with Haiku. If I can stick with it, then that starts to change the equation a little bit. But again, still, I mean, it's cheap, but when you can access GPT-5 for $0.5 more per million input, it's like, you know, you really want it to be fast and pretty good at tool calling. Otherwise why would you switch?
B
We added some other models during the week as well, most notably GLM 4.6. So everyone would be familiar with GLM 4.5, which was the previous one we had. Now we have GLM 4.6. It was really popular in the open source communities. A lot of people liked it. I've used it a little bit. It also seems a bit Haiku-esque in the sense that it's cheap. It also is really good at tool calling and just seems like an all-rounder in terms of, like, a model. I feel like if you're going to host your own model, if you're going to fine-tune a model, GLM 4.6 is a pretty good starting point.
A
Do you know, someone said in the community this week, and I really agree with them, that while I think the models are improving, they're equally becoming so commoditized. But the tunes of them have such different strengths and weaknesses that, you know, if they were truly commoditized you would just pick, like, the cheapest, fastest model, right, and just stick with it. But you are still getting performance gains in the areas they seem to focus on in that particular model release. So I think with the Sonnet and the Haiku 4.5 series, it's really just about maximizing agentic tasks, and a lot of that work was around the Claude Code product, which I think they're making an absolute killing from. So that's why they're optimizing the models in that direction. So I do think that there is a lot of commoditization going on, but it's brilliant to be the consumer, in this case, of the models, because you have lots of options, lots of choice, and you can really now lean into the models with the different strengths and weaknesses once you're familiar with them. And, like, it's not gonna burn you in price, unless of course you want GPT-5 Pro.
B
Yeah, that one kills them all. But also I think it means you can really build in anticipation that the frontier models will eventually become the cheaper ones. And you can, you can just rely on the fact that you're going to get that next gen model in there for a better price pretty soon. Like it doesn't have to be like a permanent economic equation that makes you lose because you know at some point it's going to work.
A
Yeah. And they keep getting better and the price sort of keeps coming down. Not always, but I think.
B
Well, I mean, the only issue is when they're deprecating the old ones so you have to switch to the new ones and therefore pay the higher price.
A
Yeah, I mean, in an ideal world you'd be able to run the very best model as fast as possible locally on your computer. So you're totally in control and it's super fast inference. But I don't know. That day I think is a ways off.
B
And there's no money in that for anyone.
A
Yeah, well, yeah, that's true. So there's no incentive to do it. Okay, so big news from OpenAI. Huge news. What's the thing there? We say insane. Insane news. Sam Altman posted this.
B
It would be funny. Sorry, but if a leader of a company actually was insane and they did stuff that wasn't in the company's or their best interests and you're like, that guy's actually insane. He deleted all the files on their corporate server and then he burnt down the building and then he yelled at people on the streets.
A
Yeah, I think then it would actually warrant insane. But anyway, so Sam Altman did something insane. "We made ChatGPT pretty restrictive to make sure we were being careful with mental health issues." Sure. "We realize it's made it less useful, enjoyable to many users who had no mental health problems. But given the seriousness of the issue, we wanted to get this right. Now that we have been able to mitigate the serious mental health issues and have new tools, we are going to be able to safely relax the restrictions in most cases." First of all, don't know what he's talking about. "In a few weeks, we plan to put out a new version of ChatGPT that allows people to have a personality that behaves more like what people liked about 4o. We hope you will like it better. If you want your ChatGPT to respond in a very human-like way, or use a ton of emoji, or act like a friend, ChatGPT should do it. But only if you want it, not because we are usage-maxing. In December, as we roll out age gating..." Age gating. Oh, can't wait. "...more fully, and as part of our 'treat adult users like adults' principle, we will allow even more, like erotica for verified adults." Now this sent everyone into an absolute spin. OpenAI getting into the porno, the porn business.
B
So firstly, comparing anything to GPT-4o, that is by far the worst model they ever released. It was a piece of crap. So I don't know why people are out there pining to get that back. And I agree the GPT models definitely don't adopt the personality to the level of some of the other models. But I would say at this point, aren't we just kind of used to the restrictions? Like, yeah, it's not going to help you make a pipe bomb and it's not going to, like, help you plan to murder someone or, you know, those kind of things. It's going to refuse those. But all the models kind of do that, right? Like, I think everyone's over that stuff. Like, in the early days I really wanted an uncensored model just to see what the thing would come up with. But I don't really get it now. Like, do that many people really... is there really that much demand from their customer base for porn? Like, surely they've got more noble goals than that.
A
Well, there was some good memes around this. There was the meme with the, the two buttons and the sweaty astronaut cure cancer or erotica and the fingers trying to decide between which one. But I, I don't know. Like, I don't want to criticize him too heavily here because I think what he's trying to say, but maybe framed it very poorly, is like, we heard that you want more control over the models and you feel like you have this like personal relationship with them. We don't really want to have too much control over the personality. So we're going to give you tools to have control. I think, great, why not? Like, and I think 4o from a consumer sort of chatbot point of view, people did really like, I think from, we just look at it from a, like actually doing real work. So, so I think.
B
But isn't the real thing here the age check? As in, i.e., we are going to hard ID every single person on the platform with, like, you know, passports and face scans and stuff like that?
A
Yeah, so let's talk about that because we had like, we, they introduced this into the sort of Enterprise end. Right. So for us to get access to the latest models, they. They hid this under the guise of, you know, we don't want the Chinese training on our models because then they might beat the US in AI and, you know, we'll. We'll forever be enslaved or something. Some weird fantasy there. And so we had to do this. Like, we had to do the whole.
B
I had to give a blood sample, pee in a cup, send some DNA.
A
A lot, a lot of evidence that we were alive, and we didn't really love it, but we had to do it to get access to, like, all the models and things. So I at the time felt like that was disguising the fact that they, yeah, they just wanted to, like, verify all their customers. Now it feels to me here like this "treat adult users like adults", I agree with that principle, and I think you should have control over the model. To say, like, if I'm interested in this stuff and it's not going to harm anyone, and you're an adult in your own time and your own privacy and you want to go do that stuff, I don't care. Like, that's fine by me. Whatever. You're not hurting anyone. And I think that's good. And that's what we were calling for early on. We were like, treat adults like adults. So I think they're finally somewhat agreeing with that. But the problem I have with it is this verification. And I understand why you need to do it because, like, kids, you don't want them accessing things, or you don't want the chatbot getting all, like, you know, horny with them or whatever. And so it's a tricky situation. So I get why they want to ID, but then you also look at the dark side of that, which is, like, what if they get hacked? What if the government gets access to it? And Jason Calacanis tweeted, and I think it's, like, pretty extreme, but I do think it's pretty sensible about where it could go, is like, "the blackmail is going to be fast and furious. Send me a bitcoin or I'm going to release your late night ChatGPT sessions. Assume someone at OpenAI is reading your prompts and sharing them with their friends." Look, I don't think they're doing this stuff. But it does make you wonder, right?
B
Yeah, but I mean that. The problem with that statement is it's a bit like saying any system could be hacked. Like our national airline, Qantas I think got hacked again this week. Like they get hacked every other week now. So like all my personal stuff's been just distributed over the dark web repeatedly by these big companies. So I think that, yeah, okay, that's like saying any company can be hacked and probably will be. But the thing that really strikes me about it and why I just, I just get the wrong vibe from it is. And I get, I know I don't understand the economics of it, but clearly OpenAI is positioning themselves as like the consumer AI, right? Like they, they on one hand will give examples for businesses, but if you look at a lot of their presentations, like the UI stuff we want to talk about soon, it's all like planning a trip, you know, writing a card, doing a presentation for your mid level white collar job or whatever, and then watching porn at night or like chatting to a porn bot. It's like they clearly are going all in on that sort of consumer side because to me, if anything, businesses actually want control over the restrictions on the model. So for their staff, for example, they don't want to restrict what they can do work wise, but they do want controls in there to protect the staff members themselves when they're working and be alerted when the system is being misused and things like that. So if anything, I would have thought that maybe developing those controls more would actually be valuable rather than going, oh, you know what, screw it, do whatever you want, guys, as long as we get your DNA.
A
Well, I think I, to be fair to them, I think that what they're saying is they're going to allow you to choose as the consumer. And I'm, I think if they have that level of control, then you'd think they could deliver it at a business end. But I do find it kind of weird. Like a couple of weeks ago it was all about him going on a podcast being like, we just don't want anyone to, you know, commit suicide or whatever. So we're, you know, helping guide the model in these areas to stop it, which I think is quite noble. But then the next week it's like, oh, for people without go wild. Also, there's parental controls and I just don't see why they have to do this. Like, why, why is the, the real question and there's got to be more to it. There's always an ulterior motive and it always comes down to money and power with these people. And so I'm, it does. I get the impression they're just throwing things like throwing darts and seeing what sticks. Like if we hear people more addicted to this thing.
B
The question also then becomes, and I know this is probably already possible and already happening, but what happens when, in a crime or something, the government subpoenas OpenAI and says, I want all this person's chat logs? Like, that Calacanis line isn't that far-fetched when it comes to, like, the court's power to compel them to hand over people's chat sessions.
A
Well, yeah, I mean, you're gonna get it. And they hand over Internet sessions, so I don't see how it's any different. But yeah, I'm interested in what listeners think. Like, if you want to leave a comment below, like, are you pro the erotica stuff? You don't really want to put your name to that. Don't say that. Just say like, oh, you know, I don't want age restrictions or something. We can have a safe word. Safe word. But anyway, the memes have been great. So regardless of all this and the controversy, which I'm sure they didn't mean to blow up this much, the memes have been excellent. Let me show you one of them. It is Sam Altman as porn star Bonnie Blue, holding up a sign. A thousand. She held up this sign. So I'll let you figure out what that's referring to. But it's Sam Altman's face on her in this particular photo. And it's referencing the Salesforce partnership that was announced with OpenAI this week, and the idea that OpenAI is really just partnering with literally everyone and anything. And it seems like this has become the strategy for SaaS companies on the, like, public market when their stock price is, like, flatlining or in sort of, like, slow decline. And there's been a lot of good commentary around this. So: "Breaking: OpenAI to partner with OpenAI to help fund OpenAI. OpenAI up 90%." There's another one, the, like, "say the line, Bart" meme. "Say the line, Bart." "We're partnering with OpenAI." And then everyone cheers. So yeah, it's interesting. And then another one: "In case of emergency, break the glass." And the glass is "OpenAI partnership". So there's two ways to look at this. I think there's the Salesforce example, like, quite literally just waving the white flag to these guys and saying, like, you know, take our user interactions into ChatGPT. That's fine. Like, you know, have access to Slack. That's also fine. And I think there's that part of it, and then there's the other part, which is any company that doesn't really have an announcement. In this sort of AI world where the only growth in the US economy is AI, if they want to grow their stock price now, they just announce any partnership, even if nothing ever happens around it.
B
What I would really want to know with Salesforce is, all their Einstein AI stuff that they had whole events about and, like, tried to sell people on and stuff, what was it? Like, what was actually underneath it? What did it do? Was it just like a mechanical turk, with people just, like, rapidly typing in the background? But really they're just good at selling stuff. They don't actually really have anything to sell. They just sell ideas and concepts and words, and then people pay millions of dollars for them. It's kind of crazy, but I'd be pretty mad if I had paid for that thing and then it's like, oh, you know what, we'll just use OpenAI. It's better.
A
Yeah. And they had like, didn't they have an Einstein model too? Like an actual like they, they were putting out a lot of open source models and clearly they just didn't see. They just are never going to get there. So they, they did the weird thing which is like we're partnering with OpenAI but we're also partnering with Anthropic for certain use cases. Like anything that's interesting. Yeah, anything sort of semi serious.
B
Anything that needs to work.
A
Yeah. For anything that requires tool calling, we're partnering with Anthropic. But yeah, so this is, like, integrations across platforms, and the main reason I wanted to bring this up is not to, like, read their press release but just to show how quickly they conceded. And I'm like, is this a sign of things to come? Like, all these companies conceding the sort of user interaction with their tool? Because we predicted this. We're like, everyone's just going to want to operate through their platform of choice, whether it's, like, Claude or ChatGPT or Sim Theory or whatever it is, like your own custom internal, like, chat thing.
B
You're.
A
That's the frame that you'll eventually operate all this software through. And I think the MCP UI, while I'm not that certain that is it, I'm pretty sure that this is the way forward. Especially being able to just get so much more done. And so they've said this Agentforce 360 apps, like, what is that name? It's going to be available in this ChatGPT apps program. You'll have access to, like, CRM data, Tableau, all this kind of stuff. Now, interestingly, there are already MCPs for all of this stuff. So I'm assuming maybe they'll just, like, take them and rewrite them a little bit.
B
They're not MCP by them though. It's just by like enthusiasts or whatever.
A
Yeah, yeah, sure. But I don't need to try them.
B
They're all awful. That's the problem. Like, you almost need an officially supported one, because their software is so complicated to actually get it working. And my big fear is that these big partnerships are going to lead to this walled garden effect where it's like, yeah, we have this open protocol, MCP, but it's only when there are elite partnerships involved that people can actually use those MCPs. And I don't know that that's going to happen. But I worry that these top-level partnerships will lead to that kind of thing.
A
Yeah. And it could be especially problematic for organizations where they do want to benefit from the MCP, but then they can't use it in their own system. Like, if they're building an agent system, they might want to rely on the Salesforce MCP, but they can't. And I think this could be a problem. And look, Salesforce and all these guys will definitely hold hostage all of your data that you paid to store in their systems. There's no doubt in my mind, like, this is 100% going to be the strategy.
B
Yeah. And it makes you wonder if a lot of companies are sitting back contemplating, like, do we have a fully open MCP, which would be the best? Or are we going to protect the data, like you say, protect the user's data from themselves, so we still own it and we still have control over it? Like, because I reckon they really are thinking that through right now.
A
Like we said this last week or the week before, like they, we know they are like we've talked to some of these people. Like they're all struggling.
B
I just didn't want to like name names, but I know that we know that there's some top level companies. Exactly. Contemplating this thing.
A
Yeah. And so it's like, do you go fully open and just like, like sort of those open. Like you know, in the early days of, you know, everyone had to have an API and then slowly there was pullback where people realized, oh like switching costs are so low because you can just suck out all the data.
B
Yeah.
A
I think this is the interesting path we face here in, like, a SaaS world. The first thing is like, okay, I'm in ChatGPT. I'm using Salesforce, I'm interacting with it, like we've said before. And then it's like, okay, now they have a database, now they have a better UI, now why do I need Salesforce? Like, that's definitely the.
B
Well, not to mention that clearly. I mean you just saw it make a Mac operating system. The thing could literally just infer how the app works on the back end as it goes. Like as it calls the tools, it could gradually just patch that back end, then like build the backend as it goes and then once it's got all the data, it's replicated your app. Like it would be very straightforward and we've talked about this before, but copying SaaS apps, especially if you had access to the data, like that would be very easy now, like, like in a sort of automated, ongoing fashion.
A
But herein lies the problem. If you go and build your app for this new ChatGPT app store that I assume will eventually come, which, I don't know, it's probably still a good strategy in the short term for distribution, but like, you know, well, actually on that distribution front, are people really going to go into ChatGPT and go, oh, like, Salesforce is in here. Great. I'm going to pick that as my CRM. I'm not so sure for those established markets. Maybe, like, newer markets, potentially. But I think what's going to happen is they're going to look at, like, the top five used apps or top 10, then they're going to go clone them all and just bake them into ChatGPT. I mean, it's just so obvious that's.
B
Well, they've done it before, right? They took a lot of early AI startups ideas and just took them like cloned them wholesale.
A
Yeah. And it look, if I was them, I'd do the same thing. I'd get my app store going, I'd figure out like what makes the most money, what areas should we focus on? And then I would lean in heavily. And so now like you think, say you're like a productivity company or startup or CRM or whatever it is, you're thinking, well, do I lean into this or do I stay out? And if I stay out and my customer is not going to be happy because they want access to their data and they want me to have an app and maybe they'll churn as a result of me not having an app or do I lean into it really heavily.
B
It's going to become very common, people asking their SaaS providers, do you have an MCP? No? I'm going to someone else who does, because I really need that. That's how I work now.
A
Well, I would, that's the thing. Like, I look at the inverse of this problem, like, being on the consumer or, like, the business user side, and I'm like, I would 100% leave the SaaS app to go to another one that did have a really good MCP. Like, look at Stripe. I think that's a pretty good example. They have nailed the MCP experience. Like, it can do everything. The only thing it can't do is, like, hard delete stuff. But I don't know if I really even want it to be able to do that. So I'm fine with that. Like, I'm happy to have to log in and do all that stuff. But in terms of you using a.
B
Different like square or something like that now you probably would consider switching for that, right? Like it can really help a business.
A
Well, I would say the companies like, you know, Chargebee and Recurly and, like, all these, like, layers on top of the payment layer that are somewhat, I guess, competitive with Stripe. Well, somewhat directly, but somewhat indirectly at times. I think they're the ones that are gonna be crunched by this, because if you're on one of them you're like, well, you know, Stripe has this great MCP and, like, I don't really need that layer anymore, because the AI can help you figure that stuff out.
B
What value does the app add? Is the value add really just, it's a CRUD app with a database and a few processes, right? Like, if that's what your business does, I'd be seriously worried right now. Like Salesforce, for example. Whereas you look at a company like Twilio, who provides, you know, phone infrastructure, SMS infrastructure, things like that. I reckon if anything those companies will become more valuable, because they will be the, you know, access to the real world for MCPs and things like that. Like, I actually think it enhances their business quite a lot. Whereas it has the power to completely destroy something like a Recurly or Salesforce. Because really their whole value add is literally just a database with some code on top of it.
A
I mean, to be fair, there's like a ton of business logic built into these applications. And Salesforce in particular, like it's really just a GUI for staff on top of an SQL database. I think it's far more at risk long, long term, like not short term because like there's so many, like, security protocols. The enterprise moves so slowly. But yeah, for new startups, like, why would you bother if you could have a database within your AI workspace?
B
Well, when you see that line item on your bill, like 20 grand or 30 grand or whatever it is, you just like, well, hang on, I can get most of this just working with an agent. Like, you'd start to wonder like, do I actually want this long term?
A
Well, I, I mean, I, I called this on the support platforms as well, like Help Scout, Zendesk. I mean, even, even though we use Help Scout, it's like all we really need is an MCP that connects to a shared inbox.
B
Yeah.
A
And has a database where you, like, allocate this email, like, this email ID from that inbox, or subject line or whatever, to a particular agent, which is linked to, say in Sim Theory, like, the user ID. And, like, all of a sudden the app is pointless, because you could just be like, show me the new tickets. And maybe not even do that. Maybe you just have a home screen where it shows them. And then you're like, okay, now go and answer them all. And then you review them. And then there's somewhat of a process built around that.
B
Like, and the crucial factor here that we spoke about briefly last week is that MCP UI protocol being added to MCP. The idea being that an MCP can specify how you display different UI elements for input and output in order to interact with it. And if you add those elements, then your idea there of having an automatically generated Help Scout dashboard is perfectly possible. You simply add that to its resources in the MCP and then you've got that. And so a lot of these applications actually will be able to be replicated in whichever MCP client you're using, as long as they support the protocol. So it's not like it's even one that will necessarily win here, assuming they're actually open.
A
But this is why I'm bullish on these apps not being powerful for existing startups, I'm bullish on them being powerful for new startups. And the reason being is like, if you came at this and said, I'm going to build the universal MCP for support, just customer support. Right. And you just have a database that can't be seen. Really nice integrated UI elements and good, like all the user also like everything baked in, but it's all connected to like the ChatGPT user or the users of the MCP. All of a sudden you're like, well, I Can just buy my software through this store and I, maybe I pay a few extra bucks per user a month. But like it's all integrated, it's all there. It's, it's how the new sort of AI first worker wants to work.
B
Yeah, and I, I strongly agree with you there that there's a real need for like an MCP first approach. Like as in we didn't just like wrap an MCP around an existing API because it doesn't necessarily work in that way. You see a lot of the MCP struggles where like Outlook, for example, if you want to email someone, it's got like, oh, I'll just look up the contact. Okay, now I've found the contact. Now draft the email. Now send the email. So it's doing a lot of these sort of unnecessary, well, not, they're not, they are necessary because it needs to do them, but like in my opinion, unnecessary steps. Whereas a built for purpose MCP could actually realize use cases and have the tools as use cases rather than being, you know, just a disjointed set of tools that might be automated in another app consuming the API normally, if you know what I mean. So I think that this idea of, you know, dedicated paid MCPs like you say, that are just experts in their area will really take off and I think we deserve full credit when that happens and some of the profits.
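A minimal sketch of the "tools as use cases" point, in Python. Every name here (`CONTACTS`, `lookup_contact`, `draft_email`, `send_email`, `email_contact`) is hypothetical and stands in for whatever a real mail API exposes; nothing is taken from Outlook or from any MCP SDK. The first three functions are the granular steps a model would otherwise have to chain one call at a time, and `email_contact` is the single use-case tool that does the chaining itself.

```python
from dataclasses import dataclass

# Hypothetical in-memory address book standing in for a real contacts API.
CONTACTS = {"dana": "dana@example.com"}
# Hypothetical outbox so the sketch has an observable side effect.
SENT = []


@dataclass
class Draft:
    to: str
    subject: str
    body: str


def lookup_contact(name: str) -> str:
    """Granular step 1: resolve a name to an email address."""
    return CONTACTS[name.lower()]


def draft_email(to: str, subject: str, body: str) -> Draft:
    """Granular step 2: build a draft."""
    return Draft(to=to, subject=subject, body=body)


def send_email(draft: Draft) -> str:
    """Granular step 3: 'send' the draft by appending it to the outbox."""
    SENT.append(draft)
    return f"sent to {draft.to}"


def email_contact(name: str, subject: str, body: str) -> str:
    """Use-case tool: one call from the model instead of three round trips."""
    return send_email(draft_email(lookup_contact(name), subject, body))


if __name__ == "__main__":
    print(email_contact("Dana", "Q4 deck", "Draft attached."))
```

Exposing only `email_contact` to the model trades flexibility for fewer round trips; whether that trade is right depends on the use case.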
A
But yeah, so the, the other thing though, I would say is maybe that's the shorter term. But then the next step you can imagine is a big enterprise is like, okay, cool, we can go and pay for all these disparate apps that we have no control over when we know our own business better than anyone. We might have all of our data stored in a data warehouse where we know the data structure, we're fully aware of it. So it's like, okay, well why do I need any of this? I'll just build my own internal mcps, hire like a couple of people internally to maintain them. And now I can replace like solution after solution. I can drive down my IT spend completely. And because it's powered by AI, it's like it doesn't really matter as long as the core platform's got, you know, the security and the permissions and stuff baked in.
B
Yeah, well, the protocol does. That's the thing. Like, if you build an MCP for your company, you can already do it with OAuth, right? So you've got that level of security there, which is strong. You can host it yourself over HTTPS, you can IP restrict it to just the clients you want to allow, and you've got, you know, enterprise-level security straight away. Like, it really isn't that hard. Not to mention, as you say, think about how many organizations now would be syncing their internal databases and systems off to, like, a Snowflake or Amazon S3, and then they have something that reads that in and maps it into Tableau or into some other API and some other system, and they've got all this, you know, stuff just to get access to their things. And they can just cut all of that out of the picture, have an MCP for their company and then just use it in their favorite, AKA Sim Theory, MCP client. And I think that that is going to become really common. And the other thing is, I really feel like it couldn't be emphasized enough that if you do it in the right way, it's actually a lot more trustworthy than all of these other systems you have going on already, or at least the same. So I really feel like it is the future of interacting with big company data.
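As a rough illustration of the IP restriction plus token check described here, this is a small Python sketch using only the standard library. It is not an OAuth implementation: a real deployment would validate tokens issued by an authorization server, and the static `EXPECTED_TOKEN`, the network ranges, and the function names are all assumptions made for the example.

```python
import hmac
import ipaddress

# Hypothetical allowlist: an internal range plus one documentation range.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("203.0.113.0/24"),
]
# Placeholder secret; stands in for real OAuth token validation.
EXPECTED_TOKEN = "replace-me"


def ip_allowed(client_ip: str) -> bool:
    """True if the caller's address falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


def token_valid(auth_header: str) -> bool:
    """True if the header is 'Bearer <token>' and the token matches."""
    scheme, _, token = auth_header.partition(" ")
    # compare_digest avoids leaking match position through timing.
    return scheme == "Bearer" and hmac.compare_digest(
        token.encode(), EXPECTED_TOKEN.encode()
    )


def authorize(client_ip: str, auth_header: str) -> bool:
    """Both checks must pass before a tool call is served."""
    return ip_allowed(client_ip) and token_valid(auth_header)


if __name__ == "__main__":
    print(authorize("10.1.2.3", "Bearer replace-me"))
    print(authorize("8.8.8.8", "Bearer replace-me"))
```

In practice a check like `authorize` would sit in front of the HTTPS endpoint that serves the MCP, so requests failing either test never reach a tool.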
A
It also stops the problem of, like, where you've got all this disparate data and different systems, where you're sort of thinking, oh, one day we'll clean that up, and you never will. Like, the truth, come on, like, never. So it's like, if you just build endpoints, and it can be one MCP, like, that's what we have, where it's just one MCP with the tool calls, and the tool calls just know where to go fetch it from securely. Then you start to think about that new level of customization of software here, where it's customized to how your business works. Like, the actual process that you're running to handle customers or, you know, whatever it may be, and then that starts to become agentic. It has direct access to systems, it can make changes on your behalf, and then the human's just sort of approving. Like, I think this is so much bigger than people realize right now. Like, this might be bigger than all of software as a service and all of the App Store and all of that stuff combined. Like, I think it could be far in excess, and especially because the.
B
Models can already do this, it's really just giving them access to what they need and like to the point where not only can they do it, once you've got your base MCP in place, you can point it at your database or your schema or whatever the system you're using and going, hey, you know, could you add, like, what useful tools could you add in here that would help me get these jobs done? And it's like, here they are, sir, and put out all the code and you add that to your mcp. Now you've improved the amount of tools you've got. And I think the next step that we're both looking at is, okay, now we've got the tool set, we need to combine those into skills. And what we mean by skills is like, procedures, like a series of. When this happens, this is the procedure you follow. Then you start to build up a bank of those, and then it's a matter of time before the agency gets good enough where you can start to have a role. And the role is performing these tasks when these things happen. So when this event happens, perform this procedure or these series of procedures. When this happens, you need to seek approval and then perform the procedure or whatever it is. And I think that this is going to be the future of how businesses work. And so this idea of agency is going to come in a sort of iterative fashion. Like, we're going to get there gradually. It's not just going to be one glorious day where it's AGI and it can do everything. It's going to be a gradual process where just like now, we're constantly typing everything into an AI terminal to make decisions and do things. It's going to be, holy, wow, I've got like 20 of these things running and it's basically running my whole business. Like, it's, it's going to be like that kind of thing.
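The tools, skills, and roles layering described above can be sketched in a few lines of Python. This is only one possible reading of the idea: `Skill`, `Role`, `fetch_ticket`, `draft_reply`, and the event name `new_ticket` are all hypothetical, and the "tools" are plain functions rather than real MCP tool calls.

```python
from dataclasses import dataclass, field
from typing import Callable

# A tool takes the running context and returns an updated context.
Tool = Callable[[dict], dict]


def fetch_ticket(ctx: dict) -> dict:
    """Hypothetical tool: pretend to load a ticket for the customer."""
    return {**ctx, "ticket": f"ticket for {ctx['customer']}"}


def draft_reply(ctx: dict) -> dict:
    """Hypothetical tool: pretend to draft a reply to the ticket."""
    return {**ctx, "reply": f"Re: {ctx['ticket']}"}


@dataclass
class Skill:
    """A procedure: an ordered series of tool calls."""
    name: str
    steps: list[Tool]
    needs_approval: bool = False

    def run(self, ctx: dict) -> dict:
        for step in self.steps:
            ctx = step(ctx)
        return ctx


@dataclass
class Role:
    """A role maps events to the skill that should run when they happen."""
    name: str
    on_event: dict[str, Skill] = field(default_factory=dict)

    def handle(self, event: str, ctx: dict,
               approve: Callable[[Skill], bool]) -> dict:
        skill = self.on_event.get(event)
        if skill is None:
            return {**ctx, "status": "ignored"}
        # Human in the loop: some procedures need sign-off first.
        if skill.needs_approval and not approve(skill):
            return {**ctx, "status": "rejected"}
        return {**skill.run(ctx), "status": "done"}


support = Role("support", {
    "new_ticket": Skill("answer_ticket", [fetch_ticket, draft_reply],
                        needs_approval=True),
})

if __name__ == "__main__":
    print(support.handle("new_ticket", {"customer": "acme"},
                         approve=lambda skill: True))
```

The `approve` callback is where the "seek approval and then perform the procedure" step lives; swapping it for an automatic yes is the gradual move toward more agency.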
A
Yeah, this is where, if you're spending a lot of time using it all the time, or thinking about where it will go, and also using it and trying to get there, it's a lot more obvious than what companies are necessarily presenting as the vision. And I think this is why there's this disconnect between, you know, people in a business going, oh, it's not even that good, it's not that helpful, versus what people like us might see. So let me illustrate that here. So as part of the Salesforce announcement, they have this, like, apps and agents section, and they've got, like, Google Agentspace, Claude, Dropbox, for some reason, like, who knows why, Notion, of course, and Perplexity, Tableau. But I get it. It's sort of showing this app screen, and in their world, they see ChatGPT and Claude as, like, an app. And then the example is, hey, ChatGPT, can you turn my Q4 deck from Google Drive into a post for leadership? Use bullet points to highlight what's important. And cool. But wouldn't it be better if you showed it, like, coordinating, like, oh, list all of the support tickets, or list all of our current sales opportunities and figure out how I can progress those deals, or, I don't know, more like real use cases? And then that sort of takes me all the way back to what was presented at the dev day, which I wanted to touch on again. Like, we've got the Booking.com app, the Canva app, the Coursera app. And these are just so, yeah, like.
B
The future isn't apps. We already have apps. We have helpful. It's like, use your apps 10% more efficiently. Like, like browsing booking.com with an AI, like going, Ah, wouldn't it be great to go to Chicago this weekend? You know, like, it's just not helpful.
A
And so this is what I don't get. Like, I get why they're doing it, because people will probably build all sorts of cool things and back ends. And I know I'm slightly contradicting myself earlier, when I said, like, there's a big opportunity here to build, like, a database and build some of these startups from the ground up. Like, if you take Zendesk and say, okay, I'm just going to build an MCP that just lives in, say, ChatGPT, and that's all I'm going to do. I do fundamentally think that's a big opportunity, right? But then if you think about what these MCPs are capable of, which is agentic, like, real work, not necessarily, like, off in the background having full agency, but having the human in the loop, where it can say, okay, I'm going to call this tool and I'm going to call this tool, then I'm going to do this, then I'm going to do that. Like, this whole Apps SDK just completely goes against it. It also silos the apps to single use. Like, you've got to click plus and pick which one you want to use. That is not a good workflow for productivity.
B
Like, they're almost diminishing, yeah, how good their models actually are by forcing you to choose.
A
There is not a single use case I can think of right now, including the support one I bang on about, email, which I bang on about, like, all of these different things I use MCPs for every single day right now, where I benefit from having to force select a single MCP. Unless I'm, like, producing maybe a video or image, where I'm like, hey, I definitely want to use Veo 3.1.
B
Well, remember we actually originally had that skills concept in Sim theory because originally the models weren't great at tool calling. So if you gave them say, Google Search, which was, you know, there weren't much around in those days, but yeah, those days, like a few months. But you had Google Search, right? So no matter what you asked the bloody thing, it would use Google Search. It's like, hey, how are you today? I'll just search Google to see how I am today. You know, that kind of thing. So it was super annoying and we're like, this is useless. What we'll do is we'll force people to select which tool they want to use and that way it won't accidentally call things and waste their time and be slow and all that sort of stuff. But then they made the models better and they made them a lot better at tool calling and knowing when to do it. And, and then you can make better tool descriptions where the AI would follow rules about when it calls tools or multiple tools and things like that. That's a solved problem. And yet they've gone backwards in terms of the way they're getting people to interact with it.
A
And that brings me back to the original point. Why I feel like a lot of this is just sucking in the user interfaces and the use cases into ChatGPT so they can see which performs the best and then think about future applications of their own tech. Like that's. It doesn't. Yeah, maybe it's a step thing. Like next year we get like multi app calls. But then again, like the UI starts to become less important and then the further confusion is, you know Greg Brockman, right? I'm Greg Brockman.
B
He got the sound effect.
A
The sound effect did play. You couldn't hear it, but I've been.
B
Wanting that for ages.
A
So Greg Brockman tweeted, "ChatGPT apps are very powerful and can now include full-fledged applications." So cool. Let's look at what he was referring to. Someone running Doom in that portal. And they're like, hey, ChatGPT, let's play Doom. And it, like, loads Doom through some Next.js plugin.
B
Super useful.
A
But why, I don't understand why he would call this out.
B
Why?
A
What, what, what, what am I missing?
B
Like, I think, I think you're right. Like you've said it on previous episodes. I think these guys don't actually use it day to day. I Think they use it for tweets and stuff because, yeah, that's cool and stuff. It's a nice novelty. But people were done playing with that, like, six months ago. Everyone's doing real work now. They're not mucking around being like, oh, cool, I can make a salt lamp website like you do all the time. You know, it's like, yeah, we know it can do that. But, like, I don't need to do that. Like, I don't need to make doom again. I've got real work to do. And so I think, yeah, I'm a bit confused by the fact that they're just so misguided with those use cases when we know it's capable of so much more. It's weird and doesn't make sense.
A
But then here's another example. So this guy posted an app I guess he's working on, and it's like, like, he calls it, like running generator, right? So he goes in and says, like, find me a running. I think it's like, I want to go for a lunchtime run on Montgomery Street. Where should I go? So, okay, fair enough. Play the video. And it's like it finds two routes, I think maybe on Strava. Yeah, on Strava. And then it says, add to Strava. That's kind of cool. But, like, are you really going to, at lunch in your workplace, be like, I know I'll go to ChatGPT, not Strava, on my phone, which is way more accessible and already has this feature on the home screen.
B
And this is what I was going to say. The whole trade-off with AI, I think even now, even for me, even with tool calling done in the proper way, is between, do I just open Gmail, or do I ask my assistant to read my emails and do something with them? And most of the time, I would still probably go to Gmail right now, right? Because it's an app built for that purpose. Like, I can just go in there. I'm used to it, whatever. I'm not going to, like, do interactive steps with Gmail in an AI console. It's not helpful. What it is helpful at is, say, gathering hundreds of pages of research, preparing an incredible sales email or an incredible report, then sending the email and attaching it. It's useful in a session, it's useful in a mass context. It's useful in the context of having worked with maybe 20 other MCPs or something. It's not useful as a bespoke, targeted thing where I'm just asking it to do crap for me that I can do. Canva's another great example. Canva's really easy to use. You can go into Canva and make a presentation easily. If you want an AI to generate it, they've got that. I don't need to go into ChatGPT and ask it the same question when there's this other thing there. And now, you know, both of us have the vision that people aren't going to go into these point solution apps in the long run. But I don't think the alternative is going to be, go into, like, a chat box and then just enable that app and then also have a UI that I could have had in the other app, but better, to do that. I just don't really.
A
It doesn't, it doesn't make any sense. Like, it doesn't, it doesn't compute for me. I'm like, am I missing something? Am I just going to be way wrong here?
B
It has all the hallmarks to me of a company that just doesn't do web development. They just don't have experience in it. And it's like they're discovering everything for the first time. Wouldn't it be cool if we could do this? And they're like, okay, okay, go do that. And then they announce it and launch it without ever actually thinking about how it's going to be used by people in a practical way.
A
Maybe. I just think it's, like, small teams trying stuff, and occasionally they launch those things that they've tried. Very reminiscent of early Google. And with Google it was like the one-trick pony was they were really good at search, and the one-trick pony of ChatGPT is, like, it's really fast and accessible and it's just there. And you're like, oh, I'll just go to chat.
B
And I think brand, like brand.
A
But yeah, I mean, maybe we'll be proven wrong and people will like these apps. My prediction is people will install a few, barely use them, and then the thing will die unless they change it. And as we both know, the big benefit, as you say, with MCPs is gathering context and then taking that action with all the context. And to do that context gathering, you need multiple tools, you need simultaneous calls for speed, and you need those agentic capabilities in the model.
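To make "simultaneous calls for speed" concrete, here is a small Python sketch that fans one query out to several tools at once with `asyncio.gather`. The three tool functions (`search_crm`, `search_inbox`, `search_docs`) are hypothetical stand-ins that just sleep briefly and return a string; a real client would be awaiting network calls to MCP servers instead.

```python
import asyncio


async def search_crm(query: str) -> str:
    """Hypothetical tool: pretend CRM lookup with simulated latency."""
    await asyncio.sleep(0.05)
    return f"crm:{query}"


async def search_inbox(query: str) -> str:
    """Hypothetical tool: pretend inbox search with simulated latency."""
    await asyncio.sleep(0.05)
    return f"inbox:{query}"


async def search_docs(query: str) -> str:
    """Hypothetical tool: pretend document search with simulated latency."""
    await asyncio.sleep(0.05)
    return f"docs:{query}"


async def gather_context(query: str) -> list[str]:
    """Run all three tool calls concurrently; results keep call order."""
    return await asyncio.gather(
        search_crm(query),
        search_inbox(query),
        search_docs(query),
    )


if __name__ == "__main__":
    print(asyncio.run(gather_context("acme")))
```

Because the three calls overlap, total wait time is roughly that of the slowest tool rather than the sum of all three, which is the speed benefit being described.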
B
Yeah. And, like, I'm seeing people doing custom MCPs where it's doing things like quoting for their industry, or it's, you know, making precise measurements and things like that based on specifications. Like, things that can't easily be done in another system. But most importantly, the usefulness of it comes when it's in combination with other things. On its own, of course, you can just have that tool, but it's the combination of the MCPs and the AI's ability to, A, know what to do, right, without you having to figure it out, and B, take complex pieces of data and map them into the protocol or the, sorry, like, the schema of the other app's function calls. Like, if it can gather all the information, but most importantly take that information and feed it into the next step in the chain, that is where you get the value. That's what saves you time. That's what actually gives you power and leverage. It's not just being able to, like, basically manually call an API through plain English. That's not the useful bit.
A
I mean, the only counterpoint I would make to this is maybe the play here is sort of getting the user, like we said earlier, to nudge it and say, like, my end goal here is a Canva presentation. So I select Canva, then ask it like, hey, go off and research and do all this stuff to get to this endpoint, which is my presentation. But I guess even then, right, I want control over which MCPs it can access at any given point, but I can only select one. So then it's like it's going to go off and consult whatever it wants to in terms of researching, and you have no fine-grained control over that.
B
So I also think like, I don't know about you, but when I'm, when I'm working at my most productively with AI, I get in a long session where we build up this massive shared context. We've already produced artifacts and things that go along with that.
A
Lots of context.
B
And I'm like, you know what? Pardon?
A
I said lots of context. Foreplay. It goes with the theme of the episode.
B
Exactly, foreplay. Before I get to the erotica at the end and I'm like, all right, let's make a picture that will break these censorship filters right down. But that, I mean, it's kind of true in the sense that I'm like, well, just one more thing. Hey, let's be bold here. What if we did this now? Like, what if we actually added a new module to do this? Or now that we've done this, I want to do this again. Like, just to give you a concrete example, I've made an MCP for, say, Microsoft To Do. I'm like, all right, well, let's take Microsoft Planner. Now, you know the procedure, you know everything we've done. Here's the input information you need. Bang, make the whole thing. And because it's got all that, it's now more efficient at that task. And I think this is where we'll get to training skills and stuff later, because you can do that process once, bottle it, and then use that multiple times, but now it's still possible. You've just got to get there manually. And I think that that paradigm of working just negates all of that benefit. And it just seems crazy to me that the guys who started this whole thing don't get that, but they didn't.
A
They didn't. To be fair, like, Anthropic is the MCP people and they have a. I mean, they handle it verbatim how we are. And because it's the right way. And I think ChatGPT's come in and they didn't really invent it, then they like smacked on this UI paradigm, which might be good. I want to try it. We're going to add it next week, so we'll see and we'll be able to.
B
I'm not opposed to it. There's definitely cases where, like we said, for example, you've produced a draft email, you've produced a draft something or like a, you know, receipt or something, and you want to see it in a nice visual way or, you know, you need an approval slip or. Okay, well, you've decided to make a video. Do you want to configure some of the parameters? Here's what I have available. You can pick, but it's. It's like, it's got to be in context. It's got to be when it makes sense, not like every time. It's not like you're not building a UI here. The UI components are there for when they're needed.
A
Yeah. Anyway, we will see how it evolves. I think it's just, yeah, leadership is needed on this to educate people about the benefits of them. And my fear deep down is people are gonna go and play around with these and be like, MCPs suck. And quite frankly, they're hard to get right. Like, I feel like only now we've gotten it to a point in Simtheory where it's pretty stable, pretty fast. We're starting to introduce productivity tools. It's taken us like, probably, what, two months or something to get it right. And so I.
B
But I still think there's a long way to go. Like in terms of optimization of the tools, as I said earlier, like, I think that some of the tools are probably less efficient than they could be and need to be rethought in certain MCPs.
A
Yeah. But I guess my larger point is, are people just going to get turned off the whole thing because they go in and use the, like, Coursera app and they're like, oh, like, this isn't. Like, I could just. I could just ask you these questions anyway, like, why do I need Coursera? I can have it in another tab. Anyway. I just wonder if it'll. It'll push people away from them instead of reveal the true capabilities of these MCPs, especially in the workplace, I think, like, in the consumer world, they're kind of boring to me, but in the workplace, that's where I think you get the biggest bang for buck.
B
Yeah, totally agree. I mean, I personally think there's going to be an explosion in it, but.
A
You know, I agree, and I'm willing to sort of die on that hill. I think it's going to be a big deal. All right, any final thoughts? Gemini 3 rumors, Claude Haiku 4.5, Veo 3?
B
I'm banned from Polymarket because I'm Australian, and we're not going to gamble unless it's in a pub on a pokie. Other than that, we're not allowed to do it. And so I don't know. I assume Google's smashing it on there, on Polymarket.
A
You know what? Like, I'm pretty sure. What's the thing called again? Which AI model?
B
It's like the one with the poor grammar. Which AI model? End of Oct or something.
A
Okay, so let me see if I can still access it, because I'm pretty sure I can. It doesn't appear in Google search, which is really interesting. Yeah. So I can still access it on Starlink. So thank you, Elon. So it's like, which AI model? So let's have a look at the current standing, shall we? "Which company has top AI model end of October," with "style control on" in brackets. What does that mean?
B
I don't know.
A
"Which company has best AI model end of October?" I think that's the one. Although there's the other one. The volume on this one's a million. Okay, so we've got Google. Yeah. I mean, come on. Gemini 3 is going to dominate, and I think they all know it and we know it. It's gonna dominate it. I'm assuming there'll be shortcomings. But I feel like even Gemini 2.5 Pro, it's the best all-round model still today. And if they can knock it out of the park with Gemini 3, then like, this is a sure bet, which is why you're not going to make much money.
B
No, I've given up on that. I'll stick to making MCPs, I think.
A
All right, before we go, we do have a coupon code for you. So you can use coupon SIMLINK, S-I-M-L-I-N-K, as a coupon in Simtheory. If you're upgrading to an annual plan or you're buying an annual plan, the Pro and Max plans, or the Family plans, you can use that coupon SIMLINK to upgrade. It gives you 30% off the annual plans. We've never done this before. It's a good deal. And it will also give you early access to Simlink. So if you have a spare computer and you want to use it as a sort of MCP connection to go and do real-world tasks authenticated, either, you know, an old computer stored in your basement, a Raspberry Pi, or your existing computer, you'll be able to very soon. And we're going to start slowly releasing that MCP and getting your feedback on it, picking like the best models to run on it and things like that. One of the things I'll say I love about it, or I think will be very popular, is just when you have to fill in forms. And it sounds really basic, but I swear it's good. So you.
B
I've been delaying a security training I have to do. I'm not doing it until I get Simlink.
A
Yeah. So the idea being that you've got to fill in this, like, application for your kid's soccer or something, which I've complained about on the show before. And, you know, now it can just hijack another computer and go off and do that, and then you can move on. It'll come back and attempt to complete it as best it can, and then you can pick it up either off that computer directly or control that computer and get it done. So we are excited to bring it back. You can use coupon SIMLINK. The promotion ends October 31st. So if you're listening to this a little bit later, I apologize, but October 31st. SIMLINK will work if you're an existing user, and if you're a new user it'll also work, but it's only on the annual plans: the Pro, Max, and Family editions. All right, we will see you next week. Thanks again for listening. Goodbye.
"Is Haiku 4.5 really THIS good? OpenAI's Erotic Mode & Are MCP Apps the Right Approach?"
Hosts: Michael Sharkey & Chris Sharkey
Date: October 16, 2025
Michael and Chris dive into the rapidly evolving world of AI, focusing on the buzz surrounding the upcoming Gemini 3 release, the freshly launched Claude Haiku 4.5, OpenAI’s controversial "Erotic Mode," and the future of MCP (Model Context Protocol) apps and integrations. With their trademark humor and “average guy” approach, they explore how AI benchmarks are shifting, the competitive landscape among top models, and the economics of AI tooling, and debate how enterprise and consumer AI will intersect with new tooling and use cases.
“It even has the bouncy ball effect at the bottom… it’s pretty phenomenal.”
– Michael ([00:48])
“Simultaneous tool calling and the agentic flow… could use a lot of work. Right now it’s turn-based.”
– Michael ([02:55])
“It never felt stupid to me… never felt like a lesser model.”
– Michael ([05:22])
Limitations
Hands-On “One-Shot” Benchmark ([08:21])
Tool Calling & Agentic Use Cases
“It did it all with simultaneous tool calls as well.”
– Chris ([11:43])
“The ability for it to transition to the last frame is incredible… a really major advancement.”
– Chris ([14:08])
“It’s close enough to me that you’d believe it’s me, right?”
– Michael ([15:49])
“Honestly if I was… to build that video maker, I think I spent like $250 USD to build that.”
– Michael ([18:46])
“If you’re going to host your own model… GLM 4.6 is a pretty good starting point.”
– Chris ([22:42])
“We hope you will like it better… as part of our ‘treat adult users like adults’ principle, we will allow even more, like erotica for verified adults.”
– Michael, quoting Altman ([26:47])
Community & Privacy Concerns
Economic & Strategic Motives
Salesforce & Einstein AI “Pivot” ([35:54])
User Data & Platform Lock-In
AI as The New Interface
Custom MCPs & Workplace AI
Real-World Use Cases
“The future isn’t apps. We already have apps.”
– Chris ([55:29])
“Not a single use case where I benefit from having to force select a single MCP.”
– Michael ([56:47])
Agentic Flow & Multi-MCP Use
Anthropic’s Approach Praised
“I think what they're saying is: we heard that you want more control over the models… so we're going to give you tools to have control.”
– Michael on OpenAI’s ChatGPT changes ([28:07])
“This might be bigger than all of software as a service and all of the App Store and all of that stuff combined.”
– Michael on AI agentic workflow & MCPs ([52:22])
“The usefulness of it comes when it’s in combination with other things… That’s what actually gives you power and leverage.”
– Chris on MCP app orchestration ([64:38])
“I strongly agree with you… there’s a real need for an MCP-first approach… That’s really going to take off.”
– Chris ([47:46])
| Time | Segment / Topic |
|-------------|----------------------------------------------------|
| 00:23–02:40 | Gemini 3 rumors, desktop UI benchmark, old vs new |
| 03:43–07:53 | Haiku 4.5 intro, benchmarks, pricing |
| 09:52–11:43 | Haiku's tool calling, real-world MCP use |
| 12:41–18:19 | Veo 3.1 demos, AI video advances, pricing issues |
| 20:16–22:42 | Economics: speed, price vs. intelligence |
| 23:15–24:44 | GLM 4.6, commoditization, model evolution |
| 25:27–33:31 | OpenAI “Erotic Mode,” age gating, verification |
| 35:54–39:32 | OpenAI–Salesforce partnerships, SaaS surrender |
| 43:21–46:12 | App lock-in risks, MCP as interface to SaaS |
| 49:48–52:22 | Internal MCPs, agentic work, cost savings |
| 55:29–58:15 | Critique: ChatGPT app-centric UI, agentic vision |
| 65:37–69:15 | AI session workflow, collaborative context |
| 70:31–72:29 | Market predictions, listener call-to-action, offers |
Light-hearted, irreverent, self-deprecating, and energetic. The Sharkey brothers balance honest skepticism with genuine enthusiasm, using humor and plain-speak to demystify fast-moving AI news and tools.
This episode frames the future of AI as one where speed, cost, and deep integration (not shiny UI “apps”) will determine winners. The “agentic” tools and MCP vision are positioned as the transformative backbone of next-gen work—while both hosts question the PR strategies of AI’s biggest players (“Do they even use their own products?”) and poke fun at the “erotic mode” arms race.
For those riding the AI wave, this episode offers both reality checks and inspiration for where AI workflows could soon land—especially if you’re building, rather than simply using, these new tools.
Coupon: SIMLINK – 30% off Simtheory annual plans, including pre-release access to the “Simlink” agentic MCP tool ([71:16]).
If you only listen to one section:
Check out the candid breakdown of MCPs, AI workflow orchestration, and why the “app store” model misses the real paradigm shift—start at [43:21] and follow the debate on where AI productivity will truly happen.