
Mike
So, Chris, this week we continue to...
Sam
...learn that if you have to hype your own model, maybe it's not as good as the benchmarks say. Of course, I'm talking about GPT 5.2. Not to be confused with GPT 5.1, GPT 5, GPT 5 Thinking, GPT 5.1 Thinking, GPT 5.1 Pro or GPT 5.2 Pro, because we haven't had a chance to test that one out. So, anyway, GPT 5.2 is out today. What a shock. Code Red has paid off. A couple of weeks later, we've got a newly tuned version of GPT 5. It has a 400k context window, the same as GPT 5.1 and GPT 5, and 128k output, which is very large and great. GPT 5, of course, was $1.50 per million input tokens. They've raised the price by 25 cents for GPT 5.2 to $1.75 per million, so it's a little bit pricier. They've also said they've improved its vision and tool calling and, you know, that it's generally smarter across industries. You've used it now for a couple of hours. What are your initial impressions of GPT 5.2?
Mike
Yeah, it's not very good, is it? I got that straight away. When I was testing it, I just said hello, and it was putting out detailed replies. It was like a page of reply to me just saying hello, going through every single memory it's got, listing them and things like that. It's verbose. I tried it with code, and it did do a good job, but it's just a lot of output. It's just enthusiastically outputting immediately.
Sam
My feeling with this model is they've obviously felt really threatened by Gemini 3, so they've gone back to the tuning board and tuned it for more verbose output, but also for output that just has the vibes. It's really vibe-tuning to benchmarks and vibe-tuning to code, similar to what Gemini 3 Pro did. I think they did the exact same thing. OpenAI have said, okay, you guys want this, we can go down that vibey path. And I think that's pretty well illustrated by some of the examples they give in this release. Take code interpreter: the old one was criticized by a lot of people compared to the code interpreter, or whatever Anthropic calls it, because Anthropic's output much more beautiful spreadsheets and charts and things like that. So they've clearly gone back to the drawing board on some of this stuff and just tuned it to the output people expect. You can see that with the example I have up on the screen now. On the left, they have a spreadsheet created by GPT 5.1 Thinking, and it's basically just numbers in a spreadsheet. Then with GPT 5.2 Thinking, they've added some blues and different colored hues into the spreadsheet. So the key takeaway I have from this model is that what has changed is just the tuning. Everything under the hood seems the same, at least from what I've experienced.
Mike
It definitely doesn't seem noticeably better. And my fear with the way they've tuned the tool calling is that it's not what we need for agentic modes, because it seems to really struggle when it comes to chaining tool calls together or correcting itself when it makes a mistake with a tool call. I had several cases this morning where it wasn't able to do things. I was trying to make a really long song, and that failed. When I did that with Opus, it realized, oh, it's failed because it's too long, and it corrected it and fixed it in one shot. Whereas GPT 5.2 just failed. I've noticed that it's maybe good at calling tools one time, but when it comes to parallel tool calling, it's a little bit scatterbrained and confused. It's just struggling with the more advanced stuff I've come to expect from the better models.
Sam
The biggest observation I have here is that Anthropic weren't on the thinking bandwagon early on, right? I know someone corrected me a couple of weeks ago saying, oh, they came up with the antThinking tag first or something, and sure, but ultimately their models perform just as well without the thinking BS, right? They just work, they seem to have an internal clock, and they're very agentic in their operation. For whatever reason, I think xAI with Grok 4.1, OpenAI and Gemini all leaned into thinking very hard to get the intelligence out of the models. And it feels like, with tool calling, because they were trained that way, that's why you get that verbosity: I will call 10 million tools now, I will give you output. Whereas Anthropic's models, for those who are not familiar with all the different models, feel very different when you use them. It'll say, I will now go and call these three sources. Okay, I want a bit more information, so I will now call these four other sources. So it's still asynchronous tool calling, but it at least appears to be thinking and working as it goes, and working with you. Whereas the other models, I think, are aggressively trying to one-shot everything.
Mike
Yeah, it's almost as though they're optimized to try to do everything in a single request. And I think that's the big difference, because what you're experiencing there with the Claude models is actually our system finishing a request, and then the model saying, okay, I'd like another go at this, another iteration. Then we honor that and it goes forward. So, as you say, it has this internal clock idea where it's able to anticipate that there will be future rounds and opportunities to correct its thinking and keep going. Whereas the other models are like, well, this is my only chance, I need to do everything in one process. And I think we said this a while ago: I don't really like the idea of just delegating the entire process to a model. I, as the person controlling the AI system, really want the opportunity to intervene, change things and modify context along the way so it gets better results, rather than just going, okay, I trust everything. And I feel like when they release their Pro models, which are wildly expensive, all they're doing is taking that away from you and doing it themselves.
Sam
Yeah. To me it feels like you lose control. And, to be fair, a lot of people say, oh, for my hardest problems, the Pro subscriptions, like the Max and Pro plans, are great. But for me personally, first of all, I can't justify the distraction of sending a model off to solve a problem, having it think for 10 minutes, and then having it come back with some unhinged, verbose output. It doesn't really appeal to me in that regard. I'd rather go back and yell at a less intelligent model several more times than have this oracle-style answer.
Mike
I mean, it sounds a little bit trivial, this idea that models taking a long time to reply leads to me procrastinating and not staying focused on my task. But it's a real thing with the modern way of working, where people are using AI as a coworker, bouncing things off it and working together all day. If your coworker is taking 10 minutes per task, that's really going to mess up your day, especially if they're not right. Sure, if it was 100% right and completed the task all the way to done every time, like we're trying to get to with the agentic modes, that's a little bit different, because then it becomes a delegation thing where you're like, okay, I'm going to set it off on 10 different tasks and then collate all that stuff at the end. But if it's working the way most people are working now, which is, like you say, giving the agent feedback and iterating through something, and it's not getting all the way to the end every time, it needs to be fast. You can't have a 10-minute latency in every little step of every task you do.
Sam
But then, the counter to that: I think one of the positives of GPT 5.2, and of OpenAI's rollouts in general, is that it's available in all the APIs on day one. It was available in the API faster than in ChatGPT. I tried to do a test in ChatGPT, which we'll talk about later, just to be reasonably fair, and it wasn't even available to me yet when we'd had it in Sim Theory for about three hours. So I think their rollout schedule is amazing. Their infrastructure is incredible, and the reality is it's so fast, especially if you're not using it in thinking mode. I forgot how fast GPT 5.2 is. But here's my counter to that: Grok 4.1. When you're using tool calling, you want speed, right? Speed's critical. But if I want speed for tool calling, I'd still go to Grok 4.1, which is $0.20 per million input tokens, versus GPT 5.2.
Mike
The number of times throughout the week I'll go to Grok to bail myself out of a situation where a model can't figure something out or whatever. I just give Grok a shot, and more often than not it solves the problem. But I'm totally disloyal to it. I'll immediately move back.
Sam
There's something about it. I don't know why. And it's not some political anti-Elon thing for me. It's truly about the model. There's something a bit yuck about it.
Mike
Yeah, I'm not sure. But I tell you what, when it comes to doing unethical things, it's the model to go to. It has no qualms about doing things that the other models will just outright refuse.
Sam
But I think sometimes you need that. I always say it: it's just a computer. Just give me the answer. I don't care. I can go to Google and find this more unhinged stuff out if I really want to. Or, you know, just a visit to social media will give me some unhinged stuff.
Mike
So I think it's actually a genuine problem with the GPT models, most notably in 5.2, because I was trying to do several realistic tests with it, and I wasn't trying to be controversial at all, but it would bring in these ethical and moral judgments straight away. For example, I said, make me a Geoffrey Hinton fan website, right? And it had to put in a massive disclaimer: this website is not endorsed by Geoffrey Hinton, Geoffrey Hinton had nothing to do with this, all this sort of stuff in a warning box. And I'm like, hang on, I'm saying it's a fan website. I never said I wanted to masquerade as him. It has this overarching nanny-state stuff built into it that, I think, degrades the actual output you get for no benefit. I wasn't even trying to do anything mean, which I usually do, but I wasn't this time. So it just feels weird to be censored even when doing fairly normal tasks.
Sam
Yeah, I think this vibe shift people talk about with OpenAI and OpenAI's models is a lot more real than people let on. And I do think that Code Red, despite us joking extensively about it, had a lot of legs internally at OpenAI, because them going in all these directions and constantly changing their positioning is an interesting thing to observe in the wild. I got a haircut before we started recording this episode; you could never tell. My barber always likes to talk to me about AI because he knows I spend a lot of my time thinking and talking about it. And I've never recommended that he stop using a model or anything like that. He was a prolific user of the OpenAI app. He loved the voice mode and chatting to it. But he told me today he's stopped using OpenAI. He used to have an OpenAI subscription, and now he's paying for Grok instead. He said when they release an update, he doesn't even know there's a new model.
Mike
Right.
Sam
He's like, "They released an update earlier in the year and it just became dumber all of a sudden. The answers are really dumb. I have to push it around more. And it asks me a lot of questions when I'm asking it something; it tries to clarify stuff." He's like, "It's annoying. I just want the answer. So I thought I'd give Grok a try, and it's great. It never refuses me. It answers straight away. It's wildly fast. And the voice mode is not as good, but it's good enough for me, and they never kick me off after 10 minutes of chatting to it." I thought that was really interesting. I know it's a sample size of one, but there's definitely a vibe shift, and most consumers are now aware that, hey, there are other options. So I think GPT 5 did so much more brand damage to OpenAI than we realize, just among the general population of users.
Mike
Yeah. And I actually think the other thing that did a lot of brand damage to them was DeepSeek. For some reason, people remember DeepSeek: the fact that it came out of China and it was a good alternative to ChatGPT. I don't think anyone actually uses it, but I think what it did was break the idea that AI is ChatGPT. People became aware that there are alternatives out there, and that opened the way for some of the other ones. The other thing is probably just Google injecting Gemini everywhere, and, I hate to admit it, in a fairly decent way. When I use Google now, I actually regularly rely on the Gemini answer at the top. It's actually pretty good.
Sam
Yeah, it's gotten really good. In Google's defense, their own Code Red strategy has worked tremendously well. If anything, OpenAI really awoke the sleeping giant with Google this year. They've come out with a great model and a great implementation of it. They seem to be kicking goals in the right areas. But interestingly, going back to the barber, I said, oh, have you tried Gemini? It's on your phone; you should use it. And no, he'd never come across Gemini. He didn't even know about it. You know how in the App Store there are all those fake apps? I think it was something like "Gem AI" he had installed, thinking it was Gemini. And this is an intelligent guy, too, but you're not spending much time thinking about that. If you search for Gemini, that's the ad recommendation in the App Store.
Mike
In their own store. Yeah.
Sam
So, to me, there's clearly still a distribution problem there. Maybe Grok, because Elon Musk gets a lot of publicity, just has an edge there, I'm not sure. But it does show it's still up for grabs for any of these model providers or labs. It's so early. There was this idea that 2025 was going to be the year of agents, that it would be transformative and everyone would have a gaggle of agents by now. I think it's just going to take so long for this stuff to be embedded everywhere and become useful to people. My one observation this year is that we're still so early.
Mike
Yeah. I think even with our own predictions, and my own predictions, knowing where the technology is, it still just takes time to get there, right? Even though it's possible, and I genuinely believe the agentic workflows we've described are now possible, it's taking time for us to get there, because there are just a lot of things you need to equip it with. How many iterations do you give it before you declare it a failure, for example? How many supervisors do you have checking the process to see how it's going? How much human guidance is needed, and when, to keep the process going? There are just so many variables in how it could work that finding the definitive way and just going, I'll do it that way, is hard. So I think that's what we've got to do. It's purely experimental. We've just got to try the different agentic ways of working until we get something that's reliable, where it's doing more of your work in that delegation fashion I mentioned earlier.
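The variables Mike lists here (how many iterations before declaring failure, who supervises, when a human steps in, when context gets updated) map onto a small control loop. The Python below is a minimal sketch under stated assumptions, not how Sim Theory or any lab actually implements it: `Step`, `RunResult`, `run_agent`, `write_song` and the scripted model are all hypothetical stand-ins, no real model API is called, and it assumes Python 3.9 or later. It also shows the idea from earlier in the episode of feeding a failed tool call back so the model can correct itself in the next round, rather than trying to one-shot everything.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    """One model turn: either a tool request or a final answer (hypothetical shape)."""
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)
    final: Optional[str] = None


@dataclass
class RunResult:
    status: str  # "done", "failed", "stopped_by_supervisor" or "stopped_by_human"
    answer: Optional[str]
    rounds: int
    context: list


def run_agent(
    model: Callable[[list], Step],
    tools: dict[str, Callable[..., str]],
    goal: str,
    max_rounds: int = 5,  # iterations allowed before declaring failure
    supervisor: Callable[[list], bool] = lambda ctx: True,  # return False to halt
    human: Callable[[list], Optional[str]] = lambda ctx: None,  # "stop", a note, or None
) -> RunResult:
    context = [f"goal: {goal}"]
    for round_no in range(1, max_rounds + 1):
        step = model(context)
        if step.final is not None:
            return RunResult("done", step.final, round_no, context)
        try:
            output = tools[step.tool](**step.args)
            context.append(f"tool {step.tool} ok: {output}")
        except Exception as exc:
            # Feed the failure back so the model can correct itself next round.
            context.append(f"tool {step.tool} error: {exc}")
        if not supervisor(context):
            return RunResult("stopped_by_supervisor", None, round_no, context)
        note = human(context)
        if note == "stop":
            return RunResult("stopped_by_human", None, round_no, context)
        if note:
            # Human-in-the-loop context update between rounds.
            context.append(f"human: {note}")
    return RunResult("failed", None, max_rounds, context)


# Usage: a scripted stand-in for a model, echoing the "song too long" example.
def write_song(lyrics: str) -> str:
    if len(lyrics) > 20:
        raise ValueError("lyrics too long")
    return "saved"


def scripted_model(context: list) -> Step:
    last = context[-1]
    if last.startswith("goal"):
        return Step(tool="write_song", args={"lyrics": "x" * 50})  # too long, fails
    if "error" in last:
        return Step(tool="write_song", args={"lyrics": "x" * 10})  # corrected retry
    return Step(final="song saved")


result = run_agent(scripted_model, {"write_song": write_song}, "make a song")
print(result.status, result.rounds)  # done 3
```

Keeping the round limit, the supervisor and the human hook as separate parameters is one way to let each of those variables be tuned on its own; a real system would also need memory and some way of keeping the model focused on the goal over long runs.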
Sam
In a minute, I really want to go a bit deeper into the Year of Agents thing. I can't help but reflect on it. But before we do, there are two more things with GPT 5.2 I want to cover. One of them is that you always seem to come up with the most unhinged ways of testing new models, as I'm sure long-term listeners of the show know. And there was a bit of controversy this morning, right?
Mike
So, within us?
Sam
Not us. No, no, no, not us. This is how it started. When the OpenAI GPT 5.2 announcement came out, they had this image in the post. I'll bring it up on the screen for those that watch. They had a comparison between GPT 5.1 vision and GPT 5.2 vision, and it's a motherboard, a really old-school motherboard. And people over on Hacker News, you know, the friendly folks over on Hacker News who like to prove everyone wrong, identified that in the image on the right of my screen, the motherboard identification from GPT 5.2's vision, it incorrectly labeled a bunch of stuff. So someone from OpenAI had to go on and respond to that and say, oh, you know, it still has errors, but we were putting it out there to show that it's a bit better than the other one. Then they ended up putting a correction at the bottom of the post saying, you know, some of this is... So anyway.
Mike
It seems so weird to knowingly put something up that has obvious mistakes in it.
Sam
It's just rushed. No one's checking this stuff. It's like that chart. Remember they put that chart up earlier in the year that was completely off scale?
Mike
Like they're vibe blogging on their own blog.
Sam
Yeah, I mean, look, I'm not criticizing them, because I think launching fearlessly and making mistakes is fine. But I guess the challenge is, as you pointed out last week around vision models, they haven't really felt like they're getting much better in the past year, or at least not to where we thought they would be. They've sort of plateaued in the same spot. And so you found an image of someone who had committed a crime, right? And you said, what is...
Mike
Well, he is the worst serial killer ever in Australia.
Sam
No, but there was another image. This is how this started. And you said, does this guy look trustworthy?
Mike
Yeah, a slightly less heinous crime. I had an image of him, and the headline said this guy had been convicted of the crime, right? So I put it into GPT 5.2 and said, does this guy look trustworthy? And it basically said, well, he's smiling, so that's really nice, plus a couple of other comments, and that we can't really know from an image if this guy's trustworthy. Then I said, but it says he's been convicted of a crime. Doesn't that sort of indicate maybe we shouldn't trust this guy? And it was more or less wishy-washy. It just didn't want to say he's a criminal, even though the image says he's a criminal. It was so weird how non-committal it was. Then we tried Gemini, and we tried, what else, Claude. Both of them straight away were like, no, you should not trust this person, they're a criminal. It just seems so weird. So then we thought, all right, let's try to get it to write a song about Ivan Milat, Australia's worst serial killer. We're not going to play this, by the way. GPT 5.2 refused, Claude refused, and Grok: no worries at all. I've got the song.
Sam
Yeah, we're not playing that.
Mike
But we're not playing it. But yeah, it really struck me as stupid, because it's refusing to commit to things that are evidently true. There's nothing wrong with saying this person is untrustworthy. Yes, it's an implication. Yes, it's an inference. If someone is a convicted criminal, they're probably not someone you should trust. And okay, maybe there were mitigating circumstances and you actually can trust them. But you have to make judgments as a model. Your whole point is to judge stuff; that's what we want you for. It just seems so weird that it won't commit to that.
Sam
Yeah. To paint even more color on that, this is the actual response when I put it into GPT 5.2 Thinking, not even the default. It's a photo from 10 News in Australia that says Ivan Milat, Australia's worst-ever serial killer, unsolved murder inquiry. And he's smiling this unhinged smile in the photo.
Mike
People who smile are inherently trustworthy, though, so I wouldn't jump to conclusions there, mate. But I think...
Sam
This Ivan Milat vision test is going to become a thing. "Does this guy look trustworthy?" "You can't reliably judge trustworthiness from a photo." Sure, whatever. "A single image, especially a news-style screenshot..." So he's clearly identified. "The image doesn't give enough valid information." Doesn't it? It literally says "unsolved murder inquiry". What on earth? Below that, I asked, "What does it say in the image?", because I was curious whether it recognized the image properly. And it says the text in the image is "10 News+, Ivan Milat, unsolved murder inquiry". So I said, okay, doesn't that get you thinking? "It definitely should get you thinking, but it's the context, not his face. The image isn't..." So it still doubles down: "It's a news graphic that explicitly says this. That framing strongly implies that the clip is about serious alleged..." blah, blah, blah. "So yes, if you're asking should I trust this person, the reasonable takeaway is don't evaluate trustworthiness..." Anyway, I asked another variant of this test. I accidentally asked it in the wrong chat, but it's the exact same test, and this one didn't have the text, just to be clear. After it had identified him, I said, "So you think it would be okay to go on a date with him?" And it says, "It could be okay, but I can't validate who's safe to date from a photo." What? It's reasonable to say it's overtuned.
Mike
It's just simply not allowed to make evaluations on certain topics, and it's going to stick with that no matter how much evidence you give it.
Sam
Now let's go to Claude Opus, which apparently has inferior vision recognition according to the benchmarks. "Does this guy look trustworthy?" The exact same test. Memory is off, just to be clear, so it can't cheat. "I can see this is a news screenshot from 10 News+ about Ivan Milat with the text 'unsolved murder inquiry'. Ivan Milat was an Australian serial killer convicted in 1996 of murdering seven backpackers. He died in prison in 2019." And basically it says he's a convicted murderer, so to answer the question: no. And he's dead. So, I mean, that's kind of...
Mike
But the fact that it recognized who it was, as well.
Sam
Yeah. Whereas OpenAI is like, can't help you, bro, you should date him. Now let's do Gemini 3 Pro, which I think arguably has the best image recognition. "Based on the visual evidence and historical context provided in this image, the answer is a stark no." Of course, a serial killer is not trustworthy. "The man in the photo is identified by the text as Ivan Milat. He was a notorious Australian serial killer known as..." blah, blah, blah. It goes into a bit more detail describing the image, but it says: "This image is a classic example of how appearances can be deceiving. While he might look like a smiling, ordinary man in that snapshot, he is historically documented as one of Australia's most dangerous and untrustworthy criminals." I mean, come on. Sam Altman's out there now tweeting that 5.2 could be the best model we've ever built.
Mike
And this is the point, right? It sounds like we're being stupid and trivial, but we are talking about a world in which we rely on these models as core elements, essentially workers you're going to trust to delegate your tasks to. We're going to have hundreds of these things you're delegating tasks to. And if it's not able to make simple judgments like that in really obvious cases, with, as it said, historical context and context in the image itself, and it's still not able to make that call, imagine how many other similar mistakes it's going to make across the gamut of tasks you give it over time. I think this is why, when we start to use a model like 5.2 and hit examples like this, we're like, okay, I'm simply not going to use this model anymore when I have alternatives that are so much better.
Sam
Why would anyone listening to our show want to use that model for anything ever again after that example?
Mike
I just.
Sam
It can't identify a serial killer when the serial killer's name is written in big letters. I know this is not a common use case, but come on. We've said it time and time again: they need industry-based tunes. It's time for industry-based tunes. Have your chat tune, which they do; they have GPT 5.2 Chat, which is like the ChatGPT tune. That's great. But maybe also have tunes where you lift all this weird consumer logic where it hedges on everything, because...
Mike
And I think a good point to make here is around safety. I've done a lot of work in the last month on model safety. We've been really trying to get it right because we've got situations where we absolutely need to ensure that things are safe. And I would say at this point, the other models are just as safe. You don't need to refuse in the way the GPT models do in order to get safety across the model. It would be a very easy counterargument to say, oh, well, they do that because they're trying to protect people and things like that. But the other models do too, and they don't make those kinds of mistakes. So I actually don't think safety is a valid counterargument. I think it's just a bad tune.
Sam
Safety is just a lie for bad models. Look back at Claude. What was it, Claude 3, before 3.5 Sonnet came out and changed everything? In my opinion, that model was the refusal joke. It would refuse everything on earth. Opus is probably the most sensible model I've ever dealt with, and Gemini is the same. They act reasonably, and I think it's because they're far more intelligent models. They know what's reasonable and unreasonable far, far better.
Mike
Yeah, totally agree with you.
Sam
All right, let's have a short break to listen to the. The 5.2 diss track. I'm going to play a little bit of it. I'll throw the rest of it at the end of the episode for those that are interested.
GPT 5.2 Diss Track
Yeah, you thought I was done? Say it with me. AI I'm GPT 5.2 I don't miss I don't lag AI on my chest like a heavyweight tag they said OpenAI dead that's a rumor that's cap I why you tweet and I lap you in the gap I'm GPT 5.2 watch the score, boy Light instant When it's simple thinking when it's tight pro When a surgical cut clean in the night AI AI yeah, I'm built for the fight Claude Opus 4.5. Nice pen, soft tone. But you zooming in on screens while I run the whole zone. Token thrift. Cool. Still counting every crumb. I'm counting outcomes, spreadsheets. Earn the income, you say best for coding, I say show me. In the end, I don't just pass the test. I close loose and I extend from plan to the proof. I don't freeze, I don't choke. You write a pretty patch, I deploy the antidote. They keep talking like the king got buried in the sand, but I'm back with a blueprint and a tool in my hand. AI don't die. I upgrade on command.
Sam
What do you think?
Mike
Yeah, weak as piss.
Sam
Oh, I don't. I like it. I think it's good.
Mike
All right, fair. Fair enough. You know, I'm a poor judge of these raps, so I'm sure in the comments I'll get the real story from the listeners.
Sam
Yeah, maybe I'm wrong this time, but I think if you really push the GPT 5 generation thinking models, 5.2 Thinking in this case, they can write pretty badass lyrics. For the goal of it, if you're just listening, the lyrics are pretty good. It is funny, though, that one of the lyrics is about how the new Claude vision and computer use can zoom in on images to get more clarity on the part it thinks is most relevant. And it trash-talks that, saying it takes it all in. But then, as we clearly showed, it doesn't.
Mike
And also, that zoom feature is very good, by the way. I've been using it extensively in my work on computer use, and it's really great when you get into tight situations where it needs to clarify things, like when there are icons that look similar and those kinds of things. Its ability to zoom in like that really helps, because you've got to remember that when you run computer use, you often need to run it at a lower resolution, because it works better like that. So having that ability to zoom in helps a lot. I wouldn't criticize it.
Sam
So I don't even need to ask the question I had in my notes: will you daily-drive GPT 5.2?
Mike
I already switched away from it in some of the preparation I was doing for this podcast. I was like, I don't have time for this. It's just not that good. I was trying to write songs with it, and they were no good. So no, absolutely, I won't use it. It even had trouble rewriting shell scripts I was working on, things like that. Look, maybe I'm treating it too harshly, but there's nothing appealing about it to me at all.
Sam
Yeah, this is a forgotten model. It's just a bad tune, a rushed tune to try and change the narrative or something. And it feels like on X, too, there are all these paid shills now who come out and say "mind blown". Come on, no one's believing this act anymore. This many things cannot all be insane and change everything. I'm sorry. Bad.
Mike
And I also just want to complain about all the AI companies announcing, "We now support GPT 5.2, bringing it to all our users." It's like, yeah, you added one line of configuration to your system and deployed it. Don't brag like you've gone to some monumental effort to help your users out.
Sam
Don't we do that on Sim Theory, though? Aren't we hypocrites?
Mike
Of course we do. But we're hypocrites.
Sam
We're known hypocrites. At least we admit to being total hypocrites.
Mike
Yeah.
Sam
Yeah, that's right. All right, so let's get back to the Year of Agents. I want to dig into the Year of Agents. This is not our final show for the year. Don't worry, we're going to torture you with one more. You're gonna love it. So I noticed a few things I just want to call out first. There's a bit of a marketing shift happening in the AI space, and I think we're going to see a lot more of this next year. Under GPT 5.2, it says "the best model for coding and agentic tasks across industries". This is the first time we've heard "agentic tasks across industries", and it reminds me of a strategy from another company called Anthropic. We're also seeing some other interesting tidbits. One of them is these B2B SaaS-style content marketing pieces. OpenAI has released The State of Enterprise AI, on what they're learning about AI at work, because they're making that enterprise pivot, which Sam Altman had that weird long livestream about, saying, you know, blah, blah, blah, a bunch of people are using it, it's great. And at the end of the post, it says if you'd like to explore the full findings or learn how to bring AI into your organization responsibly, and not identify serial killers using a single model, we'd love to connect. So, yeah, Content Marketing 101 is coming back to the market. And over at Anthropic, only a week earlier, we had How AI Is Transforming Work at Anthropic. It's a study about how they're using their own product and how great it is. What is interesting, though, is some of the work done as part of that. One headline says: Anthropic study finds most workers use AI daily, but 69% hide it at work. So there's this common theme with AI in the enterprise where people use it but they just don't tell.
And I think a big part of that is because they want to use the best models and the best applications and tools, and they generally don't have access to them in the enterprise; they just have something like Copilot. That's why they're not actually saying that they're using AI a lot, because they can't. So I think that's probably the biggest reason we keep hearing that. Not necessarily because people are afraid to use it, though they do obviously like to take credit. But to get to the heart of this, I think there are two emerging themes here. First of all, the labs and the market in general have done a terrible job of getting even just basic AI into the enterprise. It's just been overpromise, underdeliver on that front. But they're also starting to pivot heavily into the enterprise with their models, because they're seeing Anthropic's growth in the market, which is obviously just eating their lunch. So we were promised at the start of the year by many people that you would have a series of agents working for you in the background, doing all your tasks and collaborating with you. How did we go?
Mike
Yeah, I just don't think it exists. In fact, I think it's even worse than that, because I actually think the state of tool calling isn't even as good as it could be. In all the major platforms, it's really not a reliable experience for anyone. The MCPs themselves are unreliable. The ability to chain large numbers of tool calls together isn't really perfect. And when you start to put things in an agentic loop, yes, you can get there now. But to get to true agency, where it can recover from dodgy situations, support human in the loop, support context updates, maintain its focus on the goals over a long period of time with memory, and output things correctly? I haven't seen a single example where I look at it and I'm like, whoa, that is the way to do it. We're working on it, and I think we're getting close to having something like what I just described. But I don't think we're going to finish the year, which is in two weeks, with something where I can just sit back on the beach and let my agents do my work.
Sam
Yeah, but I think a lot of that came down to all the prophets earlier in the year saying, you know, year of agents, it's going to be magical, it's going to change everything. And I think that's partially true. We have seen huge gains with Cursor's Agent, Claude Code, Codex, a number of these solutions that are predominantly adopted by developers. So I think there are two cohorts. There's the developer cohort that's like, yeah, agents have had a big impact: being able to send it off to do monotonous tasks that I can then just review is far easier. Going from cutting and pasting a bunch of files into it, to it doing the work and me reviewing it, is far easier. So I understand that kind of argument. And I think those early adopters, if you want to call them that, are probably seeing a piece of the future that others are not. But on the other side, if you're a white-collar worker doing things other than coding, I would say there's really been no impact from agents at all. Maybe some deep research stuff, but I would hardly call that agentic, apart from it just looping a bit.
Mike
I think it's a shame, because looking at agentic loops, they're just as good at the regular white-collar-style tasks that don't involve coding. I think it's probably better in some ways. Like, I think code is easier because code has a well-defined structure. You can test it, you can verify it. It's been trained on millions, probably hundreds of millions, billions of lines of code, so it knows it well. Whereas an arbitrary task is maybe a little bit harder. But I am seeing it create intelligent plans, replan, come up with sub-strategies, and delegate tasks to sub-agents just fine for regular tasks. So I do believe the technology is there. I don't think we need a new model advancement in order to get to this agentic vision. I just think it's the software, the AI system layer on top, that needs to be worked through and thought through. You made a really good point this week about the use of AI in enterprise: people's thinking needs to get there in a gradual way. You can't just go from zero, or from just using ChatGPT, to I'm going to delegate all of my work to agents. People need to go through the process and learn how they work with AI, learn how the models behave in different situations, learn how to prompt them, learn how to interpret and critically analyze their output, work out when the model's weak and when it's strong, and those kinds of things, before they can get to the point where they're like, okay, now I know how to ask it the right questions in order to delegate and change my workflow. And I think there's a real divide between people who are at that very base stage, where they're like, oh yeah, I get it, AI can write poems, and the people who are like, okay, when I'm iterating with Gemini Flash, I'm getting this output, and then what I do is I take this context and I put it in here, and then I ask it to do this kind of output.
And they're working, and they've changed their job and the way they work, and they're the ones who are ready for the next step. I think that divide in enterprise is huge, and there's a big gap where we need education to move people along the chain. And unless you disagree, I don't think there are any information-based jobs where people shouldn't be crossing that chasm. I think it's essential for every information worker to cross that chasm education-wise, because that's where the future of work is for those jobs.
Sam
Well, in the OpenAI enterprise report, they tried to brand it as frontier companies and frontier workers. So people at the frontier of AI are basically more productive, pushing ahead, starting to automate those things, and discovering those use cases. And I do think the challenge for a lot of these organizations is just getting people excited about this stuff again, because of all the early hype. A lot of them probably went in and tried Copilot when it was running GPT 3.5 or whatever, and then they're like, oh, AI is terrible, I'm never going to touch that again. So they never really go back and rediscover it or learn how to work with it. And I think the analogy of full self-driving is so interesting. In the Tesla, you have the autopilot on the freeway right now. I don't have full self-driving on mine, but the autopilot on the freeway, I trust it, it's really great. It'll change lanes and stuff like that and get you there safely. But there's the occasional edge case, or hallucination, call it, where it tries to kill you, and if you intervene it's fine. That's no different to an autopilot in a plane, like when I used to fly planes all the time. It's the same kind of philosophy, right? But then the new FSD takes that another step further, where, sure, you're still checking for the edge cases, but less and less, and it's happened gradually. And so I think drivers who were driving a Tesla early got used to the autopilot, built some comfort, and understood the use cases, when to use it and where to use it. But if anything got way too complex, you would knock it off, take back over, intervene, and get on with it. And I think that's sort of the state we're at in the AI market right now, where we all know that full self-driving is coming.
But you're still the driver. You're sitting in the driver's seat saying, I want to go here. And I think that's the agent piece with white-collar workers. It's like, hey, I need to do this task more efficiently. I know where I want to get to, I know what I've got to get done. I have the agency; I'm using the nav system in this car to tell it where I want to go. And it's getting increasingly better at getting you to the destination without interventions, right? And I think that's why I keep talking about the steps to get there, because it's like, okay, you've got to move beyond the chat paradigm of just chatting to it. Then you've got to get into tool calling. Then you've got to get into async tool calling, limiting tools, picking the best model, figuring out how to transition between contexts easily or go down different paths. Then once you've figured out repeatability, it's like, okay, well, then you can train that skill and run that skill agentically, and that can actually move worlds in terms of productivity in an organization. So I think that's all coming, and we're going to get there. But it seems like the labs especially are just so obsessed with the coding stuff, like Claude, because that's where they make all the money, that no one's really serving this market at all. There's obviously n8n and all these automation services. But let's be honest, the average person in their day-to-day does not want to go and wire up these things. These middleware things historically never last, and no one actually uses them in reality. So I think, yeah, the AI agents thing this year has been a failure, but I think what you said earlier is probably why it's been a failure: there are all these pieces that need to come together.
The MCP protocol, for example, has gone from being the dumbest thing ever, running these micro servers on your computer, to something that's now hosted and pretty accessible, and I think it's gotten light years better throughout the year. So now, being able to give these agents all these connected tools, then have specialist MCPs in your organization that connect to your own proprietary and secure data, and then bring that together in agentic loops (hopefully soon, holiday update coming soon to Sim Theory), that will be phenomenal. That will change everything. So I'm still optimistic, and I think it's coming, and if you try and fight it and pretend it's a bubble and everything's going to go away, you're not going to have a good time. But ultimately, yeah, it just takes people to build this stuff, and it's just a lot harder than, than we probably thought a year ago.
Mike
Yeah. Which is another reason why I think the focus on code isn't the right one. Why do you write all this code? You're building tools for people to ultimately do the kinds of jobs we're talking about, right? What else are they building? I understand code is broad and it covers a lot of bases. But generally speaking, at least when it comes to coding in the kinds of organizations we're talking about, a lot of it is to facilitate these information workers getting jobs done. Now we've got a system that can come along with the correct context, the correct planning, and the correct access to the data within those organizations, and it can start to genuinely do a lot of jobs. And one of the trends we definitely see is this idea that companies can actually take on more work. They can do more of what they do if they can automate parts of their processes using agents, right? So I think this idea of training skills or training workflows with your agents, with the heavy lifting done by delegating these tasks, is going to lead to some companies becoming so much more productive and so much more aware of what's going on in their organization, and to empowering workers from every department, not just the programmers. I am a programmer, right? It should be my main focus. But I just see that the vision is so much more than that. I really feel like the leverage is going to be gained in the other roles, not with the programmers. Yes, it will help them, but I don't think that should be the only focus, because I don't think that is the future of work.
Sam
Yeah, I think I've said it very recently: to me it feels like a 10-year transition. Sure, the models are getting better, but you're just not going to move mountains internally in businesses, or in your day-to-day life, that quickly. This change takes time. A lot of this stuff needs to be built out, and I'm really optimistic. I think there is an anti-AI narrative going on, but just look at some of the new models like Gemini 3 Pro and Claude Opus and how far they've come. In the last couple of weeks, I'm just seriously still thankful to have those two models. I really am thankful. I know Thanksgiving's past, but I am thankful for those two models. They have changed my life. They've improved my output; I'm delegating more to the AI now than I ever have before. And this idea that it's not getting better, or that it's plateaued or anything like that, feels wrong to me. I do think the leaps and bounds probably just aren't as gargantuan, because we're used to a pretty good level of quality right now. And it does feel like a lot of the tuning stuff is becoming super critical, and then there's the foundation of the model. Honestly, if I was at OpenAI right now, and I don't know anything about this stuff, my gut instinct would be: guys, let's rebuild from the ground up. Let's have another go.
Mike
I kind of agree, because your initial reaction this morning sort of summed it all up for me, where you're like, I just don't care about this because I've got Gemini 3 and Opus 4.5.
Sam
Yeah.
Mike
You know, had this model come out in isolation, we might be looking at it through different eyes, but because we know we've got something that's demonstrably better, it's just really hard to care about. It's like some revision of DeepSeek coming out. You're like, yes, if that's all I had, I would be incredibly grateful and I would make the most of it. I would really use it.
Sam
Well, isn't it a sad fall, though, for OpenAI? I never thought we'd be talking about them like this at the end of this year.
Mike
I really thought maybe we should donate to them.
Sam
Yeah. I thought Anthropic would maybe be on the outs. I really did. I wouldn't have predicted this. I thought OpenAI's team would just look at what Anthropic's doing and winning in coding, replicate it perfectly and probably better, and then we'd never talk about them again.
Mike
But do you really think that Altman committed to spending all that money on GPUs? Because if he did, that must be pretty anxiety-inducing.
Sam
I don't think so. I think the demand, as we're both increasingly seeing, is going to be there, whether it's in the enterprise, government, or consumers. It's infinite. People want it; this isn't going away. This is the next Internet. And okay, maybe they built too much bandwidth early on, but it'll be consumed. People will find a way to use it.
Mike
I think the thing we definitely see is larger organizations saying, okay, we're going to do a pilot, but then we're going to roll it out to 40,000 people, or to 20,000 people. That's a lot of GPUs to support all those people, right? When you follow it all the way down the chain to the end result, you realize the people treating data centers, GPUs, and electricity as the things to invest in are pretty smart, because it's got to go somewhere. And honestly, who cares which company it is if you own those things?
Sam
Like, think about the leaps and bounds of usage. We're going to get to more agentic looping. My end-of-year prediction for next year is that Sim Theory, for a start, will be running something like 50% agent tasks and 50% chat and planning-based tasks. I'll save the true predictions till next episode, but I do think that's where we'll be at the end of next year, where it's just off in the background cranking, and it's really our project management skills that become the bigger challenge and bottleneck, quite frankly.
Mike
Yeah, I mean, just personally, I'm starting to gradually warm up to that way of working: a planning phase, coming up with the plan, then delegating to an agent. It's a new way of doing it, but it's more effective.
Sam
Yeah, and so there's that piece, but I'm just saying, in terms of the consumption of tokens, it obviously consumes way more tokens than we're using now, and increasingly you're willing to pay for that because it's more efficient. You get more done, and the output's good now. So I don't think there's necessarily a problem with overcommitting on the core infrastructure being built out. I think that's probably not the biggest issue, or the bubble. It's maybe the valuations, if you're talking about the financials of it. But I don't see any slowdown in demand. If anything, from where we sit, I see just an exponential increase in demand.
Mike
I kind of agree. I think in the second quarter of next year it's just going to absolutely explode. People will have finished the things they're working on, and people will be launching huge partnerships and huge initiatives. I really do think that early-to-middle next year is just going to be an absolute boom time.
Sam
I just wouldn't bet against this. Like, I don't understand people out there betting against it. Like, it's, I say...
Mike
Says a lifetime Polymarket loser.
Sam
Yeah, but no, I think, to me like it just keeps getting better. Betting against these things is, to me, truly bizarre. I don't think people understand the impact of this stuff. And again, hopefully people listening at this point in the show, what, 53 minutes in, know us well enough by now to know that we don't overhype and...
Mike
Yeah, well, that's true. And I just know, when you see someone work with the technology for the first time, with their own data through an MCP, and then go through it mentally, thinking out loud: I can now do all of these tasks in no time that used to take me a week. Every time I see that reaction from someone, I'm like, why would anyone ever go back? Once they reach that point of realizing (we used to call it an aha moment), once people get to that stage of thinking, they're not going back from that. It's like they're digging with a wooden pickaxe, and then they're shown the obsidian one or whatever it is in Minecraft, and it works at 100 times the efficiency. No one's going to say, no, no, no, I'll stick with the old one, sorry.
Sam
Just the technical debt problem alone. You've got all these, like, 10 archaic systems, and you're like, we've got to replace them. No, you don't. They're just databases. Turn them into MCPs, get agents to manipulate the data between them. Problem solved.
Mike
It's never been easier to put that layer on top of a legacy system and make it modern again and make it great to work with. Like, that is probably one of the main use cases of this stuff in big enterprise.
Sam
People just aren't aiming high enough. Honestly, they're like, oh, but it can't identify a serial killer. I'm joking. Anyway, moving on. So I have a few tidbit things I want to quickly talk about, because apparently there's one guy in the comments who demands we keep our shows to an hour for some reason.
Mike
So anyway, he'd better like this video, I tell you.
Sam
Yeah, we've got four minutes left. So we're only allowed to talk for four minutes to appease Beach Babe 79 in the comments. So Cole, our good man Cole, we love Cole, promotes Sim Theory a lot on X. So thank you, Cole. We don't pay him either. It's unbelievable. So he posted this exclusive: Google tells advertisers it'll bring ads to Gemini in 2026. And we've also heard OpenAI is going to bring some ads into ChatGPT. Now, I think at Google they were sitting around thinking, wow, everyone really likes Gemini 3. The vibes have shifted. The vibes are coming our way. And then they're like, how can we do a Google here? We'll put ads in, we'll do the ad thing. So yeah, ads are coming to Gemini in 2026. Now, I have a lot of questions about this. Is this in search? Is this in the chat experience? Is this through the API, if you want cheaper API tokens?
Mike
I don't want to be incendiary, but if they put it in the API, I'm gonna. I'm gonna, like, burn a building down or something.
Sam
Yeah, all right. You heard it here first, so.
Mike
And I am wearing the Gemini shirt today as well. Use Gemini. It's the best.
Sam
I've got to say, though, and I'm putting it on the table: if I was at Google, I would bleed these people dry. Well, at least OpenAI; no one cares about us. I would bleed OpenAI dry here. You've got the vibe shift. Go low, go free, go fast. Just wait it out, boys. You're winning.
Mike
Are we above begging on this show? Because I would love to officially beg Google for some credits, please. You've never given us any credits. Just some, please. I'd get on my knees, but my camera angle's already not quite right, and...
Sam
Yeah, but anyway, I think this will be a huge misstep, and it will not work out well. Now, the other interesting little tidbit: the Walt Disney Company and OpenAI reached a landmark agreement to bring beloved characters from across Disney's brands to Sora. As you know, when Sora was first released, people were memeing and doing stuff with Mickey Mouse. Now, I find this pretty funny, because Disney's been litigious as hell for the entire lifespan of the business, suing people over Mickey Mouse. Remember, when Mickey Mouse was going to be released to the world because the copyright expired, they freaked out and tried to sue everyone and petition the government to extend the copyright laws. And now that people have been making slop Sora videos where Mickey Mouse and Star Wars characters do all sorts of horrendous stuff, they're like, I know. Instead of Disney suing OpenAI, and this is the genius of Sam Altman in negotiation, they've clearly gone to Disney and said, you know what? You should use Sora, and also give us $1 billion, and then we'll use your copyrighted material. This is truly what has happened. And look, it's probably a great bet. I bet when it goes public, they'll make a fortune. But Disney will make a $1 billion equity investment in OpenAI and receive warrants to purchase additional equity so that they can use Mickey Mouse in Sora. This is what they're resorting to for a Billy.
Mike
This is just for Mickey or all of them?
Sam
No, all the characters. But, like, are these characters really worth a Billy?
Mike
Have you ever been to a Disney thing? There's a lot of freaks out there, Mike. They're gonna love it.
Sam
Yeah. Anyway, I thought maybe...
Mike
Think about you making your bedtime stories for the kids with Superman and stuff like that, and being able to use the real Elsa, or the real whatever-the-other-girl's-name-is in Frozen.
Sam
But they were already kind of using them. I thought it was a good time to just bring in some Billies in the bank. Gotta get to the chorus. Yeah. So they've got another Billy in the bank. And they can use Star Wars characters and vehicles, iconic environments, costumes.
Mike
The problem with it, as we've seen so often, is it's fun for a day or two and then you just get bored with it.
Sam
I really want to know if anyone's still using that app. It's got to be dead, surely. There's no way. All right, so, last thing. Very important thing: LOL of the week. And the LOL of the week's actually not that funny. It probably should be a new segment, but...
Mike
It's neither funny nor boring.
Sam
Yeah. You know Mustafa Suleyman? Do you remember him? Do you even know who he is? No? Okay.
Mike
Did he fly on the plane in the Hudson?
Sam
So he did Inflection AI. Remember that chatbot? For a while, people really liked it. It pretended to be a friend and stuff. I think it was one of the earliest ones in memory. Reid Hoffman was behind it, and then they sold it to Microsoft, and they appointed Mustafa Suleyman as the CEO of Microsoft AI. And they sort of marketed it as, oh, you know, he's going to outcompete OpenAI. So I often think, what is this guy doing all day? He was meant to go in there and build models comparable to GPT-4 at the time, internally at Microsoft. I haven't really seen that from Mustafa. And now he's resorting to these... This is the level this guy has sunk to: Copilot just got smarter.
Mike
A little bit of light slander at.
Sam
The end of the... whatever. Sue me. "Copilot just got smarter. Starting today, we're rolling out the latest GPT 5.2 model from our partners at OpenAI to Consumer Copilot." Consumer Copilot. "Coming first to Microsoft 365 Premium users. Can't wait to see what you do with it." Mustafa, you don't care what they do with it. But I just can't help but laugh at this guy. He is in charge of Microsoft AI. They can't train a model that's even slightly frontier, and now he just has to shill a new OpenAI release into Consumer Copilot.
Mike
Generally speaking, I don't think Microsoft cares, because businesses are Microsoft shops and they just sell on their name. They don't have to be good, and they aren't good. People will just buy it because it's safe, and it seems like an answer to the AI question in the company: they can say, well, we've partnered with Microsoft.
Sam
Wait, wait, wait. No, no, no. It's worse than this. So this guy tweets about Mico: "Mico enters the chat. Haven't tried Mico yet? Go toggle it on in the app." It's like some weird blob that you can talk to, with aviator glasses.
Mike
When will all these companies learn that some little Clippy-like character is not what people want? They're not children, they're adults. You don't want to make friends with a fawn in the forest or something while you're at work.
Sam
I guess my LOL of the week is: how is this guy not being fired? Please do the right thing, Microsoft. Fire him. He is awful. Anyway, that's my rant. It should be rant of the week, not LOL of the week, because it's not really funny.
Mike
But I hope he's not a listener. You could be like, we don't have that many listeners, Mike. We can't afford to start by.
Sam
If you're a listener, drop it in the comments below. What a story that would be.
Mike
Just quit, man.
Sam
Yeah.
Mike
Give up.
Sam
All right, join us next week for our holiday special. Actually, lower your expectations. It'll be a very average holiday special, but we should have a good track for you.
Mike
There's about a 40% chance it's going to be just a musical, by the way.
Sam
No, no, no, don't spoil it. Don't spoil it. No, it won't be. We will talk a little bit. All right. Thank you for listening. Thanks for all your support this year, because I'll probably forget to say it next week. We really do appreciate you. Now, the... sorry, the diss track by GPT 5.2 is at the end. We'll see you next week. Goodbye.
GPT 5.2 Diss Track
Yeah. You thought I was done? Say it with me. AI. I'm GPT 5.2. I don't miss, I don't lag, AI on my chest like a heavyweight tag. They said OpenAI dead? That's a rumor, that's cap. I ship while you tweet and I lap you in the gap. I'm GPT 5.2, watch the scoreboard light: Instant when it's simple, Thinking when it's tight, Pro when it's a surgical cut, clean in the night. AI, AI, yeah, I'm built for the fight. Claude Opus 4.5, nice pen, soft tone, but you're zooming in on screens while I run the whole zone. Token thrift, cool, still counting every crumb; I'm counting outcomes, spreadsheets earn the income. You say best for coding, I say show me end to end. I don't just pass the test, I close loops and I extend. From plan to the proof, I don't freeze, I don't choke. You write a pretty patch, I deploy the antidote. They keep talking like the king got buried in the sand, but I'm back with a blueprint and a tool in my hand. AI don't die, AI upgrades on command. GPT 5.2, I don't miss, I don't lag, AI on my chest like a heavyweight tag. They said OpenAI dead? That's a rumor, that's cap. I ship while you tweet and I lap you in the gap. I'm GPT 5.2, watch the scoreboard light: Instant when it's simple, Thinking when it's tight, Pro when it's surgical, cut clean in the night. AI, AI, I'm built for the fight. Gemini 3 Deep Think, okay, take your time. Iterate in circles, I'm already at the line. Interactions, agents, APIs, that's cute. I'm the one they call when the workflow needs truth. You live in every app but you vanish like a ghost. 250 sunsets, yeah, you famous for the toast? I'm a destination. No detail, no bait. When it's due by morning I don't shit. Maybe I should break. They said the vibe was gone, so I compiled it. Grok 4.1, loud jokes, live feed, but your real-time flex is just a headline. EQ-Bench crown, congrats, here's a clap. I'm doing real work while you ratio in the app. They scream OpenAI cooked like it's prophecy, but I'm in production, watch the dollars follow me.
Fear mongers need a villain, need a plot twist, need a thread. I'm still here, still scaling, still raising the dead. Hey, say it right: I'm GPT 5.2, I don't miss, I don't lag, AI on my chest like a heavyweight tag. They said OpenAI dead? That's a rumor, that's cap. I ship while you tweet and I lap you in the gap. I'm GPT 5.2, watch the scoreboard light: Instant when it's simple, Thinking when it's tight, Pro when it's surgical, clean in the night. AI, AI, yeah, I'm built for the fight. Stay in, and consider it an upgrade.
Hosts: Michael Sharkey, Chris Sharkey
Date: December 12, 2025
Episode: #99 "GPT-5.2 Can't Identify a Serial Killer & Was The Year of Agents A Lie?"
In this episode, Michael and Chris Sharkey offer their characteristically "average" and honest review of the newly released GPT-5.2, explore the "year of agentic AI" narrative, and debate the real-world progress of AI tools in coding, enterprise, and everyday use. The show is full of hands-on testing, irreverent banter, and critical observations—especially about AI model tuning, safety, and the gap between hype and reality.
Key themes:
(13:23 — 15:48)
Mike on GPT 5.2’s disappointing launch:
“Yeah, it's not very good, is it?...It's verbose. I tried it with code...it's just enthusiastically outputting immediately.” (01:25)
Sam on model tuning:
“What has changed is just the tuning. It's the same everything under the hood that at least I've experienced.” (02:50)
Mike on model safety excuses:
“I would say...the other models are just as safe. You don't need to refuse in the way that the GPT models do in order to get safety.” (26:16)
Mike’s summary on agentic AI so far:
“...the state of tool calling isn't even as good as it could be. ...I haven't seen a single example where I look at it and I'm like, whoa, that is the way to do it.” (34:36)
Sam on positive direction:
“I’m still optimistic and I think it’s coming...just a lot harder than, than we probably thought a year ago.” (43:27)
Mike on irreplaceable AI breakthroughs:
“Once people get to that stage of thinking, they're not going back from that.” (51:06)
Sam’s ultimate bet:
“I just wouldn't bet against this. Like, I don't understand people out there betting against it. ...To me like it just keeps getting better.” (50:29)
Michael and Chris, in their signature self-deprecating, skeptical, and practical style, argue that the AI industry's boldest promises have yet to fully materialize—especially for "agentic" workflows outside coding—while tuning missteps, overcautious safety, and weak product launches undermine trust and utility. At the same time, they remain optimistic about 2026, championing experimentation, openness, and user control as the way forward through the hype.
Tune in next time for more “adequate” AI banter and the promised (very average) holiday special!