
A
GPT 5.5 has arrived. OpenAI's new flagship model has officially entered the chat.
B
Smarter, faster, cheaper, thinkier. GPT 5.5 is better on long-term tasks. It begins a new stage of iterative learning, which means much faster rollouts, and it might just make us lazy as hell.
A
Yeah, the updates are causing that.
B
Previously, a lot of my prompts had to be very detailed or very instruction-y, kind of. Whereas with GPT 5.5, sometimes I become lazy and I kind of give it a very ambiguous task. But then it will figure it out.
A
We will show you some incredible projects that people have already built with 5.5 and unveil Gavin's latest animal deathmatch arena thing, which we haven't even looked at yet. That's right.
B
There are new crazy ways that you can integrate GPT Image 2, the image model, into Codex into the new model. And I did it and I'm gonna show you all right here. It's a ton of fun. And it's GPT 5.5 day. And this is AI for humans. That's my. That's my brain dying at this point. I got a little Superman curl.
A
Oh, that looks good. Yeah, it doesn't look like a tapeworm at all.
B
Welcome, everybody, to AI for Humans, your twice a week guide to the world of AI news. And Kevin, what a week. What a crazy month we have had. The AI world continues to kind of get nuts again and again and it's not going to slow down anytime soon. We have a new flagship model. Finally, Spud, the giant potato has landed.
A
Okay, Gavin, hold on a second. I go to my Codex and I click check for updates and it's okay. Sorry, hold on, let me. It's not there. Let me go to my chat. Hold on. Let me go to my ChatGPT app. Let me go to my ChatGPT app. A new file.
B
Check for updates.
A
Ed, it's. It's not. It's not there. It's not. I have an enterprise account and a pro account.
B
Some of us can't. Some of us can't be as early as others. Kevin, I did get access. I do have access.
A
Kevin, you found a magazine on your brother's. On your brother's floor under his bed.
B
Yes, I did. I did. It's rolling out to everybody today. I got it. Kevin does not have it yet. We are recording this on Thursday Dying. But it is a very cool new model. We need to dive in and really talk through some of this stuff. Kevin, I think the big thing that I was expecting from the get go with this was like there's a lot of hype around this. There's a lot of vague posting, as they say in the AI world. We saw the Mythos, like kind of mythical benchmarks, the model that Anthropic says is too dangerous for anybody to use. And then we saw Opus 4.7 come out last week. So this is kind of OpenAI's answer to that. Now, just the basics right now. Some very important things to know, and we're going to dive into the specifics. The number one thing that they are touting here is that this is much more reliable on long running tasks. And later on the show I'm going to talk about a thing that I just did an hour ago, where I literally only had to kind of input a couple things and I got something useful out of it and it ran for about an hour. The other thing that I have seen a lot of people talk about is that it thinks more for cheaper and better. And I know that's kind of a lot to unpack there, but one of the things that was going on with the move from 4.6 to 4.7 on Opus was this idea that they were going to try to find a way to control the token costs and that it was going to get better thinking. And I'm curious to know what you think about that kind of idea now that we're at this place where things are getting better. But also these companies are a little bit trying to maybe control their costs on the other side.
A
Well, I mean they need to control costs for the users, obviously they need to do it for themselves. But you know, the meme was the GPT 5.5 nail in Anthropic's coffin and people were, you know, posting and reposting that and sharing that. Because with the latest 4.7 Opus, a whole bunch of users felt regressions, right? It was costing more. Their limits were being eaten up and 4.7 was supposed to help with that. So when you're running these long term agents that need to spawn sub agents and go out and read documentation and explore the web and write things and test things, all of that takes compute, it takes tokens. And so it behooves the companies that serve this in some way to make that more. Behooves it does. It behooves them if they are, if they are minotaurs, if they are horse. Horse. Like this is. Yeah, this they behooved, they behave in their best interest. It's in some ways in their best interest to make these models more efficient.
B
Right.
A
Because they can serve more. They can serve it faster. On the other hand, in some ways, they're not incentivized because the more tokens these models take, the more the end user has to pay. So it's this delicate balance of trying to extract as much from the end users and their corporate bank accounts as they can while not extracting too much that people say, hey, Anthropic, we're done with you. We're now leaping to OpenAI. So this has been a huge issue. In fact, by the way, like, not to get too in the weeds, but about 30 minutes after 5.5 was announced, Anthropic posted a big, "Hey, our bad."
B
Y' all said, oh, they did.
A
I didn't see this. Yeah, they said, Claude Code was getting rough. A bunch of engineers that I chat with were like, hey, look at this. They basically found three major issues. So instead of gaslighting users, they said, actually, yeah, you're right, we had some issues on. We're going to reset all of your limits. Even though some of you might have already paid out the nose because of these errors. I digress. Let's talk. It's 5.5 today. Let's give OpenAI some flowers. Because, yes, this model should be more optimized.
B
Yeah, I mean, I think this is all part of the big conversation right now that we have to talk about as we talk about these new models, is you would really have these two companies kind of neck and neck in getting into this. And I think, Kevin, it might as well. Might as well jump into it now. It's time for some benchmark boy conversations. Benchmark boys with a benchmark boys and the test up loud. Check the charts. Make the damn brown. There was a lot of people last time who were confused. That is benchmark boys and not bros. And we said benchmark bros. So just to clear that up from
A
everybody's perspective, I think they're two separate warrants. Think. But let's talk. Let's get the boys off the bleachers. Let's get them in. You're a good gang. Let's talk benchmarks.
B
So I want everybody to know, first and foremost, benchmarks are a weird thing. As many people in this audience probably already know, but in case you don't, benchmarks are these numbers that are released that are testing these AI models on various specific tests and what they're good at. And every time they come out, they release a series of these numbers. And, Kevin, the GPT 5.5 benchmarks are good. They are not as good as Mythos. Right. And I think just to talk about this as a whole, Mythos had some higher numbers. The one thing I think that I was expecting a little bit, from kind of how people were talking about this, was a larger jump because of Mythos, when you saw the Mythos benchmark numbers. And again all of this is about like what it really feels like to use the model, which we'll talk about in a bit. These numbers are still very strong. Like we are talking about the GPT 5.5 thinking number is at an 82.7 and Opus 4.7 is at a 69.4. So that is a significant jump over that particular agentic terminal use number. But in some of the other benchmarks that have come out, like even on the one that OpenAI released, like the OSWorld-Verified, the number is almost the same. Right. So anyway, this is a long way of saying it's another step. It is not the kind of thing where you're like, it's going to do everything for me, but I do think it's important, and again I'll get to this later on to kind of talk about what I did with the model already today, that the idea here is that you can give the model more stuff to do that's harder and it can go away and do it on its own. That is the life change that we're all looking at now.
A
So a couple things like on the benchmark front, we've talked about this before, there's benchmaxing, which is where companies overfit their model to crush the benchmarks. And it typically takes a couple days or a few weeks for the vibes to come through. And people say, oh, this is what it excels at. And here's where it falls short. Looking at the benchmark numbers, as you said, there's a couple places where even the, you know, Mythos is whatever, it's not out, it's not out.
B
So who knows.
A
So comparing against Opus 4.7, which I have open right now in a terminal window, Opus bests this new model in some benchmarks. Yes, the early vibes coming out from like Dan Shipper and Every and whatnot, like the early vibes are that this thing is the best model in certain use cases. Yes, that for creative writing it got a lot better. For longer term horizon tasks which are more specific, it got better. But that for being a generalist, some people still prefer Opus. And so what I think is happening here is that, you know, companies have their own philosophies with how engineering should be done in general. Forget the way these models should work, right? And they tune the models to their preferences, to their tastes. And so we're getting like a Pepsi Cola or an Android iPhone sort of existence where it's like, look, iPhones are amazing. Android phones are amazing. Some people absolutely hate Android.
B
I hate Android. I hate Android. I don't want to ever see Android in my face ever. So there you go, Kevin.
A
Oh wow, that's right. Gavin will kick a clanker if you're getting tacos delivered in a little Rollie bot. I didn't say he doesn't want an Android in his face. He said it. He's a clanker kicker.
B
You're right there.
A
Hashtag clanker kicker in the chat.
B
No, hashtag clanker kicker.
A
I love clanker. Put it in the comments.
B
OpenAI CTO Jakub, I think his name is Jakub. Let me make sure I understand. Jakub Pachocki. Jakub Pachocki had something really interesting to say about this and Sam kind of reiterated this in a couple tweets. Basically they are saying we see pretty significant improvements in the short term, but extremely significant improvements in the medium term. I would say the last few years have been surprisingly slow. So everybody at OpenAI is kind of saying this is a new way that they are developing. They're going to be much more iterative with rolling this stuff out, which we've also seen from Opus. And Kevin, there's a little piece of this in the blog post, but like, this is another model that did a lot of work on itself and I think this is just the speeding up of stuff. And as we've seen Opus ship all those features for Claude Code and other stuff, I suspect we're about to see the same thing with OpenAI as well.
A
Please, let's go. Let's take off, friends. Let's do it. I mean, look, we even see it in the open source model community, right? A new Qwen model will drop and then you wait 30 minutes and then there's a distilled or fine-tuned version, and then a couple of minutes later there's another one that's optimized for a different operating system or a different, you know, processor entirely. Like the pace of the evolution here is getting faster and faster, and it would make sense that as their foundational models get better, they're better at improving themselves as well.
B
Yeah, and I want to call out a couple other tweets that are really interesting. Prins has had it for a little bit and said that GPT 5.5 thinking heavy (there's different versions of this) delivers better answers in two minutes than GPT 5.4 heavy delivered in 10. So, like, that's a little bit of what's going on here. The other thing I do want to shout out is Sam wrote a longer tweet, which was this idea about iterative development. But then he also said, we believe in democratization. We want people to be able to use lots of AI. We aim to have the most efficient models, the most efficient inference stack and the most compute, blah, blah, blah. So this is definitely a shot that feels like it's being taken at Anthropic. And then I love, at the end of this, he says, we love you and we want to win. We want to be a platform for every company, scientist or entrepreneur in person. My whole career has been largely about magic of startups, and I think we're about to see that magic at hyperscale. But we love you and we want to win. So we have a combination of things going on here. This is a little bit of interesting stuff that's happening overall. The other thing we should talk about is Codex, right? So not only is this new model out, but Codex actually dropped a bunch of new features, which is really cool. And I just used Codex with these features. I'm sorry, Kevin, I know you don't have it yet. Updating and see if it arrives. Better browser use, better docs. One of the experiences I had with this, Kev, was in Codex in the past. I don't know if you've had this experience. When I'm trying to build something, the browser is kind of funky. And the in-app browser, which just came out like a week ago or a week and a half ago, sometimes pops up, sometimes it doesn't. This time it was really solid. Like it popped up. It showed me as it was working, I saw the little arrow moving around within the Codex window, all very clean. So to me that's a pretty big deal.
And this also follows up on the announcement that kind of didn't get enough hype earlier this week, which was about the shared agents in ChatGPT. Did you see this?
A
Yes. Yeah, yeah, yeah.
B
So that's another way that, like, you know, you can open the door to specific agents that have use cases within either Codex or ChatGPT. This whole world of, like, having things that can be spun up, it feels like to me there's a little bit of a setting of the table for things like an OpenClaw-like world where you can go out and get all these agents that can do stuff for you, but maybe living within the OpenAI world itself.
A
Well, that's exactly what it was. That's the OpenClaw-ification of the Codex app, adding these agents. So if you want an agent that just does email triage for you, now you can easily set that up. If you're running a small business and you need a dedicated agent to look at your CRM and check the status of your A/B testing of your ads on your marketplace, like now you can have all of these dedicated agents that can talk to each other and be shared in the ecosystem. The browser and computer use, specifically the computer use on the Mac version of Codex, is incredible. I think it bests the Anthropic Claude plugin.
B
It definitely does. I think it 100% does.
A
Yeah, it seems way faster, seems way more capable. Odd to me that Sam Altman got on a live stream this week and it wasn't for GPT 5.5, it was for image generation. So that just goes to show you how powerful Tuesday's announcement was, how. How powerful the new Image 2 model is. Every day I'm seeing people generating wild stuff with Image two, like generate a birthday cake that has code on it that, when rendered, actually makes an image of a birthday cake was one of the ones that I saw that kind of blew my mind.
B
Or.
A
Or complex mathematical functions integrated into, like, children's rugs. Like that would. They would play on, like, weird, weird stuff. And when you start pairing that with a model like 5.5 now, you start unlocking some really incredible capabilities.
B
I'm very excited to talk about that and show off some really cool examples of what's been made with 5.5 with the image model. But first, a message from a new sponsor. I'm about to do something I never thought I'd be able to do with a laptop. And that's because I have this HP ZBook Fury workstation to work with. There are powerful computers and then there is this. We are very thankful to HP and Intel for sponsoring AI for Humans this week and sending us this absolute beast of a PC. This thing is powered by an Intel Core Ultra V9 Pro processor, and it came ready to go right out of the box. I've been using it for everything. Local AI, AI video, running Claude Code, and even spinning up local LLMs for my own private research. It's that powerful. I'm going to spin up ComfyUI for local AI image gen right now. So I've installed a bunch of local models like Qwen and Flux, which are free to download and free to generate. And I'm going to start making something really important. Images for my new AI series, the Raccoon Bachelor. Here's why this matters. Because I'm doing this locally and the models are open source, I'm not paying per generation, I'm not waiting in a cloud queue, and I'm not sending anything to anyone else's server. That's at least a subscription or two I'm saving per month, and I can just make a lot more. And because this bad boy has an Nvidia RTX Pro 5000 Blackwell GPU, you can see just the size of it. It's crazy. It can handle the bigger models and it has 256 gigabytes of RAM, a crazy powerful Intel CPU. I am running stuff that used to require a dedicated desktop computer on my laptop, which is pretty incredible. And now thanks to this computer, I've got all the images I need to make that little Raccoon Bachelor break the raccoon ladies' hearts. Check out the link in our description if you want to spec out the ZBook Fury. And thanks again to HP and Intel for sponsoring AI for Humans.
A
Well, as much as I love words from sponsors, Gavin, I love words from our dedicated followers and you can leave them as a comment below. And if you don't want to say anything, I guess that's chill too. Just like and subscribe, leave a 5 star review. And if you want to back us on Patreon or buy us a coffee, you can do all that too. AIForHumans.show.
B
That's our site.
A
But sincerely thank you to our sponsor and thank you to everybody who helps grow this operation each and every week. We appreciate your time.
B
That's right. And last week, thank you to everybody who said Kevin is beautiful at the end of the show. I see you YouTube commenters, there were a lot of them. Kevin, you're very happy. Okay, let's talk more about 5.5 because there are some really cool examples I've seen already and I'm going to show off my 64 animal tournament game. First and foremost, Kev, there was a really interesting demo from Peter Ghost Dev. He asked 5.5 to make a toy train set, and GPT 5.5 heavy kind of crushed it. What was really interesting here is he compared it to what 5.4 did and you can really get a sense of like, okay, these are the kind of different quality sets of the model. Like if you're not watching it, it's just very, very detailed. It's all being done like in a browser. He can kind of spin around it, and it's just a much less detailed version in 5.4. And I don't know, it's one of those cool things that lets you see what the differences are a little bit.
A
Yeah, I love these same-prompt tests. And for those that are just getting the audio version, the 5.4 is cool, right? It's like a table with a model train set literally chugging along and then you can jump into like the conductor seat and look first person through it. But it, you know, it looks a little primitive. It looks like an old Roblox type game. When you jump to the new 5.5 high, the town that the toy track is going around is fully fleshed out. There's buildings, there's trees, there's a little river with a boat going through it or whatever. And when you jump to the first person mode, you have controls that make sense and they're labeled appropriately. And it's like just staring at it and going like, oh, that's a cool prompt. I like that comparison. It makes my head spin about what this test is going to look like a year from now, Gavin.
B
Or six months from now. Right? Sure, yeah, sure.
A
But like the whole room is going to be modeled and you'll be able to go in and take photos, full control. And it will be multiplayer and it will run in browser and it's just, I like, I'm so excited for this near future.
B
I know, I had a moment of that this morning thinking about like a year, year and a half ago when you and I would be excited about what these new models would look like. And the fact that we can just spin up these things so much faster is crazy to me right now. Another cool thing: Sébastien Bubeck, who actually works at OpenAI, put a unicorn together with an SVG and he said, basically, GPT 5.5 is not fully saturating the TikZ unicorn test yet, but getting awfully close. He says this is actual TikZ code. I find it so unbelievable that I'm putting the code below for anyone to verify for themselves. So what you're seeing here is a code generated unicorn that kind of looks like a My Little Pony, but it's definitely a few far steps from what we used to see with code generated graphics. Like even the unicorn looks a little demure, like it's kind of like sadly winking at us. Or maybe not winking. Maybe it's closing its eyes. It could be sleeping. I don't know what you think, Kevin, is it winking? We don't see the other eye, so who knows?
A
Gavin, I actually don't want to explore this but this is a weird unicorn Rorschach test for you and you're like,
B
move on to your thing.
A
I actually love the way the unicorn is playing coy and it's subtly kind of just. It's a little wink towards me, letting me know, Gavin, everything you're doing is working these days in the gym, really looking. I came across a UFO tank game by in the world of AI and this was, like, supposedly a one shot, and there is a 3D tank that you can drive around a map as little UFOs whiz about and shoot at you and you can shoot at them and when you make a collision with a bullet, pew pew, UFO go bye bye. This is just like again like the new grounds of gaming. I'm sure there's a thousand startups that are going after it, but the games are going to start being good enough that you're going to actually want to participate in them and create them and remix them.
B
Yeah. So let's talk about that. The project I gave 5.5 this morning was a classic project that I have given lots of times to AI models. Kevin will remember this. Well, you and the audience may be new, you may not. I had an idea forever ago, I think it was two and a half years ago, which was I wanted to make a March Madness tournament of the world's most dangerous animals. You take 64 of the world's most dangerous animals and you fight them one by one until there's a champion. The goal here is you as the player play one animal and. And then you go through this. And this morning, literally this is 45 minutes ago, I gave it two additional prompts for this. I said go make this as a card battler. I gave it a pretty complicated prompt to start just so it had the information on it. But Kevin, the big difference here is I gave it the Image Gen tool in Codex. So what I said to it was like, hey, don't just give me. Because often what happens with this when you try to get to make a game, it'll give you like some sort of almost looks like a website. I said don't do that. Pull up images. So you're going to pull it up for the first time right now. I've pulled it up earlier. It's not great, but it's also, like, amazing that I made this in 45 minutes.
A
Okay, so I'm at the Dangerous Animal Madness site. I love that there's some particle effects going on in the background or whatever right off the rip, Gav. Nice. Okay, win six ridiculous fights with the animal. The wheel gives you. I'm going to spin for my animal here. And. Oh, I got Chaos Intern, which is. Oh, I got.
B
Which is a chimpanzee. Parking Lot Menace is a goose. So, yeah, play with yours and we'll keep your.
A
I'm going to enter the bracket here. So I see the Dangerous Animal Madness bracket. I'm using Chaos Intern versus the Buzzkill Committee, which is a tsetse fly swarm. So let's see if I can win. I'm going to zoom to the match here. Chaos Intern versus Buzzkill Committee. I'm entering the match. Opponent Intent Clamp Down Attack eight Block nine. Let's go. Come on. Oh, I have to choose. I have to choose my hand, right?
B
You have to choose your hand. Yes, you have to choose your hand. It kind of plays out like Slay the Spire or another game like that.
A
Well, I guess I'll brace for weirdness, which is a defense move, and then I'm going to do a wild swing. Okay. Yeah, yeah. Take that, tsetse fly swarm. Okay, I guess I got to end my turn now. All right, this is. This is actually too complex for me to just shoot from the head clicking. Yeah. Like, dude, I don't want to actually lose here.
B
No. Well, so here's an interesting thing about this. So basically, again, it's the first time I'm testing it or seeing it. What's very cool about this is it's the speed to demo, right? Like, that's what we've been talking about here before. The idea that you can get from zero to, like, this is probably, I'd say maybe 25 to 50% through a game, but the idea that you can play it right away makes a huge difference and.
A
Oh, dude, I'm OP. Yeah, I'm OP. Sorry. Yeah, yeah, no, no, you go ahead. You go ahead. I'm just OP. I am crushing these tsetse flies.
B
But you get the sense of what it means to, like, be able to demo something quickly in your brain and just drop it out. This was about a one paragraph prompt, and I sent it away. It worked for about a half an hour the first time. It came back and I said, do it a little bit better. Make sure you're using the image gen tool. It worked then for 45 minutes and came back with this. Now, it's not perfect yet, clearly, but the speed-to-demo idea is pretty phenomenal.
A
Chaos Intern survives. Choose one card. I can choose an evasive flop, panic geometry or double tap dance. Woo. Gavin.
B
So all of that was stuff that was just kind of prompted in. Now again, there's going to be a lot of balancing in a game like this. I'm playing a lot of Slay the Spire 2 right now. That's part of where this inspiration came from. But like, you get the sense that like you, the person at home. I am dummy. I do not have coding abilities. But the fact that you could spin up a demo like this very quickly and actually get it playable. So I mean, it's not pretty yet, but like, it's not ugly, right? Like this idea that it's not just like a prototype that looks like, you know, boxes knocking into each other, that sort of thing.
A
The fact that there's any graphics on screen this early in what would be a development cycle is wild. The fact that that's deployed and playable and you can share it is also wild. And I'm assuming you just told it, hey, go put this website up on Vercel or whatever and it deployed it for you.
B
Yeah, that's exactly right. So I, even while it was working, I said, hey, I steered it, you know, I said like, hey, throw this up on Vercel so I can share it with Kevin in the middle of this conversation. So again, speed to demo capability, long form agents, like all of this stuff is finally coming together.
A
Let's. Let's focus in on GPT Image 2 as well. Because it has only been out for a few days. I am amazed at how good it is at certain tasks to the point where like it has disrupted my usual workflow, which, you know, I'm working on a feature for tele right now. I typically make a prd, I talk with our designer, I make some mockups, whatever. But now the speed with which I am iterating is. It's almost quicker and easier for me to make the full thing, have the designer anoint it, like make their adjustments because they're better at design than me. But then I go and implement it as well like that. And that just changed.
B
This week I had a crazy moment. I'm consulting with a friend of mine on some stuff for him and he had an idea. So I spun up, me, not a coder, I spun up the demo. I spun up the design. And one of the things you can do with Image 2 is so fascinating. It's like you say, hey, give me what this website might look like, right? So you get a file back. But the thing that I did, Kevin, which kind of blew me away, because when it tries to implement that file, sometimes it's better or worse at knowing all the different elements on the screen: you can ask GPT Image 2 to send you just the elements on the screen. So like in my thing, it had a really good logo and it had a couple other things that were cool. I said, give me all that stuff as individual elements, and then you put that in your file and you let it build. You can do it all. It's like a one person shop. It really is shocking.
A
So I had it do the mockup of this, like, product that I'm building, basically. And then I said, oh, go ahead and install Hyper Frames or use Remotion. In fact, use both. And then make the mockup move like this. I want the icons to come in, I want things to highlight, animated, blah, blah, blah. And then give me like a 15 second video. It went off. This was 5.4, but it went off and did all of that using the GPT Image 2 image. And it looks great. It looks like a fantastic little mockup. And I mean, that's like, okay, whatever. That's me being actually productive. Let's get to the Where's Waldo games.
B
Yeah, well, there's a bunch of people making Where's Waldo versions with this because one of the things it can do is very detailed, very specific, larger prompts. There's a good example from Jeff Ladish, who made a University of Berkeley anti AI Where's Waldo sort of thing where there's a bunch of jokes. And then I stole his prompt and used it to make a thing about the NFL draft today. If you're a football fan, you know the NFL draft happened. So I had it make one of these things. And what was interesting for me was, like what we said last time on the show, the little jokes and little things that it adds are so interesting. And this image, this NFL draft image I made is so complicated. There's so much stuff going on in there. And now not all of it's perfect. There's a few things that are wrong, but like it's making jokes. Like a Mad Magazine sort of thing, right? Like, it almost feels like it's this giant thing that somebody drew and wrote a bunch of stuff on. So it is a shocking moment when it comes to what's possible with that. And then when you compare it and contrast it with what you can do with the code, those two things together just, like, overpower a person.
A
I feel like, yeah, I saw your draft image, and I zoomed around and was, like, looking at things I don't understand. Like, this is me looking at, like, actual code. I don't understand half the references, but I can tell that every little frame is packed. Like, every little pixel is playing some sort of joke or being part of, like, referential humor. I don't what is. I don't even know what some of these things are.
B
Well, the funny thing about it is, like, there's a couple of things it gets wrong. Like, one of the teams, it gets the wrong team, stuff like that. But it goes through. There's 10 draft picks in the middle, and it's the actual people. I asked it when I created the images, like, go find who these draft picks are and put little jokes about each of them in. And some of them have very specific jokes, but then all around the edges, there are other jokes about what happens during the draft or things like that. So, anyway, this is a very fun prompt to try for yourself, for whatever world that you live in. Like, it's probably a good thing if you're a corporate person. Like, you could do a thing where it's like, make it about my company. Like, it probably knows a fair amount of stuff, you know, and you can make these little jokes. It's a very cool thing to show off. I do want to say one more thing, Kevin. I sent this image to my daughter last night because my daughter was like, oh, OpenAI's new image model is interesting. And she made a picture of herself and did some stuff with it. When my daughter was a kid, hopefully they don't kill me for telling this story, there was a character that she created called Mr. Brewster where she wore this kind of white wig, and she went around. It was like an old man character that she made. She was very embarrassed of that character. We loved it. My wife and I thought it was one of the funniest things in the world at the time. She was probably 8 or 9. She's always had this thing of like, oh, you guys thought Mr. Brewster was so funny. It was stupid, but I think it's funny. Anyway, I sent her back this image, and I said, hey, you wouldn't believe what I saw at Whole Foods. And I made an image that was Mr. Brewster's Wonderful Concoction. Like, it was kombucha. And she's like, wait, what is that? And I was like, did somebody take
A
out a real end cap with all the different, you know, different kombuchas available in a Whole Foods, branded appropriately? That's amazing. She actually thought that someone made Mr. Brewster's for a moment.
B
Yeah, she thought so. Yeah. So my. My daughter said, I thought you saw this in the store. And my other daughter said, is this AI? Like, that's just an interesting thing at large. So this is where we're at right now, folks. All right, everybody, that is it for now. We will see you all next week. Thank you for joining us. And play around with 5.5.
A
I still don't have 5.5.
B
Kevin still doesn't have it. He'll have it soon. All right, bye, y'all. We'll see you next week.
Hosts: Kevin Pereira & Gavin Purcell
Date: April 24, 2026
This lively episode centers on the just-announced release of OpenAI’s GPT 5.5, a major new version of the AI model that promises smarter, faster, and more efficient performance—particularly for extended ("long running") tasks. Kevin and Gavin break down what this means for the AI landscape, compare it to Anthropic’s recent releases, and showcase real-world, hands-on demos highlighting the speed and new creative possibilities unlocked by 5.5. They also dive into the latest with OpenAI's Codex and Image 2 models, illustrating how the AI toolchain is enabling users—experts and non-coders alike—to build more ambitious projects, faster than ever before.
“I got it. Kevin does not have it yet. We are recording this on Thursday Dying. But it is a very cool new model.” (02:00, Gavin)
“With GPT 5.5, sometimes I become lazy and I kind of give it a very ambiguous task. But then it will figure it out.” (00:20, Gavin)
Feature Leapfrogging
“About 30 minutes after 5.5 was announced, Anthropic posted a big, ‘Hey, our bad...’” (04:16, Kevin)
Balancing Cost, Speed, and Token Efficiency
“It's this delicate balance of trying to extract as much from the end users...while not extracting too much that people say, ‘Hey, Anthropic, we're done with you. We're now leaping to OpenAI.’” (04:17, Kevin)
Benchmarks Are Not the Whole Story
“Benchmarks are a weird thing...every time they come out, they release a series of these numbers. And, Kevin, the GPT 5.5 benchmarks are good. They are not as good as Mythos.” (05:53, Gavin)
Taste as Differentiator
Notable Quotes:
“Basically they are saying we see pretty significant improvements in the short term, but extremely significant improvements in the medium term.” (09:02, Gavin)
“This time it was really solid...all very clean. So to me that’s a pretty big deal.” (11:22, Gavin)
“Now you can have all of these dedicated agents that can talk to each other and be shared in the ecosystem.” (12:36, Kevin)
“Every day I’m seeing people generating wild stuff with Image 2...” (13:13, Kevin)
“The town that the toy track is going around is fully flushed out...it makes my head spin about what this test is going to look like in a year from now.” (16:50, Kevin)
“The fact that there’s any graphics on screen this early in what would be a development cycle is wild.” (23:23, Kevin)
[25:58] Where’s Waldo-Style Images & Mockups
“There’s so much stuff going on in there...like a Mad Magazine sort of thing...” (26:58, Gavin)
Workflow Transformation for Design & Rapid Prototyping
Real-world confusion signals photorealism and believability:
“My daughter said, ‘I thought you saw this in the store.’…My other daughter said, ‘Is this AI?’ Like, that's just an interesting thing at large.” (28:52, Gavin)
| Time | Segment | Highlights |
|------|---------|------------|
| 00:00–01:32 | GPT 5.5 arrives | Gavin’s firsthand access, initial impressions, and real-time prompt improvements |
| 03:21–05:16 | OpenAI vs. Anthropic | Competitive dynamics, model regressions, token cost issues |
| 05:43–08:57 | Benchmark Boys | Benchmarks reality, model “taste,” pros/cons of 5.5 vs. rivals |
| 09:01–10:25 | OpenAI’s new development pace | Iterative rollouts and self-improving models |
| 11:22–13:11 | Codex & agents | Enhanced browser/code experience, agent marketplace possibilities |
| 13:13–13:58 | Image 2 breakthroughs | Creative possibilities unlocked by new image model |
| 16:00–17:48 | Demo: Prompt Parity Test | Toy train game in 5.5 vs. 5.4; stark leap in complexity |
| 19:30–23:57 | Demo: Animal Madness Game | Gavin’s rapid creation of a playable game using 5.5, Codex, Image |
| 25:58–27:19 | Image 2: Where’s Waldo | Complex, joke-laden visual outputs; rapid design iteration |
| 28:40–28:52 | Real-life deception | AI-generated product art fools family members |
This episode captures AI’s accelerating trajectory: new capabilities, community creativity, and the fading boundaries between expert and casual user. If you want to know what’s possible with AI right now—and what’s coming within mere months—this energetic breakdown will bring you up to speed, and then some.