Transcript
A (0:00)
OpenAI has just dropped a brand new image model. I've been testing it out and playing with it today, and I'm quite impressed with what they've been able to accomplish. TechCrunch said they're continuing their Code Red warpath by putting out this model. I don't know if it's a Code Red warpath, but I do think this is a really impressive model. I also think it was simply time for them to update it: before this release, their last image model update was one I'd been begging them to make for over a year. The old version of DALL·E, so two generations ago, was absolute garbage; they were getting smoked by literally everybody, including Midjourney. So when they made their previous update to the image model, it was a huge, huge upgrade. Playing with this newest model is really cool. There are a bunch of cool features, but there are still some places where it failed in my testing. So I'll give you the pros and the cons on this episode, break down what I think it's capable of and what it isn't, the areas where I think there's room for improvement, and some of the shockingly impressive things I was able to get it to do. We're going to get into all of that on the podcast today. But if you want to test out all of the models I talk about on the show, go check out my own startup, AIBox.ai. You get access to over 40 of the top AI models: a whole bunch of really cool image models, a whole bunch of audio models like ElevenLabs and OpenAI's text-to-audio model, plus Anthropic, Google, OpenAI, Meta, and tons of cool open-source models, all for $20 a month, so you can save money and have them all consolidated in one place. If you want to check that out, it's AIBox.ai; I'll leave a link in the description. All right, let's get into OpenAI's latest model. They've just rolled out this new image model, and apparently it's a lot better at following instructions.
I've tested it out, and I've found that it's more precise at editing and four times faster at generating images, which, let's be honest, was the biggest thing that would drive me crazy with OpenAI, and the reason I was using Gemini's Nano Banana: it was just so much faster at creating images. So I actually think this is a big moment for OpenAI. They obviously didn't want their image model to get lapped; everyone was switching to Nano Banana for image generation, so I think they're really pushing to make sure they don't fall behind here. I think this model catches them up and possibly surpasses Nano Banana in some ways. Here's what's cool: they made the announcement, and they're calling it Image 1.5. It's available in ChatGPT for everybody who has ChatGPT, and it's also on the API. So it's an amazing new image model. OpenAI's Sam Altman last month said they were in code red, per a leaked internal document essentially saying they're losing market share to Google: they weren't the market leader anymore, they were falling behind, they had room to grow, and it seems like this is something they've been working on. The newest version of Google's rival image generator, Nano Banana, topped the LM Arena leaderboard across a bunch of different benchmarks, and I do not think OpenAI appreciated that. Right now, Google still holds its lead over OpenAI even after the launch of GPT-5.2, and that basically means people are preferring Gemini responses, which is something OpenAI does not want. At this point, every week and every month that they're behind in the benchmarks is a bad sign for them; they lose market share, so they're trying to move faster. On that note, apparently OpenAI had been planning to release this new image generator in early January next year.
But because of the benchmarks, because of the code red, because of everything going on, they decided to accelerate those plans and push it out as fast as they could. So they got this model out. Their last image model update was in April, which was quite a while ago, and I think an update was definitely due. And now that they're shipping this new 1.5 image model, you have to imagine the video generator in Sora is going to get a good upgrade soon, because all of the video generators are built on top of image generators. Just like Nano Banana Pro, ChatGPT's image model now has post-production features that give you much more granular editing control: facial likeness, lighting, composition, color tone across different edits, a bunch of cool things you can do with it. When I was playing with it earlier today, I was making a thumbnail, which is my number one way to test image models, because I'm asking it to do text, I'm asking it to do images, I'm asking it to take a picture of me and put it in there along with other people and company logos, all that kind of stuff. And I was actually impressed by a couple of things, but I think there's room to grow in a couple of other areas. The first thing that impressed me: right off the bat, I gave it a picture of myself and said, generate a YouTube thumbnail of me looking shocked and staring at a giant cloud with letters in the sky written by an airplane that say "new AI image"; the airplane has an OpenAI logo and is being flown by Sam Altman. So I gave it a lot of elements, and I also gave it some concepts the reasoning model had to think through: how is it going to display the cloud letters? How is it going to let you see both the airplane and the person flying it?
There were a bunch of things I was curious how it would handle, and it did this 100% better than the old model ever could have; it made a really impressive image for me. The one thing I will say about its first go is that the OpenAI logo was not actually the OpenAI logo; I've had it accurately find the logo on the web before and put it in. It did the cloud letters really well and placed the airplane in a great spot, so that all made sense. But the person flying it didn't really look like Sam Altman, and that was, I guess, my biggest complaint. Now, they have a really cool feature where, if you click on an AI image, there's an option called "select area": you can select part of the image and have it regenerate only that bit, so you don't have to regenerate the whole image, just the part you're talking about. One thing I feel it didn't do a great job of: I selected just the head of the person flying the airplane, this random person who was apparently Sam Altman but didn't really look like him. I literally just put a circle around his head and had it regenerate. When it regenerated, it put a better-looking head on, but the space around the head didn't match the sky beside it, so you could tell; it looked like I'd gone into Photoshop and cut and pasted a little piece of one image on top of another, so it kind of looked bad. I'm assuming what I probably should have done was select a larger region, maybe the whole airplane, and have it regenerate all of that; maybe it's just not as good at regenerating really small, granular areas. In any case, I think it definitely has some room for improvement there. But afterwards, without even using that selection tool, I just uploaded a picture of Sam Altman's head and a picture of the OpenAI logo.
And I said, update the logo to use this one and the image of Sam Altman to be this one. Once I did that, it got the correct OpenAI logo and Sam Altman's head, and everything looked great. So if I had done that from the beginning, providing the pictures of all the people and the logo I wanted used, it probably could have gotten it right off the bat. The image looks a hundred times better than its last model's output, so I'm really, really impressed. And beyond just making better images, it can also make them a lot higher quality: you can do 4K images. Something a lot of people have been talking about is that most generative AI image tools are really bad at iteration, at changing an image you've already made, like the whole process I just walked through where I was editing the image live. In the past, if you said something like "adjust the facial expression" or "make the lighting colder," it would regenerate the entire image, and maybe the next one wouldn't look how you wanted. With this update, you can ask for small changes like that and it will make just that small update while keeping the rest of the image consistent. OpenAI's CEO of Applications has a whole blog post about it and said it's, quote, "more like a creative studio," and I actually think it is. Someone else was saying that the new image viewing and editing screens make it easier to create images that match your vision, or to get inspiration from trending prompts and preset filters. That's another thing I should mention: in ChatGPT now, on the left-hand side, you'll see there's an Images tab, and inside it, if you're just trying to create an image, you don't have to type "create an image of XYZ" in the chat; you just describe the image you're creating. So that will save you a couple of prompts.
In addition, you can see all of the images you've ever created, which is kind of useful to go back through, and you can download them. You can also discover ideas like holiday cards, or "me as an album cover," or "what would I look like as a K-pop star?" I don't know; there are a bunch of funny ideas you can go try. I think they're trying to create some trends or something. But I do think it's nice if it saves you a couple of seconds: instead of having to spell it out in your prompt, you just click the image generation button and it knows that's what you're doing. It also has a button for adding images in; it knows you're going to ask it to manipulate images of yourself or things you're working on, which I find makes it really, really useful. So overall, I'm really impressed with it. If you learned anything new or appreciated the podcast, I would really appreciate it if you could leave a rating and review wherever you get your podcasts; they help the show out a ton in getting found by more amazing people like yourself. And as always, make sure you go check out AIBox.ai to get access to 40 of the top AI models for 20 bucks a month. Thanks so much for tuning in. I'll leave a link in the description to AIBox, and I hope you have a great rest of your day.
