Transcript
A (0:00)
Today on the AI Daily Brief: 25 things you can do with Nano Banana Pro that you couldn't do with AI image generation before. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right friends, quick notes before we dive in. First of all, thank you to today's sponsors: KPMG, Rovo, Robots and Pencils, and Blitzy. To get an ad-free version of the show, go to patreon.com/aidailybrief, or you can subscribe on Apple Podcasts. To learn about sponsoring the show, send us a note at sponsors@aidailybrief.ai. Lastly, we are in the last couple days of the AI ROI Benchmarking study. Check it out at roisurvey.ai, and thanks to all of you who have contributed so far. Final quick note before we dive in: we are just wallowing in all of the fun ways to use this new model today, so we will be skipping headlines. Then of course over the weekend we have a big think episode, and we will be back with our normal format on Monday. With that out of the way, let's dive in. Welcome back to the AI Daily Brief. The last two weeks have been an absolute embarrassment of riches. We got GPT-5.1, followed by Gemini 3, followed by GPT-5.1 Pro and Codex Max, followed, it turns out, by Nano Banana Pro. And in some ways, in terms of immediate impact and change in your capabilities with AI, I think that this image model might be the big one. Now, quick warning: if you are not already, you've got to switch over to the visual version for this episode. You can find it on Spotify; there's also a version of it on YouTube. Today we are talking about 25 things you can do right now with Nano Banana Pro, which is of course Google's latest image generation model. And importantly, this isn't just cool things that you can do; this is stuff that you pretty much couldn't do as of, like, three days ago with the previous image state of the art. Now, by way of background, you might remember that a couple months ago everyone started going wild for the Nano Banana image model.
Nano Banana was of course its codename, but it became so beloved that it sort of stuck. Its technical name was, I think, Gemini 2.5 Flash Image or something like that. In any case, what made Nano Banana so interesting to people was not that it had raw output that definitively beat other image models; it's that it was so unbelievably steerable. It provided for fine-grained editing in ways that really opened up new use cases, so much so that it got me thinking that we need a different type of heuristic or metric or eval when we look at a new model, one that's not so much about its performance on benchmarks but is instead about what capabilities it unlocks. I proposed the idea of an unlock score, which would basically be exactly that: a determination of what new possibilities a model opened up. And in any version of that, Nano Banana Pro would just score off the charts. Now, in terms of the capabilities, there are really two big things that feed into everything else, in my estimation. The first is text representation. The difference between any other model putting words on an image and what Nano Banana can do is the single biggest jump that I've ever seen between models when it comes to image generation, period, full stop. Now, the second factor that combines with that to open up all sorts of new possibilities is the fact that Gemini is able to reason on top of image generation. So when you are inside Gemini, it is not two disconnected experiences: you don't just have to prompt an image, you can actually talk to the model and figure out exactly what you're trying to do. That reasoning on top of image generation once again opens up totally new possibilities. Finally, a third factor which should be mentioned as well is the incredible fidelity that it has to whatever edits you want to make. All of which adds up to this being a model that is going to turn you from an average image generator into an absolute professional in a lot of ways.
I think that the core story here in some ways, and the meta-category that a huge number of these use cases fall out of, is the idea of visual compression. One of the big themes that you see across a ton of the early experiments that people are doing is taking a bunch of information and making it visual. In other words, the difference in how Nano Banana Pro can use text is not just a change in scale but a change in kind. The new unlock is not just being better able to use text, but being able to do it so well that you can start to tell visual stories. So a first example of what you can do with Nano Banana Pro is compressing lots of data into visuals. Take, for example, financial results. Deedy Das of Menlo Ventures took the entire Nvidia Q3 earnings PDF and generated a single-page infographic that had the key highlights like revenue, operating income, net income, and gross margin, and was also able to highlight other key parts of the report, including segment performance and drivers, capital strategy, and risks. Justine Moore from a16z did something similar with Alphabet's Q1 earnings release; that one was even higher on chart density, with accurate, representative charts showing revenue growth and operating income growth as well as revenue composition. Accurate-scale charts are actually a great example of where the more sophisticated intelligence changes the nature of what you can do. To demonstrate this capability, Simon Smith made a chart of bananas where he showed the difference in magnitude between 25%, 50%, 75%, and 100%, and it actually gets it right. Simon said: I've previously tried to generate charts with image generators and they failed to get the correct lengths for bars and columns. Kaushik Shivakumar found something similar. The Google DeepMinder wrote: an emergent capability of Nano Banana Pro that took me by surprise is the ability to generate beautiful and accurate charts that are to scale.
In his case, he tested GDP per capita, and it not only is accurate, it really is aesthetic at the same time. Another approach to this idea of compressing lots of data into visuals that I kind of think is going to become a trend is the whiteboard trend. Pietro Schirano writes: Nano Banana Pro is wild. Here's my favorite use case so far: take papers or really long articles and turn them into a detailed whiteboard photo. It's basically the greatest compression algorithm in human history. He then shows a compression of a 92-page PDF, The Llama 3 Herd of Models, converted to a professor's whiteboard. And while obviously you can only fit so many details onto a whiteboard, it's an incredibly impressive summarization. So again, just to reinforce the points about what's different: here we have two things coming together, a better ability to handle and represent text, and the ability to reason on top of image generation to create truly native visual outputs. This leads to a fourth very broad category of use cases, which we'll call educational. Effectively, image generation can be an educational tool alongside LLMs now, in a way that just was impossible up until literally a couple of days ago. We have infinite examples here, but to take a few: we have a visualization showing what parts of robotics are solved versus where there are key bottlenecks and hurdles. Clark Wimberley created a visualization explaining how a touchscreen works, from tap to action, from the literal dead-simple prompt: make an infographic explaining how a touchscreen works. Nano Banana Pro was able to put together a four-part visual that looks great and explains how the process goes from physical touch to sensing the interaction to signal processing to executing the command. Swyx went meta and asked Nano Banana Pro to explain Nano Banana Pro. He got it in two very different ways: one is a sort of good-looking but ultimately academic infographic, while the other is a literal classic comic strip explaining what Nano Banana can do.
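For those who want to try this one-prompt infographic flow outside the Gemini app, here is a minimal sketch using Google's `google-genai` Python SDK. It is an illustration, not an official recipe from the episode: the model id shown is the original Nano Banana (`gemini-2.5-flash-image`), since the Pro model's API id isn't confirmed here, and the function simply skips the network call when no API key is configured.

```python
# Hedged sketch: generate an infographic image from a single text prompt.
# Assumptions: the google-genai SDK (`pip install google-genai`) and the
# "gemini-2.5-flash-image" model id (original Nano Banana; Pro may differ).
import os

PROMPT = "Make an infographic explaining how a touchscreen works."

def generate_infographic(prompt: str = PROMPT):
    """Return raw image bytes, or None when no API key is configured."""
    if not os.environ.get("GEMINI_API_KEY"):
        return None  # offline / no credentials: skip the call
    from google import genai
    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    resp = client.models.generate_content(
        model="gemini-2.5-flash-image",  # assumed model id
        contents=prompt,
    )
    # Image responses come back as inline_data parts alongside any text.
    for part in resp.candidates[0].content.parts:
        if part.inline_data is not None:
            return part.inline_data.data
    return None

result = generate_infographic()
print("generated image" if result else "no API key set; skipped")
```

If it succeeds, the returned bytes can be written straight to a `.png` file; the same call with a reference image in `contents` is how the editing-style use cases later in the episode work.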
I am of course seeing a ton of parental use cases. Google's Jacqueline Conzelman created this gorgeous tour of our solar system that looks like the type of poster that you'd put on a three- or four-year-old's wall. Speaking of three- or four-year-olds, I have a four-year-old who is learning to read and who is very, very much into construction equipment and construction tools. And so of course I had to put together an alphabet chart that was based on that theme. And if you've ever tried to do something like this, you will know that while this feels like it should have been table stakes, it absolutely was not. It was nearly impossible to get something like this before, and pretty much genuinely impossible to get it with no errors and without specifying all the different elements. I didn't have to tell it to put an asphalt paver for A or a bulldozer for B. I just told it that I wanted an alphabet chart with these themes, and it figured out all the rest. As you can see, we're starting with these very broad use cases that are actually lots of use cases bundled together. But the next one we'll look at is a sort of subset of infographics, which is flowcharts. Ethan Mollick prompted it: I need a flowchart for how to toast bread; make it as wacky and over the top and complicated as possible. And it did that in grand fashion. Now, Ethan was being silly, but this ability to actually show the representation of different visual elements as a flowchart is obviously incredibly valuable, not just in a silly way. What if AI wasn't just a buzzword, but a business imperative? On You Can with AI, we take you inside the boardrooms and strategy sessions of the world's most forward-thinking enterprises. Hosted by me, Nathaniel Whittemore, and powered by KPMG, this seven-part series delivers real-world insights from leaders who are scaling AI with purpose, from aligning culture and leadership to building trust, to data readiness and deploying AI agents.
Whether you're a C-suite executive, strategist, or innovator, this podcast is your front-row seat to the future of enterprise AI. So go check it out at www.kpmg.us/aipodcasts, or search You Can with AI on Spotify, Apple Podcasts, or wherever you get your podcasts. Meet Rovo, your AI-powered teammate. Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio. Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work. Connect Rovo to your favorite SaaS apps so no knowledge gets left behind. Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one. Rovo is already built into Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions. Know the feeling when AI turns from tool to teammate? If you Rovo, you know. Discover Rovo, your new AI teammate powered by Atlassian. Get started at rovo.com. AI changes fast. You need a partner built for the long game. Robots and Pencils works side by side with organizations to turn AI ambition into real human impact. As an AWS certified partner, they modernize infrastructure, design cloud-native systems, and apply AI to create business value. And their partnerships don't end at launch: as AI changes, Robots and Pencils stays by your side so you keep pace. The difference is close partnership that builds value and compounds over time. Plus, with delivery centers across the US, Canada, Europe, and Latin America, clients get local expertise and global scale. For AI that delivers progress, not promises, visit robotsandpencils.com/aidailybrief. This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context.
Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale codebases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and precompiles code for each task. Blitzy delivers 80%-plus of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice. To bring an AI-native SDLC into their org, visit blitzy.com and press Get a Demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. Next up, number six, is visual tutorials. Callum Clark put together a chart of the correct bowing procedure for Taekwondo, dividing it into four steps as well as providing insight on when to bow. Once again, Callum didn't provide it a ton of information. When someone asked, he said the prompt was fairly simple: generate me an infographic explaining how to bow correctly in ITF Taekwondo. Now, I didn't see any versions of this, but can you imagine how valuable this would be for instructions on assembling something? Another sort of separate category of instruction that a lot of people are experimenting with is visual recipes. Chubby on X built a chart showing how to make the perfect cardamom tea. Vittorio created a step-by-step guide for cooking the perfect pasta. Anatomical and technical drawings were a huge theme. The JSON Prompts account showed a bunch of Pokémon anatomy drawings, including Squirtle, Bulbasaur, Charmander, and Pikachu himself. Another use case, which I think we'll see a ton of, is taking one type of media and turning it into another type of media.
Shopify CEO Tobi Lütke took a video of a speech that he gave a number of years ago to his team and turned it into a rich, complex visualization, which is something you better believe I'm going to try with the transcripts of this show. Another subgenre of technical drawings that we're seeing people experiment with is blueprints. The AI for Success account wrote: it did not just create the image; it first read the blueprint properly and then created the final output with every small detail. Another great representation of the power of true multimodal understanding. Sort of related to that is another use case that I think is going to be highly commercialized, which is virtual staging. Justine Moore from a16z gave it a set of three pieces of furniture and said: stage a living room with this couch, table, and two chairs. It executed the output very well. And Justine wrote: the first iteration of the model was good at this, but I find the new model is much better at retaining textures and asymmetry and unique features of objects. Alcine went farther, writing: Nano Banana Pro is making millions of interior designers obsolete. I uploaded my floor plan and it designed the whole house for me and even generated real images for each room based on the dimensions. Now, in my opinion, this is once again an example of where professional interior designers are just going to be able to do more, faster, potentially for less, and for a greater number of clients. But the capability increase is absolutely huge. Another capability that Google talked a lot about when they launched this was the ability to combine multiple people into a single photo. Fofr found that Nano Banana Pro could take up to 14 reference images, and that while it worked best with around five people, sometimes you could push it farther.
If you've ever tried to combine people into an image, you'll know that AI models frequently have a hard time and end up kind of blending people's features together rather than putting them next to one another. Fofr actually showed a couple examples of the different styles in which you could combine all these reference images to create something with a big theme. Here is really precise instruction following: Halim Al Rasihi gave the model two characters, a style reference, and a sketch of an action pose that they wanted, and got the exact image that they were looking for. Now, this precise instruction following and precise editing is, once again, I think going to be one of the most commercially viable and important unlocks of the whole model. The original Nano Banana was good at this; this was actually the core of where the unlock score idea came from, this ability to spot-edit photos. But Pro takes it to another level. Clark Wimberley took a photo of a man in a warehouse and prompted the model: this man just got a supplier price change request and looks concerned. The model makes the change in ways that look incredibly natural and not over-exaggerated. Clark also turned a White Claw into a glass of soda with a striped straw. Gotta give a shout-out to Prinz as well, who took a handful of Magic cards split between red and black and made them all into black cards. Now, if you don't know Magic, I want to underscore a couple things about this that make it even more impressive than it seems. The first is that it was able to tell that, based on what the user was asking, it needed to change the Mountain on the left to a Swamp, which is the basic land associated with black in the game. That involves a whole different level of understanding and comprehension that wasn't there in the prompt. The second part is that the borders of different colors have different visual cues. And so the model knew that it couldn't just change the color of this pattern.
It actually had to change the pattern to what black cards look like. Now, while this was just a demonstration example, that level of precise editing opens up such a crazy world of new opportunities. Speaking of fidelity to instructions and precise editing, the ad agencies are going to be absolutely salivating. One of the most common types of examples that people were sharing was product and brand shots. Someone created high-fidelity advertising visuals for earbuds. Hedra Labs took its logo and put it on a billboard. Jacob Paul took a set of reference product images and turned them into a magazine-style ad. Now, staying in the brand marketing and advertising theme, a lot of people also experimented with logos. This is one where I will say, for the sake of having some amount of skepticism, I still gotta think that the logo outputs of Nano Banana Pro are, to put it uncharitably, tasteless and ugly. But I also haven't gone in and tried to get something really great out of it. And to give it credit and acknowledgement, most of the logos that it was trained on I also think are absolutely horrifyingly ugly. Still, bringing it back to the very impressive: Pro isn't just able to generate a logo or a brand asset, it's able to generate bulk brand assets. Crystal Maria writes: one-shotted a brand and put it on merch with a low-effort prompt on Nano Banana Pro. She created a new chicken pizza company and designed a pizza box, a T-shirt, and a hat, all with an integrated logo system that was consistent. Andrew Lane did the same for a matcha energy and collagen brand. Now, one thing that people noticed as they were doing this is that Google appears to have wound back the guardrails just a little bit. It's more comfortable producing images of people and owned IP. For example, folks were able to get the Star Wars and Disney logos really accurately. Now, whether this is something that will persist, I have fairly big questions.
But the more that, within reason, they can just let people do those sorts of logo identities, the more use cases I think it opens up. Just a few more use cases before we wrap up. Tons of people were experimenting with movie stills. Viral AI advertiser PJ Ace wrote: Nano Banana Pro is the most cinematic model on the planet. I asked Gemini to generate photorealistic leaked images from the new Legend of Zelda movie, and this will change Hollywood. Architrathy did the same thing with a Wallace and Gromit still, but was able to take it from multiple angles, calling it a leapfrog moment for AI filmmaking. Speaking of filmmaking, Nano Banana Pro's text capabilities also open up improved possibilities for using AI video as well. Nick Mataris writes: Step 1, upload an image or generate an image using Nano Banana. Step 2, use Nano Banana Pro to annotate the image. His prompt was: add sketch annotations on top of this image explaining the camera movement; I want it to crane up and look down as an aerial shot. Step 3, use Veo 3.1's frames-to-video to bring it to life. Basically, the annotations on the image allow the video model to know what to do. There is a ton of media remixing: people putting digital news articles on old newsprint, people taking contemporary logos and making them fluffy, people taking photos of their kids and turning them into movie posters. In an impressive display of physics, Christopher Friant found that he could apply an image, in this case of Sydney Sweeney, to a dodecahedron, and Fofr again was able to take a meme and turn it into Legos. Indeed, I think the meme potential here is pretty unlimited. I took the bass face kid meme, basically an image that people share when they really like a song, particularly in dance music circles, and I asked Nano Banana to turn it into a four-part visual scale where the face goes from normal to the most insane bass face. You can see here that I think that it absolutely nailed it.
It created settings for normal, mild bass, intense, and insane bass face. What's clear from all of this is not just that the unlock score of Nano Banana Pro is off the charts, but that it pretty fundamentally redefines how we have to think about image generation capabilities. For those of you who have followed Ethan Mollick for a while, you'll know that he's used a similar test prompt for years to see how new image generation models fail: it's basically otters on a plane using WiFi. He writes, tongue in cheek: I think my otters-on-a-plane-using-WiFi may be a saturated benchmark now that Nano Banana Pro can do this. The image is a set of white-lab-coat-and-glasses-clad otters describing on a whiteboard why models previously had a hard time with this, with a gallery wall on the right showing all of the previous generations. We are, in other words, in very new territory. Now, we'll explore a lot more about just what the implications of this are. For now, if you have access to Gemini, I would highly recommend just going and spending a bunch of time playing with this. Try exploring not just interesting image generation, but something where you need to convey a lot of information density with visuals. I think you'll be impressed, and I think it'll change, in a good way, what you think is possible with AI image generation. For now, that's gonna do it for today's AI Daily Brief. Appreciate you listening or watching as always, and until next time, peace.
