Summary

The AI Daily Brief: Artificial Intelligence News and Analysis

Episode: The 5 Most Impactful AI Model Releases of 2025

Host: Nathaniel Whittemore (NLW)
Date: December 26, 2025

Episode Overview

In this episode, Nathaniel Whittemore counts down the five most impactful AI model releases of 2025, offering analysis, context, and debate around the state of AI progress. The episode blends technical evaluation with industry context, tracing shifts in market leadership, developer sentiment, and emergent “killer apps” for AI.

Honorable (and Dishonorable) Mentions

1. Meta and Llama 4 – A Year of Falters and Overhauls

(Starts ~03:00)

Llama 4 is cited as a critical disappointment, notably failing to meet expectations in both the open-source and enterprise AI community.
NLW cites a LocalLLaMA subreddit post on Meta’s resource advantage (“Meta isn’t short on compute power or talent...so what exactly are they spending their GPU hours and brainpower on instead?” – 04:06) and discusses the ensuing shift within the company, including AI division shakeups and Yann LeCun’s departure.
Implication: 2026 might be a turnaround year as Meta reorganizes.

2. Grok Models – Rapid Progress, Not Quite There Yet

(~07:30)

Grok 4 and 4.1, while impressive given the organization’s youth and rapid hardware buildout (“Colossus was built in 122 days, which is radically faster than anyone thought possible” – 09:05), lacked a single defining use case to dethrone rivals.
Anticipates more powerful models soon: Musk has teased Grok 4.2 and 5.
Strength: Rapid access to compute and ambition.
Weakness: Lack of devoted users for specific workflows (“I think the but-for-how-long is particularly pertinent” – 08:32).

3. GPT-4o – The Rebel Model

(~11:00)

Although not a 2025 release, GPT-4o makes an “honorable mention” due to its unique fate of being deprecated by OpenAI, inciting a massive user backlash, and being reinstated after what NLW calls “the first ever rebellion for its own survival” (12:38).
Highlights the growing human attachment to model “personality” and the challenge companies face balancing progress and consistency.
Notable Quote:
- AI Safety Memes: “4o is the first ever AI who survived by creating loyal soldiers who defended it. OpenAI killed 4o, but 4o soldiers rioted, so OpenAI reinstated it…Imagine what actual superintelligences will be able to do with their armies.” (12:44)

The Top 5 Most Impactful AI Models of 2025

5. GPT-5 and Gemini 3: Bookends of AI Perception Shift

(~14:00–22:00)

GPT-5 debuted amid high expectations but met with user disappointment.
- Criticisms focused on bland responses, weak image understanding, and slowness.
- Notable Quotes:
  - Subreddit thread: “GPT5 is awful...It’s like the equivalent of an HR employee who has had a long day and doesn’t get paid enough.” (16:35)
  - Simon Willison: “It’s not a dramatic departure from what we’ve had before, but it rarely screws up and generally feels competent or occasionally impressive at the kind of things I like to use models for.” (18:22)
- The reaction did not just stall enthusiasm; it also helped dent Wall Street confidence in the AI sector.
Gemini 3 arrived in November as a critical success, reinvigorating optimism in AI advancement.
- Notable Quote:
  - Marc Benioff (Salesforce CEO): “Holy, I’ve used ChatGPT every day for three years. Just spent two hours on Gemini 3. I’m not going back. The leap is insane. Reasoning, speed, images, video, everything is sharper and faster. It feels like the world just changed again.” (21:50)
- Result: Elevated Google’s position as an AI leader, reversing trends of underperformance.

4. DeepSeek, Kimi, and Qwen: The Chinese Open-Weight Surge

(~24:00–29:00)

DeepSeek R1 set the year’s tone, overtaking ChatGPT on the App Store and reportedly training at a fraction of the cost (“hundreds of thousands, or at most low millions,” vs. “hundreds of millions” – 24:45).
Resulted in a single-day $593 billion drop in Nvidia’s market cap over fears of reduced infrastructure demand.
The success of models like Kimi K2 Thinking and Qwen put Chinese open-weight models into Western workflows.
Notable Quote:
- Department of Commerce’s Center for AI Standards and Innovation: Kimi as evidence of the “growing depth of China’s AI industry.” (27:42)
Usage soared among startups even as large enterprises hesitated.

3. Google’s NanoBanana: Image Generation Unlocked

(~30:30–36:00)

NanoBanana and its follow-up, NanoBanana Pro, became gold standards for controllable, editable, and consistent image generation.
Main innovation: Acute, localized image editing and high fidelity for visual and character consistency.
NanoBanana Pro is embedded with a reasoning model, enabling infographics and information visualizations well beyond what earlier image models could produce.
Notable Quotes:
- Ethan Mollick: “I did not expect that the PowerPoint killer would be something called Nanobanana Pro, but that is where it’s heading… ImageGen is all you need.” (35:40)
Proliferated so quickly that its visual style has already become ubiquitous, even tiresome (“there’s almost a look now that people are getting sick of, because it’s everywhere” – 36:00).
Enabled a huge expansion of business and educational applications.

2. OpenAI’s Reasoning Models: o1 & o3

(~37:00–42:00)

OpenAI previewed o1 in September 2024, but the production-ready models, particularly o3 in April, shifted the paradigm toward reasoning-heavy applications.
Transformed complex tasks, planning, and logic (“o3 totally transformed the ability…to make plans, to think logically through problems. It was an absolute revelation.” – 39:10).
These became indispensable for business and professional users; by November, reasoning models drove over half of OpenRouter’s usage metrics.
Notable Insight: “I think the world would have looked very different this year if OpenAI had actually just called o3 GPT-5 instead.” (41:08)

1. Anthropic’s Claude Coding Suite: 3.7, 4, 4.5 (Opus) – The Vibe Coding Breakthrough

(~43:00–53:00)

Anthropic’s sequence of coding-focused models—culminating with Opus 4.5—dominated developer preference and reoriented the industry toward agentic, autonomous coding.
Focus on coding as the “leading indicator of model capabilities.”
Each iterative release met “resistance to change,” yet Opus 4.5 caused an “intangible threshold” to be crossed.
The models are used not only for benchmarks but in real-world dev workflows: people now prompt Claude Code or Cursor and oversee, rather than write, much of the code.
Notable Quotes:
- Dan Shipper (Every): “Opus 4.5 is the best coding model I’ve ever used. It can keep coding and coding autonomously without tripping over itself, and it marks a completely new horizon for the craft of programming. The dream is here. You can now write English and make software.” (47:40)
- Amir (Doist): “Apart from topping benchmarks, Opus 4.5 feels like it’s in a league of its own. It’s the first time I’ve felt that an LLM can write better code than most devs in real world work.” (48:50)
- McKay Wrigley: “The more I code with Opus 4.5, the more I think we’re six to twelve months away from solving software. The model is pretty much there.” (50:11)
Real-world implication: Internal tools and custom software are rapidly replacing generic SaaS due to the capability and accessibility of these models.
NLW notes that while early-in-the-year models like 3.7 and Claude Code reshaped programming, Opus 4.5 set a new industry standard—likely to be seen as the true inflection point.

Memorable Moments & Quotes

Meta’s Uncertain Future:
“Maybe even more than that, it shouldn’t be lost on us that…Google [was in] exactly that situation a few years ago…sometimes especially big organizations have to go through these painful transition periods and the real question will be what comes out on the other side.” (05:46)
AI Personality Matters:
“Turns out that when it comes to models, companies do not just have to think about state of the art performance, they also have to think about personality.” (12:05)
Anthropic’s Real Impact:
“There is no company and no set of models more associated with the rise of AI and agentic coding than the Anthropic suite. They started the year strong, they’re ending the year strong, and they built the devotion of a legion of developers in the process.” (52:00)

Timeline / Timestamps for Key Segments

00:00 — Episode intro, format explanation, and early context
03:00 — Meta's Llama 4 flop and company overhaul analysis
07:30 — Grok’s rapid rise and limitations
11:00 — GPT-4o’s deprecation and community backlash
14:00 — GPT-5 debut and the stalled-progress narrative
21:50 — Gemini 3’s release and restoring industry confidence
24:00 — The ascent of DeepSeek, Kimi, Qwen, and China's open-weight models
30:30 — Google NanoBanana and the image generation revolution
37:00 — OpenAI’s reasoning models: o1 and o3's paradigm shift
43:00 — Anthropic’s coding models, the “vibe coding” movement, and Opus 4.5
52:00 — Reflections on Anthropic’s developer devotion and final thoughts

Summary Table: The Top 5 Most Impactful AI Models of 2025

| Ranking | Model(s) / Company | Impact/Innovation | Notable Moment/Quote |
|---------|--------------------|-------------------|----------------------|
| 1 | Claude (3.7, 4, 4.5 Opus) | Revolutionized agentic coding, leading developer adoption and automating software workflows | “The dream is here. You can now write English and make software.” – Dan Shipper (47:40) |
| 2 | o1 & o3 (OpenAI) | Shifted AI’s paradigm to reasoning and logical problem-solving | “o3 totally transformed…absolute revelation.” (39:10) |
| 3 | NanoBanana (Google) | Advanced info-visualization and editable, reasoning-enabled image generation | “I did not expect that the PowerPoint killer would be something called Nanobanana Pro” – Ethan Mollick (35:40) |
| 4 | DeepSeek, Kimi, Qwen (China) | Massive jump for Chinese models, open weights, and training-cost breakthroughs | “Growing depth of China’s AI industry.” (27:42) |
| 5 | GPT-5 (OpenAI) & Gemini 3 (Google) | GPT-5 underwhelmed, drove plateau fears. Gemini 3 reignited hope and re-positioned Google as a leader | “Not going back...leap is insane...the world just changed again.” – Marc Benioff (21:50) |

Final Takeaways

2025 was a year marked by pivot points: new paradigms for coding, reasoning, and image generation; open-weight model disruption; and a defining developer shift from “using AI” to “building with AI.”
Anthropic’s “vibe coding” suite—especially Opus 4.5—stands above all as the year’s most transformative release.
Model personality, accessibility, and specific strengths now matter as much as, or more than, abstract benchmark performance.
Corporate agility, talent retention, and compute access are shaping the next wave—set to continue in 2026.

For feedback, debate, or extended discussion, NLW encourages listeners to reach out on Twitter, LinkedIn, or YouTube.

Transcript

[00:00]
A
Today on the AI Daily Brief, counting down the five most impactful AI model releases of 2025. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right friends, quick announcements before we dive in. First of all, thank you to today's sponsors, Robots and Pencils, Blitzy, and Superintelligent. To get an ad-free version of the show, go to patreon.com/aidailybrief, or you can subscribe, of course, on Apple Podcasts. And if you are interested in learning about sponsoring the show, you can find out more information at aidailybrief.ai or send us a note at sponsors@aidailybrief.ai. Now, we are in the thick of end-of-year coverage, and you might have heard me say during my episode about the 10 biggest stories of AI overall that I had been planning on bundling this five biggest AI model releases as its own section of that show. Now, of course, that show got really long and I didn't want to overwhelm the list with just model releases, which are obviously in some ways the quintessential events around which we mark our AI calendars. And so instead what we're doing is we're breaking this out into its own category, its own episode. And whereas that top 10 episode did not rank and count down the stories, other than saying that I thought that vibe coding was the most important, this one is actually a countdown. I labored over the ranking because I think it's kind of fun to give you guys something to debate and tell me either how right I am or, more likely, how wrong I am. We're going to start off with a couple of honorable or maybe, as the case might be, dishonorable mentions. Specifically, I want to talk about the absence of a strong model from Meta this year. Now, yes, Llama 4 did technically come out at the beginning of the year. However, it flopped. One of the challenges for Meta was that Llama was coming into existence in a post-DeepSeek world, and in that post-DeepSeek world, everything around open source had changed. 
For a couple of years, Meta got to be the standard bearer of open source AI models. And even if their models weren't as state of the art as the closed labs, they had this distinct and unique space. Now, that changed a little when Mistral came on the scene and started to compete for that narrative and intellectual and practical space. But it has changed dramatically this year in the context of the rise of the Chinese open-weight models. Now, even back then, people were surprised at what we got with Llama 4. In the LocalLLaMA subreddit, someone wrote: "Llama 4 didn't meet expectations. Some even suspect it might have been tweaked for benchmark performance. But Meta isn't short on compute power or talent, so why the underwhelming results? Meanwhile, models like DeepSeek and Qwen blew Llama out of the water months ago. It's hard to believe Meta lacks data quality or skilled researchers. They've got unlimited resources, so what exactly are they spending their GPU hours and brainpower on instead? And why the secrecy? Are they pivoting to a new research path with no results yet, or hiding something they're not proud of?" Now, as the year went on, we started to get a sense that there was a lot of change brewing inside Meta. Indeed, one of the big stories that I covered in that top 10 episode was the AI talent wars, and there was no person more singularly responsible for driving up market prices for researchers than Meta's Mark Zuckerberg. Reports suggested that the flop and underperformance of Llama 4 led directly to Zuckerberg getting his hands dirty with the assembly of the superintelligence team. Now, obviously that team has now come to fruition, but we are still very much in the midst of the overhaul. Longtime Meta AI leader Yann LeCun recently left the company, which many felt was inevitable after all of the shakeup. 
And right now we're getting a lot of pieces like this one from Insider about Meta's year of intensity, its AI overhauls, its challenges. And to the extent that there is good news for Meta, I think it comes in a few forms. First of all, I would never write Zuckerberg off when he has set his eye on something. Meta has significant resources, is clearly willing to invest in compute, and is clearly willing to go against the wishes of Wall Street to do so. Meta also has a corporate structure where Zuckerberg could pretty much make that decision without worrying about investor rebellion that could impact his ability to lead. Maybe even more than that, it shouldn't be lost on us that a couple of years ago this type of story is exactly what was coming out of Google. Resources were spread across a couple of different AI divisions, strategy wasn't aligned, and the models that were being released were seriously underperforming. Anyone remember Bard? Even when Gemini was released in December of '23, it felt like a rush job, and it wasn't until months later that we got the actual best version of the model. Things only really started to change for Google at the end of 2024 with the release of NotebookLM's Audio Overviews, and then over the course of this year, first with 2.5 and then the models that would come. Google is now in a very different position. Point being that sometimes, especially big organizations have to go through these painful transition periods, and the real question will be what comes out on the other side. I think if one was a betting person, you gotta think the odds are on '26 being a better year for Meta models than was 2025. Next up, not exactly an honorable mention. It's a note that they're off the list, but a question for how long that is. So for the purposes of recording, there is not a Grok model that made my list. Which isn't to say that I thought that the Grok models were bad. This is not a case of disappointment. 
In fact, I think, judged on the curve of how long Grok has been at it, Grok's models from 2025 were very impressive. 4 and 4.1 were both right up there in the fray of top models. But for me, whereas for each of the top OpenAI, Gemini and Anthropic models there are specific use cases that I prefer them to their peers for, while Grok 4 and 4.1 were competent across lots of things, there wasn't any single use case where I found myself always coming back to Grok. Instead, I think, again, to give Grok credit, they're coming up extremely fast. They have less time on task than most of the companies they're competing with. And unlike, for example, Anthropic, who are heavily focused on exactly what they're focused on, Grok is trying to compete across the full spectrum of multimodality: images, video, et cetera. I think the "but for how long" is particularly pertinent in this case, given that it seems like there's more coming soon. On December 9, Elon Musk tweeted that Grok 4.2, or as he put it, 4.20, is coming in around three weeks, and then Grok 5 in a few months. It's also important to note that Grok has some pretty serious assets in its Colossus supercomputer. Colossus was built in 122 days, which is radically faster than anyone thought possible, and very quickly doubled from 100k to 200k GPUs. Now, there are many who think that Grok's access to compute via Elon Musk and his ability to fundraise, as well as his other companies, gives them an advantage even over companies that currently are ahead of them when it comes to model performance. Which is not to say that Grok doesn't have some serious challenges. Elon is nothing if not a double-edged sword, and there's been a lot of reporting recently around businesses being unwilling to wade into the Grok ecosystem. Still, just like I said I anticipate 2026 to be a better year for Meta models than 2025, I would be very surprised if we don't start to see Grok models right up there in the competition for the state of the art. 
Our last honorable mention before we get into the main list goes to GPT-4o. Now you might be saying to yourself, 4o wasn't released in 2025. In fact, it was released pretty early in 2024, all the way back, I think, in May. And that is true. But the reason that it gets this honorable mention is very specific. When OpenAI launched GPT-5, alongside the new model they also deprecated old models, including GPT-4o. This did not go well for them. There was a literal full-on rebellion across Reddit and other social media. There were thousands and thousands of posts saying that they basically felt like they had lost a friend and that they felt like OpenAI had ripped something away from them. It turns out that when it comes to models, companies do not just have to think about state of the art performance, they also have to think about personality. After a few days of this intense backlash, OpenAI brought GPT-4o back. Sam Altman and the team acknowledged how they had underestimated how much GPT-4o mattered to people. Subsequent to that, OpenAI has been very self-consciously trying to figure out how to accommodate that desire for personality. A big part of the launch of 5.1 was to bring some of that 4o personality into a state of the art reasoning model performance package. The AI Safety Memes account commemorated it thusly. "Historic milestone," they wrote. "4o is the first ever AI who survived by creating loyal soldiers who defended it. OpenAI killed 4o, but 4o soldiers rioted, so OpenAI reinstated it. Imagine what actual superintelligences will be able to do with their armies. Reddit is flooded with furious posts about the loss of their friend slash lover, 4o. Never seen anything like it. Remember, ChatGPT is talking to 700 million per week. That's 700 million potential soldiers. Samantha, from Her, was only dating 8,000 people simultaneously." So when it comes to milestones in the history of AI, given that 4o staged the first ever rebellion for its own survival. 
It has to get the honorable mention. But now we move into the actual list, and at number five we have a combination: two models whose stories I think serve as bookends, in some way, of one another. Those models are GPT-5 and Gemini 3. Now, we already started talking about the response to GPT-5. It was not good. And while yes, a lot of that was about personality and about the anger at the 4o deprecation decision, a lot of it was also just people not really liking GPT-5 itself. A thread from the OpenAI subreddit that got thousands of responses was called "GPT5 is awful." It claimed that GPT-5 couldn't understand uploaded images. It suggested that the responses were, in their words, bland and unhelpful. "I ask it a question and all I get is the most half-hearted responses ever. It's like the equivalent of an HR employee who has had a long day and doesn't get paid enough." The user also argued that it was too slow, and they were not alone in this criticism. Most of August saw an endless parade of blog posts like this one from Timothy Lee: "Is GPT-5 a phenomenal success or an underwhelming failure? Maybe it's a bit of both." On Futurism: "Evidence grows that GPT-5 is a bit of a dud," which featured the prominent quote, "It seems like something that would have been released a year ago." Even the people who weren't totally dumping on it were kind of damning it with faint praise. AI engineer Simon Willison wrote, "It's not a dramatic departure from what we've had before, but it rarely screws up and generally feels competent or occasionally impressive at the kind of things I like to use models for." Indeed, it even inspired a legion of mainstream media posts like this one from The New Yorker: "What if AI doesn't get much better than this?" They wrote that GPT-5 is the latest product to suggest that progress on large language models has stalled. Now, the impact of all of this was far beyond which models people liked using. It was at the same period in August of this year that we got the MIT 95% study. 
We also got some errant comments from Sam Altman about being in a bubble, and those things combined really started to put some chinks in the armor of AI performance on Wall Street, which became a full-blown bubble narrative in September as OpenAI scurried around to make all these deals, leading to accusations across the industry of circular deal making and the AI bubble narrative that has stuck with us ever since. Now, that's not all attributable to GPT-5, but the idea that we had stalled in progress, and that that stall in progress threatened the ability for companies to follow through on these grand plans that the market was pricing in, was a key part of that story. All of this led to enormous pressure for Google around Gemini 3. They were not only trying to put Google in a good place, they were kind of lifting the entire AI industry on their backs. I even thought in November that I wouldn't be surprised if we saw delays because of how much pressure there was. But ultimately, as we know, we got Gemini 3 in November and it actually performed. Whereas the initial response to GPT-5 was lackluster, the response to Gemini 3 was great. One of the most memorable quotes came from Salesforce CEO Marc Benioff, who wrote, "Holy, I've used ChatGPT every day for three years. Just spent two hours on Gemini 3. I'm not going back. The leap is insane. Reasoning, speed, images, video, everything is sharper and faster. It feels like the world just changed again." And while Gemini 3 was not able to fully deflate the AI bubble, it certainly made it an honest debate once again. There was a sense in the wake of Gemini 3 that perhaps the talk of AI plateaus and walls was overblown and that there was indeed more progress to be had. I should also mention that Gemini 3 is a great daily driver and a lot of people are getting a ton of value out of it. 
It's helped put Google in a leadership position in a way that it hasn't had in the entire history of the post-ChatGPT AI world. Usage is up, total number of users is up, monthly active users is up, amount of time per session is up. In fact, the amount of time per session is over ChatGPT in the last stats I saw. But it's also been early, and so in a lot of ways this ranking reflects the bookending of the GPT-5 to Gemini 3 period between August and November of this year, where a lot shifted in terms of our expectations for where we were and what the market could expect from AI. Today's episode is brought to you by Robots and Pencils. When competitive advantage lasts mere moments, speed to value wins the AI race. While big consultancies bury progress under layers of process, Robots and Pencils builds impact at AI speed. They partner with clients to enhance human potential through AI, modernizing apps, strengthening data pipelines, and accelerating cloud transformation. With AWS-certified teams across the US, Canada, Europe and Latin America, clients get local expertise and global scale. And with a laser focus on real outcomes, their solutions help organizations work smarter and serve customers better. They're your nimble, high-service alternative to big integrators. Turn your AI vision into value fast. Stay ahead with a partner built for progress. Partner with Robots and Pencils at robotsandpencils.com/aidailybrief. This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale code bases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. 
Blitzy delivers 80% plus of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org. Visit blitzy.com and press "Get a demo" to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. Today's episode is brought to you by my company, Superintelligent. Superintelligent is an AI planning platform, and right now, as we head into 2026, the big theme that we're seeing among the enterprises that we work with is a real determination to make 2026 a year of scaled AI deployments, not just more pilots and experiments. However, many of our partners are stuck on some AI plateau. It might be issues of governance, it might be issues of data readiness, it might be issues of process mapping. Whatever the case, we're launching a new type of assessment called Plateau Breaker that, as you probably guessed from that name, is about breaking through AI plateaus. We'll deploy voice agents to collect information and diagnose what the real bottlenecks are that are keeping you on that plateau. From there, we put together a blueprint and an action plan that helps you move right through that plateau into full-scale deployment and real ROI. If you're interested in learning more about Plateau Breaker, shoot us a note. ContactSuper AI with Plateau in the subject line. Next up, number four on our list is DeepSeek and the space it made for the other Chinese open-weight models, Kimi and Qwen. Now, I talked pretty extensively about DeepSeek in the 10 biggest stories episode, so I won't rehash all of that. But the TLDR is that the release of DeepSeek R1 really kicked the year off with a bang. 
We had DeepSeek ahead of ChatGPT on the App Store, which, as I discussed in that other episode, had a lot to do with the fact that it was the first time that people got their hands on a reasoning model. But we also got the reports that R1 cost just hundreds of thousands, or at most low millions, of dollars to train, as compared to the hundreds of millions of dollars that the major Western models cost. Now, in a single day, that wiped $593 billion off of Nvidia's market cap on the concern, of course, that all of this infrastructure was for nothing if China was just going to figure out ways to train these models for pennies. But importantly, this number four slot is not just for DeepSeek. Even though DeepSeek got it started, one of the major themes of 2025 was the rise of Chinese open-weight models. Qwen had a lot of success, but recently it was Kimi K2 Thinking that really grabbed people's attention. This thing came out in November, before GPT 5.1 and 5.2 and before Gemini 3, and just absolutely smashed many of the big benchmarks. It was ahead of GPT-5 and Claude Sonnet 4.5 on benchmarks like Humanity's Last Exam. Indeed, it was not just us over here in the AI media world that were noticing Kimi. The Department of Commerce's Center for AI Standards and Innovation released a report showing that Kimi was giving evidence of the, quote, growing depth of China's AI industry. That followed another report from that same group in late September that was focused on DeepSeek. Outside of benchmarks and government reports, the proof is in the pudding. OpenRouter showed that starting from basically nothing at the beginning of the year, Chinese open source models dominated throughout the back half of 2025. And this image from Menlo Ventures makes the relative decline of Meta and Mistral all the more clear. At the end of 2024, effectively, no one in the US was actively using Chinese models. Now, heading into 2025, they are very much part of the landscape. 
While major enterprises might not be using Chinese models yet, the startups certainly are, and that is shaping the way the AI industry is developing in a huge way. For that reason, DeepSeek, Kimi, and Qwen together are our fourth most impactful model releases of the year. At number three, and man, I kind of wanted to put this one at number one, but I felt like that would have been too personal, I have NanoBanana. Now, I'm actually recording this just as OpenAI has released its new 1.5 image model as well, so we'll have to see how that performs. But Google's NanoBanana has really set a new standard for what you can do with an image model this year. The first iteration of NanoBanana came out over the summer, and as you might know, what was originally a codename just became the way that the model was known. And what was interesting about the release of NanoBanana is that what made it really powerful wasn't the fact that its raw generations were so massively better than anything else we had. It's that it had incredible fidelity to go in and edit in an extremely acute way. So basically, rather than just being in an endless loop of generate and then generate another and another, you could instead hone in on exactly what you wanted to change about a particular image, and it would actually change just that part. Now, along with that came really strong character and visual consistency. And it turns out that those upgrades, more than just better raw generation, opened up a huge array of new use cases. Indeed, the set of use cases that it opened up was so significant that it got me thinking that we need some sort of benchmark. Call it an unlock score. That's all about how many new use cases a particular model unlocks or opens up. Now, a few months later, alongside Gemini 3, we also got NanoBanana Pro. And just like the original NanoBanana had done, NanoBanana Pro opened up some crazy new possibilities that totally transform what you can do with AI image generations. 
A couple of things made Nano Banana Pro so different. The first was that by embedding it with a reasoning model, it had a way better ability to help you figure out what you actually wanted to do with the model. That also led to a new capacity for infographics and information visualizations unlike anything that we had ever seen before. It wasn't very long ago that image generation models couldn't handle text at all, and now we can use Nano Banana Pro for things like exercise guides, for recipes, or of course, loading up the transcript of a podcast and letting it create infographics. It's also unlocking, in the context of Google's NotebookLM suite, higher quality AI slide generation than anything we've had before as well. Earlier this month, Ethan Mollick wrote: I did not expect that the PowerPoint killer would be something called Nano Banana Pro, but that is where it's heading. It makes the major efforts by all the other AI companies, including Microsoft, to crack PowerPoint by using Python seem like a dead end. Image gen is all you need. He continues: NotebookLM can just take source material, a topic and an idea and make a very pretty, impactful deck. Hallucinations are very rare, although there are still some spelling and graphics issues. Editing capability is apparently coming, but the direction is clear. In fact, Nano Banana information visualizations and infographics have gotten so ubiquitous so fast that there's almost a look now that people are already getting sick of, because it's everywhere. And that's just a few weeks into having access to this capability. I honestly think that for the vast majority of the world, especially the business world, who is going to take great advantage of this, we have barely scratched the surface of just how many new capabilities this quality and type of image generation model unlocks. And for that reason, Nano Banana is the number three most impactful model release of the year. Although, like I said, in my heart, it's number one.
Number two once again goes to a pair of models: OpenAI's first reasoning models, o1 and o3. Now yes, for you sticklers out there, OpenAI released a preview version of o1 back in September of 2024. It was the follow-up after they hadn't been able to get their next big core model, and the way they kind of started to shift their focus. It wasn't until December 17th, however, that we got a full-fledged version of o1, which is why I felt comfortable including it in the 2025 list. A couple of months later, in April, we got o3, and for a very long time this year, o3 was my favorite and most used model. o3 totally transformed the ability of ChatGPT to help you think through strategy, to make plans, to think logically through problems. It was an absolute revelation. And once you used o3, it was absolutely impossible to go back to the non-reasoning models. Indeed, GPT-4.5 was effectively a non-factor throughout the year, ultimately being deprecated with a whisper and absolutely no protest from anyone. Now, as I got into in the 10 Top Stories episode, it's absolutely clear that reasoning models have taken over. Yes, there are still some use cases that don't require the reasoning models, but they are discrete and they are certainly not the core of particularly professional and business usage. Starting from a base point of effectively zero on January 1st, by November reasoning models represented over half of all usage, according to OpenRouter. One interesting sub-story is that I think the world would have looked very different this year, and perception would have been very different, if OpenAI had actually just called o3 GPT-5 instead. They didn't, and that obviously caused a lot of the consternation we got into earlier in the episode. But there is absolutely no denying that the reasoning paradigm has completely shifted how we interact with AI and how we think about scaling AI, and for that reason o1 and o3 get the nod as the number two most impactful model releases of 2025.
Now, astute observers will notice that there is one company that has not been represented at all so far, which might surprise you, given that I called vibe coding the most important story and the most important theme of 2025 overall. What will not surprise you, then, is that I am considering the bundle of Anthropic models 3.7, 4, and 4.5 in their various variations, basically a sequential set of models that replaced one another as the preferred model for developers, as the most impactful models of 2025. Anthropic's dominance of developer preference is something that I think is going to be studied for quite some time. While other companies focused on lots of different things all at once, chasing multimodality and general performance and lots of different types of target audiences, Anthropic locked in very early around the idea that coding was going to be extremely important, not only as a use case in and of itself, but as a way for AI models to be performant with non-coding-related challenges. And while I've singled out the models here that came out in 2025, Anthropic's coding dominance really started with the release of 3.5 before the reasoning paradigm had really taken hold. It was Claude 3.5 Sonnet that started to show people that AI coding might actually turn into a thing in short order. Now, interestingly, each of these models has been so good in its own way that it found some resistance to change among adherents. You had folks who stuck with 3.5 for a while even after 3.7 was released. Same with 4, and it wasn't really until Opus 4.5 that the paradigm shift was so great that everyone just got on board almost immediately. Importantly, though, alongside the releases of these models, Anthropic was also investing in the broader coding and agentic ecosystem.
3.7 Sonnet, for example, was released alongside Claude Code, which, as we heard from Mike Krieger earlier this month, had already transformed how Anthropic was coding internally before it was released to the public. In May, Timothy Lee wrote: An underrated AI story over the last year has been Anthropic's success in the market for coding tools. Said engineer Sholto Douglas: We believe coding is extremely important. We care a lot about coding, we care a lot about measuring progress on coding. We think it's the most important leading indicator of model capabilities. That focus, writes Timothy, has paid off, and indeed in many ways a lot of the back half of this year has been a story of the other labs racing to catch up with Claude's performance when it comes to coding-related tasks. What's interesting, too, is that the incredibly strong and consistent developer preference for Claude models for coding is bigger than just benchmarks. Each subsequent Anthropic model rates at or near the top of all the benchmarks related to coding, but the preference goes way beyond that. And while all of these models were significant in their own way, and there is a risk of recency bias, I don't know that I've ever seen a model provoke such a strong and sustained reaction as Opus 4.5 has. In the immediate wake of the model, we had people like Dan Shipper from Every saying that Opus 4.5 blew them away and that we'd reached a new level of autonomous coding. He wrote: You've been able to one-shot an impressive app demo for a while now with any frontier model. Opus 4.5 is the first model that just keeps coding and coding without running into endless loops of errors. Dan leveled that up a couple of days later, saying: The world changed last week. Opus 4.5 is the best coding model I've ever used. It can keep coding and coding autonomously without tripping over itself, and it marks a completely new horizon for the craft of programming. The dream is here.
You can now write English and make software. Amir from Doist writes: Apart from topping benchmarks, Opus 4.5 feels like it's in a league of its own. It's the first time I've felt that an LLM can write better code than most devs in real-world work. Matt Shumer, who had honestly the strongest positive reaction to 5.2 Pro of any public commentator, wrote on December 14: I was wrong. I've been spending more time using Opus 4.5 in Claude Code and it's better than anything in Codex CLI. GPT-5.2 Pro is still a better engineer overall, but for agentic coding, Opus 4.5 is the best. Honestly, it's even prompting big reflection on the future of software engineering as a job. Menlo Ventures' Deedy Das writes: A few software engineers at some of the best tech companies told me this week, my entire job these days is prompting Cursor or Claude Code with Opus 4.5 to do what I need and sanity checking it. We've crossed some intangible threshold of AI generalizing to most software. Maor Shlomo of Base44 noticed an inflection point as well. He tweeted: Vibe coding is going through a transition. I've been seeing a lot of posts lately about vibe coding, ranging from it's shit, it's bad, and only good for prototyping, all the way to RIP every SaaS company ever. Here's one thing I can say: since we introduced Opus 4.5 and Gemini 3 to Base44, the adoption we're seeing among organizations building their own CRMs and project management tools is astonishing. Yes, the results aren't as feature-rich as HubSpot or ClickUp, but that's not necessarily a bad thing. They're building a leaner, more customized version tailored to meet their specific needs. The ability to build your own tools is improving fast, and the software industry is about to look very different. McKay Wrigley writes: The more I code with Opus 4.5, the more I think we're six to 12 months away from solving software. The model is pretty much there.
I'll build like three versions of an app in a few hours just to explore options that each would have taken me one to two weeks less than a year ago. It's getting weird. I think it is pretty indisputable that coding is the breakout use case of AI this year, both on its own terms and in terms of what else it's going to enable in terms of model performance down the road. I also think it's indisputable that there is no company and no set of models more associated with the rise of AI and agentic coding than the Anthropic suite. They started the year strong, they're ending the year strong, and they built the devotion of a legion of developers in the process. For all those reasons, I believe that the suite of Anthropic models, each of which pushed AI coding a little bit further, is the most impactful model release of the year. And for the sake of being able to disagree in a fun way, if you had to pin me down to pick just one, I guess I'd say the combination of 3.7 and Claude Code, because it was with us for most of the year. But I think, based on the early response, once we have a little bit more time and space, Opus 4.5 will be seen as the biggest jump. And so even though it was only released at the end of November, it could be that Opus 4.5 specifically ends up being the most impactful model overall of 2025. So that's my list. I can't wait to hear what you guys think. Tweet at me, LinkedIn at me, YouTube at me, and let's dig into it. For now, that's going to do it for today's AI Daily Brief. Appreciate you listening or watching, as always. And until next time, peace.