Transcript
A (0:00)
Today on the AI Daily Brief, OpenAI drops two more advanced models, making this the best week for model releases in a very long time. And before that, in the headlines, Nvidia's blowout earnings absolutely smashed the AI bubble narrative. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Alright friends, quick announcements before we dive in. First of all, thank you to today's sponsors, Rovo, Robots and Pencils, Blitzy, and Superintelligent. To get an ad-free version of the show, go to patreon.com/aidailybrief, or you can subscribe on Apple Podcasts. Again, it's just $2.99 a month for ad-free. And if you're interested in sponsoring the show, shoot us a note at sponsors@aidailybrief.ai. Finally, if you are interested in our AI ROI benchmarking study, we are collecting data for just a few more days. Anyone who shares three use cases will get the extended report. You can find that at roisurvey.ai. Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. Boy, it is so clear to me that the combination of Gemini 3 and these new OpenAI 5.1 Pro and Codex Max models, plus what we're about to hear from Nvidia, is significantly putting a damper on the AI bubble talk. Nvidia had its earnings call yesterday, and CEO Jensen Huang went right into it. Opening the call, he said, there's been a lot of talk about an AI bubble. From our vantage point, we see something very different. That very different looked like revenue up 62% compared to last year, reaching $57 billion for the quarter. Profit was $1.30 per share, and both of these key metrics beat Wall Street expectations. CFO Colette Kress doubled down on Huang's suggestion that Nvidia could see $500 billion in sales next year. In the first 60 seconds of the call, she said, we currently have visibility to half a trillion in Blackwell and Rubin revenue from the start of this year through the end of calendar year 2026.
She added later, there's definitely an opportunity for us to have more. On top of the 500 billion that we announced, the number will grow. Now, beyond the extremely strong numbers, Huang reinforced how central Nvidia is to every element of the AI stack. He said, we excel at every phase of AI, from pre-training to post-training to inference. Indeed, he provided not just numbers to counter the narrative, but a new narrative. This framing has already been extremely resonant, and so I think it's worth sharing his comments in a little more detail. Jensen said the world is undergoing three massive platform shifts at once, for the first time since the dawn of Moore's Law. The first transition is from CPU general-purpose computing to GPU accelerated computing. As Moore's Law slows, the world has a massive investment in non-AI software, from data processing to science and engineering simulations, representing hundreds of billions of dollars in compute and cloud computing spend each year. Many of these applications, which once ran exclusively on CPUs, are now rapidly shifting to CUDA GPUs. Accelerated computing has reached a tipping point. Secondly, AI has also reached a tipping point and is transforming existing applications while enabling entirely new ones. For existing applications, generative AI is replacing classic machine learning in search ranking, recommender systems, ad targeting, click-through prediction, and content moderation, which are the very foundations of hyperscale infrastructure. Now, he said, a new wave is rising: AI systems capable of reasoning, planning, and using tools, from coding assistants like Cursor and Claude Code to radiology tools like Aidoc, legal assistants like Harvey, and AI chauffeurs like Tesla FSD and Waymo. These systems mark the next frontier of computing. So there are three massive platform shifts. The transition to accelerated computing is foundational and necessary.
The transition to generative AI is transformational and necessary, supercharging existing applications and business models. And the transition to agentic and physical AI will be revolutionary, giving rise to new applications, companies, products, and services. And to bring it back to Nvidia, he pointed out simply, Blackwell sales are off the charts and cloud GPUs are sold out. Compute demand keeps accelerating and compounding across training and inference, each growing exponentially. We've entered the virtuous cycle of AI. The AI ecosystem is scaling fast, with more new foundation models, more AI startups, across more industries and in more countries. AI is going everywhere, doing everything, all at once. Now, keep in mind, these record revenues came with zero sales into China, and Nvidia is currently forecasting zero China sales in perpetuity. Nvidia also responded directly to Michael Burry's short thesis regarding the rapid depreciation of chips, noting that A100s from six years ago are still in operation at 100% utilization rates. Ultimately, markets liked what they heard. Brian Mulberry of Zacks Investment Management said markets are reacting very positively to the news that there is no slack in AI momentum, and indeed Nvidia stock was up 4% in overnight trading, and the beaten-down neoclouds Nebius Group and CoreWeave were up 10% and 8% respectively. Vital Knowledge wrote that the report, quote, should quiet the skeptics and help clear the path for a year-end rally. There are certainly pockets of the AI space where valuations needed to take a breather, but Nvidia is not in that camp. Next up, staying on the chip theme but moving a little bit geopolitical, the US has agreed to supply advanced AI chips into the Middle East. According to Bloomberg sources, the administration has approved the sale of 35,000 chips to UAE firm G42 and Saudi-owned Humain. The chips form part of broader bilateral deals that include prohibitions on diverting hardware to China.
The news comes, of course, as Saudi officials arrive in Washington for an investment forum. President Trump has said that $270 billion worth of deals are being signed between dozens of private companies, and while those deals do span multiple sectors, AI was of course one of the key cornerstones. Among the deals was a partnership between xAI and Humain to develop a 500-megawatt data center in Saudi Arabia using Nvidia chips. On stage with Jensen Huang, Elon Musk stumbled over the size of the announcement, quipping, the 500-gigawatt one will have to wait, as that'll be eight bazillion dollars. Now, we're expected to get a lot more on AI from the White House in the days to come. President Trump apparently plans to roll out a new AI initiative known as the Genesis Mission as part of an executive order to be announced on Monday. Speaking at a conference in Tennessee on Wednesday, Department of Energy Chief of Staff Carl Coe said the administration views the AI race as being just as important as the Manhattan Project or the space race. He said, we see the Genesis Mission as equivalent. Coe didn't provide many further details, but said the order would likely direct national labs to do more work on emerging AI technologies and could include public-private partnerships. In addition to the Genesis Mission, the administration is planning an executive order that would ban states from passing their own AI regulation. According to a draft document leaked to the press, the executive order would empower the Justice Department to challenge state AI laws in court. Government lawyers would be instructed to argue that state laws are unconstitutional on the basis that they restrict interstate commerce. A new AI litigation task force would be established with the sole purpose of pursuing these lawsuits against the states. In addition, the Commerce Department would be ordered to withhold federal broadband funding from states that pass their own AI legislation.
Trump hinted at the order during the investment conference on Wednesday, stating, we are going to work it so that you'll have a one-approval process and not have to go through 50 states. Republican lawmakers are also looking to insert a moratorium on state AI laws into the must-pass National Defense Authorization Act, which will come to a vote in December. Moving out of the realm of policy and into the practical, OpenAI has launched ChatGPT for Teachers. The new version of the ChatGPT UX features a secure workspace for teachers to create classroom materials and optimize their prep time. It also includes account management for school and district leaders to ensure compliance with privacy regulations. OpenAI is using the service to demonstrate how the features they've added this year can be utilized by teachers. They highlight the use of memory to ensure ChatGPT remembers curriculum details and preferred formatting for lesson plans. Teachers will also be able to make use of new ChatGPT integrations like Canva and Microsoft 365 to create presentations and documents natively in ChatGPT. OpenAI is also providing a prompt library designed to get teachers off to a fast start. The service will be provided for free to all verified US K-12 teachers until the summer of 2027, including unlimited use of GPT-5.1. Lastly today, AI music startup Suno has officially raised another $250 million at a $2.45 billion valuation. The round was led by Menlo Ventures, with participation from Hallwood Media, Lightspeed, Matrix, and Nvidia. Now, interestingly, the large record labels weren't included in this announcement and don't appear to be on Suno's cap table as of yet. Universal, Warner, and Sony filed a copyright infringement lawsuit against Suno and Udio in June of last year, and you might remember that Warner and Udio finalized their settlement on Wednesday, with the companies partnering on an AI remixing platform to be released next year.
Earlier reports suggested Suno was also moving towards a settlement, with the record labels looking for an equity stake as part of the deal. Instead, it appears that Suno will continue to fight the lawsuit on the basis that music generated by their models doesn't use samples and therefore doesn't infringe on copyright. Menlo's Deedy Das writes, Suno is so much more than a neat tool to generate music. Students use Suno to remember schoolwork, indie movie makers use it for soundtracks, parents customize birthday songs for their kids, and Suno songs have even made top music charts. Now, in addition to the raise, Suno also disclosed that they'd reached $200 million in revenue. That puts them in the same echelon as Lovable and Replit as some of the fastest-growing startups in AI. I did a whole episode about why Suno tells such an important story for AI. In short, the vast majority of that revenue is not spend that was previously going to working musicians heading over to Suno, although certainly with certain types of behavior that's part of it. Still, the vast majority is just individual consumer use, because people love it. It is net new revenue for a net new behavior. Michael Mignano of Lightspeed writes, I see a lot of people on this website surprised by Suno's success. It's actually very simple. Everyone loves music, but only a few could make music. Now everyone can make music. And I think he might be right. In any case, that is going to do it for today's headlines. Next up, the main episode. Meet Rovo, your AI-powered teammate. Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio. Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work. Connect Rovo to your favorite SaaS apps so no knowledge gets left behind.
Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights. From day one, Rovo is already built into Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions. Know the feeling when AI turns from tool to teammate? If you Rovo, you know. Discover Rovo, your new AI teammate powered by Atlassian. Get started at rovo.com, that's R, O, V as in victory, O, dot com. Small, nimble teams beat bloated consulting every time. Robots and Pencils partners with organizations on intelligent, cloud-native systems powered by AI. They uncover human needs, design AI solutions, and cut through complexity to deliver meaningful impact without the layers of bureaucracy. As an AWS Certified Partner, Robots and Pencils combines the reach of a large firm with the focus of a trusted partner. With teams across the US, Canada, Europe, and Latin America, clients gain local expertise and global scale. As AI evolves, they ensure you keep pace with change, and that means faster results, measurable outcomes, and a partnership built to last. The right partner makes progress inevitable. Partner with Robots and Pencils at robotsandpencils.com/aidailybrief. This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale codebases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task.
Blitzy delivers 80%-plus of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice. To bring an AI-native SDLC into their org, visit blitzy.com and press Get a Demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. Today's episode is brought to you by my company, Superintelligent. You've got a hundred what-if ideas, but which one becomes an agent? Superintelligent maps every AI use case across your company and helps you create an agent plan that you can actually execute. We match opportunities to your tech stack, your data profile, and your team. No more guesswork, just a clear path from pilot to production. If you want agents that deliver business outcomes, start with planning. Go to besuper.ai and sign up for a demo. Welcome back to the AI Daily Brief. Boy, did this turn into just a hell of a week. Today we're talking about OpenAI's response to Gemini 3, but we're also talking about what I think will start to happen in the wake of this week, which is a bit of a recalibration in the larger narrative around AI as well. First though, let's start with the new model releases. When we got GPT-5.1, which frankly no one was really expecting, it became clear that OpenAI knew that Gemini 3 was coming out very, very soon. Now 5.1, as I've said numerous times, was a major update. It was not a nothing update at all. On the one hand, 5.1 brought more personality back to the model, trying to appeal to the 4o people who had been so mad when GPT-5 came out and felt much more clinical to them. But it has also felt to many, just frankly, like a big step up in capabilities from GPT-5.
I know on a personal level I have significantly increased the amount of time that I've been collaborating with it in brainstorm, creative, and strategic ideation capacities since 5.1 dropped. Likewise, it was notable that the pre-Gemini 3 drop did not include a Pro version, leading many to speculate that that would be OpenAI's fast follow to Gemini 3. I'm not sure that people thought it would be this fast to follow, though, and as it turns out, it was not just 5.1 Pro that we got. In fact, even more emphasis yesterday was placed on a new coding model, GPT-5.1 Codex Max. In their announcement post, OpenAI writes, GPT-5.1 Codex Max is built on an update to our foundational reasoning model, which is trained on agentic tasks across software engineering, math, research, and more. GPT-5.1 Codex Max is faster, more intelligent, and more token-efficient at every stage of the development cycle, and a new step towards becoming a reliable coding partner. Codex Max, they say, is built for long-running, detailed work, and one of the big new innovations is a process they call compaction. They write, it's our first model natively trained to operate across multiple context windows through a process called compaction, coherently working over millions of tokens in a single task. This unlocks project-scale refactors, deep debugging sessions, and multi-hour agent loops. In other words, this model is not only designed for raw capabilities, but designed to improve performance in the specific context in which it's going to operate, as not just a coding assistant but an autonomous coding agent. Now, as with any model release, we got some benchmarks, and remember, this is a model that is very specifically designed for the purpose of coding. Introducing the benchmarks, they reinforced that it was trained on real-world software engineering tasks, including PR creation, code review, and frontend coding.
And in so doing, Codex Max represents a major jump from 5.1 Codex High on both SWE-Lancer as well as Terminal-Bench. The value, however, isn't just in output; it's also in token efficiency. For example, they write, on SWE-bench Verified, Codex Max with medium reasoning achieves better performance than GPT-5.1 Codex with the same reasoning effort while using 30% fewer thinking tokens. They also announced that they're introducing a new extra-high reasoning effort for non-latency-sensitive tasks, i.e., tasks that can run for a long period of time. Overall, then, you're getting better results and more efficient performance. And it's clear from the blog post that this is a model that's designed to expand the universe of what's possible with AI and agentic coding. In a section called Long-Running Tasks, OpenAI writes, compaction enables Codex Max to complete tasks that would have previously failed due to context window limits, such as complex refactors and long-running agent loops, by pruning its history while preserving the most important context over long horizons. The ability to sustain coherent work over long horizons is a foundational capability on the path towards more general, reliable AI systems. Ultimately, they claim that Codex Max can work independently for hours at a time. Indeed, they say, in our internal evaluations we've observed Codex Max work on tasks for more than 24 hours. They conclude, Codex Max shows how far models have come in sustaining long-horizon coding tasks, managing complex workflows, and producing high-quality implementations with far fewer tokens. Finally, they closed with some statistics. Internally, they say, 95% of their engineers use Codex weekly, and the engineers that do ship roughly 70% more pull requests since adopting Codex. So that's the official blog post. Other members of OpenAI's team focused on different parts. Researcher Noam Brown used it as a chance to reinforce a message which has been coming up all week.
Pre-training hasn't hit a wall, he writes, and neither has test-time compute. Ethan Mollick points out, in a theme we'll come back to, 5.1 Codex was released six days ago. Now we have 5.1 Codex Max. The use of every naming scheme piled on top of each other, from version numbers to qualifiers like Max, makes it hard to see how big a deal each release is, but this looks like a big jump in ability. Peter Gostev tested it against a prompt to create an application that allows you to view the Golden Gate Bridge from various angles and said, this is definitely the best I ever got out of this type of prompt, by far. METR's measurement of long time-horizon tasks, which is of course the chart that we've been following very closely as a fast visual cue to understand shifts in capabilities, showed that Codex Max was able to complete tasks that take a human programmer 2 hours and 42 minutes, with a 50% success rate. That's 25 minutes longer than GPT-5, which was the previous state of the art, although Grok 4.1 and Gemini 3 have not yet been tested. What all of this adds up to, by the way, on the METR test, is that the time horizon for agentic capabilities is still doubling roughly every seven months, but due to a slight inflection point somewhere around the release of o3, the time horizon of capabilities for the state of the art has actually tripled since the release of Claude 3.7 Sonnet in February. Now, people have not had a lot of time to digest this, but a lot of folks are jumping on this idea of compaction and what it might mean for context windows in the long run. And indeed, you get the sense that a lot of the innovations in Codex Max were basically OpenAI trying out things that it wants to bring to general-purpose AI in what they perceive as the most competitive and highest-value use case area right now, which is AI coding.
Now, Simon Willison pointed out that despite Codex Max, the quote bigger news today may actually be GPT-5.1 Pro, although as he points out, that one didn't even get a blog post; it just got a tweet. OpenAI actually retweeted its announcement of GPT-5.1 from last week, saying, GPT-5.1 Pro is rolling out today to all Pro users. It delivers clearer, more capable answers for complex work, with strong gains in writing help, data science, and business tasks. Now, despite it not having a lot of release hullabaloo, there were some people who had early access to it. Professor Derya Unutmaz writes, I can confidently say 5.1 Pro has raised the level of my favorite model, GPT-5 Pro, by a significant notch. He gave an example where he asked both 5 Pro and 5.1 Pro about the top unanswered questions in immunology, requesting that both models unpack each question clearly so that someone without an immunology degree could understand their importance. He concludes, 5.1 Pro is clearly better in that someone without an immunology background can more easily understand these explanations, with the importance and potential payoff clearly spelled out. They are also more self-contained, more visual, and more accessible, while still being deep. Content creator Theo had tweeted back on November 17, just had my mind absolutely melted by redacted, can't wait to talk about it, and responded yesterday, OpenAI just quietly released GPT-5.1 Pro, and this is the redacted I was talking about. Matt Shumer did not mince words. He said, I've had access to GPT-5.1 Pro for the last week. It's an effing monster, easily the most capable and impressive model I've ever used. But he says it's not all positive. His review is ultimately titled An Absolute Monster, But Trapped in the Wrong Interface. His summary reads, 5.1 Pro is a slow, heavyweight reasoning model. When given really tough problems, it feels smarter than anything else I've used. Instruction following is the standout.
It actually does what you ask for without going off the rails. For serious coding, it feels less like an assistant and more like a contract engineer working from a spec. It is ridiculously smart; it genuinely feels like a better reasoner than most humans, and I expect examples within days of it solving problems people thought were out of bounds for today's AI systems. However, he said there are still areas where it loses to Gemini 3, and there are interface issues. He writes, frontend and UX design are still far worse than Gemini 3, and the biggest weakness is the interface. It lives in ChatGPT, not in my IDE, not wired into my existing tools. This friction is beyond limiting and frustrating, he says. For most day-to-day work, Gemini 3 is just better. Waiting 10 minutes for an answer in a separate interface is not ideal. For anything that requires deep thought, planning, and research, and anything that I need to get right on the first try, I reach for 5.1 Pro. Ethan Mollick pointed out, OpenAI feels like it undersells GPT-5 Pro, which is still the model that is most likely to deliver serious value on very hard problems. Partially it is because these hard problems are complicated, so they're hard to describe to others. Now, Ethan also points out the right comparison is probably not Gemini 3 but Gemini 3 Deep Think. But still, it is interesting that 5 Pro has always had a bit of a shroud of mystery when it comes to the right use cases. One other person who had early access to 5.1 Pro is Simon Smith. He wrote, I was invited to alpha test 5.1 Pro alongside experts in robotics, math, immunology, medicine, music, and more. My focus was life science, commercial research and strategy, and some personal use cases. Having used 5.1 Pro for a few days, I find it more like a human domain expert than 5 Pro, with clearer writing, better judgment, fewer tangents, stronger synthesis, and more emotionally aware responses.
I ran 5.1 Pro head to head against 5 Pro on work tasks like scientific literature synthesis, drug launch planning, and social media analysis. I also tried it for personal financial planning and even journaling. It was more rigorous and comprehensive in research and planning, stronger at reasoning, better at staying on track and avoiding tangents (and in at least one case it caught errors), and much clearer, more confident, and more empathetic in its communication style. Now, he does point out that it's still bad at certain things. He said that it's not good at creating professional-quality presentations or Excel spreadsheets. And he said, I saw that at least one tester found the model conservatively avoided tackling known open problems in STEM domains, choosing instead to explain why they're open problems. Ultimately, he says it's about a 10 to 15% jump over 5 Pro for the types of things he uses it for. And he says, knowing OpenAI's focus on real-world performance like GDPval, and reports of it hiring domain experts in fields like finance, I think human domain expertise is exactly what they're going for, and with 5.1 Pro they're getting closer. This bodes well for AI doing even more impactful work in 2026. Now, to zoom out here, I think the obvious surface-level story is something like, OpenAI claps back in the week that Google wanted to dominate with Gemini 3, and to some extent that's the case, although it's pretty clear that OpenAI is not trying to steal Gemini's general thunder with this, or at least knows that that's not possible with these models. Instead, they chose to release the two updated models that are most specifically about very discrete types of work. They are showing off some new approaches, or at least newly named approaches like this compaction, that hint at where the future of general models is headed and suggest that there is still much, much more territory to be claimed.
Interestingly, I think that these releases in a weird way are much less about trying to win back momentum from Google, and much more about leaning into Google's momentum more broadly. Taken alongside Nvidia's earnings report, you can feel the embers of a little bit of a shift in the AI narrative. For a couple of months now, markets have been flirting with the idea that AI is just a big bubble, and one of the things that they've been looking for as evidence is, of course, plateaus or walls in the ability of these models to continue to improve. The story of this week, as investor Gavin Baker points out, is that Gemini 3 shows that scaling laws for pre-training are intact. He says this is the most important AI data point since the release of o1. Now, he gets into why that is, which is a topic that we'll explore in an episode later this week. But for our purposes here today, I think that takeaway one from these new models from OpenAI is that we all just got even more new tools to play with. And takeaway two, in some ways this week wasn't about competition, but about all the model companies, including Grok with 4.1, standing shoulder to shoulder and telling all of the skeptics, just wait to see what comes next. That's going to do it for today's AI Daily Brief. Thanks for listening or watching, as always, and until next time, peace.
