Transcript
A (0:00)
Today on the AI Daily Brief: Gemini 3.1 Pro is here, and I think its point is to flex multimodal. Before that in the headlines, a lot of talk about AI in India, but is there anything worth listening to? The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right friends, quick announcements before we dive in. First of all, thank you to today's sponsors: KPMG, InsightWise, Superintelligent and Blitzy. To get an ad-free version of the show, go to patreon.com/aidailybrief, or you can subscribe on Apple Podcasts. To learn about sponsoring the show, send us a note at sponsors@aidailybrief.ai. And of course, one more quick reminder about the projects that we launched this week: Claw Camp, a free self-directed program to build an agent team using OpenClaw. We have kicked off the first four-week sprint, so come join about 3,500 of your best friends in becoming an agent boss. Meanwhile, for the enterprises out there who want to figure out how to use OpenClaw and other systems to build agent teams and change how you do things, we've got an executive sprint coming up. I will be sending more information at the very beginning of next week, so if you are interested in that, check out enterpriseclaw.ai. Lastly, if you want the single coolest job of all time, come apply to be our clarkitect and work on agentic vibe coding projects with me across the AI DB ecosystem. As always, all of this information is linked at aidailybrief.ai for easy finding. Today we start with the AI Impact Summit, a gathering in New Delhi that has brought together world leaders and AI executives. This is the first time the event has been held in a developing country, with previous iterations hosted in the UK, France and South Korea. The selection of India as the host country was symbolically important, allowing the event to platform a political call to address AI inequality.
Earlier in the week, a UN report highlighted that AI adoption is still growing more rapidly in the developed world, risking a permanent technological divide. UN Secretary General Antonio Guterres wrote in an X post, "The future of AI cannot be decided by a handful of countries or left to the whims of a few billionaires. AI must belong to everyone. AI must be accessible to everyone. AI must benefit everyone. AI must be safe for everyone. Let's build AI for everyone." In a follow-up post, he called for a global fund on AI to, quote, build skills, data, affordable computing power and inclusive ecosystems everywhere. Now, this is one of the first times we've heard world leaders proclaim the need to deliver affordable AI to the Global South. Until now, the discussions have largely been about national or regional interests. By way of example, last year's summit in Paris was squarely focused on European leaders establishing the need to invest and compete in the AI race. This year's event was a shift towards recognizing the need to treat AI as a global public good. The other big theme of the summit was India itself declaring its ambition to become a global AI power. The event featured huge investment commitments from Adani and Reliance Industries, who will each spend more than $100 billion on local data centers over the coming decade. The Indian government also earmarked a $1.1 billion fund for the efforts. Aside from global leaders, the summit also saw tech leaders fly in, including Google CEO Sundar Pichai, DeepMind CEO Demis Hassabis, and Mistral CEO Arthur Mensch. Slightly overshadowing other things going on at the event, Bill Gates canceled his keynote because of continued scrutiny over his appearance in the Epstein files. And yet, still with all of that, all eyes were on Sam Altman and Dario Amodei, specifically on one moment where more than a dozen tech leaders joined Prime Minister Modi on stage.
The leaders joined hands and raised their arms in celebration, save for Altman and Amodei, who refused to hold hands. Beff Jezos broke down the tape and determined that Dario had been the one to refuse to hold Altman's hand. But regardless of who instigated, the moment reflected just how bitter the rivalry has become. While the two were on stage, a chart from Epoch AI went viral, suggesting that Anthropic is on pace to overtake OpenAI in revenue terms by the middle of this year. So with that bombastic framing established, the two AI rivals took to the stage and delivered vastly contrasting speeches. Dario, it must be said, ummed and ahed his way through a generic and well-trodden narrative read from an iPhone screen. He said nothing he hadn't said before, and many people commented on just how bad it looked for him to be reading off his iPhone. "The aura loss is crazy," wrote Terminally Online Engineer on X. "I take back everything good I said about Anthropic." Altman was more eloquent, discussing how the fundamental uncertainty of AI interacts with global issues of democracy, social contracts and job loss. His major call to action was for global leaders to continue iterative deployment and allow people to access each successive layer of the technology as it unfolded. Offstage, in an interview with CNBC, Altman expressed skepticism over the present fear of AI job loss, remarking, "I don't know what the exact percentage is, but there's some AI washing where people are blaming AI for layoffs that they would otherwise do, and then there's some real displacement by AI of different kinds of jobs." Now, it is difficult for me to take very seriously these global talk fests. I guess theoretically, sometimes genuine action arises from them, but mostly the model is that world leaders arrive, exchange platitudes about the state of the world, and then return to doing exactly what they were already doing.
It's about the silly photo op of the arms up of all these people, which was incredibly awkward and weird, even if there hadn't been the scuffle between Sam and Dario. Sean Wang, aka Swyx, really nailed it in a post he called "Why do AI conferences keep not getting AI?" He wrote, "I feel for my brothers and sisters in India. This was their big moment on the global stage and perhaps an inflection point for one and a half billion people who will have to figure out their place in the new AI-shaped economy. And yet the powers that be decisively demonstrated that nothing will change. They care more about bad photo ops and hobnobbing with celebrities than they care about the builders that are supposed to drive the Indian AI economy forward." Ultimately, I think the less time you spend caring about what's said at events like this and the more time you spend on building things, the better off you're going to be. Still, we had a huge portion of the big tech AI leaders and a number of sovereign leaders as well, so we couldn't let it pass completely undiscussed. Next up, we shift over to the business world, where Walmart is turning to AI as their next big growth driver after a soft earnings result. The past quarter has been a mixed bag for Walmart. They briefly achieved the milestone of becoming a trillion-dollar company. However, they also lost the crown as the world's largest company by revenue to Amazon after 17 years on top. This week's earnings report guided lower earnings and revenue growth for the coming year, reflecting the shaky position of the consumer economy. And yet, in spite of, or perhaps because of, that, the earnings call focused heavily on Walmart's AI transformation strategy. Newly installed CEO John Furner said, "The way we're using technology and AI is helping us create great customer solutions, reduce friction, simplify decision making and pinpoint where our inventory is, all while maintaining the trust we've earned from our customers and members."
Now, Walmart has of course been rolling out AI into every corner of their business over the past couple of years. Furner flagged that their shopping assistant Sparky has shown early promise and will become core to their strategy moving forward. He reported that around half of Walmart's online customers have used Sparky, and that those using the assistant ordered 35% more than those who didn't. US CEO and President David Gugina noted that AI is driving a complete transformation in the way that Walmart thinks about their business. He said, "Sparky is essentially helping us evolve from traditional search to intent-driven commerce. From an economic standpoint, better discovery and higher conversion translates into bigger baskets and greater frequency. Sparky is helping customers find the things they need, they want and they love, and it's strengthening our digital unit economics as it scales." Next up, moving over to the company that dethroned Walmart off the top of the Fortune 500: Amazon is keeping a close eye on AI adoption with new metrics in their employee tracking system. The Information reports that Amazon has been using an internal system called Clarity to measure various elements of AI tool use within the company. The system, which is also used to measure other elements of employee performance, is now being used to track overall AI usage by teams, as well as which tools are seeing the most use. The monitoring doesn't just include Amazon's in-house tools, but also external AI products that staff are encouraged to use. The tracking goes well beyond software engineering and standard white-collar functions, with Amazon also keeping tabs on how the company's supply chain optimization team is making use of AI. While Amazon has maintained that AI was not the direct cause of their massive recent layoffs, the framing of the assessment certainly implies a push to realize AI productivity gains.
Employees are asked how they have, quote, accomplished more with less, and for specific examples where they have "remained innovative, force-multiplied using AI, and delivered results while reducing or not growing headcount." Moving over to the big consulting world, Accenture is laying down the law when it comes to AI use in the workplace, telling senior managers: no AI, no promotion. The consulting giant has begun collecting data on how some senior employees use AI tools and explicitly tied the metrics to career progression. According to an email viewed by the Financial Times, Accenture has told staff that promotion to leadership roles will require regular adoption of AI. You might remember that Accenture embarked last year on one of the more ambitious AI upskilling projects. At the time, CEO Julie Sweet said that staff who failed to adopt AI workflows would be exited from the company. This week's email reinforced that initial training is now over and use of AI is a fundamental requirement of the job. It stated, "Use of our key tools will be a visible input to talent discussions during the summer promotion cycle." In their story about this, the Financial Times noted that AI holdouts are becoming a major problem across the consulting industry. Three executives at Big Four accounting and consulting firms said that convincing senior managers and partners to use AI has been a much more difficult task than introducing the tools to junior staff. One executive said that older, more senior figures at the firms are more set in their ways, requiring a carrot-and-stick approach. It'll be interesting to see how much internal resistance they find. One person familiar with the policy change said they would, quote, quit immediately if it affected them, while another source criticized the quality of the tools deployed at Accenture, describing them as "broken slop generators."
In a press statement, Accenture explained the need to keep pushing, commenting, "Our strategy is to be the reinvention partner of choice for our clients and to be the most client-focused, AI-enabled great place to work. That requires the adoption of the latest tools and technologies to serve our clients most effectively." And to understand why, you need only glance at Accenture's share price. The stock is down 17% year to date and 45% over the past year. Now, this is pretty interesting to me as a bellwether of where corporations might go. I think Hedgy, @hedgymarkets on X, probably sums up the feeling of a lot of folks when he writes, "If these tools were actually useful, people would just use them. You don't need to track logins and tie them to promotions. The fact that companies are resorting to this tells me adoption isn't happening organically, which raises questions about whether the tools are delivering value or just generating metrics for leadership to point at." I don't think this is necessarily a super cynical take, but I do think it's wrong. The biggest issue that we find across all of our surveys at AI Daily Brief, as well as everything we do at Superintelligent, is the problem of time. People inside enterprises report that they don't have time to learn the technology that would save them time. And unfortunately, the vast majority of companies we interact with don't create specific time carve-outs for their people to learn how to use these tools. They simply expect people to figure out that time on their own. That creates a situation where people feel negatively about these tools because they're just another layer of stuff that they have to do, which creates the need for mandates like this. Now, to the extent we're talking about tool quality, I do think that in many corporations there is an issue of the tools that are approved for work being pretty far behind what people have access to in their personal lives.
Probably the second most frequent complaint we see, outside of "I don't have time to learn this stuff," is "at home I'm using Opus 4.6, and at work I have a terrible old version of Copilot." In any case, I do think that to some extent Accenture is an extreme example of this, because of the point that they're making: if they are in the business of bringing this new technology to their people, they really kind of need to know about it. But I wouldn't be surprised to see more mandates like this in the months and years to come. For now, though, that is going to do it for today's AI Daily Brief headlines edition. Next up, the main episode. Agentic AI is powering a $3 trillion productivity revolution, and leaders are hitting a real decision point. Do you build your own AI agents, buy off the shelf, or borrow by partnering to scale faster? KPMG's latest thought leadership paper, Agentic AI: Navigating the Build, Buy or Borrow Decision, does a great job cutting through the noise with a practical framework to help you choose based on value, risk and readiness, and how to scale agents with the right trust, governance and orchestration foundation. Don't lock in the wrong model. You can download the paper right now at www.kpmg.us/navigate. Again, that's www.kpmg.us/navigate. As a consultant, responding to proposals can often feel like playing tennis against a wall. You're serving against yourself, trying to guess what the client really wants. That all changes with InsightWise. Now you've got an AI proposals engine that thinks just like your client. It returns to the brief time and time again, picking apart your work, identifying key evaluation criteria and win themes, and making recommendations to ensure you stand out. Suddenly you're on center court, but this time you've got a secret weapon. InsightWise gets rid of all the time-consuming manual work so you can focus on winning more business more often.
Generate reports, pull insights from your own data, build competitive advantage, and go to sleep before 2am. When it comes to proposals, you only get one shot. With InsightWise, make yours an ace. Today's episode is brought to you by my company, Superintelligent. In 2026, one of the key themes in enterprise AI, if not the key theme, is going to be how good is the infrastructure into which you are putting AI and agents. Superintelligent's agent readiness audits are specifically designed to help you figure out, one, where and how AI and agents can maximize business impact for you, and two, what you need to do to set up your organization to be best able to leverage those new gains. If you want to truly take advantage of how AI and agents can not only enhance productivity, but actually fundamentally change outcomes in measurable ways in your business this year, go to besuper.ai. Blitzy is driving over 5x engineering velocity for large-scale enterprises. A publicly traded insurance provider leveraged Blitzy to build a bespoke payments processing application, an estimated 13-month project, and with Blitzy the application was completed and live in production in six weeks. A publicly traded vertical SaaS provider used Blitzy to extract services from a 500,000-line monolith without disrupting production, 21 times faster than their pre-Blitzy estimates. These aren't experiments. This is how the world's most innovative enterprises are shipping software in 2026. You can hear directly about Blitzy from other Fortune 500 CTOs on the Modern CTO or CIO Classified podcasts. To learn more about how Blitzy can impact your SDLC, book a meeting with an AI solutions consultant at blitzy.com. That's blitzy.com. Welcome back to the AI Daily Brief. Today we are talking about Gemini 3.1 Pro, but I want to situate it in a larger question, and I will start by saying sorry to Google for drawing the short end of the episode-naming straw on this one.
If it had been OpenAI that released 5.3, it would have been something very similar. The context we now all operate in is one where, instead of getting big model releases infrequently, we get very incremental model releases much more frequently. There is in fact this meme, which came from 2025 but which is more true than ever: a circular chart that starts with OpenAI introducing the world's most powerful model, that moves to Grok introducing the world's most powerful model, that moves to Gemini introducing the world's most powerful model, that moves to Anthropic introducing the world's most powerful model, that moves back to OpenAI introducing the world's most powerful model, and so on. With the release of 3.1 Pro, we are now at the Gemini section of that chart. And the point, of course, is that at this stage, state of the art in terms of incremental gains on benchmarks feels less significant as a barometer of a model's importance than it ever has before. When people ask what is the best model, the answer is not only constantly shifting, but also, I think in practice, use-case dependent. So let's talk about Gemini 3.1 Pro, the first reactions, both good and bad, and then try to figure out where it fits in the ecosystem of models. Now, it is worth pointing out that I think Gemini was absolutely due for a bit of an upgrade. The conversation for pretty much all of 2026, and really heading back into the end of 2025, has been dominated by Anthropic vs. OpenAI, or more specifically Codex vs. Claude Code. Despite Gemini 3 having such wide acclaim when it came out towards the end of last year, Google and Gemini have been really nowhere in the conversation when it comes to this incredibly important use case of coding. Now, it is worth noting that there are lots of different categories of AI users, and it is not the case that for all of them coding is what matters. It would be completely reasonable, in other words, for Google to put its priority in other areas.
However, it certainly doesn't seem like Google is specifically not trying to compete in that area. They're clearly investing a lot in Google AI Studio and Antigravity. But when it comes at least to the most enfranchised subset of users, they were kind of, at least in our recent survey results, a distinct third. All of the big models, Claude, ChatGPT and Gemini, had some broad usage. In our January AI usage survey, Gemini in fact matched Claude, with 80% of respondents having used it sometime last month, both falling slightly behind ChatGPT, which was at 87%. However, in terms of the number of people reporting that it was their primary model, Gemini was down in third at 16.1%. And at first blush, there is a lot to be impressed with in Gemini 3.1 Pro. Going by the benchmarks, it is a distinct number one when it comes to Humanity's Last Exam, not using tools; sets a new high for the GPQA Diamond scientific knowledge benchmark; and sees a big jump up for Gemini on Terminal-Bench 2.0, coming in ahead of Opus 4.6. And while it wasn't ahead of Opus 4.6 on the SWE-bench Verified agentic coding test, it was nipping at its heels: 80.6% compared to 80.8%. The biggest jump, and the one that a lot of folks are talking about, was on ARC-AGI-2. While Opus 4.6 scored a 68.8% on that test, the jump between Gemini 3 Pro and Gemini 3.1 Pro was from a 31.1% with Gemini 3 to 77.1% on Gemini 3.1 Pro. Google CEO Sundar Pichai says Gemini 3.1 Pro is great for super complex tasks like visualizing difficult concepts, synthesizing data into a single view, or bringing creative projects to life. Demis Hassabis points to major improvements in core reasoning and problem solving. Google VP Josh Woodward calls out who they want the model to appeal to, writing, "To the scientist, the engineer and the developer, Gemini 3.1 Pro has arrived. It's a significant leap in complex reasoning." And once again he points to ARC-AGI-2. "So it's great at agentic tasks, intricate coding and data synthesis projects."
"You should see fewer errors, better logic, and surprisingly good SVGs." Attached to the post is an animated image of a seal bouncing a beach ball on its nose. So what are the first impressions? The model is still rolling out, and it's only available in certain pockets of the Google ecosystem, which, by the way, is its own challenge. People like Ethan Mollick have pointed out that the Google ecosystem of AI is so diverse that it's sometimes hard to wrap your head around what model lives where. But among those who have tried it, a lot of the responses are pretty positive. AI developer Eric Hartford wrote, "Loving Gemini 3.1 Pro. It made three huge improvements to my compiler and saw things that even ChatGPT 5.2 Pro Extended and Claude Opus 4.6 Extended couldn't see." Designer and entrepreneur Meng To writes, "Gemini 3.1 Pro is an absolute beast for creating landing pages. It understands design details and animations so well. Insane upgrade for web designers." And then of course there's ARC-AGI-2, where it came in at a 77.1%, but that might not even be the most impressive thing. The ARC leaderboard measures not only the score but the cost per task. So, for example, although Gemini 3 Deep Think, which was released last week, got a higher overall score, it did so at more than 10 times the cost. 3.1 Pro achieved its score at less than a buck a task. On Artificial Analysis's overall intelligence index, Google jumped all the way from the sixth spot, behind various versions of Claude, GPT and even a Chinese model, GLM5, all the way up to number one. What's more, Artificial Analysis points out that it's doing so at a more efficient cost. They write, "Google is once again the leader in AI. Gemini 3.1 Pro Preview leads the Artificial Analysis Intelligence Index, four points ahead of Claude Opus 4.6, while costing less than half as much to run."
They said that on their tests it led six of the 10 evaluations that make up the index, with the biggest gains in reasoning and knowledge, coding, and hallucination reduction. They also point out that it does so with some serious token efficiency. They write that its processing efficiency, combined with lower per-token pricing, means that 3.1 Pro Preview costs less than half as much as Opus 4.6 Max to run, although it still is nearly twice as much as the leading open weights model, which is that GLM5 that I mentioned. In terms of specific tests, they found that Gemini 3.1 Pro led their coding index, achieving the highest score on both Terminal-Bench Hard and SciCode. But the one area where it was kind of lacking was real-world agentic performance. This is around that GDPval test, which we've talked about before, an agentic evaluation that focuses on real-world tasks. While Gemini 3.1 Pro did jump up meaningfully from Gemini 3 Pro, it was behind Sonnet 4.6, Opus 4.6, GPT 5.2 and GLM5. That was something that a number of skeptical commentators focused on. Scaling01 on X writes, "Gemini 3.1 Pro's GDPval scores are concerning." Simon Smith points out that maybe that suggests that work tasks aren't Google's focus. Indeed, he even goes so far as to speculate that they have a stake in Anthropic, so maybe they're okay with that. When it comes to coding, outside of that one example that I mentioned already, I'm just not seeing enough feedback yet to really know. Some people had trouble actually finding the model or getting it to work inside Antigravity or Gemini CLI, although when they did, as reported by Mat Velloso, they had, quote, awesome results so far. Akash Gupta gets at what I think is likely to become a more discussed aspect of this, which is the cost-performance frontier. He writes, "The best AI model crown now rotates on a weekly basis, with each lab holding a different column of the same spreadsheet. The real number in this release is the $0.96 per task."
"On ARC-AGI-2, Google went from 31.1% to 77.1% in three months while keeping pricing at $2 per million input tokens, the same pricing as Gemini 3 Pro. They doubled the intelligence and charged zero incremental cost. That's the game. Now the frontier is commoditizing so fast that benchmark leadership lasts weeks, not quarters. OpenAI, Anthropic and Google are all within single-digit percentage points of each other on most evals. The three labs are converging on comparable intelligence but diverging on distribution. Google has 2 billion Chrome users, Android, Workspace and Cloud. That's the real moat in this chart, not the 77.1%. Whoever makes intelligence ambient and cheap wins. And this benchmark table, with its patchwork of leaders across every column, is the clearest sign yet that raw capability is table stakes." I think there is a lot of truth in that. And so one of the reasons why, yes, Gemini 3.1 Pro does matter is that it's pushing on the cost frontier, not just the performance frontier. Now, the other thing about Gemini is that it's very clear that the productization of its multimodal capabilities is something that really matters to Google. Alongside the new model update, Google Labs announced a new feature for their Pomelli app called Photoshoot. They write, "With Photoshoot, you can start from a single image of your product and easily create high-quality, customized product shots to elevate your marketing." That tweet went wildly viral. In fact, whereas CEO Sundar Pichai's tweet announcing 3.1 had around 1 million views, the Google Labs tweet announcing Photoshoot has 12.2 million views at the time of recording. Google Labs product director Jacqueline Konzelman wrote, "Clearly this hit a nerve. Turns out a lot of people have been waiting for a way to get professional product photos but didn't have the time or resources to make it happen. Now they can go try it. It's free." When folks like a16z partner Justine Moore tried it, they also came away impressed.
Another example of Gemini flexing its multimodal bona fides came with a partner announcement from Replit, when they introduced Replit Animation. It is exactly what it sounds like: a tool to vibe code infographic videos, powered, they say, by Gemini 3.1 Pro. Replit CEO Amjad Masad wrote, "Vibe coding as a term is a bit tragic because it implies you're merely making software, but you can really make anything. We've been having a lot of fun making videos with Replit Animation, the kind I used to pay thousands of dollars for when we needed to do a launch video." Also, if you dig around enough, you can see the types of things that people are using Gemini 3.1 Pro for are just a little bit different than the other tools. Sure, there's a bunch of weird pelican SVG tests, but you also have examples like this one from Daniel Z, who vibe coded with Gemini 3.1 Pro a double wishbone suspension: independent double wishbone design, dynamic coilover shock absorber, vented disc brakes with performance caliper, real-time kinematic travel and steering simulation. AI isn't just generating visuals anymore. Demis Hassabis shared an official example from the Google DeepMind account, where they used 3.1 Pro to build a realistic city planner app that has complex terrains, infrastructure mapping, and even simulates traffic. Google DeepMind chief scientist Jeff Dean shared an example of 3.1 Pro doing heat transfer analysis based on a CAD file and material properties, and then turning that heat transfer analysis at different times into a visual representation. Overall, I agree on the surface with Latent Space when they wrote, "It's getting a little hard to say interesting things with all the round-robin minor version updates at frontier models every week. Gemini 3.1 Pro seems like a decent enough advance to catch up and in some cases supersede the fellow frontier models. It's better at some SVG design things and translating textual vibes to visual aesthetics." But that's kind of all they had to say.
I think, though, coming back to this question of why 3.1 Pro matters, or why any new model release matters, the point that I was trying to make at the beginning is that it's not just about state of the art on the benchmarks. That is, as Akash pointed out, table stakes. What's important is to try to understand what it does uniquely well. It's very clear when you actually dig deep that Gemini is flexing its multimodal capabilities in a full spectrum of ways, from being able to do much more technically and scientifically advanced work to being at the core of products that aren't possible with the other models. Now, that doesn't necessarily mean that Google can get away with not competing on core use cases like coding, but part of the reason, I think, that we found that even though Gemini was the primary model for just 16.1%, still a full 80% of people had used it in the previous month is because there are just some use cases that it is ideally suited for. It is very clear that as we head deeper into the AI and agent age, the greatest gains will not come from just shifting wholesale from one model to the next as new capabilities emerge, but instead from understanding, with each model release, what that particular model is going to do best and where it should be in your model portfolio. I'm excited to dig into 3.1 Pro and I'm sure I will have more to report in the week to come. For now, though, that is going to do it for today's AI Daily Brief. Appreciate you listening or watching, and until next time, peace.
