Transcript
A (0:00)
This episode is brought to you by State Farm. Listening to this podcast? Smart move. Being financially savvy? Smart move. Another smart move? Having State Farm help you create a competitive price when you choose to bundle home and auto. Bundling: just another way to save with a Personal Price Plan. Like a good neighbor, State Farm is there. Prices are based on rating plans that vary by state. Coverage options are selected by the customer. Availability, amount of discounts and savings, and eligibility vary by state.
B (0:34)
Welcome to the Tech Brew Ride Home for Tuesday, November 25, 2025. I'm Brian McCullough. Today: Anthropic says it's leapfrogged OpenAI with its new model, so is the AI horse race in play? OpenAI is still focusing on things like shopping, Nvidia answers a question people weren't asking, and is Google soaring because they also might be able to go after Nvidia's chip throne? Here's what you missed today in the world of tech. I know it's getting a bit repetitive announcing a new model every other week, but given the recent heavy discourse over OpenAI maybe losing the lead in terms of the AI cutting edge, we've got to make note of this. Anthropic has launched Claude Opus 4.5, saying it is the best model in the world for coding, agents, and computer use, and meaningfully better at everyday tasks. Quoting Ars Technica: Perhaps the most prominent change for most users is that in the consumer app experiences (web, mobile, and desktop), Claude will be less prone to abruptly hard-stopping conversations because they have run too long. The improvement to memory within a single conversation applies not just to Opus 4.5, but to any current Claude models. In the apps, users who experienced abrupt endings despite having room left in their session and weekly usage budgets were hitting a hard context window of 200,000 tokens. Whereas some large language model implementations simply start trimming earlier messages from the context when a conversation runs past the maximum in the window, Claude simply ended the conversation. Rather than allow the user to experience an increasingly incoherent conversation where the model would start forgetting things based on how old they are, Claude will now instead go through a behind-the-scenes process of summarizing the key points from the earlier parts of the conversation, attempting to discard what it deems extraneous while keeping what's important. 
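The compaction behavior described above can be sketched in a few lines. To be clear, this is an illustrative toy, not Anthropic's actual implementation: the token counting and summarization are stubbed out, and the message format and the number of turns kept intact are made up for the example.

```python
# Illustrative sketch of context compaction: when a conversation approaches a
# hard context window, older turns are replaced by a summary instead of the
# chat being cut off. NOT Anthropic's implementation; everything here is a
# stand-in for the real tokenizer and model-driven summarization.

CONTEXT_WINDOW = 200_000   # the hard limit described for the Claude apps
KEEP_RECENT = 4            # hypothetical: how many recent turns survive intact

def count_tokens(msg: dict) -> int:
    # Stand-in: real systems use a model-specific tokenizer.
    return len(msg["content"].split())

def summarize(msgs: list) -> dict:
    # Stand-in: a real system would ask the model to keep key points
    # and discard what it deems extraneous.
    text = " ".join(m["content"] for m in msgs)
    return {"role": "system",
            "content": f"[Summary of earlier conversation: {text[:200]}]"}

def compact(history: list, budget: int = CONTEXT_WINDOW) -> list:
    """Return history unchanged if it fits; otherwise fold the oldest
    turns into a single summary message and keep the recent turns."""
    if sum(count_tokens(m) for m in history) <= budget:
        return history
    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    return [summarize(old)] + recent
```

The design choice worth noticing is the one the Ars piece highlights: summarize-and-keep preserves conversational coherence, whereas silently trimming the oldest messages makes the model appear to forget things mid-conversation.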
Developers who call Anthropic's API can leverage the same principles through context management and context compaction. Opus 4.5 is the first model to surpass an accuracy score of 80%, specifically 80.9%, on the SWE-bench Verified benchmark, narrowly beating OpenAI's recently released GPT-5.1-Codex-Max at 77.9% and Google's Gemini 3 Pro at 76.2%. The model performs particularly well in agentic coding and agentic tool use benchmarks, but still lags behind GPT-5.1 in visual reasoning. Anthropic also claims that Opus 4.5 is far less susceptible to prompt injection attacks than prior Claude models or than competing models like GPT-5.1 and Gemini 3 Pro. Still, none of these models has perfect performance on that front. While the improvements to performance in benchmarks are worth noting, the most meaningful improvement in Opus 4.5 is arguably that it is significantly more efficient with tokens. Anthropic's blog post offers examples: set to a medium effort level, Opus 4.5 matches Sonnet 4.5's best score on SWE-bench Verified but uses 76% fewer output tokens. At its highest effort level, Opus 4.5 exceeds Sonnet 4.5's performance by 4.3 percentage points while using 48% fewer tokens. The Opus 4.5 launch is accompanied by other new features for developers and users. For example, the developer platform now includes a new effort parameter, allowing developers to more precisely tune the balance they want between efficacy and token usage. Also, Claude Code is now available in the desktop Claude apps. Previously it was available via command line, IDE extensions, and the web, just not in the native desktop apps. The Claude desktop interface is now tabbed between the traditional chat experience and the Claude Code experience. And lastly, and for some most importantly, there's a big pricing change for the API for Opus 4.5. The cost is now $5 input and $25 output per million tokens, down from $15 and $75 respectively. 
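That pricing change is easy to put in concrete terms with a back-of-the-envelope helper using the per-million-token rates quoted above. The workload figures in the comments are hypothetical, just to show the scale of the cut:

```python
# Back-of-the-envelope API cost at the Opus 4.5 prices quoted in the episode:
# $5 per million input tokens and $25 per million output tokens
# (previously $15 and $75).

def opus_45_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = 5.0, output_rate: float = 25.0) -> float:
    """Dollar cost of a request given per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Hypothetical job: 2M input tokens, 400k output tokens.
# New pricing: 2 * $5 + 0.4 * $25 = $20.00
# Old pricing: 2 * $15 + 0.4 * $75 = $60.00, i.e. a flat 3x reduction
```

Because both rates dropped by the same factor, the saving is a uniform 3x regardless of the input/output mix.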
End quote. And quoting VentureBeat: Anthropic's internal testing revealed what the company describes as a qualitative leap in Claude Opus 4.5's reasoning capabilities. The model achieved 80.9% accuracy on SWE-bench Verified, a benchmark measuring real-world software engineering tasks, outperforming GPT-5.1, Anthropic's own Sonnet, and Google's Gemini 3 Pro, according to the company's data. The result marks a notable advance over OpenAI's current state-of-the-art model, which was released just five days earlier. But the technical benchmark tells only part of the story. Albert says employee testers consistently reported that the model demonstrates improved judgment and intuition across diverse tasks, a shift he described as the model developing a sense of what matters in real-world contexts. "The model just kind of gets it," Albert said. "It just has developed this sort of intuition and judgment on a lot of real-world things that feels qualitatively like a big jump up from past models." He pointed to his own workflow as an example. Previously, Albert said, he would ask AI models to gather information, but hesitated to trust their synthesis or prioritization. With Opus 4.5, he's delegating more complete tasks, connecting it to Slack and internal documents to produce coherent summaries that match his priorities. The new model also scored higher on Anthropic's most challenging internal engineering assessment than any human job candidate in the company's history, according to materials reviewed by VentureBeat. End quote. Again, there has been a lot of chatter about OpenAI maybe falling behind in the horse race over the last week. Maybe OpenAI has taken its eye off the ball. We were talking yesterday about how they made their models arguably worse in an effort to increase user engagement, and they've been rolling out stuff like what I'm about to tell you, which, while interesting, feeds the argument people are making that they only have so much in the way of resources. 
OpenAI has unveiled a free shopping research feature in ChatGPT that delivers a personalized buyer's guide powered by a custom version of GPT-5 mini. Now, this does sound interesting, but again, maybe they need to focus on staying cutting edge. Quoting ZDNet: Similar to Deep Research, when prompted with a product description, ChatGPT will now sift through the Internet to put together a guide for you. It will also ask you a series of clarifying questions, using the context from past conversations and considering product reviews to develop your guidelines. Shopping research is designed to act as an assistant that can create a personalized shopping experience tailored to your specific criteria and needs in just a few minutes, OpenAI said. Research outputs can help with a variety of different tasks, including finding a product that meets specific criteria, for example, "help me find a smartphone with 18-plus hours of battery life under $1,500." Other examples include finding dupes or lookalikes of a product, comparing different products with a detailed trade-off list that is catered toward your specific needs, finding product deals, and helping you choose gifts for people on your list. The entire experience is powered by a version of GPT-5 mini that was trained specifically for shopping tasks, according to OpenAI. The company said that it was trained to read trusted sites, cite reliable sources, and synthesize information across many sources, as well as refine its prompts in real time. When compared to other ChatGPT models such as GPT-5 Thinking or ChatGPT Search, shopping research leads in product accuracy, yet OpenAI acknowledges that it occasionally makes mistakes about product details such as pricing and availability, and recommends that users always double-check its work. I found the experience of using it to be interactive and intuitive. 
To get started, all logged-in ChatGPT users, including those on Free, Go, Plus, and Pro plans, can either ask a shopping question, which will automatically activate the feature, or select the shopping research option from the menu in the text box. In your first prompt, describe what you want it to do for you. Then ChatGPT will follow up with questions pertinent to your search, such as your budget or the features that are important to you. It will also use the context it knows about you, if you have those personalization toggles on, to tailor the response toward you. As it conducts the research, it will display sample products it has found. With every product, you can indicate whether you are interested or not, and why you made that decision, guiding the research further. This was my favorite part of using the feature, as it felt like an engaging, Tinder-like experience where you can quickly click through to indicate what you like or dislike. Then, after a few minutes, it will provide you with a personalized buyer's guide that includes the top products, comparisons, and links that take you directly to the retailer's website to place the order. In the future, the company plans to integrate this feature into the instant checkout experience, enabling you to make purchases directly on the site. OpenAI said that user chats are never shared with retailers and that the results are generated organically based on publicly available websites. Sites that want to appear in results must allow OpenAI's crawlers to access their site, which can be done by following the instructions for the allow-listing process. End quote.
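On that last point, for site owners the allow-listing described above amounts to a robots.txt change. As a sketch, assuming OAI-SearchBot (OpenAI's documented search crawler) is the relevant user agent; the episode doesn't name the exact crawler token for shopping research, so check OpenAI's crawler documentation before relying on this:

```
# Hypothetical robots.txt entry permitting OpenAI's crawler.
# "OAI-SearchBot" is OpenAI's documented search crawler; whether shopping
# research uses this exact token is an assumption here.
User-agent: OAI-SearchBot
Allow: /
```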
