Transcript
A (0:00)
Anthropic has just announced Opus 4.5, the latest version of its flagship model. It's the last of Anthropic's 4.5-series models to be released, following the launch of Sonnet 4.5 back in September and Haiku 4.5 in October. So this is the big model everyone has been waiting for out of Anthropic. As you'd expect, the new version of Opus has basically state-of-the-art performance on a bunch of different benchmarks, including SWE-bench, which is incredibly important for software engineers, and Terminal-Bench. These matter because Claude Code has become the favorite AI tool among most developers, although it's starting to get some competition from Grok, Gemini, and OpenAI. They all battle in there, but Claude Code seems to be holding the number one spot. Opus 4.5 also leads on tool use (the Tau2-bench and MCP Atlas benchmarks) and general problem solving (ARC-AGI-2 and GPQA Diamond). So it's doing really well on a lot of the benchmarks. Today on the podcast I want to talk about some of the changes we can expect to see because of this. But before we get into that: if you want to test all of the AI models I talk about on the show, I'd love for you to try AI Box AI. It's my own startup where you get access to over 40 different models and can compare them side by side, everything from Anthropic to OpenAI to Google Gemini to Grok, plus a ton of image models and ElevenLabs for audio. For $20 a month you get access to everything, so go check it out: AI Box AI. Okay, let's get into what's going on with Opus. One thing that I think is a really big deal here is that Opus 4.5 is the first model ever to score over 80% on SWE-bench Verified.
It's a real coding benchmark that a lot of people pay attention to, especially since Claude Code is so popular with developers, as I mentioned earlier. Anthropic also said Opus's computer-use and spreadsheet capabilities have gotten much better, that they're ahead of everyone else, and they've launched several products in parallel to show how well the model holds up in those settings. Something I really appreciate from Anthropic is that beyond just saying, look, we did really well on a specific benchmark, they've actually released products to showcase what the model is capable of doing. That gets past the pessimism a lot of us feel when a new model comes out, does well in the benchmarks, the company says it can do all these things, and then you go try it and it's not quite as good. If they show you an actual product and it works well, you have a lot more trust. So Anthropic made something called Claude for Chrome, and they also made Claude for Excel; I think they previously ran some pilots on these, and now they're going to be available more broadly. The Chrome extension is going to be available to all of their Max users, and the Excel-focused product is going to be available to Max, Team, and Enterprise users. Now, Max is $200 a month, so you do have to pay more for some of these subscriptions to get access to these impressive tools. This is very similar to how OpenAI handled its $200-a-month tier: that's what you first needed in order to get Atlas, and Sora, and a bunch of their other cutting-edge stuff, and as those became more popular, as they built out the GPUs and figured out the demand, they rolled them out into the $20-a-month tier.
So I'm assuming we'll see these products reach more users in the future, but that's how it sits today. Opus 4.5 also has memory improvements for long-context operations. This basically required a lot of changes in how the model manages its memory. There are a bunch of startups trying to tackle memory across AI models, a bunch of interesting new concepts have come out, and Claude has started to integrate some of them. Here's what Dianne Penn, Anthropic's head of product management for research, said about it specifically: "There are improvements we made on general long-context quality in training with Opus 4.5, but context windows are not going to be sufficient by themselves. Knowing the right details to remember is really important in complement to just having a longer context window." I think this is so critical. It goes beyond just having a massive context window where you can paste in a whole book and ask questions about it. Knowing what is important to remember, how to store the memory, how to access it, and when it's relevant to a user's query is going to be a big moat for these AI companies, and it's going to make their products feel a lot higher quality. All of that work on Opus enabled the long-requested endless chat feature for paid Claude users, which allows chats to proceed without interruption when the model hits its context window. Instead of stopping, the model compresses its context, and it doesn't even tell the user; it just lets you keep going forever and ever.
And it's not going to forget all of the previous stuff, theoretically. We've all seen the problem where you're 100 messages into a chat and it starts forgetting some of the earlier things you said. That's because the context window ran out, so the model just drops the early stuff. Claude is essentially introducing a whole new framework for this. I'll give you a crude two-second summary: when part of the conversation is about to fall out of the context window, they take that chunk, run it through Claude, and say, here's a whole bunch of stuff, summarize it into something like five very concise bullet points of the important key takeaways. It condenses it. They say "compress," and when we think about compressing or zipping a file we usually mean something different, but with AI models this summarization is how you compress the data. So you can get to the end of a long conversation and it's just been compressing all the previous stuff as it goes. Does it lose quality? Yeah, just like compressing an image can lose little nuances and bits of detail, but it should preserve the overall idea of everything you're talking about. And especially because we humans, when we're talking to these AI models, tend to add in a lot of extra information. My wife always complains whenever she hears me talking with ChatGPT because I probably give it way too much detail, way too much context. I won't just say, I need a Mexican food recipe.
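To make the compression idea concrete, here's a minimal sketch of what that kind of context compaction could look like. To be clear, this is not Anthropic's actual implementation; every name here is illustrative, the token counter is a crude word count, and the `summarize` function is a stand-in for a real "condense this into concise bullet points" model call.

```python
# Illustrative sketch of context compaction: when the conversation
# exceeds a token budget, replace the oldest messages with a short
# summary and keep the most recent messages verbatim.

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: ~1 token per word.
    return len(text.split())

def summarize(messages: list[str]) -> str:
    # Placeholder for a model call like "summarize this into five
    # concise bullet points". Here we just keep each message's first
    # sentence so the sketch runs without an API key.
    bullets = [f"- {m.split('.')[0]}" for m in messages]
    return "Summary of earlier conversation:\n" + "\n".join(bullets)

def compact(history: list[str], max_tokens: int) -> list[str]:
    """If the history exceeds the budget, compress the oldest chunk
    into a summary message and keep recent messages as-is."""
    if sum(count_tokens(m) for m in history) <= max_tokens:
        return history  # still fits, nothing to do
    # Keep the most recent messages that fit in half the budget...
    kept, used = [], 0
    for m in reversed(history):
        t = count_tokens(m)
        if used + t > max_tokens // 2:
            break
        kept.append(m)
        used += t
    kept.reverse()
    # ...and compress everything older into one summary message.
    older = history[: len(history) - len(kept)]
    return [summarize(older)] + kept
```

The design choice worth noticing is the asymmetry: recent turns stay verbatim because they're most likely to matter to the next reply, while old turns are collapsed lossily, which is exactly the "compressing an image" trade-off described above.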
I'll say, I was eating at this amazing Mexican restaurant yesterday and they had this sizzling steak, it was so good, I don't even know what it's called, but can you help me come up with a recipe for it? And my wife's like, why did you tell it you were eating Mexican food yesterday? Anyway, I just kind of talk to it like a person, figuring it'll pull out the details it needs. But it is very long-winded, and not all of that information is relevant to the model. Admittedly, it just makes it easier for me to think out loud, which is why I do it, and I think a lot of people do the same in one way or another. So the AI model then has to determine: what did the user say that was important and needs to be condensed into the summary? Because not everything is. Technically you could miss little nuances in that compression if you don't capture every detail, but at the end of the day this is a huge step forward in making sure the model doesn't forget what you're talking about. So I think this is quite a good tool. In any case, thank you so much for tuning in to the podcast today. If you enjoyed the episode, make sure to leave a rating and review wherever you get your episodes, and check out AI Box AI to get access to all of the top models in one place for $20 a month. Have a great rest of your day.
