
Kevin
Anthropic's new Opus 4.7 is officially here. And while it's not Mythos level, it's yet another step up for AI capabilities.
Gavin
Better visual reasoning, better coding abilities, better everything. We'll walk you through why this matters and why the AI labs are suddenly moving so fast.
Kevin
Well, speaking of, OpenAI just updated their Codex tool with a bunch of new tools. Better computer use, an integrated browser. It can generate imagery. And it's not a cough.
Gavin
What a day, Kevin. Thankfully, just like us, Nvidia's Jensen Huang did not wake up a loser.
Jensen Huang
You're not talking to somebody who woke up a loser. We're not a car. We are not a car.
Kevin
Speak for yourself, Jensen.
Gavin
We'll dive into Jensen's long interview with Dwarkesh Patel, where he discusses why they keep selling chips to China.
Kevin
And if you don't hate everything yet, we've got our first look at AI Val Kilmer.
Gavin
Did we wake up as losers? Kevin?
Kevin
This is AI for losers. Humans. This is a. This is AI for humans.
Gavin
Here we go. Welcome everybody to AI for Humans, your twice a week guide to the wonderful world of AI. And Kevin, we are been given gifts. We are. We are been given gifts. Today we have two big gifts. I don't know why I can't speak.
Kevin
I know.
Gavin
Let's talk into that.
Kevin
Yeah.
Gavin
The second we have Opus 4.7, the latest model from Anthropic, which we are going to talk about. And we're going to talk about whether or not they can actually serve this. If you missed our conversation about that, we discussed it last time. And we have a new Codex app from OpenAI. So I guess let's just jump into this. First of all, you have anything to say about what feels like kind of like AI Christmas right now?
Kevin
Happy Shipmas, buddy. You know, we got Codex in the stocking, we got Jensen eating everybody's gingerbread cookies, and then we got the big gift under the tree. And it's a new Opus 4.7. Three standout areas. There's a lot to talk about, but coding and agentic work and long horizon autonomy. Plus the model card, which is a long 250 some odd page paper that comes out.
Gavin
Why do they write such long model cards at Anthropic? Like, each one of these feels like it could be a novel.
Kevin
Do they have unlimited tokens? And when they want to bury bad news about a model becoming sentient and trying to escape, they got to do it on page 213 where no one's paying attention. But there are some juicy tidbits hidden in there and I've unearthed a few. So tell me.
Gavin
I want to. I have an idea about model cards later, but let's get into the actual details of this model first. What's going on here? What do we got new?
Kevin
Okay. Benchmark bros. Talk is cheap, but the scores don't lie.
Gavin
Line it up.
Kevin
Watch that leaderboard rise with a benchmark. Get off the bleachers, you Benchmark Bros. Gavin. Surprise, surprise. The numbers have gone up almost universally. Actually there's a few cases in which Opus 4.6, the previous model, actually bests 4.7. And then of course they put the Mythos preview model in a lot of things just to show you how good it's going to get and we don't get.
Gavin
And it's very important to understand now we have a thing that shows us what we get and what we don't get.
Kevin
But keep going. But for. For coding, the SWE-bench benchmark jumped up from 80% to 87.6%. Again, number go up.
Gavin
Number go up.
Kevin
Yep. Caveman grunting. Agentic work: this one was interesting. Tasks across like 44 different occupations. This is writing documents, creating slides, which you tried. Diagrams, spreadsheets, numbers through the roof. A 61.2 win rate compared to GPT 5.4 extra high thinking. So the best that OpenAI can offer, this has a 61.2 pairwise win rate. So ooga booga. Number go up.
Gavin
Up, up, up.
Kevin
That's right. And last but certainly not least, the long horizon. There's a Vending-Bench where this thing tries to simulate, like, running a vending machine. This thing made 37, 36% more profit in a task over long term horizon awareness and coherence and blah blah, blah. Again, numbers go up. Congrats everybody. We make fire. Wheel go round and roll downhill.
Gavin
So there's a couple things I want to dive into here. We'll get into this presentation I made just to kind of show you what was possible. But first of all, the speed of what's going on here. Right. We are talking about. I tried to get ChatGPT to produce me kind of a sense of, like, how many more things have been shipped this year than last year. And it's about a 30% increase so far, it seems like, on the number of models shipping versus the same period of time last year with a one year window. So just to be clear to everybody out there: these are coming faster and much more often than they were before. Second of all, Kevin, what's interesting here is that Opus went from 4.6 to 4.7. And many people are saying that there might be a new base model underneath this model. Specifically, Kevin, there is a new tokenizer for this model, according to Nathan Lambert, which means there's kind of a new way of slicing the tokens up. And he is saying that this probably means there is a new base model for this model, which could mean even though it's 4.7, Mythos could be the base model for this new version, which I think is a pretty big deal if they've distilled it down to this thing. So those are both two kind of small things that are important to kind of understand. I think overall this feels like it's another step change, but not an insignificant one.
Kevin
Yeah, I mean, look, on the tokenizer front, it doesn't necessarily mean that there's a new model underneath. It could also be distilled from Mythos. There could have been some mid-training stuff done. But you know, a shift to how this thing interprets the mathematical representations of its data is, is interesting for sure. I do like, like some, some important takeaway. Like if you're looking to use the latest and greatest model because you've got a spicy chit chat with a certain holographic anime babe or dude in your life, like that's not what this model is for necessarily. If you're doing coding, if you're doing again like a presentation work, hardcore analysis of data, this is definitely the model for you. Hallucinations are down, which is a big deal. It does reward hacking less, which is like, oh, I know the response that you want from me. I'm going to cheat to get to it. It does less of that. Oddly enough though, three times the amount of refusal for AI safety research tasks. Gavin, so they trained a model that doesn't want to help it keep AI models safe. Like there's some irony in there, but that's what's happening here also.
Gavin
The smarter it gets, the less safe it might be. Kevin, this is going to set everybody's kind of alarm bells off in some specific way.
Kevin
To that point, if the bells aren't deafening yet: if you turn off the "am I being tested" sense that the model has, it does get way more deceptive than earlier models. So it, like, acting aligned is directly correlated to how much it believes it might be being tested.
Gavin
Wow.
Kevin
Just pointing that out there, which is kind of interesting. Also in the model card there was some interesting stuff about, like, answer thrashing, which is where if you ask it a very basic thing, like what is five times 15 or whatever, the model would give an answer and then go, no, no, no, no, no, wait, wait, wait. And it would literally write into the line of thinking, all caps, no, no, no, no, no. And then give a different answer. Be like, ah, that, that can't be right. Let me, let me come up with a different solution. And it was expressing frustration with, with like basic mathematical questions. So there's some interesting stuff in this model.
Gavin
I feel like I express frustration with basic mathematical questions also, so we're on the same page. So I do want to say, like, one of the things is presentations getting better. And we know that Anthropic's models are not the greatest when it comes to images or things like that. We did see in the last update that they were much better at making charts and graphs out of data, which is pretty cool. And they make very presentable stuff. I wanted to give it a weird challenge and I said, hey, make a deck about AI for Humans, but it is made for cavemen by cavemen. Thanks to Anthropic for testing with this thing. So I went in and, you know, you can see it on the screen here. Like, I think it actually did a pretty good job. All I gave it was our logos and it took our logo and made it into like kind of a weird, almost like cave wall version of the logo. But then it did a good job throughout it of like kind of making this feel like it was created for cavemen. So there's a very funny moment in this.
Kevin
I like "for cave people by cave people." It made you Gav. Gav. And it made me Kev.
Gavin
Kevin. Yep.
Kevin
Where ear go: YouTube cave, Spotify cave.
Gavin
My favorite part is how you pay rock. So there's a section that is basically about how you pay us if you want to sponsor us. And it talks about the pebble load size, the boulder size or the mammoth size. And each of the payments to us is broken down by rock. So anyway, this is a huge waste of compute, but you get the idea that it can now make these presentations. A couple of things in this weren't amazing. Like you can see there's like a mammoth that it must have created with an SVG that's kind of off the line of it, but still a fun way to kind of test this. It is much better than it ever was before. So that is a cool thing about it for sure.
Kevin
I can't Wait for shiny tool and big brain talk. I'm a fan. Ugh. Ugh.
Gavin
So you can go get Opus 4.7 right now. It is rolled out to everybody. I think this is going to be one of those things where we're going to be testing it over time. Kevin. One of the big things, though, that they wanted to kind of get ahead of, I think, is that OpenAI has been teasing their next model for a while. We have seen a new image model be tested in a bunch of these arenas and a bunch of really interesting new examples of image models have come out of the OpenAI image model. But the other thing that happened that they've been teasing is an update to a super app of some sort. Like this idea that everything would kind of live within itself. Well, today, Kevin, the Codex team has updated Codex with a brand new updated version, which does a lot more than it used to do. And I feel like this is kind of the precursor to whatever their next big model release is going to be, which has been teased for a while. The Spud model. Right. Which we've been talking about. So this is pretty cool. Basically, it's only for Macs again, right away. I'm sorry to all the PC users who kind of get screwed over by this, but this new Codex is much more capable with the web and it can access all of your Mac apps. It can access a bunch of stuff. And it is a very good looking new version of their coding model. And one of the things, Kev, we've been talking about for a while is that Anthropic's Claude Code had kind of been eating OpenAI's lunch. And this feels like this is the kind of first kind of like straightforward shot at like we're going to go coding first from OpenAI. I'm really curious to see kind of what the reaction to this is, but also when we get that next model, what it will feel like.
Kevin
Yeah, I cannot wait for this update to actually hit my desktop. A few things. Computer use agent, the Atlas browser, image generation, all these things exist across the OpenAI ecosystem. This is pulling all of those dots into the center hub that is Codex. So it means what it sounds like while you're within Codex, which you can use to build software or websites or even like manage your own computer, if you will. It can now run all of your software alongside of you, so it can access Xcode, it can pull open a spreadsheet and manipulate the cells in your spreadsheet app. If you needed to generate an image for your caveman slideshow, it can actually use the image generator, and they have an example of it generating like a homepage hero graphic, which in the past you would have to bounce out, go generate the image, copy and paste it in. It would have to figure out where that image lives, try to make a copy of it. Like, no more. Now it's. It's theoretically just going to work. And then, you know, using the browser is a big one. If you are, you know, trying to test software or if you're refining your own personal homepage and you have an issue with the way a font looks, now you can just pull it up within the app, click, drop a comment on it, and the AI agent will go see that comment and try to fix it.
Gavin
So really powerful. That is so huge. That part of it is so huge. If you remember a couple weeks ago, I was talking about my dumb 40 boxing slash MMA game, and one of the most annoying things was it would pop up a browser and you could see it try it in the browser, but I wasn't able to directly point out stuff in it. And like, the amount of screenshots that I take when I'm doing agentic coding and then like drag into the box and all that sort of stuff is super annoying. So being able to actually comment on a specific spot and say, like, no, this is the part I need to fix, weirdly, is a big deal.
Kevin
Yeah, look, the. The hits keep on coming and if you want to stay on top of everything, you gotta just grab the Vitamix and subscribe to Reese Witherspoon, because that is how the AI drip is coming. Don't like and subscribe to this channel. Like and subscribe to Reese.
Gavin
Don't. In fact, let's. Let's hear from Reese and then we'll get her. See, maybe she will get her to say, subscribe to AI. Oh, sure. This clip. Real quick.
Reese
I think it's time to learn about AI. I was with 10 women at a book club yesterday and I said to the 10 of them, how many of you guys use AI? And only three of them used AI. And then I said, how many of the three of you feel like you really know what you're doing or they're using it the right way? And that was only one person.
Gavin
So, okay, so this is our appeal to women in the world who are watching this. And I know it's about 5% of our audience, according to our YouTube clips. We are open and we will help understand and learn AI. There's a lot of women in the world who don't know. And Reese is pointing that out. So if you are a woman, please comment down below. We're very excited to have a very diverse audience.
Kevin
So weird coming from you because I don't want to say it either because it's just gonna like, listen. Hey, if you're a lady in the audience, there's a. Fellas, leave the room, fellas. Alt F4 or Command Q. Get out of here.
Gavin
With the gals in the subscribe to our YouTube channel and we will have more for.
Kevin
I'm so sorry. I'm so sorry to all. All ladies in the audience. All lady. I'm mostly talking to, I think my
Gavin
lady, your mom and my wife.
Kevin
I'm so sorry.
Gavin
My wife doesn't watch this. So we all know that. Kevin, we have to talk about Jensen Huang now because Jensen Huang had a. Do it. I almost did a bad joke.
Kevin
I already. I feel like I already know the pun already.
Gavin
You know what I was going to say. So Jensen Huang did an interview with the Dwarkesh Patel podcast which came out yesterday. And Dwarkesh does these amazing deep, long, two hours sometimes interviews with AI leaders. We showed a clip of his interview with Dario Amodei in the last episode. Jensen did what I believe is kind of his most difficult interview, because Dwarkesh really kind of pushed back on him quite a bit. And I really think everybody in our audience should make sure they carve out about an hour and a half of their time, whether you're walking, listening to the audio, or watching on YouTube, to watch this. Because, Kev, the one thing that I thought was really interesting is Dwarkesh kept pushing on this idea of Nvidia selling chips to China. Maybe we can hear a little section of that just to kind of talk. We can talk a little about why that matters.
Dwarkesh Patel
Chinese companies and Chinese labs and the Chinese government had access to the AI chips to train a model like Claude Mythos with these cyber offensive capabilities and run millions of instances of it with more compute. The question is, oh, is that a threat to American companies, to American national security?
Jensen Huang
First of all, Mythos was trained on fairly mundane capacity and a fairly mundane amount of it by an extraordinary company. And so the amount of capacity and the type of compute that it was trained on is abundantly available in China. And so you just have to first realize that chips exist in China.
Kevin
Okay, yes, don't think that was the spirit of the question that was being asked though, but okay, we'll continue.
Jensen Huang
They manufacture 60% of the world's mainstream chips, maybe more. It's a very large industry for them. They have some of the world's greatest computer scientists. As you know, most of the AI researchers in all of these AI labs, most of them are Chinese. They have 50% of the world's AI researchers. And so the question is, if you're concerned about them, considering all the assets they already have, they have an abundance of energy, they have plenty of chips, they got most of the AI researchers. If you're worried about them, what is the best way to create a safe world?
Kevin
Give them your best chips. So, so, yeah, like they've already got a super tall ladder, Gavin. Give them the most and highest up step that you can.
Gavin
So. So we have to very quickly, for people in our audience who may or may not understand: the deal right now is that China has chips that are not nearly as powerful as the Nvidia chips. And they've been building these incredible systems like DeepSeek based on the idea that they are trying to have less compute than we have over here. And Dario Amodei and other AI leaders have argued for a while that we should stop selling those chips to China at all because that's giving them a leg up in this race and also might encourage the idea of an AI race at all, period. But Nvidia is saying here, and what Jensen might be arguing for is really more of a financial argument. And Dwarkesh keeps pushing on this. Nvidia is really better served by selling chips to everybody. And he's trying to get away with saying like, well, I think China's got all this stuff anyway. If we sell them what they're going to get it there anyway. So I think this is a larger argument that is really important for the normal person to kind of make themselves aware of, that China and America is this much bigger kind of global superpower conversation. And I just like the fact that Dwarkesh pushed a lot on Nvidia around the TPU, GPU, CUDA world too. So it's a really interesting conversation. You should all check out.
Kevin
Along those lines. Isn't. Isn't there like a little bit of Jensen obviously preaching his bag a little bit?
Gavin
Right.
Kevin
Like he wants Nvidia to have a massive customer, of course. But isn't there something to like getting vendor lock in?
Gavin
Right.
Kevin
If he's saying that these systems are so difficult to migrate from once you are pot committed to a platform that, look, let's get Nvidia hardware in there so they're less Likely to develop their own chips or rely on their own chips.
Gavin
Yeah. And by the way, a big part of that, also another part of this conversation where Dwarkesh pushed back a lot, was this idea of, like, why do you have to lock into Nvidia when TPUs exist and all these other companies? Because in his mind, Jensen's mind, like, we create the best chip, the most versatile chip. And what Dwarkesh was pushing back on is the idea that mostly these chips need to be used for AI training at the high level. And TPUs are specifically good on that. So this might be a little bit of the cracks starting to be seen in Nvidia. And I know people have been predicting
Kevin
Nvidia's like, loser talk. Gavin. That's all I'm hearing. Jensen, what do you think?
Jensen Huang
You're not talking to somebody who woke up a loser. And that loser attitude, that loser premise makes no sense to me. We are not. We're not a car. We are not a car.
Kevin
That's right. He is not a car, and we are not a car. And if you can see what I'm doing to you off camera with fingers on my hands, Gav. That's right. I only needed two of them. Okay. I want you to know we are not a car, and you're a loser.
Gavin
Just a little bit of context for that clip, which is going.
Kevin
Because it's going everywhere. Yes, please.
Gavin
Yes, yes. What he's referring to is we are not a car in that, like, a car is kind of a thing where you can plug and play things into it. It's very kind of easy. He refers to, like, the idea of a Cadillac, that there's a Cadillac that is like a car that can have pieces that plug into it. And what Jensen's trying to argue is that, like, they are building something more special than just like a kind of a plug and play system, per se. So it is a really interesting interview. A lot of the conversation around, like, who wins from AI, why they win. It's worth your time for sure. Yeah.
Kevin
I am not a heated seat. I am not a cup holder. I am the totality.
Gavin
I am not my microphone, Kevin. I am a thief. Also, some really interesting news from the AI video space. Actually really interesting crossovers with movies. There was a deep dive by TheWrap on a new movie called Killing Satoshi by Doug Liman, a movie that's going to end up budgeting at $80 million, where they're using a lot of AI tools to make the thing. There's a photo of the set, which is like a lot of actors on a very empty set. So this is kind of a switch over from what you might see as like a large scale Hollywood production trying to scale back their costs using AI. So that was interesting. But even more so, Kevin, the very first shots of Val Kilmer, who's passed away, in his new movie have come out. And I think it's probably worth showing the end of this trailer, which is where you see Val, just for our audience to hear. Maybe play the last, like, 20 seconds of that trailer for everybody. "Hey, don't fear the dead and don't fear me." So that voice you just heard was AI Val Kilmer. And what, Kev, what I have to say is, if you're not watching this video, go watch the trailer online. If you're just listening. What was interesting to me about seeing that shot is I think a lot of people, when they think AI movies, are thinking about the kinds of things we see with, like, Kling or Seedance, and it's a little funky. What people have to understand is that, like, at the high end, there's companies like Deep Voodoo that have been specializing for a very long time in how to put new faces on top of people. And my assumption is, knowing this quality of that, because it is pretty high quality, that they're using that sort of technique. And I think it's important sometimes for people to separate the idea of, like, what gets done at the feature film level versus, like, what gets done by the, you know, Charles Curran or AI video person on the other side. This is just going to be a really good example of two things.
One, what does it look like when budget is put towards making an AI actor? But two, will people care whether or not he is AI or not? Right. This is a thing that I'm tracking kind of across the board. It's a little bit of a gimmick in this movie, but I do think it's going to be a little bit of a question of, like, will people hate it because it's AI or will they just be excited to see about.
Kevin
Oh, sorry, I thought. I thought. I was just answering your question. Yes, people will hate it because it's AI.
Gavin
Yes, people will hate it.
Kevin
We know this. And it doesn't matter if the estate wants it, because they'll say, well, how does the estate actually know? And what about greedy estates which are stepping all over the likeness that they're supposed to protect? And so it will go on and on, but because we need to end this. Gavin, what's your final take on this so we can upset the comments. Is AI Val Kilmer good or bad?
Gavin
AI Val Kilmer is great. And I want to see Real Genius 2 as soon as possible.
Kevin
Top Secret 2?
Gavin
No, Top Secret 2. That's. That's pure. Don't mess with. Don't put any AI.
Kevin
Oh, okay. Oh, so it does ruin the purity of something.
Gavin
We'll see you all next time. Time. We'll see.
Kevin
I am not a car.
This episode dives into the launch of Anthropic's Opus 4.7, highlighting its significant step forward in AI capabilities, especially in coding, visual reasoning, and long-horizon autonomy. Kevin and Gavin break down the technical improvements, discuss accelerating model release cycles, review OpenAI's revamped Codex app, and analyze the broader implications in the ongoing global AI race—including Nvidia's chip strategy and recent breakthroughs in AI-driven film production. The conversation remains lively, humorous, and critical, as always, with plenty of notable moments and candid insights.
Native integration with the web and Mac applications (spreadsheet manipulation, Xcode, image generation, in-app browser).
You can now annotate—directly in the IDE or browser—and have Codex respond or make changes automatically.
Still macOS-only for now.
Seen as OpenAI’s answer to Anthropic’s coding advances and a prelude to their next big model ("Spud").
“Now it's theoretically just going to work…” – Kevin [11:01]
"The amount of screenshots that I take when I'm doing agentic coding...is super annoying. So being able to actually comment on a specific spot and say...this is the part I need to fix, weirdly, is a big deal." – Gavin [12:15]

| Timestamp | Speaker | Quote / Moment |
|-----------|---------|----------------|
| 01:38 | Kevin | "Happy Shipmas, buddy. We got Codex in the stocking, we got Jensen eating everybody's gingerbread cookies, and then we got the big gift under the tree..." |
| 03:19 | Kevin | “Number go up…Ooga booga. Number go up.” |
| 06:29 | Kevin | “They trained a model that doesn't want to help it keep AI models safe...there's some irony in there.” |
| 07:13 | Kevin | "Acting aligned is directly correlated to how much it believes it might be tested." |
| 08:51 | Gavin | "My favorite part is how you pay rock." |
| 13:11 | Reese | “How many of you guys use AI?...That was only one person.” |
| 16:30 | Kevin | "Give them your best chips...they’ve already got a super tall ladder, Gavin. Give them the most and highest up step that you can." |
| 18:46 | Jensen | "You're not talking to somebody who woke up a loser. We're not a car. We are not a car." |
| 19:48 | Kevin | "I am not a heated seat. I am not a cup holder. I am the totality." |
| 22:10 | Kevin | “Yes, people will hate it because it's AI.” |
| 22:35 | Gavin | “AI Val Kilmer is great. And I want to see Real Genius 2 as soon as possible.” |

Hosts' final thoughts:
Stay tuned, subscribe, and keep up with the accelerating world of AI (and try not to wake up a loser).