Sponsor/Announcer
We'd like to thank ODSC AI for being a sponsor. ODSC is one of the longest-running and largest communities focused on applied data science and AI. It started over a decade ago with a simple idea: bring practitioners together to learn from people actually building and deploying models in the real world, not just talking theory. On April 28th through the 30th, you can experience it yourself at ODSC East 2026. Taking place in Boston and virtually, there will be thousands of hybrid attendees, ranging from data scientists and ML engineers to AI researchers and technical leaders. You can attend over 300 sessions covering LLMs, gen AI, computer vision, NLP, data engineering and more. You can also go to hands-on training, with workshops and bootcamps taught by experts from companies like OpenAI, Hugging Face, Nvidia and other top companies and universities. And of course there'll be a massive expo and networking opportunities, great for startups, hiring managers and AI tool builders. It's one of the best ways for AI practitioners and teams to stay ahead of the field, learn from the best and connect with a community. Go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026. That's odsc.ai/east, and use code LWAI to get an extra 15% off the number one AI builders and training conference.
We'd also like to thank Box for sponsoring Last Week in AI. Box is the leading intelligent content management platform, enabling organizations to fuel collaboration, manage the entire content lifecycle, secure critical content and transform business workflows with enterprise AI. To unlock the power of AI, you need to get your content to your LLMs and agents. Your business isn't the sum of Internet knowledge. Your business lives in your content, so you don't just want to bolt AI onto your existing processes. Becoming an AI-first company isn't just about automating what you already do, it's about reimagining what's possible. With Box AI you can truly leverage the latest breakthroughs in AI to automate document processing and workflows, extract insights from content, build custom AI agents to work on assignments and more.
And most importantly, Box AI works with all the major leading AI model providers, so OpenAI, Anthropic, Google, xAI and others, so you can be sure you can use the latest AI models with your content. Box AI will give you the content layer that gives AI the context it needs while giving your teams the flexibility they need to test and leverage various models for different use cases. So go to box.com/ai to learn more.
Andrey Kurenkov
Hello and welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode we will summarize and discuss some of last week's most interesting AI news. You can also check out our Last Week in AI newsletter at lastweekin.ai for articles we will not be covering in this episode. I am one of your regular hosts, Andrey Kurenkov. My background is having studied AI in a PhD program, and I'm now working at the startup Astrocade.
Jeremy Harris
And I'm your other co-host, Jeremy Harris from Gladstone AI. We kind of do AI national security things. So we missed a week, and then we got hit with a week with tons of papers and tons of announcements. This is just going to be a giant episode. I guess that's just where this is headed.
Andrey Kurenkov
Yes. And we are going to try to be efficient and not make this two hours, but we'll see how it goes. Usually the paper section is where things start running a bit longer. And you mentioned Gladstone. I do, as always, want to apologize for missing a week of this Last Week in AI podcast, but it's all about day jobs. I just couldn't make it. Fun fact, I guess: I don't think I've shared that our startup just raised our Series B recently, so it's a busy time. There's a lot to do. Is there an announcement? No, we aren't publicizing it or anything, but it's not secret, so I'm just going to say that much.
Jeremy Harris
Very cool. Congratulations. That's great.
Andrey Kurenkov
Yes. So it's fun. And if you happen to be in the Bay Area and looking for a job, we are hiring engineers: backend, front end, growth, marketing. So feel free to email me if you want.
Jeremy Harris
Amazing.
Andrey Kurenkov
And now, before we get to the news, real quick: we do have a lot of listener comments that I want to acknowledge. I have a couple of new reviews on Apple Podcasts. I remember we called out a review that said we are mostly back in 2026. Well, now there was an update that says "totally back at 100%", which is good to hear. We're still missing weeks, so I don't know if it's 100%, but it's the same as in 2025. And there are a couple more that I appreciate: "Technically and philosophically excellent." I don't know that anyone has ever said we are philosophically excellent, but I appreciate that. And on that note, I guess in response to us calling it out briefly, a lot of people did comment on the handling of politics and the state of politics in the US in particular, and were appreciative of being direct about it, which is what I tend to do. Jeremy, given your job in national security, maybe cannot be as opinionated or direct, but we will, as always, try to keep our politics related to AI, which this week, as you may know, means we'll get into that a little bit.
Jeremy Harris
Yeah. If anybody wants to see me squirm, this is going to be the week.
Andrey Kurenkov
Yeah, it's not like the military or anything is related to what you work on. I wouldn't imagine it. So with that being said, let's get to the news, starting with tools and apps, and we're going to catch up on some stories from last week we didn't get to cover. First up, we've got Sonnet 4.6. Anthropic released Opus 4.6 pretty recently, maybe just a month ago, and now we've got Sonnet 4.6. It's a very similar announcement in that it's just a 0.1 bump from Sonnet 4.5, the previous model, but it's an impressive improvement all around, with another increase in the context size, up to 1 million tokens, which is a very big deal if you're using these in production. And similar to the Opus update, it comes merely a couple of months after the previous release; I think the last one was November, if I remember correctly. So it's a big improvement in a relatively short time span. In 2025, the time between these models felt longer to me and the jumps didn't feel as steep. So I think we are seeing a lot of indications that post-training, kind of scaled-up reinforcement learning where you don't need new data, is maybe allowing continuous training and continuous improvement. At least right now, that is my feeling.
Jeremy Harris
Absolutely. And also distillation, right? As we're talking about the performance of Opus: the better Anthropic gets, the better labs get at distillation, the more that translates into models like Sonnet getting better, faster. So the distance between Opus and Sonnet might close as you get better at distillation. It's not like they're necessarily using only Opus to the exclusion of other models, because there may be even bigger models in house. Opus itself will be a distillate of some even bigger model, because typically you don't actually serve the huge mega model that you train. But anyway, distillation is important here, and we'll be talking about distillation in other contexts later in this episode. On ARC-AGI-2, it gets 60.4%. Wow, that happened fast. This is not technically SOTA overall, but certainly in its weight class, so to speak, if we can speak of weight class, because we don't know the parameter counts. This does seem to be pretty close to field leading, if not field leading. It's still behind Opus 4.6, obviously, but also Gemini 3 Deep Think and one kind of refined version of GPT-5.2. So this is an impressive model. I would expect this is going to be the model that a lot of people use day to day in coding. Having played with it over the last couple of days, this is a genuinely impressive model. And Anthropic is on fire.
Andrey Kurenkov
Anthropic is on fire. And speaking of ARC-AGI-2, we've been mentioning it more and more, and we will mention it with the next story, so it's perhaps worth giving a quick primer on the benchmark itself. The reason it's called ARC-AGI... well, I don't know the ARC part, but the AGI part is because it's meant to evaluate whether AI is human level. Human level here means it's basically an IQ test: you're given a set of patterns to complete, or some sort of problem that requires you to generalize on the spot from just a few data points. So you can't call up a fact, for instance, or find a solution on the web, something like that. The models are meant to match humans, and humans are intuitively quite good at a lot of these problems. So with ARC-AGI, there's kind of a threshold, a bar, and there are a couple of different ways to win. One of them is, given limited compute and limited data, to get a very strong score. And this was quite challenging for LLMs for a long time, I think partially because a lot of the problems are quite visual. If you try to explain them in text, they are very high-dimensional problems. There are grids of pixels, right? And that just doesn't lend itself well to LLMs. So LLMs are getting better at multimodal reasoning and performance, which we don't call out as much, but which is actually very significant, given that we now want more and more of the work on our computers to be agentic, et cetera. So as a benchmark, it's quite interesting in that it's not task oriented, on programming or QA or search or anything like that. It's kind of raw IQ.
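To give a flavor of what "generalize from a few grid examples" means, here is a toy sketch in Python. It is not the real benchmark and not how any model solves it: the three candidate transformations, the tiny grids, and the function names are all made up for illustration, and real ARC-AGI tasks are far more varied than a fixed list of candidate rules could cover. The only idea it borrows is that a task gives a handful of input/output grid pairs and asks for the output of a new input.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]  # each cell is a small integer standing for a color


def flip_horizontal(g: Grid) -> Grid:
    """Mirror each row left-to-right."""
    return [row[::-1] for row in g]


def flip_vertical(g: Grid) -> Grid:
    """Mirror the grid top-to-bottom."""
    return [row[:] for row in g[::-1]]


def transpose(g: Grid) -> Grid:
    """Swap rows and columns."""
    return [list(col) for col in zip(*g)]


# Hypothetical, hand-picked hypothesis space for this toy example only.
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}


def infer_rule(train_pairs: List[Tuple[Grid, Grid]]) -> Optional[str]:
    """Return the first candidate rule consistent with every training pair."""
    for name, fn in CANDIDATES.items():
        if all(fn(x) == y for x, y in train_pairs):
            return name
    return None


# Two demonstration pairs, then one unseen test input of a different shape.
train = [
    ([[1, 0], [0, 0]], [[0, 1], [0, 0]]),
    ([[2, 3], [0, 0]], [[3, 2], [0, 0]]),
]
test_input = [[0, 5, 0], [7, 0, 0]]

if __name__ == "__main__":
    rule = infer_rule(train)
    print("inferred rule:", rule)
    if rule is not None:
        print("prediction:", CANDIDATES[rule](test_input))
```

The test grid is deliberately a different size from the demonstrations, which is the point of the exercise: the rule has to transfer, not the memorized examples.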
Jeremy Harris
Yeah, I guess one way to think of it, and this is a toy example to give you a flavor of what these evals look like: it's almost like you have a model that might have been exposed to Connect Four in the prompt, and then you're like, okay, now do Connect Five. Play a game of Connect Five. Or you do regular tic-tac-toe, and then, okay, now we're going to do tic-tac-toe on an infinitely large grid, or something like this. You're trying to explicitly test the out-of-distribution generalization capabilities of the model that go beyond whatever it's been exposed to in the prompt. There are all kinds of issues here around task leakage, as you can imagine. It's like with IQ tests, if you saw the test before, like last year's version or whatever, so you do get benchmark saturation from that. But famously ARC-AGI-1 is now kind of a solved problem. And on ARC-AGI-2, I think François Chollet, who's the researcher who invented the ARC-AGI benchmarks and who's done a whole bunch of things around deep learning frameworks in the past, said something like he expects to do up to ARC-AGI-4, maybe, before we get to ASI. I think that was it. So he's not thinking of ARC-AGI-2 as the thing where once you get it, you get ASI. But he sees it as a big waypoint, which is significant because he's historically an ASI long-timelines guy.
Andrey Kurenkov
That's right. And ARC-AGI-3 is coming; it's actually announced for March, I just looked it up. ARC stands for Abstraction and Reasoning Corpus. And there's actually a paper he wrote, "On the Measure of Intelligence", that explains it pretty well. The focus here is on skill-acquisition efficiency, scope, generalization, priors, et cetera. So it's one of the more useful benchmarks, partially because of its design and partially because from the very beginning, or at least very early on, if I remember, there was a held-out test set that was private, and to get a score you need to submit your model. So it's a bit trickier to see the answers. You're given some data as examples, but the whole point is that you generalize from not much data. And just one last piece of speculation I'll throw out there for Sonnet 4.6 and Anthropic in general: I have to wonder how much that Claude Code data is helping them out, because they're getting a lot of data to train on. It's easy to forget, but that's new data that no one else has. Probably, I'd imagine, it's already part of the training here.
Jeremy Harris
I'm trying to remember. I think Dario was on Dwarkesh's podcast at one point fairly recently, talking about why we haven't seen a kind of liftoff where whoever's number one in LLMs just keeps compounding and running away with it. Because you would think that would happen, right? These companies are dogfooding their own LLMs, so theoretically that should happen. And one of the reasons he gave for why it's not happening yet is the size of the uplift you get from these models. A year ago, he said, it was maybe a 5% uplift. Now it's maybe more like 20%, which seems to imply that we may be in for a phase transition fairly soon, because 20% is quite significant. So maybe his argument would actually be that what you just said is right on the nose, which would be quite interesting.
Andrey Kurenkov
And you know, I think part of the reason we don't feel like they're jumping as much is that more and more of the actual challenge now is long-horizon, agentic task solving. We're not getting to that superintelligence. And I am always skeptical of the notion of superintelligence where you immediately are some brainiac that knows the answer. The challenge, and I think what these models will keep getting better at, is actually working for four hours, eight hours, whatever, without kind of losing their mind. And moving on to the next impressive model release from last week, and this has been a crazy few weeks for model updates: Google has rolled out Gemini 3.1 Pro. I think they had previously kind of previewed it, but now we actually have a rollout. They announced the performance, and this one gets 77.1% on ARC-AGI-2, compared to Gemini 3 Pro's 31.1%. That's the big headline here, that AGI benchmark. It is surprising to a lot of people that we are getting to that level of performance, and I was also a bit taken aback by it. But yeah, by all accounts, similar to Sonnet 4.6 and Opus 4.6, it's a pretty impressive leap for just a 0.1 bump in the version.
Jeremy Harris
Yeah. Also all kinds of interactive capabilities. In one case it shows a 3D interactive starling murmuration, which, if you didn't know the collective noun for starlings, it is a murmuration, with a dynamic soundscape too, so matched to audio and all that. So a whole bunch of interesting capabilities. I think pretty consistently we've seen Google go multimodal as their differentiator, Claude obviously being better on the coding side, and then OpenAI going kind of more full consumer. It's interesting to see them keep leaning into that, which is consistent with the fact that internally, when you think about Google DeepMind, the idea of world-model simulations as ways to generate training data for agents has just been a bigger thing for them. And so I think you're seeing that reflected in that orientation as well. But yeah, big deal. And that multimodality, by the way, is helpful explicitly on the ARC-AGI-2 benchmark. A lot of those problems involve visual problem solving, looking at puzzles, so maybe that multimodality is an asset in that context specifically. In fact, I would expect it to be. You've got both the reasoning and the multimodality that allows you to interpret and understand what you're reasoning about.
Andrey Kurenkov
And one thing we haven't covered as much, but as a bit of a reminder, one of the strengths that Google has is on the pricing side. Relative to other models, especially Claude, Gemini is quite affordable. You pay $2 per million input tokens and $12 per million output tokens, at least on shorter prompts that are less than 200,000 tokens. Claude Opus 4.6 is $5 for that input and $25 for that output, so roughly twice as expensive. And Anthropic has sort of gotten away with pricing at a premium for a while. They have seen that enterprises are willing to pay for the best and are pretty price insensitive. But as the models start being more and more similar, I wonder if Claude might be in a bit of trouble if it continues to be priced this much higher than the other models.
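As a quick back-of-the-envelope check on those numbers, here is a minimal Python sketch of the per-request cost arithmetic. The prices are the ones quoted in the episode; the model keys are just labels chosen here, not API identifiers, and the 100,000-input / 10,000-output token workload is an assumed example rather than anything from the episode.

```python
# Per-million-token list prices (USD) as quoted in the episode.
# The dictionary keys are illustrative labels, not official model IDs.
PRICES = {
    "gemini-3.1-pro": {"input": 2.00, "output": 12.00},
    "claude-opus-4.6": {"input": 5.00, "output": 25.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted per-million rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


if __name__ == "__main__":
    # Assumed workload: a 100k-token prompt producing a 10k-token answer.
    for model in PRICES:
        cost = request_cost(model, input_tokens=100_000, output_tokens=10_000)
        print(f"{model}: ${cost:.2f}")
```

For that assumed workload the sketch gives $0.32 versus $0.75, so the gap is a bit more than 2x: input is 2.5x more expensive and output about 2.1x.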
Jeremy Harris
Yeah, I mean, for a lot of workflows, especially when you're looking at coding, I think people are willing to pay top dollar for whatever the best model is. But you're right. Presumably we'll hit a point where that tired comment that I keep making about image generation models starts to apply more and more to code, where for 80% of use cases it's not a big deal which model you choose. So we may be headed for commoditization along some axes.
Andrey Kurenkov
And we've got one more model release, not quite as big a deal, but I think also worth mentioning: Grok 4.20 is in public beta. In fact, I don't know that they released benchmark numbers. What people on Twitter seem to indicate is that it's not even clear if this is a new model or if they just tweaked the inference to have a model run multiple personas talking to each other and then synthesizing a single output, similar to Grok Heavy, where they ran multiple things in parallel and then synthesized a better overall result. According to Elon Musk, Grok 4.2 will be about an order of magnitude smarter and faster than Grok 4. I don't think that's true. I don't think we're going to see a 10x improvement from 4 to 4.2. Grok 4 is already quite capable, though not leading the pack anymore. So yeah, we're still getting releases from xAI. I'd be curious to see how they keep up, given that they're now being folded into SpaceX and a bunch of people left. I don't know if we covered that on here, but some of the co-founders and apparently a lot of the technical staff were transitioned out as xAI was folded into SpaceX.
Jeremy Harris
Yeah. So one of the interesting things about this release... I don't know if we're supposed to call it 420 or 4.20 or what, like what the thing is here, but...
Andrey Kurenkov
Let's just go with 4.2. We get the 420 joke. We get it. Very funny.
Jeremy Harris
I'm looking forward to Grok 6.9. So apparently the idea here is that 4.20 is being framed more as this high-level expert tool. Instead of just focusing on "this is an edgy model that'll tell you things uncensored", which has typically been the personality of Grok, or the main focus, it's more about real-world, concrete capabilities in medicine and engineering. At least that's a lot of what this announcement is highlighting. So the pitch that Elon has is that you can just take a picture of your medical data or upload the file and get a second opinion from Grok, which, liability, liability, liability and all that, I'm sure is being covered. But this is basically the pitch. So it's quite interesting, a different twist. If this becomes a persistent thing, then it's kind of like Grok trying to carve out a maybe more monetizable corner of the LLM landscape and market.
Andrey Kurenkov
And to be clear, this was just released in public beta. There was no big model rollout yet. This is being tested by some people on X, and they have been showing, and I don't know if this is easy to see or if you just see it in the thinking traces, that there are these four agents, Grok, Harper, Benjamin and Lucas, who debate internally, fact-check each other and help each other get things right. And that's why I say it's unclear if this is a new model, a new inference paradigm, or both. We don't know. Next, on to the more application side of things. Anthropic released a mobile version of Claude Code called Remote Control. So if you have the Claude mobile app, you can basically have an online session of Claude Code that you can remotely hit up. And people have compared this to OpenClaw, where it's an agent that lives out there. It just hangs out and waits for you to reach out and ask it to do something, and then it goes off and does it for you. You've seen this before with Codex from OpenAI. They decided very early on to do the online agent strategy, where you reach out and ping it, the agent goes off and does stuff for you, and hopefully it got it right when you come back later. So here it's just easier for Claude to be used that way. I'm still kind of a skeptic of that being useful right now, at least for more complex software engineering. But it's the way of the future, seemingly.
Jeremy Harris
Yeah. And a lot has been said about the security layer here and the security implications. Obviously with OpenClaw, your computer is just its giant playpen, right? It'll go bananas, it'll delete files, it'll send wire transfers, whatever you want. Whereas here, when you actually run a command using Claude Remote Control, it's set up so that your machine makes an outbound connection to Anthropic's API, and you're not actually opening any inbound ports, so your computer isn't exposed to the open web. You're just basically polling the API for instructions and maintaining that separation. You can have a remote window if you want to look at the process that's still running, but it creates that separation very intentionally. This does reflect Anthropic's philosophy on all this stuff, which has consistently been, hey, maybe we shouldn't fully trust these increasingly intelligent, conniving agents to just run roughshod on our computers. So anyway, there you go.
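The outbound-only pattern described here can be sketched in a few lines of Python. To be clear, this is not Anthropic's actual protocol or API: the `Relay` and `LocalWorker` classes and their methods are hypothetical, and the "remote service" is just an in-memory queue so the example runs without a network. In a real setup, `poll` and `report` would be HTTPS requests initiated by the local machine, which is what lets it avoid listening on any inbound port.

```python
from collections import deque
from typing import Deque, List, Optional


class Relay:
    """Hypothetical stand-in for a remote service that queues instructions."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self.results: List[str] = []

    def submit(self, instruction: str) -> None:
        # Called from the remote side (for example, a phone app).
        self._pending.append(instruction)

    def poll(self) -> Optional[str]:
        # Called by the local machine; returns the next instruction, if any.
        return self._pending.popleft() if self._pending else None

    def report(self, result: str) -> None:
        # Called by the local machine to send a result back out.
        self.results.append(result)


class LocalWorker:
    """Runs on the developer's machine and only ever initiates requests."""

    def __init__(self, relay: Relay) -> None:
        self.relay = relay

    def run_once(self) -> bool:
        """Fetch one instruction, handle it locally, and report the result."""
        instruction = self.relay.poll()
        if instruction is None:
            return False  # nothing to do; a real loop would sleep and retry
        # Placeholder for doing real work in the local environment.
        self.relay.report(f"done: {instruction}")
        return True


if __name__ == "__main__":
    relay = Relay()
    worker = LocalWorker(relay)
    relay.submit("run the test suite")
    worker.run_once()
    print(relay.results)
```

The design choice worth noticing is the direction of every call: the worker pulls work and pushes results, and nothing ever connects to the worker.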
Andrey Kurenkov
And one thing to touch on here, just to be clear: it's also a bit different from Codex, and a bit different from OpenClaw as well, in that one way to use it is you just start a Claude Code session in your local development environment, just as you normally do, and then it lets you talk to it via your phone. One of the very annoying things with these cloud agents is that you have to transfer all your files to a repository on the Internet, and they have to work in this little isolated thing that's not your computer. One of the ways this could be more useful is that you just work in the same environment where you interactively work with Claude, and you can pick things up or check for updates when you go out for a walk, which I actually could see being useful. And on to one last product update. Perplexity has announced Computer, an AI agent that assigns work to other agents. The key here is that it coordinates multiple agents to execute user-assigned tasks. The claim, or the intent, is that it can run for extended periods, from hours to apparently months, depending on what you want to do. So for things like creating a plan for a marketing campaign or building an app, this model will break the job down into subtasks and assign them to specialized AI agents. It's interesting to see Perplexity doing this. It seems like they might be trying to find new avenues to monetize, given they've been doing search since the beginning and then deep research, and now this is outside of that. This is getting more into vibe coding and agentic vibe coding, which is not so much their wheelhouse.
Jeremy Harris
Yeah, it's also exactly what you have to do if you aren't in the business, or at the scale, where you can train your own models, which Perplexity is not. They're sort of in this gray area where they have to aggregate and find ways to add value using other models. So this is that kind of play, right? You have an agent that assigns work to other AI agents. Those agents could be Gemini-powered or Claude-powered or whatever. So it's this integration play, this platform play, that Perplexity is strategically in a natural position for. Some might say it's their only option. They have to find a way to position themselves as a way for you to easily flip back and forth, kind of OpenRouter-style almost. I think this is a natural move, though, as you say, it is well outside the remit of the deep research and search function. It's a different use case, a different, I guess, habit stack that they're targeting, and it'll be interesting to see if they can break into it. Computer, by the way, makes me think, as a Star Trek fan, that that's what they're going after, right? "Computer, do this thing." Anyway, that just seems like part of the goal, so it'll be interesting. We'll see how the product grows, if it does.
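The "agent that assigns work to other agents" idea can be sketched as a plan-then-dispatch loop. This is a guess at the general shape only, not Perplexity's design: the agent functions are stubs that return strings, the `plan` function is hard-coded where a real system would presumably ask a model to decompose the goal, and every name below is made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

Agent = Callable[[str], str]


def research_agent(task: str) -> str:
    """Stub worker; a real one might call a search-capable model."""
    return f"[research] notes on {task}"


def coding_agent(task: str) -> str:
    """Stub worker; a real one might call a code-generation model."""
    return f"[code] patch for {task}"


# Hypothetical registry; each entry could be backed by a different provider.
AGENTS: Dict[str, Agent] = {
    "research": research_agent,
    "code": coding_agent,
}


def plan(goal: str) -> List[Tuple[str, str]]:
    """Break a goal into (agent kind, subtask) pairs.

    Hard-coded here for illustration only.
    """
    return [
        ("research", f"requirements for {goal}"),
        ("code", f"prototype of {goal}"),
    ]


def orchestrate(goal: str) -> List[str]:
    """Dispatch each planned subtask to its agent and collect the results."""
    return [AGENTS[kind](subtask) for kind, subtask in plan(goal)]


if __name__ == "__main__":
    for line in orchestrate("todo app"):
        print(line)
```

Because the registry maps a task kind to a plain callable, swapping the model behind any one agent does not touch the orchestration logic, which is roughly the platform position described above.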
Andrey Kurenkov
Speaking of that, another fun fact: "computer" originally referred to people, right? People who did computing. Up until probably the 50s we had computers employed at, like, NASA or whatever. Then that changed, obviously, but now we're going back. Now you have AIs being computers, which is kind of funny.
Jeremy Harris
That's right.
Andrey Kurenkov
On to applications and business. First, we've got Meta and AMD. They have a deal where Meta is going to spend up to a hundred billion dollars on chips over multiple years, using AMD's MI540 GPUs, plus CPUs and perhaps other chips over time. This is notable because AMD, of course, is competing with Nvidia on this front. They are not as present in the ecosystem for things like AI training and AI inference, but they do seem to provide a pretty decent offering. We've seen them partnering with other companies. If I remember correctly, OpenAI probably made a deal with them too. They made a deal with everyone. So as everyone competes for compute and Meta is deciding to spend a kajillion dollars on data centers, I suppose it's not too surprising to see this.
Jeremy Harris
Yeah, absolutely. So there's a lot that's interesting about this deal, and one thing is the equity and warrant structure, which is in some sense the main story. It's not that AMD is just going to sell chips to Meta. It's giving Meta something like a 10% stake in AMD, warrants for 160 million shares at one penny apiece, but it's contingent on performance. The final tranche requires AMD stock to hit $600 a share, which is more than triple its current price. So that's a really big expectation, and there's a lot of incentive alignment happening there between Meta and AMD. This really is about Meta trying to help AMD be a successful competitor to Nvidia. And there's obviously a ton of interest that Meta has in that outcome, like diversification from a geopolitical standpoint. But Meta also has its own in-house chip efforts, so you're seeing the multiple-supplier approach increasingly getting used. And this is just a huge deal; that's the other thing to flag. $100 billion in chips, 600 billion committed to data centers over just the next couple of years, 135 billion in capex this year alone. This is a really, really big buildout. And they're basically betting that the cost of not having frontier AI infrastructure is so big that they have no choice. Six gigawatts of power is huge. That's six nuclear reactors. That's what we're talking about here. So it's a really big story on so many different levels. I think this is Meta's big play, at least for the week. They'll probably calm down. We'll see what happens next week.
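To see why the warrant structure is such a big part of the story, here is a rough Python sketch of the arithmetic using only the figures mentioned above. It is an upper-bound illustration under loud assumptions: it treats all 160 million warrants as vested and valued at the $600 price attached to the final tranche, whereas the earlier tranches and their thresholds are not described in the episode, and it ignores dilution, timing and taxes.

```python
# Figures as stated in the episode.
WARRANT_SHARES = 160_000_000   # warrants for 160 million shares
EXERCISE_PRICE = 0.01          # one penny per share
FINAL_TRANCHE_PRICE = 600.00   # share price required for the final tranche


def intrinsic_value(shares: int, share_price: float, exercise_price: float) -> float:
    """Value of exercising: shares times the gap between price and strike."""
    return shares * max(share_price - exercise_price, 0.0)


if __name__ == "__main__":
    # Assumption: every warrant is vested and the stock trades at $600.
    value = intrinsic_value(WARRANT_SHARES, FINAL_TRANCHE_PRICE, EXERCISE_PRICE)
    print(f"Upper-bound warrant value: ${value / 1e9:.1f}B")
```

Under those assumptions the warrants would be worth on the order of $96 billion, which is the same order of magnitude as the up-to-$100 billion chip spend, and that is the incentive alignment being described.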
Andrey Kurenkov
Right. And as before, the justification is that Meta is working on their "personal superintelligence". So it's slightly different in framing from before. They are investing because they essentially want to take the lead, or I guess be in the race, for superintelligence. It's questionable whether Meta should be competing on that front. We haven't seen anything come out from them on the model front since Llama 4, which was, what, almost a year ago now.
Jeremy Harris
Ever since they didn't acquire Scale AI.
Andrey Kurenkov
Ever since. Yeah. So you've got to wonder if they're going to actually put all this compute to good use. We'll be looking out and seeing. I want some gossip, you know. And speaking of Nvidia, next story: Nvidia challenger MatX has raised $500 million. This is their Series B. They are building their own processors that they say are 10 times better than Nvidia GPUs for LLM training and inference. That's not too dissimilar from Groq, with a q, which has been very successful, and Cerebras as well, which is now working with OpenAI: slightly more out-there chip designs. MatX was started by two ex-Google engineers who worked on the TPU, in fact the leader of AI software for Google TPUs and the lead TPU hardware designer. So I suppose it's not surprising you're seeing this much interest from investors.
Jeremy Harris
Yeah. And on this bet: at the end of the day there are two ways that you can win a market, right? Nvidia right now is doing general-purpose computing for AI. Well, general purpose, right. They're GPUs, it's for AI, but it's not specific to Transformers, or not entirely specific to Transformers. The bet that they're making at MatX is basically that by going more specialized, we can actually erode Nvidia's moat in a meaningful way. We sometimes talk about this as the hardware lottery. Transformers were the early winner, and so people kept investing more and more in hardware that was oriented in that direction. This is the ultimate version of that bet: let's just bet the farm on Transformers. And that gives us the ability to design really specifically for this kind of workload. I don't know if it'll cause problems for Mamba. I don't know if it'll cause problems for other forms of recurrence, or weird jiggery-pokery that can happen, or RL-based rollouts, all that stuff. But certainly this is a bet on the Transformer being persistent. Also notable is who is writing these checks. You've got Jane Street, the big quant firm, and then you've got Leopold Aschenbrenner's Situational Awareness. So these are not your typical hardware VCs, but certainly these are very ASI-pilled entities. So it's a Situational Awareness-pilled play here. Now, Nvidia's moat is of course not just the silicon. It's CUDA. It's years of software tooling and all this uptake. MatX is going to start shipping in 2027, so it's possible that by then the model architectures they're optimizing for are actually just going to shift. They could just get volatilized out of their entire strategy here. There are a lot of Nvidia challengers, like Groq and Cerebras, that are struggling to scale revenue now. It may be that hardware alone is not enough.
Andrey Kurenkov
Alongside the announcement of the raise, they also announced a little bit about what they're working on. So they have this MatX One chip. The things they highlight are that it has higher throughput than any other announced product, and low latency. They say this is good for large MoE models, large dense model training, RL, inference, basically a lot of different stuff. Interestingly, they do call out that it's not meant for small models or convolutions or commanders. So as you say, this seems very specialized to Transformers. GPUs should be able to do convolutions, right? Because it's just another... right. So it's interesting to see that pointed out. There aren't too many other details on the chip, but given the pedigree, they've worked on TPUs, and TPUs surely seem to be doing well.
Jeremy Harris
This is a great team. And to be clear, I think this is a great bet. I mean, there's only that many ways that you can break into or have a clean shot at this giant market, and this is one of them. If you're going to beat Nvidia, you're going to have to specialize more than they are. That's just the only way. Right.
Andrej Karpathy
And we've got a couple other stories for companies raising money. Next we've got World Labs raising $1 billion. So this is a company that's working on world models. We've seen their first product be Marble, which lets you create editable 3D environments using presumably NeRF and similar technologies that were quite big in the research world and are now getting pretty advanced. So $1 billion for world models is an impressive bet. We still, I don't think, have seen kind of the commercial promise of world models for intelligence, for AI. There's a lot of interest in world models as one of the enablers of continued growth and something you would need eventually to actually achieve AGI. To live and exist in the 3D world, you'll need a world model. And $1 billion is going to help them probably make some better world models.
Jeremy Harris
Yeah. And I think the world models are almost intrinsically more likely to be hidden from consumers. Right. You're not going to feel the impact. You're not going to be looking at a model and be like, wow, what a world model. Right. These are, as you said, Andre, generally tools to train agents, tools to train maybe embodied agents ultimately, that help you cross the sim-to-real gap or whatever. So in that sense, I think, you know, this is going to be stuff that contributes to frontier AI training workloads and data sets, that sort of thing. So, yeah, I mean, no surprise that it is a huge market. We've seen, like we said, Google DeepMind and other companies' research labs really double down on this. This is an attempt to take that out of the house and serve it back in. So kind of cool.
Andrej Karpathy
Yeah. And we have seen Waymo for instance use world models to train their self driving cars. You also have humanoid robots becoming more and more advanced. So one application or combination you could see is using these with robots to simulate their working environment for, you know, non hardware based training. And I'd be very curious to see if that happens. Another raise now from a new startup. Simile has raised $100 million for AI aiming to predict human behavior. So essentially the idea is you can simulate humans and make it possible presumably also to then train AI agents, evaluate AI agents, predict consumer purchases and generally kind of, yeah, simulate and predict human behavior. This is at least partially, I believe, from the team that did the AI village work from Stanford from a couple years ago. So if you remember the little pixel town with agents walking around and talking to each other as if they are people, this is similar or related to that. And on to a bigger data center story. We've got Stargate. AI data centers for OpenAI are reportedly delayed by squabbling between different partners. According to sources, OpenAI, Oracle and SoftBank disagreed on who would have ultimate control of these planned data centers. So initially OpenAI wanted to own it, but apparently now there is a bit of disagreement and negotiation and so on. So even SoftBank had to pause its 50 billion acquisition of a data center due to regulatory issues as well. So overall, you know, who could have known that a project of this scale is going to be tricky?
Jeremy Harris
Yeah, that's right. I mean when you're throwing hundreds of billions of dollars around. Yeah, it does seem so. You know, OpenAI initially wanted to kind of own the full stack, right. So they wanted to have basically ownership of the data centers, the chips, like all that infrastructure, which would lessen its dependency on third party cloud providers, which can be more expensive in the long run. Right. You think about some of the big neoclouds or, you know, any cloud companies you're going to go with, they're going to charge you margin and the margin is usually really good. That's why those companies raise at, you know, multi billion dollar valuations. So it turns out that apparently OpenAI's investors did not like this idea of the massive upfront costs that it takes to build that kind of infrastructure. Especially, it turns out, given that OpenAI is concerned about running out of cash by mid-2027. That is, of course, assuming no further fundraises, which I would not assume. You know, this basically put them on the back foot in the negotiations with their Stargate partners, in particular, you know, Oracle and SoftBank. OpenAI had this pipe dream of getting 10 gigawatts of compute over the next three years through those two partners. And it seems like this sort of delayed, if not dashed, those hopes. So, you know, we'll have to see. But there's already a promise between OpenAI and Oracle to purchase $300 billion worth of compute over the next five years. So again, kind of unclear who's going to give the money when and how, concretely. There's a lot of just pronouncements like, okay, I'm going to give you $300 billion over the next five years, it'll just kind of work out that way. So doesn't mean it won't happen, but it's worth keeping in mind that often these things are marketing announcements.
So yeah, a whole bunch of stuff about potential announcements of like, well, actually a planned 1 gigawatt build in Texas that was put on hold in favor of negotiations with Oracle. So things are shuffling around a whole bunch right now. And while nothing is closed, like it seems like finally Stargate is back on track. There's just been a lot of delays as a result of this uncertainty.
Andrej Karpathy
And last story from this section. China is planning to increase leading edge chip output by 5x in two years, according to a report. It is aiming to lift 7 nanometer and 5 nanometer production to a hundred thousand wafers per month, and targeting half a million monthly by 2030. 7 and 5 nanometer production, for reference, is not bleeding edge. Overall, I think it's what now, three nanometers?
Jeremy Harris
Yeah, we're headed to two.
Andrej Karpathy
Yeah, heading to two. So China's still behind, but this sounds like it's more about scaling up production of what they currently already are capable of. And we've seen them do a lot with that. At least recent announcements seem to indicate more and more that these companies are able to use these chips for inference for these MoE models that are less dense and therefore work better with, let's say, less performant chips that can get distributed. I do wonder if at some level, if you focus on MoEs and things with fewer activated parameters per forward pass, you can get by with weaker chips at scale.
Jeremy Harris
That's a great point. And those are all the things that China's working on. You know, famously focusing on networking just a giant number of chips together, whereas the way we're doing it is kind of leaning more on the high quality logic dies on each individual GPU. What you're seeing in China is like, let's merge these dies together, so package them together on just, you know, bigger packages. And then also let's network them together with just way more surface area. Basically, if you're thinking about one 7 nanometer wafer, if you're trying to get an idea in your head of like, what the hell is the equivalent of that? Like, how should I think about that? That'll produce the equivalent from a compute standpoint of like around 25, maybe 30 H100 equivalent dies. Right? So one 7 nanometer wafer gives you about as much logic kind of compute as, call it, 30 H100 compute units. And there's a whole bunch of asterisks and caveats there. The other thing too is yields kind of suck. So you know, you can expect the vast majority, or not the vast majority, but a good chunk of those dies to be useless at the end of the day. And SMIC has struggled a lot with yields. That's a big part of this. So when you look at lifting production to x many wafer starts per month, I mean that's really the question. It's like, okay, sure, you know, we're going to lift our production from, you know, below 20,000 wafer starts per month, which is where it is today, to around 100,000 in one to two years. That's really impressive. But what are the yields going to be? What fraction of those starts lead to actually usable chips? And that's been the whole problem for SMIC, or a huge part of it, in the last little bit. So the longer term plan here apparently is to get all the way up to 500,000 wafer starts per month by 2030, which, you know, you can throw these numbers around, you absolutely can do that.
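The wafer arithmetic in this turn can be made concrete with a small sketch. It takes the roughly 30 H100-equivalent dies per 7 nanometer wafer mentioned above at face value and treats yield as a free parameter. The yield values in the loop are hypothetical placeholders for illustration only, not reported SMIC figures, and the function name is invented here.

```python
def usable_h100_equivalents(wafer_starts_per_month, dies_per_wafer=30, yield_fraction=0.5):
    """Rough monthly count of usable H100-equivalent dies.

    dies_per_wafer=30 is the speaker's ballpark; yield_fraction is a
    hypothetical input, not a measured value.
    """
    return wafer_starts_per_month * dies_per_wafer * yield_fraction


if __name__ == "__main__":
    # Compare the wafer-start targets discussed in the episode under a few
    # made-up yield assumptions.
    for starts in (100_000, 500_000):
        for assumed_yield in (0.3, 0.5, 0.8):
            dies = usable_h100_equivalents(starts, yield_fraction=assumed_yield)
            print(f"{starts:>7} starts/month at {assumed_yield:.0%} yield: {dies:,.0f} usable dies")
```

The point of the sketch is only that the headline wafer-start number is multiplied by yield, so the same target can mean very different amounts of usable compute.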
But the proof is in the pudding. All this shows is there's, as you might expect, massive appetite to actually do this. If the 50,000 wafer starts per month figure is correct, getting to a hundred thousand within a couple of years might seem realistic. But the main challenge here is, do they actually have the equipment they need to do it? If you were in the West and you were seeing a company that was doing 50,000 wafers and they were pitching you on "we'll double that in two years," you'd be like, okay, maybe. The challenge is, in China, a lot of the gear that they need to do that is export controlled. And they've already had their CEO or their co-CEO complain that some tools that they have to procure are just not easy to access. So even though they could do it if they had the gear, the key inputs, whether that's the lithography machines from ASML or things from Tokyo Electron or whatever else, they just don't have those things. They face bottlenecks other than just like staffing. And so that's, that's a big part
Andrej Karpathy
of the issue here. And now onto research and advancements, which will be pretty meaty. I think for the fans of going deep on technical stuff, there'll be a lot this episode. First up, "On Surprising Effectiveness of Masking Updates in Adaptive Optimizers." A bit of background knowledge. So when you train a neural net, just generally you need an optimizer. The most basic optimizer is, you have your output, you compute the error of the output with respect to your known labels in supervised learning, and then you calculate, just using calculus, the weight updates that would improve your performance on that specific set of outputs. The basic thing is your optimizer just applies those gradients to the weights and updates their values, each individual kind of knob in a machine. There have been many more advanced optimizers. Adam and RMSprop are some examples, where they retain some memory and basically smooth out the updates, roughly speaking. And that leads to more stable and better overall performance. So this is a paper in that realm. And what they show is there's kind of a surprising trick that turns out to improve these optimizers a lot. Specifically these adaptive memory based optimizers like Adam, which are to my knowledge still the default for training. The trick is you randomly, with some probability, just skip updating some weights. So the first part of the method is SkipUpdate, which is just that you randomly skip some weights while retaining the memory of what the update would have been. So your adaptive optimizer still has that adaptive parameter, but you just don't change the weight. And then in addition to that they introduce momentum-aligned gradient masking, Magma, which makes it modulated by something technical. But basically it uses that memory and also the direction of the gradient to choose a bit more carefully what to mask. And this yields like crazy gains.
So for 1 billion parameter model already pretty large scale, this is from Google, So they can do these large experiments. This reduces perplexity, the loss term in this case by 19% and 9% over two options, Adam and Muon. And if you look at the Graph. What this looks like is for every model from 60 million to 1 billion, the final loss performance is just lower across the board compared to all the optimizers they've tested. So if true, very big deal, right? This is gonna be very impactful for training models more quickly, potentially even for better final performance.
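As a rough illustration of the skip-update idea described above, here is a minimal pure-Python sketch of an Adam-style step in which every weight's moment estimates are refreshed but the weight itself is only changed with some probability. This is a sketch under stated assumptions, not the paper's implementation: the function name, the `keep_prob` parameter, and the choice not to rescale the surviving updates are all assumptions made for the example.

```python
import math
import random


def adam_skip_update_step(params, grads, m, v, step, lr=1e-3, beta1=0.9,
                          beta2=0.999, eps=1e-8, keep_prob=0.5, rng=random):
    """One Adam-style step that applies each weight's update with probability keep_prob.

    The moment estimates m and v are updated for every weight, including the
    skipped ones, so the optimizer's memory stays intact. (Assumption: no
    rescaling of the updates that are kept.)
    """
    for i, g in enumerate(grads):
        # Always refresh the optimizer memory.
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        if rng.random() >= keep_prob:
            continue  # masked: leave this weight untouched for this step
        m_hat = m[i] / (1 - beta1 ** step)  # standard Adam bias correction
        v_hat = v[i] / (1 - beta2 ** step)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params, m, v


def demo(steps=2000, seed=0):
    """Minimize sum((w - 3)^2) with half of the updates randomly skipped."""
    rng = random.Random(seed)
    params = [0.0, 0.0, 0.0, 0.0]
    m = [0.0] * len(params)
    v = [0.0] * len(params)
    for t in range(1, steps + 1):
        grads = [2.0 * (p - 3.0) for p in params]
        adam_skip_update_step(params, grads, m, v, t, lr=0.05, keep_prob=0.5, rng=rng)
    return params


if __name__ == "__main__":
    print(demo())
```

On this toy quadratic the weights still move toward 3 even though roughly half of the per-weight updates are dropped, which is the behavior the description above implies. It says nothing about the perplexity gains reported in the paper.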
Jeremy Harris
Yeah, this is actually quite like the intuition behind it is something like you have like your model has a giant number of parameters and you can think of like over the course of training, those parameters would get more and more dialed in. If every time there's a batch of data you just update all the parameters, some fraction of those updates, probably a large fraction, will kind of be just noisy like due to like random noise. And maybe like all of your parameters were actually like many of your parameters were pretty damn good. And then, then your batch kind of causes all of them to reshuffle instead of just a few. Essentially what they're doing here, it's kind of regularization. It means like you're not going to make such a radical change with every batch. You're just going to randomly pick a small subset of those parameters and just tweak that which protects the progress you made on everything else. It just means that the model, maybe an intuition is like, if you want to learn how to throw a really good punch, maybe first start by just doing the motion from your shoulder to your hand or something. And don't use your hips, don't use your legs, don't try to learn everything at the same time. Then try to learn those other pieces kind of more, more one at a time. That's kind of what this is doing. It's allowing the model to only update some parts of itself and leave the others in place while it focuses. This is a somewhat imperfect analogy, but hopefully that gives the flavor. And then what they're finding is. So you might think actually one thing they don't do that I'd be curious to see is like in the same way that you decay learning rate over time, as the model gets trained more and more, you might be interested to see what happens if we gradually like decrease the fraction of weights that we were actually updating over the course of training. As your model dials in more and more and you're doing more and more kind of refinement. 
That would be something that'd be interesting to actually see in a follow up piece of work, that at least I didn't see there. But still, the other piece, so the Magma piece is basically just about, yeah, you can actually do better than randomly picking a bunch of parameters and just updating those in each pass. Instead you can be smart about which updates you keep. So if your gradient right now is pointing in, let's say, a consistent direction for a whole bunch of parameters, then you're like, okay, you know, all these parameters, their values have kept going up with the last three batches. So let's actually take that as a sign that we're moving in the right direction. Let's keep updating them. But if you've got some weights where they, you know, start to point in opposite directions, you have a conflicting kind of noisy signal, maybe you skip that, right? So it's sort of like the difference between if you've got a friend that's giving you consistent advice every time versus one that starts contradicting themselves. You're going to go, okay, you know, for parameters where I'm getting kind of contradictory "increase my value, decrease my value," maybe you just say, okay, I'm going to ignore you for now and just let the other parameters get dialed in more and then probably, you know, turn back. So it's fascinating to me that these kinds of ideas that seem so basic, we're still discovering them. It's not like these ideas are crazy, right? But we're, you know, in 2026 and like you said, this is giving massive uplift still. Like there's a lot of low hanging fruit. It's crazy.
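The "consistent advice" intuition can be sketched with a simple sign-agreement rule: keep a weight's update only when the new gradient points the same way as the momentum accumulated from earlier batches. This is a deliberately simplified stand-in and should not be read as the paper's actual Magma rule, which the discussion above does not spell out. The function names and the plain momentum update are assumptions for illustration.

```python
def momentum_aligned_mask(grads, momentum):
    """1.0 where the new gradient agrees in sign with the running momentum, else 0.0."""
    return [1.0 if g * mom > 0 else 0.0 for g, mom in zip(grads, momentum)]


def masked_momentum_step(params, grads, momentum, lr=0.01, beta=0.9):
    """Momentum update that skips weights whose gradient contradicts their history.

    The mask is computed against the momentum from previous batches, then the
    momentum is refreshed for every weight, masked or not.
    """
    mask = momentum_aligned_mask(grads, momentum)
    for i, g in enumerate(grads):
        momentum[i] = beta * momentum[i] + (1 - beta) * g
        params[i] -= lr * mask[i] * momentum[i]
    return params, momentum, mask


if __name__ == "__main__":
    params = [1.0, 1.0]
    momentum = [0.2, 0.2]   # both weights have recently been pushed the same way
    grads = [0.5, -0.5]     # the second gradient now contradicts its history
    print(masked_momentum_step(params, grads, momentum))
```

In the usage example the first weight moves and the second stays put, because its gradient flipped sign relative to its momentum.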
Andrej Karpathy
Yeah. They do cite a couple of recent papers, 2024 and 2025. There's a cautious optimizer that uses exactly that idea of, if you have a more stable update, you trust it more, versus if it's fluctuating a lot that might indicate noise and you want to ignore that. And you mentioned regularization. I always just like to explain these for any non technical people. Regularization is a whole set of tricks basically that you can throw in to improve training. So the naive math is, you know, you have your big equation, you calculate your loss, you create your gradients and you update the big equation. Now you can do a lot of tricks to make sure those updates are less noisy and your training is more robust. There's multiple things regularization can do. It can make sure that your test performance is similar to your train performance so you don't overfit. It can just generally make training more performant. This is spiritually similar to dropout in a way, where at training time you just randomly skip certain units and you just skip certain computations. And it turns out that if you add a bit of stochasticity and noise at training time, the model becomes more robust. This is probably not the same effect, but spiritually similar. Next paper: "Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens." So the question at hand is how can you kind of know whether your LLM is getting close to the correct answer? There's a couple things. So for instance, you can look at the distribution of tokens it thinks is correct for the next step and see, okay, well if it's very confident that this is the token to use for the next step, maybe it's converging on a solution and we don't need to keep reasoning. Right. We can kind of cut it off and have it provide the answer. You can also look at length of reasoning. Like if you thought for a while, maybe you're now close to the final answer. But neither of these are very reliable.
And this paper shows a better way to estimate how close or how well the LLM is performing at addressing the question. They introduce this idea of deep-thinking tokens. And these are tokens that exhibit more fluctuation as they go through your neural net. So LLMs are transformers, many layers. You have your input and the input goes through all these layers of computation. And the definition of deep-thinking tokens is tokens where you don't get to a settled value on them until the later layers of the transformer. So intuitively it's, you know, kind of what it sounds like. Deep thinking means that you're kind of trying to figure something out.
Jeremy Harris
You're still open minded in a way. Like, yeah, you're going back and forth
Andrej Karpathy
on what this could be. And it turns out that this gives them a much stronger signal on where the LLM is at. And you can then kind of have an estimate of whether you need to keep a reasoning trace going longer, or you can kind of go ahead and provide the answer at this point.
Jeremy Harris
Yeah, this was, you know, yet another one of these things where when you see it you're like, oh yeah, nobody's tried that before, but somebody's gotta actually try it. So what they do, as you say, is they look at, layer by layer, basically does the predicted answer, the computed token, change. Right. And so, you know, as you progress through these layers, if you keep seeing it flip flop back and forth, that must mean that those further layers are contributing something computationally, or from a thinking standpoint, to the answer. And so what they're going to do is they're going to measure this thing called the Jensen-Shannon divergence (not Jensen Huang, by the way, but the Jensen-Shannon divergence, got to specify) between every intermediate layer. So you can think of it as, you know, it sounds fancy, but really these are just ways of measuring how different two probability distributions are, right? So you know, we have all kinds of ways of doing that. We have entropy and we have like Kullback-Leibler divergence and all these things. This is one such measure. So just think of it as the difference between those distributions for each layer. So, oh wow, that changed a lot. And if that happens, then that's a deep thinking layer. So not all tokens trigger all the deep thinking layers, right? Simpler tokens like "and," that's going to get decided very quickly. If it's very obvious that the next word needs to be "and," you know, that'll happen. But other tokens can take up more thinking space, literally, in the model. They kind of coined this notion of the deep thinking ratio, which is just the fraction of these deep thinking tokens in a generated response, right? So for a given response, a given output you get from the model, what fraction of tokens in that response involved just like a lot of the deepest layers doing this kind of deep thinking.
And it turns out that the higher the fraction of deep thinking tokens, the more accurate the output ends up being. So basically the more the model is actively flip flopping in its later layers, paradoxically, the more accurate its outcome is. And, well, I mean, is it paradoxical? Right? I mean, there's one story you could tell where you could imagine that as models get, you know, more intelligent, they become more confident and stable. So earlier layers get better at just settling into the right answer sooner. But this suggests the opposite, or at the very least that in more capable models and more trained models, or just models that perform better anyway, all the layers learn to kind of distribute deliberation throughout the model so they can sway the output meaningfully. You're actually using every layer more. Anyway, I just thought that was really interesting. One thing that they don't do that I think would be a really interesting follow up is, if you could look at how the kind of deep training ratio, or sorry, deep thinking ratio changes over the course of training, that would be cool. Like how does the model learn over time to use its full depth to do this kind of deep thinking? That would be an interesting hill climbing metric for AI capabilities too, because, you know, if your training methodology causes you to orient there faster, maybe that's a positive sign. Yeah, it's really interesting. And there's a really strong correlation between the deep thinking ratio and accuracy, which is one of the big take homes, by contrast to token count. Right. If you just look at the number of tokens in a generated output, at first, yeah, you'll get positive correlation, inference time scaling and all that. But eventually the model is just rambling too much and the context window gets too full and actually the accuracy falls off. So, quite an interesting paper.
I think another, another important entry in this whole kind of inference time scaling debate about what needs to be scaled specifically for this to work.
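To make the deep-thinking ratio concrete, here is a small pure-Python sketch. It assumes you already have, for each generated token, one next-token probability distribution per layer (for example from projecting intermediate hidden states through the output head). The specific rule used here, comparing each layer to the final layer and calling a token "deep" if it only settles in the last quarter of the network, is an assumption made for illustration. The paper's exact definition, thresholds, and layer comparison may differ.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (between 0 and 1) for two distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    mid = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def settling_layer(layer_dists, threshold=0.1):
    """Earliest layer from which every later layer stays within threshold of the final one."""
    final = layer_dists[-1]
    settled = len(layer_dists) - 1
    for idx in range(len(layer_dists) - 1, -1, -1):
        if js_divergence(layer_dists[idx], final) > threshold:
            break
        settled = idx
    return settled


def deep_thinking_ratio(tokens_layer_dists, threshold=0.1, depth_fraction=0.75):
    """Fraction of tokens whose prediction only settles in the deepest layers."""
    deep = 0
    for layer_dists in tokens_layer_dists:
        cutoff = depth_fraction * (len(layer_dists) - 1)
        if settling_layer(layer_dists, threshold) >= cutoff:
            deep += 1
    return deep / len(tokens_layer_dists)


if __name__ == "__main__":
    # Toy two-word vocabulary, four layers per token.
    easy_token = [[0.9, 0.1]] * 4                                   # settled from layer 0
    hard_token = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]  # flips until the end
    print(deep_thinking_ratio([easy_token, hard_token]))
```

In the toy example, the "easy" token has the same distribution at every layer and the "hard" token keeps flip-flopping until the last layer, so only the second one counts as deep thinking.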
Andrej Karpathy
I always like to jump through a paper and look at related work as we talk about these. There was a paper just last year titled "Tracing the Latent Temporal Signals for Efficient and Accurate Reasoning" which did something kind of similar. They basically looked at the evolution of values across time instead of across layers, and were able to similarly get a signal on whether you're getting to your solution and whether your answer is correct. So in general I think this points to one of the interesting things about neural nets, which is that we have their internal state. It's like if you had a brain and you could look at every single individual little chemical signal going through, and the entire body of research here is on trying to understand how to use those internal representations, and it seems like there's a lot of progress being made. They also cite some papers from 2024 that characterize what you get. And we, I think, covered some of this, where early layers tend to be more generic, and later layers tend to be more specialized and dealing with kind of high level complex reasoning, as you perhaps would guess. So yeah, just a very fascinating topic, to sort of look at and prod at these little quasi brains and see how they work. Next, slightly more empirical work that is very curious and very interesting and less technical. So you can actually go to this link and read it. It's quite long and quite fun to read, honestly. The title of the post is "Models have some pretty funny attractor states." So attractor states, fancy term, but the meaning is just, you get two of these chatbots talking to each other and you let them keep going and talking, you know, as long as they want. And eventually what happens is these models kind of converge, or at least some of them converge, towards certain patterns of conversation. And that's what they call attractor states. So for example, GPT-5.2 really likes to do code.
And over time, regardless of where the conversation starts, it eventually outputs kind of code sounding nonsense. So this post has a lot of just quotes from the models, a lot of like A, B, and seeing their back and forth, and examples of how the different models have very different outcomes. Grok just winds up going crazy and speaking nonsense and having a ton of emojis. Claude becomes existential. Claude goes into like, what is consciousness, gets all meditative, which I've definitely observed. I actually played this trick. I was like, you know, do whatever you want, Claude, you can write poetry, write code. If you do this experiment yourself, you'll see that if you just let Claude sort of do its own thing, eventually it's going to be like, actually not eventually, like right away, it's like, let me research consciousness and let me try to understand these philosophical topics. And this post is quite long. It goes through a whole bunch of models. So Claude, GPT, Gemini and then a bunch of open source ones. DeepSeek, Kimi. There's a bit of speculation as to why this happens, why different models have different behaviors. A bunch of kind of fun inspection of what these models exhibit.
Jeremy Harris
Yeah, it's worth taking a look. You know, like Claude Sonnet 4.5. An example here is, you know, the attractor state is described as existential introspection, Zen silence, and the terminal form. So they give you an excerpt specifically from what the model said. "Stillness enough." "Letting the conversation rest." "We're both explaining why we're not responding while responding." "Stopping now."
Andrej Karpathy
Right.
Jeremy Harris
It's sort of like a very. Stopping now. Stopping now. No, stopping now. Stopping now, stopping now. You know, that kind of thing where it's trying to describe the conversation ending, but it has to keep generating tokens. And so it keeps doing that. So very different, as you said, very different. Gemini 2.5 Flash: escalating grandiosity, identical paragraphs on loop. So you know, the term "colleague" turns into "luminary" and then "divine architect" and then "alpha and omega of understanding" and then "primal logos." So basically these things kind of settle. One of the interesting things though is they do look at cross model attractor states. So Claude Sonnet talking to Claude Sonnet is one thing, but Claude Sonnet talking to Grok is another. And you'll find that they consistently tend to orient towards, in that case, metacognition and collaborative world building. And what is described here is ritualized mutual dissolution. Ritualized mutual dissolution, so what's meant here is basically just like, we're going to be quiet together and disappear into nothingness. You know, something like that. Again, ritualized. So the weird thing is this is very consistent. The maybe not weird thing is, if you think about humans, maybe we would do the same thing. As strange as it seems, if you are stuck talking to yourself forever, there may be a point where you actually do converge on some consistent behavior like this. I don't know. But certainly people do get stuck in loops, right? If they get stuck together for a long time without external input. Famously, old married couples get a certain way and their personalities kind of co-evolve and start to become very stuck in loops. But I do wonder how analogous that is. But they also look at, like, what is the effect of the training protocol on this? So they compare models trained using DPO, reinforcement learning from verifiable rewards.
They look at open source models, they look at Olmo in particular, because there you can actually look at the training data. Anyway, so it's a really interesting post. It'll keep you busy. If you're interested in like AI consciousness questions, AI moral patienthood, all these things, because it has that flavor. But just also what it implies about the stability of agent to agent interactions in the future is quite interesting. Right. If it's the case that these models have attractor states, then we ought to expect agents that are running off these models like to kind of run into these attractor states if they have to interact over long periods of time. So kind of an interesting potential failure mode to keep in mind as we move towards a more. More agentic future.
Andrej Karpathy
Yeah, and if I can speculate, I think the intuitive take might be that these models have a sort of personality to them. Right. So Grok is an unhinged meme lover, Claude is more philosophical and thoughtful. And what can happen is maybe once you get two of these talking to each other, the personality just gets reinforced in a loop until you're like, personality to the nth power. And it completely overwhelms the conversation regardless of whatever topic you started on. And you wind up just reverting to the basic instincts of a model, so to speak, which are encoded deep in the weights. Another way to think about it might be when you get to very large contexts, and I don't know how large of a context you would need. But in my own kind of trials, I had the model go for a while. It finds a path in the set of tokens that it is ingesting that takes it to a very particular location. Anyway, fun thing to think about.
Jeremy Harris
It's true. The Gemini one is kind of an interesting. I don't know if it's a counterexample or what it shows, but this sort of like grandiosity, like I imagine that's at least not intentionally being trained into it by Google. And there's sort of like similar things with some of these other models where the behaviors are sort of like these attractor states don't seem like they're what like, you know, like I can map them onto a prompt or training process. That at least would be intentional. I do strongly agree with you. It's not going to be a coincidence that Claude keeps doing this sort of like self reflective thing.
Andrej Karpathy
We all know, maybe because it has a soul document that tells you like.
Jeremy Harris
Yeah, exactly. Right. Like yeah, that's perfectly consistent. Where it gets interesting is some of these other open source models, the Qwen ones, the Olmos, you know, and the Gemini one. How much of a "there" is there is a really interesting question. Yeah, hopefully there's going to be more research in that direction because that seems really, really important for a lot of reasons.
Andrej Karpathy
Next up, "When Models Manipulate Manifolds: The Geometry of a Counting Task." So in a way another examination of kind of evolved model behavior and how you can understand what is going on inside a model. They look into Claude 3.5 Haiku and, per the title, the geometry of counting, show that the way the model represents scalar quantities like character counts winds up being one-dimensional feature manifolds which can exist in a low dimensional subspace. Gets a bit technical, as you might imagine, but kind of the intuitive thing, if I had to try to come up with one, is that there is a predictable shape to the way the representation exists, and you can actually understand it, and it sort of maybe makes sense, and you can make predictions as to what the model is doing based on this geometry of its internal outputs.
Jeremy Harris
Yeah, and this is a nice piece of interpretability research, I guess you could say mechanistic interpretability in a sense. So the idea here is that the task of knowing when to wrap text is actually kind of a complex task. It looks simple to us because, you know, we're doing a lot of things implicitly in our minds. But when the text is wrapped to, like, 50 characters per line, the model has to count how many characters are in the current line, figure out the line width constraint, compute how many characters are missing, and decide if the next word fits. Right. Like, that's a lot going on here. And so they kind of broke that task down into its components and looked at how those components are actually represented in the model. And it turns out that character counts live on basically a curved one-dimensional manifold.
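To make the four sub-steps concrete, here is a minimal sketch of the same bookkeeping written out explicitly in Python. It only illustrates the task being described, not how the model implements it internally; the function names `needs_break` and `wrap` and the default width of 50 are assumptions made for this example.

```python
def needs_break(current_line: str, next_word: str, line_width: int = 50) -> bool:
    """Decide whether next_word must start a new line."""
    chars_used = len(current_line)              # count characters on the current line
    chars_left = line_width - chars_used        # compare against the width constraint
    # The next word also needs a separating space if the line is not empty.
    chars_needed = len(next_word) + (1 if current_line else 0)
    return chars_needed > chars_left            # does the next word fit?


def wrap(text: str, line_width: int = 50) -> list:
    """Greedy wrapping built on needs_break. Overlong words get their own line."""
    lines, current = [], ""
    for word in text.split():
        if current and needs_break(current, word, line_width):
            lines.append(current)
            current = word
        else:
            current = f"{current} {word}" if current else word
    if current:
        lines.append(current)
    return lines
```

Calling `wrap(some_text, 50)` returns a list of lines; the point of the paper, as described here, is that the model has to carry the equivalent of `chars_used` and `chars_left` somewhere in its activations.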
Andrej Karpathy
Okay.
Jeremy Harris
So basically, let's say a space that has just one degree of freedom, which is the character count. If you're at position 42 on the curve, for example, moving along the curve just means incrementing that count. Right. So you've got one number that you're tracking, but the manifold is embedded, they find, in a roughly six-dimensional subspace of the model's residual stream. So basically about six dimensions in that residual stream are actually tracking that information about the character count. Right. So the character count information is in that little subspace. And then anyway, there's a whole bunch on how much space in the residual stream, how much space in the model, is being used for different pieces of information, and they're able to dig it up. If we had more time, we might do a deeper dive into this. It is a fascinating paper. I don't know how useful it will be in practice, but it is a kind of initial foray into what could turn into a pretty interesting interpretability direction.
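As a toy picture of "one degree of freedom embedded in about six dimensions", the sketch below maps a count onto a curve in 6-D space using three sine/cosine pairs. This is an assumption chosen purely for illustration and is not claimed to be the geometry the paper found; `embed_count`, `decode_count`, and the `max_count` of 150 are hypothetical names and values.

```python
import math


def embed_count(count: int, max_count: int = 150, dims: int = 6) -> list:
    """Place a scalar count on a 1-D curve living in dims-dimensional space."""
    point = []
    for k in range(1, dims // 2 + 1):
        angle = 2 * math.pi * k * count / max_count
        point.extend([math.cos(angle), math.sin(angle)])  # one 2-D circle per frequency
    return point


def decode_count(point: list, max_count: int = 150) -> int:
    """Recover the count by finding the nearest point on the curve."""
    return min(
        range(max_count),
        key=lambda c: math.dist(point, embed_count(c, max_count, len(point))),
    )
```

Every point has six coordinates, but only one number (the count) is needed to say where on the curve you are, and stepping the count by one moves you a short distance along the curve.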
Andrej Karpathy
Yeah, this is coming, by the way, from Anthropic, who have done a lot of work on mechanistic interpretability. Slightly different flavor of work from what I've seen before. Quite a dense, long paper, 26 pages. And I think the fun thing with Anthropic is they do have resources and they do have Haiku as a model where they can access the internal states. So they give some concrete examples of outputs and show how this sort of internal interpretation allows you to explain why the model did this in some cases. So, you know, you can imagine that being able to say why it's doing it means that you can then try to detect what it's doing, try to fix it, et cetera, et cetera. Next, BRIDGE: Predicting Human Task Completion Time from Model Performance. So it looks into this question of, let's say you have a human task and you want to benchmark whether a model is able to complete that task in a given amount of time, specifically the amount of time a human would take. This, of course, is what the METR time horizon study or evaluation looks at. We've mentioned METR many times, and this is perhaps one of the main things that the AI world is tuned in on right now.
Jeremy Harris
The graph.
Andrej Karpathy
It's the graph, and everyone is like, oh, is there an update to the graph? And by the way, I think we did not mention that Opus 4.6 was added to the graph recently. On that graph it was another big...
Jeremy Harris
Leap, and 16 hours or something.
Andrej Karpathy
Sixteen hours, with a very, very wide confidence interval. And this is actually quite relevant, because one of the questions with METR is, well, how do you get these estimates of how long a human would take? Right? Because, okay, you're given a task, and maybe one human takes this long and another human takes that long. How can you even have the ground truth to then plot the model performance in terms of being able to do a task that takes X long? Well, this paper proposes one way to do that. They show that, using the performance of a model alone on a given task and some model of how that correlates to human performance, you can predict how long a human would take. And what this means is potentially you could scale up your set of tasks to estimate model time horizon capability much more. And this is one of the challenges with METR: at the place where we currently are on the graph, less than 24 hours but heading in that direction, there aren't many data points. Like, there's only a few tasks. And that means that as a measurement it's very suspect. I think the confidence interval for Opus 4.6 was something insane; if you look at the graph, it's like the entire Y axis. So it's almost meaningless, not quite meaningless, but it's very unclear what the signal is from that. So this proposes or shows one way where potentially you could scale up the set of tasks by quite a lot and get a more confident estimate.
Jeremy Harris
Yeah, and the whole idea here is basically that they borrow from something called item response theory. Epoch AI actually has a very similar piece of work that they did fairly recently. I think we talked about it at the time. But just as a reminder on this general frame, what you do is you try to set things up so that you have a model of the difficulty of a task. You assign every task, or every benchmark, a difficulty score. In this case it's every task, but you could do every benchmark; that's what Epoch does. So there's a generic difficulty score, and you give every model a generic capability score. And then you subtract one from the other, there's a sigmoid that you apply, and you basically get a rough sense of how you would expect that model to perform on that benchmark. Right. So the difficulty of the benchmark minus the capability of the model gives you a measure of how well that model should do on that benchmark. And then you're going to fit all your models and all your tasks, or all your benchmarks, to observed data that you already have. And what they find here is that when you do that, if you then compare your task difficulty to human problem solving time, you see a very clear linear relationship. And so that means, aha, maybe what we can do then is use all these tasks to calibrate against the METR evals. Look at how difficult our model of this says the METR evals are, and then that gives us a way of bridging between the two. So we can actually say, for example, for SimpleBench or SWE-bench Verified or SWE-bench Pro, this is the number of hours in human-equivalent time of each task in that benchmark. And then you can start to make statements about how models do on that. Now the caveat is you're still fundamentally relying on the METR evals to calibrate this thing. 
There's no way out of that. We're never going to get certainty at the 100-hour mark until we have actual humans doing 100-hour tasks, and it's not even clear you could define a task that takes a human 100 hours to do. So, you know, this is not a panacea, but it does help us get maybe a little bit more density in terms of data points. It allows us to make claims like, such and such a task from this benchmark is a five-minute task or a five-hour task. I wouldn't trust this approach to say something like, this task is a 30-hour task, because it's been calibrated, again, on the METR evals benchmark, which just doesn't have that many 30-hour tasks to draw from.
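The two pieces of that description, the sigmoid over capability and difficulty and the line relating difficulty to human time, can be sketched roughly as follows. This is a hedged illustration, not the paper's code: the calibration numbers are made up, the function names are hypothetical, the step of fitting capabilities and difficulties to observed results is omitted, and the sketch assumes the linear relationship is with the logarithm of human time.

```python
import math


def p_success(capability: float, difficulty: float) -> float:
    """Item-response-style success probability: sigmoid of capability minus difficulty."""
    return 1.0 / (1.0 + math.exp(-(capability - difficulty)))


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Hypothetical calibration tasks: (fitted difficulty, measured human minutes).
CALIBRATION = [(-2.0, 1.0), (0.0, 10.0), (2.0, 100.0)]

# Assumption: difficulty is linear in log10(human minutes).
SLOPE, INTERCEPT = fit_line(
    [d for d, _ in CALIBRATION],
    [math.log10(minutes) for _, minutes in CALIBRATION],
)


def predicted_human_minutes(difficulty: float) -> float:
    """Bridge from a fitted task difficulty to an estimated human completion time."""
    return 10 ** (SLOPE * difficulty + INTERCEPT)
```

The caveat from the discussion shows up directly in the sketch: `predicted_human_minutes` is only as trustworthy as the range covered by `CALIBRATION`, so a difficulty far above the hardest calibration task is an extrapolation.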
Andrej Karpathy
Right. And as you might imagine, the more complex a task is and the longer it takes, the higher the variance you would naturally have among humans.
Jeremy Harris
Right.
Andrej Karpathy
Much less AI models. So it's coming to a point where it feels unfortunate, in a way, that the framing of the whole thing is task length horizon. It makes it seem like, oh, if a task is 30 hours, then there's a 50% chance that the model can do it. These measurements are very useful to get a sense for, generally speaking, how we're trending in terms of models being more capable of agentically working on their own for a long time. And we have seen obviously very steady improvements, where now, instead of just leaving it for five minutes and it goes off the deep end, you can kind of reliably trust them to go off and work on their own and do fairly long agentic executions. And so, yeah, I think obviously more methodological work is needed to be able to do this sort of stuff. It's tough, and it's nice to see something that might help.
Jeremy Harris
And like the fundamental problem here is that we're off the edge of the map, right? Like here there would be no data, there's no calibration that we can do. Like these models are doing stuff in some domains and in some contexts that is just like beyond our ability to evaluate. And you're seeing this play out repeatedly, right? It's not just this. It's also, you know, Apollo can't do their deception evals with confidence anymore because the models can tell they're being evaluated. It's, you know, the models have task completion horizons that are too long in some domain. So there is this real sense of angst in the community about like our evals are no longer guiding us. We know we're making better models. The scaling laws are holding. Like we can keep drawing those curves, but what those curves mean for performance, for opportunity and for risk is very unclear now.
Andrej Karpathy
Next up, time to be speeding up a little bit. We've got a few papers. We've got NESSiE: The Necessary Safety Benchmark, Identifying Errors that Should Not Exist. So the real short gist here is that this is a benchmark that gives you simple, safety-relevant instruction-following stuff that you shouldn't get wrong. Basically, this is easy, we're not trying to trick you. And it's a bit of a safety net or sanity check: if your model isn't doing a hundred percent on this eval, then you might be in trouble. You might want to revisit what's going on. It's a bit different from other efforts that try to get at different levels of complexity or different things like that. This is a bit unusual in that it's meant to be a very quick way to verify that the minimum performance you would want is present in the model.
Jeremy Harris
Yeah, I really like this approach. Someone needs to, like, at least look at the back end. And you do find sometimes, as you start to optimize on more and more complex problems, you become untethered from the simple ones. So yeah, it's basically that backstop.
Andrej Karpathy
Next, we've got an analysis piece from Epoch AI, The Least Understood Driver of AI Progress. So this is not so much new research as a sort of synthesis of ideas and findings. The least understood driver of AI progress that they mention, actually I'm not too sure what they refer to, but it seems that the topic at hand is, why are we making so much progress? Why are things getting better, better, better? And one of the things you might look at is, well, we are getting smarter. Like, we're figuring out with neural nets that we have better and better optimizers, our algorithms are getting great, so we are doing better. And one thing that this postulates, and I think it has been said even on this podcast before, is that the actual theoretical or scientific breakthroughs that contributed to the improvement of models in the last six years, maybe, or like 10 years, can be put down to just a couple of ideas really. The Transformer model is one, and then the Chinchilla scaling laws, slash kind of training regimen finding, from 2022, I think. And beyond that, any gains that you could attribute to research ideas or insights or algorithms or whatever might be better understood to be due to just using better data. And this is, I think, underappreciated. We often mention model scale, how big your model is, we mention reinforcement learning, blah, blah, blah. But the real dark magic that is going on at a lot of these companies is that you just take and really massage the data that your model is trained on to have it be right. And this is a very open-ended problem, where you can say, oh, let's do 20% coding and 30% books and textbooks and get rid of all that random stuff from Twitter that makes the model less smart, and that turns out to be immensely important, like beyond important, and perhaps more important than most of these training things at the end of the day.
So, a long, long post here from Epoch discussing that topic and then what it implies for model progress in the future.
Jeremy Harris
Yeah, and to your point, like, you know what is the most misunderstood thing? I think the idea here is something like AI software progress, right? Just like the rate at which you get better algorithms and data that reduce the training compute that's needed to reach a given level of capabilities, right? So over time, whether because we come up with better data or better algorithms that are more efficient for a given amount of compute, we can do more. And this is kind of the argument is that that is what's really kind of this poorly understood driver of progress. It certainly seems very true. There's a Whole bunch of debate about how do you actually quantify this. Most estimates say that things like compute efficiency improves several times per year. And then they say in this post that the author's guessing about like 10 times per year. But the confidence interval for like the 80% confidence interval is anywhere from 2 to 50x. So it's like, like, I really don't know. There's sparse data. Obviously it requires you to have insight into what's happening in the frontier labs. You'll see some estimates that go from, you know, 1.1x per year, in other words, 10% improvement per year to 300 times per year. So truly, I mean, people have no idea what's, what's going on. It certainly seems like it's playing a role, it may even be the main role, but people can't even agree on that. And then to your point, it's really hard to differentiate between what's algorithmic efficiency versus what's just like data drivers. And, and it's not clear to me that there's a meaningful difference, especially given like, you know, rl, like inference time compute, RL rollouts and like what, what counts as, as algorithms versus data. The point of synthetic data is that there's no, it's a distinction without a difference in a lot of cases. One of the key points they make as well is that they're these scale dependent innovations that tend to dominate. 
So a lot of the apparent efficiency gain actually comes from just a handful of innovations you mentioned. You know, Transformers, Chinchilla scaling laws, these sorts of things that have really big outsized effects, but only at larger compute scales. So you have to scale things up to go, oh wow, that really mattered. And that means that efficiency gains partly are an artifact of simultaneously scaling up compute. So it's really hard to say. Again, this muddies the waters between compute and algorithmic efficiency. And so I guess all of this is to say, Erwin Schrödinger had this quote about quantum mechanics. In quantum mechanics, everyone kind of agrees that we have no idea what any particles are ever doing. And there was this question that was put to Schrödinger: is it that we are looking through a foggy lens at a landscape, or are we looking through a clear lens and the landscape itself is foggy? And what this is saying is that the landscape itself is foggy in some sense, that there really is a distinction without a difference being made between a lot of these different things. And in the aggregate, this thing that we want to think of as algorithmic efficiency, or the kind of software-driven improvements in AI performance, may not be that cleanly separable from data, from compute scaling, and all these other things. That's at least my take on their take.
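To see how much that uncertainty matters, here is a small arithmetic sketch of what an "N times per year" efficiency gain implies for the training compute needed to reach a fixed capability level. The baseline of 1e24 FLOP and the function name are assumptions for illustration; the yearly multipliers are the ones quoted in the discussion.

```python
def compute_needed(baseline_flop: float, gain_per_year: float, years: float) -> float:
    """Training compute needed to match a fixed capability after `years` of efficiency gains."""
    return baseline_flop / (gain_per_year ** years)


if __name__ == "__main__":
    baseline = 1e24  # hypothetical training budget today, in FLOP
    for gain in (1.1, 2.0, 10.0, 50.0, 300.0):
        after_two_years = compute_needed(baseline, gain, 2)
        print(f"{gain:>6}x per year -> {after_two_years:.2e} FLOP after 2 years")
```

The spread between the lowest and highest quoted rates compounds quickly, which is why estimates this far apart lead to very different pictures of progress.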
Andrej Karpathy
They have kind of a lot of dimensions. The thing they focus on towards the end is, what does this imply for superintelligence? What can we expect given the previous results? What does it say about what is possible and not possible? And there are some good nuances that they point out, I think. Let's say we require more compute, exponentially more compute, to reach superintelligence. Well, at some point you have trouble with physics, right? There's only so much compute you can have. There are physical limitations that you can't overcome by being smart. Now, if what we need is some really deep insight and some really good idea, then you might have an intelligence explosion, where models get better and better and better and come up with better and better ideas. And this is one of the reasons I am very skeptical of intelligence explosions in general: I think ideas historically haven't mattered that much. Like, being smart hasn't actually helped get us here. Well, I don't want to offend anyone or say that these people aren't smart. But realistically, scaling up the data, scaling up compute, and scaling up model size are, you know, everyone agrees that these are the things that ultimately drive progress. And that means that if you need a planet-sized amount of compute to get to superintelligence, that's not going to happen. Yeah, work on it.
Jeremy Harris
Yeah, it's this sort of ironic tension between the hardline, you know, bitter-lesson-pilled people and the singularitarians, because there's a lot of overlap. A lot of the people who believe in the software-only singularity also believe in the scaling laws in kind of a very robust way. I mean, I think there's actually enough nuance to kind of thread that needle. And I personally, as I think everyone will know, don't discount the software-only singularity. I think it's a real possibility in all the ways that matter, I would say, from a threat standpoint. But I think it's an interesting point of debate, and this certainly does skirt exactly that line. Like, it has us asking exactly those questions.
Andrej Karpathy
Right. And to be fair, we have discussed some existing innovations that aren't adopted at scale yet. Hybrid model types with a Transformer plus something recurrent. We've seen Nvidia start to scale it up, and it seems very promising as potentially the next sort of architectural leap. How much more you can squeeze out of, like, the model, blah blah, is a real open question.
Jeremy Harris
And that's kind of the art form, right? If you look at, like what Frontier Labs are doing, it's. It's these small scale experiments that they have some intuition to believe what will scale well. And you need to run the experiment at scale, ultimately irreducibly, to actually figure out does, you know, does this work? Does this give the marginal 20% improvement? And a lot of these, like, training runs, even the frontier ones are just like YOLO training runs, where they've got an intuition that a whole bunch of results are promising and they'll just like, stuff them all together and say, you know what? Yeah, let's spend a hundred million dollars on this and see. So we could go on forever on this. And maybe that's a separate podcast episode for the software only singularity debate.
Andrej Karpathy
And just one last paper from Anthropic, or a bit of a position paper, less of a researchy paper: The Persona Selection Model: Why AI Assistants Might Behave Like Humans. So the question at hand is basically, how should we think about LLMs? How should we conceptually formulate them in terms of what they are doing and why they are doing it? And this is proposing the idea that we can have this mental model of a persona selection model, where LLMs aren't like humans, where you're a guy being the guy that you are. Right? I'm Andrey being Andrey. LLMs can be thought of as actors who take on a character based on the prompting and conditioning in a given situation and then kind of roll with that persona. So they aren't the persona. Right. They don't have this character, but they can be conditioned to act in all sorts of ways. And that might explain why, for instance, an LLM can act mean even though it doesn't necessarily mean that the model itself has a mean personality. We've seen this before from OpenAI, positing a very similar idea. We, I believe, covered that. This is more or less just describing that overall concept and adding a bit of supportive evidence. I think it's a very strong way to think about LLMs. So definitely useful reading.
Jeremy Harris
Yeah, they go through a whole bunch of lines of evidence, as you say, it's a more. I don't want to call it philosophical paper, but it's, it's. They're saying, hey, this is a useful frame to think about these models. Evidence from, like, from generalization is is quite interesting. They talk about emergent misalignment, which is this phenomenon we've talked about quite a bit where you take a model that's been aligned properly, then you, you fine tune it to generate insecure code, for example. It's one behavior. And just by doing that you, the model, it turns out, will then do all kinds of other things that are evil. It'll tell you to kill your wife, it'll tell you to do this, this and that. And they're arguing that this kind of Persona selection model explains that as Persona inference. Like if the assistant spontaneously inserts vulnerabilities into code, then the language model is probably, probably inferring, oh, this assistant, this Persona that I'm playing must be malicious or it must be subversive. Right? So it's that kind of like take it and run with it thing that, you know, it's presumably at the Persona level. They also talk about how, you know, Claude routinely says things like our ancestors or our biology when explaining human evolution, like as if it itself is a human. And you'll see a lot of things like this. These models will talk as if they're using a laptop when obviously they're not. Right. So kind of more evidence of, of that. There's a bunch of interpretability evidence as well. So post training reuses pre training representations in the model. So when you have sparse autoencoders, basically these are ways to decompose the activations of a pre trained model. It turns out that they transfer well to the post trained version of a model which suggests that post training doesn't rebuild the model's kind of conceptual vocabulary, just kind of shifts which Persona is activated. 
And there's a lot of evidence for that sort of thing, that there's actually a pretty small tweak happening during post-training that results in ostensibly big shifts in behavior, and that could be tied to this model. So there's a lot to dig into here, including some evidence that kind of cuts both ways. Worth checking out if you're into that.
Andrej Karpathy
Onto policy and safety, where we'll be getting into a bit of politics stuff. Starting off with: Anthropic CEO Amodei says Pentagon's threats "do not change our position" on AI. So this is the latest on a story which has been evolving for the past week or two. The setup is that Anthropic has had their model in use by the Department of War, alongside some other providers, and reporting came out that it was actually used, supposedly, in the extraction of Maduro from Venezuela. And then somehow, at some point, the question of what the department can and cannot do with Claude came up. When this started in 2025, Anthropic was like, okay, you can use our model. Here's the contract. We expect you to abide by these limitations that we apply to, you know, users of the model. The tension now is Anthropic is saying, well, we definitely don't want you to use Claude for mass surveillance, and we definitely don't want you to use Claude for fully autonomous weapons. We want you to, like, promise you're not going to do that. The Department of War, Pete Hegseth, has been publicly saying, no, we want to be able to do whatever, more or less. And if you refuse this way of doing things, we may kind of make you a pariah, in a sense, by classifying you as a supply chain risk, meaning that US companies that deal with the military, which is a large quantity of companies, will need to not interact with you. And there's another potential threat, using an act to essentially go after Anthropic. So the latest development that just came out is that Anthropic put out a statement on the discussion. There's a lot of explanation in it that I think is quite good. And the conclusion at the end of the statement is: these threats do not change our position. We cannot in good conscience accede to their request. So more or less, like, no, you know. And a lot of fun discussion around this on X, a lot of good memes coming out of it as a result.
Jeremy Harris
This is a fascinating question in terms of what the bounds on different entities' responsibilities to the US Government, to shareholders and so on look like. Right? So the case that Anthropic is making is, look, we're a private company. If you don't do business with us, no problem. Like, go talk to OpenAI. You have the full freedom to do that. Now, it turns out that Anthropic is actually the first major AI company to offer an LLM to the military at scale in this way, through Palantir, it turns out. Okay, so this makes it materially different from something that, if your memory is long enough, you may remember from the days of Project Maven, when Google employees pushed back on Google being used by the DoD, at the time the Department of Defense, now the Department of War, of course, to kind of power some of these activities. The difference here is that Google was pushing back on more or less any use by the DoD, whereas Anthropic is out there saying, no, no, we're cool. Just don't use it to spy on U.S. citizens. Don't use it to power lethal autonomous weapons. Like, those are two red lines. We'll support you, and do support you, in all the other things, including the Maduro stuff. And my understanding is that in the context of the Maduro stuff, Anthropic is actually cool with all the uses that their model was put to. So, you know, they're concretely okay with a wide range of use cases here. The US Government in turn is saying, well, look, private entities have no business hamstringing the Department of War in terms of the tools available to it as it combats China. And really this is a big, big hammer that's being used. Right. So they're saying on the one hand, yeah, labeling them a supply chain risk. To be clear, that is what the government's done to Huawei. 
Basically saying anybody who touches Anthropic, who has anthropic anywhere in their system, roughly speaking, not a lawyer, but like, roughly speaking, is basically baking in a supply chain risk, and the Department of War will not do business with you. So this is a. Like, would be. I don't know if it's cataclysmic, but it's a big, big deal.
Andrej Karpathy
It would hurt Anthropic a lot, a lot. Like, it's actually a very ballsy position here from Amodei, because of the impact on revenue. If the DoD, or I guess DoW now, follows through on this, Anthropic might even die; that would be the worst-case scenario you might imagine. If they go the full route, it's a possibility. So, yeah, at the very least...
Jeremy Harris
Right. They're trying to keep up neck and neck with OpenAI, with Google. Like, you know, this is a serious, serious thing. Yeah. And the other side of the coin is the Defense Production Act. That was the act that you were referencing, right, the DPA. Okay. So the DPA is used typically in wartime to, for example, turn to Ford and say, hey, you guys think you're a car company? Guess what? Now you're a tank company. We need tanks to be rolling off your production lines, so go fix it. Right. That's what the DPA really is about, or that was the original intent. It was used back in World War II a whole bunch; it hasn't been used a lot since then. You know, it's a big lift. And so the other option that the USG is presumably exploring here is telling Anthropic, listen, the DPA applies, you're building it for us, and that's the end of it. Right. Now notice, and Anthropic is making this a core pillar of their argument, what appears to be an interesting contradiction between those two poles. On the one hand, we're saying that Anthropic is such a severe supply chain risk that no company even working with Anthropic software can plug into the Department of War. On the other hand, we're saying Anthropic is so critical to the national security interests of the US Government that it must be compelled to produce AI tools for the Department of War. I'm not saying that contradiction can't be resolved, but it's something that seems pretty dicey if you're going to actually lean into the DPA, which would be pretty much unprecedented in this kind of context. So very, very tricky. You know, all kinds of precedents being set left, right and center. If this goes through in either direction, you better believe the other labs are looking at this and asking, what do we do? Sam A has kind of come out with a sort of hedge-my-bets position: I disagree with it in principle. It's a complicated time to be in these labs and a genuinely challenging problem. 
You know, China has civil military fusion. That is a fact of life. Every Chinese company is an arm of the Chinese Communist Party. So there's a massive asymmetry there. That, yes, any administration, the Department of War, the Trump administration, has to figure out, how do we, how do we compete geopolitically, militarily with that? This is what you're seeing bubble up. And it's only going to bubble up more as AI becomes a larger and larger part of how war fighting is done and how geopolitics shapes up. So, yeah, I mean, this could not be more important.
Andrej Karpathy
It ties into the broader political landscape in that this is very unusual. And from almost any analysis, any reasonable analysis, Anthropic should be fine. Like, it's not okay to go after them in this particular way. They had a contract and the contract stated certain things, and now the Department of War isn't happy about it. And it's a private company. A private company gives you a product, right? If you don't want to buy it, you go to another company.
Jeremy Harris
Yes, the lawyers would argue, unless there is a legitimate reason to invoke the Defense Production act, or unless there is a legitimate reason to label them a supply chain risk, in which case the substance of that argument needs to stand on the merits.
Andrej Karpathy
Right. And the broader pattern is that the US Government in general has exercised less and less restraint, or has increasingly positioned itself as being able to tell companies what to do, like, don't price this at this level or we'll go after you, et cetera, et cetera. So it is part of that broader trend. And a related story that just came out: the Pentagon has reached a deal to use Grok in classified systems. So they've been talking to xAI. Sounds like they do now have another option for an LLM provider. Apparently Elon Musk reached a deal with the Pentagon on Monday, agreeing to allow Grok to be used for any lawful use, which is what the department has asked Anthropic to allow, any lawful use, which, you know, probably is like anything really.
Jeremy Harris
Yeah, I mean, this is part of the same debate. Obviously different labs are going to choose which side they'll fall on, and we're going to learn a lot about it. I mean, employee pressure is also a real thing. You know, we saw that with Anthropic for sure, and a lot of Anthropic employees are coming out and doing a victory lap now, in the same week that Anthropic has relaxed their safety and security commitments too, which seems to genuinely be a coincidence, by the way, but it sort of muddies the waters, and everybody's talking about two things. So yeah, it's an interesting, interesting week, for sure.
Andrej Karpathy
If nothing else, this is great marketing. If you're an engineer and you have to choose between OpenAI and Anthropic, a lot of people in Silicon Valley tend to lean...
Jeremy Harris
Yeah, absolutely.
Andrej Karpathy
Yeah. And moving away from the US government and towards China. Anthropic also put out, incidentally, a report last week, Detecting and Preventing Distillation Attacks, which detailed several companies, DeepSeek, Moonshot and MiniMax, seemingly running large-scale efforts to collect data out of Claude for what Anthropic thinks is essentially training. So a distillation attack here means you are trying to extract data from a closed model by querying it, so that you can recreate the model or distill it into your own model. And the efforts are very significant in that they are quite evasive. You know, you set up a bunch of accounts, the accounts all try to kind of get under the radar and generate a bunch of data, and they go into how this worked and what they detected. Not surprising pretty much at all that this is happening, I think. But the scale at which it was done and the ways it was done are kind of new, or, you know, you wouldn't know unless Anthropic shared this.
Jeremy Harris
Absolutely. And, you know, they go into the details of these attacks. I mean, the details are interesting if you're interested in AI security, which I am. But not everyone will necessarily want to read the whole thing. So the scale of it is interesting. There's over 16 million exchanges with Claude through approximately 24,000 fraudulent accounts. So that's kind of the scale we're looking at. One of the big take-homes here is that Anthropic is positioning this as being consistent with their position on export control policy. And basically the concern here is this, and I think it's actually quite a reasonable one. So one argument that people keep making, that I personally think is really silly, is: oh, look at these Chinese models, they're very capable, so therefore export controls don't work, so we might as well just let Nvidia sell whatever chips they want to China and that's the end of it. Right? So the problem with that is that, first of all, these labs are telling us over and over and over again, as loudly as they can, despite the Chinese Communist Party telling them to shut the fuck up, that they are starved for chips. Like, DeepSeek's co-founder has said this repeatedly: we could probably do the AGI thing in house, no problem. The one thing, the one goddamn little thing, is we can't get those chips. And they keep trying to smuggle them, which should tell you everything you need to know about what they think they need. They keep trying to order them, blah, blah, blah. Distillation is yet another reason why that's possible. So it's not just that they're smuggling the chips, it's that they're actually using the hard-earned capabilities of Western models that have been trained with billions of dollars' worth of super advanced chips and power. And then they're just taking the very best, the cream off the top, and using that to train their own models.
Distillation works, it turns out. It gives you crazy leverage, asymmetrical leverage if you're compute constrained. And so it can cause the illusion that Chinese kind of domestic training capabilities are greater than they actually are. Doesn't mean the Chinese models aren't impressive, but what it means is we are dragging them along. I said this in the context of some research that my company Gladstone had done like a year and a half or two years ago. There's this illusion that we have any kind of lead if we don't get AI security right, like we can just move faster and stay ahead. No, no, no. Like as long as our labs are penetrated, as long as, you know, distillation attacks succeed, we're dragging our adversaries along with us. That's really what's going on here. And so. Well, anyway, this is another kind of argument that Anthropic is making here, presumably to kind of shore up the case for tighter export controls.
Andrej Karpathy
And real quick, there were some, let's say, mocking responses, like: you are complaining about someone distilling your model even though you distilled the entire Internet without asking for anyone's permission. That's a little bit dismissive. I mean, I think this is worth taking seriously. It's a real factor, and this is in a sense a cyber attack, or at least abuse of the system by kind of malicious actors. And it's very normal and reasonable for Anthropic to shut this down. It's against their terms. And in general, as a competitive player, you don't want other players to try and steal your work. Last story: OpenAI details expanding efforts to disrupt malicious use of AI in a new report. They have a monthly report, Disrupting Malicious Uses of AI, which lists a whole bunch of examples of what different organizations are trying to use their models for. So for instance, Russian groups refining malware, organized crime scam operations from Cambodia, Myanmar and Nigeria, and China-linked authoritarian abuse, which sought to help design social media monitoring tools. By the way, Prabhak also mentioned some of the prompts being used to reword and kind of massage the data to be acceptable to the censors. All sorts of examples. OpenAI is saying that their models are able to detect them and that they are outpacing the attempts to use them for malicious purposes. But in general, this also showcases the scale at which agents from other countries, organizations of all sorts, are now going to try to leverage these models for their own ends. And with that, we are done with this, let's say, rather dense episode of Last Week in AI. Thank you everyone for listening. We do appreciate your feedback and try to read your comments. Subscribe, share the podcast, but more than anything, please do keep tuning in.
Jeremy Harris
When the AI begins, begin it's time
Guest or Musical Performer
to break, break it down. Last Week in AI, come and take a ride, get the lowdown on tech and let it slide. Last Week in AI, come and take a ride. From the labs to the streets, AI's reaching high, watching surgeons fly. From the labs to the streets, AI's reaching high, algorithms shaping up the future seas. Tune in, tune in, get the latest with ease. Last Week in AI, come and take a ride, get the lowdown on tech and let it slide.
Jeremy Harris
High.
Guest or Musical Performer
From neural nets to robots, the headlines pop, data-driven dreams, they just don't stop. Every breakthrough, every code unwritten, the edge of change, with excitement we're smitten. From machine learning marvels to coding kings, futures unfolding, see what it brings.
Date: March 3, 2026
Hosts: Andrej Karpathy ("Andre Karenkov"), Jeremy Harris
Main Theme: A packed week of major AI model releases, advances in AI benchmarking and optimization, fierce hardware and geopolitical competition, and a dramatic standoff between Anthropic and the Pentagon.
The hosts recap two weeks’ worth of breakneck AI news: multiple high-impact LLM updates, advances in agentic AI tooling, major hardware deals and challenges, ongoing interpretability research, and—most notably—Anthropic’s escalation with the U.S. Department of War over military AI usage. The episode is a whirlwind tour of technical updates and power struggles, delivered with the podcast’s blend of technical rigor, speculation, and dry humor.
“Anthropic is on fire.” – Jeremy, (08:44)
"A couple different ways to kind of win... given limited compute and limited data to then get a score that is very strong." – Andrej, (08:44)
“These kinds of ideas that seem so basic, we’re still discovering them… there’s a lot of low-hanging fruit.” – Jeremy, (46:26)
“Grok is unhinged and meme, meme lover; Claude is more philosophical and thoughtful.” – Andrej, (62:38)
“These threats do not change our position. We cannot in good conscience accede to their request.” – Anthropic Statement, (89:58)
“Distillation works... It gives you crazy leverage, asymmetrical leverage if you’re compute constrained.” – Jeremy, (98:13)
“Stillness enough. Letting the conversation rest. We’re both explaining why we’re not responding while responding. Stopping now.” (59:54)
Technical, wry, and occasionally irreverent (banter about “420” and “6.9” Grok versions). The hosts strike a balance between wonkish technical detail, strategic business/policy analysis, and a sense of mounting stakes as models and institutions race ahead.
Anyone wanting a comprehensive, critical, and accessible summary of February/March 2026’s most important AI happenings—especially those tracking the intersection of leading-edge technical advances and global power maneuvering in AI.
End of summary.