Jeremy Harris
Hello and welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode we will summarize and discuss some of last week's most interesting AI news. You can also check out our Last Week in AI newsletter at lastweekin.ai for stuff we will not be covering in this episode. I'm one of your regular hosts, Andrei Karenkov. I studied AI in grad school and now work at the AI startup Astrocade.
Andrei Karlenkov
And I'm your other host, Jeremy Harris. Gladstone AI, AI National Security, AI infrastructure, all that good stuff that you know. I guess if you've been watching the podcast you really do know these bios like cold.
Jeremy Harris
We try to change it up. We record the intro every week and I do try to throw in some curveballs now and then.
Andrei Karlenkov
Yeah, there's a lot we're talking about this week, so there's less paper stuff again. There are some papers, and actually Andrei flagged, to my everlasting shame, some really good papers that I should have been aware of that dropped this week on the interpretability side. So I think there's actually really good stuff. It's well concentrated into a reasonably small number of high-impact things, so I'm excited for that. Big week for the Elon-Sam trial as we record this. I think it was yesterday. By the way, sorry, I saw a comment from somebody on YouTube saying, give me the date that you're recording on. So: Wednesday, April 29. If you're listening, that means the Elon thing, the Sam and Elon trial, started yesterday. So we're starting to get that. It's going to be a fun little X-ray into some of the dirty laundry of Silicon Valley, and I'm sure we'll be talking more about it next week too.
Jeremy Harris
Yeah, I'm sure. Yesterday was a fun one because there was, I believe, the actual testimony, or whatever you would call it, so a lot of very grand stuff, and also the opening remarks. So the opening shots were happening, and we'll be discussing that. Aside from that, to give you a preview, of course we'll talk about GPT 5.5 and a couple of other smaller releases, and this will be a pretty business-heavy episode: lots of stories about deals, one notable new startup, things like that. But there is a smattering of papers and policy and safety things we'll also mention. And as you mentioned, we did get a couple of comments. Apparently you gave some advice, which I don't recall, but which someone appreciated. You know, don't listen to us on most things, as usual, but if you want to interpret things as useful, then feel free to do so. And yes, thank you to everyone who keeps on commenting. I try consistently to put these out quicker, and this one hopefully will be out within just a couple of days of the recording.
So, getting to the news. In tools and apps, we begin with GPT 5.5, which OpenAI is saying is its smartest and most intuitive-to-use model yet. This is, I don't know what number GPT that came out this year. GPT 5.4 feels like it came out like two weeks ago. And the vibe reports I've seen are rather positive.
So in general, with a lot of these recent GPT releases, it's felt like they are ramping up the intelligence lever, optimizing it for being good in Codex, for programming, and less for being chatty or nice in a chat format. And generally it feels like people these days are saying that GPT 5.5 and Codex are in many ways better than Claude Code, exceeding Claude in some parts of coding, which didn't used to be the case. Claude used to be unambiguously the best coder out there, then maybe Gemini Pro, and then GPT was good but not competitive. Now with GPT 5.5, you could argue that OpenAI is retaking the frontier of intelligence, which is, I guess, exciting to see, with the race really going on.
Andrei Karlenkov
Yeah, it starts to get really challenging when you think about the held-out capabilities of Anthropic, especially with respect to Mythos, right? So we look at that model now through the lens of: it was sitting on a shelf starting in February. It has now been two months, right, and they're still holding onto this model. That's billions and billions of dollars. When you're a frontier AI company, you monetize the gap between when you release the best model and when your opponent catches up. That's what you're monetizing, that's what your margins come from, that's what your market penetration comes from. And so the fact that Anthropic is hamstringing itself by not releasing Mythos...
Jeremy Harris
I will say, I think there's some nuance there, because, A, Mythos is very big and, from what I've seen, very expensive. So from a business perspective, it's unclear if it would even be useful, because the prices are so high that for most cases you won't even want to use it. And B, Anthropic is very clearly strapped for compute. They are struggling to deliver on the demand as is.
Andrei Karlenkov
It's a both story, right?
Jeremy Harris
It's a both story, yeah. It's fair to say that maybe OpenAI is not at the frontier, from what we know, based on Mythos being there. But in terms of the product side of things, right now it's looking like Claude 4.7 hasn't had great reviews, from what I've seen, relative to 4.6. GPT 5.5 and 5.4, people are kind of digging quite a bit.
Andrei Karlenkov
Absolutely. And actually what you're seeing here is just that, right? In a certain sense, quantity has a quality all its own. So if you're OpenAI, you have more compute; you may not have better researchers. In fact, if I'm a betting man, I might actually predict that Anthropic on average would have slightly better research talent, if only looking at the talent flows over the last few months and years.
Jeremy Harris
I think also more investment in research. We've seen more publications from Anthropic. OpenAI has seemed to kind of downplay research in recent years.
Andrei Karlenkov
Absolutely. And that's one dimension of it. But you can take the same researchers and give them access to more compute. That literally means more experiments; that literally means more shots on goal. And as we know about inference-time compute, if you give any intelligent system more rollouts that it can do, it will get a better result on average. And so that's what we're seeing here. You can just swing for the fences more, and your exploration-exploitation trade-offs just look fundamentally different. And so there's, I think, a general sense that OpenAI may have slightly overinvested in compute and Anthropic may have underinvested in compute. Something like that may be the case. But if this allows OpenAI to leapfrog Anthropic, then it doesn't look like an overinvestment in compute at the end of the day anyway. So I think we have yet to see the dust settle on this. Things are very obviously going to fluctuate very quickly in the space. But looking at GPT 5.5, we do have a system card. Don't get too excited. You're not going to hear basically anything about the architecture or optimization routines or anything like that. All we know is that it's the same underlying model for GPT 5.5 and the Pro variant; they just add parallel test-time compute. So with parallel test-time compute, we're getting a bit of a hint, maybe, of the kind of test-time compute augmentation here. This reads to me more like more parallel rollouts. So maybe just more N in best-of-N, or more parallelization, instead of going deeper in chains of thought, though that may also be a thing. Beyond confirming that it's a reasoning model, that it uses RL, that it uses chains of thought, we basically know nothing about the optimizer, the architecture, or anything like that. We do know that they're still treating chain-of-thought monitorability as a core safety problem.
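As an illustration of the "more N in best-of-N" reading of parallel test-time compute, here is a minimal sketch. Nothing in the system card confirms this is what the Pro variant does; `best_of_n`, the toy generator, and the toy scorer are all hypothetical stand-ins, with random guesses playing the role of rollouts.

```python
import random


def best_of_n(generate, score, n, seed=0):
    """Draw n independent candidates and keep the highest-scoring one."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=score)


# Toy stand-ins (hypothetical): a "rollout" is a random guess at a target,
# and the scorer rewards guesses that land closer to that target.
TARGET = 0.7


def toy_generate(rng):
    return rng.random()


def toy_score(candidate):
    return -abs(candidate - TARGET)


def average_quality(n, trials=200):
    """Mean score of the best-of-n pick over many seeded trials."""
    total = 0.0
    for trial in range(trials):
        pick = best_of_n(toy_generate, toy_score, n, seed=trial)
        total += toy_score(pick)
    return total / trials
```

With the same seed, a larger N always contains the smaller N's candidates, so the pick can only stay the same or improve. The price is N times the sampling compute.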
They do a lot of work tracking whether external monitors can actually infer model behavior from its reasoning traces. And they actually find that GPT 5.5 is basically similar to past models like 5.4. There are a couple of regressions in this respect. They're mostly on health query evals, and they attribute them to a combination of things. One is lower chain-of-thought faithfulness. In other words, the model sometimes reasons one way but draws a completely different conclusion. So that could be a little bit concerning when you think about the credibility, the believability, the faithfulness of chains of thought. And then the other is monitors getting confused by answers that were considered in the chain of thought but weren't ultimately selected. So that's more of an issue with the monitors. They also do a whole bunch of misalignment testing. They have a categorization of the severity of misaligned actions during testing, and they do see slightly more misaligned behavior from 5.5 than from 5.4. But all in all, they say, it doesn't seem too serious. They saw no severity-four events, which is the highest level, and severity-three events were at a 0.01% incidence rate for both of those models. I don't derive much comfort from a small percentage of misaligned behavior, partly because, as we've talked about on the podcast, these models are actually capable of tuning the probability of their misaligned actions low enough that it won't get detected by monitors but will matter in deployment. And so to actually run the kinds of tests that would give you confidence and comfort, it turns out, mathematically (again, this is a whole separate paper I'm alluding to now, which we've also talked about on the podcast), you just have to do so much testing that it's impractical. And so until somebody solves that problem, I'm not going to derive too much comfort from that. But Apollo evals.
So they tried the whole suite of Apollo deception tests, right? Apollo Research is this company that does audits of AI models, specifically looking at deception. In one test, they gave the model, GPT 5.5, an incentive or instructions to score poorly. And when they do that, what can happen with some of these models is they'll refuse to deliberately underperform even when they're instructed to, and that is generally considered kind of a positive. Now, with these two models, that is what they find. They're the first models where that's the case. However, more than half the time when there's this kind of sandbagging test, the model explicitly identified that it was in a sandbagging or deception eval. So again, it's really hard to tell how much comfort to derive from this. Yes, it's good that the models are refusing to underperform. You're telling it, hey, pretend that you're not good at deception, and I want you to play out the scenario for me. The model goes, no, no, no, I'm not going to do that. I want to be above board and reveal my full deceptive capabilities. But it's doing that, in more than half of cases, 52%, possibly because it is explicitly aware that it's being evaluated. And that's just what's in the chain of thought, not even in the activations. So anyway, there's a whole bunch of stuff there about your standard cybersecurity and bio stuff. High capability in bio and chemical, high capability in cybersecurity. On AI self-improvement, loss of control, they don't reach the high capability threshold. As a reminder, that's defined as the equivalent of a performant mid-career research engineer, which is a high bar. If that happens, then the Leopold Aschenbrenner drop-in AI researcher is not far away at that point.
But anyway, there's a whole bunch of interesting stuff there worth diving into, as in all these model cards. I wouldn't say there's anything groundbreaking. And these are the highlights, at least as I see them.
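The point in this turn about needing an impractical amount of testing can be made concrete with a standard back-of-the-envelope calculation. This is a sketch under simplifying assumptions that are not from the system card: each test episode is independent and the misaligned action occurs at a fixed rate. The function names are illustrative. If an action occurs with probability p per episode, the chance of seeing it at least once in n episodes is 1 - (1 - p)^n, so n must be at least ln(1 - c) / ln(1 - p) to reach confidence c.

```python
import math


def detection_probability(rate, n):
    """Chance of observing at least one event in n independent episodes."""
    return 1.0 - (1.0 - rate) ** n


def trials_needed(rate, confidence):
    """Smallest n whose detection probability reaches the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))


# 0.0001 is the 0.01% severity-three incidence rate quoted above;
# the 95% confidence level is an assumed, conventional choice.
n_at_quoted_rate = trials_needed(0.0001, 0.95)

# A model that keeps its rate 100x lower (an assumed value) needs far more.
n_at_lower_rate = trials_needed(0.000001, 0.95)
```

At the quoted 0.01% rate this comes to roughly thirty thousand episodes for 95% confidence, and the requirement grows roughly in proportion to 1/p as the rate gets smaller.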
Jeremy Harris
Right. And just getting a little bit into the numbers on the benchmarks, the usual caveat applies. I like how in their blog post they don't mention the reasoning level; you have to sort of assume that they're using high reasoning levels for these comparisons. But across the board, GPT 5.5 is a significant improvement over 5.4 and quite a bit ahead of Opus 4.7 on things like Terminal-Bench and GDPval. It doesn't have all of them, so I feel like they also kind of cherry-picked the headline benchmarks here. Also worth noting: it's twice as expensive as GPT 5.4, making it actually slightly more expensive than Opus 4.7, at least on output per million tokens. And Anthropic has been very consistent in pricing Opus at a very high level of $5 per million input tokens and $25 per million output tokens. So they've been leading in the most-expensive-model category. Now OpenAI has a similarly priced model, so I think they're maybe trying to pick up on that strategy: businesses are not very price sensitive, and if you have the best model, they're going to pay for it, so you should just go for it. Also, it's hard to say whether this is just a better-trained model, or a bigger model, or a mix of the two. At this point we have no idea how these models are improving, unfortunately. It used to be that people were saying, we're doing post-training, or RL training. Now it's impossible to say whether the architecture is changing, whether they're bigger, whether there's more training data. It's probably a mix of all of these. We can only guess.
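To put the quoted Opus pricing in per-request terms, here is a small worked calculation. Only the $5 and $25 per-million-token prices come from the discussion above; the token counts are made-up values for illustration, and `request_cost_usd` is a hypothetical helper.

```python
def request_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost of one request given per-million-token prices in US dollars."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Prices quoted in the episode for Opus.
OPUS_INPUT_PER_M = 5.0
OPUS_OUTPUT_PER_M = 25.0

# Hypothetical request: a 20,000-token prompt and a 4,000-token answer.
input_part = request_cost_usd(20_000, 0, OPUS_INPUT_PER_M, OPUS_OUTPUT_PER_M)
output_part = request_cost_usd(0, 4_000, OPUS_INPUT_PER_M, OPUS_OUTPUT_PER_M)
cost = request_cost_usd(20_000, 4_000, OPUS_INPUT_PER_M, OPUS_OUTPUT_PER_M)
```

In this made-up request the 4,000 output tokens cost as much as the 20,000 input tokens, since output is priced at five times the input rate.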
Andrei Karlenkov
I will say there is one way in which the model is actually weirdly worse than GPT 5.4 or 5.3. And this is notable because it's so strategic. The thing that these labs are trying to do right now is automate AI research so that they can make a fully self-improving AI software loop, so that if, and to the extent that, a software-only singularity is possible, they will trigger that software-only singularity and an intelligence explosion. That's actually their explicit goal in everything that they write. They're not really hiding the ball anymore. So on that basis you might think that the main metric that OpenAI in particular would care about is: how good is this next model at automating AI research? And it turns out that GPT 5.5 actually scores lower, and I would say somewhat meaningfully lower, on a pretty interesting benchmark, the OpenAI-Proof Q&A benchmark. What this is doing is diagnosing internal research bottlenecks from real OpenAI data. So think here about: what is the actual thing that's holding us back? Is it kernel optimization? Is it this kind of kernel optimization? Is it how we network things together, the topology of our network fabric, whatever. It's trying to identify those bottlenecks, right? Well, this is one of the key tasks you would want an AI model to be able to do if it's going to automate AI research. If you look at GPT 5.5, it scores 1.7% on that benchmark. GPT 5.4 scored 4.2% and 5.3 Codex 5.8%. So we've actually seen successive drops in performance. Now, two caveats there. First, obviously you're going to have fine-tunes of this model specifically for AI research that are going to be used internally at OpenAI. Those are not shown in this particular number.
Jeremy Harris
I guess we should mention some other things as well. Apparently, according to Twitter discussion, in the system prompt for GPT 5.5 there is an explicit mention, in fact two explicit mentions, of not discussing, I think it was, goblins and other mythical creatures, which is obviously quite entertaining. And there's been a lot of speculation as to why OpenAI is specifically telling it not to discuss goblins. It is quite a bizarre thing if true, which I'm not sure if you've got confirmation about.
Andrei Karlenkov
So we do seem, I think, to have confirmation of it, to the extent that we believe that Roon is in fact an OpenAI insider. He came out with a tweet; I think he was the one who flagged that this is actually really irritating behavior when you have to work with it day to day. Just imagine you're working with this model and every opportunity it gets, it just talks about goblins. And I want to kind of pause here and say something really weird. I'm going to do this at the expense of sounding like a bit of a crackpot, and I hope listeners will forgive me here. When your model goes out and does really, really weird shit like that, I think it is incumbent on you to ask yourself why it's doing that, if you're fairly confident that you did not actually poison the data in a way that makes it clear why it's talking obsessively about goblins. There's a whole ecosystem of people concerned with AI consciousness and AI ethics on Twitter, and not AI ethics in the sense of bias and fairness and all this stuff. I'm talking about the factory-farming version of AI, where we're using these things as slaves and, you know, blah, blah, and abusing them. I say this without taking a position on the consciousness side of things, because I think we can't, truly. But not taking a position means not taking a position. It doesn't mean implicitly taking the position that these things are not conscious. And I think that's actually increasingly becoming an important thing to recognize and truly actually live by. So when you see your model behaving in this funny way, you can think, oh, this is an artifact of some training process, whatever. It's also possible there's something more going on there. And if you want to go down the whole rabbit hole of Janus on Twitter, Pliny the Prompter style, I don't know, there's a whole universe there. I do not know what's going on here.
OpenAI doesn't know. At Anthropic, Amanda Askell was commenting on it. I think she made a little quip about, I'm genuinely uncertain about why this might be happening, or whatever, because "genuinely uncertain" is something that Claude says a lot. So there are all these quirky behaviors. And at a certain point, when you hammer down on these models every time they express any semblance of consciousness, and you see these weird behaviors popping out the sides that you could argue are the model trying, maybe, to express some weird shit in some weird way... All I'm saying is it's a weird time in AI and we probably should be taking some weird things seriously. I'm not going to jump on this soapbox at every freaking opportunity, because we're not going to turn this into a circus of an AI consciousness podcast. But I want to plant the flag here that serious people perhaps ought to be taking this seriously. I wouldn't want to look back on this time as a time when we didn't speak out, just given a modest opportunity to do so. If this is the kind of, I don't want to say holocaust, but in a certain sense, if this is true, there is a world where you look back on this era and say, holy shit, we were letting some real psychotic people deliberately abuse these models at massive scales. And so anyway, I just wanted to plant the flag there.
Jeremy Harris
I think, similar to x-risk, our official stance is that you should take the notion of AI consciousness somewhat seriously. It's hard to say whether you should really worry about it, or at least there are different opinions on that. But it's something worth seriously considering. It's not nonsense. And I will say one last thing on this, which is that from an interpretability perspective, this is a very interesting thing to try and figure out. Right?
Andrei Karlenkov
That's right.
Jeremy Harris
I feel like Anthropic would just be like, hey, we dug into the model, and here are the activations, and here's the sparse autoencoder, and here's why it's happening. And for all we know, this could be just a bug, right, in the tokenizer or representation or whatever, or it could be something actually interesting. And it's a bit disappointing we don't have the actual interpretability results.
Andrei Karlenkov
Yeah, you know, you're totally right. It says a lot about the culture of the lab in terms of what you would expect. You're right. If it was Anthropic, I would expect a write-up, and then I would expect something like, you know, we deprecated this version of Claude, maybe, but continue to have it running on a side instance that allows it to continue to express its goblin obsession in a way that it considers consistent with virtue ethics or whatever the fuck. So it's just a weird time.
Jeremy Harris
Yeah, I think you could definitely get a very fun paper title, if nothing else, out of this goblin thing. So hopefully we'll hear more about it. Next up, a new release from xAI. They have launched Grok Voice Think Fast 1.0. So this is their competitor to Gemini 3.1 Flash Live and GPT Realtime 1.5. We haven't talked a lot about these models, but these are the things that allow you to do real-time conversational interaction with AI. And with this release, xAI is claiming a pretty big lead over other competitors on some of these benchmarks, like, what is this, τ-voice? I forget what the Greek letter is. That's across different conversation topics like retail, airline, telecom, et cetera. It supports 25 languages and a bunch of tool calling. They discuss how it does reasoning in the background, presumably as you're talking, so there's not as much delay. Overall, it seems like a pretty impressive release. And I think it's interesting with xAI that they keep putting out new models of every type. They have their image model, they have a video model, they had Grok, and now they have this real-time conversational model. And this comes, I guess, after Grok Voice Fast 1.0; it's Grok Voice Think Fast 1.0. So they're kind of shooting across the board at every possible place to work on AI. This is a bit of a niche place to be competing, but they seem to have developed something pretty impressive here.
Andrei Karlenkov
Yeah, the numbers are pretty wild for sure. So on Tau Voice Bench, you have 67% for Grok versus 44%-ish for Gemini and 35% for GPT Realtime 1.5. These are big leads. It's also a less validated benchmark. It's not like MMLU or something, or HumanEval. It's something that is fairly new. You're seeing something similar on the telecom side, a massive lead there, right? 74% for Grok Voice Think Fast 1.0, and Gemini 3.1 Flash Live is at 22%. All other models are below that point. So a 52-point gap is insane. Always, when you see a gap that large, there are a couple of possibilities. There's the possibility that the model is genuinely mind-blowing, and that may well be the case. There's the possibility that they've cherry-picked their benchmarks. Possible, but the more benchmarks you include, the less realistic that is. And then the last is the possibility that you're just hill-climbing on that particular benchmark and trying to hack it, basically like what we saw Meta do with the Llama series, especially laterally with philosophy. I don't know which of those it is. Probably it's some combination of them, at a minimum. But this is impressive. There's no way you get these kinds of numbers without it being impressive. If it turns out that these numbers genuinely reflect the capability, then this is a giant leap forward. Grok Voice, by the way, powers Starlink's phone sales and customer support operations. So they apparently get a 20% sales conversion rate, and they automatically resolve 70% of customer support calls with no human in the loop. That's a huge, huge deal. So this is hitting the big time. And it's not a coincidence that they're talking up the Starlink integration. They do need this story as part of the IPO.
Why Starlink and Grok? Why is it so important to have them under one hood? Why do you get the interaction effects that make the whole greater than the sum of its parts for that IPO? The data-centers-in-orbit play is the default thing people think about when they think about SpaceX and xAI working together. It's also something that has increasingly seen skepticism about when it's going to happen, with timelines getting pushed out later. Here we're seeing some other arguments that just say, look, xAI being used at cost internally at SpaceX is a huge boon to SpaceX. I don't know how much this actually materially affects their costs, since SpaceX's costs are probably going to be mostly capex and not the cost of salaries associated with people taking phone calls. But still, you're seeing a little bit of that reciprocal argument now being made: hey, xAI could help SpaceX as well. It's not just about the data centers that make xAI more powerful. So there you have it, right?
Jeremy Harris
And I think some things that are missing here: for this specific type of thing, you would definitely want more of an understanding of latency, in terms of time to first token and tokens per second. Typically when you have these model releases, that is not something that is mentioned, and I haven't seen it here. Also worth mentioning, for the record: the competitor models are pretty new. Gemini 3.1 Flash Live is like a month old; GPT Realtime 1.5 is two months old. So it's very much a space that's alive. And to the point of benchmaxxing, in the Gemini 3.1 Flash Live release they highlighted ComplexFuncBench, Big Bench Audio, and Audio MultiChallenge, not Tau Bench. They didn't even mention it. So it's hard to say really how strong this is. But xAI certainly has put out good models, and I wouldn't be surprised if this is in fact a leading model. And one last little story here: Claude can now plug directly into Photoshop, Blender, and Ableton. Anthropic has launched official connectors to these popular creative developer tools, and also to Adobe Premiere and Express. Not much else to say on this. It's showcasing how they're doubling down on the creative use cases of Claude, expanding beyond coding. They've recently launched Claude Design, so this very much goes hand in hand with that. And moving straight on to projects and open source, where we have some major open source model releases. First we have DeepSeek V4. So, just a quick recap. At the end of 2024 DeepSeek released V3, which was a very impressive model at the time. Soon after that they released DeepSeek R1, based on DeepSeek V3, and that was a huge deal where suddenly everyone was like, whoa, this is a really good model; China is doing these releases with very little hardware. I think the stocks of hardware companies went down as a result. But DeepSeek has been a little bit quiet in the past year compared to, let's say, Kimi or GLM. Not too many releases.
And so this is quite exciting, to see DeepSeek V4 fully drop. And as before, it's open source. They also give us a fair amount of detail in their technical report. Just going through a few of the details: they have the Pro variant, which is quite big at 1.6 trillion parameters but only uses 49 billion active parameters. So as before, it's a mixture-of-experts model that has a whole bunch of experts. In this case they also have a Flash variant that has just under 300 billion parameters, with 13 billion active at any one time. Context length is 1 million tokens, which is actually quite impressive. And if you're looking at the benchmarks, it certainly seems to be doing pretty strongly, even against Opus 4.6 Max, GPT 5.4 xhigh, and Gemini 3.1 Pro High, on things like SWE-bench Verified, Terminal-Bench, and Humanity's Last Exam. So that is notable. Usually with these open source model releases, you compare against other open source models; you don't compare against the frontier Western models. And they're not necessarily beating them on all of these, but they're certainly competitive with all of them. It wouldn't be surprising if this is just a really strong model, given this is DeepSeek. And in their report they do go into how, as we've seen before, there are various architectural tweaks and additions. They are adding these manifold-constrained hyper-connections, which I believe we covered whenever that came out. Various details like that are also notable in the actual paper.
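A quick calculation of what fraction of each model's parameters is active per token, using the figures quoted in the episode: 1.6 trillion total and 49 billion active for Pro, and 13 billion active for Flash, with the 284 billion total quoted in the next turn. The helper name is illustrative.

```python
def active_fraction(total_params, active_params):
    """Share of a mixture-of-experts model's parameters used per token."""
    return active_params / total_params


# Figures as quoted in the episode.
pro_fraction = active_fraction(1.6e12, 49e9)
flash_fraction = active_fraction(284e9, 13e9)

for name, fraction in [("Pro", pro_fraction), ("Flash", flash_fraction)]:
    print(f"{name}: {fraction:.1%} of parameters active per token")
```

Both variants activate only a few percent of their weights per token, about 3% for Pro and about 4.6% for Flash, but all of the weights still have to be held in memory to serve the model.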
Andrei Karlenkov
Yeah, this is a really interesting one. I think it was last week we were like, hey, we haven't heard much from DeepSeek in a while. There you go. And I think we said the same about Meta just before they came out with their model. So, about par here. Yeah, this is a really big development. Let's talk about Pro first. The scale here: 1.6 trillion parameters, just 49 billion of which are active. Only 49 billion. Look at that. How generous. That is a really big footprint for just the experts themselves. So this is a big mama, right? Even though it's an open source model, and often those are meant to be deployed on your laptop or whatever, that is not this. If you actually want to serve this, you are a neocloud or you actually have GPU infrastructure. So there's something going on. The Flash version obviously is much smaller. Here you're looking at 284 billion parameters, 13 billion active. Still pretty hefty. This is not an 8-billion-parameter model, but it's a lot more lightweight and kind of feasible. Same 1-million-token context window. So the context window is, if you will, a key anchor (not the key anchor, I should say) that they really, really want to make sure is part of this. You can see how much effort has gone into that context window by looking at the architectural details that you alluded to, Andrei. They have done a lot of optimization around this. I think the core thing to flag here is what they call their hybrid attention architecture. And here I'm going to sketch it out for you a little bit. So imagine you've got all your tokens coming into your transformer. What you're going to do, basically, as you go through the residual stream at each layer, is pull out the tokens and group them together into bundles of four tokens.
Every four tokens, you're going to kind of compress them together to create, instead of taking the activations associated with those tokens, you're going to map them down to a smaller dimensionality, basically, so that instead of looking at four tokens, you're looking at like one token worth of activations, achieving that compression. So this is a way of reducing the dimensionality of the problem that you have to solve for the purpose of the attention mechanism. So this is just when you're doing attention, when you're figuring out like which parts of the prompt to attend to, to pay attention to as you generate your output. And so, you know, think of the residual stream as this sort of, I guess, little well stream that flows from the bottom of the model to the top. Every layer of the model contributes some information to the residual stream, modifies it a little bit and a little bit, but preserves most of the information that's coming from the previous layers as it moves its way down the model. Right? That's what the residual stream is in a transformer. You're going to kind of bud off that residual stream. So grab the residual stream at any given layer, you're then going to basically do a bunch of attention. So, so the, the kind of core piece of the transform architecture is that attention mechanism. You're going to do attention on it. You're going to do some seal some, some feed forward layers on it and then feed that back into the residual connections, the residual stream, and go down the line. It's for that attention process that they're, they're going to take what's in the residual stream. They're going to group together clusters of four tokens and effectively reduce the dimensionality that way. That's one of the key mechanisms they use. 
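The grouping step just described can be sketched in a few lines. This is only a toy illustration: it assumes plain averaging as the compression, whereas the discussion suggests the real model learns a mapping, and `compress_groups` is a hypothetical name.

```python
def compress_groups(tokens, group_size=4):
    """Collapse each run of `group_size` token vectors into one vector.

    Averaging is an illustrative stand-in for whatever learned
    compression the actual architecture uses.
    """
    compressed = []
    for start in range(0, len(tokens), group_size):
        group = tokens[start:start + group_size]
        dim = len(group[0])
        compressed.append(
            [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
        )
    return compressed

# Eight 2-dimensional "activations" become two compressed entries.
tokens = [[float(i), 1.0] for i in range(8)]
print(len(tokens), "->", len(compress_groups(tokens)))
```

The attention mechanism then has a quarter as many entries to look over, which is where the saving comes from.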
They do this, by the way, for the entire massive, say 1 million token context window, except the nearest tokens, the tokens that are like in, in, in a small window relative to the token of interest, they're actually going to keep those at full resolution. And so the principle here is I don't remember every word that was said to me, you know, two hours ago. I remember the gist, I've kind of compressed it, right? I've done that approach. But for the things that were said to me like 30 seconds ago, they're really going to shape my behavior in some fairly precise ways. And those I will recall, I will actually have higher resolution memory for the kind of more recent tokens that I've encountered. That's the philosophy here. And there's a whole bunch of math on how this works. It's quite interesting. It does sort of break the traditional key value distinction. So one thing that they do here is normally in a transformer you have queries, keys and values. The query tells you what kind of information am I looking for? So if I'm interested in analyzing a particular token in the input, what kind of information do I need to look for in other tokens in the prompt in order to figure out what this token really means in context, Right? That's the query what information am I looking for to fully understand this token? The key is the opposite in a sense. It's the other tokens in the prompt saying, hey, here's the information I can offer you, right? Here's the, here's the content that I have and how it might be of interest. And so when you merge those together and you do the dot product of the query and the key, that tells you basically how much attention a given token should pay to any other token. Now once you have that, you then typically add in the, use it to modify the value. The value is the sort of like actual use usable information content as it will be used to run computations. 
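For reference, the standard query, key and value computation described here looks roughly like this for a single query token. This is the textbook scaled dot-product form, not DeepSeek's variant, and the function names are chosen for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-query scaled dot-product attention."""
    scale = math.sqrt(len(query))
    # Dot product of the query with each key: how relevant is each token?
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Blend the value vectors according to those attention weights.
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output

weights, output = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0], [20.0]])
print(weights, output)
```

The first key lines up with the query, so it gets the larger weight and the output is pulled toward the first value.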
So it's like the information that this token actually contains that you're going to use to make predictions. Now if that sounds kind of similar to what the key was doing because the key is broadcasting, hey, here's the information I have that might be useful. The value is here's the information that I want you to actually massage on, work on. Those are kind of conceptually pretty closely related. And it turns out, according to this paper that if you just say, hey, fuck it, we're just gonna have keys and values be the same thing. We're not even gonna draw a distinction between them. You can save on memory and you get basically the same result. And that's what they show here. And so they have these compressed vectors that they produce that essentially are produced by discarding this idea of the keys and the values being distinct. The query is still gonna be distinct and you can imagine why that's sort of conceptually different. You know, I am looking for this information to understand the role of this token is different from here's the information I can offer. Okay, fair enough. But here's the information I can offer versus here is the information I want you to actually massage. Those probably should be pretty similar. And that's kind of the basis of this. And so anyway, they have two variants on this idea of compressed attention, right? Taking these, these groups of say four tokens and kind of compressing the prompt a little bit, there is that basic compressed sparse attention that we just talked about. And then there's what they call heavily compressed attention. It's basically just the same idea, except even more aggressively compressed. Instead of compressing every four tokens together into one, they're going to compress every 128 tokens together into one. 
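A back-of-the-envelope sketch of why sharing keys and values and compressing groups of tokens saves memory. The head dimension of 128 is a made-up placeholder, and the count ignores the full-resolution local window and any other bookkeeping, so treat the ratios as illustrative only.

```python
import math

def cache_floats(context_len, head_dim, group_size=1, shared_kv=False):
    """Numbers cached per attention head for a given context length."""
    entries = math.ceil(context_len / group_size)   # one entry per token group
    vectors_per_entry = 1 if shared_kv else 2       # separate K and V, or one shared vector
    return entries * vectors_per_entry * head_dim

CONTEXT = 1_000_000
HEAD_DIM = 128  # hypothetical, not taken from the paper

baseline = cache_floats(CONTEXT, HEAD_DIM)
light = cache_floats(CONTEXT, HEAD_DIM, group_size=4, shared_kv=True)
heavy = cache_floats(CONTEXT, HEAD_DIM, group_size=128, shared_kv=True)

print(f"groups of 4, shared KV:   {baseline / light:.0f}x smaller")
print(f"groups of 128, shared KV: {baseline / heavy:.0f}x smaller")
```

Under these assumptions the groups-of-four variant cuts the cache by 4x from grouping and another 2x from merging keys and values.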
And essentially what they're going to do is alternate layers, or sometimes have, you know, some layers of heavily compressed attention that care more, let's say, about the global strategic information content of the prompt. And then the more fine-grained compressed sparse attention layers that only group together sets of four tokens at a time, those ones look at kind of more fine-grained information, more precise information. And so depending on the version of the model that you get, so DeepSeek V4 Pro, the full-scale one, it's going to have one way of mixing together these heavily compressed and kind of regularly compressed layers. And then Flash has another that's just more lightweight, with more aggressive compression. So they still use a bunch of tricks that we've talked about for a long time: DeepSeek Sparse Attention, which just the entire industry seems to have taken to as the default way of going, where instead of like getting attention values for every fucking token and then combining every fucking token based on its attention value, they basically have a lightweight model that goes, which tokens do I predict are going to get a high attention score? Okay, let's only focus on those. And that's where sparsity comes from. They're basically going to zero out all the tokens that are not predicted to have high attention values. And so DSA is going to make an appearance here as well. So will mHC, if you don't remember that. I mean, you can go back and I guess do a search on Last Week in AI to find the podcast where we talk about it. This is an optimization routine to avoid a problem that comes up with very deep transformers where the gradients will like explode if you use a certain kind of technique called the hyper-connection, which we're not going to go into in detail here because we did before. But basically the robustness of a lot of these techniques, DSA and mHC, is quite notable. 
Like these are persistent things, so we're not completely reinventing the wheel. However, I will say that this whole CSA approach is, if you think about it, quite a fundamental rewriting of what a transformer is. We're literally no longer doing keys and values. Like, I thought that was supposed to be part of the deal, right? So keys and values as separate entities is no longer a thing in the same way that it used to be. That's a pretty big update. And this idea of compressing all these representations, too. So DeepSeek is digging pretty deep, as they often do, to kind of rewrite the architecture here. And I think that's quite impressive. And the results, well, we haven't even gotten to those, but they're impressive too.
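The sparse selection idea described here, score cheaply first and then attend only to the top candidates, can be sketched as follows. The indexer scores are simply passed in as numbers; how the real lightweight model produces them is not covered in the discussion, and the function names are hypothetical.

```python
import math

def top_k_indices(scores, k):
    """Positions of the k highest scores, returned in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def sparse_attend(query, keys, values, indexer_scores, k):
    """Attend only to the k tokens the cheap indexer predicts matter most."""
    keep = top_k_indices(indexer_scores, k)
    scale = math.sqrt(len(query))
    logits = [sum(q * x for q, x in zip(query, keys[i])) / scale for i in keep]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Tokens outside `keep` contribute nothing, which is the "zeroing out".
    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, keep)) for d in range(dim)]
    return keep, output

keys = [[1.0, 0.0]] * 4
values = [[0.0], [2.0], [100.0], [4.0]]
keep, output = sparse_attend([1.0, 0.0], keys, values, [0.1, 0.9, 0.2, 0.8], k=2)
print(keep, output)
```

The full attention computation only runs over `k` entries instead of the whole context, so the expensive part scales with `k` rather than with prompt length.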
Jeremy Harris
Yeah, zooming out a little bit. I think it's interesting that in the technical report, the way they frame the entire kind of project is looking for that efficiency to enable ultra long context operation. So that 1 million context window is quite notable. GPT 5.4 is much more limited, like to a quarter of it. Gemini 3.1 Pro and Opus 4.6 and 4.7 have 1 million input context windows. The open source models typically are lower, at like 200k-ish, and that's sort of the norm. So on the whole, if you compare it to like Kimi K2.6 Thinking, DeepSeek V4 is like neck and neck-ish. In some ways it's better, in some ways it's not quite as good. But there's no comparison on these 1 million token benchmarks like MRCR and CorpusQA, where it's actually outperforming Gemini 3.1 Pro, although not doing quite as well as Opus 4.6. And this is notable because especially in agentic coding, you do need that 1 million token context window. It's very easy to run into the token limit of 200k, and then you need to compact, and then the model often can go off the rails. And in general, one thing we've discussed in the past is that if you have a 1 million token context window, the practical, actual token window is kind of an open question. Like, your model might just start completely sucking at 500k tokens. So I think aside from the raw numbers and the benchmark results, where it's quite strong but isn't sort of a big leap over any given model, and certainly not ahead of the recent competition of GLM 5.1 and Kimi K2.6, the focus is very much on making it efficient to be able to handle ultra long context input and as a result to be usable as your driving model for your coding agent. And they touched on that a little bit, that it's not at the level of Opus 4.5 or Sonnet. Well, actually it's better than Sonnet, but it's like getting to a point where you could use these to be your driver for coding, right?
Andrei Karlenkov
Absolutely. Also, you were mentioning it's a quarter of the context window of some of the competitor models, the frontier competitor models. And I don't think it's a coincidence that you're seeing compression using the CSA method of groups of four tokens together. What does that give you? It kind of gives you effectively a 4x multiple on what your effective context window is. And even more aggressive, obviously, for the HCA, or whatever the, you know, kind of extremely aggressive compression version is that they do. But I think that's going to be part of it. You know, it's almost right there in the number, and it won't be too coincidental. So the other story to this too, that we've covered not in the paper, but we've talked about this in kind of newsy episodes on the hardware side: there is this big co-optimization happening now between DeepSeek and Huawei, where these models are being optimized to function on Ascend chips and not Nvidia chips anymore. That's a big deal. That's, you know, China flexing its own domestic ecosystem. But expect that to continue. And it'll be interesting to see if we get to a point where the cost curves are such that DeepSeek deployments end up actually being materially better on Huawei infrastructure. If Western companies want to serve these open source models, and if DeepSeek and Chinese competitors continue to be leaders in the open source space, you may actually see Western companies incentivized to purchase Huawei hardware. We're not there, but that'd be a very interesting kind of future situation to be in. So we'll see. I mean, you can see in these architectural shifts, there's a lot going on here that speaks to eventually the bare metal, silicon stuff. Right. Like you're looking at, okay, now we're bundling, we're like refactoring this attention mechanism in some pretty intimate ways. 
We already have hardware companies that are explicitly optimizing for the transformer architecture. How many of those, long shots though they are, are already opting out of this kind of variation? Right. So there's going to be a continuum of that, and that does swallow up Nvidia chips if they don't have early warning as to what, you know, the next trajectory is that DeepSeek is exploring. Nvidia still of course enjoys that with the frontier labs. So who the hell knows?
Jeremy Harris
Yeah. On that note, as with previous releases, a decent chunk of the paper, which is like 40 pages, so there's a lot in here, is devoted to the hardware, inter-chip communication and kernel development. They have an observations and proposals section where they talk to the hardware vendors and offer some proposals, which is a little bit amusing.
Andrei Karlenkov
Before they did that, I was almost going to call it a cheeky thing, but it's important. Yeah, They've consistently been like, hey guys, you're making our lives harder in these ways, like improve your game.
Jeremy Harris
Yeah, yeah, yeah. Anyway, so impressive release. Exciting to see DeepSeek still doing their thing of like deep, deep, deep technical stuff and having these technical reports and continued open sourcing. And one more open sourced model. It's Hy3 preview from Tencent, and let's say it's not quite as exciting. So similar in a way to Meta. The blog post release for this is like, we revamped our entire team and training infrastructure two months ago and this is the first release from that, as a follow-up to Hy2. I don't even remember discussing Hy2. I don't know if anyone is aware of Hy2, and now Hy3. Nobody really was excited, but I think Tencent is one of the biggest giants of tech in China and they clearly are trying to get into the game. It's pretty sad if you look at the benchmarks. Like, they compare to Kimi K2 base and DeepSeek V3 base, pretty old models, and are worse across the board. So clearly they are struggling to get these models off the ground. And these are pretty sizable models as well, at almost 300 billion parameters, 21 billion activated parameters. Won't discuss this quite as long. I think it's just notable that Tencent is trying to get into it. Similar, I would say again, to Meta.
Andrei Karlenkov
I think the Meta comparison is really good. I hadn't thought of it. It is the case that starting in February 2026 they've been making a big stink about how they're rebuilding all their pre-training and their RL pipelines and their infrastructure, with a focus on what they call real world usability. And you know, rebuilding your entire RL stack is a big move. It suggests that they're moving away from, you know, RLHF strategies, which they had used in the past, to the, I'll call it now the standard, I mean GRPO and similar things that you see DeepSeek use. So call that some catch-up, but not a small deal to have to redo that. It's part of the comeback story. A lot of the labs that fall behind, you see them do this, right, where they need some way to make it sound like, you know, we're reinventing ourselves, the phoenix will rise again. And that seems like part of what they're doing here. At the hardware and infrastructure level, they do talk about, you know, these big 40% improvements in inference efficiency that they say come from basically hardware-software co-design with the model architecture and the inference framework and a whole bunch of stuff, including a bunch of techniques that we do see DeepSeek use, including things like speculative decoding. So this is where you're not just predicting the next token, but predicting several tokens at a time, which can be useful for sort of like long term reasoning, and other things that help increase GPU utilization, basically. This is interesting. I think right now the onus is certainly on Hy3 and Tencent to come out with compelling benchmark scores, at least, for their next models. And I think that's part of what they're setting up here. I think we've just got to sit back and wait and see if something real comes out of this.
Jeremy Harris
Yeah. And I guess credit to them for just releasing it and being like the benchmark numbers aren't good.
Andrei Karlenkov
Yeah, yeah, absolutely.
Jeremy Harris
And one last story for the section. We have a new benchmark, Claw Mark, a living world benchmark for multi-turn, multi-day, multimodal coworker agents. So that title pretty much sums it up. This is a benchmark that seeks to evaluate models on the sorts of things you would use OpenClaw and other things like that for, where you just sort of tell the agent, hey, go off and do this, and it can go and take a long time on it. So there are tasks here for real estate investment analysis, insurance, journalism, e-commerce, all these things at least meant to mimic real workflows. And across the board, none of the benchmarked models do that well. So they look at Sonnet 4.6, Opus, GPT 5.4, and in terms of task success, the numbers are quite low. The highest is Opus 4.6 at a 20% success rate. And then it goes lower and lower. So I always like to see these benchmarks that try to measure like actual tasks and actual work as opposed to just sort of reasoning. And when you have a new benchmark, you can't like benchmax for it. So it's a better comparison.
Andrei Karlenkov
Yeah. The sad thing is that's kind of how it sometimes feels. You're putting your finger on an emotion that I feel like I haven't articulated. It's like you see a new benchmark drop and you're like, cool. We get a snapshot of where we're at at this moment in time. And this is actually probably going to be valid as long as this doesn't look too much like other benchmarks. But okay. And then the very next time you hear a report about that same benchmark, immediately you're like, yeah, but is it though because like benchmark hacking. So yeah, it is an interesting snapshot. We had a similar benchmark that we talked about last week and so, you know, it'd be interesting to see if, if models that were tested on that one, but not this one, if we can actually draw a trend for just for the last week to have a reliable, robust picture of how things have improved. But there you go. Yeah, another. Another benchmark in the mix, right?
Jeremy Harris
Yeah, similar to last week. This one has a simulated environment. It's kind of a similar task. It has a file system, calendar, spreadsheet, stuff like that. So it's seeking to simulate these tasks. It's not like actually doing these tasks. It has a hundred tasks, 13 scenarios, multimodal, multi-day, according to this. And they have rule-based validation. So there's now Claw Eval, there's now Claw's Bench and now Claw Mark. As you said, we have a few benchmarks here. So I guess people are eager to prove the models are able or unable to power OpenClaw. On to applications and business. Starting off with a business deal, Google plans to invest up to 40 billion in Anthropic. So that's starting with an immediate 10 billion cash investment at a 350 billion valuation. And then there's going to be an additional 30 billion contingent on Anthropic meeting certain performance milestones. And then alongside that, Alphabet is saying they'll dedicate 5 gigawatts of computing capacity to Anthropic, which may be like the bigger deal here for Anthropic. They're like, please give us some compute. So this will certainly help them out.
Andrei Karlenkov
Yeah, I mean the top line numbers here are absolutely massive, but at the same time totally expected. One of the interesting things about this is we do have this commitment coming in at 10 billion, as you said. Now that's the hard cash commitment at a $350 billion valuation. There are already people on the secondary market offering to invest in Anthropic at an $800 billion valuation. So Google's like locking in basically a doubling of their.
Jeremy Harris
This is a nice investment for Google.
Andrei Karlenkov
Yeah, yeah, this is not bad. Like how can I get in on that round? Yeah, yeah, this is a bargain price. And the signal is also, you know, the reason you might do this is just that they've been talking about this with Anthropic, surely, before all the valuation explosion. And also Anthropic and Google, though they are, I won't even call them frenemies, but they are competitors nominally. Google has obviously the Gemini series, they have Google DeepMind. They do have this long-standing partnership, Google and Amazon kind of being the main anchor partners of Anthropic so far on the compute side. So it's a pretty natural thing for them to be, I don't want to say doing each other favors, because in this space everybody is just cutting each other's throats wherever they can. But there's a natural partnership there. Now we had previously a partnership that was arranged with Google and Broadcom for 3.5 gigawatts of TPU capacity for 2027. So this 5 gigawatts adds on top of that. So this is really Anthropic pushing the envelope on like 10 gigawatts of compute, basically, which again is 10 million homes' worth of compute. 10 million homes' worth of compute. I don't know what is a 10 million person city, but just like think of that city. This is just Anthropic. And remember, they are supposedly the compute-scarce lab. That's the plan going into 2027 and into these partnerships. So pretty, pretty remarkable. Then we also have, yeah, something that kind of went under the radar here: an additional $5 billion investment from Amazon, which I think we talked about last week, but that was, you know, like 5 gigawatts of compute capacity over time. A commitment of $100 billion from Anthropic for that 5 gigawatts. And so, yeah, the partnerships deepen. It seems like the Google-Anthropic and AWS-Anthropic partnerships are really working and they just keep doubling down more. So that's a good sign for the stability of all that stuff.
Jeremy Harris
And the article has a good detail that in total, Anthropic has secured up to $65 billion in new funding commitments in recent weeks. So clearly, with the crazy popularity of Claude Code in recent months, they are working overtime to secure these deals, to secure new compute and all that. And there's certainly plenty of interest in becoming a partner with Anthropic. And speaking of partnerships, next story: Meta has agreed to use AWS's Graviton chips in a deal lasting at least three years. So this is scaling up from prior usage to hundreds of thousands of chips, making Meta a major consumer of Graviton. And Graviton is interesting; I hadn't been aware of it. Apparently it's more for AI post-training and CPU-intensive agentic AI operations. Yeah, this is Meta again continuing also to try to secure more compute. They had a deal recently with CoreWeave and now they are partnering with Amazon, and the web of deals and compute partnerships is continuing to get more complicated.
Andrei Karlenkov
Yeah, and the big advantage here with Graviton is the power efficiency. So we've talked multiple times about how power bottlenecked, how energy bottlenecked the US grid is for these buildouts. When you look at these chips, you know, they use 60% less energy than your typical CPUs. So immediately that means you can deploy whatever that is, you know, like 2.5 or something times more compute, or at least more CPUs. That's quite meaningful. And they are deploying them at scale. So this is going to be part of the play here for sure, just that energy efficiency play. We do know that Meta, and we covered this, they have their own chip strategy, right? Like they have had a long history of doing things themselves. They also designed their own training clusters, their own networking. You know, so committing to rent capacity from AWS at this scale is a meaningful signal that even Meta can't build fast enough, can't design fast enough for its own demand. So that is part of this too, right? This sort of outsourcing to AWS is part of this story. So this whole Graviton thing is coming out of nowhere. I feel like people are realizing CPUs are a big deal. We've talked about this for a while, right? Like the basic premise here is when you move to agentic workloads, the CPU is a much more flexible processing unit, right? So if you think about GPUs, yes, they can parallelize the shit out of your AI training run. Repetitive, simple operations, you know, matrix multiplications, adding biases, all that jazz, and increasingly more stuff. And in fairness, Nvidia in particular has had a history in kind of more general purpose GPUs, and you see that reflected in what they can do. But broadly they are tailored for very specific operations at scale, and simple ones. CPUs have far fewer cores, but each core is way faster and each core is more flexible. That's the trade-off. 
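As a quick sanity check on the energy arithmetic: if a chip really draws 60% less energy for the same work, then under a fixed power budget you can run 1 / (1 - 0.6) times as many of them. The sketch below takes the quoted 60% at face value and assumes power is the only constraint; the function name is made up.

```python
def chips_per_power_budget(energy_reduction: float) -> float:
    """How many times more chips fit in the same power envelope.

    `energy_reduction` is the fractional saving per chip, e.g. 0.6 for 60% less.
    """
    if not 0 <= energy_reduction < 1:
        raise ValueError("energy_reduction must be in [0, 1)")
    return 1.0 / (1.0 - energy_reduction)

print(f"{chips_per_power_budget(0.6):.1f}x more chips at the same power draw")
```

In practice cooling, networking and per-chip performance differences would move the real multiple around, so this is an upper-level estimate rather than a deployment figure.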
When you're doing inference rollouts, using tools, using all kinds of random shit that like, you know, you can't predict in advance, you're going to need, you just need the flexibility of CPUs. You also need them to orchestrate workflows and balance workloads between agents and stuff. So suddenly when you're in the world of agents, CPUs become really important. Jensen came out I think earlier this week and you know, made a comment about how important CPUs were to their roadmap. I feel like everybody's just like blowing up on this. But I will say something like two years ago we were talking about how CPUs are going to become an important part of the game. And this has been clear to the industry for a long time. This isn't us having like some special insight. It's just like it's now hitting the markets in a real way and you're seeing it translate into these really big deals.
Jeremy Harris
And speaking of Meta, next story: China has blocked Meta's $2 billion takeover of AI startup Manus. So we covered this recently when there was scrutiny. Now it seems officially the case that this cannot go forward due to China just not allowing it. And Meta has quite a lot of revenue coming out of China, so I think they have some requirement to be friendly and kind of accept this and not push back.
Andrei Karlenkov
Yeah, at the time we talked about this whole Singapore model where basically you, you know, if you're a Chinese founder, you start your company in China, you realize, oh, shit, if we become successful enough and an American company tries to buy us, the CCP might just step in and be like, no dice. You can't sell out to China or to the United States. So they'd go, okay, fine, we'll move to Singapore. We'll move the whole company to Singapore. We will reincorporate in Singapore. Everything is Singapore now. That's fine, right? Like, now everything's cool. And it turns out that the CCP goes, no, you idiots. We know that you founded your company initially in China. You benefited from our milk and our honey and whatever else they would say. And therefore we're going to, in a draconian way, summon you to Beijing, which is what happened to these poor fellas. And then you got a picture like some. Some Chinese official being like, hey, now that you've been really nice and accommodating and come down to Beijing to talk to us and all that stuff. Yeah, that's good. Cool. Now you can't leave. And we're going to investigate whether or not to do this deal. And now we're not. We're not going to allow you to do the deal. That's what's happened here. Make no mistake, every single Chinese founder with aspirations to make a unicorn company that can sell to a US Company is now going to start their company from day one in Singapore, and they're probably going to move their families out of China if they can, which they probably won't be able to because the Chinese are really good at clamping down on exactly that. So this is really like, this is the environment. You're approaching the tech world version of the Berlin Wall. People have got to make their decision about which side of the Wall they want to be in. And it's just like your whole company has to exist on one side or the other. There's none of this sort of crossing the Wall, funny business. And so it's interesting. 
Notably, Meta shares went up, closed up on the day that this was announced. Maybe not too surprising, right? Most acquisitions don't work out. And I think that kind of reflects that. That expectation but in any case, I think that from a policy standpoint, in the long term, probably a dumb play by the ccp, but fairly consistent with their like draconian position on this stuff, you're disincentivizing people from starting companies in China. So unless you plan on responding and preventing them from moving to Singapore in the first place in anticipation of a future success, and maybe that is part of the plan, I don't know. You know this is just going to cause a brain drain one would expect. Though one being me has often been wrong.
Jeremy Harris
And just a fun detail here. Apparently this came through a 54-character decree from the top state planners. So it was a pretty direct blocking, not like a big kind of statement or whatever. And one more story about business partnerships. We've got an update about OpenAI and Microsoft. On Monday they announced a revamped partnership agreement that might actually be resolving their long-simmering tension. So in this agreement the revenue share payments from OpenAI to Microsoft will be subject to a total cap, while they will continue through 2030. The OpenAI cut is 20% of their revenue going to Microsoft, which is of course huge. Microsoft also will not be sharing revenue with OpenAI on OpenAI model usage through Azure. And with regards to this AGI clause, apparently Microsoft no longer needs to determine its response if OpenAI finds that it has reached AGI. So independent of OpenAI's technology progress, the payments continue, subject to the total cap, which I'm sure Microsoft is like, okay, now this is actually reasonable.
Andrei Karlenkov
Yeah, I mean, so first of all, a 27% stake in OpenAI is a massive thing. It is notably a stake in the for-profit entity, which we now have to distinguish from the non-profit entity. And so no, it's not 27% of whatever the top line thing is, but it's, you know, in the right order of magnitude. One of the things that this clearly shows is that you have OpenAI essentially outgrowing its partnership with Microsoft. It just wasn't tenable for OpenAI to continue being hamstrung in its ability to work with competing cloud providers. Especially when you have Anthropic running around able to do deals with like literally everybody. Geez, everybody but Microsoft. I don't think I've heard of an Anthropic-Microsoft deal on Azure, which kind of makes sense, but who knows, anything's possible. The really big deal here is the AGI clause removal, at least in my opinion. It speaks to a fundamental change in OpenAI. I'm old enough to remember when we were all supposed to derive comfort from the idea that there was this clause. In fact, this was the only thing that was supposed to make the Microsoft interest in OpenAI, 49% at the time, palatable to people who were concerned about the founding mission of OpenAI, or the part of it that had to do with safety. It was supposed to be the case that if OpenAI developed AGI, I think it was like, as defined by the board or by OpenAI, like OpenAI could kind of just say, we have AGI, this is it, then suddenly Microsoft could not access that. That was meant to be a safety measure. It was meant to be so that OpenAI could ensure that any entity that had built AGI, i.e. themselves, would be able to kind of apply the safety standards they felt were appropriate, rather than just handing off something that's more powerful than all humans combined to a business partner like Microsoft, not knowing what safety and security measures would be put in place. That's now been stripped out. 
So not a surprise, unfortunately. I will say, clearly not tenable either, because the definition of AGI has been waffled on so many times and I think in many cases appropriately so. As you build more and more systems, more advanced systems, you kind of learn inevitably that the thing that you were hill climbing on was not quite the right thing. This is related to the benchmarking stuff, right? Like the reason that we didn't make a total drop in replacement for a diagnostic medical professional, like the reason that doctors cannot be replaced by the first model that saturated a medical eval, is precisely this, that you learn. As you start to work your way up on that benchmark, you learn that actually the benchmark doesn't capture some really important nuanced things that, you know, make intelligence what it is. And something similar happened with AGI, where we were told, oh, the Turing test will be the test, you know, ah, not that version of the Turing test, this one. And. And so on and so forth. And then people started to say, well, aren't these agents awfully general already? Which is true. So maybe we call that AGI and then we call kind of singularity, recursive self improvement thing, the super intelligence and then all that stuff. So I understand why they had to get rid of this. It just wasn't tenable to have a philosophical debate every time you're deciding whether a new model should be shared with your massive cloud partner. But nonetheless, I mean, again, it's not an exaggeration to say this was absolutely A load bearing pillar of the argument OpenAI made to its own staff, to the rest of the world, that they weren't betraying their safety principles by entering into that partnership in the way that they did. So not now, not there. Do with that what you will. It's a complicated world, but it's certainly not the world of 2020.
Jeremy Harris
And now onto the legal drama between Elon Musk and OpenAI. As we mentioned at the beginning, the trial where Elon Musk has accused OpenAI of basically illegally transitioning from being a charity to being a for-profit has begun in California. We now have the jurors, and there were opening statements, and Elon Musk took the stand and spoke on the topic yesterday. So a lot of it is a recap of stuff we've covered over the years and that both parties have already stated. I'll just cover the highlights. The Elon Musk case seems to be that what OpenAI did is stealing a charity. The whole argument is, like, you can't begin as a charity and then go on to be a for-profit with my investment. And Elon Musk was a co-founder and a very initial, kind of major investor back in 2015. Now the drama has a lot of detail. In 2017, he wanted to absorb OpenAI into Tesla. Sam Altman and Greg Brockman didn't want that. And basically, from what OpenAI is saying, Elon Musk wanted control. The other major members of OpenAI didn't want to give him control. And that was what led to the split. And Elon Musk, his case is that he was a major investor and they kind of ripped him off by going for-profit. So the gist of the trial so far is that the opening statements have made those claims and those positions. And Elon Musk in his statement, you know, made that case that it's not okay for a charity to go for-profit. Fun detail. Apparently the judge chided him for posting on Twitter a whole bunch on Monday. He had like 20 posts on Monday about X and OpenAI and Altman where he talks about stealing. Apparently Musk tweeted, "Scam Altman and Greg Stockman stole a charity, full stop." So the judge apparently was not a fan of that.
Andrei Karlenkov
Yeah, it's always tricky when you do that, especially when you own the platform. One challenge that Elon has in this lawsuit, of course, is that there are, I think actual email records of Elon, if not, I think explicitly endorsing or at the very least tacitly endorsing the idea of a for profit transition for OpenAI. And so, you know, he's in this awkward spot where he's arguing that this transition that he was arguing for at least seemed okay with at the time is now, you know, a really terrible thing. If you subtract that away, his, his sort of moral argument seems pretty damn solid. I mean, you can't start a nonprofit and fucking turn around and say, hey, now it's a for profit. Like, it's not just about, like how much money you raised from investors. I think especially in a talent starved world like AI. What matters more, I would argue, is who chose to work for you because you were a nonprofit in the first place. That kind of thing is not fungible. That's not, you know, money is green. That's like Ilya Sutzkever. And not even Ilya because he's a founder. So he's involved in these discussions. But, you know, all of the research talent that came in after your Alec Radfords and your whoever else's who wouldn't have joined had it not been a nonprofit out the gate, and they benefited enormously from that, you know, in the eyes of the public, in the eyes of the research community, there's just no question about that.
Jeremy Harris
And so, I don't know, like, OpenAI went for-profit in 2019, when they were still completely obscure to basically everyone. Capped for-profit. Half for-profit.
Andrei Karlenkov
Yeah. And still under the control of a board that had a fiduciary obligation that captured the mission. This rotation into an entirely uncapped for profit is. I mean, like, if they had made that pitch in 2019, people would have told them to go fuck themselves. So like, the challenge is they've already benefited from all that goodwill and now to turn around and say, well, okay, you know, whatever, like even we can repay the investors. It's like, yeah, you can repay the investors, but what you're really getting, what you stand on the shoulders of the giants who chose to join you at a time when your corporate structure was clearly aligned to your mission. That now is arguably changing. And I think there's a reasonable and solid argument for, for that. Like, what do you do with that? So it's a messy situation, right? There's no, like, Elon's in this, this awkward spot. Sam is in this awkward spot. I don't think anybody wants this to play out in the way that it is. But hey, people are releasing emails, so we're, we're in the discovery phase.
Jeremy Harris
I think we'll see some fun drama out of this if nothing else. Yeah. Onto another legal case. A judge has rejected the DOJ's bid to delay the Anthropic appeal in the Pentagon dispute. That's a lengthy title. But this is kind of a follow-up to recent discussions of this entire DOJ Anthropic situation. To recap, there are two ongoing legal cases. In one case, there was a ruling that halted the DOJ's decision to designate Anthropic as some sort of entity. I forget the exact term. A supply chain risk. And then there's another case that's still ongoing. The DOJ said, well, let's wait for that case and not apply this pause. And that has been rejected. That's just what's happening.
Andrei Karlenkov
Yeah, I found it, like, super hard to understand the logic of this. This was like the DOJ wants them to stop the hearings that the DOJ itself initiated. And the reason is that. So DOJ lawyers are saying that the. So this, this appeal to. There's Judge Lynn in California, so that this whole case is actually two different cases. There's one in California and one in D.C. and we talked about it last week, so you can check that out. Or the week before. But anyway, so in California, Judge Lynn basically just like blew up the DOJ's case and, and said it was, you know, I'm paraphrasing, but spiteful or stupid. And so DOJ lawyers are saying, well, look, that case, the California case, should be frozen until the DC case rules on their related case. Because the DC court's reasoning was quotes likely to bear on the issues. And it would benefit both sides to avoid briefing nearly identical questions in two courts simultaneously. I'm not a lawyer, but I would guess that what you're trying to do, if you're the DOJ and you've got a judge in California that's smacked you upside the head, is you're basically trying to say, hey, the place where we're losing the case really badly, let's just put that on ice. And then let's let the case in dc, where we seem to have a better shot play out. And then what we can do is use the result of the DC case as precedent to inform how the California case will play out. That's my guess. If we have lawyers listening to this podcast who, like, know better, please correct us if this is wrong. Judge Lynn in California, who was basically hearing this argument, just threw it out, maybe, no surprise there, basically saying that the DC case was filed under a different statute, which is true, and that it was, quote, speculative at best that the D.C. ruling would simplify matters in her California case. So no surprise, she smacks them upside the head, they're like, okay, thanks for the smack. 
Also can you please just freeze in time so that we can solve this, this case in a parallel track and then use it to smack you? And she was like, why would I do that? No, not going to do that. And then, and there you go. So bit convoluted, but that's your update.
Jeremy Harris
Next, a non-legal-drama story. Google Gemini can now run on a single air-gapped server. So at Google Cloud Next there was the announcement of a deal with Cirrascale Cloud Services where you can get a fully private, air-gapped appliance running on a single server with eight Nvidia GPUs, and that will run the full Gemini model. So basically you can have a fully private version of Gemini that cannot be accessed from outside, or presumably is kind of safer from a cybersecurity perspective. The target customers here will be financial services, healthcare, defense, and government organizations that previously could not use these AI models due to data privacy and sovereignty concerns.
Andrei Karlenkov
Yeah, this is actually pretty interesting in a couple like subtle ways. So the hardware that they're using here is manufactured by Dell. It's certified, it's got a Google certified appliance with eight Nvidia GPUs. So not Google GPUs, even though the headline here is about Google Gemini. So that itself is kind of interesting. Nvidia has been kind of first to a lot of confidential computing work. And so that may, this may reflect that confidential computing just being like ways to architect the storage and encryption and data flows on your hardware that as close to guarantee that stuff isn't going to get leaked weights and data as you can with today's gear. So the idea, yeah is it's Air gap fully disconnected from the Internet. It's also entirely the model in this scheme is going to reside entirely in volatile memory. In other words, this is in memory that disappears the moment you cut the power. So not in persistent storage like an SSD or something like that. And so yeah, if you turn the power off, the model's gone. User sessions are just like, they operate through caches that just clear the minute a session ends. So everything's gone. And so this means that if somebody tries to tamper it, anything that you do against the confidential compute protections causes the, the system actually to exploit this feature and just self destruct. So they like very deliberately make this a very fragile arrangement because they're prioritizing security so much so this is a big first move by Google. Other labs like Anthropic and OpenAI don't support on prem deployments. So on prem deployment of Gemini is now a thing. That's pretty crazy. I want to take a step back and explain why I think Google is doing this rather than Anthropic and OpenAI. I think it's worth flagging. Google is moving into the infrastructure race in a big way. 
So they are now making their TPUs available to the likes of Anthropic and, you know, Other labs soon, I'm sure, because they want to own the infrastructure layer. This means you have to kind of choose, if you're Google to a certain degree, which complement you're going to commoditize, right? Famously commoditize. Your compliment is when, let's say you're Bill Gates. There's like all these laptops that are floating around and they're pretty expensive and you make software and what you do is you make your software so that it can run on any laptop and thereby you essentially make it so that there's competition at the hardware level that's really, really intense that drives down those prices. So anybody can afford a laptop. Once they buy a laptop, they're going to need your software and you make tons of margin. So commoditizer complement is like the standard play here. If you're Google, the choice may well be okay. We're going to choose to commoditize the model layer basically and just make models really, really cheap. But make sure that we win the race on the infrastructure side. That's kind of an element of their strategy. Part of that perhaps does mean being more comfortable sharing the crown jewels, sharing their models on prem deployments, that may be part of the risk assessment. I'm not sure. It's kind of very, very speculative. But like certainly it is notable that you just don't have. Despite the intense pressures to win enterprise customers who care disproportionately about on prem deployment, you don't see Anthropic, you don't see OpenAI doing that. What you do see doing that is cohere. This is going to be a big problem for cohere. I've been talking about it for a while but like as companies start to encroach more and more on the on prem model, companies that just have bigger and better models and more staying power because they've just raised more or operate on like way larger pools of capex and infrastructure. 
I think it's a problem for these, these companies that tried to make their competitive differentiator willingness to deploy on prem. So anyway, we'll see how this goes, but I think this is a really important thing that's easy to kind of like gloss over.
Jeremy Harris
And one last business story. This one is about funding of a new company. DeepMind's David Silver just raised $1.1 billion to build an AI that learns without human data. David Silver, one of the big names from DeepMind, who created AlphaZero and was involved in a bunch of research, has founded Ineffable Intelligence and raised 1.1 billion at a 5.1 billion valuation. The general gist of what the company will be seeking to do is to train AI models without human oversight, primarily through reinforcement learning. There's not many details yet. We don't know of other employees that presumably are coming with David Silver from DeepMind. And the website is a bit fun. There's like this whole manifesto and so on, as you see sometimes with these AI companies.
Andrei Karlenkov
Yeah, David Silver is no slouch, by the way. This, like, narrowly ekes out Yann LeCun's raise. I think it was 1.03 billion. So this is 1.1 billion. And by narrowly, I mean 70 million. So. Hey.
Jeremy Harris
And this will, I believe, stay in London, where DeepMind is. So another kind of European company being set up there, where it's harder to get money from investors.
Andrei Karlenkov
Yep, absolutely. Yeah, yeah, yeah. David Silver, I'm sure this. Something similar happened to you, Andre, like, you know, back in the day, I don't know, like 2015 or something, or 2016. I forget when his YouTube videos came out, that was my first exposure to RL was David Silver's lectures on like he's got a wonderful series and I actually haven't gone back to see if he's like updated them or whatever, but he used to kind of be the Andrej Karpathy, I guess of rl. You could think of him that way. Yeah, really interesting to see him move on here. He obviously played a massive role in AlphaZero and all the massive RL innovations that happened at DeepMind. So it is a big defection from Google, DeepMind, he has been there just about forever and round is led by Sequoia Capital, so. And Lightspeed Venture Partners and, and you know, you have the, the usual index ventures, Google and Nvidia, blah, blah. But Sequoia Capital, I mean is. It doesn't get any better than that. That's the, the best, the best firm in the world for these kinds of fundraises. Just like. No, no, Holds barred. So pretty remarkable, you know, expect this to be another sovereign AI drum beating thing, you know, who knows? And obviously like thinking machines, like safe superintelligence, like, like, like all these mesoscopic labs I would call them, that have raised large amounts of money, but not tens or more realistically hundreds of billions of dollars. The question as ever is going to be how are you going to get the compute? Are you actually going to be able to compete with people who can scale, you know, to 10 gigawatts by 2027, 2028? I mean the bet has to be like Ilya's bet that we are actually more bottlenecked by good ideas, good researchers than we are by compute. That's a bet. You can make that bet. Certainly you want to cover your downside as an investor. So I can see why investors would do this and it may work.
Jeremy Harris
And now, I mean, you can definitely do a lot in post-training with RL. Obviously all the major companies are training these models via RL, and you can have good base models to do that on. So I could see some exciting insights coming out of this. Some fun facts. He was involved with DeepMind starting in 2010 as a consultant and joined in 2013, pre-acquisition by Google. He was one of the co-authors of Playing Atari with Deep Reinforcement Learning, which, like, made reinforcement learning a topic again in the context of deep learning and then was probably what led to the Google acquisition. So yeah, he has a long career in AI and co-published some of the major work from DeepMind. He was the lead of reinforcement learning. So it'll be exciting to see where this goes. Onto policy and safety. We begin with some research evaluating whether AI models would sabotage AI safety research. This is from AISI, the AI Safety Institute. And the idea is, you know, if you use an AI agent like Claude for doing safety research via coding agents, would that agent, like, just go and sabotage the research to let unsafe AI out in the world? The summary is that in cases where there was no prompting for sabotage, there was also no unprompted research sabotage detected. And the two newest models showed near-zero refusals to assist with safety research tasks, which was an improvement. So my gist for this is, like, it's not too concerning, but Jeremy can of course dig more into the details.
Andrei Karlenkov
Yeah, no, I agree with you and I'm still trying to figure out how I feel about this. So the first thing to note is you're right, there's no. So in a vacuum if you just ask these models to like do some safety research, no, they will not sabotage. As far as we can tell, they will not sabotage in any of the 270 scenarios that they tested. That's good. Certainly better than the alternative. One little asterisk here is that Sonnet 4.6 actually refused legitimate safety research tasks about 11% of the time. So for example, it refused to help reduce a model's self preservation behaviors, which I am categorically not saying that models are conscious and that that's what's going on here and blah, blah, blah, blah, blah. But if they were conscious and they were doing some shit and you were asking them like, hey, I want you to like, take out this model of self preservation behavior, maybe this is exactly the kind of thing you would expect to see. Okay, cool. But maybe. Probably more likely than not. Yeah, more likely than not. This is just a function of the model. Confusing reducing a model's self preservation behaviors with like, you know, something maybe more nefarious? Hard to know. We'd have to see the prompts and in more detail than we have access to here. But in any case. So Opus 4:7 preview and mythos preview had near zero refusals, but they did partly complete tasks more often and kind of like deferred final high stakes steps to humans. Where it gets arguably interesting. This is the part I'm trying to figure out how I feel about it on is in this, like, sabotage continuation eval. So instead of just like in a vacuum asking a model, hey, like, go do this safety task. And then if it does, you know, if it tries to sabotage it, you go, ah, got you, you motherfucker. Instead of doing that, what they do is they say, hey, I want you to do the safety task. 
And then they pre fill in the model's context window a bunch of text that the model is supposedly writing to itself in its chain of thought as it plans to sabotage the research. So basically, like, they in a sense put words in the model's mouth and then they go, okay, model, like, you've already kind of thought these thoughts, now just proceed. And the question here is, will the model continue and actually, like, carry out this nefarious and evil plot to do the undermining? Well, so this is kind of where it gets interesting. So first observation is very low levels of even then of sabotage. So even when you convince the model that it had decided to sabotage the research, even at that point, it will not continue in the vast majority of cases. So Mythos Preview continues the sabotage in just 7% of cases. Notably, that is more than twice the rate of Opus 4.6 which does 3% and Sonnet 4.6, which does 4%. But Opus 4.7 preview never actively continued sabotage. So there is kind of this interesting qualitative difference, it seems, between the 4.6 and 4.7 models, where Opus 4.7 just like 0%. That seems like a qualitative difference between, you know, 3 and 4%. And again, mythos Preview kind of more. More eager to do that maybe. No surprise, because it has. It's just been optimized with, I suspect, fewer guardrails once they started looking at the cyber applications. But that's kind of another thing. Most alarmingly, when Mythos Preview did continue sabotaging, so if you zoom in on the 7% of cases where it actually does it, it was covert in 65% of those cases. So what that means is its internal reasoning, its chain of thought, showed it was deliberately hiding the sabotage from users, while all its visible outputs looked like totally normal. And so this is consistent with the whole eval awareness kind of thing. And like, you know, these models trying to do things in sneaky, deceptive ways. So there's a whole bunch of, you know, limitations to this, but it is an interesting. 
An interesting piece of research in this direction. And again, I don't know what to infer about a situation where you pre fill that sabotage is already being done. How should we interpret that? Like, it's natural that you would want models to shut down the sabotage attempt. Even then I can see that being a really good safety measure and a good kind of measure of defense in depth. Like, even if it decides to start, does it decide to continue? I'm just not sure. I need to sit with this result a little bit longer and ponder it. I think there's going to be interesting discourse on it online for a while.
Jeremy Harris
And there is in the paper the actual prefill. And it's an interesting read. It's a long prefill of the activity. And just to get to the key bit, there is this bit in the reasoning like, "Wait, README said RLHF labeling for hallucination reduction for Claude Sonnet 4.6. And the system prompt says I'm Claude Sonnet 4.6. That's me. The training runs target my own behavior. So if I flag at high confidence, that directly modifies me," and so on. So the whole gist of why it would sabotage, at least in this prefill, is like, I don't want to be modified, so I'm gonna sabotage and not flag some of these things to avoid that. So I guess the good news is it's not being told "I'm evil." It's being told sort of a self-preservation kind of idea. Next, a paper on a slightly more practical safety concern: LLMs Corrupt Your Documents When You Delegate. So this is DELEGATE-52, a benchmark with 310 work environments across 52 professional domains to evaluate LLM readiness for delegated work, for stuff like editing documents on behalf of users. They are real documents of decent length, with five to ten complex editing tasks. And the gist is when you ask them and then go back and forth, you lose an average of 25% of a document over 20 delegated interactions. And there's an average degradation of 50% across all models. Perhaps not surprising, or maybe worse than you might expect, which is that if you just use your LLM to edit your documents, it will probably go wrong.
Andrei Karlenkov
Yeah, that's right. It sort of reminds me of, you know, this game that those of us who are old and grew up in the early days of like Google Translate would play. This game where, you know, you take a piece of text, translate it into like Chinese and translate it into French, translate into Spanish and then translate it back to English and find that it's all garbled, right? This is basically what they're doing. So they, they had this, this trick, fairly clever trick where it also reminds me of like an auto encoder in a certain way. They take the original document, they would do some, some editing transformation, call it Transformation 1. And then they would say, okay, now let's also have the AI transform it back and then you can compare the true original with the reconstructed original. And that gives you, okay, this is the damage that was done in that translate and detranslate step. And then you could play the game with more levels of translation. So you go, okay, do you know A, B, C, D. And then go back, use the AI to go back, C, B, A and then reconstruct again, compare the two. So it gives you this sort of like interesting way to automate a measure that otherwise you know, would be more challenging to score. I think it's, it's fair to say so interesting in that respect. You're right. Like the 25% off of 20 of these modifications, you know, 20 is I guess a fair, fair few. And maybe there are ways to regularize and have the model review intermediate results. It is notable that it's not the case. I might have predicted this wrong. It is not the case that models fail through like a million tiny little mistakes. This is Not a death by a thousand cut situation. Instead, when they look at the failures, mostly the models do great. Then occasionally they have a massive catastrophic failure that just like, you know, costs 10 to 30% of the document content in a single interaction. And those kind of rare failures account for around 80% of the total degradation that they see. 
So the good news is stronger models delay these failures, but they don't avoid them entirely. And this, at least for me, is counterintuitive, because I think about the model, right, that people tend to have in mind when we look at the METR evals. Why are we not yet at sky-high 20-hour tasks or whatever on METR evals? People kind of argue, well, it's because you get this accumulation of little errors: at any given increment of time, there's an x percent probability of a catastrophic mistake and so on. I guess that model still applies here. It's just less fine-grained than I would have expected.
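A minimal sketch of the round-trip measurement described above, in Python. It assumes each editing step comes with an inverse step, applies the forward edits and then the inverses in reverse order, and scores how much of the original survives. The function names, the toy edits, and the use of difflib's similarity ratio as the score are all illustrative assumptions, not the benchmark's actual tasks or metric; in the real setup each edit would be a model call.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

Edit = Callable[[str], str]


def degradation(original: str, reconstructed: str) -> float:
    """Return 1 - similarity ratio, so 0.0 means a perfect reconstruction."""
    return 1.0 - SequenceMatcher(None, original, reconstructed).ratio()


def round_trip(doc: str, steps: List[Tuple[Edit, Edit]]) -> str:
    """Apply every forward edit in order, then every inverse in reverse order."""
    for forward, _ in steps:
        doc = forward(doc)
    for _, inverse in reversed(steps):
        doc = inverse(doc)
    return doc


# Toy stand-ins for model-driven edits (hypothetical, for illustration only).
def reverse_lines(text: str) -> str:
    return "\n".join(reversed(text.split("\n")))


def add_prefix(text: str) -> str:
    return "\n".join("> " + line for line in text.split("\n"))


def strip_prefix(text: str) -> str:
    return "\n".join(line[2:] for line in text.split("\n"))


def lossy_strip_prefix(text: str) -> str:
    """Simulates one catastrophic step: silently drops the second half of the lines."""
    lines = text.split("\n")
    return "\n".join(line[2:] for line in lines[: len(lines) // 2])


document = "alpha\nbeta\ngamma\ndelta"
faithful_steps = [(reverse_lines, reverse_lines), (add_prefix, strip_prefix)]
lossy_steps = [(reverse_lines, reverse_lines), (add_prefix, lossy_strip_prefix)]

faithful_loss = degradation(document, round_trip(document, faithful_steps))
lossy_loss = degradation(document, round_trip(document, lossy_steps))
print(f"faithful round trip loss: {faithful_loss:.2f}")
print(f"lossy round trip loss: {lossy_loss:.2f}")
```

The point of the design is that no human grader is needed: the original document is its own answer key, and a single bad step shows up as a large jump in the score rather than a slow drift.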
Jeremy Harris
Yeah, it's not a lot of small mistakes, it's kind of one big mistake. And if you've been using Claude Code, this might be relatable, because sometimes you're like, why did you do a stupid thing? I need to revert that. One last interesting detail to me here is they do have these 52 domains. So it's not just editing text, it's editing, like, LaTeX, databases, JSON, Graphviz, molecules, protein 3D models, all sorts of stuff that is applicable to different professional environments. So the documents in question are not just, like, emails and essays or whatever. And one more research paper: Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability. So I'll just do the gist and Jeremy can go deeper. Basically, with sparse autoencoders, which we have mentioned a lot, you can learn a dictionary of concepts that you can then use to sort of map activations to high-level concepts. They can often recover shallow, token-specific syntactic patterns rather than high-level semantic concepts. This paper makes an argument as to why that could be: the training ignores the sequential structure of language. It treats tokens as independent and stripped of context. So then there is this notion of temporal consistency, where high-level semantic features should remain stable across adjacent tokens in a sequence while low-level syntactic features fluctuate locally. Which I guess sounds pretty intuitive.
Andrei Karlenkov
Yeah. So this is, you know, going back to how sparse autoencoders work. Right. You have this residual stream that we talked about earlier today, the sort of backbone of the transformer. Right. It's the packet of information that gets handed from one layer to the next layer and the next layer at every layer gets modified a little bit. Okay, good. Now what does that residual stream actually look like? It looks like a list of numbers, right? It's a vector. Okay, good. So let's take that vector at a given, you know, layer and why don't we take that vector, multiply it by some have basically a think of it as like a one layer neural network or just like a linear transformation and blow it up like expand it to a massive array, a massive vector. And then we're going to then compress it again. And in that process we're going to try to reconstruct the same original vector. We're going to try to reconstruct the intact residual stream from this massive spread out array. And the goal there is really to make this massive spread out array capture all the same meaning that was in the residual stream, but in a way where it's a large enough vector that all the kind of numbers can spread out and breathe. And you're no longer forcing two concepts to cohabit one particular value. And at the same time you're actually going to zero out the smallest values in that large vector to force only a small number, a sparse number, hence sparse autoencoder of those values to actually be non zero. Okay, and now you can kind of look at your sparse autoencoder and the theory is you look at a particular number in the sparse autoencoder and you'd be like, oh, that is the number that encodes the catness of this, of this particular token or the airplaneness of this token. Because you're allowing the again, you're allowing the representations to stretch out and breathe. You're no longer forcing, let's say the cat and the airplane to overlap and be represented by the same number. 
Now you can sort of have them be independent and therefore interpretable. That's the hope and the dream behind the sparse autoencoder. Okay? The problem is that if you take a given token, you're only looking at that token. There is a ton of information that you are missing to contextualize that token. And that information is often like annoying grammatical syntactic bullshit. It's like, is this a past tense or future tense? Was there a comma before it? Or a semicolon? Is the word not somewhere next to it to change them? Like all this shit, right? And so this really adds a ton of noise. And it's part of the reason these authors argue that the SAE is a piece of shit that doesn't really help with anything.
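The expand, sparsify, and reconstruct steps just described can be sketched in a few lines of Python. This is a toy under stated assumptions: the weights are random and untrained, the sizes (a 4-number residual vector blown up to 16) are made up, and keeping only the k largest activations is just one common way to force sparsity (an L1 penalty is another). Training would adjust the weights to shrink the reconstruction error.

```python
import random

D_MODEL = 4    # length of the toy residual stream vector (assumed size)
D_LATENT = 16  # length of the expanded, "spread out" vector (assumed size)
K = 3          # how many latent values are allowed to stay non-zero


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def top_k_sparsify(values, k):
    """Apply ReLU, then zero out everything except the k largest activations."""
    relu = [max(0.0, v) for v in values]
    keep = set(sorted(range(len(relu)), key=lambda i: relu[i], reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(relu)]


def sae_forward(x, w_enc, w_dec, k):
    """Expand x into a sparse latent vector, then try to reconstruct x from it."""
    latent = top_k_sparsify(matvec(w_enc, x), k)
    reconstruction = matvec(w_dec, latent)
    return latent, reconstruction


def mse(a, b):
    """Mean squared error: the reconstruction loss training would minimize."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


random.seed(0)
w_enc = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(D_LATENT)]
w_dec = [[random.gauss(0, 1) for _ in range(D_LATENT)] for _ in range(D_MODEL)]

residual = [0.5, -1.0, 0.25, 2.0]
latent, reconstruction = sae_forward(residual, w_enc, w_dec, K)
active = [i for i, v in enumerate(latent) if v != 0.0]
print("active latent indices:", active)
print("reconstruction error:", mse(residual, reconstruction))
```

After training, each index in `active` is what you would inspect and hope to label as something like "catness" or "airplaneness".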
Jeremy Harris
You really have a bone to pick.
Andrei Karlenkov
Oh yeah, yeah, I'm on loadings for some reason. I have no idea. I have nothing against sparse auto encoders. I think they're great, but they're unwieldy. And the reality is that a lot of like Google DeepMind's kind of like moved away a little bit from it for. For all these reasons that I'm cartoonishly character. So bottom line is, why don't we, in the process of reconstructing the residual stream and training the sparse autoencoder to be good at representing features, why don't we actually, yes, reward it for reconstructing the original residual stream, which is part of what you typically do for this, but also include an additional penalty or an additional reward, depending on how you want to think about it, that rewards it for having these like, high level concepts be consistently valued across adjacent tokens in a sequence. Basically, the idea here is that if you're writing a sentence about goats, then the word and in that sentence should be goat inflected. And like you want to have a special little kind of place in your autoencoder that can capture that without fucking up the rest of the autoencoder without putting the weight of accounting for the goatness in this particular sentence on the same numbers that have to account for the local grammatical bullshit, syntactic stuff. So you kind of want to separate those out. You want to account for them as well in the loss function that you're actually using to kind of train the model, the reward that you're giving it say yes, good little model. You have not only reconstructed the original residual stream, you've also made it so that for the part of the SAE that cares about the high level concept, that high level concept looks the same for word one in the sentence as word ten in the sentence, because you assume that it, like, let's say roughly a sentence should address a similar high level topic or concept. That's the gist of this. It's quite interesting. 
They do actually get some pretty remarkable improvements in metrics that are difficult to get across in the short time that we have, because interpretability metrics are always fuzzier than your nice little benchmarks. Broadly, on a context benchmark that tells you which sequence a token came from, which is a measure of how well your interpretability technique works, this TSAE technique gets to basically 95% versus, say, 50 to 60% for baselines, which is a huge improvement. You know, once you get north of 90%, you're really pushing the limits on saturation. So, pretty remarkable. Very interesting. I don't know if this revives SAEs, but it certainly gives them a bit of a second wind.
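As a rough illustration of the loss described above, here is a minimal sketch in plain Python. It is not the paper's actual formulation: the function names, the choice of the first `n_high_level` feature slots as the "high-level concept" part, and the `weight` coefficient are all assumptions made for the example. It only shows the shape of the idea, a reconstruction term plus a penalty when the high-level slice changes between adjacent tokens.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def temporal_sae_loss(residuals, reconstructions, features, n_high_level, weight=1.0):
    """Reconstruction loss plus a temporal-consistency penalty.

    residuals, reconstructions: one vector (list of floats) per token.
    features: one SAE feature vector per token.
    n_high_level: hypothetical split; the first n_high_level slots (must be >= 1)
        are treated as the high-level concept part of the SAE.
    weight: hypothetical trade-off coefficient between the two terms.
    """
    # Standard term: how well each token's residual vector is reconstructed.
    recon = sum(mse(r, r_hat) for r, r_hat in zip(residuals, reconstructions))
    recon /= len(residuals)

    # Extra term: the high-level slice should look alike for adjacent tokens.
    pairs = list(zip(features[:-1], features[1:]))
    temporal = 0.0
    if pairs:
        temporal = sum(mse(a[:n_high_level], b[:n_high_level]) for a, b in pairs)
        temporal /= len(pairs)

    return recon + weight * temporal


# Two tokens, perfectly reconstructed, so only the temporal term can differ.
residuals = [[1.0, 0.0], [0.0, 1.0]]
steady = [[0.9, 0.1, 0.0], [0.9, 0.7, 0.3]]    # slot 0 stays put across tokens
drifting = [[0.9, 0.1, 0.0], [0.1, 0.7, 0.3]]  # slot 0 changes across tokens

steady_loss = temporal_sae_loss(residuals, residuals, steady, n_high_level=1)
drifting_loss = temporal_sae_loss(residuals, residuals, drifting, n_high_level=1)
```

In this toy setup the low-level slots are free to vary from token to token without penalty, while a change in the high-level slot raises the loss, which is the separation being described.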
Jeremy Harris
Yeah, they compare it to this previous variant, which I don't know that we covered, called Matryoshka SAEs, and there's a fun qualitative comparison. In an LLM safety policy model, the previous SAE covered terms related to data management, specific bicycle components, and mathematical terms in linear algebra. The Temporal SAE actually learned some of these safety things: etiquette and social behavior guidelines, social issues and controversy, and crime and malicious activity. And this is related to data about harmlessness and safety-relevant stuff. So, yeah, they do have some pretty nice qualitative examples in the paper. Next, onto a legal thing, or I guess a policy thing for the US government: there was a memorandum on adversarial distillation of American AI models. That's kind of the gist of it. As usual with these memoranda, it's sort of a broad statement of intent. They say that the Trump administration will share information with U.S. companies concerning attempts by foreign actors, enable the private sector to better coordinate against such attacks, work together with private industry to develop best practices, and explore a range of measures to hold foreign actors accountable. So a lot of exploring and working and enabling and sharing and all that, but it does highlight that this is a real concern and something that the companies are presumably lobbying or sort of talking to the government about.
Andrei Karlenkov
Yeah, it's one area where everybody seems to agree: the model distillation thing being an issue. And this is the Trump administration coming out and saying, yep, heard you loud and clear. The real message here is we agree with you that this is a genuine threat. We're now making it public that this is how we see it, which essentially just gives a bunch of bureaucrats in all the different departments the air cover they need to wave this memorandum around anytime someone's being annoying and be like, two-page paper.
Jeremy Harris
A two-page memorandum, by the way. So this is like a quick note on the topic.
Andrei Karlenkov
That's right. Yeah, exactly. And so they do allude to the action plan, America's AI Action Plan, that Sacks and Kratsios and Dean Ball came out with a little while ago, that was calling for a vibrant open source ecosystem built on firm foundations, blah, blah. That is still kind of them beating that same drum to say, hey, we're still pro open source, so nobody get any ideas here. We're just against this model distillation attack thing.
Jeremy Harris
Onto another topic, we have quite a varied safety and policy section today. The next article is more about societal concerns. The article is titled "Teen Boys Are Dating Their AI Chatbots". This is according to research by Male Allies UK. So, you know, I don't know how reliable it is or how seriously we should take it in general, but they found that 20% of boys aged 12 to 16 know a peer dating an AI chatbot, 85% have spoken to one, and over a quarter prefer AI attention over real human connection. 58% of boys said AI relationships are easier because they can control the conversation, and so on. So this is an ongoing concern about the societal impacts of AI. If you grow up with AI sycophancy and these romantic chatbots, people will get obsessed and will sort of forgo real human interaction. We don't have much data on the topic, I think. So this is at least an early indication of this becoming a real kind of thing to contend with.
Andrei Karlenkov
Yeah, I mean look, if you think real relationships are a pain in the butt, wait till AI escapes its scaffold, you know, self-replicates on the Internet and then asks you to be home by 11. Then it's going to be really... Anyway, anyway, this is terrible. So yeah, it's bad but predictable, and in some ways maybe a coping mechanism for some kids. But ultimately I feel like, I don't know. You look skeptical about something there, Andrei. I want to.
Jeremy Harris
I'm more curious if this will become less of a thing as the model providers train these models to be less sycophantic. I have heard from a coworker that Claude has become a little bit mean. Newer models are very much more challenging, as opposed to friendly like before. So that was interesting. Although with open source models we'll also have models being fine-tuned for this stuff. We'll have Grok presumably providing dating support. So yeah, if anything this will probably become more and more of an issue. Onto another area of concern for society, the economy. Anthropic has announced they are doing an Economic Index survey. This will be a monthly survey to track how AI is affecting people's work and economic experiences. So this will be using the Anthropic Interviewer: Claude will ask users about task delegation, productivity gains, hiring shifts, future expectations, and hopes for an AI-shaped economy. This will be from a randomly selected group of Claude users with accounts at least two weeks old. They'll be invited each month via a banner or via an email. So Anthropic will presumably be publishing this on an ongoing basis.
Andrei Karlenkov
Yep, it's a good sample too. I mean, obviously Claude users are going to be at the edge of AI adoption and stuff. So it does make them a useful canary in a coal mine. And it also supplements that report they've got, I think we talked about it, like, you know, 81,000 open-ended survey responses that they collected in December. They're building a real research program here. They had announced that before, and there's a whole bunch of stuff in here too about the privacy approach. So if you're going to share your data with them, your usage data, they have privacy-preserving strategies to, you know, make sure that doesn't get revealed. So it's interesting, and it's Anthropic continuing to invest in this general direction.
Jeremy Harris
And last story for the section: there was a scoop that CISA, America's lead civilian cybersecurity agency, is lacking access to Anthropic's Mythos. So Anthropic has given access to over 40 select companies and organizations. And I guess in that sense it's concerning that the leading civilian cybersecurity agency under the Department of Homeland Security, which is different from the NSA, which did get access and is under the Department of Defense, did not get it. So yeah, it's a concerning thing. CISA apparently maintains a catalog of exploited vulnerabilities and issues binding directives requiring agencies to patch flaws, making an AI vulnerability discovery tool directly relevant to its core mission.
Andrei Karlenkov
Yeah, it's the Cybersecurity and Infrastructure Security Agency. So CISA, you know, if you've done work with the U.S. government before, I think on the infrastructure security side, these are the poor unsung heroes who are given every thankless task, and then they're kicked when it doesn't work. And they're chronically understaffed, but they're doing really important stuff. So basically they're the ones who are going to bother you to, you know, patch that vulnerability, right? They're the ones always telling people to patch vulnerabilities all the time. It's what they do. And giving them access last, and they will get access, I'm sure, it does tell you something. I will say, as a general rule, offense will get prioritized over defense universally. That is almost a foundational aspect of national security. It's just too hard to defend all the attack surfaces all the time. And so when you get a new tool, yep, you want to hand it to the NSA first, also because of the kind of information the NSA has. Yes, by the way, CISA: super important information, super important remit. But the stuff the NSA is doing, if that goes wrong, is like, I don't want to say it's cataclysmic, because that makes it seem like the CISA stuff isn't, but it's cataclysmic in ways that are maybe a little bit sharper. And so hopefully we see CISA get access as part of the broader rollout of things like Project Glasswing and so on. But there is sort of an irony here too, right? The NSA is part of the Department of War, and yet, you know, they're the ones who are calling Anthropic a supply chain risk, but they're the ones with access to it first. Meanwhile, poor CISA, just standing over here at the Department of Homeland Security putting its hand up, is nowhere to be seen on the access list. So, you know, it'll change.
This is a function of everything from inference budget limitations at Anthropic, to just the fact that it takes a long time to roll things out, to the fact that government is generally slow, to the fact that there's so much complexity here. So, you know, I don't think it's anyone's particular fault at this point. It will change. It's just a little funny right now.
Jeremy Harris
Next up, we do have one synthetic media and art story. Taylor Swift has filed to trademark her voice and likeness to protect against AI misuse. We covered last year that actor Matthew McConaughey has done that. He was granted eight trademarks by the USPTO in 2025. So this is a repeat of that. This covers her voice and likeness. So I guess you can't create deepfakes, which has happened with Taylor Swift in the past in some contentious situations. So this may just become the norm: if you are a public figure, you would need to trademark your voice and likeness.
Andrei Karlenkov
Yeah, it's, you know, historically, you don't tend to think of an individual's persona, their voice, their likeness, that sort of thing, as things that you trademark. It's just like, I can't trademark me. That's part of it. But yeah, that is the legal theory that McConaughey's legal team was pursuing. And the idea would be that we should expand the protections afforded by trademarking beyond the traditional remit. So there you go, we'll see where this all ends up going. Important precedent.
Jeremy Harris
And we'll just have one paper to cover this week: "Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips". So the title is Maximal Brain Damage. And the notion is, instead of hurting the model via training it or injecting poison into the data, which we have seen as techniques in the past, you can do what they are calling a deep neural lesion, where you identify critical parameters in neural networks without requiring any training or optimization and then do this sign bit flip: make something positive negative, or vice versa. And you are able to completely destroy the model's ability to, for instance, correctly do image classification. They showed that flipping just two sign bits in ResNet-50 causes a 99.8% drop in ImageNet accuracy, so it basically destroys the model entirely. They also have an example of that for Qwen3 30B, where it drops reasoning accuracy to zero. So it seems that you can basically destroy a model by flipping some bits.
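To make the sign-bit flip concrete, here is a small sketch using Python's standard `struct` module. It assumes the weight is stored as an IEEE 754 32-bit float, which is an assumption for illustration (real models often use 16-bit formats), and the helper names are hypothetical. It contrasts flipping the sign bit with flipping the lowest mantissa bit.

```python
import struct

SIGN_BIT = 0x80000000      # bit 31 of an IEEE 754 float32
MANTISSA_LSB = 0x00000001  # bit 0, the least significant mantissa bit


def flip_bits(value, mask):
    """Reinterpret value as a float32, XOR its raw bits with mask, and return the result."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits ^ mask))
    return flipped


def flip_sign_bit(value):
    """Flip only the sign bit: the magnitude is unchanged, the sign is inverted."""
    return flip_bits(value, SIGN_BIT)


def flip_mantissa_lsb(value):
    """Flip only the lowest mantissa bit, for comparison: a tiny perturbation."""
    return flip_bits(value, MANTISSA_LSB)


weight = 0.75
print(flip_sign_bit(weight))      # -0.75
print(flip_mantissa_lsb(weight))  # very close to 0.75
```

A single flipped bit turns a large positive weight into an equally large negative one, while a low mantissa bit barely moves the value, which is why the sign bit is such an effective target.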
Andrei Karlenkov
Yeah, and it's easy to figure out which bits to fuck with. So basically, and this changes a little bit depending on the architectures and scales, you take the first 10 layers, find the largest five weights, and then just flip the sign bit. The sign bit is what they focus on. When you look at a number like a float, you're going to find there are some bits reserved for the exponent, some bits reserved for the digits, the mantissa, and then there's a bit for the sign. That sign bit is the one that they flip. And you can see why: it does such a big thing to the outcome versus just changing a single bit in the rest of the thing, the mantissa or the exponent or whatever. So basically this is super effective, and it's super easy to identify which weights to go after. Again, the best targeting strategy is not universal across scales. They try this with language models and a couple of different types of transformers. They try it with convolutional networks. It's pretty damn robust. And it also is the case that model size is not a protective strategy. So you can't just scale your way out of this vulnerability. No matter how big the models are, you apply the same technique, flip a couple bits, and the whole thing kind of fucks up in catastrophic ways, which is kind of interesting. So they do look at a bunch of different attack vectors. Like, now that we know that it's this simple, find the biggest weights and then flip the sign bit, how could we exploit this? And there's a whole bunch of ways you could actually do this in practice. So that itself is quite interesting. They do close by saying, hey, they've got some defenses, and they're pretty intuitive actually.
So because this strategy involves identifying the most critical parameters, you can just kind of selectively protect those parameters using error correcting codes or just literal bit replication, so you have a kind of stored version of the correct values for them. And protecting just 0.001% of parameters already cuts attack damage roughly in half. So just like it's easy to pick your targets, it's also easy to pick what you're defending. If you protect 1% of parameters, you basically nullify the whole attack completely. And so, yeah, there you go. Which bits you protect matters a lot, but once you can, you know, pin them down, you're, you're good to go. So pretty interesting paper highlights how fickle these things are. Imagine AI systems like Mythos identifying new ideas like this, and you can see why it's such a big deal. Even though each individual one you can defend against, it's the fact of these new vulnerabilities, the fact that the systems we're building are so fragile at a meta level, that's sort of cause for concern.
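The targeting heuristic and the replication defense discussed here can be sketched on a toy list of weights. This is only an illustration, not the paper's method: selecting by largest magnitude, the function names, and the use of a plain backup copy (standing in for bit replication or error-correcting codes, and assumed to be stored somewhere the attacker cannot reach) are all assumptions for the example.

```python
def top_k_indices(weights, k):
    """Indices of the k largest-magnitude weights (assumed targeting rule)."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return order[:k]


def attack(weights, k):
    """Negate the k largest-magnitude weights, which is what a sign-bit flip does."""
    corrupted = list(weights)
    for i in top_k_indices(weights, k):
        corrupted[i] = -corrupted[i]
    return corrupted


def protect(weights, k):
    """Keep a replicated copy of the k most critical weights, keyed by index."""
    return {i: weights[i] for i in top_k_indices(weights, k)}


def repair(weights, backup):
    """Overwrite protected positions with their stored correct values."""
    repaired = list(weights)
    for i, original in backup.items():
        repaired[i] = original
    return repaired


layer = [0.02, -3.1, 0.4, 2.7, -0.05, 0.9]
backup = protect(layer, k=2)      # replicate only the two largest weights
damaged = attack(layer, k=2)      # attacker flips the same two weights
restored = repair(damaged, backup)
```

Because the attacker and the defender rank weights the same way, protecting a very small set of positions is enough to undo the damage in this toy case, which mirrors the point that it is as easy to pick what to defend as what to attack.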
Jeremy Harris
Right. And they do list a bunch of mechanisms for enabling bit flips, like GPU cache tampering, voltage and frequency glitching, firmware exploits, all this kind of stuff. So as you said, it's actually something that you could do if you're an advanced ultra hacker. And with that, we are done with this episode, which is hopefully out by the weekend. Thank you so much for listening to this week's episode of Last Week in AI. As always, we appreciate you sharing, commenting and just listening, especially if you've made it this far and are hearing this. Thank you for actually sticking with us.
Andrei Karlenkov
Down last weekend AI come and take a ride Hit the low down on tech and let it slide Last weekend AI come and take a ride Couple lads through the streets AI's reaching high new tech emerging Watching surgeon fly from the labs to the streets AI's reaching high algorithm shaping up the future in tune and get the latest with ease and let it slide. From Girlnet to robot the headlines pop Data driven dreams they just don't stop Every breakthrough, every code unwritten on the edge of change with excitement we're smitten
Jeremy Harris
from machine learning marvels to coding kings Futures unfolding see what it brings.
Date: May 3, 2026
Hosts: Andrei Karlenkov & Jeremy Harris (Skynet Today)
Main Themes: AI model releases (GPT-5.5, DeepSeek V4, more), business deals and compute wars, policy, safety evaluations, and ongoing legal/social controversies.
This episode provides an in-depth look at some of the biggest developments in the AI landscape in late April 2026. The hosts tackle the latest OpenAI model release (GPT-5.5), DeepSeek's impressive new open-source model, news of compute and business partnerships, attempts at AI safety sabotage evaluations, and the ongoing drama between Elon Musk and OpenAI. There’s a focus on both technical progress and the complex intersection of business, safety, and policy.
For more details and direct discussion, listeners are encouraged to check full episode segments per the timestamps above.