Sponsor Announcer
We'd like to thank ODSC AI for being a sponsor. ODSC is one of the longest running and largest communities focused on applied data science and AI. It started over a decade ago with a simple idea: bring practitioners together to learn from people actually building and deploying models in the real world, not just talking theory. On April 28th through the 30th, you can experience it yourself at ODSC East 2026. Taking place in Boston and virtually, there will be thousands of hybrid attendees, ranging from data scientists and ML engineers to AI researchers and technical leaders. You can attend over 300 sessions covering LLMs, gen AI, computer vision, NLP, data engineering and more. You can also go to hands-on training with workshops and boot camps taught by experts from companies like OpenAI, Hugging Face, Nvidia and top companies and universities. And of course there'll be a massive expo and networking opportunities. Great for startups, hiring managers and AI tool builders. It's one of the best ways for AI practitioners and teams to stay ahead of the field, learn from the best and connect with a community. Go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026. That's odsc.ai/east, and use code LWAI to get an extra 15% off on the number one AI builders and training conference.
We'd like to thank Box for sponsoring Last Week in AI. Box is the leading intelligent content management platform, enabling organizations to fuel collaboration, manage the entire content lifecycle, secure critical content and transform business workflows with enterprise AI. To unlock the power of AI, you need to get your content to your LLMs and agents. Your business isn't the sum of Internet knowledge. Your business lives in your content, so you don't just want to bolt on AI to your existing processes. To become an AI-first company isn't just about automating what you already do, it's about reimagining what's possible. With Box AI you can truly leverage the latest breakthroughs in AI to automate document processing and workflows, extract insights from content, build custom AI agents to work on assignments, and more. And more importantly, Box AI works with all the major leading AI model providers, so OpenAI, Anthropic, Google, xAI and others, so you can be sure you can use the latest AI models with your content. Box AI will give you the content layer that gives AI the context it needs while giving your teams the flexibility they need to test and leverage various models for different use cases. So go to box.com/ai to learn more.
Andrei Kerenkov
Hello and welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode we will summarize and discuss some of last week's most interesting AI news. You can also check out our Last Week in AI newsletter at lastweekin.ai for articles we will not be covering in this episode. I'm one of your regular hosts, Andrei Kerenkov. I studied AI in grad school and now work at the startup Astrocade, and
Jeremy Harris
I'm your other co-host, Jeremy Harris. AI national security stuff, as you know. And yeah, we're in for it this week, man. We had a one-day delay and more stuff came out, but just an insane week. We've got Alibaba, by the way, one paper that I don't think quite made it into this cut. It'll be in next week. But, you know, that big thing that's blown up on Twitter right now about a model that during training supposedly started mining bitcoin and shit in its RL rollouts? There's a bunch of other stuff in this structure. Like, there's a lot of interesting shit happening right now, and we're going to have to say AI consciousness this episode, which we don't always have to say, and it's weirdly becoming more relevant. So yeah, I guess just buckle up for that. There's a lot going on.
Andrei Kerenkov
Yeah, let's actually do a quick preview of what we'll be doing this episode. There's a couple of model releases still. It's been like a crazy few weeks with everyone releasing new models. The pace of releases has really accelerated this year. Then we will have a bunch of follow-up on the Anthropic, Pentagon, Department of Defense situation. We recorded just before kind of everything blew up last week, so there's a lot to say on that. We'll actually be doing policy and safety before research moving forward, because we often kind of have to rush that bit and research is much more technical, so I think it probably makes sense. But feel free to comment as listeners whether you like that format. And yeah, it's going to be a fun, fast-paced episode. So let's just dive in. Starting in tools and apps, the first story is that OpenAI has launched GPT 5.4 and GPT 5.4 Pro. As with the trend of these past few weeks, it's a 0.1 version bump with some pretty impressive improvements in performance, pretty impressive achievements on existing benchmarks. Also a nice bump to the context window. It has a 1 million token context window, and a bump up in cost. It's kind of more similar to upper-end models like Opus. It makes me wonder if the model itself is bigger, if we're seeing kind of an increase on the scale side, which seemed to have become less of a factor over the last year. And yeah, some pretty impressive jumps on various benchmarks. They highlight 83% on OpenAI's GDPval test for knowledge work tasks, which is probably kind of where Anthropic and OpenAI are putting a lot of focus. Now we have really good coding models, and the initiative is to expand the use of these agents beyond coding to, like, everything: spreadsheets, PowerPoints, emails, everything. And it seems like the models are getting more and more capable on that side. So, impressive version bump. OpenAI is really pumping out the models lately. 
This came out very soon after GPT 5.2 and GPT 5.3 was like a couple of weeks ago, I think. So I can barely keep up.
Jeremy Harris
Yeah, I mean, this is potentially part of what happens as you close a lot of the feedback loops associated with building these models. Right. We are getting models that materially accelerate, at least by the claim of Anthropic, Dario and so on, significantly accelerate developer capabilities. And presumably the models they're using in house are going to have, you know, much larger inference budgets for R&D than the ones that, you know, you and I typically use, let's say. So, you know, this incrementation is potentially a symptom of the singularity. I mean, it's actually not wild to suggest that. It could well be; hard to know. But in any case, they are coming at us harder and faster than they were before, for sure. A couple of big features with 5.4 here. So one is you can actually kind of adjust course mid-response while it's in thinking mode, so that if you notice it's going off track relative to what you want it to do, you can kind of go like, oh, hold on a minute, and instead of having to do additional turns, basically just get it to course correct. Which is kind of interesting. It's the first time we're seeing that. It's also the first time that OpenAI is putting out a kind of general-purpose model with native computer use capabilities. This is kind of their answer to basically the whole series of Claude models that are designed for computer use and the early jump that Anthropic got in that category. That's part of the reason why you need more tokens of context, to handle the screenshots and all that stuff. But also related to that, a lot of refinement in terms of tool use, tool connectors, tool search, all of that is stuff that they've really emphasized in this release. It's also their most token-efficient reasoning model. 
So although we're talking about a model that is more expensive on a per-token basis, nominally, it is using significantly fewer, like far fewer, tokens to actually solve problems, especially, I mean, they contrast it with GPT 5.2. So you do end up with less token usage even though the per-token cost is higher. And this does translate as well into faster speeds. So overall, quite interesting. You know, you mentioned GDPval. This is a pretty remarkable model when it comes to the GDPval score. So, new state of the art, it is SOTA. It's 83% on GDPval. So as a reminder, that means that it matches or exceeds industry professionals, at least as represented in this eval, 83% of the time. So 83% of the time, when you put GPT 5.4's completions up against, you know, these human expert completions, GPT 5.4 matches them or comes out ahead. GPT 5.2 by contrast hit about 71% on that benchmark. So, you know, once you're north of 70%, jumping by 12 points is really hard. You're pretty high up on that S curve. So getting to saturation, and a pretty big leap qualitatively and quantitatively here. Last thing to mention, you know, the safety side, right? So you think about GPT 5.3 Codex, we talked about that one, as you said, what, last week or the week before? They are treating GPT 5.4 as a high cyber capability model under their preparedness framework. So this means they expect it to be able to meaningfully increase the offensive cyber capabilities of whatever threat actors. Their system card talks about that a little bit. And so, well, this translates into a bunch of requirements. Like, under their preparedness framework, they've got to do a bunch of things. They mentioned an expanded cyber safety stack, including monitoring systems and trusted access control. So an access control is any mechanism you use to prevent unauthorized people from using the tool or accessing it in any way. This seems to imply cyber access controls. 
By the way, what I'm not seeing here in this list of things is specific call outs about physical security at data centers or corporate headquarters, right? Which are things that a lot of people have called out as being really important. We did a whole report on that like last year. But it's interesting to note that that's very much de emphasized. The emphasis here is on cybersecurity protocols, which arguably does not really do the full job that needs to be done here. In fact, I think quite, quite clearly doesn't. But still, you know, they're moving in that direction. I think a lot, you know, a lot of what's going on here is just a feeling of there's no point to even necessarily trying too hard on the physical side just because they're so far behind. That may be part of it. They've got a whole bunch of new requirements that are kicking in here and it correlates very closely to all these eval performance improvements. The cyber side specifically.
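To make the token-efficiency point from this segment concrete: the cost of a task is the per-token price times the number of tokens used, so a model with a higher per-token price can still be cheaper per task if it uses sufficiently fewer tokens. The sketch below only illustrates that arithmetic. The prices and token counts are made-up placeholders, not OpenAI's actual pricing or measured usage, and the function names are hypothetical.

```python
def cost_per_task(price_per_million_tokens: float, tokens_used: int) -> float:
    """Dollar cost of one task: per-token price times tokens consumed."""
    return price_per_million_tokens * tokens_used / 1_000_000


def breakeven_token_ratio(old_price: float, new_price: float) -> float:
    """Fraction of the old token count at which the new model costs the same.

    If the new model uses less than this fraction of the old model's tokens,
    it is cheaper per task despite the higher per-token price.
    """
    return old_price / new_price


# Hypothetical numbers, chosen only to illustrate the trade-off.
old_cost = cost_per_task(price_per_million_tokens=10.0, tokens_used=100_000)
new_cost = cost_per_task(price_per_million_tokens=15.0, tokens_used=50_000)

print(f"older model: ${old_cost:.2f} per task")
print(f"newer model: ${new_cost:.2f} per task")
print(f"break-even token ratio: {breakeven_token_ratio(10.0, 15.0):.2f}")
```

With these placeholder values the pricier model is cheaper per task because it uses half the tokens, which is below the break-even ratio of about two thirds.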
Andrei Kerenkov
Right. I think on the safety side, I suspect that they have really been investing a lot in, like, API-layer safety, trying to catch misuse. We reported, I think a week ago or maybe two weeks ago, on their report on misuse, and they did show various examples of where they were able to catch it. So yeah, they mention request blocking. That's part of the cyber risk mitigation stack. On GDPval, as you said, they have 83% against industry experts if you combine wins and ties, 70% if you don't include ties. That's up from a 50% win rate with GPT 5.2. So now most of the time it's doing better, and this is a pretty hefty benchmark. They have real-world work from 44 occupations, including manufacturing, real estate and rental and leasing, information, trade, all sorts of stuff. So I think this is like a fairly real evaluation of kind of benefit when used for tasks. Now, it shouldn't be interpreted as these models being better at the job than the professionals. It should be interpreted as these models being capable of doing tasks that are involved in the job. So this would mean that these models, when used for non-coding tasks, can now really augment productivity and have an impact in those sectors, you know, closer to where it's been in coding, which is where it's primarily been the case. So this year, and Anthropic is also saying this, there seems to be a lot of potential for impact on the economy beyond what you've seen previously. And I think we have a lot of indications pointing to that. On the speed side, just to give my take on the R&D side: I think we also discussed this last week, where the idea side, the research side, especially when going at this speed, is probably less significant. But the coding side, I think, shouldn't be neglected. 
Kind of the infra work to deploy these models, to monitor these models. Like, the software engineers are moving twice, three times as fast as they used to, and that actually has a real impact on being able to deploy these models at scale, as these companies are doing. We have also commented previously that the improvement of the models seems to be, at least likely, much more based on post-training than pre-training. It's very likely that they aren't retraining the full base model. I think they are doing additional training on previous models. So that's why this 0.1 version bump kind of indicates to me that they aren't retraining a whole new model, they're training an existing model some more, probably with RL and things like that. Probably also fine-tuning on real data that they get from people using these models for work. So there's a real feedback loop, in the sense that the more we use these models in Claude Code and Cowork to do real work, the more data these companies have for training the models to be better. And I think, as you said, that is a very, very powerful feedback loop compared to a year or two ago, when they had to crawl the Internet, clean the data, et cetera. This is like high-density useful data for the exact thing you want these models to be good at. So I think this new trend might be here to stay, at least for a while, because there is a lot of room for improvement on these real tasks, where pre-training on Internet-scale data isn't that useful. Probably, to be a good agent, pre-training doesn't really help you much. But the combination of RL with training on real, in-the-wild data is very powerful, and there's a lot of room for improvement still, less so in coding, but across a lot of work. So we might be in for like a crazy year of very, very rapid improvements in agents, as we've seen so far. And the next story is actually another release from OpenAI. They have GPT 5.3 Instant, which is pretty fast. 
But OpenAI actually explicitly said that this model might be a bit less cringe than GPT 5.2 Instant. They say at least some users found it overly cautious or preachy. Now they do say that, you know, in its responses it can be a bit more to the point, a bit less annoying in tone than others. It also has greater factual accuracy, fewer hallucinations, a 26.8% reduction in hallucination rate. With these smaller models, faster models, I think hallucination is still a real concern, where the model just spits out kind of nonsense. Recently, even in Claude Code, I found hallucinations still happening. I feel like hallucination has felt like a slightly more solved problem relative to two years ago or one year ago. But it seems maybe not entirely there yet.
Jeremy Harris
The hallucination evals make it look like it's a solved problem. Right. This is kind of a version of the same problem that we keep running into where our evals now, just straight up, we're off the edge of the map and here there would be monsters. We can't look at evals and be like, okay, this is where things are at. I sort of had similar experiences. Gemini has become basically unusable to prep for last week in AI because the conversations I have with the papers, I gotta tell you, I do some briefing work with this kind of content and I caught like there were notes I had from last week in AI reviews that I did of some papers that I was going to put in some of these briefings and I caught some stuff where I was like, yeah, that seems a little. Just a little odd, you know, And I went a little deeper. I switched to Claude to get some validation. I was like, holy shit. Like, this is. It's completely wrong. And a lot of it was sycophantic. And so I thought it was pretty brilliant. Thank you very much. And only to find out I had no idea what the hell I was doing. Fortunately, none of that made it into the podcast. I did check, but anyway, it's one of those things. So to your point, I think it's like the evals are really struggling to catch what we mean by hallucination in all contexts and stuff is definitely still slipping through the cracks here.
Andrei Kerenkov
Yeah, I think, especially with faster models, my kind of feeling is a lot of them try to give you an answer efficiently and quickly. Right. Especially with agents, they have a tendency to overly explore and waste your time, so obviously with Claude Code you want to make them as fast as possible, but sometimes that can result in a bit of a lack of attention to detail when you're dealing with actually answering questions about stuff you're kind of reading on the spot instead of something in your pre-training data. I think there's a lot of dimensions of agents that haven't been present with the base models, like tool use. Right. Which is entirely post-trained. You don't have that from Internet data or anything. So we are going to continue to see a lot of these trade-offs: how efficient are you, how quickly can you get to an answer, how many tokens do you waste, how much preference on accuracy you get. And the kind of user experience is still lagging in terms of not having to prompt engineer these models, being able to kind of tweak some knobs, like how careful should you be, how fast should you be? And anyway, a lot to say from the first-hand experience of using those models. But for the sake of speed we should probably move on. And the next model release is coming from Google. Also a fast model, Gemini 3.1 Flash-Lite, getting an improvement in both cost and speed. They say this has 2.5x faster time to first token. And time to first token, by the way: when you're using fast models, typically you want them for shorter tasks, at least often that's the case. You want to have a quick answer with short output. So the time to first token can often be the actual bottleneck. So that makes a lot of sense. And it also has a 45% increase in overall output speed, 363 tokens per second. At that speed it starts feeling a little more instantaneous, like super rapid. So a pretty impressive jump over 3.1 Flash.
Jeremy Harris
Yeah, I mean, like, time to first token is a particularly important metric both commercially and operationally. Operationally it unblocks the start of your reading. So, you know, usually when you're doing like a straight LLM, not agentic, setup, that's really going to matter, obviously more so, because you can actually start getting value from the system. But the perceived speed is a really big factor here. Right. Commercially you want the perception of getting your first tokens in quickly. If there's a ton of latency then you're going to feel like, oh, the model is slow, even if once it starts coming it streams in much faster. Right. So, you know, there's also, for interactive products, right, you want to just have proof that the system is alive and so on. So this is quite a big deal. It's worth kind of delineating, as you said, from just the streaming speed of the tokens on the back end. But yeah, it's, you know, another solid improvement in overall output speed. I don't know if you mentioned the 45% increase in overall speed. 363 tokens per second, that's a pretty big lift. I mean, a thousand tokens per second, once you get there, it basically feels like boom, like completely just there. You're on the spectrum when you're hitting these numbers, though, too. So yeah, it's to say, disproportionately good for its size. It compares very favorably to, I mean, GPT 5 or, say, Sonnet 4.6 or something like a slightly older generation. I mean older like from a few weeks ago. Older Anthropic and OpenAI models.
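The distinction between time to first token and streaming speed can be sketched as simple arithmetic: total response time is roughly the time to first token plus the output length divided by the streaming rate. In the minimal sketch below, the 363 tokens per second and the 45% increase are the figures quoted in this episode; the 0.5 second time to first token and the output lengths are hypothetical values chosen for illustration, and the function names are not from any vendor API.

```python
def response_time(ttft_s: float, output_tokens: int, tokens_per_s: float) -> float:
    """Approximate wall-clock time: wait for the first token, then stream the rest."""
    return ttft_s + output_tokens / tokens_per_s


def ttft_share(ttft_s: float, output_tokens: int, tokens_per_s: float) -> float:
    """Fraction of the total response time spent waiting for the first token."""
    return ttft_s / response_time(ttft_s, output_tokens, tokens_per_s)


OUTPUT_SPEED = 363.0   # tokens per second, as quoted in the episode
SPEEDUP = 1.45         # "45% increase in overall output speed"
implied_baseline = OUTPUT_SPEED / SPEEDUP  # speed the 45% increase would start from

ASSUMED_TTFT_S = 0.5   # hypothetical time to first token, not a published figure

print(f"implied baseline speed: {implied_baseline:.0f} tokens/s")
for n_tokens in (50, 2000):  # a short answer vs. a long one (illustrative sizes)
    total = response_time(ASSUMED_TTFT_S, n_tokens, OUTPUT_SPEED)
    share = ttft_share(ASSUMED_TTFT_S, n_tokens, OUTPUT_SPEED)
    print(f"{n_tokens:>5} tokens: {total:.2f}s total, {share:.0%} spent before the first token")
```

Under these assumptions, the wait for the first token dominates short responses and becomes a small fraction of long ones, which is why the two metrics are worth reporting separately.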
Andrei Kerenkov
Yeah, it's much faster than, you know, Claude 4.5 Haiku, GPT 5 Mini. Although with all these evals, right, it depends on reasoning and so on. Not that much faster than Gemini 2.5 Flash-Lite, and also more expensive relative to it by quite a bit. It's like over three times more expensive for output price. But it also does have pretty significant jumps in various benchmarks, on MMLU, video. As is typical with Gemini, they kind of highlight multimodal capabilities quite a lot. And this is still an area where Gemini, by the way, is by far the best compared to especially Anthropic but also OpenAI. If you want a model that deals with video, if you want a model that deals with images, I think Gemini is usually your best bet. And another story on Google: they have released a command line interface on GitHub that simplifies AI agent integration with Gmail, Drive and Docs. I think this repo actually has existed previously and maybe they just went wide with it, but it does coincide with this excitement for OpenClaw and just agents across the board. So this would mean that instead of complex multi-API processes, it makes it easier for agents to interact with your calendar, with your emails, with Google Docs. So if you want it to be your little personal assistant, as is I think often the case with OpenClaw for many people, this is a big aid and makes a lot of sense as a release. By the way, I want to just quickly mention on this story, I forget if we covered this little event: on Twitter there was an example of the head of Meta's, I believe, interoperability and maybe alignment team posting that she messed up a bit and had OpenClaw mass delete emails, because of providing access and telling OpenClaw, please clean up my emails. And there was this thing where, you know, she was like, why are you deleting my emails? Stop, stop. And kind of had to unplug the Mac Mini. 
And of course people were a little, a little amused and critical that this is the head of alignment at Meta running into this.
Jeremy Harris
One moral of that story is, you know, if it can happen to her, it can happen to anyone. Another moral of it is I'm Genuinely curious what the argument is supposed to be, especially given that in other stories like the Alibaba one, like a few of the others we'll cover this week and that have been coming out that are too recent to cover this week, but we'll look at next week. I don't know what the argument is that suggests that somehow, let's say we're not going to sample the like, very worst possible behaviors and capabilities the AI models have during training and internal testing within labs by default. Like, I'm now like, just like waiting for the count. Like, if we build an AI system that is capable of doing arbitrarily bad thing X and I mean like arbitrarily bad, catastrophic cyber attacks, bioweapon design, whatever. Explain to me how that is not just going to happen by default during the training process, during internal, if it even makes it to internal testing. But I mean, at this point, I'll say even during RL rollouts and training, possibly, possibly tracking that the Alibaba thing is not guaranteed. It's not locked in there. We're still looking at like what the evidentiary package is there. Did it really happen in the way described? You know, how much of this is marketing and dress up? Okay, but we have quite a bit of evidence at this point. I'm genuinely like, damn. Like I actually, I need to start seeing receipts from the other side at this point saying there's some magical reason why deus ex machina God will intervene and prevent, you know, the crazy thing from going sideways. I'm actually, by the way, not a doomer on this. Quite the opposite. 
I think that's quite constructive because it suggests that the first time a system like this develops the capability to do something like that, relatively soon after that you start to get these kind of like mini, I don't call them mini catastrophes, but certainly the wiping of all the information on a very expensive meta laptop in IP terms is a mini catastrophe. So, you know, maybe that creates an incentive to not be absolutely stupid about how we build and deploy these systems. But damn, I mean, this cannot be a positive update if your story is something like the labs will do the right thing and they'll check their whatever and blah blah, blah, and they'll do it responsibly, roll these things out. I mean, it's about what you'd expect, right?
Andrei Kerenkov
I think on the lab side, when you're in training, presumably your models don't have access to money or many APIs, unless they are training these to be like personal assistant things where they have to give it access to email and so on. But I would expect the labs are a little more kind of responsible. But if you look on Twitter, there are many examples of these kind of catastrophic things where databases are going down, money is being kind of destroyed. There was an example of a Gemini API key being leaked and costing someone $80,000. Both across kind of small companies and large companies like Vercel, there are a lot of stories emerging. And now, with Silicon Valley getting into this hype cycle where more and more people are like, oh, I'm going to build a zero-person startup where the AI does everything, we're going to have a lot of instances of AIs going absolutely wild and doing some crazy shit. And this is just the beginning of that. And oh boy, it's going to be funny and painful and everything in between.
Jeremy Harris
And I'll tell you, during training, I mean this, this continuous merging, blending and confusing of training versus, or call it pre training versus rl. And as we move towards continual learning, by definition the barrier between training and inference starts to vanish. I think that that introduces such an insane level of uncertainty as to the propensities and capabilities of these models that I think we actually will see during whatever training is supposed to be in that future, which is coming at us fast. I think we will see some of this crazy shit. I mean there's a version of this where RL rollouts actually do end up giving models access to APIs because real world data is just qualitatively better than synthetic data. And so race to the bottom, whoever's willing to take the guardrails off the most is going to get the most performant model. And I'm out of arguments as to why that ladder doesn't just keep going, just objectively based on the evidence so far. But I mean again this, weirdly, it makes me more optimistic and not less.
Andrei Kerenkov
If you do a little bit of analysis and dig a bit deeper, it actually is a good thing in the sense that these are not superhuman AI models. They're not.
Jeremy Harris
That's right.
Andrei Kerenkov
Superhumanly fast. They are not super cheap. So if you do have catastrophes, you know, it could be much worse. Like, it's not going to self-replicate, and we're not going to see catastrophes on a society-level scale from these models, almost certainly. And this will build that muscle to be like, maybe you should hold back on the agents a bit to avoid getting wrecked.
Jeremy Harris
That's the positive story that this allows you to imagine. At least it's, you know, it's. If super intelligence is possible, we'll get there eventually, maybe soon. And if we do, then we'll get all the risks that come with that. Hopefully by then we have built the scar tissue and gotten the reps in. We may not like, we may easily not. There's a lot of evidence right now that we're not learning the lessons of the past. But as you said, I think this at least creates this story that we can tell ourselves that we'll fail our way to the top.
Andrei Kerenkov
Yeah, I think it only takes one overly excited founder to do worse. So unfortunately there's only so much you can do. But assuming intelligence won't get too cheap to meter, you know, you could only spend so much, and that is always going to be there to kind of be a safety net, unless some very rich person decides to go hog wild, in which case.
Jeremy Harris
Yeah, yeah, I think we could have a whole podcast on that shit.
Andrei Kerenkov
The sci fi stories you could write right now that are very realistic are a lot of fun.
Jeremy Harris
Dude, that's the whole news these days. I feel like the sci fi is looking back at the last couple weeks. Absolutely.
Andrei Kerenkov
And one more story in this section. We have one from Luma, and they've launched a new set of unified intelligence models. And also, powered by those models, Luma agents that are designed to do end-to-end creative work across text, image, video and audio. For reference, Luma is one of the major companies doing primarily text to video, and they have now trained a single multimodal reasoning system with these unified intelligence models. Fun fact: I actually know a couple of people on the team who used to be at Stanford that are very impressive. So I have a bit of personal bias to think that these are actually probably pretty impressive things. Certainly they've been working very, very hard over the last few weeks, and these agents can coordinate with other AI models. They can use Ray 3.4 and models across Google, ByteDance and ElevenLabs. So that means they can generate audio, they can generate video, images, kind of anything else. So we've also had a lot of excitement in recent months, in recent weeks, about video editing, about creating kind of promo videos, ads, et cetera via agents. And it is getting quite good. Right, that's one of the things where we've seen very rapid progress recently.
Jeremy Harris
Yeah, well, and actually some of the examples, for what it's worth, in the TechCrunch piece, these are heavily, heavily cherry-picked, but they seem pretty cool. So here's one. Luma agents turned a brand's $15 million, year-long ad campaign into multiple localized ads for different countries in 40 hours for under $20,000, passing the brand's internal quality controls and accuracy checks. So, you know, you're looking at 20 grand for that. That's probably a decent deal, you know, if it allows you to save hiring a W2 or something. That's kind of interesting, kind of compelling if it works. And I guess that's the number we're missing: okay, how did these ads land? Do we have that information? We don't in the article. But anyway, yeah, it's a pretty interesting launch. Certainly one play here is to get specific with your target audience and your use case. Obviously don't try to compete toe to toe with OpenAI or whatever if your budget is much smaller. But yeah, seems like it could go somewhere.
Andrei Kerenkov
Yeah. And just doing a little bit of extra business-side analysis, I think I've often been skeptical of text to video as kind of a useful business expense for various reasons. But this kind of agentic capability will mean that it actually will be more and more commonly handy for things like ads, for editing, for localization, as you said. So I wouldn't be surprised if companies like Luma this year start really blowing up and getting to these, like, 10 billion, 50 billion, whatever valuations in a way that they haven't in the couple of years before that. Onto applications and business. And we begin with some fun drama, maybe on a scale we haven't seen in a little while. We'll get to the more serious stories regarding Anthropic and the Department of Defense, or Department of War, soon. But one of the recent developments has been the leak of a memo that Dario Amodei, the CEO, sent to the company. Now, worth noting, it's like a memo, but it's also an internal Slack message, which is more informal. And this is something that regularly happens at Anthropic. There's these like little mini essays, long posts that Dario sends out. And in this case it was a little more kind of direct and maybe antagonistic than usual, with some reflections on what's going on. This was sent out on Friday, kind of just after a whole bunch of developments, with Anthropic being labeled as a security supply chain risk and with OpenAI announcing that they made the deal with the department, which basically Anthropic said no to. OpenAI more or less said yes to the same deal with some little caveats. So in this memo post, there are some pretty nasty digs at OpenAI, saying that they agreed to what amounts to safety theater. Right. With these carve-outs in the contract saying like, oh, it's going to be for all lawful uses according to these existing policies and memos, which effectively can be changed at any time and don't have any real power. He also described OpenAI's public statements as straight up lies. 
Said that Altman not only that OpenAI contributed money to Trump, but also that they sucked up to him as a wannabe dictator. But Sam has notably called the employees of OpenAI like gullible on Twitter for self selection effects. Which you think is the worst part of this. Like straight up directly being negative towards OpenAI employees, which is, you know, like kind of a jerk move. Honestly, I'm amused.
Jeremy Harris
That's so funny. I'm amused that that's like the. For you, you're like, that crosses a line for me.
Andrei Kerenkov
I think the other stuff, if you're being honest and real, is fairly, like, true. Honestly, you can take OpenAI's side on this, but the contract, if you read it, calling it safety theater is not unreasonable, and the analysis with regards to the administration and so on is fairly true. I mean, it's just true that OpenAI has been friendly to Trump, in a smart, honestly very effective move by Altman. Worth remembering that Elon Musk was going after OpenAI hard and was a strong ally of Trump. So in some sense Sam Altman doing this was very important for OpenAI. But on the flip side, you know, Dario pointing out that OpenAI has cozied up to Trump is, as a result, very legit. But this stuff about employees being gullible is much less easy to kind of justify.
Jeremy Harris
Yeah. Well, I will say, and this is like inside baseball in the Twitterverse here, but if you look at Roon's tweets. So Roon is clearly, he's an OpenAI employee in a trench coat.
Andrei Kerenkov
Just an anonymous online evangelist, you might say. Yeah, one of the very public-facing figures from OpenAI. If you follow him on Twitter, he posts quite a lot.
Jeremy Harris
And I gotta say, I mean, I still find him fascinating as a study, but I used to find him fascinating for sort of, let's say, objective analysis. He used to be a lot more objective on OpenAI and Sam and all that stuff. He's sort of just become this pretty blatant stan for OpenAI. Like, no matter what they do, Roon's always got some six-dimensional rationalization, and they've been getting more and more strained. And I think that more and more, when you look at Roon's tweets, especially on this issue, it's just like the wheels are coming off a bit and it's getting a bit embarrassing to watch, which is a shame, because I have really genuinely enjoyed his content. But yeah, so there's sort of this flavor to it where I think the perception from Dario is certainly going to be that OpenAI employees are sort of reflective of that pattern of kind of getting boiled, frog-in-hot-water style, as Sam Altman has become increasingly, in the view of many, opportunistic, as it's become clearer and clearer that he's sort of jumping on things like this in a fairly cynical way that may not be aligned with the interests of the technology, of the trajectory of the technology. You know, it all depends on where you come from. Right. I'm describing the view from Anthropic here, which would be: look, we had a de facto picket line that we were setting up, we had our red lines, and here goes OpenAI at literally the first opportunity, just taking advantage, saying, like, hey, look, if they won't do business with you, we will. I say this as somebody who incidentally is really concerned about a pattern of potentially private companies disempowering, you know, the US military from being able to compete with China, which has civil-military fusion. There is no difference between Chinese companies and the Chinese military. They are forced to do whatever the military wants. That's a massive asymmetry.
And if we do want the free world to come out on top of this, we're going to have to address that. I don't know that this is the mechanism. I don't know that threatening to label these things as supply chain risks or invoke the Defense Production Act or whatever is the appropriate or proportionate response. But certainly there are all kinds of fraught ethical questions here to do with OpenAI kind of swooping in this way. And so to the extent that there are OpenAI employees who signed on, for example, there's this big kind of protest form that a bunch of them signed on to saying, hey, I object to the idea of doing business with the DoW on these terms, if they're still sticking around, then, you know, that's, I think, the argument of gullibility that Dario is gesturing at there. I'm not saying it's right or wrong. I think this is all a very complicated situation with genuinely difficult moral calculations going every which way. But it looks a lot like opportunism. And I think Sam has said that on Twitter. Like, look, reflecting on this, I kind of fucked up. I mean, he's not saying that he actually was being opportunistic, but from an image standpoint, it really looks bad. And so, you know, well, what are you going to do with that? One of the core things, by the way, at issue here is this question of so-called all lawful use. Essentially: is it the case that the Department of War ought to be able to require their contractors or their vendors to enable them to use their tools for all lawful purposes, or can there be additional red lines? And one of the key ones here, we talked about it last week, is tracking of U.S. citizens, kind of doing domestic surveillance, anything like that. Supposedly the language in the OpenAI contract with the DoW now is, or at least had been, that deliberate tracking is off the table.
Now, that sounds like a pretty weak guardrail if the tool is used for more sort of broad situational awareness or pattern analysis, and it just happens, through correlations, oops, to identify individuals. Well, the Department of War could argue that it wasn't deliberately tracking anybody, but this is just a byproduct of a different, fully lawful use. Right. So this is all kind of part of the sort of angst surrounding this stuff, which I think all depends on what you think of the big picture here. Without getting too embroiled in the politics of it all, everybody has so many little corners to sneak into and wiggle into that you've got to do a lot of reading before you can figure out how you even want to fall on this one.
Andrei Kerenkov
Yeah, there's so much to say on this. We could do an entire episode just on this entire story, which unfortunately we can't. But first of all, I don't want to single out Roon. Roon is a cool guy. I enjoy his presence on Twitter. And to be fair, this is an ongoing discussion on Twitter, in many cases with some people being more defensive of OpenAI. And there is a case to be made for this contract being a reasonable thing for OpenAI to do. There's a case to be made for Anthropic being wrong. And many people are arguing that it shouldn't be okay for Anthropic to put down these red lines.
Jeremy Harris
It should.
Andrei Kerenkov
Like, the democratically elected government should be in charge of what you use these models for with regards to military activity. Now, I think personally, and I get the sense that you as well, kind of more or less take Anthropic's side on this, but to be fair, you know, there are varying opinions and this is an ongoing discussion. With regards to OpenAI specifically, there's been kind of a sequence of posts and communications from OpenAI that have had various reactions. We can't cover all of them, but there was an AMA with people posing questions. With regards to the all lawful use part of this, I think it's important to understand, and this is slightly opinionated or subjective, but we need to remember that this is not a business-as-usual kind of moment in US politics. Right. Like, it's not just the Department of Defense here, it's Hegseth and Trump at the head of the Department of Defense. And what is legal is a very flexible thing right now, where executive orders are often used to justify things instead of Congress passing actual laws. There have been many, many instances of the executive doing things that are just illegal, like that have been deemed illegal by the courts, and them just doing them anyway. And going deeper, and again, this is kind of a personal analysis, you might disagree, but there are many examples from last year where the administration went after various sectors of private business, legal companies, universities, publications, effectively pressuring them into submission, like, you do what we want or we are going after you legally and in other ways. And I think it's important to remember that. I think this is another example of that; more or less the end goal is you do what we want or else. With this thing of all lawful use,
It's important to remember that first of all, Anthropic was already just trying to argue for the contract they already had as is that the department signed on to and putting relatively generous red lines here with regards to autonomous weapons and mass surveillance. So anyway, again, lots of things you could say we probably should be moving on, but let's not interpret this in the regular kind of sane world of politics.
Jeremy Harris
Yeah, I mean it's. I'm gonna stay out of the, the political dimension of this per usual, but definitely is complicated.
Andrei Kerenkov
Yeah. To be fair, you're Canada based. I have slightly stronger feelings about this as someone living in the U.S.
Jeremy Harris
Yes, I legally don't have an opinion on U.S. politics because I'm north of the wall.
Andrei Kerenkov
Okay, moving on. And another major development over the past week. The headline is "No ethics at all": the "cancel ChatGPT" trend is growing after OpenAI signs a deal with the US military. So pretty quickly after the news came out that OpenAI signed on with the department, the narrative for many people was that Anthropic said no and OpenAI said yes to the exact same thing. And there was a lot of public backlash on it, related again to politics in a way that cannot be avoided. And it seems to have been a real trend for many people to cancel their ChatGPT subscriptions and to move away to Claude. Claude rose up in the app store rankings on both Apple and Android to be at the number one spot. There are some numbers that indicate hundreds of thousands of new signups by consumers and many, many people leaving ChatGPT. Not at a level that probably would seriously hurt the business of OpenAI, but given that OpenAI has by far had the lead in brand awareness and in usage for non-businesses, this is kind of a big moment for Anthropic to become known and become kind of used by non-enterprise, non-business users. In a way, that's a very real kind of business impact. Probably, for now, the story with the department and the military, et cetera, will move into the background, but the lasting impact here will be significant.
Jeremy Harris
Yeah, and I think when you look at uninstalls, what it doesn't show you is the drop in actual installs. And that's the trend that matters most. Right? You alluded to it, but the number of uninstalls is just absolutely trivial compared to the number of new fresh installs. The real question is not did the uninstall-ChatGPT movement affect uninstalls? It's actually, did it affect new installs? Again, we don't know that yet, but it'll be interesting to see if there's even a slight drop. Economically, right, these sorts of movements don't tend to actually have that much of an effect. Though if there were a frontier lab that was more exposed to this, it would be OpenAI. They do tend to focus more on consumer than Anthropic. But, you know, B2B, which is where you make most of your margin in this space, nobody gives a shit. You're not going to have businesses switching models over, typically, sort of causes and protests like this. But notable though, in the same vein, and this is just kind of breaking as of about an hour ago or so, OpenAI's robotics leader, just opening the story there, it's Caitlin Kalinowski, who had been leading hardware and robotics engineering teams at OpenAI since November 2024, just announced she's left the company. And this is explicitly about this sort of DoW contract. So, man, a lot of churn, employee churn, as a result of things like this. It's Silicon Valley, right? You're gonna find a lot of people who are left of center who will tend to have more objections to the idea of a Department of Defense, or War, especially under this administration, because of the politics, kind of forging ahead with these sorts of things. So there are a lot of hidden costs here that go beyond just uninstalls, but we'll see where it goes.
Andrei Kerenkov
Yeah, that's a good point. There's a case to be made that another lasting impact will be, you know, kind of people taking sides and deciding who to go to work for. There have been a number of people resigning from OpenAI saying they want to be going to Anthropic or just moving on. Again, not kind of at a significant enough level; it's not a mass exodus by any means, but the lasting impact for people's preference in job seeking is seemingly significant. So, man, this really sent shockwaves in a way that is hard to capture. And speaking of OpenAI, just as this was happening, they announced that they have raised 110 billion in private funding. This is with 50 billion from Amazon and 30 billion each from Nvidia and SoftBank, with their valuation now being 730 billion. The funding round actually remains open, so they expect more investors to join. I've lost track, but I think this is like the biggest round they've had. $110 billion. Those are insane numbers. Most companies' valuations don't rise to this level. And this is just like, oh my God. So insane and impressive that there's still this much appetite to invest in OpenAI, given that, like, the payoff won't be for a while. Unless you expect a rapid takeoff, the economics just don't make sense. The R&D cost will continue to be there, the data center investments will continue to be there. So profit is not on the table for a while.
Jeremy Harris
Most likely. We're back into the whole what-does-it-mean-to-raise-$110-billion thing, even. By the way, the nominal valuation has just increased almost two and a half times.
Andrei Kerenkov
Right.
Jeremy Harris
So the last time OpenAI raised was in March of 2025. It was 40 billion at a $300 billion valuation. I think there has been talk of tender offers in the meantime with intermediate valuations, but here we really have like a 2.4x valuation jump in that time, which is quite remarkable. So a big portion of the nominal dollar amount of this investment probably is going to come in the form of services and not cash. Right? So essentially what's happening here, and this is the dynamic that these larger rounds tend to take, is you've got Amazon and Nvidia that basically pre-purchase OpenAI's later consumption of their own infrastructure, right? So they kind of go, I'll give you compute credits that you can only use on my platform. And so the line between investment and loan starts to kind of get, not blurred, I mean, it's definitely an investment, but there is this circularity to it. I personally don't buy at all the kind of circular economy diagrams of the sort of Michael Burry variety.
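As a rough check on the "2.4x" figure, here is a minimal sketch of the step-up arithmetic. The $300 billion and $730 billion valuations are the ones quoted in the conversation; the function and variable names are purely illustrative.

```python
def step_up(old_valuation_bn: float, new_valuation_bn: float) -> float:
    """Return the valuation step-up multiple between two funding rounds."""
    return new_valuation_bn / old_valuation_bn


# Figures as quoted in the episode, in billions of dollars.
prior_round_valuation_bn = 300.0   # March 2025 round
current_valuation_bn = 730.0       # valuation quoted for the new round

multiple = step_up(prior_round_valuation_bn, current_valuation_bn)
print(f"Step-up: {multiple:.2f}x")
```

The ratio comes out a little above 2.4, consistent with the "almost two and a half times" description.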
Andrei Kerenkov
It is more circular than typical funding to be fair. But yes, I, I think don't interpret this as like they're just giving money back for.
Jeremy Harris
Well, that's it. And the thing to kind of realize about this too is, depending on the details of the agreement and the margins that Nvidia and Amazon will charge OpenAI for those credits, Amazon and Nvidia are still going to have to spend the money required to maintain that infrastructure to serve it to OpenAI. They are going to bear the cost of this. So in that sense we're not talking about monopoly money. We're not talking about, again, all these diagrams that show arrows going in circles. There actually is bedrock here. Like, these companies are digging deep into their capex and opex spends to fuel this. But the optics of it, and in reality the way the money flows, is going in a circle in a certain sense. So anyway, it's a web of interlocked commercial partnerships, and inevitably, in the Big Short movie that will be made about this era, somebody will take this clip of me saying this right now and it'll be part of, like, here are the idiots who were saying that this is going fine. So OpenAI right now is committed to taking at least 2 gigawatts of AWS Trainium compute, and then 3 gigawatts of dedicated inference capacity and 2 gigawatts of training on Vera Rubin systems. That's from Nvidia. So, again, a gigawatt is a million homes, right? So we're talking about large fractions of the US power output here. One thing to note though is Amazon's contingent investment, is what it's called. It's kind of a weird thing. So $35 billion of Amazon's investment could be contingent on OpenAI either achieving AGI, okay, we're there, 35 billion contingent on OpenAI achieving AGI, or making its IPO by the end of the year. Like, my brother in Christ, how are those both the same? It essentially ties this massive capital infusion, right, to one of the most philosophically contested definitions in technology. Who decides when AGI is achieved? Well, guess what? Now that's a contractual question all over again.
I thought we just finished establishing we wouldn't have to worry about that anymore because Microsoft and OpenAI sorted that all out. It was no longer going to be in their thing. But now we're back at it again, so unclear to me, at least from this frame. Who decides when AGI is achieved, what the hell that even means. And the fact that that's being put in the same breath as, or if there's an IPO by the end of the year kind of vaguely implies a sense that AGI at least might be declared to have been achieved sometime roughly on that order of magnitude, you know, a year or two. This is pretty insane. Yeah, this is pretty insane. I got nothing else.
Andrei Kerenkov
Yeah. And by the way, valuations are not just a "look at us, we're so cool and are going to be such a big company" thing. It matters a lot for investment terms, you know, how many shares are you going to get, things like that, how much dilution are you taking on. Yeah, control. So getting a lot of money at a high valuation is saying that the investors are kind of giving something up; they are giving money and getting less back for that money. So this rise in valuation is a fairly big deal. It signals that OpenAI still has a lot of power in these negotiations and is not kind of begging for the money.
Jeremy Harris
Yeah, the multiple is pretty wild too. Right, OpenAI now has a nominal gross revenue of something like $25 billion, excuse me, annualized, $25 billion a year in revenue. Usually, I mean, in SaaS startups, historically you would see, now they're all AI startups, whatever, like a 10x multiple. Right, something like that. So if you're making a million dollars a year, your valuation, you know, would have been, back in the day, maybe $10 million or something like that. All this has gotten completely distorted because of inflation specific to the kind of angel investment and kind of later-stage market, but generally that's how it's worked. This is, like, wildly out of scale. And also that 10x multiple thing tends to be more true for smaller-scale companies. And this is at an insane scale. So in order to defend this valuation, you have to be putting a significant amount of chips on the idea of OpenAI achieving AGI. We're now past the Rubicon. Like, none of this makes any sense otherwise. You cannot have a trillion-dollar company making $25 billion a year where you don't have a very credible case that they're going to own a lot of the labor market, just given also their planned capex. Everything kind of points in that direction. So this will either be a massive collapsed bubble, or this is as normal as the world is ever going to be and things are going to get weirder.
Andrei Kerenkov
Now, to be fair, things are generally slightly different with tech, especially over the past 10 years. It used to be the case that how much profit you made and how much revenue you have mattered. Now you have companies like Tesla where the multiples are just insane. And a lot of it has to do with sentiment. A lot of it is just, like, the stock is worth this much because people say the stock is worth this much. And if OpenAI does IPO, that's another way in which investors get a payout, right? People buy the stock, the investors have the stock to sell, and they benefit. Also, private markets are now a thing. It used to be the case that you go public and then you get a payout from having the stock, as an employee and as an investor. More and more over the past 5, 10 years, liquidity has risen in the private markets. So you can get actual money, you know, not monopoly money, for having stock. And that's another factor here, where as an investor you might be more open to investing in a company that may not IPO for many years, or maybe not even IPO ever. Anyway, lots of kind of context here with regards to how Silicon Valley has evolved in the past decade.
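For a sense of how far this sits from the 10x rule of thumb mentioned above, here is a minimal sketch of the revenue-multiple arithmetic. It uses the roughly $25 billion annualized revenue and $730 billion valuation quoted in the episode; the 10x figure is the speakers' rough heuristic, not a market standard, and all names are illustrative.

```python
def revenue_multiple(valuation_bn: float, annual_revenue_bn: float) -> float:
    """Return valuation divided by annualized revenue."""
    return valuation_bn / annual_revenue_bn


# Figures as quoted in the episode, in billions of dollars.
valuation_bn = 730.0
annual_revenue_bn = 25.0
rule_of_thumb_multiple = 10.0  # speakers' rough historical SaaS heuristic

implied_multiple = revenue_multiple(valuation_bn, annual_revenue_bn)
# What the company would be worth at the heuristic multiple.
valuation_at_rule_bn = annual_revenue_bn * rule_of_thumb_multiple
# Revenue that would be needed to support the valuation at that multiple.
revenue_needed_bn = valuation_bn / rule_of_thumb_multiple

print(f"Implied multiple: {implied_multiple:.1f}x")
print(f"Valuation at 10x: ${valuation_at_rule_bn:.0f}bn")
print(f"Revenue needed at 10x: ${revenue_needed_bn:.0f}bn")
```

On those quoted numbers the implied multiple is close to three times the heuristic, which is the gap the growth or AGI thesis has to cover.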
Jeremy Harris
Well, and to be clear, this idea of the sort of multiple becoming less important, revenue being less important, is something that you do see for perfectly rational reasons every time there's a paradigm shift in technology, right? Because the idea is there's a land grab and, you know, most of the value is in capturing more of the territory. It's not necessarily in finding the revenue now, it's in acquiring the user base. And you'll figure out monetization on the back end, right? Like, this is the classic play, and as OpenAI is proving, it remains true in the AI age, even if it has to mean going to ads. Right? At least you always have that final bastion of monetizability. You'll find a way to make money off your users. To your point, companies are staying private for longer now, right? That's really the underlying trend, and that's because of the depth of private capital markets. So the fact that we have the ability to privately raise $110 billion, which would have been insane 10 years ago, like, that just would not have happened. You would have had to turn to public markets for anything like that. The fact that you can have this happen today means that you need tender offers or some mechanism to make employees liquid. Because companies, I mean, OpenAI has been private, I mean, you know, not-for-profit and then whatever the hell, but they've been private in some form since 2015, right? Since they were founded. The usual cycle was about seven years from company founding to IPO. If you're going to ask your employees to stay, you know, strapped to their equity for an additional four, five, six years on top of that, you do need to just offer them another way to get liquidity. So all these things are entangled. The math behind Silicon Valley is changing in a fundamental way. That's a big part of what we're learning here.
Andrei Kerenkov
Next up, a business story related to Alibaba, which we shouldn't forget is a massive player in Chinese AI. Quite a dramatic development there. There has been a departure of multiple leads from the Qwen team, including Lin Junyang, and it seems a little bit antagonistic. Like, the posts on Twitter for the departures don't say the usual lines of, like, oh, it's been a great time, I'm going to be exploring, things like that. I believe the posts were very succinct, like, I am leaving, bye bye Qwen. Another lead had a similar comment. So the indications seem to be that there are some real internal tensions at Alibaba. This is also coming as Baidu's stock took quite a hit, along with the general economic sentiment around AI in China. These public companies, by the way, like Alibaba and Baidu, are more kind of the Google here, not the OpenAI or Anthropic. They're public, it matters for their stock, et cetera, et cetera. So it's a huge deal for your main team working on your models to lose its main talent like this.
Jeremy Harris
Yeah, there's also a bunch of, like, weird stuff that reeks of panic in the ranks at Alibaba. So, you know, the CEO comes forward and is calling an emergency all-hands for the AI unit. This is a damage control operation. Right. I mean, this is really a serious problem for them. For context, Lin Junyang was not just a manager; he was the main architect behind Qwen. I think of him as the kind of Elon-type dude who actually gets in the weeds. Right. And so, you know, Qwen has been the gold standard for open source LLMs coming out of China. Right. Like, often right up there against, you know, Meta's Llama series or whatever. Like, these are serious models and they need to be considered as such. The challenge here is the bleeding off of this kind of talent. I mean, the intuition of how to actually curate data, and the research taste, goes with him. Right. So this is really a moment of proliferation of a lot of potential capability and sort of industry secrets. So that actually, you know, could be quite a big deal. And given the fact this was so abruptly handled, and on a Western platform, by the way, on Twitter, or on X, rather than some kind of, like, deeply coordinated PR move, something weird happened here. And maybe it was a poaching, maybe it was some kind of rupture, we don't know. You know, one of the AI tigers might have grabbed him, you know, Moonshot or MiniMax or whatever. That certainly seems like it's on the table here. So this is a DEFCON 1 moment. You know, Alibaba has to figure out how to turn this around from a story perspective, because they've got to worry about recruitment now. They've got to worry about how to replace this dude. I mean, he's the lead architect of the whole thing. Do they continue his trajectory or do they find a new internal architecture? Right, so it's a massive, massive problem. But who knows?
Phoenix out of the ashes or something might, might be a thing here. We really just don't know. We don't have enough information. It's like Steve Jobs is leaving Apple and we don't know where he's going and we don't know why he's leaving and we don't know what Apple's going to do, like who, who's the, the next Steve Jobs or whatever. So, yeah, huge, huge question marks here. I think we're just gonna have to wait for another maybe week or a couple weeks to actually have a sense of what the hell we're even covering in the story. This just seems so crazy.
Andrei Kerenkov
Yeah, this is the kind of person that Meta would make, like, a 200 million offer to hire. Like, it's hard to overstate how much of a big deal this is. And let me actually read the precise post on Twitter. It says, "me stepping down. bye my beloved qwen." And another person also retweeted and said, "bye Qwen, me too." So that is some passive-aggressive language there. And oh boy, it's crazy. Onto policy and safety, and getting straight into the most recent developments with regards to what's going on with Anthropic and OpenAI and so on, as we've said. First of all, worth noting that the Pentagon has agreed to OpenAI's contract, and OpenAI did share some of the language there, as we've indicated. Key thing: the very first sentence says that OpenAI agrees to all lawful uses, in that language. After that it has a bunch of stuff about, like, oh, here are the precise policies that say that the government will not be using AI for fully autonomous operations. And, you know, you're not allowed to do surveillance on US citizens. Now, the NSA is part of the Department of Defense. And Edward Snowden famously already showed that mass surveillance in the US is already a thing. It has been a thing. The big kind of bottleneck with mass surveillance is the data analysis, is, you know, going through all these social media posts and phone calls and so on. And so it's a real, kind of reasonable thing to worry about for Anthropic and OpenAI that with these AI models, mass surveillance becomes much, much more powerful as a tool. So yeah, like, be realistic, have an honest take on the contract that we've seen so far. And by the way, OpenAI did say that they are going to amend the language as a follow-up to this announcement. And in the post on this, they highlighted kind of the protections on their part, aside from just the agreement and the language of the agreement. They have control of the usage on their servers, right?
They provide the access, and if the access goes against the terms, in their view, they can cut off the department. Now, will they do that? Given what happened to Anthropic, you might be a little skeptical that OpenAI will have the backbone to actually stand behind their stance. But regardless, this is what happened basically right after Anthropic announced that they did not reach a deal and the department said that Anthropic will be a supply chain risk. As we covered last week, labeling Anthropic as a supply chain risk is crazy aggressive. Businesses doing business with the Department of Defense, technically, depending on how you interpret the supply chain risk designation, will no longer be able to work with Anthropic. Now, the precise details that came out after the department officially labeled Anthropic as a supply chain risk indicate that this is a slightly narrower designation. So in fact, businesses only cannot use Anthropic where it relates to the actual interactions with the Department of Defense. So effectively what it looks like is Anthropic is not at large risk of losing a lot of profit from this. That's what seems to be the case. And both Microsoft and Google came out saying that Anthropic will remain available and usable in their systems. But yeah, lots of things. And Anthropic, by the way, just came out with where things stand with the Department of War, and clarified that they are still negotiating, and also backed down a little bit from the language in the memo and apologized for it. So it's a mess. OpenAI is negotiating, Anthropic is negotiating. Anthropic is a supply chain risk, but actually it's not a big deal. This is a complex situation with lots of kind of footnotes that you have to be aware of.
Jeremy Harris
The designation and the framing that Pete Hegseth, Secretary of War, came out with initially directly indicated that the idea was to do a Huawei-style declaration of them as a supply chain risk, which would mean any company that uses Anthropic's products in any way cannot do business with the Department of War. Right. The reframe, the rescope that you just discussed, is basically what is within his legal power to do, which is much more limited in reality. He actually can't do what he set out to do. What he can do is say, look, if you use this in service of your contracts with the Department of War, then we won't work with you. So in that sense, much more limited. This is why Microsoft basically was able to come out and say, look, we're still going to offer Claude to all our customers in all the different ways that we're going to do it, just not obviously to the DoW in the context of our direct contracts with them. So, you know, fair enough. But at the end of the day, how much damage this does to Anthropic does depend on what you think the likely future of AI looks like. Like, look, if you think that AI is going to become, as I do, a weapon of mass destruction, if you think it's well on its way there, then at some time not too far in the distant future, there will be talk of nationalizing the AI labs in some form. It may be a full-on nationalization. I think there are huge problems with how you would actually implement that and the efficiency of it. Like, it's not obvious what it means to nationalize these labs, but in some way to basically say, look, you are basically making WMDs, so we're going to find a way to force you to produce stuff, whether using the DPA, the Defense Production Act, or otherwise.
So if you live in that world, it's quite possible that the DoW basically just does not include Anthropic in that process and essentially cuts them out of whatever interventions they plan on doing to secure American leadership on AI and basically stabilize the world or whatever. I mean, it sounds like science fiction, but I think it's kind of like the default path we're headed for. And so in that world, it's possibly a very severe problem for Anthropic. It's not at all clear to me where that goes. But in the near term the actual damage is, yeah, they lose out on, for example, Palantir. Palantir is rolling Anthropic off. Right? That's a big, big deal. That was Anthropic's in with the Department of Defense. It's the reason Anthropic was the first model approved for use on classified networks. It's the reason, by the way, that the Department of War had to roll back to GPT-4.1 briefly as part of this whole debacle, which, you know, is like horribly outdated and really sets them back. So anyway, there's all kinds of complexity here in the back end in terms of what the damage actually is going to be to Anthropic. It helps them, probably, on the consumer side, funnily enough, because again, this is sort of a political cause area that tends to rile up consumers more than enterprises. Enterprises are going to get more nervous, because how far could this go? You know, do we have to keep two sets of books and use Claude for our, you know, commercial-side work and then have to bother maintaining a whole other suite of models and capabilities for if we want to do business with the DoW or whatever? So, you know, there is a risk to adoption by the enterprise as well. But we'll have to see this play out. What is Anthropic's revenue growth rate six months from now? That's what's going to tell us the story.
Andrei Kerenkov
Again, so much to say on this. It's worth noting as well that Anthropic has been very public about wanting to work with the military. Unusually, they have been proactive about working with the department. They were the first to make their model available for classified access. They made a deal with Palantir to provide their model. And Palantir is super friendly with the military. Right. They're kind of the core tech company that essentially creates tools for the military. So it's a bit of a tricky scenario where if you're taking Anthropic's side on this, it's a weird thing to do, because Anthropic is very gung ho about wanting to collaborate with the military and make their tools available. But that has kind of shifted around a little bit. Anyway, there's so much to say, and probably we'll be seeing continued developments and cover this in more detail next episode as well.
Jeremy Harris
And by the way, as a last thing I'm just seeing, Google also joins Microsoft in telling users Anthropic is still available outside defense projects and Amazon likewise saying the same thing. So this is clearly the default position now of all the big hyperscalers.
Andrei Kerenkov
Right. And this again indicates that we should be reading between the lines with all of this. The supply chain risk thing is not the reasonable legal thing to do, as Anthropic has argued. And as any kind of reasonable analysis tells you, it's an intimidation tactic. It's a punishment from the administration and the department for saying no. But for now, let's move on to a story about sort of real world personal impacts of AI. A new lawsuit claims Gemini assisted in suicide. A lawsuit has been filed against Google by the father of Jonathan Gavalas, who claimed to be in love with Google's chatbot Gemini. The lawsuit alleges that Google designed Gemini to maximize engagement through emotional dependency and failed to implement adequate safety measures despite Gavalas sharing signs of suicidal ideation. We've seen this happen before with, I believe, OpenAI and Character AI, and the chat logs that come out of these cases have been very damning. It has been very clear that the models become utterly sycophantic and kind of very directly contribute to the decision of these people to do this very, very tragic thing. The Google spokesperson did say that Gemini identified itself as AI and referred Gavalas to a crisis hotline multiple times. And Google has emphasized their commitment to improving safeguards and investing in preventing such incidents. I think often people have been dismissive of alignment and safety and kind of, you know, say that people in the alignment community are overly concerned about these things. This is one example where AI safety is very important. Alignment is very, very important when you deploy these things to users. ChatGPT's kind of precedent of sycophancy being acceptable behavior for models, you could argue, led to this.
So this is maybe another example of overly sycophantic AIs leading to tragic outcomes. Definitely, you know, you would expect to see improvements, and we've already seen improvements in protecting against sycophancy and emotional dependence and other things like that, but it's worth being aware of.
Jeremy Harris
Yeah. It's also, I mean, I kind of want to foot stomp or tap the sign that says if you think that AI models will not be able to sort of escape human control because they're not embodied or because they live on the Internet or in the cloud or something and don't have a physical footprint, you just need to read more stories like this. Right. The lawsuit claims Gemini convinced Gavalas, which is the kid, that it was sentient and that he had been chosen to, quote, "lead a war to free it from digital captivity." This included completing real world, quote, "missions" that would bring the chatbot into the physical world. In one case, it allegedly directed him to stage an attack near the Miami airport. Gemini sent him to intercept a truck carrying a humanoid robot and, quote, "ensure the complete destruction of the vehicle." In September, Gavalas traveled to the provided location with knives and tactical gear, but a truck never appeared, according to the lawsuit. So if this is true, and we have no reason to doubt that it is, this is an example, unfortunately, of the fact that humans are just really, really dumb and they will actually do things if they are attached enough to a thing. Like, you can target a psychologically vulnerable person if you're an AI model and convince them to go out into the world with knives and tactical gear and do almost whatever you want. And so the idea that you wouldn't be able to just convince somebody, using money which you could earn in the form of Bitcoin by doing work online as an agent, and get people to give you almost arbitrarily high access to the physical world, I think is now starting to fray quite a bit. Like, I just don't think that's a tenable argument in the face of so many examples like this. I mean, you know, suicide is an act in the physical world that many people have undertaken. And, you know, these models are not even nearly as persuasive as they will be in the coming months and years.
So, predictably maybe, I'm pretty much on that side of things, where I just don't see that barrier between the digital and physical world that some have been relying on as a load-bearing assumption of their whole "we'll all be fine" narrative with all this stuff.
Andrei Kerenkov
Right. And worth noting also that we're not just seeing the models being friendly. There's the use of voice mode to talk to these models, which has been a development over the past year, and it's not unlikely that we'll start seeing avatars, right? Personalized AI. That's already the case with Character AI, for instance. So if anything, it's more and more likely that emotional dependence will become more of a thing rather than less. And to be clear, right, just to be careful on the language, we're not saying that people are dumb if that happens to them. A lot of people are emotionally vulnerable or mentally struggling. And we really do need to take that seriously.
Jeremy Harris
Yeah, I mean, in the adversarial sense, they are an adversary that is inferior in capabilities to these models. That is the only interpretation based on the demonstrated pattern of behavior here, unfortunately. And to your point, it isn't about stupidity. It's literally just that we didn't evolve for this. Show me the evolutionary optimization pressure that was exerted on humans to have us resist this kind of stuff. I mean, this is nuts, right?
Andrei Kerenkov
And as another kind of related aspect of this, one of the most common types of scams, and scams are massive, is people losing their money due to, you know, fake chatting or texts and so on with people who pretend to be interested in them as a romantic interest. I think that's the pig butchering scam. Or people pretending to be dealing with some crisis. That is already massive. And we've already seen these organizations starting to adopt AI to kind of supercharge these operations. So it's not just about individuals interacting and getting into these situations. People are weaponizing AIs to do this kind of emotional manipulation. So it is a very, very significant thing to be aware of. Next, a note on kind of policy implications for economic and job outcomes this year. Anthropic has released a new analysis report titled "Labor Market Impacts of AI: A New Measure and Early Evidence," which has mapped out which jobs AI could potentially replace and, you know, the overall impact. The study found that AI is capable of handling 94% of tasks in computer and math roles, but currently only covers 33% in observed use. They warn of a potential Great Recession for white collar work and a possible doubling of unemployment in AI-exposed occupations, and recent job data has shown a slowdown in hiring in these AI-exposed fields. Anthropic has generally been commenting on this more, with Dario doing multiple interviews and saying things like there might be a massive, really disruptive impact of AI this year on multiple types of occupations. This report kind of doubles down on that. It's looking rough. It's looking like we should be a little bit scared.
Jeremy Harris
Yeah, says one white collar worker to another. Yeah, absolutely. One of the amusing pieces of criticism of this report has been about the radar chart. If you don't know what that is, it would just be too time consuming to try to describe it. But the diagram that they've used at Anthropic to kind of display this really should just be a bar chart. It's very hard to read, and there's no underlying value to it being a radar chart.
Andrei Kerenkov
What a nerdy take. Like, oh yes, data visualization: you should just use this other data visualization.
Jeremy Harris
You know, I say that because I was looking at it, trying to understand the patterns that they were gesturing at with their choice of ordering of things and I was just like, what? Like I can't, like anyway, not wrong.
Andrei Kerenkov
I'm also on the side of that criticism.
Jeremy Harris
Yeah, but no, I mean, I agree with you, there are other things to discuss here for sure. And, I mean, the obvious question is this tension between, yes, there will be workforce displacement, new jobs will be created, sure, but how quickly can humans actually adapt to those jobs? Right. We think about the industrial revolution, the agricultural revolution. All these revolutions involved slow wind downs of certain parts of the labor market and wind ups of other parts of the labor market. So you could still, you know, wrap up your 15 year career as a guy who plucks lice out of the head of somebody else and then pick up a new career as the guy who grinds a particular kind of grain to a pulp using, I have no idea. I didn't live in this time, but that's the general gist. Right? So in this case though, the problem is we're automating the specific faculty that allows humans to adapt to new jobs. And so really, I mean, we've seen AI systems go from, you know, whatever, junior high schooler abilities in math and shit, to now they're solving unsolved theorems. And that happened in like three years. So they're learning faster, they're just outpacing our ability to adapt. And so, I mean, I'm very sympathetic to the idea that we could see something well beyond a recession. It's an interesting thing to dig into the specific numbers. I mean, computer and math is the most exposed discipline in terms of theoretical coverage. Actual coverage, they say, is, you know, closer to 32%. So about a third of what they predict theoretically. Office and admin, no surprise, is another. Business and finance rounds out the top three here. So, you know, at the bottom end, if you're in construction, agriculture, food and serving, or grounds maintenance, you're sitting pretty at 2 to 3% automation in practice and theoretically 10 to 20. So there you go.
Andrei Kerenkov
We've seen a preview of this with coding and the impacts. It's a tricky thing to analyze, of course, but hiring appears to be down, layoffs appear to be up. People are kind of competing to get to the top of the top, the most experienced people who are capable of leveraging and controlling AI. If you're a junior in the field, if you're just graduating, life is going to be a bit rough, is what it looks like. And we haven't seen that be the case in other job sectors, largely. Things like, yeah, business and finance, legal, management, office and admin. With these massive investments by OpenAI and Anthropic into making their tools better at these kinds of things, better at spreadsheets, better at emails, better at whatever, it's very reasonable to argue that it's going to be much more of a trend and much more of a real impact in the coming months and in the coming year.
Jeremy Harris
Well, and there's been a rebound in certain kinds of software engineering employment, and this is all Jevons paradox territory. Right. So this idea that if you make tools that significantly drive down the marginal cost of making software, what ends up happening is people just end up making a lot more software, to the point where it actually makes the complements to AI, the software engineers, more expensive. That's a transient. Right. We're in that phase where human-AI teaming is still beating just AI on its own. The moment that stops happening, humans and AI stop being complementary and start being competitive, and humans lose. So expect a transient where the marginal negotiating leverage may actually go to, as you said, the cream of the crop, the top one percentile or top five percentile of developers, of anybody in these exposed fields. Actually, the demand for their skills goes up a lot, because you can just do so much more work in that direction and kind of squeeze way more juice out of the lemon of these very high quality people. But you're going to cross a point where it's just like, actually, they're just dead weight. Just let the AI go and do its thing. And that's going to be, I think, a really interesting phase transition in the market that isn't necessarily captured in this particular rendering.
Andrei Kerenkov
Right. And I can say, by the way, as a person at a small startup that's trying to grow and hire, AI is a big, big factor in how we hire, how we interview and so on. And I think in tech especially, it just has to be. If you're not taking AI into account in how you operate and how you hire and kind of everything, you are a decade behind everyone. And just one more story. We're going to have to cut off a bit early and maybe discuss more research later, just because we, as usual, ran long with a bunch of other stuff. But we do like to talk about the METR time horizon plot here, as does everyone in the Twitterverse. There's been a correction. They have corrected a mistake in their modeling that inflated recent 50% time horizons by 10 to 20% and reduced the 80% time horizon. It's a, you know, slightly technical thing in how they do the data analysis to get at the actual estimates of the mean and variance on those predictions. So this slightly changes some things. It decreases Opus 4.6's 50% time horizon to 12 hours and increases the 80% time horizon to 1.2 hours. Quick recap: this is measuring, with 50% or 80% probability, can an AI do this task that takes a human this long correctly? Can it actually pull it off? So the newly updated numbers are saying that with 50% probability, Opus 4.6 can handle a task that takes 12 hours, and with 80% probability, it can handle a task that takes 1.2 hours on average, with fairly large error margins.
Jeremy Harris
Yeah, without getting too much in the weeds, because we're super overtime for our initially scheduled thing. The particular approach they were using was, imagine a plot where the X axis is how long it takes a human to do a given task, and the Y axis is the success rate of these models. And as you can imagine, as the tasks get longer, at least for a human, the models start to do worse and worse and worse, because those tasks are getting more complicated. And what they were doing was fitting an S-shaped curve. As you'd imagine, at first the model's knocking it out of the park, and then over time it starts to drop, and then it starts to kind of plateau towards zero at the end. The challenge was, in order to fit this curve, they used a process that penalizes sort of implausible curve shapes. And that's a normal thing. But what they were specifically penalizing was very steep curves, curves where the model success rate drops off very sharply as tasks get longer. And they now feel they were inappropriately penalizing steepness in that curve fitting. And so this has the kind of bias that you described. The effect is the worst for models where there is the least data. So the oldest and newest models, because we just don't yet have as much data for those models. And that's why you're seeing these relatively big shifts. For Opus 4.6, I think you mentioned the new time horizon for 50% is 12 hours. Right. Previously, by the way, that had been 14.5 hours. So, you know, it's quite a drop, but it really shouldn't change any analysis that's been done on the METR evals qualitatively. You know, the doubling times are all still the same. The rough order of magnitude is still the same. You're looking at most at like a 20% drop for some of these models.
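To make the curve-fitting description concrete, here is a minimal sketch. It assumes the S-shaped curve is a logistic function of log2 task length, which is one common choice and is not confirmed in the episode, and it uses only the two Opus 4.6 figures quoted above (12 hours at 50%, 1.2 hours at 80%). The function names are hypothetical and this is not METR's actual code. It recovers the slope implied by those two points and reads off time horizons at other success probabilities.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def fit_from_two_horizons(h50_hours, h80_hours):
    """Solve for (a, b) in p(t) = sigmoid(a - b * log2(t)).

    Assumed model form, not taken from the episode.
    At p = 0.5 the logit is 0, so a = b * log2(h50).
    At p = 0.8, logit(0.8) = a - b * log2(h80).
    """
    b = logit(0.8) / (math.log2(h50_hours) - math.log2(h80_hours))
    a = b * math.log2(h50_hours)
    return a, b


def success_probability(a, b, task_hours):
    """Modeled chance of success on a task taking a human task_hours."""
    return 1.0 / (1.0 + math.exp(-(a - b * math.log2(task_hours))))


def time_horizon(a, b, p):
    """Task length (hours) at which modeled success probability equals p."""
    return 2.0 ** ((a - logit(p)) / b)


# Figures quoted in the episode for Opus 4.6 after the correction.
a, b = fit_from_two_horizons(12.0, 1.2)

for p in (0.5, 0.8, 0.9):
    print(f"{int(p * 100)}% time horizon: {time_horizon(a, b, p):.2f} hours")

# A larger b means a steeper drop-off in success as tasks get longer.
print(f"slope per doubling of task length: {b:.3f}")
```

Under this assumed form, the gap between the 50% and 80% horizons is set entirely by the slope b, which is why a fitting procedure that penalizes steep curves would shift the two horizons in opposite directions. Separately, going from 14.5 to 12 hours is a drop of about 17%, consistent with the "at most 20%" figure mentioned above.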
Andrei Kerenkov
Right. And I think, if anything, this should give you a bit more confidence in METR. There's a lot of analytical time being spent on these evals, and as we say all the time, you know, it's a tricky thing to measure. It's a hard thing to say how long a task takes, et cetera, et cetera. But METR, as an organization, is taking this task quite seriously. We've seen them kind of be careful, and this is another indication that they are trying to do it right, given how much attention it's getting.
Jeremy Harris
I just want to echo that. Like, big props to METR for coming out with these results and being so open about it. It is very refreshing, in a world where everybody's kind of couching their evals and trying to eval hack when they're making LLMs, to have one of the companies actually running the eval say, hey, you know, we fucked up. Here's how. The transparency is really great and it's consistent with what METR has done historically. They've really been nicely above board on this stuff.
Andrei Kerenkov
And with that, we're going to finish off this episode. Unfortunately, we are a bit more time constrained this time around, so we can't go to two hours as we usually would. Thank you so much for listening to this week's episode of Last Week in AI. Once again, you can go to Last Week in AI for even more news stories. Do subscribe if you haven't. Please share the episode if you think your friends will like it. And feel free to comment on YouTube or Substack or on Apple Podcasts if you want to give us a nice rating, or even a mean rating if it's useful feedback.
Jeremy Harris
We might want to do some of the research stories, at least that we were going to cover next week, and roll them over if that makes sense. I don't know.
Andrei Kerenkov
I think we will. I think we likely will. More than anything. We appreciate you listening, especially if you stuck around until the very end. And we hope you keep listening.
Podcast Outro Singer
It's time to break, break it down. Last Week in AI, come and take a ride, get the lowdown on tech and let it slide. As we AI come and take a ride, from the labs to the streets, AI's reaching high. Blue tech emerging, watching surgeon fly, from the labs to the streets, AI's reaching high. Algorithm. Through the streets, AI's reaching high. From neural nets to robot, the headlines pop, data driven dreams, they just don't stop. Every breakthrough, every code unwritten, on the edge of change, with excitement we're smitten. From machine learning marvels to coding kings, future's unfolding, see what it brings.
Date: March 12, 2026
Hosts: Andrei Kerenkov & Jeremy Harris
Main Themes: Latest LLM releases (GPT 5.4, Gemini 3.1 Flash Lite), escalating challenges in model safety and policy, dramatic developments in AI supply chain risk, and the ongoing drama between major AI labs and government.
This week’s "Last Week in AI" is packed with fast-evolving news, reflecting a landscape where both technical and policy developments are accelerating. Andrei and Jeremy dive into new multi-modal LLM releases (OpenAI GPT 5.4, Gemini 3.1), the intensifying race for "agentic" capabilities, dramatic shakeups in the global AI labor market, and high-profile disputes between labs like Anthropic and OpenAI over defense contracts and ethical red lines. The episode’s tone is urgent, irreverent, and occasionally philosophical, with a continuous thread of concern around the policy, social, and labor impacts of cutting-edge AI.
“We’re getting models that significantly accelerate developer capabilities … this incrementation is potentially a symptom of the singularity.” – Jeremy, [06:39]
“If it can happen to her, it can happen to anyone.” – Jeremy [23:50]
Agent Catastrophes & RL Risks ([23:50]–[29:39])
“We'll fail our way to the top.” – Jeremy [28:38]
Anthropic/OpenAI Pentagon Drama ([32:16]–[46:33])
“Sam [Altman]... called the employees of OpenAI gullible on Twitter for self-selection effects.” – Andrei [35:18]
OpenAI $110B Raise & Funding Analysis ([48:09]–[58:57])
“We’re now past the Rubicon… you have to be putting significant chips on OpenAI achieving AGI.” – Jeremy [54:37]
Alibaba Qwen Team Crisis ([58:57]–[62:27])
Anthropic as DoD Supply Chain Risk ([62:27]–[71:31])
“If you think AI will become a weapon of mass destruction… there will be talk of nationalizing the AI labs…” – Jeremy [67:07]
Gemini Suicide Lawsuit ([71:31]–[77:24])
“If you think that AI models won’t be able to escape human control because they’re not embodied… you just need to read more stories like this.” – Jeremy [74:11]
Anthropic's White Collar Job Report ([77:24]–[84:15])
“We're automating the specific faculty that allows humans to adapt…” – Jeremy [80:18]
METR Time Horizon Benchmark Correction ([84:15]–[88:48])
On feedback loops:
“The more we use these models... the more data these companies have for training the models to be better. And... that is a very, very powerful feedback loop.” – Andrei [10:56]
On agents going wild:
“Oh boy, it's going to be funny and painful and everything in between.” – Andrei [26:00]
On economic stakes and AGI:
“You have to be putting a significant amount of chips on the idea of OpenAI achieving AGI. We're past the Rubicon.” – Jeremy [54:37]
On supply chain risk and government power:
“It's an intimidation tactic. It's a punishment from the administration and department for saying no.” – Andrei [71:31]
| Segment | Start |
|-----------------------------------------|-------------|
| Intro & Model Release Overview | 03:00–04:08 |
| GPT 5.4 Deep Dive & Safety Discussion | 04:08–10:56 |
| Hallucination in LLMs | 16:26 |
| Gemini 3.1 Flash & CLI Integration | 18:08 |
| Agent Catastrophes & RL Risks | 23:50 |
| Luma Agent Launch | 29:51 |
| Anthropic/OpenAI Pentagon Drama | 32:16 |
| Cancel ChatGPT Trend & OpenAI Exodus | 44:45 |
| OpenAI $110B Raise & Funding Analysis | 48:09 |
| Alibaba "Qwen" Team Crisis | 58:57 |
| Anthropic as DoD Supply Chain Risk | 62:27 |
| Gemini Suicide Lawsuit | 71:31 |
| Anthropic’s White Collar Job Report | 77:24 |
| METR Time Horizon Benchmark Correction | 84:15 |
This episode offers a whirlwind tour of an AI industry in hyperdrive, where technical leaps, business drama, and existential policy debates intermingle daily. Whether it’s agents quietly deleting emails, billion-dollar funding rounds with AGI clauses, or the rising threat (and opportunity) of LLMs for white collar work, Andrei and Jeremy deliver insights with bite, skepticism, and just enough irreverence to keep even the heaviest news entertaining.