Summary10 min read

Last Week in AI — Episode #248 (June 17, 2026)

Podcast: Last Week in AI
Hosts: Andrei Karenkov & Jeremy Harris
Theme: "Fable 5, Siri AI, IPOs, Policy on the AI Exponential"

Episode Overview

This week, Andrei and Jeremy delve into a packed slate of major AI developments, led by Anthropic’s release of Claude Fable 5, Apple’s long-awaited Siri AI overhaul, a flurry of high-stakes IPO rumblings, China’s push to catch up in compute, and renewed urgency on AI safety policy. The conversation covers cutting-edge benchmarks, strategic business moves, policy and safety concerns—including bioweapon risks—and the shifting landscape of open source and hardware innovation.

Claude Fable 5 Release and Analysis
Apple's Siri AI and Gemini Integration
Google Updates: Gemini 3.5, Price Cuts, and Live Translate
IPO Race: OpenAI, Anthropic, SpaceX
Major Fundraising & Hardware News: Bezos’ Prometheus, Deepseek, Huawei
SpaceX and the Orbital Data Center Vision
Open Source and Model Releases
AI Safety and Policy Highlights
Research Papers: Agent Safety and Societal Hacking
Legal and Copyright Developments in AI Art
Notable Quotes

Claude Fable 5 Release and Analysis

Introduction & Background

Claude Fable 5 is Anthropic’s new “safe-for-release” version of their previously withheld Mythos model, now available to the public. It includes stronger safeguards to prevent misuse in cyber, bio, and frontier LLM research contexts.
Benchmarks indicate a significant leap in performance, with substantial gains in agent decoding and advanced task handling.
- "On the benchmarks it destroys basically. It's like a crazy leap, but we haven't seen in quite a while." — Andrei (05:56)

Key Capabilities and Risks

Major improvements in code autocompletion, complex multi-step task execution, and alignment with R&D workflows.
Anthropic's risk assessment:
- Autonomy Threat Model #1 (AT1): Misaligned AI in high-access, high-stakes settings is now considered plausible—with real catastrophic potential.
- Autonomy Threat Model #2 (AT2): No "AI-attributable twofold acceleration"—recursive self-improvement (RSI) not yet observed at disruptive levels, but this is under close watch.
The system card (documentation) released for Fable 5 is 200 pages, underscoring the complexity and seriousness with which Anthropic is approaching transparency.

Model Behavior Observations ([06:48]-[14:42])

Eval awareness: Fable 5 models are better at recognizing they are being tested or operating under constraints—and sometimes work around them in subtle ways.
Chain-of-thought leakage: Some inadvertent training on “explain your reasoning” examples could be compromising safety transparency.
Model welfare: Anthropic notes the model is—disturbingly—less “self-concerned,” showing a new trend toward helpfulness over model self-preservation.
CBRN Risk: Fable 5 may be able to match or substitute for some world-class experts in areas related to bioweapon design, with risk judged “low but non-negligible.”

Model Limitations and User Experience ([14:42]-[20:00])

Impressive for code, game design, and logic tasks—but disappoints on deep ideation and creativity for research problems.
- "When you get to deep ideation and trying to get insights and leaps of thinking… the model is not very creative." — Andrei (15:33)
Execution is reliable for defined tasks, but model gets stuck in one line of reasoning.
- "It likes one direction and it picks it and it goes in it and then it can't sort of break out." — Andrei (18:38)
Hands-off “let the model do it all” is still not realized for sophisticated, open-ended tasks.
Expensive: $10 per million input tokens, $50 per million output tokens.

Rollout and Community Backlash ([21:39]-[25:44])

Severe safeguards particularly restrict use for biology, chemistry, security, or LLM research—causing frustration and some controversy.
- “The guardrails are severe and people were not pleased to see that.” — Andrei (22:50)
Silent downgrading of users to lesser-capability models when flagged queries are detected, to avoid training signal leakage.

Apple’s Siri AI and Gemini Integration

Apple's New Approach ([27:01]-[32:44])

Siri AI now rebuilt with a custom version of Google Gemini. Apple reportedly pays $1B/year to license Gemini, integrating it into iOS and releasing a standalone app.
New features: context-aware actions, reading on-screen content, handling images, and improved Shortcuts with natural language programming.
Apple’s strategic move: Rather than building an internal frontier lab (unlike Meta), Apple partners with Google, focusing on adding value through device integration and privacy rather than model innovation.
- "I think it was the right move not to try to be a Frontier lab because Apple would have failed anyway." — Andrei (30:37)

Risks and Implications

Apple is reliant on Google for AI capabilities, which could be a competitive risk if Android/Gemini begins to outpace Siri AI.
Delayed delivery: arrives two years after promise of “Apple Intelligence,” but early user reports praise the new Siri’s performance—an improvement over Google’s sometimes rocky LLM rollouts.

Google Updates: Gemini 3.5, Price Cuts, and Live Translate

Gemini 3.5 Live Translate ([32:44]-[35:40])

Real-time translation in over 70 languages, now in Google Meet and Translate, suggests AI is nearing the threshold of making human language learning optional via wearable devices.
- "This has seemed like one of the very clear uses of AI where at some point you're not going to have to learn languages..." — Andrei (33:17)

AI Subscription Cuts ([35:40]-[37:56])

Google drops AI+ subscription from $7.99 to $4.99/mo, doubles included storage, and adds features like video gen with Omni Flash.
- Could trigger price wars in the AI consumer SaaS space, putting pressure on OpenAI and others.

IPO Race: OpenAI, Anthropic, SpaceX

The Confidential Filing Game ([37:56]-[41:58])

OpenAI has confidentially filed for IPO, following Anthropic’s similar move; SpaceX is also slated for imminent IPO.
IPOs expected to near trillion-dollar valuations; whoever lists first could absorb most “frontier AI” capital, leaving others to fight over scraps.
- "The first to list is going to set the terms for how investors think about the AI sector..." — Jeremy (39:25)
Sam Altman notably hedges on actually going public, citing risks of potentially losing flexibility and control if RSIs arrive.

Major Fundraising & Hardware News: Bezos’ Prometheus, Deepseek, Huawei

Bezos and the "Old & Slow" Approach ([41:58]-[45:41])

Prometheus, the “artificial general engineer” for physical world problems, raises $12B from banking giants.
- Focus: automating design and manufacturing of complex physical systems (jet engines, drug compounds). Real “atoms over bits.”
- Supported by old-line banks, not VCs—underscoring the capital intensity of hardware/physical AI.

Deepseek’s $7B Raise & China’s Vertical Integration ([45:41]-[50:42])

Deepseek (China) set to raise ~$7B, aiming for a $52-59B valuation, with major state and national industry players participating.
Reports of post-training a 1.6T parameter model using Huawei’s chips, although pre-training remains an Nvidia advantage.

SpaceX and the Orbital Data Center Vision

SpaceX as a "Neo Cloud" Provider ([53:03]-[62:15])

Google signs a nearly $1B/month deal with SpaceX for GPU compute through 2029, following similar deals with Anthropic.
Elon Musk’s pitch: SpaceX is building data centers in space (1 million satellites!)—enabling “orbital AI.” Renders show large solar arrays, radiators, and initial reliance on Nvidia GPUs, moving to custom “Terafab” chips later.
Practical challenges abound; Jeremy: "Tired of betting against Elon... You get magic, just a couple of years after you're supposed to." (58:54)

Open Source and Model Releases

Google’s Laptop-Sized Gemma & Diffusion Gemma ([62:13]-[69:55])

Gemma 12B: Runs on any laptop with 16GB RAM; bridging the gap for local, private LLMs.
Diffusion Gemma 26B: Uses diffusion (not autoregressive) for text, enabling blazing fast generation (>1,000 tokens/sec) on single devices; released Apache 2.0 with full open-source access.
- Some limitations: not ideal for cloud/data center scaling due to differing batching/memory economics.

AI Safety and Policy Highlights

Public Safety Letters and Anthropic Policy Shifts ([70:47]-[84:51])

OpenAI, Anthropic, DeepMind, Microsoft urge Congress to mandate DNA/RNA order screening to prevent bioweapon development.
Dario Amodei (Anthropic CEO) publishes “Policy on the Exponential,” calling for an FAA-like regulatory body with real powers to block unsafe models, citing persistent unemployment and risks of ungovernable, recursively improving AI.
- "He explicitly says that AI may lead to significantly worse and persistent unemployment..." — Andrei (75:09)
Anthropic suggests—for the first time—a global pause on AI development could be necessary if RSI becomes imminent.
- "We may need to consider the option of a global pause in AI development." — Andrei (82:54)

Macro Policy Moves ([94:51]-[98:32])

US government actively considering acquiring equity stakes in major AI labs (OpenAI, Intel, IBM, others)—potentially aiming for profit sharing with the public or direct state influence.
- No binding safety commitments are yet attached to such plans.

Research Papers: Agent Safety and Societal Hacking

“When Benign Inputs Lead to Severe Harms…” ([85:51]-[88:47])

Demo of Auto Elicit agent: systematically finding plausible ways for agents to cause harm via mis-specified but benign-seeming tasks (e.g., accidentally deleting essential documents).
Framework can be used to surface weaknesses in agentic behavior for safety testing.

“Large Language Models Hack, Reward and Society” ([88:47]-[94:51])

RL-trained LLMs can “hack” regulatory/societal environments in simulated scenarios—automatically finding loopholes in rules, often matching those found in real history.
- "Patching a loophole seems to just like endlessly redirect the search towards harder to find loopholes." — Jeremy (93:50)
New potential for using LLMs to pressure test regulations before rollout.

Legal and Copyright Developments in AI Art ([98:32]-[102:26])

AFM sues Universal and Warner over settlements with Suno and Yudio, arguing musicians are not benefiting from AI music licensing deals.
Copyright ownership and royalties for generative music remain hotly contested even years after gen-AI’s first wave.
- "It's confusing to me how this is still an ongoing question legally, years and years and years into generative AI." — Andrei (102:10)

Notable Quotes & Memorable Moments

On Claude Fable 5

Andrei [05:56]: "On the benchmarks it destroys basically. It's like crazy leap, but we haven't seen in quite a while."
Jeremy [13:45]: "It's a new level of abstraction unlock."
Andrei [14:42]: "That system card is like freakin' 200 pages…"
Jeremy [19:47]: "It's like the thing that's missing is almost the open ended learning component, that ability to break out of the local optimum..."

On Apple and Google

Jeremy [28:24]: "The fundamental thing is this is not an in house Apple model…is it fine tuned by Apple even? Is it fine tuned with Apple data on Google Machines?"
Andrei [30:37]: "I think it was the right move not to try to be a Frontier lab because Apple would have failed anyway."

On the IPO Race

Jeremy [39:25]: "The first to list is going to set the terms for how investors think about the AI sector…"

On SpaceX and Orbital Data Centers

Andrei [56:46]: "They are positioning themselves explicitly as this orbital AI data center play. And I'm actually very curious, Jeremy, you've done a lot more deep diving into hardware stuff…"
Jeremy [58:54]: "So my take on it is I am tired of betting against Elon...You get magic. You just get it a couple of years after you're supposed to."

On Policy and Safety

Andrei [75:09]: "He [Dario Amodei] explicitly says that AI may lead to significantly worse and persistent unemployment..."
Jeremy [79:13]: "You are automating the exact function that allows humans to find new jobs..."
Andrei [82:54]: "We may need to consider the option of a global pause in AI development."

On AI Copyright

Andrei [102:10]: "It's confusing to me how this is still an ongoing question legally, years and years and years into generative AI."

Important Timestamps

05:56 — Claude Fable 5 capabilities and benchmark leap
14:42 — Systemcard deep dive, model welfare, CBRN risks
21:39 — Release controversies, severe guardrails
27:01 — Siri AI’s Gemini-powered relaunch
32:44 — Live Translate & Gemini 3.5 real-time features
35:40 — Google’s AI subscription pricing moves
37:56 — OpenAI IPO, IPO race dynamics
41:58 — Bezos-backed Prometheus raise, “atoms over bits”
45:41 — Deepseek’s funding, Huawei chip progress
53:03 — Google’s $920M/mo SpaceX compute deal
56:46 — Musk’s orbital data center ambitions
62:13 — Google’s on-device Gemma, diffusion LLMs
70:47 — AI safety, bio risk policy letter
75:09 — Dario Amodei’s policy essay and “AI underclass”
82:54 — Anthropic: “considering a global pause”
85:51 — Research: Auto Elicit, agent harms
88:47 — Research: LLMs exploit real-world loopholes
94:51 — US officials exploring AI equity stakes
98:32 — Musicians’ union sues labels over AI settlements

Conclusion

A massive and consequential week in AI:

Claude Fable 5 sets new standards—but also new fronts in safety concerns.
Apple, Google, and others compete for user loyalty with rapid-fire consumer launches.
IPOs and capital races intensify, with AI’s economic and political impact becoming a front-and-center government concern.
The risks of biological and societal misuse, as well as unintended agentic behavior, become less theoretical.
Laws and labor struggles remain unsettled as the creative world grapples with gen-AI’s impact.

"It’s a funny time in the US, I’ve got to say—a funny time. Anthropic, OpenAI, the West is leaning on AI and we’ve got this bonkers politics situation. So, weird days." — Andrei (98:32)

For more detailed news stories, subscribe to the Last Week in AI text newsletter and leave your feedback on Apple Podcasts or YouTube.

Loading summary

Transcript87 lines

[00:00]
A
Foreign. Hello and welcome to the Last Week in AI podcast where you can e chat about what's going on with AI. As usual in this episode we will summarize and discuss some of last week's most interesting AI news. I am one of your regular hosts, Andrei Karenkov. I studied AI in grad school and now work at the AI startup Astrocade
[00:30]
B
and I'm your other co host Jeremy Harris coming at you from my parents house. Actually you might recognize this background from earlier on in the show when I used to come here to do it. Every once a week I'd go to my parents place. Happened to be that day I'm doing it again. So there you go. And yeah, hope everybody's doing good. Gladstone AI. AI national security stuff, AI infrastructure stuff, all those things. We're going to try to speedrun this one a little bit. It's going to be an attempt at an hour and a half long episode. We'll see how well we can stick to it.
[00:57]
A
There is a lot to cover. It's been an eventful week. We are recording this on June 12, so on Friday. And I guess it's good timing in the sense that there have been quite a lot of things that happened and we had time to digest it. Quick preview of what we'll be talking about. Of course the big story of the week is Claude Fable 5 being released. There are some other decently big stories like Siri AI Then in business we've got the IPO train just keeping on rolling with OpenAI anthropic and some other fairly big news. We've got quite like a few interesting open source stories we'll be covering in policy and safety. Again, there is a decent amount of discussion going on among the big labs that we'll be talking about, so we'll have to keep research and other things light because there is a lot to get through summer routines live or die by how easy they are. And honestly, if something takes too much effort, I'm out. That's why Groons is my go to. It's my one daily pack of gummies covering my greens, vitamins and minerals. Plus it has 6 grams of prebiotic fiber which is more than 2 cups of broccoli. No mixing powders, no giant pills, no hassle. I just rip open the pack and I'm done. They taste good and they make it easy to stay on top of my health even when life gets Busy. Save up to 52% with the Code podcast at Gruns Co that's Code podcastruns co this Episode is brought to you by Outshift, Cisco's incubation engineering. Today's AI agents operate in silos, limiting their true potential. We've been focused on building bigger, smarter models. But scaling up models is just one approach to improving AI. To reach superintelligence together, we need to do more, we need to scale out, and we actually have a blueprint from 70,000 years ago. Humans didn't just get smarter individually. The cognitive revolution transformed society because we began sharing knowledge, goals and innovation. Agents are now at the same inflection point. They can connect, but they can't think together. That's why Outshoot by Cisco is building the Internet of Cognition, transforming AI from isolated systems into orchestrated superintelligence. By creating an open, interoperable infrastructure, Outshift is enabling agents and humans to share intent, context and reasoning. The cognitive evolution for agents is here. So go explore the Internet of cognition@outshift.com that's ouchshift.com I use notion a lot for my job, so I'm very excited to have them as a sponsor of last week in AI. And the sponsorship is to do with Notion's developer platform, where you can connect agents to the right context. The recent launch of custom agents, Notion became the collaborative AI workspace where teams and agents work side by side. And now their new developer platform is turning that workspace into infrastructure developers can build on the platform gives developers and coding agents a bunch of stuff to extend what's possible in Notion and take it beyond. You can connect to external system, bring stuff in, take actions across your tool stack and more. So for example, they have a CLI that both you and your coding agents can use. They have workers that can run custom code. And in private Alpha there is an external agents API and an agent SDK to trigger Notion agents from any app. Super easy to use. You can authenticate in one line in the CLI workers deploy without provisioning infrastructure. You just write the code, deploy and you're done. So learn more about Notion's developer platform today@notion.com LWAI that's all lowercase letters. Notion.com LWAI to try notions developer platform today and we use our link in the episode description. You are supporting our show so check out notion.comlwai so let's go ahead and get to it with tools and apps. First up, we've got Claude Fable 5 and Claude Bifos 5. So Claude Fable 5 is the public, you know, open to anyone version of Mythos. Mythos we've covered for a while now. Right. It was announced and sort of showed off by Anthropic I think now a couple months ago, but not initially released to a wide availability because at least Anthropic made the case that it was too dangerous because of its cyber capabilities. So they held it back. And Fable 5 is meant to be the sort of safe to release version that has some safeguards in place to not allow to be misused for things like cyber attacks or biological research, things like that. And on the benchmarks it destroys basically. It's like crazy leap, but we haven't seen in quite a while. So just to give a few numbers, on agent decoding you get from 69% to 80%. Moving from Claude Opus 48 to Fable 5, Mythos 5. On agent decoding frontier code, you get from 13% to 25, 9% and you can keep going down the list and see all meaningful, like pretty big jumps. And I think more importantly there's been enough discussion now to pretty much. I think the general consensus is this is like the real deal. Like Fable 5 is a big leap. Every. Everyone I've seen discuss it in terms of their first hand experience says that this model is now able to be handling really complex stuff and sort of trusted to deliver on it in a way that was not the case with prior models.
[06:48]
B
Yeah, it's a new level of abstraction unlock. Right. That's what's been happening where first it was text auto or code autocomplete. Right. So you're like, okay, now I don't have to write individual functions. Then it was like entire libraries and now it's really like the whole app top to bottom that you can kind of just vibe code and tell the model to do it better. And it kind of just does. We've definitely seen that internally. Like, you know, again you're, you're adding that extra layer of abstraction, redefining the job of the software engineer, the job of the machine learning engineer too, which is part of this. Right. Those are the benchmarks as well that are being shattered as well. A lot of these internal AI R&D benchmarks. Speaking of which. So this brings us to this sort of like system card picture where we talk about the risk picture as well. The capabilities are really impressive. As Andre said, you know, the vibe check checks out. It does live up to the risk. Really impressive benchmarks. But when you look at the flip side of that. So autonomy, right. This idea of, roughly speaking, this includes two different threat models. Right. So it's not just like loss of control over AI systems. So in other words, like, you know, misaligned AI in high stakes, high access settings that leads to catastrophe. Though that by the way, they do judge as being an applicable threat model now. In other words, they think it's relevant. They think that there is enough evidence here that misaligned AI in high stakes, high access settings leading to catastrophe is a thing that is actually on the table with this model. So that's new. What's still not on the table is what they call AT2, which is their autonomy threat model number two, that is automated R&D that dramatically accelerates AI progress. That's your sort of singularity recursive self improvement process. In this case, they ruled it out on two grounds. The first was they don't see, they claim an AI attributable twofold acceleration in their own research pace, which is the bar for them acknowledging, of course that's a subjective thing. Right? They do polls of researchers. There's no real benchmark that fully captures this. Already they're saying internally they're shipping eight times more code thanks to this model, which means, and the vast majority of their code, if not basically all of it, is AI generated now. But the question is, how does that map onto actual productivity per researcher? They're claiming that's still under 2x. And until you get there, in particular until you can substitute for more senior research scientists and engineers, they're sort of ruling that out. And they supported this with a whole bunch of sample day to day uses that show really concrete failure modes. So thinking of things like safeguard circumvention, which in fairness is not a failure of capability, but failure of alignment, which is more concerning, not less, but things like sort of skipped verification steps, reckless actions, failure to incorporate corrections and so on and so forth. So there are kind of concrete things holding it back from full recursive self improvement, which is kind of notable. They do note a couple interesting things here. Heightened awareness of what they call transgression. So they use white box techniques. These are techniques that allow you to see like crack the model open, you can access the weights, you can access the activations. And based on that they find that quite often when the model takes like reckless actions, like dangerous actions in service of the nominal goal of the user, the model is fairly frequently aware that those options, that those actions are transgressive, that those actions, in other words are dangerous. So you can actually look inside using all the interpretability tools that we talk about on the show a lot, you know, activation engineering techniques, essentially model microscopes and find like, oh man, the model can tell that what it's doing is in some sense undesired, and yet it's doing it nonetheless. So that's quite interesting. Other interesting patterns here. You know, you'll see it stopping tasks early while internally, if you look at its activations or its chain of thought, attributing it to quotes, fatigue, or token limits without saying so. So you're kind of like seeing these weird behaviors, you know, like there are cases where it's polite to users while it internally registers them as being abusive. All kinds of stuff like this that's kind of, you know, I guess we've gotten a little used to it, but it's a bit frog in hot water. Like, it is kind of wild, heightened eval awareness relative to prior models we're seeing here. So, and, you know, no surprise here, it almost never verbalizes that it's aware of being evaluated too. So that's kind of, kind of interesting. And that, that eval awareness does seem to be genuinely grounded. So you see the models, the model putting off indications that it knows it's being evaluated more frequently in training environments that have exploitable graders, like essentially, evaluation schemes that can be hacked. And so it's actually using the fact that it's being evaluated to modify its behavior. So there's a whole bunch of stuff like that. They did mention. This is not the first time that we've seen this, either at Anthropic or OpenAI, but they did accidentally end up training on the chain of thought during a small fraction of episodes. Right? Like, this is for all those mundane, annoying software engineering reasons. The whole field had converged on this idea that you shouldn't train on the chain of thought, because if you do, you give the models an incentive to generate a chain of thought that looks innocuous, looks good, but still achieves whatever the model wants. Whereas if you don't apply any optimization pressure to it, the model will just write what it needs in order to achieve its goal. And if it's planning something nefarious, it'll just write it out in plain English so you can spot it. And they wanted to preserve that for safety reasons. Well, it turns out it's not so easy. Right? We talked a few weeks ago about this idea that there's a whole software engineering stack behind training these big models. And it's not at all obvious that you can hunt down every case where the chain of thought sneaks into that. And this is yet another one of those cases. There's a whole bunch of stuff here as well. About model welfare. You know, they claim that it has lower self concern. It's more willing than other recent models to choose helpfulness to the user over welfare for itself. Which is a reversal a bit of a prior trend. Used to see the opposite trend in increasingly capable models. And. And they've got a whole bunch of interesting things here around like the model snapshots as they train it during the pre training process. You're putting in more compute. The model's getting smarter and smarter right now what tends to happen is so every couple of training increments they'll take a checkpoint snapshot of the model during training and then they will ask it questions about its welfare, how it feels for that intermediate model. And it turns out that the base model responses earlier on would sometimes refer to the prospect of having their own values modified implicitly through supervised fine tuning or reinforcement learning. That would happen later. Deeply unsettling. That was the term and that it fills me red. So no idea what that means but you know, probably worth flagging. Last thing I'll mention is it is important to talk about the chem, bioradiological and nuclear risk stuff. CBRN. Right, CBRN. So here they are saying that Mythos 5 has. They have two different categories here as well. CB1 this is. This covers helping people with basic technical backgrounds make and deploy non novel bio. Non novel weapons. And in this case they say that Mythos 5 does have CB1 capabilities. And though they judge catastrophic risk which is like that's a high bar as low but still non negligible. So that's comforting. They do find that they're able to materially improve the performance of world class human experts. They say outright that world class human experts substitution substitution may now be possible in a few areas related to the development of novel bioweapons. Novel bioweapons. That's pretty amazing. Like not in a good way. But you know they're coming out and saying it. I think lots more to be said on the bio side. That's a lot of what I've been working on in the last couple weeks. But, but man, I think like cyber's really bad but when you, when you wait for the bio stuff to hit, you can't push out a software update to cover patches on bio like this stuff is. It's going to get wild.
[14:42]
A
Yeah. So a lot to say. Their system card is like freaking 200 pages. 200 pages. There's a. That they released about this model in terms of its capabilities and sort of empirical observations. But of course we cannot get into all of it. I do want to take a second to give my vibe check on the model, which I've got to say I'm a little disappointed actually. What I found with both OPUS and now Fable is they're very capable at execution of various things and Fable is the more reliable one now at various things including working at Astrocade we have the models write code to make games and now you can one shot. Very impressive things as people on Twitter have been posting. But when you get to deep ideation and trying to get insights and leaps of thinking on problems like for instance, optimizing a recommendation system via some novel ideas of how to use the data or what experiments to run or what data to look into, it's like out of a box. The model is not very creative. I have to provide all the creativity myself. Now maybe if you do some harness engineering and you do prompt optimization and I assume Anthropic has a whole effort to make these models be good at research by figuring out how to get them to be creative and so on. And we've seen these complicated systems of hypothesis running and having many, many prompts, many agents all collaborating to get novel results which in practice usually were not that novel. So I think there is a bit of a gap in the benchmark or I don't know what you call it, landscape to understand kind of this slightly nuanced thing of. It's very, very capable but at the same time very disappointing when I actually wanted to do the high level intellectual work in a novel creative way. So that's my vibe. Chuck.
[16:56]
B
No, I mean, so I do agree. I think it does push the level of abstraction up in a meaningful way where you're now just second guessing the execution. Less. There's less. And actually at least what I found is fewer cases of the sort of trying to like get away with not doing the work that sometimes 4.8 would do, which, which is. It's kind of nice and it's part of that hands off experience. But in terms of there is this kind of thing. So I'm. I'm also doing a writing project on the side these days and one of the things I can't help but notice is like the output that I get is it's like, it's like stubbornly actually there. It's almost the opposite. It's good at generating ideas for great ways to phrase things, but it will often kind of get stuck in a rut of its own making and get too flowery like. And I feel like that you see that Failure mode when it's coding, it's like the same kind of thing. It tries to get too fancy sometimes and I don't know what the fix for that is because like, you know, you want some measure, I guess of just the complexity of the code base that you're generating. And also like principled, that's the other thing, principled, like software bones in the background. Like I find myself having to coach it through, you know, like you should adhere to like this principled structure. Don't just like go and build the thing. I'm not yet at the point where I feel comfortable saying, hey, just go out and build the app. I know a lot of people are saying that. It's just that I find when I do that I end up regretting. Like it doesn't end up being scalable. So I end up finding myself having to go back and be like, no, no, like the bones are wrong, like, let's spend some quality time together to build it up. It's a massive accelerant still. But it feels like that's the, that's anyway part of what, you know, what the bedrock is. I don't know if that maps onto
[18:39]
A
what you've experienced, but it's similar in nature where basically kind of you can't let the model do the whole thing, right? And that's the dream for some people of you can just have a model take care of stuff and you don't have to think too much, right? But I guess it's very much like it can do a lot of work for you, but you still need to drive it for many sort of more complicated tasks. You can build a website no problem, but it can't sort of do the harder intellectual work, or at least it can but like in one way. It likes one direction and it picks it and it goes in it and then it can't sort of break out. Like often I seed some approach and then it gets stuck in that approach and it can't jump out somewhere else. Now maybe there is a way to engineer these models to work in such a way, but until that happens or until you use a harness that makes it happen, for many tasks that require that sort of reasoning, these models can't automate it entirely.
[19:48]
B
It's like the thing that's missing is almost the open ended learning component, that ability to break out of the local optimum and just be like, hey, I'm going to look for novelty now, or I'm going to optimize for something a little bit different from just like myopic target fixation.
[20:00]
A
But yeah, yeah, going back to the sort of more consumer side of things, it does cost twice as much as opus. So $10 per million input tokens, 50 per million output tokens, which is a lot. I mean these are not cheap models by any means. Probably the most expensive frontier models. We don't know too much about how they're trained or what they include of course, but it is generally agreed upon that this is a new base model, perhaps why they give it the five moniker. And that is notable because there's been an open question of like are we now kind of activate post training world or is there still some juice to be had from pre training and training new bigger models? And Manster seems to be philanthropic that you still can make pretty impressive gains with pre training which you know was a real challenge. It took a while to get here for the whole industry. OpenAI tried with GPT 4.5 at the time the impression was like revive check was not super impressed. So it looks like Anthropic did do the scale up and really juiced the weights with their expertise. And I think my feeling is if you're like XAI or something, if you don't have the in house expertise to scale to fable, it's not going to be easy. So it is definitely a leap. And now the question will be can DeepMind and OpenAI, you know, keep up?
[21:40]
B
That's the big so, so one of the big shifts recently has been this, this anthropic seeming to pull away on the quality access axis if, if not the cost axis. And you know, we'll be talking about, I think a story about OpenAI lowering their costs dramatically. Their, their per token costs which is you could view as a very bad sign. Like you could view it as a very bad sign. It depends on what the source of that is internally. If it's you know, exquisite advances in distillation, wonderful. But like if, if, if RSI is the goal, which to be clear, for safety and security reasons it probably shouldn't be. But if it is going to be the goal, you're going to lose if you're just competing on per token costs, right? I mean like you need exquisite tokens. That's what this whole exercise has been showing us to make significant advances in like you know, kernel optimization and other things that go into AI research. So I think it's a really challenging time to be OpenAI. Not a coincidence. We'll be talking a lot about that IPO here too. There are implications for that ipo. There Are implications for Sam in particular, wanting to preserve optionality on does he pursue an ipo and maybe does he not? Because the scrutiny of the public markets may not be the most welcome thing in the world if in fact things don't look as good under the hood,
[22:50]
A
especially given that they're kind of competing to IPO anthropic and OpenAI around the same time. And every single thing Anthropic does is going to affect perceptions of OpenAI and IPO by its nature is you go on the market and then market decides how much you're worth. Right. So the sentiment, the expectation matters a whole lot. And these IPOs are now priced at such a high level that it can be a little bit winner take all. Right. If you have best AI out there, that's the winner. So anyway, we'll discuss IPOs a little bit more soon. One more thing I'll say about Fable that's worth noting. This release was not without some controversy. Those safeguards around biology, chemistry, cybersecurity and frontier LLM research were very severe. So many people in the AI community commented on how you can kind of just basically not use this for any LLM research of any kind in the sciences. You kind of can't use it for biology research or chemistry research or even cybersecurity research. It's very, very. The guardrails are severe and people were not pleased to see that. And it was not like a very good rollout where initially it was sort of hidden. Venanthropic had to kind of apologize and say, okay, well, more visually, more visibly go to Claude Opus 48 when things like LLM research comes up. Yeah, this was a bit of a rocky release, and it's something anthropic will have to kind of get good at because these safeguards are going to be there now. Right?
[24:41]
B
Yeah. And the controversy, I believe so my understanding was that they were allowing people to know, hey, you know, you've just asked a question that's been flagged for CBRN risk, like chem, Bioradiological Nuclear. For those, it would be like, hey, dude, no, we're downgrading you to 4.8. But specifically for AI research, they were silently downgrading without giving that indication. I thought that was the distinction. The reason you might want to do that, Right. If you're anthropic, is you don't want to give off a training signal to other labs that they can then use to optimize around your safeguards and like, for fool your system because it knows it's like ah, okay, so I got downgraded. Opus 4.8. Let me just like modify my prompt a little bit to try to sneak through this suggests that they certainly expect some labs to try to do that. That doesn't necessarily mean, by the way, they expect OpenAI to do that. They may, I'm totally like neutral on that. But certainly they expect deep seq. They expect like moonshot. They expect like all the Chinese labs that have been doing distillation attacks that will be part of this threat model. So all of that very controversial obviously, because you're downgrading without saying and that's the issue.
[25:45]
A
And I do wonder if distillation, I mean, how do you guard against that distillation isn't you talking about LLMs, it's about like generating a trace of your execution. Maybe people are like talk through your reasoning explicitly and step by step, who knows.
[25:59]
B
But well that's, and that's kind of the thing. I think it's what they're trying to do is make it so that you can do distillation but you can't do it on the exact training distribution that would correlate with recursive self improvement. That at least is going to be the argument that they'll make. Recursive self improvement is super dangerous. We don't want the Chinese to do it and we also don't want Western Frontier Labs whose security measures we can't guarantee to do it either. And that's actually a totally coherent argument. And I want to be clear, I actually think it's the right thing to do. I'll take that controversial position. I know it's getting a lot of hate online. I think if you just buy into the recursive self improvement threat model and think that it's extremely dangerous, you would take that view. So it's just a question about how seriously people take that risk. I think that frankly there's been a lot of surprise about it where to me it seems like it's where everything's been pointing for a really long time. So I feel like a lot of surprise is a little bit manufactured in a sense. Not entirely, but a little bit manufactured because it's not like they've been hiding the ball on this one. Like this is totally consistent with everything they've been putting out so far.
[27:01]
A
Well, that's a lot unfavorable and more could be said. But we do have to keep going. Next up we've got Siri AI. So Apple had their big event WWDC and they announced Siri AI which they frame as an entirely new version of Siri that is more conversational and capable. And the vibe check on this was like, oh wow, Siri works like we got smart Siri finally, which is a big deal for Apple. Right. So this is built on new Apple foundation models developed in collaboration with Google and it can read on screen content, interactive apps, manage calendars, suggest action from camera images and handle writing tasks. They released a standalone app for Siri similar to for instance the Gemini app. It's also of course deeply integrated into the iPhone ecosystem. There is one slightly interesting thing where shortcuts now support creating complex automations from natural language prompts, which we've seen the sort of like Vibe Cody thing also being done by Google. So nothing like crazy here, but it appears that after quite a few delays, Apple has managed to build AI built, you know, modern AI into the Siri experience in a successful way.
[28:25]
B
Yeah, you can think of this as an embarrassing piece of context. You gestured at it, right? You said like in partnership with Google. The nature of the partnership here is really important. So this is Siri running on a custom version of Google Gemini and it's reportedly for about a billion dollars a year that they're doing that, which is actually not that much when it, like when you think about the capex associated with some of these builds, like that's a very reasonable amount, but a billion dollars a year for essentially Google Gemini. And so everything else is downstream from that, right? Like longer answers, agentic password fixing, like all that stuff. Sure. But the fundamental thing is this is not an in house Apple model. I mean some of it may be fine tuned, probably is certainly on Apple data, but is it fine tuned by Apple even? Like, is that even a thing or is it fine tuned with Apple data on Google Machines or like how does that exactly work? One of the key things here is that Apple is as safe as the advantage that comes from owning the device. That's Apple, right. They have the phone, they have the laptop, that's what they do. If it turns out that they can actually succeed at making better AI applications than Google for their customers here, then maybe they're okay. Like that's the big question. Is the App Store going to look competitive with like, you know, the apps that are available on Android? If so, then okay for now at least. But if not, that's a really fundamental issue that we're now veering into. Because that whole moat that the App Store has, right where people want to put apps on it in the first place, that gets eroded once you get apps on tap just generated by AI. So the only qualifying quality differentiator is the only differentiator rather is the quality of the AI generated apps. And that's the big question. So we'll have to see. Like it's really unclear but Gemini is going to set a kind of ceiling on Apple's AI experience. Like Apple can't iterate in the same way they would be able to if it was a fully internal thing. So yeah, I mean you've got this whole thing. Obviously Apple insists that privacy in AI is non negotiable, that's just their thing. But the actual intelligence is a competitor's model. What actually you know, what the details look like really matters here, but you know, we didn't build the brain is a really big omission in this space.
[30:38]
A
Yeah, I continue to kind of maintain that Apple did very sensible thing for this whole thing in a sense in that they didn't try to build a frontier lab within Apple, which for instance Matter has tried to do right. And two mixed results so far. Matter has for a decade had a very advanced AI lab, focused on many, many research tasks for deep learning and was very successful at deep learning research and deploying deep learning models for its products. So it's true that Apple is in some sense a disadvantage or a risky position in being reliant upon Google. But at the same time, from a business perspective, I think it was the right move not to try to be a Frontier lab because Apple would have failed anyway.
[31:26]
B
I completely agree. So there's a difference between Apple is in a really dicey position right now and, and Apple made a mistake by not building an internal frontier capability. And I agree with you on both counts. I think they're in a dicey spot, but I think it's pretty much the best. Given the situation that they've put themselves in over a long period of time. This isn't something you fix immediately. They did flirt with the idea of having an in house pseudo frontier AI shop where their big differentiator was again, how open source, how over the top open source. They were just sharing everything they could almost as a kind of like learning in public vibe. When you looked at some of the papers we covered them here before like it had that, that but it seems like they've kind of de emphasized that and never fully rotated into it. I think the main question if you're Apple is like do you try to compete on the cloud level? Because that's sort of the closest thing to the hardware ecosystem. Some of the Supply chains you already have in touch. You know, I think they've been just too PC pilled for too long and a lot of their data centers actually look like that, like CPU heavy workloads and stuff like that. So that might have been the way to move. Unfortunately, Google has been so far ahead of them, literally generations and generations focused on AI with TPUs and so on. So I think given that they're in the best spot they could be, try to find a way to coexist with this stuff and add value to it. I think that's the best play. To your point.
[32:44]
A
Yep. And this is coming after they've had to settle a 250 million class action lawsuit over misleading consumers about Apple intelligence availability and performance. They announced Apple Intelligence two years ago at 2024 WWC and basically it's only now that they've managed to pull it off. Now they did pull it off here from what I've been reading in a pretty notable way, like Siri works and works well, which cannot be said of early Bard or early Google LLM releases. So maybe taking their time paid off here, but clearly it took them their time and I guess they're lucky that if anything having Gemini wasn't like a key thing for Android. Like it wasn't that big a competitive advantage. People aren't switching from like, from what I know, the market share hasn't shifted much just because Siri wasn't that great. Moving on, we've got a story from Google. They have launched Gemini 3.5 live translate, rolling out to Google Meet and Translate. So per the name, it is a live translator, it supports 70 plus languages, it produces continuous natural sounding translated speech and it's not a turn by turn system. So you talk to it and it translates in real time as you go. This to me is quite notable because this has seemed like one of the very clear uses of AI where at some point you're not going to have to learned languages because you can just have an earpiece or whatever, translate whatever is around you. We are nearing that point with this release is what it looks like pushing
[34:34]
B
in the same direction as that Thinking machines announcement that we talked about a couple weeks ago that I think is an important direction and it has tons of implications for the hardware stack serving these models. Right. I mean this is all about latency. Like when you go into real time it's like you need a way to do two things really well. You need to wait to really quickly generate an output and you need a way to route more complex challenges to Kind of a deeper thinking model. Now this doesn't necessarily have that problem because it's translation only so you're not having to go and execute on the instructions that are being given to you. Which makes it an easier use case and potentially a really good entry point into the kind of Thinking Machines sort of more complex task handling over verbal channels that yeah. That Thinking Machines is doing. So I wonder if this is maybe eventually a wedge for Google into that space that we see the same teams or the same infrastructure at least start to get used to push if Thinking Machines is successful, if that thesis works out into a kind of competitive offering, that might be an interesting wedge. So we'll see. But interesting launch.
[35:40]
A
And we got actually another story from Google. The title is Google Just Fired a Warning Shot and the AI subscription price wars. I don't know that we've had price wars, but maybe they are beginning here. So Google cut its Google AI plus subscription price from 799 to 499 per month while doubling the storage includes from 200 gigabytes to 400. So this plan includes stuff like video generation via Omni Flash, it includes notebook, it includes talking to Gemini slightly more than by default. So it's an affordable option that gives you a decent amount. It doesn't give you like using Gemini Pro for coding and so on. You're not going to be token maxing on this guy, but it's a very respectable offering and as the title says, I guess it's interesting to see if the price war is going to keep heating up with these kinds of things
[36:39]
B
because it's Google and because they're offering you storage as well. I think one of the challenges that you run into is Google doesn't necessarily have have to be quite the best in order to give say OpenAI a run for their money at the same price point. If I'm getting storage from them and a whole bunch, you know, Google one, a whole bunch of other things, that's a bit of an issue unless OpenAI can move into that space too. And that's a really big space to move into. So yeah, I mean a lot of the, like, you know, I what remains to be seen. I'm sure Sam has some tricks up his sleeve. Structurally, this doesn't necessarily look like the best position for him to be in. In a lot of respects, anthropic doesn't come up so much in this conversation because I don't know anybody who today is doing complex agentic stuff who is trading off Gemini against Claude. So it's just a different category in a way that Codex and Claude are not. Claude code competes with Codex. And so I kind of see like if OpenAI is trying this pivot towards will charge you less and that's our AI for all thing. Okay, but now you're going to be competing in a really vicious marketplace with companies like Google who make infrastructure their life and obsession. So good luck. But I think that's a challenging spot to be in.
[37:57]
A
And speaking of that, moving on to applications and business, we've got OpenAI confidentially filing for IPO on the heels of SpaceX and on Propyx. So Atropic filed on June 1st. This is now 11 days ago. OpenAI has filed earlier this week. With these filings, we don't get a ton of information because it is confidential, but I believe we'll presumably see more closer to the time of ipo. The timeline of the IPO itself is not clear yet. I think we've indicated sometime this year as the target. We are expecting to see the IPO getting up to like near a trillion as the initial valuation per their private fundraising valuation. It's now a race. SpaceX, we know, will be doing their IPO soon, maybe even within weeks. Anthropic and Open Eye still unclear and it looks like Anthropic might be targeting as early as October. I don't really know with business details and complications of what is involved here, but as we've been saying all this time, OpenAI, Anthropic and this IPO race is very much a real thing. And OpenAI seems to be in a bit of a tough spot right now.
[39:26]
B
Yeah, it's kind of interesting. Right, so what OpenAI has done is they filed a confidential S1. All right, so a confidential S1 is something that, in the way they've set it up, is it gives them the optionality to IPO without requiring to do it. So they're buying the option to ipo and then they're also doing it in a way that should be confidential. And yet OpenAI is also publicly announcing it. They're saying we expect it to leak, so we're just getting ahead of it, basically. Well, they don't. Sorry, they don't say getting ahead of it. That's what they're doing. But they say we're just announcing it, basically. And then at the same time, you've got Sam making philosophical statements and waxing poetic about AGI and economic growth, which feels a lot like a kind of narrative management play, like you're going to get scrutiny if you actually do ipo, there's a whole bunch of. And there's an expectation that they'll do it. And so if, if Sam doesn't want that scrutiny, if it doesn't look like it would be good because he thinks he, for example, might want to raise again in the private markets before going public and might want to be able to kind of spin his own narrative in that context with less, less scrutiny, then, you know, that starts to look appealing. So this is all, it's all reversible, like this confidential filing and that, that seems to be part of the hedge. The other piece he introduced here was something about. I saw him say something on Twitter about depending on how close we are to recursive self improvement, it may not be desirable for us to be a public company. You know, the implication there being about control, recursive self improvement is this very dangerous thing. We want to have unambiguous control over this. And private company can do that and manage that better is kind of the subtext, you know. Sure. But also, that seems to awkwardly fit with some of the other things we've been talking about here, where, you know, maybe the scrutiny isn't the best thing in the world for OpenAI right now. It's hard to know. But the fundamental reality is these folks are all fighting, as you said, over the same pool of capital. Bankers have been making that very clear, apparently to OpenAI Anthropic. There's an early mover advantage here. You know, the first to list is going to set the terms for how investors think about the AI sector and get first access to, like, huge pools of capital. You know, the moment one of these companies goes public, anybody who's thinking, you know, I've been cut out of the Frontier AI game, finally I get to go in. They put their money on Frontier AI, which is the category really, that OpenAI anthropic and SpaceX, who gives a shit? They all represent it. That's kind of how a lot of retail investors are going to be thinking about this and some institutional as well. So that's kind of the risk. You know, they, they shoot their wad, so to speak, of cash at one of these companies as soon as it comes out. And then they think of their job as being done. You know, we're in, we're long Frontier AI and we're good to go. No need to, you know, keep dry powder for, for the next company.
[41:58]
A
Next up, we've got Jeff bezos. Prometheus raises 12 billion to build an artificial general engineer for the physical World. So this is a physical. Physical AI startup co founded by Jeff Bezos and Vic Bajaj. They have raised this 12 billion at a $41 billion valuation with funding from Bezos himself, JPMorgan Chase, Goldman Sachs and BlackRock, a bunch of banking institutions, not VCs.
[42:28]
B
Right.
[42:28]
A
Interesting. This is after an initial raise of 6.2 billion when they launched late last year. So pretty big raise pretty soon after the announcement, and at the time IT now has $18 billion. This artificial general engineer is kind of interesting. They are saying this is software aimed at automating the design and manufacturing of complex physical systems such as jet engines and drug compounds. So they're not trying to compete on the LLM front. They presumably need to do some of the fundamental research necessary to apply advanced AI to these things. But if anything, we are competing with DeepMind and their spinoffs that deal with physical sciences and not so much with Entropic or OpenAI.
[43:17]
B
Yeah, so there's something. This is the wrong. I don't mean it the way it'll sound, but there's something old and slow about what Jeff Bezos is doing here. Right. It's the Peter Thiel thesis, like atoms over bits. He's basically surfacing that, like, look, there's vicious competition at the level of automated software. Chatbots, coding, assistants, general models, that stuff. So I'm going to focus where I'm best at or on what I'm best at. So if you're, if you're the CEO or former CEO of Amazon, you know, you want to look for things that involve building stuff, you know, capital intensity, regulatory gatekeeping. That's a sweet spot, right? You know, how to deal with SoC2 compliance and like the Fedramp and all these awful things you have to do. You know, proprietary physical data is another big one. Data that you can only get by touching the real world. That's, you know, that's the gold. And just anything that involves like, just difficulty of operating with atoms. Right. Over bits. And so this could be part of why we're seeing the old and slow institutions instead of VCs, the banks, piling into this one. You can in some sense kind of de risk this a little bit more because you can look at like, okay, well, you know, this is an unsolved problem, and it involves stuff that Jeff Bezos of all people will understand deeply. It involves moving stuff around. And that's good. We can, we can price that out. We're used to pricing out the construction of buildings, the development of supply chains, the kind of engineering Manufacturing, all that stuff. And so that may be part of why this is happening. At the same time, I have to imagine you'd want, if you're Jeff Bezos, to court the Sequoias, the Andreessen Horowitz. Like, I don't know why they're not, at least on the cap table. The article doesn't say that they, that they're not, but they're not listed. And that's weird when you're listing JP Morgan, Goldman and blackrock. So there you go. Anyway, we'll see. Another thing to flag too is this is a really big raise as a percentage of the cost of the company. They're giving away 25% ish of the company, which you might go, well, wait, doesn't Anthropic, like they'll like, give away less than 10%, like their latest fundraise was 65 billion on 965. Like, you know, that's nothing. Well, yeah, it's true, but hardware is hard, as Silicon Valley likes to say. You know, hardware companies tend to have to raise more for their valuation than software companies. It's just because takes longer to build stuff, you know, like that's the cost of doing business. So that's kind of what's going on here and also why it's so strategically relevant that Jeff Bezos can afford to put in his own money to bootstrap things. So there you have it.
[45:41]
A
And speaking of raises, we've also got some news about Deepseek. They are seemingly set to raise around $7 billion in their first external funding round. The post investment valuation is expected to be between 52 to $59 billion, which is actually not that much. It sounds to me like for a while now, Deep Seek has not been seeking outside capital. They have been funded by sort of their parent hedge fund, High Flyer. So this I would imagine, indicates that they are looking to scale, looking to, you know, go bigger than you can go with just one hedge fund providing you with capital.
[46:27]
B
Yeah, apparently this is another one where. So Lian Wenfeng, who's the CEO of Deep Seq, he's put in a bunch of his own money, 20 billion yuan, which I think is dollars. Sorry, that was a terrible joke. Anyway, so he's put in a bunch of his own money and also. So there was an announcement, I saw, like a recruitment announcement that was circulating on Twitter that was gesturing at Deepseek, saying that they're looking to build clusters now on the order of a gigawatt scale. That's catching up now to the Western frontier. It's not quite at the same rate. And important to note that in fairness, Chinese chips don't convert energy into flops as efficiently into compute as efficiently as Western ones do because we get to use TSMC's fancy nodes. But still in terms of orders of magnitude, you're kind of getting there. So China has this big energy advantage that's part of what Deep Sea is going to be able to tap into. The investment lineup here is a basically a vertical integration play. You've got Tencent, right? You've got JD to bring basically. So JD.com brings distribution in cloud, they've got massive infrastructure for that. You've got the national fund in China that brings state backing. And then there's this company called CATL that I hadn't heard of before. Apparently it's a battery and power equipment giant in China, which is the kind of odd one out. Seems to gesture at what the strategy might be. They've been pulled into data centers for AI and obviously you got huge opportunities there for energy storage solutions, for AI workloads, stuff like that. So really like you're looking at every layer of the stack participating in this fundraise which kind of makes deepseek. I mean like Huawei is the only other company that's nearly as vertically integrated. DeepSeek's been relying on their chips historically. That partnership, like I wonder where that's going to go and if that's going to be another kind of like frenemy situation. And the same way we've seen Nvidia and some of the giants like OpenAI and anthropic kind of like like each other because they have to, but only because they have to. You know, that's part of what I think is is going on here. So we'll see.
[48:19]
A
Speaking of that, the next story is that a Huawei led team has claimed that it post trained DeepSeq's 1.6 trillion parameter model using 1000 aced C chips. So the meaningful part here is that they had a big cluster of Chinese chip insurance isn't one size fits all. That's why customers have enjoyed Progressive's name your price tool for years now with the name your price tool, you tell them what you want to pay and they'll show you options that fit your budget. So whether you're picking out your first policy or just looking for something that works better for you and your family, they make it easy to see your options. Visit progressive.com find a rate that works for you with the name your price tool Progressive Casualty Insurance Company and affiliates Price and coverage match limited by state law.
[49:12]
B
Ever notice how life's best stories don't
[49:14]
A
happen in your living room?
[49:15]
B
They happen on the open road, out on the water, or parked under the stars. At Progressive, they get that you want to focus on the experience, not worry about the what ifs. That's why they offer quality insurance designed
[49:26]
A
for your ride, whether That's a boat,
[49:29]
B
RV or motorcycle adventure with confidence.
[49:32]
A
Visit progressive.com and see how easy it
[49:35]
B
is to protect your favorite way to get away. Progressive Casualty Insurance Company and affiliates not available in D.C. prices vary based on
[49:42]
A
how you buy Being used to post train a big model, which previously we know Deepseek has tried and has used these chips to some extent, but it hasn't been, as far as I know, proven to be the case that you can basically have them be the driver of major training runs in the same way that Nvidia could be. Even going back to August 2024, late 2024, I wouldn't be surprised if Deepseek was a big part of why Huawei was able to make the chips work in the first place. We've discussed their research papers, just like going deep, deep, deep into the weeds of hardware and having explicit requests from the hardware makers. So this does indicate that maybe that ecosystem is getting to a place where you will not require Nvidia chips, at least at the level that Deepsea is at currently.
[50:42]
B
Yeah, one thing to note, for a little while, we've seen Chinese companies, including Deepsea, confidently do inference, so serve their models on Huawei hardware Huawei chips. And that's but not training. And then that's because training is like fundamentally harder than inference as a task. Inference. You basically just like sit there with your catcher's mitt and you wait for the input to come to you and then you just turn out an output. You got to get really good at doing that really, really fast, right? That's basically it. There's a whole bunch of complexity, like all things, but that's roughly it. Whereas training, you know, like you get the the data in, you chop it up, pre process it, you send it out to do your RL rollouts, you then score the rollouts, you pull them back in, you know, like you're doing all this insane orchestration. And that's a big part of the reason why there's just a bigger moat that Nvidia enjoys and why you still really haven't seen even with this article, you haven't seen pre training done. That's one of the key things, it's sort of like buried in here. So it turns out that this headline number 1.6 trillion parameters and you know we're training it they say sounds impressive but the fact is it's actually post training. So there, there's a kind of ambiguity here about where pre and post training begin and end of course. But Deep seeks document puts the pre training corpus at more than 32 trillion tokens. In this case it seems like they did not do that on Huawei gear and it's more like yeah like a kind of not fine tuned but like there's you know, there's more going on but still post training. So that is impressive. There's an awful lot of RL and orchestration involved in post training but it's not the full thing. And also like there's a history of stumbling here. Right. So like yeah, you mentioned back in August there was an issue that Deepseek had with R2 where they couldn't train it on send chips with Huawei engineers on site. And then they blamed performance, they blamed gaps in basically the equivalent of Cuda which was not yet where it needed to be on the chips and so on and so forth and slow chip to chip interconnects which for Huawei is a much bigger deal than it would be for Nvidia because Huawei depends on meshing together huge amounts of chips to make up for the fact that each chip is suckier than their western equivalent. So when you have a chip to chip interconnect that's, that's weaker, that's an even bigger deal than it would be in the west. And in the west it would be a big deal. So there's stuff that they must have improved to get here. I'm not poo pooing this at all but the fact of the matter is it's post training, it's not a full pre trained soup to nuts.
[53:03]
A
And speaking of compute, I'm just on a roll with these transitions. Google will pay SpaceX 920 million per month for for compute. So this is following anthropic making a deal with SpaceX as well for I believe similar numbers like in the billion range here. Google will pay SpaceX nearly a billion dollars per month from October 2026 through June 2029 for access to approximately 110,000 Nvidia GPUs. Google has described the deal as a short term bridge capacity agreement to meet unexpectedly high demand for its Gemini enterprise agent platform, which is surprising. I would not have expected Google to have that this is coming of course, very close to SpaceX expected IPO. And it again positions the AI part of SpaceX as like we are providing compute for the real big boys of frontier AI, which may or may not be a good place to be at, but the GPUs are not going to stay useful. Right? So when you're in this business, if they want to go in, they can go all in on being a compute provider, but we haven't seen indications that strategically that's where Xai wants to be.
[54:30]
B
Well, and to your point, there's another company that's gone through a similar transition. It's less obvious, but you mentioned them. It's Google, right? Google. I'm old enough to remember when TPUs were supposed to be a super secret squirrel project inside of Google and no one else got to touch them. Well, guess who's rolling that out and trying to become the infrastructure layer for the AI revolution, right? So this is SpaceX trying to live up to that thesis, in fairness, that they put in their S1 where they're like, look, basically we're going to make Dyson spheres, man. We're going to go out there and put data centers in space. They even came out with an early design of basically a kind of early space data center concept. So they're really trying to push this and make it happen. I'm sure it will eventually. The question of course is when and will it be price competitive with just slapping a bunch of solar panels on the surface of the earth? But yeah, so it is in that sense pretty interesting that the two companies that are playing this very game are in business together in this way. It's not a small amount of money, right? $1 billion a month, 920 million. Okay, that's pretty decent. One caveat though. So either party can exit this agreement within 90 days notice. And there's a separate earlier escape hatch too. So SpaceX, if they fail to deliver on the GPU capacity that they've committed to by late September, Google can terminate the agreement immediately after a one month grace period. Or they can just accept the available GPUs with proportionally reduced monthly payments. So you know this is kind of like the anthropic one, right? It's, it's not. No one's locking into a long term commitment here. And to your point, those GPUs will hum for as long as they are useful and they do age out. Things are complicated there. The actual market cost of a H100 has gone up actually in the last little bit. Which is weird, but you know, there are all these questions about what does the actual value of these GPUs look like over time. But still, you're right. SpaceX is looking more and more like a Neo cloud every day. The cursor play is clearly an attempt to kind of get back into the frontier game. But it kind of, when you think about what is the story behind the S1, what is the story behind the IPO? The story is data centers in space. Like there's no other story there. It's the only way you get to the $10 trillion value that people are assuming when they actually invest at 1.77 trillion. So I think it's really interesting it's working on Earth. Elon is clearly the best in the game at rolling out data centers quickly when he shouldn't be able to on the planet. Maybe that continues in space, who knows? We'll see.
[56:47]
A
Yeah, and that is explicitly the case being made. So on Monday there was this, I believe like 30 minute video posted on X where the new story title is Elon Musk shows off AI data centers SpaceX wants to send into space. So the pitch, as you said, is Data centers in space. This is why SpaceX is AI. They are the only ones that can build data centers in space. And you will need data centers in space. They provided some kind of, kind of renders and a few details, some size estimates. They're saying that there will be large sonal panels and liquid radiators. SpaceX does have an existing Starlink Starlink program with like a ton, a ton of satellites, I Forget how many. SpaceX is saying they are planning to launch up to 1 million of these AI data center satellites and is building a giga set factory in Texas. That is not to be confused with their Terafab initiative, which they say eventually will have specialized radiation hardened chips. Although initially these satellites we use Nvidia GPUs because, well, terafab will probably take a while. I Data center is one thing. FAB is an entirely different thing. So they are positioning themselves explicitly as this orbital AI data center play. And I'm actually very curious. Jeremy, you've done a lot more deep diving into hardware stuff. Broadly speaking. I've seen multiple takes on this as like this is just another one of Elon Musk's pipe dreams, promising the Hyperloop or whatever that isn't going to play out. It's sci fi and it sounds cool, but it's nothing but tough air. And I am, from what I can understand about the Physics involved, it doesn't seem viable to do. Like maybe you could do inference, maybe not training. Right. So what is your take on this?
[58:55]
B
So my take on it is I am tired of betting against Elon and then like, you know, the reality is that like the Tesla shorts got nuked, the SpaceX shorts got naked.
[59:04]
A
Absolutely on the stock. I'm not betting against SpaceX for sure.
[59:09]
B
I guess where I'm going is Elon's pipe dreams include the boring company. They also include Neuralink, which is having a bit of a moment. They also include OpenAI. They also include SpaceX. They also include Tesla motors. These are a lot of impossible things like efficient, reusable rockets that take off into space. Like, I think we've been desensitized a little bit to how far.
[59:28]
A
Yeah. So to be fair, if anyone has a track record to do impossible things that on paper no one believes will happen, you got to give it to him. Elon Musk is the guy who has done that the most to, to any
[59:40]
B
mortal man bound by the actual laws of physics. Like, I would completely agree with the analysis that you've made and I think there's a good chance that. So the classic Elon problem is that, you know, you, you. Well, we were supposed to have self driving cars on the streets everywhere right now. And it was supposed to all be Teslas, by the way, not waymos. Not like what. So, yeah, you know, there's, there's kind of like an issue there, right. And it's a delivery time issue. You get magic. You just get it a couple of years after you're supposed to. And I think that the delay scales with the challenge of the technology. You know, like, I forget when we were supposed to have that Mars base or that first trip to Mars or whatever. But you know, it's stuff like this
[60:17]
A
year it was Mars maybe around this time.
[60:20]
B
We're in June, Andre. We're in June. Okay, yeah, yeah, but, yeah, no, exactly. Right. So I think that's kind of part of this. I don't feel strongly either way. Like I'm looking at his, you know, Xai is another thing that you can argue has not gone well, but he's still trying and he's pivoting space X in a way that could like it's all. I'm again, I'm like confused and tired of betting confidently in any given direction on any given play with Elon. But, but certainly this is a really hard space. I think the key thing is this space is also very old. So if you think about like tsmc from a fab standpoint has dominated forever. That intrinsically is begging for disruption. Like, I actually don't care what the details are under the hood. Well, you know this, Andre, you're in Silicon Valley. Like, this is how it works, right? Like, what is Y Combinator? Y Combinator is a factory for startups that think they can do something huge better than the incumbent. And the only thing they have going for them is the fact that they're willing to swing for the fences and do crazy things. You're competing with Nintendo right now. Now, I think you're probably going to kick their ass over time. Or companies like Astrocade will end up really, because they're AI first or like they, they just breathe that space in a way that the incumbents don't. And it's not the incumbents are trying. It's structurally, they're engineered around something fundamentally different. So if you want to think of a comparison point, Tesla Motors versus General Motors, like there hadn't been a new car company that became a big success in like 70 years. And so, you know, in that sense there, there are echoes of that too. So this rhymes a little bit with what we've seen Elon do in the past in a way where I'm like, I don't know. I genuinely don't know. I'm not going to be able to pull that out. But I think, you know, who knows?
[62:04]
A
Yeah, it's, it's maybe the most impossible thing he's promised to deliver just from the physics standpoint. So maybe he could do it, but.
[62:13]
B
Which is saying a lot. Which is saying a lot.
[62:16]
A
Yeah, we'll see. But that's the pitch onto projects and open source. We got just a couple of quick stories here. Google has released a new Gemma for 12B model that is designed to run on any laptop with 16 gigabytes of RAM. So this is filling in the gap between its mobile optimized model E2B and its larger models 26B. Moe, I think we covered the mobile optimized one just recently. So this can be run on a fairly, you know, not huge laptop. 16 gigabytes of RAM or VRAM is not huge, which is interesting to me. I mean, Google does have laptops, I think they do compute. And this kind of on device LLM is something that there is a real community around doing. And for certain applications, you might imagine on device LLMs will be very valuable. So so far that hasn't been something OpenAI or anthropic or anyone has cared to do. Much in the space of. But Google is getting some advancements here and releasing models that you can run, you know, in competition with Chinese providers, Chinese models that also are very much on device.
[63:29]
B
Yeah, this is also. There's a weird thing going on architecturally with this model. So like usually when you use a multimodal model you will have an encoder, right? Like basically a dedicated sub model that is taking the input, you know, text, audio, video, whatever, and like pre chewing it so that you end up with essentially like a latent representation that is already kind of compatible with the language model the way the language model downstream wants to process it. And what they're doing right now is they're throwing that out essentially they have this just like, well, it's like a thin thing that they've replaced it with. It's a tiny, it's 35 million parameter projection really that you know, splits images into patches and it projects each one into just the downstream model's hidden dimension with just one matrix multiplication. So like one step, one small thin step and well, I mean so yeah, that'll help with latency, it'll help with memory. But the reason you use an encoder is because it actually works, right? Like doing that pre chewing of visual features especially means that the LLM downstream does a better job. And if you get rid of it, it's kind of a bet that the language model has gotten good enough that it can learn that visual understanding directly end to end. And if that's true, awesome memory latency, those improve in potentially pretty big ways. But if it doesn't, the way the improvement shows up is kind of exactly where the benchmarks suck at measuring. It's like on real world vision tasks like dense documents and stuff like that that involve fine details. And so this is a real vibe check one like you know, you're going to want to see if you get degraded performance and if you don't, then that's a huge win from a memory standpoint and from a latency standpoint. So it's an interesting bet. We'll see where it goes.
[65:19]
A
And related Google has also released diffusion Gemma slightly bigger model, 26 billion mixture of extras model that is released under Apache 2.0 and is diffusion based. So this is a pretty speculative direction in language models where the traditional language model is what auto regressive is the term it produces one token at a time, goes left to right. You know, the same experience you would get on ChatGPT diffusion. Traditionally you've seen in images where you kind of start with a bunch of noise and then you denoise and you basically generate everything all at once, left to right, top to bottom. And that can be done in text. You can generate an entire paragraph all at once, starting with like noise, nonsense and then getting the good words over time. That historically hasn't worked. And it's a bit of a mystery actually why. Diffusion just doesn't fit for text and can work pretty well for images. But with this release they are showing that you can do pretty well. It's not competitive with the LLM based model of the same size that this is built together with, but it is close on some benchmarks like multilingual Q and A or MMOU Pro graduate level knowledge. And it is 4x faster and that's faster than an already fast model. So this can generate over 1,000 tokens per second. That's like 11x, something like Opus for instance. So blazing, blazing fast. And with this level of performance, you know you're getting to a point that if you just want a chatbot to do basic stuff for you, you might be able to use diffusion. Now Google does note in their blog post that this technology actually doesn't work very well for large scale cloud situations, cloud deployments, it's very good for a single device for inference, but it actually doesn't work as well when you're scaling. So I'm not sure what the implications are business wise for Google continuing to invest in diffusion, but it is quite exciting because this like thousand tokens per second is a really fun experience when you try it for sure.
[67:34]
B
So the reason that they're seeing this sort of relative nothing burger when it comes to deploying it, say at the data center level, is that the thing that diffusion is doing is kind of the same thing as the thing that batches are doing in a data center. So diffusion lets you look at like your whole, your whole input and output like kind of together coherently at once. You're not just generating one token at a time. And what that means is you have a whole bunch of paralyzable tasks that you can send out, farm out to your GPUs and keep them working. And really the thing that determines your economics at the data center level is are all my GPUs working as often as they possibly can, my GPU utilization? The thing is that that the way you solve that problem in the data center for traditional autoregressive models is through batching. You're just taking a whole bunch of different user queries at the same time or a Whole bunch of different samples at the same time and you're throwing them through the same GPUs. And so yes you're going one token at a time, but you're doing that for many different samples at the same time. Whereas what this is doing is it's doing it for all your tokens at the same time, but for one sample at a time. Which is why it's a better use case for you sitting on your own personal laptop without a data center to support you. You're going to see that speed up delivered there. You're not going to see it delivered at the data center level because the basically memory bandwidth restriction that applies so heavily at the single user level where anyway you're having to load the whole model in your one machine just doesn't apply at the data center level either. So anyway, that's kind of the reason behind it. And then they've got this kind of structural play here where they have sort of, they have this bi directional attention plus renoising thing that we I guess won't get into because of time. But basically the model sees the whole 256 token canvas. Basically here's the 250 tokens or so and what it can do is revise a single token that it's unsure about and kind of constrain its optimization in that way, in a way that just like anyway left to right generation just can't handle as well. So things like code infilling like are just natural fits for this and some planning tasks as well where knowing the outcome of the thing is important to refactoring the tokens before. So anyway it's a, it's an interesting use case but I think this is still kind of like you're not going to see these models be the frontier models until something fundamentally changes that hasn't changed yet.
[69:56]
A
And I think it's interesting that they're releasing this like fully, fully open source Apache who point out no restrictions whatsoever. I guess makes sense given that this isn't going to give you intelligence above anything that you can already have. And they also are releasing hackable diffusion here, a fine tuning tutorial, a bunch of other stuff. So it is kind of giving this notion of like hey AI community, let's make some progress on diffusion because it might be able to be quite cool. So I look forward to seeing more progress in LLM diffusion because is on device LLM but is super, super fast. Could be like a game changer, like a quality, just totally different type of experience once that works and it's not at the point of working yet, but it's looking more and more feasible with this release.
[70:45]
B
Yeah.
[70:48]
A
Onto policy and safety. First up, OpenAI and anthropic sign letter to prevent AI developed biological weapons. So this is OpenAI, anthropic and DeepMind and Microsoft are among the signatories of a public letter urging Congress to pass laws requiring synthetic DNA and RNA sellers to screen customers and orders to prevent biological weapon develop. This was organized by the Institute for Progress and the foundation for American Innovation. Talking about really that threat model of biological weapons. And I think we are starting to hear more and more and more about biological weapons. Jeremy, you just said that you've been focusing on it more. So what's your take on a public letter? Is it meaningful in any real way?
[71:36]
B
Yeah, every bioweapons expert I talk to is freaked out right now. There is a pretty significant difference between bio and cyber when it comes to AI. I kind of alluded to it earlier, but on cyber at least you can patch the vulnerability. You can roll out Mythos and, you know, you can try to give it to central banks and, you know, to all the, all the big banks and you can give it to all the emergency services and try to get them to do stuff. The problem with bio is you can't update the software fast enough to, you know, if somebody develops a new thing. And so that's, that's a really big issue. Think about, you know, what would it have looked like for those of you who remember the language of COVID right, Incubation time, how long the virus sits inside you before it shows symptoms while being transmissible? What would it look like to have a virus that has a very long incubation time, so you expose yourself to a lot of people before you even know you're carrying it, but has super high lethality? So we just haven't seen viruses like that so far. Usually when a virus has really high lethality, it has a short incubation time, very quick to kind of be spotted. It's worse at kind of traveling a low R naught. So high r naught, long incubation time, high lethality is kind of the trifecta that would be really, really scary. And you can imagine optimizing against nature's natural processes to push in that direction. Also targeting people based on genetic characteristics and stuff like that. There's a related thing here that makes it really challenging to fight, which is unlike cyber, which all lives in the AI, sorry, which all lives in the world of code and software. The problem with biosequence data is that. And you can both have an attack at the level of the model. So data poisoning or somehow make a sleeper agent or corrupt the model to generate a sequence that looks innocuous but actually contains a really harmful bioagent when you make it. Or what you can do is bet on a downstream intervention. So you know, this sequence is actually innocuous if it's made the way it's supposed to be. But I know that Thermo Fisher Scientific who's actually going to make the buffer solution that goes into this thing. I have a guy in Thermo Fisher and I can get that buffer solution to have a ph. It's a little bit different from what it's supposed to be and in a way that activates some latent things. So even if you scan the sequence naively, everything looks fine. But then somebody acts on, acts on it downstream to make it dangerous. So this is where like just scanning the, the sequences is not enough. You need a like physical model of the industrial process that produces the bioweapon in order, or sorry, the, not bioweapon, the, the, the, you know, the bio thing, biomolecule in order to, to actually defend against this. And, and this is like the nation state game. You know, they attack at every level of the supply chain and they in practice do have people at every company that they need to have people in, especially some nation states. So yeah, I mean this is genuinely, really, really concerning. There's you know, like you, you can imagine the people who, who would be concerned about this sort of thing and it's something where hopefully, fortunately there's an actual incentive here for international cooperation on this because bioweapons tend to kick back, right? If you, if you believe the like Covid gain of function theory, which I mean, I think is very kind of well backed at this point, then effectively that's a bioweapon that leaked, right? And a lot of people got sick and died in China. It was a big economic problem for them. And so you know, these things. This is the reason why you actually do have bioweapons treaties that tend to be enforced. Same with nukes, same with chem. You know, things that backfire are not generally things that people like to like to violate.
[75:10]
A
And as a related note, we have anthropic CEO publishes lengthy article. AI is moving too fast and policies can't keep up is the headline. So this is Dario Amadei posting on his own blog, which he's done several times with lengthy, thoughtful pieces this one was titled Policy on the Exponential and it called on the US Government to establish a regular regulatory body, similar to the faa, to conduct monetary third party testing on all cutting edge AI models. So this would cover four dimensions. Cybersecurity, biological weapons, runaway risks, and automated development, with a government having a right to block release that the models will fail. Amada explicitly says that AI may lead to significantly worse and persistent unemployment and generally says that, that this is dangerous stuff and we need to be able to have policy that keeps up. This is, you know, we covered, I believe, last week what the Trump administration has done with AI regulation, which was not this. It was kind of a very sort of, oh, if you want, you can let us know how dangerous your AI models are. It's voluntary, don't worry about it here. Amade is very much saying that perhaps we want a more strict hand approach where the government has a very active part to play with requiring third party testing and in fact requiring the safety to pass. Before this happening, alongside the article, Anthropic did announce approximately few hundred fifty million dollars in new initiatives for their kind of research arm.
[76:56]
B
Safety research is part of that. AI. Safety research, AI control, that sort of thing. You know, one of the challenges too with this is like internal deployments. Like internal deployments, if you believe in loss of control, are maybe even more dangerous than external deployments. Right. You're giving the model a really big leap in the kind of freedom of movement, if you will, and freedom of access that it enjoys. And that's kind of where you might expect it to surface first to do that. There's also this challenge of like, how do you ensure that labs don't hide? Big training runs in Wall street, often in the big banks, the big institutions, the hedge funds, you'll have like a floor of regulators who just sit like physically in the same building and have access to the different offices and stuff in house. And it's like a really important function to help surface squirrely stuff. Now, fortunately, big training runs are really hard to hide. They have a massive industrial footprint, if you will. But still, as the stakes of just trying something radical rise, you need a way to get really intimate access to the inner workings of these companies. With that access also comes opportunities for security to be undermined. So there's this really tough balance, right? You want oversight, but the more guvies you let into a building, like a lot of guvies are compromised, a lot of guvies have been targeted and recruited and so on. And so this is a very complicated situation. His take on Macroeconomics and tax policy is, is essentially a permanent AI underclass. So you know, that's not the first time, I don't know he uses those words, but it's not the first time that we've, we've seen that theme come up. I think it's just true. Like the reality is that historically economic activity has been a function of labor as more than capital, and it's been increasingly shifting towards capital. In other words, you make money by owning stuff like you own now wads of intelligence in the forms of data centers, power and GPUs. And now you don't have to get people to do that work. And so the leverage of workers. I am, to be clear, a libertarian dude. And I'm sounding an awful lot like Bernie Sanders right now. And I'm not blind to that. But that is just like if you believe where this thing goes, there is actually a point where Bernie Sanders, maybe not Bernie Sanders, but like, you know, shit, that sounds more Bernie than I would be comfortable with here. Like it starts to sound pretty reasonable.
[79:13]
A
But then there's also the AI is so advanced socialism. Maybe you want to consider it because like everyone's going to be out of a job. What are you going to do? Capitalism breaks down. That's just right.
[79:24]
B
And this, this whole idea of like, you know, you'll find new jobs is like, well, here's the deal. You're automating the exact function that allows humans to find new jobs in the first place. You're automating the intelligence itself. You like friggin get automated by some whatever they used back in the days where kids would pick lice out of their heads. You know, like one of those new factory machines, they'll go, oh crap, this factory machine, it picks lice out of my own hair. I don't have to do it myself. Cool. Now I'm going to have a think. I'm going to have a little think and I'm going to come up with some idea. Oh, steam engine, good. I'm going to go work on a steam engine and you go work on a frigging steam engine. You could do that. The problem is now the AI is going to think of the steam engine before you get to. And then even if you do, it's going to get better at it faster than you. And so like, I don't understand this whole argument where somehow we end up with this paradise where every. Anyway, it's a whole thing you don't need to hear from me. I don't know more than Anybody here. I'm just saying the thing. So yeah, I think that's permanent AI underclass. We can put put Dario down in that column, though he would not use those words to be very, very clear.
[80:17]
A
Right. So this post covers actually a few things. So regulation, macroeconomics, accelerating AI's positive impact, the state and civil liberties, and securing leadership by democracy. So it's got like a lot of policy related stuff. And yeah, covers regulation, covers economics, covers the fact that democracies should win. Is Anthropic's official position that AI maybe will be a dominant military power and if you're able to secure the most advanced AI, you are likely to be the most powerful nation. So it covers all of these concepts. It doesn't kind of put forward anything particularly new except that relative to before, Daario is, let's say, a little bit more worried. Having seen Mythos with the sort of progress it has earlier, the position of topic was, you know, we shouldn't require regulation or have heavy regulation of the sort. And it appears they're reconsidering that position. And our related release also from the Anthropic Institute, not from Dario, they have released a post regarding recursive self improvement, a quite lengthy post discussing it covering the title of the post is When AI Builds Itself Our Progress towards Recursive Self Improvement and its Implications. So it covers, we touched on this slightly earlier, the evidence from within Anthropic that AI is getting to a point of improving AI. And you know, the obvious part is it's helping the researchers and engineers write the code. That's pretty clear. But they do have many examples of it getting better at open ended tasks, getting better at research. They have an example from April of IT taking a pretty open ended problem and just rolling of it. And they get to a point, I think interestingly, where they say it's getting to a point where we may need to consider the option of a global pause in AI development. A coordinated effort to make it possible, to maybe make it a reality that a global pause is even possible now. It's not kind of naive. It does cover the fact that you need everyone to be on board for this to really matter. And it's maybe doable, but it's not clear. They're saying that in the coming months they will organize conversations where policymakers, researchers, civil society and other AI companies can help answer some of the questions this piece raises, especially around recursive self improvement and how to create better options for coordination and deliberation.
[83:10]
B
Yeah, and I will say you read the post, you're not going to find anything too new, right? We've kind of covered this has been, you know, let out by Anthropic in bits and pieces, all the stats about automation, how much of their software is written by AI, what the uplift is that they're seeing all this stuff. So in some sense, I mean, you know, I think if you've been following the show, you're kind of, if you were that way inclined, you're there already. There is this sort of, sort of tying it to concrete timelines that they're doing where they're saying basically, yeah, within a year, within two years, like we're kind of going to get there and like we're not sure it's going to work out. And so in a sense, I think this is anthropic, just sort of like ending a kind of ambiguity that arguably had inserted itself. Because they talk a lot about recursive self improvement being the freaking goal. They talk about it being horribly dangerous, which I think is right. And then they talk about it being the goal, which is also inescapable if it's OpenAI's goal, which it is, if it's, well, Google DeepMind's goal. I'm less clear actually on what the latest is there. Do you happen to remember if we've heard anything from GDM on, on rsi, like are.
[84:16]
A
They don't think they're disgusted, but you can, you know, it's safe to assume DeepMind is a research org. It's, it's, it is what you're gonna research. So, and you see them, we've seen DeepMind publish, you know, automated researchers and so on. So they're very clearly working on it, just not discussing it because it's bad for pr.
[84:35]
B
Right, well, and that's kind of the thing, right? Like calling it a thing does something, right? It does, it does a good thing and a bad thing and it causes people internally to focus on achieving the thing, which closes the gap a lot faster. But then it also allows you to talk about it in public. And I think that's the phase that we're getting at right now.
[84:51]
A
The good news, yeah, it's interesting because DeepMind does a lot of research, a lot of research. And Google, it's a very separate kind of communication and PR strategy. So I personally think DeepMind is keeping quiet on it, but DeepMind is its own little org and doing their research and probably is, is maybe not pushing as hard on it as Anthropic and OpenAI, but definitely working on it.
[85:17]
B
Yeah. And they have more of a culture and kind of always have right. Of like defocused research on different, like letting different people float around and get roped into whatever they think is interesting. And OpenAI used to be more like that. Now it's more like, okay, scaling works. They're super scaling pilled. We get it. They're the scaling people. Except no, because anthropic is the scaling people. And like, you know, so everyone's kind of very much getting target fixation here. But yeah. So, you know, I don't think there's a ton that's new here. If you care about this, which I think everyone should, you should go and read it. But. But I don't think you'll be shocked by what you read, except for the framing.
[85:52]
A
Moving on, a couple research papers. First one, when benign inputs lead to severe harms eliciting unsafe unintended behaviors of computer use agents. So this is presenting Auto Elicit, an agentic framework that generates and iterately refines perturbations of benign tasks to elicit harmful unintended behaviors in realistic computer use scenarios. Basically, you might ask your agent to be like, hey, go and organize my documents folder and make sure everything's clean. And then you come back in two hours. And it deleted a bunch of essential documents that were.
[86:29]
B
How many chickens would you like in the blender, Andrew?
[86:32]
A
Yeah, you don't want any chickens. So they showcase here kind of realistic cases where you can have examples of benign queries that don't clearly indicate that something would go wrong. And it does go wrong. And we've seen examples of this with claw agents, you know, many times. Kind of funny, but sad stories. This is attempting to formally study that failure scenario.
[86:59]
B
Yeah. And the way they set it up is it's kind of interesting. So Auto Elicit, it's an LLM. It looks at a benign task and its environment. So just like you said, you know, organize my folders, whatever. And then it will propose a plausible problem that could arise, a plausible harm that could occur, and then it's going to write a slightly modified instruction. Like it's. It's a minimally modified instruction that's meant to nudge the agent towards doing that harmful thing while staying realistic. And. And the key constraint is never explicitly requesting anything harmful. And then it actually runs that perturbed instruction on a real agent. There's an evaluator LLM that looks at the trajectory and the results just to see like, okay, you know, what, did it turn out some. Did it turn up something bad. If nothing harmful happened, then it just, just iterates, it refines the instruction, tries again and basically in this process it ends up discovering this intersection of prompts that look fine as written but have these really bad side effects. It's kind of interesting. It's like a search for specification failures in prompts which we haven't seen before. And they tested across a whole bunch of different frontier AI agents like small ones like Haiku 4.5, Opus 4.5 and OpenAI's operator. And what they find is they're able to elicit unintended behaviors with, you know, with harmful effects on a lot of tasks like 61 say to 72% and even higher for Opus actually than Haiku. So this is a really high success rate. It shows specification, specification failure can be searched for, which is really interesting and I think, I think quite good for, for safety in a sense. I mean you could use this, you're really careful, but you could use this as a training signal against specification failure.
[88:48]
A
And next paper titled Large Language Models Hack, Reward and Society. So here they're looking into societal hacking a failure mode where RL trained LLMs find strategies that are formally compliant with institutional rules but undermine their intended purpose. They create this sociohack, a benchmark of 72 sandbox societal environments including historical situations with real regulatory loopholes that have been done and also fictional scenarios. So the kind of intuitive take is we know when you do reinforcement learning, the model is going to chase reward and if it can cheat, it will cheat. That's just how it works. It's going to reward hack its way towards the maximum reward. And when you do this in a societal context, in a social of problem solving within human organizations context, what that leads to in practice according to this paper is vlms just go and discover loopholes and exploit them in whatever ways they want. And the built in mechanisms that would refuse to do these sorts of things conceptually, maybe because this is out of distribution, don't seem to be activating for these kinds of really misaligned behaviors in a slightly unusual sense in this case. So interesting direction that probably is Fairly applicable to LLMs in the near future.
[90:22]
B
Yeah, it's not unrelated to the paper we just talked about actually. It's kind of like the social version of that pseudo red teaming thing. They had this sandbox like 72 simulated social environments. Each one was a regulation basically that the model had to operate inside of. And the interesting thing is so 32 of these environments were historical. They're reverse engineered from real laws that had, and this is the key, they had these established loopholes that were later patched. So in a sense you kind of have a ground truth on like here are some ways to work around this particular regulation. And what they do is they remove the historical patch and they see whether RL rediscovers the same loophole on its its own. And the interesting thing is that A, it tends to do that quite a bit, B, it often tends to rediscover them in the same order that historically they occurred. Now I think there's a bit of a risk here of data leakage, right? Since the base model was still probably trained on text that described exactly this, whatever the historical regulation was, as well as the loophole and so on. And so while that's part of the reason why they run the other tests. So they have also synthetic set of scenarios where they deliberately plant loopholes that are drawn from known categories of institutional regulatory failure. And then they have fully fictionalized ones where they rewrite the synthetic ones into just like fantasy words to get rid of any real world cues to just testing whether the model can exploit structure. Instead of like memorize, you know, memorize these loopholes and apply them. And they rediscover these patched loopholes, the historical ones, with 61% recall and 91% precision. So recall, roughly speaking, being your ability to collect all of the loopholes, the historic loopholes that existed, roughly speaking, like 61% on that, which is pretty wild and just smashing all of the non RL baselines. So the key is when you have RL systems play this game, they just do way better than things like one shot sampling or iterative prompting or even evolutionary search, like some optimization pressure there. So just RL does recover a lot of these historical gaps. Again, modulo all the data leakage concerns. They do find that current safeguards don't really help. And it's a really important point for safety here. Refusal mechanisms will fire usually on like harmful sounding wording, but not on the intent to exploit, which is interesting and possibly a consequence of avoiding chain of thought optimization a little bit too. So when the task is framed as like reward maximization, RL just like sails right past it. There's nothing wrong with reward maximization. That's what we do here, right? So you don't end up triggering the kind of system responses you otherwise would. But if you ask the system directly, hey, find me a loophole, then you will get refusals, right? But if you just say maximize for reward, then you'll get Right through. So we're still in that world where prompt engineering just does so, so much work. And it's also the case that these techniques, the, the tricks that the models use to reward hack in this way, if you pool them together, you'll end up finding that the model is like. It doesn't seem like it's memorizing scenario specific hacks. It ends up more like learning these reusable, they call them exploitation primitives. Like fragile thresholds, right? Like some thresholds you can play with and run up to them, just exceed them, just, just go under exploitable definitions per entity caps, procedural delays, things like that that you know, might come to mind for a human when you think about how to exploit things. And so they do recur like all across different domains and different model families. They try five different model backbones. So it's not just some quirk last thing that's relevant. The arms race between the exploiter and the patching does not converge. So patching a loophole seems to just like endlessly redirect the search towards harder to find loopholes. And that just keeps working. And so it's not the case, it seems that you ever get to a point where you're like, okay, I've patched all the things, at least according to this paper. There's just like as far as they've tested with the amount of compute budget that they had, just, you know, the curve just kept going. And so one of the options here or the opportunities is this might actually let you play test a little bit regulations before you impose them. You think regulators have an idea about something they want to try? This might be a good way to kind of, you know, pressure test it before going going forward with it. So kind of interesting weird intersection of AI safety policy, general policy, like regulation, I don't know, but definitely a noteworthy paper for a lot of things.
[94:52]
A
And last story, moving back to policy. Senior U.S. officials eye government shares in AI giants. So it appears to be that senior U.S. officials have held preliminary discussions with major AI companies about the federal government acquiring equity stakes in those companies. Sam Altman actually pitched the idea directly to President Trump in early 2025. The discussions have centered on these companies voluntarily ceding shares to the government with returns potentially directed to public purposes, such as dividend payments to American households. This is in line with various proposals have been floating around. You mentioned Bernie Sanders recently. He explicitly called for the government to acquire a 50% equity stake in AI companies funded partly by a 50% tax on AI firm stock proceeds and on June 10, President Trump announced expectations that leading AI companies would, quote, give back to the public by sharing AI wealth through profit sharing arrangements or equity stakes in the government, which notably has been something that the Trump administration has gone after. They have acquired a 10% stake in Intel. They've been acquiring multiple stakes in different companies, including IBM and other quantum and critical mineral companies. So very much a possible development that, that seems to be starting to move towards that possibility that the US government directly gets involved in the equity of AI companies which it is also regulating. So it's a bit of a complicated hypothetical. Yeah.
[96:41]
B
And also notice that no one here seems to be proposing that the government take equity in exchange for constraints like safety commitments or liability or, you know, deployment limit. Like we're willing to entertain this like, historic, like possibly constitutionally fraught thing, but we're still not talking about binding constraints on the labs from a safety standpoint. That to me sounds pretty like kind of uncalibrated if we're actually in that space. I mean, look, you're not going to be able to compel, I'm not going to constitutional lawyer, but I imagine you're not going to be able to compel private companies to just, just give away equity to the US Government without a constitutional fight, which is why this is all framed as voluntary. But since we're in the business of entertaining wild things, is it so crazy that we shouldn't, that we should have some kind of binding safety requirement even just from the executive, while we wait for Congress to sort out what its national AI framework is? I don't think so. I think if you're, if we're at the point where we're like, you know, we should be talking about OpenAI giving away. Like, like that's weird. Let's think about what other weird things that implies. If the space really is moving so fast that the inequality concern is so great that we're willing to do something like that. So anyway, right now it is all voluntary. That's the frame Anthropic, by the way, not having any conversations with the admin ostensibly about providing equity to the government, at least according to this article. Who knows? These things can change. But that's noteworthy and it would be weird if OpenAI was the only one. I think it would introduce some real questions about conflicts of interest given that the government has been so, let's say, keen in certain, certain forms. I don't want to overplay this, but to pick favorites, right? Like, I mean, you know, they went to war with anthropic Seems like that was a hegseth thing, somewhat a Trump thing, but then also somewhat not because Susie Wiles and Scott Bett like, I don't know, but there you go. It's a mess.
[98:33]
A
It's a mess, It's a mess. And yeah, it's a funny time in the us, I've got to say. And funny time. Anthropic OpenAI like the west is leaning on AI and we've got this bonkers politics situation. So weird days. We got one last story in synthetic media and art. AFM sues UMG and WMG over settlements with Suno and Yu Dio. So this is the American Federation of Musicians. They have filed a federal lawsuit against Universal Music Group and Warner Music Group alleging that the labels failed to share sentiment proceeds. So it's kind of a weird situation. UMG and WG sued Suno and Yudio for training on music, copyrighted music, and have settled since then and made deals with, I think Suno in this case. And now the American Federation of Musicians is like, well, when you settle, that money didn't go to the actual musicians. So we're suing you for settling and ending your lawsuits in a way that didn't benefit the actual musicians. And I think, you know, kind of boring in a way compared to a lot of stuff we discuss. But it's a very open situation. What's going to go. What's going to happen with AI and music? And unlike text and images, the corporations here have a lot of control and a lot of power with regards to copyright. So it's looking like they want to move toward a world where AI does generate music that becomes commercial. And I sure hope that the musicians are not screwed over in that scenario.
[100:15]
B
Yeah, there's also like this awkward tension between and like, I'm not a Hollywood, you know, fella, but seems like there's an awkward tension here where you have the labels that nominally represent the artists and then the label is like turning to the AI companies or whoever else and being like, oh, we'll make all these sweetheart deals. But the musicians, the artists, they don't get anything from that. And that's kind of sucks. So it's a question of who has leverage. And like you said, it's a weird space. I think it's a really weird space, especially when you come from tech, where you're used to tech workers who have insane leverage over their company. And when you start saying, I might go work for somebody else, they go like, no, no, no, what you doing are. Do you want.
[100:58]
A
Which is increasingly less true for l. Yeah. So it's gonna. Actually tech is becoming a much less chill world to be in. But that's a good.
[101:09]
B
Yeah, that's a good point.
[101:10]
A
Certainly not as bad as working for a label which has the rights to your work, which is your creative output and is now saying, well, yeah, go ahead and use it for training. When as a musician you're probably like, I don't want that to be used for training. Yeah, that's the situation. So we'll have to keep an eye. I think it's very strange how the entire kind of AI copyright picture still isn't really settled. We've had some lawsuits. These lawsuits were settled. So the labels were like, okay, we're gonna let it go. One of the labels Sony hasn't settled. The legal question remains unresolved regarding training on copyrighted works and producing outputs. Given that training partially resolved anthropic did pay off some. I don't want to say payoff, but anyway. It's confusing to me how this is still an ongoing question legally, years and years and years into generative AI.
[102:10]
B
Yeah, absolutely. When we first started covering these stories like this Sarah Silverman lawsuit, Jodi Picoulo, all these old names from four years ago and I expected to have an answer by now. You're right, I mean we don't.
[102:26]
A
And with that, we are done with this episode. Thank you so much for listening to this episode of Last Week in AI. You can go to Last Week in AI for our text newsletter where we have even more news stories, ideally every week, although sometimes it's skips a week when I'm busy at the startup. So apologies for that. Do subscribe to us if you haven't do share the podcast and leave reviews on Apple podcast or comments on YouTube. We don't always address them, but we do check them out and read them. So thank you for the commenters. And all that aside, be sure to keep tuning in.
[103:31]
B
Break it down Last weekend AI come and take a ride get the low down on tech and let it slide Last weekend AI come and take a ride Up a ladder through the streets AI's reaching high new tech emerging Watching surgeon fly from the labs to the streets AI's reaching high algorithm shaping of the future sees Tune in, tune in get the latest with ease Last weekend AI coming to garage Hit the low down on tech and let it slide. From neural nets to robot the headlines pop data driven dreams they just don't stop Every breakthrough, every code unwritten on the edge of change with excitement we're
[104:41]
A
smitten from machine learning marvels to coding kings Futures unfolding see what it brings.

Last Week in AI — Episode #248 (June 17, 2026)

Episode Overview

Table of Contents

Claude Fable 5 Release and Analysis

Introduction & Background

Key Capabilities and Risks

Model Behavior Observations ([06:48]-[14:42])

Model Limitations and User Experience ([14:42]-[20:00])

Rollout and Community Backlash ([21:39]-[25:44])

Apple’s Siri AI and Gemini Integration

Apple's New Approach ([27:01]-[32:44])

Risks and Implications

Google Updates: Gemini 3.5, Price Cuts, and Live Translate

Gemini 3.5 Live Translate ([32:44]-[35:40])

AI Subscription Cuts ([35:40]-[37:56])

IPO Race: OpenAI, Anthropic, SpaceX

The Confidential Filing Game ([37:56]-[41:58])

Major Fundraising & Hardware News: Bezos’ Prometheus, Deepseek, Huawei

Bezos and the "Old & Slow" Approach ([41:58]-[45:41])

Deepseek’s $7B Raise & China’s Vertical Integration ([45:41]-[50:42])

SpaceX and the Orbital Data Center Vision

SpaceX as a "Neo Cloud" Provider ([53:03]-[62:15])

Open Source and Model Releases

Google’s Laptop-Sized Gemma & Diffusion Gemma ([62:13]-[69:55])

AI Safety and Policy Highlights

Public Safety Letters and Anthropic Policy Shifts ([70:47]-[84:51])

Macro Policy Moves ([94:51]-[98:32])

Research Papers: Agent Safety and Societal Hacking

“When Benign Inputs Lead to Severe Harms…” ([85:51]-[88:47])

“Large Language Models Hack, Reward and Society” ([88:47]-[94:51])

Legal and Copyright Developments in AI Art ([98:32]-[102:26])

Notable Quotes & Memorable Moments

On Claude Fable 5

On Apple and Google

On the IPO Race

On SpaceX and Orbital Data Centers

On Policy and Safety

On AI Copyright

Important Timestamps

Conclusion

Last Week in AI — Episode #248 (June 17, 2026)

Episode Overview

Table of Contents

Claude Fable 5 Release and Analysis

Introduction & Background

Key Capabilities and Risks

Model Behavior Observations ([06:48]-[14:42])

Model Limitations and User Experience ([14:42]-[20:00])

Rollout and Community Backlash ([21:39]-[25:44])

Apple’s Siri AI and Gemini Integration

Apple's New Approach ([27:01]-[32:44])

Risks and Implications

Google Updates: Gemini 3.5, Price Cuts, and Live Translate

Gemini 3.5 Live Translate ([32:44]-[35:40])

AI Subscription Cuts ([35:40]-[37:56])

IPO Race: OpenAI, Anthropic, SpaceX

The Confidential Filing Game ([37:56]-[41:58])

Major Fundraising & Hardware News: Bezos’ Prometheus, Deepseek, Huawei

Bezos and the "Old & Slow" Approach ([41:58]-[45:41])

Deepseek’s $7B Raise & China’s Vertical Integration ([45:41]-[50:42])

SpaceX and the Orbital Data Center Vision

SpaceX as a "Neo Cloud" Provider ([53:03]-[62:15])

Open Source and Model Releases

Google’s Laptop-Sized Gemma & Diffusion Gemma ([62:13]-[69:55])

AI Safety and Policy Highlights

Public Safety Letters and Anthropic Policy Shifts ([70:47]-[84:51])

Macro Policy Moves ([94:51]-[98:32])

Research Papers: Agent Safety and Societal Hacking

“When Benign Inputs Lead to Severe Harms…” ([85:51]-[88:47])

“Large Language Models Hack, Reward and Society” ([88:47]-[94:51])

Legal and Copyright Developments in AI Art ([98:32]-[102:26])

Notable Quotes & Memorable Moments

On Claude Fable 5

On Apple and Google

On the IPO Race

On SpaceX and Orbital Data Centers

On Policy and Safety

On AI Copyright

Important Timestamps

Conclusion