A
You're listening to this podcast, so I know you've got a curious mind. Here's a helpful fact you might not know yet. Drivers who switch and save with Progressive save over $900 on average. Pop over to progressive.com, answer some questions, and you'll get a quick quote with discounts that are easy to come by. In fact, 99% of their auto customers earn at least one discount. Visit progressive.com and see if you can enjoy a little cash back. Progressive Casualty Insurance Company and affiliates. National average 12-month savings of $946 by new customers surveyed who saved with Progressive between June 2024 and May 2025. Potential savings will vary.

This episode is brought to you by Outshift, Cisco's incubation engine. Today's AI agents operate in silos, limiting their true potential. We've been focusing on building bigger, smarter models, but scaling up is just one approach, and we actually have a blueprint from 70,000 years ago. Humans didn't just get smarter individually. The cognitive revolution transformed society because we began sharing knowledge, goals and innovation. And agents are now at the same inflection point. They can connect, but they can't think together. And that's why Outshift by Cisco is building the Internet of Cognition, transforming AI from isolated systems into orchestrated superintelligence. By creating an open, interoperable infrastructure, Outshift is enabling agents and humans to share intent, code, context and reasoning. The cognitive evolution for agents is here. Explore the Internet of Cognition at outshift.com. That's outshift.com.

Today's episode is sponsored by Box. Enterprises are keen to adopt AI, but enterprise AI only works when it has the right business context. And Box is the leading intelligent content management platform for the AI era, acting as the secure, essential context layer for Box's AI agents to access the unique institutional knowledge that makes the company run. Your business isn't the sum of all Internet knowledge. Your business lives in your content, and Box can connect that content with people, AI agents and apps that can unlock value from that information, all while having the security and governance capabilities that allow you to trust it to be secure. There are many uses for it, and especially interesting is Box Agent, a unified AI experience across your files in Box. So if you're thinking seriously about your company's AI transformation journey, think beyond the model. Your business lives in your content, and Box helps you bring that content securely into the AI era. Learn more at box.com/AI.

Hello and welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode we will summarize and discuss some of last week's most interesting AI news. You can also check out our Last Week in AI newsletter at lastweekin.ai for articles we did not cover in this episode. I am one of your hosts, Andrey Kurenkov. I studied AI in grad school and now work at the startup Astrocade.
B
And I'm your other co-host, Jeremie Harris, from Gladstone AI. I do AI national security, AI infrastructure type stuff. As you'll know, we're actually recording this on a Friday, two days after we're supposed to be recording, because, anyway, we had some health stuff in the, in the family. Everything's all fine, but Andrey very generously bumped the recording time. So if we're releasing this episode two days later, whatever, that's the reason. Mea culpa, mea maxima culpa, et cetera, et cetera. Welcome back. Hi everyone.
A
Yes, per request from last week about saying when we record: this is Friday, May 8. And as far as the episode goes, there's nothing huge, but there are some interesting and some pretty fun stories. We'll be updating on the OpenAI trial, of course, that's probably the most interesting bit from the recent AI news, and then just a smattering of various kinds of things going on in the AI world, really covering kind of the usual suspects of all the major Western companies, some stuff going on in China, some good research. So it'll be a nice diverse episode. Before we get there, do you want to call out some comments from listeners? As usual, we do read what you put on YouTube, so that's been nice. One slightly critical comment was: Mythos was debunked as a PR stunt, how come you don't mention that? And I do want to acknowledge this. There has been some controversy around this, and I think it is a little overstated; the controversy is a little unreasonable. So post-Mythos, I think what this person is referring to is some research where people kind of tried to recreate Mythos by saying, oh, this smaller model was able to find these bugs the same as Mythos. And that was misleading, I would say, because it essentially gave the smaller model all the clues in the world to find most bugs. Basically it was like, hey, here's where something might be off, go look and see if you can figure out what's off, more or less. Mythos, they said, okay, here is the software, go and find bugs. That's all they did, you know. And Mythos had to do the job of actually looking through, pinpointing where problems might be, et cetera, et cetera. So I think this idea of Mythos as a PR stunt or as an exaggeration doesn't seem fair. And in fact, one quick follow-up on Mythos: Mozilla announces how many bugs get solved in Firefox month to month. Most of the bugs that Mythos found were not disclosed at the time because these were zero-day bugs. And we saw that in fact Mozilla had fixed 20x the usual number of bugs, something like that, like a crazy jump. So if you want to talk about it as a PR stunt, maybe it's a PR stunt, you could say. But, like, the real-world impact is unquestionable. It led to Firefox solving like 200 or whatever bugs, and that's going to be true across the board. So Mythos to me was kind of legit. But thanks for the comment. Always happy to follow up on a topic when there is one.
B
'Cause this is a pretty good diagnostic of whether your information ecosystem is biased against AI, in a rough sense, being real. There's a lot of, like, Gary Marcus types. Not that Gary Marcus doesn't have interesting arguments. It's just that there are people who reflexively just, like, look for reasons and ways to poo-poo AI progress every time there's a big advance. And I think we're now safely in the domain where there's enough evidence that's like, look, I don't know if it's a bubble that'll pop tomorrow. It very well could be, for reasons of leverage or whatever. I don't think the scaling laws are going to stop working, though. I don't think that we're going to stop seeing things like Mythos. I think we're going to see bioweapon versions of Mythos pretty soon. Just registering these predictions. I remember having arguments actually with someone, actually I'll just name her just because she has been so public on this stuff, but Deb Raji from back in Toronto, about three years ago. And at the time, I think it was like, you know, GPT-4 had come out, there were indications that it had already started finding kind of crappy one-days, maybe zero-days at the time. And I was basically telling her, like, it's really obvious, and I used the word obvious several times, that we're going to get to AI systems that are doing superhuman offensive cyber and defensive cyber. And she poo-pooed it. And I think that, like, there is a reckoning that actually is needed. I'm not saying this to be a jerk. I just mean we're not going to have much time when the moment comes, when we have our Mythos moment for bio, when we have our Mythos moment for autonomy. People who have been wrong in significant ways on this, who would have oriented, like, their entire. And she, by the way, I believe was at Mozilla. So, like, this is directly, like, she should have been on this. There are structural changes that you would have made three years ago in anticipation of this moment, including building relationships with the frontier labs that maybe didn't exist, including, you know, better tracking of scaling curves on evals related to cyber and so on, just so you can make sure you get there early. Again, I. Look, this sounds like being a jerk. I'm not trying to do that. But just, like, it's important that we learn the lesson of the current moment. And part of that lesson is, like you said, there's just been a 20x uplift in the number of bugs caught by Mozilla. That's not nothing. We can't make that go away. If you think about Anthropic's standpoint here, they've held on to the Mythos release for a long time. Every week that they don't release it is like hundreds of millions of dollars that they're not making. Like, if they wanted to money-max, if they wanted to hype-max, you actually probably would get equivalent levels of hype just by releasing it. There's a difference in the kind of hype, and that seems to be what Anthropic's gone for here. We have seen the Scott Bessents of the world, the Jerome Powells of the world, and now increasingly the J.D. Vances of the world respond to this with, I think, due alarm. And now we're seeing a major shift in the tone of the administration. We'll get into that later this episode with respect to this. But this is an important enough issue that I know I'm burning some kind of, like, credibility capital by sort of, like, dancing on this particular tombstone. But I'm not trying to do that. I'm trying to just reinforce this idea that, like, I've been wrong in systemic ways in the past.
When that happens, I've always felt this really hard. But it's important to, like, try to examine: what is it about my information ecosystem that is causing me to be systematically wrong on really big, important calls? And I think as a society, we have to kind of structure around that, because we can't keep getting shocked, like, when inevitably. And I will say it's inevitable.
A
We.
B
We get the bioweapon version of Mythos, which Mythos itself, by the way, is not too far from on its own. But when we get there, I think we would look really, really silly to future generations if we were as shocked by that as we seem to have been shocked by Mythos. Like, the writing's on the wall, and it's really time to kind of start to harden a lot of these things. And, I mean, I hope that we don't continue to have the equivalent level of surprise as the curves keep steepening. That's my soapbox thing. You don't have to agree with it. I'm probably wrong in a lot of interesting ways here, but that was sort of, like, my emotional reaction looking at this. It was like, man, these predictions of Mythos-level cyber were not being made in a sort of silly, kind of hand-wavy way back in the day. We actually had data pointing in this direction. We've had it for a long time. We've even talked about it on the podcast, like, many times. The papers just made it pretty unambiguous. So anyway, again, I don't mean to sound like a jerk. I'm just trying to kind of lay out, like, we've got to think a little bit ahead here and start to notice the trend.
A
Yeah, I think, you know, three years ago, whatever, with GPT-4, it would be maybe forgivable to be a bit skeptical at the time. Right? Yes. You could be really plugged into AI and be very keen on scaling laws, but at the time, I could see being skeptical. Now, Mythos isn't even that surprising, right? Mythos is on track, more or less, with what we would expect given progress in coding and AI.
B
There's also a certainty issue. There's nothing wrong with being skeptical, but there's no way that even three years ago you were at less than 30% odds on this outcome. That would have been an insane thing. I'm totally fine if somebody says, balance of probabilities, 60%, 70%, this isn't going to happen. No problem. But the position that a lot of people I was talking to at the time were taking was, it's not even that it's low probability. It's, like, negligible. It's even a bit laughable. It's ridiculous. You know, this sort of Pedro Domingos-style scoff, like using the word risible.
A
And yeah, really, there is a contingent in the discussion where they mock AI. They're like, every single time something is like, oh, AI might have failed or made a mistake, or maybe this benchmark shows that it doesn't work quite as well, they jump on it and they're like, oh look, AI isn't all that. And there is a lot of, I would say, bad faith interpretation and discussion in some circles that you do need to be careful of. As you said, on the, on the
B
big questions, anybody who gives you, like, over 90% certainty on any particular thing, I think you need to just go, like, no. Do we know that AI is going to recursively self-improve, lead to a singularity that wipes out all human beings, with more than 90% certainty? No. And I'm somebody who is extremely friendly to that hypothesis, and I work on it actively. Do we know that weaponization of AI will lead to catastrophic biology? No, we're not more than 90% sure of that, but we're definitely also not more than 10% sure that it won't. But you know what I mean, like, it's kind of reciprocal in both directions. Yeah.
A
Well, in any case, thank you for the comment. I guess that yielded some good discussion. Also real quick, a couple of people did comment that our discussion of AI consciousness was appreciated, or at least not disliked. Where our stance, again, is: we don't know much about it. It may be something to really take seriously, and we will not sort of mock it or scoff at it or dismiss it as, again, some people like to do, although we won't sort of over-focus on it either. So anyways, thank you for the comments, and now getting into tools and apps. First up, OpenAI releases GPT 5.5 Instant, a new default model for ChatGPT. And that's pretty much the story. GPT 5.5 came out just last year, so this is a sort of quick follow-up, and it seems to be a pretty good improvement over GPT 5.3 Instant, the previous iteration of this model. Looking at the AIME competition math, it went from 65% to 81%; GPQA, PhD-level science, 78% to 85%. Just all around it seems more intelligent. OpenAI also highlighted that it's less verbose, so it uses 30% fewer words and 30% fewer lines to reply to a simple prompt. A fair example is: how do I tell my coworker to quit yapping all the time? Which, yeah, maybe you want to be direct with that one, and it is. So there you go. If you use ChatGPT a lot, it is probably your default go-to, and it might be better than before.
B
Yeah, it comes with a system card too. You're looking at the standard CBRN plus cyber plus autonomy risks and all that stuff. The capabilities that Andrey just mentioned definitely are reflected here, right? So you're going to see uplift across the board, some surprising uplift relative to models you might be tempted to compare it to. So GPT 5.5 Instant is, as the name suggests, a lighter-weight version of GPT 5.5 thinking, and therefore, one presumes, also lighter than previous models in the thinking mode. And yet it outperforms the thinking version of 5.4 in a series of different benchmarks associated with cyber, which is kind of interesting. So a kind of differential improvement there. So in particular, Capture the Flag, it's a professional version of Capture the Flag, which is where the flag is this little piece of text that you're getting the model to try to decode or recover out of a protected cyber environment. And in this case GPT 5.5 Instant did better than GPT 5.4 thinking, and obviously, like, worse than 5.5 thinking. But this is one of those areas where you're seeing pretty significant improvement, like the weakened, the watered-down version of 5.5 being better than the full-scale version of 5.4. Sort of a similar story in CVE-Bench, which is kind of this real-world web app vulnerability and zero-day attack dataset. So it certainly does seem to be a big push on the cyber dimension. We have this being, I think, the first model, the first instant model, that they are flagging as a high cyber risk under the preparedness framework. It's a threshold that's just been crossed, and that is kind of forced by the fact that GPT 5.4 thinking was high risk and now it's exceeding that capability level, at least for cyber. So pretty interesting. On autonomy and self-improvement, usually when you look at the instant versions of these models you're kind of less concerned on self-improvement, because that's not the version of the model that would ever really be used for at least some of the kind of heavier, blocking, kind of bottlenecking parts of automated AI R&D. And indeed, in this case the self-improvement evals actually weren't even run. GPT 5.5 Instant, they assess, is just, like, less capable than 5.5 thinking, so why bother with that when 5.5 thinking has been done? I would like to see them run those evals anyway. I think we really should be doing that, just because, you know, every little door that you leave open to not running these evals creates a precedent and makes it easier to kind of slide into that. But there's a bunch of, like, interesting little notes here. So one piece is: they're deploying 5.5 Instant at low reasoning effort, and that's kind of the most interesting, I would say, structural detail in this case. So 5.5 Instant is high on cyber risk, but that's based on extra-high reasoning effort, so the highest level of reasoning that they used, but they're shipping it at low reasoning effort. And so the preparedness rating here is really saying this is what the model could do if you pushed it, not what the users get by default. And that's, I think, a good choice. It kind of raises the bar in terms of internal testing, trying to elicit more capability. So it's nice to see that. Certainly there's a whole bunch of bio safeguard stuff. The model training showed some regressions in synthetic bio evals, and that made them run end-to-end evals, including, like, all the automated monitors, and they reported all those numbers. It's a big thing. It's not a clean uplift story on bio.
There's regression in some places, improvements in others. But mostly what you see is the cyber, right, which correlates with coding, and that's really what people are looking for from this model as OpenAI tries to push more into the coding dimension. So it's not surprising that if you're pushing that dimension, the cyber threat aspect becomes more acute than maybe the bio or chemical or other capabilities.
A
And sticking with ChatGPT, the next story is a follow-up to last week. The title is: ChatGPT became so obsessed with goblins that OpenAI had to intervene. So we mentioned this in passing. Related to, I think it was Codex, that in the system prompt there was a specific instruction not to mention goblins or other magical creatures of this type, which was noticed online and yielded a lot of discussion and amusement. At the time we were like, OpenAI didn't release any sort of investigation into it; Anthropic would have. And lo and behold, like a few hours after we recorded the episode, OpenAI did release a blog post titled Where the Goblins Came From, with an investigation and analysis of the root cause. The gist is they had this nerdy variant of the model, I think at some point you could set the personality of the model, and during the training of this model it was over-rewarded and wound up saying goblin way too often. And this nerdy personality sort of bled into the rest of ChatGPT, is what it looks like, even into, if you look at, the cynical, friendly, default, candid personalities. It's actually kind of a fun graph, where for the nerdy personality there was a 4,000% increase in the rate of assistant messages containing goblin, and then across the board there were, like, 60, 70, 200% increases. So OpenAI did say that they addressed this in training, but GPT 5.5 started training before this was done and as a result did have this kind of issue in there, and they kind of patched it with the system prompt. So a pretty convincing explanation of the root cause in this blog post. But as we said last week, it feels like Anthropic would have flagged this before it came out. Yeah, they would have done some research and, like, analyzed these kinds of things. OpenAI very clearly released this in response to the discussion and kind of the confusion around the goblins. And it is a fairly ad hoc investigation. Like, it isn't sort of a deep analysis per se. It's kind of flagging patterns that indicate that this is the cause. So a really good example, actually, of how model training can go wrong. Apparently this started back in November, where the frequency of goblins started rising, and then it became an issue over time. So a very clear kind of example of training with reinforcement learning resulting in weird or actually, like, misaligned model behavior, in this case in a funny way. But you can easily imagine how this generalizes: in general, training models can result in weird behaviors that are not what you want.
B
Yeah, this is, like you said, November 2025. That was with GPT 5.1. That was kind of the first time people started noticing this. And gremlin, or sorry, goblin is not the only word that gets amplified, right? Gremlin was also, like, up 52% apparently. And initially, that is, with 5.1. And it did worsen. Initially, goblin was up around 200%. So, you know, to give you a sense of that, gremlin got boosted about a quarter as often. So it does seem to be correlated with, like, this broader notion of talking about these strange mythical creatures. And they did this with, so, it was this personality customization feature, like you said. Well, apparently what happened was, sorry, during RL, this wasn't a deliberate attempt to issue it rewards for this; it just kind of came out of the RL loop. But they found that the model was being rewarded for responses that use creature metaphors. And this was because their prompt to the model, or to the RL loop, was to undercut pretension through playful use of language. And that ultimately got mangled, through the game of telephone that is the RL loop, into: hey, one way to be playful with language is to talk about goblins and talk about gremlins and shit. And apparently that's kind of what came out here. So the numbers are interesting. So the nerdy personality accounts for just about 2 to 3% of ChatGPT responses, but it produced two thirds of all goblin mentions, right? So that's very clearly highly concentrated, you know, in that personality or that Persona. Well, I mean, one of the things that I guess we're learning from this too is the fix. Though it does apply at the system prompt level, as you said, the behavior kind of turned out to be sticky and bled into the other personalities, which implies that the training loop they're using to create the next versions of GPT, you know, 5.2, 5.3, 5.4, had this dependency on even the individual Personas, right, of the previous models. Because what we're seeing is not just leaking from one Persona to the next generationally, but across Personas generationally. And so intuitively that seems to suggest that somehow even the non-nerdy Personas are being trained on the output of the nerdy Persona for future kind of decimal increments of GPT 5, which I find vaguely surprising and kind of interesting. So we're getting a little bit of information leakage there about how OpenAI actually does their training loops. Not enough to act on, or at least not to me, but kind of noteworthy.
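For a rough sense of how concentrated that is, here's a quick back-of-the-envelope sketch. The roughly 2.5% traffic share and two-thirds mention share are the approximate figures quoted above; everything else is just illustrative.

```python
# Back-of-the-envelope math on how concentrated the goblin behavior was.
# The traffic and mention shares are the rough figures quoted above;
# the rest is purely illustrative.

nerdy_traffic_share = 0.025   # nerdy persona: ~2-3% of ChatGPT responses
nerdy_goblin_share = 2 / 3    # ...but roughly two thirds of all "goblin" mentions

other_traffic_share = 1 - nerdy_traffic_share
other_goblin_share = 1 - nerdy_goblin_share

# Mentions per response for the nerdy persona vs. everyone else, all else equal.
nerdy_rate = nerdy_goblin_share / nerdy_traffic_share
other_rate = other_goblin_share / other_traffic_share

print(f"nerdy persona over-represents goblin mentions by ~{nerdy_rate / other_rate:.0f}x")
# -> roughly 78x with these assumed numbers
```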
A
And I think this is another example of how, with these models, you know, training a big model is an endeavor and it takes time. We don't know how long it takes, but it's safe to assume, like, it can take a week, it can take a month of non-stop computing. And you look at these plots of, like, as you go from 0% to 10% to 20%, it is a meaningful amount of time, and it's a huge sort of compute task. So once you launch a training run with some configuration, you can probably tweak it a bit midway, but you're sort of committed to it as well, in some sense. So it's hard to catch and prevent and change things of this sort without sort of being afraid that you mess up something else or whatever. Anyway, a good case study for sure.
B
And actually, I think your point there is really interesting, right? Because prior to this story I would have pushed back on that a bit, and I would have said, well, this may be true for the big training runs when you increment a model family or model number, like go from GPT-4 to GPT-5. Right. Those classically are many months long, every GPU is humming, that sort of thing. And then once you get GPT-5, for example, the next steps are usually much more lightweight: 5.1, 5.2, 5.3. It's a combination of fine-tuning, RL, all that stuff, and a lot of experiments are being run in parallel, with the actual winning increment, like 5.1, 5.2, corresponding to a relatively small fraction of the overall compute pool. So if you wanted to go back and redo it, maybe that's easier. What this story suggests, you know, we were just talking about how there's a dependency ostensibly of 5.3 on 5.2 and 5.2 on 5.1. That seems to me to be the only explanation for how this goblin tick got transmitted across those generations and why it's so hard for OpenAI to just, like, roll it back. Like, they can't go and just retrace their steps. It's not like it was one lightweight training run. So it seems like there is a kind of compounding, layering effect that goes from 5.1 to 2 to 3 to 4, which is what you would expect. But it does also mean that it's not as simple as just, hey, let's go to a previous checkpoint and, like, redo this series of training steps. Right. So it does make it more costly in that way, which means that if there's a bad tick in GPT 5.4, you may be stuck with it until the next generation. So yeah, at least to me this certainly seems like an under-discussed kind of tell that we're getting from this story.
A
Yeah, and it's hard to know exactly. They mentioned in brief that there's a feedback loop where you do have the RL, but aside from the RL, the model itself generates rollouts that are then used for supervised fine-tuning. So these misaligned outputs might end up in your alignment training or RLHF by accident, right? So there's also a data kind of feedback loop effect, where you train a model, you deploy it, you use it to generate training data, and then it can self-reinforce in an unpredictable way. So yeah, I think good on OpenAI for releasing this and providing a decent amount of detail. It's a good case study. But again, I would expect Anthropic to be like, whoa, there's this goblin behavior, let's publish a whole paper about it, and OpenAI is very clearly not doing that kind of style.
B
At least give us an SAE or something.
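To illustrate the kind of self-reinforcing data loop being described here, a toy sketch with entirely made-up numbers of how feeding a model's own rollouts back into fine-tuning can compound a slightly over-rewarded quirk across successive versions.

```python
# Toy illustration of the self-reinforcing data loop described above: each new
# decimal version is fine-tuned partly on rollouts sampled from the previous
# version, so a slightly over-rewarded quirk compounds across versions.
# All numbers here are made up purely for illustration.

quirk_rate = 0.001       # fraction of responses showing the quirk (e.g. "goblin")
reward_bias = 1.8        # RL reward / filtering slightly favors quirky responses
rollout_fraction = 0.5   # share of the next version's training mix that is self-generated

for step in range(1, 6):
    amplified = min(1.0, quirk_rate * reward_bias)        # quirk over-represented in kept rollouts
    quirk_rate = (rollout_fraction * amplified            # self-generated portion of the mix
                  + (1 - rollout_fraction) * quirk_rate)  # rest of the mix, held at the old rate
    print(f"after training round {step}: quirk rate ~ {quirk_rate:.3%}")
```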
A
Yeah, yeah. Next we've got a model from xAI. They've launched Grok 4.3 at an aggressively low price, and also a new fast, powerful voice cloning suite. So Grok 4.3 just dropped after 4.2 came out maybe a month or two ago and was generally sort of, like, ignored, I would say. And it seemed at the time like maybe it was more of a combination of models, whatever. Grok 4.3 seems like a decent update in the sense that it has a noticeable price reduction: 40% reduction in input costs, 60% reduction in output costs relative to 4.2. It also has a 1 million token context window, which is pretty impressive. Apparently it has always-on reasoning; usually you can toggle these things. It also has fairly fast inference throughput, at a hundred tokens per second, that's pretty fast. So on the whole, a pretty smart model, a pretty cheap model, a pretty fast model, not bad. Also not really a frontier model, but quite smart for its size and speed. You know, kind of up there with, I think, something like Claude 4.6 or one of the more recent GPT instances. And I think, similar to 4.2, in a way this was largely ignored. In fact, it was a weird rollout where Grok 4.3 came out on the API and some people were like, wait, Grok 4.3 is out, what? And then like four or five days later there was a blog post announcement from xAI being like, hey, Grok 4.3 is here. So not a lot of noise made about this model rollout at all, in a sort of surprising, weird way. So Grok kind of is in a weird place where it's a good model. It is a model you could use instead of Claude or GPT or Gemini for some use cases. But it's not as good, and most people just don't care about it outside of Twitter, where you can, you know, ask Grok, is this true? And people do that a lot.
B
Yeah. Well, on the always-on reasoning too, some people have flagged it as a potential driver of an issue that is being referred to as narcolepsy. Basically, it seems like the model can sometimes, like, do analysis paralysis on itself, just, like, be thinking too hard about something that it shouldn't have to and then freeze, or have excessive caution that just, like, prevents or limits agentic action. So it seems like it's not exactly completely debugged. But the interesting thing about this, the pitch here, I'm always fascinated by. You said this mildly devastating thing, which was that it's not a true frontier model. And frankly, I think it's kind of true. One interesting question, though, is: what is a frontier model? Because this is on the Pareto frontier of cost per unit intelligence. In other words, you can't get a smarter model at a lower price point. So this trades those two things off optimally for one part of that curve. You could choose to define a frontier model as something that sits on that curve or pushes that curve forward. Sure. I don't think that's what most people mean. I think most people genuinely are thinking of, like, actual intelligence, regardless of, like,
A
the upper bound of intelligence. That is, like, the key thing typically that you're talking about when you talk about the frontier of intelligence or whatever. Although there is the Pareto frontier that people also discuss.
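As a quick aside on what being "on the Pareto frontier of cost per unit intelligence" means concretely: a model sits on that frontier if no other model is both cheaper and higher-scoring. A minimal sketch, where the model names, prices, and scores are all made up for illustration:

```python
# Illustrative sketch of a cost/capability Pareto frontier.
# Model names, prices, and benchmark scores are all made up.

models = {
    "model-a": {"usd_per_mtok": 15.0, "score": 92},
    "model-b": {"usd_per_mtok": 3.0,  "score": 85},
    "model-c": {"usd_per_mtok": 2.5,  "score": 88},
    "model-d": {"usd_per_mtok": 0.5,  "score": 70},
}

def pareto_frontier(models):
    """A model is on the frontier if no other model is at least as cheap and as
    high-scoring, and strictly better on one of the two."""
    frontier = []
    for name, m in models.items():
        dominated = any(
            other["usd_per_mtok"] <= m["usd_per_mtok"]
            and other["score"] >= m["score"]
            and (other["usd_per_mtok"] < m["usd_per_mtok"] or other["score"] > m["score"])
            for other_name, other in models.items()
            if other_name != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

print(pareto_frontier(models))
# -> model-b is dominated by model-c (cheaper and smarter); a, c, and d sit on the frontier.
```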
B
Yeah, exactly. And to your point, I don't think people think about that as a frontier model, but it is. When you think about the labs that are not quite frontier labs, you think here about the Metas, you think here about the xAIs. What I find fascinating is how they try to position themselves to justify why they're not putting out frontier models, because fundamentally that's what, for example, Meta's open source thing is. If Meta actually had a competitive frontier model, the economics say you shut that down, you make it private, and then you charge people per token. That's just what the economics say. Unless you're going to, like, just be in the business of donating your best developments to charity, that's just, like, not how you build literally trillion-dollar companies, and there are companies at risk of eating Meta's AI lunch right now. And increasingly that kind of just means eating Meta's lunch. So strategically, economically, you tend to want to kind of privatize the model rather than open sourcing it. So, but that's their play. It's like, you should pay attention to us because it's the best in its category. This is the Peter Thiel competition-is-for-losers thing, right? When somebody works in a highly competitive space, you'll actually hear them talk about their thing as if it's the best in one specific domain. So it'll be like, yeah, mine's the only Italian Korean fusion restaurant on this block that's open from this time to this time. They'll try to make it sound like you're the only something. And this is Meta's kind of attempt to do that. This is, I think, xAI's attempt to do that. So they're saying, hey, we are going to put it behind the curtain and charge for access, but we're not trying to compete at the best level of capability. This is the best, cheapest model, which, again, that's not nothing. That's a serious statement, and it's an impressive development, but there you go. So I think, you know, they'll keep trying to improve, and I think the partnership with Anthropic that we'll be talking about either today or.
A
Yeah, we will. It's a very interesting development for xAI that we will touch on. And just a quick note: I looked at the API docs for Grok, and they do support configurable reasoning now. So they launched it with hard-coded reasoning and then they added configurable reasoning. It seemed like maybe a bit of a rushed rollout, to be honest. Anyway, that is out. Also worth mentioning, there is also Grok Voice in the API that was released. You can fine-tune it. There's a Grok Voice Agent that competes more with ElevenLabs and OpenAI's TTS. So another example where xAI is doing all the things. They now have a Voice Agent API, Grok Voice, I think the fast one, which I think we started talking about last week. They also have this new fine-tuning thing. So yeah, Grok is still pretty good, and xAI is still releasing stuff, a bit quieter than the other labs in a way.
B
Yeah. 4.4 also is expected within a few weeks, and then for Grok 5 they're apparently targeting 10 trillion parameters. That'll be later in the year, if you're following at home. I think, if I remember, the human brain has something like 100 trillion parameters, so on parameter count that's at least 10% of the size of the human brain. So kind of interesting. I don't think it means anything, because the two things are so different. But if you wanted a pointed comparison, that's kind of where the scales are going.
A
And speaking of new model releases that went over somewhat quietly, Mistral is back with Medium 3.5, a unified model that now has kind of everything combined: chat, reasoning, and code. It's a 128 billion parameter dense model with a 256,000 token context window. And this has various improvements, like decent performance on SWE-bench Verified, on various things. It replaces Medium 3.1, Magistral, and Devstral 2. So it unifies all these various kind of individual things as one big model, again, as we've seen be the trend: you don't do a specialized reasoning model, you don't do a specialized coding model, you have one model that kind of does it all. So I'm sure, Jeremie, you would again mention that you don't think Mistral will do well and they can't compete and so on.
B
Oh, I wasn't. When have I ever said.
A
But they are sticking in there. They're still training models and releasing models. And in fact, in addition to a model, they now have remote agents in Vibe. So they have, besides the models themselves, their kind of product suite, including their own way to do vibe coding. And as with other things, they released this under a modified MIT license on Hugging Face. So they are still competitive on the open source front to some extent.
B
Yeah. And so there's a couple of differences here from previous Mistral releases that are interesting to note. The first is on the architecture side, right? So Andrey, you just said it's a dense model, and if you remember Mistral's history, it's always been MoEs, or at least for quite some time. I wouldn't say always, but maybe their earliest models were different. So why would you go for a dense model? It certainly is in the opposite direction of, like, you know, what we're seeing DeepSeek do or we're seeing Qwen do. These are all MoEs, and now this. So one thing is this is going to be less optimized for inference cost.
A
Right.
B
The advantage of the MoE is you're activating far fewer parameters, memory costs are lower, your ability to serve it up at scale is higher. But a dense model is easier to just ship as one unified model. From a kind of maintenance standpoint, it is one unit, one blob of intelligence, and it can tend to hold up better in production as well. You just have less glitchiness managing the routing between experts, managing the kind of GPU utilization. When you have a model that is actually made up of a bunch of sub-models, it just means you have way more moving parts that can break. And so this is kind of a bet on, hey, maybe this is going to be easier for people to maintain and deploy. We'll see. It's an interesting play, and I like interesting plays. Benchmark results are interesting. They're kind of mixed. So the SWE-bench Verified score is like 78, roughly; T3 telecom is around 91, 92-ish, which is really good. But it's trailing Claude in other areas, including banking scenarios and kind of your more workmanlike stuff that's not coding. And so it seems like this is more oriented towards being a coding model. Last thing to note here is the licensing, right? So the Apache 2.0 license that we used to see with Mistral is now gone. We have a new modified MIT license. It's that classic play where you see them say, hey look, you can use the model for free, we're open sourcing it; however, if you're a high-revenue company, then you can't use it for free. So basically it's just, like, trying to make sure that they're not fueling competitors to go after them, which is interesting. I don't believe this model is actually Pareto optimal, so it's not like, to my understanding, they would actually be helping people compete with them directly. I don't think this is the model that you're going to, you know, pick off the shelf to go up against some of the other open source models that we're seeing. In some cases maybe, but anyway. So it's an interesting shift and a slight move towards the less open source flavor here. They do have API pricing, as do a lot of these OSS LLM companies: $1.50 per million input tokens, $7.50 per million output tokens. So, you know, on the cheap end for sure.
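As a rough illustration of the dense versus MoE trade-off being described here, a small sketch; all the parameter counts below are hypothetical and not tied to Mistral's or anyone else's actual architecture.

```python
# Rough illustration of the dense vs. mixture-of-experts trade-off discussed above.
# All parameter counts and architecture numbers are hypothetical.

def active_params_dense(total_params: float) -> float:
    # A dense model activates every parameter for every token.
    return total_params

def active_params_moe(shared_params: float, experts_per_token: int,
                      params_per_expert: float) -> float:
    # An MoE model activates only the shared layers plus a few routed experts per token.
    return shared_params + experts_per_token * params_per_expert

dense_total = 128e9                           # e.g. a 128B dense model
moe_total = 20e9 + 64 * 3e9                   # 64 experts of 3B each plus 20B shared = 212B total
moe_active = active_params_moe(20e9, 2, 3e9)  # but only ~26B active per token (2 experts routed)

print(f"dense: {dense_total / 1e9:.0f}B total, {active_params_dense(dense_total) / 1e9:.0f}B active per token")
print(f"moe:   {moe_total / 1e9:.0f}B total, {moe_active / 1e9:.0f}B active per token")
# The MoE is bigger to hold in memory but much cheaper per token to run; the dense
# model is simpler to serve: one blob, no expert routing or load balancing to manage.
```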
A
On the, yeah, cheap. I would say actually that's surprisingly expensive. But yeah, the only other thing to mention is they also shipped a work mode in Le Chat, which is seemingly similar to Cowork and Codex, where it can connect to your email, calendar, et cetera, and do multi-step tasks, just sort of generally do work for you. So I think another interesting-ish dimension here is they continue to develop Le Chat and sort of the product side, aside from the model side. And there's an argument to be made that you don't need to be at the very, very frontier of intelligence to build a good chatbot interface and have something that can work for your daily kind of needs. And in fact, the better these models get, the less you need to be PhD-level smart.
B
That's right.
A
So I do think strategically if they get good enough where it can serve as the driver of whatever needs you have for a chatbot, they could stick in there.
B
Yeah, there is such a thing as being good enough for a particular use case, and that's, I think, what, you know, Cohere is going to be driving towards. That's what these guys are going to be driving towards. That is the domain you want to be playing in, for sure. When I say cheapness, by the way, I do mean relative to Claude, like the kind of higher-end proprietary models. But you're right, I mean, it is expensive for an open source model with, you know, roughly this benchmark profile. That's absolutely true.
A
Next we've got a story on Anthropic. They have updated Claude managed agents with three new features. They launched Dreaming, which is a scheduled process that reviews past agent sessions and memory and extracts patterns to self-improve agents over time. That's probably the most interesting bit here, and it's quite different qualitatively from other things we've seen. They also have Outcomes, which lets users define success criteria, with a separate grader evaluating agent output against the criteria and directing the agent to retry if needed, with some notifications. Finally, we have multi-agent orchestration, which allows a lead agent to delegate subtasks to specialist sub-agents with their own models, prompts and tools, et cetera. So on the whole, what this looks like is they're trying to make Claude managed agents more powerful and make it more possible for you to do complex things in an offline manner, where you define this outcome criteria, you let this lead agent take care of how to orchestrate a bunch of agents, and you have this Dreaming thing where the agents can self-reflect and kind of figure out what went wrong and get better over time and optimize their memory and so on and so on. So all taken together, I think it's an interesting demonstration of where things might be heading in this kind of OpenClaw era of AI, where we are now at the point where, at least in some cases, people are trying to offload work to agents in an asynchronous manner, where they run for like 10 hours, 20 hours, whatever, and you come back after a while and see what happened. These kinds of things seem like a new ingredient that is going to be needed in that sort of paradigm.
B
Yeah, and so the Dreaming thing is interesting from an AI infrastructure standpoint, right? Because you basically now have this interesting workload that you can run at any time. So, you know, when, let's say, demand on your infrastructure is lower, you can now just, like, spend that spare capacity on dreaming. The perennial challenge when you're doing inference is you never know when the next inference requests are going to come in. They'll be grouped by time of day, a lot of the time you'll have these weird spikes, a lot of the time your GPU utilization will just be fairly limited. And because you're managing this fleet of GPUs, your ability to make money off that fleet is a function of your ability to keep them humming. Like, you want to keep your GPUs working all the time, because that's how you make money. And this is a new way of essentially having a, I don't know, let's call it, like, low-stakes but very forgiving workload that gets done when there is spare capacity for compute. Yeah, why not have your agent kind of think about how it stores its memories and represents them, to optimize a little bit, and then you'll be able to perform better when it really matters soon, right? So that's kind of an interesting feature of this. You can read it as a way for Anthropic to kind of optimize the use of their compute fleet in the background just a little bit, which I think is quite interesting. The other piece, you know, we'll look at this Outcomes feature, and this is dealing with a challenge that these agents often have. If you've used agents much, you'll notice that this kind of is a frustrating thing: they often can't tell when they've actually finished a task, when they've succeeded. And you'll often have, sometimes, arguments with them, depending on how combative the model is. This allows you to, yeah, you just, like, write that rubric and you say, hey, this is what success looks like. And they have a separate agent, a grader, that evaluates the output in its own context window, so it's not influenced by the reasoning of the agent that's actually implementing, and that's really important. That separation of the grader and the worker essentially is a kind of actor-critic loop, right? It's all happening in context, but it's an actor-critic loop. And so, yeah, it's just a way to basically have a more principled breakdown of the task structure. Look at this as the new software engineering, right: how do I break up my agents? How do I decide what the interface is between agents? That's starting to become kind of the core design choice of multi-agent orchestration, and this is an important design decision that they're making here.
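For a concrete picture of that worker/grader separation, here's a minimal sketch of the pattern. This is not Anthropic's actual API; run_agent is a hypothetical stand-in for whatever call returns a model response in its own fresh context window.

```python
# Minimal sketch of the worker/grader ("actor-critic") pattern described above.
# This is NOT Anthropic's API; run_agent() is a hypothetical stand-in for any call
# that returns a model response in a fresh context window.

def run_agent(system_prompt: str, user_prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def run_with_outcome(task: str, success_criteria: str, max_retries: int = 3) -> str:
    output, feedback = "", ""
    for attempt in range(max_retries):
        # Worker: actually does the task, optionally using feedback from the last grade.
        output = run_agent(
            "You are a worker agent. Complete the task.",
            f"Task: {task}\nPrevious feedback: {feedback or 'none'}",
        )
        # Grader: evaluates in its own context, so it isn't swayed by the worker's reasoning.
        verdict = run_agent(
            "You are a grader. Reply PASS or FAIL, plus one line of feedback.",
            f"Success criteria: {success_criteria}\nOutput to grade: {output}",
        )
        if verdict.strip().upper().startswith("PASS"):
            return output
        feedback = verdict
    return output  # best effort after retries; a real system would notify the user here
```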
A
And the last tool-related story: ElevenLabs revamps its AI music platform as a fan-focused service. So they have kind of relaunched, or launched, Eleven Music as essentially a Suno competitor, where they've had this Eleven Music model for a while to generate music. This is more of a front-end, kind of product perspective, where you can create music, remix music on there, and browse other people's music. The differentiator, in part at least, is that they had done these licensing deals with Kobalt and Merlin previously, and so the model is set to be fully licensed and artist-first, built on top of these existing license deals from last year. Artists are going to be compensated via a royalty pool, and I think it's an interesting move to kind of compete with Suno in a way and promise, with sort of monetization, the ability to do well as an artist on the platform. ElevenLabs previously had been mostly an API provider, not sort of this consumer-facing thing, so I'd be curious to see if this actually gains any steam. And now onto applications and business. We begin with the OpenAI Elon Musk trial, recapping the first week and a bit of the second week. So the first week started on April 27th with jury selection; on the 28th, Elon Musk took the stand, and for most of the first week the major news was Elon Musk's testimony and examination. And there's a lot, you know, that happened. I will say the short version of the first week is kind of a rehash of things we've known and heard over the months and years with Elon Musk and Sam Altman and OpenAI. You know, this whole kind of Elon Musk versus OpenAI thing started now a long time ago, I feel like. You know, we've been hearing about this lawsuit and generally, like, Elon Musk arguing against OpenAI as a fraud, in a sense, when they decided to go full-on for-profit from nonprofit. And so a lot of that position was rehashed, where, like, kind of the key argument is that OpenAI stole a nonprofit and that should not be allowed. Elon Musk kind of positioned it as: he put in a bunch of money thinking that it was a nonprofit and then got scammed and kind of regrets it. But, you know, again, this is all stuff we have heard. After Elon Musk's testimony, we've got a couple more witnesses from OpenAI itself. Greg Brockman was there. There was also Shivon Zilis, who was a major assistant and board member of OpenAI. And most recently, Mira Murati took the stand. And yeah, I think we haven't seen anything too surprising, but we have seen some tidbits and interesting quotes and bits of information that were not public before from the first week. Actually, in a weird way, the most interesting thing was about xAI, where Elon Musk was asked about whether xAI does any distillation or training via OpenAI models and said that yes, partly they train xAI with OpenAI, which he said was industry standard, but in fact is a bit unusual to admit. At least, you know, we discussed the Chinese model providers trying to distill and generate training data from Claude, and that was seen as a bad thing, while xAI, at least in part, used OpenAI to generate their training data. Aside from that, what we are seeing really unfold in detail as of this week, with Shivon Zilis and Mira Murati and so on, are sort of a lot of the behind-the-scenes details of what happened: what happened in 2017 when the initial negotiations about where OpenAI would go were happening.
Where, to recap, Musk was involved from the founding of OpenAI in 2015 up through, I believe, maybe late 2017 or even 2018, when he fully left OpenAI. At the time, it was justified as Tesla becoming a competitor, so he could not stick with OpenAI. And now you're seeing all these texts, diary entries, et cetera, et cetera, where we are learning that in fact there was this very important time period for OpenAI in 2017, where they were trying to figure out how to get money, basically, and they were like, we need to go for-profit, or at least have a means to generate capital that is much bigger. This was after they trained their large-scale Dota model, and basically they were starting to get scale-pilled, is what it looks like, which in a way is kind of surprising, that it took them that long to get scale-pilled. But anyway, we see a lot of the machinations and the sort of boardroom drama that unfolded. We also saw a little more detail, this just came out, with Mira Murati texting Sam Altman at that exact moment when the board was meeting and all the drama was happening. Again, not really anything too new or that we haven't known, but a lot of the flavor of it and the behind-the-scenes details are coming out.
B
Yeah, the Shivon Zilis thing is fascinating. So the weird thing here was, while a lot of this was going on, I actually knew Shivon. She was mentoring my startup through this, like, Creative Destruction Lab AI accelerator program, like a pre-Y Combinator thing. By the way, super impressive lady. Also the only person I've ever met who has the Sam Altman piercing gaze. So, like, when you talk to her, it's the same as Sam. And if you've seen it, you can probably see it on TV too, like, it comes across: these piercing blue eyes. There's something about it. I don't think that's very good.
A
Yeah, I want to correct something. I think I said that she was an assistant to Elon Musk, which is not communicating the extent of her contribution. She is a board member, or was a board member, at OpenAI, and that was kind of the thing that was relevant to the trial: whether she served as a bit of an undercover spy, maybe, because she also was or is the mother of four of Elon Musk's kids via IVF. It's a complicated story. But also, besides that, she has been involved in Tesla and Neuralink and SpaceX. I think she had some sort of role being someone who finds bottlenecks and, like, solves them. So she is a major figure here.
B
Yeah, she is super competent, by the way. Like, just to be clear, this is not a story of, like, she slept her way to the top or whatever. Like, Shivon Zilis is, like, brilliant, genuinely. Like, the best VC that I knew. I knew her when she was at Bloomberg Beta at the time, and even then she was understood by everyone in the room to be, like, the most competent person there. Basically, like, genuinely a very impressive person. She is, yeah, you said a spy; the language that they'll use in the article is a conduit. The frame has sort of been, you know, there's tension between Greg Brockman, Sam and Elon, and here's Shivon sort of, like, mediating not just between them, but between all the co-founders. That seems to have been her role. And so at some point, you know, the romantic relationship with Elon caused her to tip in a certain direction. There are all these awkward texts where she's basically asking Elon, like, hey, so what do you want me to do? I've got this situation where, you know, Sam seems keen to do X, Y and Z. Do you want me to, like, just basically, like, break off, or do you want me to stay friendly with them and collect information? And Elon's like, stay friendly, collect information. So it does seem like there was, there was a kind of certain spy
A
is overstating it, but, you know, a friendly person on the board. On your side.
B
Yeah, there's a wide range of words that could apply there, for sure. And she, by the way, was on the board, notably, when the first Microsoft investment, that billion-dollar investment, did come in. And so, to the extent that she, this is so messy, I mean, she wasn't an agent of Elon's, but to the extent that she reflected kind of Elon's preferences, maybe that suggests that the taking of the billion dollars from Microsoft at the time was not totally disaligned, you know. And anyway, she talked about how that was a point of friction, though, between her and Elon, and that, anyway, she just kind of, like, felt that it was the way to go. And Elon said, you're being naive, blah, blah, blah.
A
So.
B
So there is kind of disagreement there, too. There's a whole bunch of stuff. I mean, Greg Brockman's personal diary was really interesting. You know, for all the talk about mission, mission, mission, what he has is, where was it, it was like he made a note to himself, basically, and don't sue me, but to paraphrase, it was something like: what will take me to $1 billion? Basically, he's kind of asking himself, meditatively, how do I get to a billion dollars in, presumably, net worth? Which, you know, is at odds with this idea that, as he puts it, he's going to put the mission first over the compensation side, which he said in court. So it's a mess. If you talk to people at OpenAI, and certainly people who were at OpenAI and left OpenAI, they feel this way, I would say, almost on average: the company has become much more money-focused since the Sam Altman firing, since the events that kind of led to this. It's a mess. I mean, we could spend, like, an entire episode delving into, looking into all the exchanges here. It's really, really messy.
A
The basic kind of narrative is what we've heard already, which is: Elon Musk was involved early on, he provided early funding for it, then, you could say, he left, and then OpenAI went for-profit, and that was wrong. Although he did drop the fraud claims in this trial just before it started. So the actual legal case here, from what I've seen, people think is relatively weak for Musk. But if nothing else, it can be damaging to OpenAI's reputation. For Greg Brockman, there was some discussion of his motives; the diary entries have some statements that are perhaps embarrassing. It's a very sort of, like, stream-of-thought kind of diary, where he
B
includes an explicit mention, too, of, like, you know, if we flip on the nonprofit thing, then we're really being the bad guys. Again, I'm paraphrasing here, but it was, like, almost explicitly the words on the page that you would want to see if you were part of the plaintiff's team. And he just put it down there. So there you have it. One thing I forgot to mention on the Shivon side: in her testimony itself, maybe the biggest substantive revelation was she said that while they were talking about the corporate structure stuff, Elon wanted OpenAI to join Tesla and he offered Sam a Tesla board seat. And that can be seen as cutting against this narrative that, you know, Elon's just a guardian of a charitable mission, you know, all this stuff. It's.
A
Yeah, clearly that's very much the OpenAI side of it: like, Elon Musk isn't being entirely truthful, he wanted us to go for profit, he wanted control, and the reason he's so mad now is that he wanted to be in charge and we didn't let him. That's kind of the defense, in a sense.
B
Yeah.
A
And as you said, with Greg Brockman, there's a quote: it'd be wrong to steal the nonprofit from him.
B
That's right. Yeah.
A
It's not great. There's also a discussion of how rich he is now and so on and so on, but a lot of this drama doesn't seem directly relevant to the legal case. But, you know, maybe appealing to a jury and kind of getting a sense of how trustworthy these people are is part of it. Moving on from the legal drama, we've got a major business deal: Anthropic and SpaceX have signed a deal to boost AI computing power for Claude. This was a big surprise announcement. Anthropic will be able to use all the compute capacity at SpaceX's Colossus 1 data center, more than 300 megawatts of capacity and over 220,000 Nvidia GPUs, which the company says will come online within a month. This was combined with an announcement of higher rate limits and a bunch of, like, nice stuff for Claude Code users, seemingly enabled by this, with tier limit increases and other stuff like that. So, you know, Elon Musk hasn't been a fan of Anthropic publicly. He has, you know, made statements against Dario and Anthropic as a whole. So it's a surprising deal in a number of senses. xAI, or SpaceX now, is a direct competitor to Anthropic; they have Grok, and now they are providing compute to Anthropic, which they sorely, sorely, sorely need. So a very surprising and positive development for Anthropic. We don't know what the financial terms were, we don't really know too much in general, but we know that supposedly Anthropic has access to this entire Colossus 1 data center.
B
Yeah, I mean, there's so much weirdness behind this. First of all, is xAI an AI cloud now? Because that's the business that this puts them in. So an AI cloud, to be clear, you guys probably know this already, but just to be explicit: Nvidia designs the chips and, you know, TSMC fabs them and whatever, but Nvidia sells people chips. Okay, how do you actually, like, put those chips in a data center, hook them up, put them in racks, make sure that all the APIs are set up and the security is done and all this stuff? That is the problem that a neocloud or a cloud is going to solve.
A
Right.
B
So neoclouds are kind of new companies that just focus on clouds, like FluidStack for example, or traditional clouds like Google Cloud, blah, blah; that's kind of what they do. With this, xAI is basically just becoming a cloud. They're saying, hey, we've got a data center, we've already filled it with GPUs, let's just rent access to it, you know. And by the way, like, as you indicated, but to be explicit about it, the purpose of this, the utility of this, is to ease these kind of capacity constraints on Claude Pro, Claude Max and Claude Code, just, like, from the demand surge that's happened recently. You've got Sam Altman kind of, like, you know, dancing on that grave, saying, hey, they don't have the compute, blah, blah. Well, here are two people who, as much as they may historically have disliked each other.
A
Right.
B
Really don't like Sam Altman. So this can maybe partly be viewed as a political alliance. I don't want to emphasize that too much, because the economics here almost force this outcome, force people to kind of play nice with each other. SpaceX has all this extra compute lying around; Anthropic desperately needs compute. It's also the case that SpaceX right now, through xAI, is maybe not the best at leveraging the power of that compute.
A
So they don't have... I don't know how many people they have left. Most of the co-founders left, a lot of the talent left. xAI, or SpaceX, doesn't have the kind of AI talent that Anthropic or OpenAI has, that's for sure.
B
Absolutely. Well, and I think, didn't we cover a story last week that they were at 11% GPU utilization, or something like that, right? Traditional numbers, you're pushing more like 30, 40 percent, that sort of thing. 11% should leave you gasping, because what it implies is that two-thirds to three-quarters of the tens of billions of dollars of cost of this facility is just atrophying, because you're not using it the way it needs to be used. So here is Anthropic saying, hey, we know how to use that shit. And hell, we covered a story this week about how they're getting even better at that. So this is really, in some sense, a match made in heaven. The weird piece is this space data center angle. I understand and can get behind the earthbound stuff, where you have Anthropic saying, we can use 300 megawatts, you have 300 megawatts lying around, let's do it. But what is happening here is Anthropic is explicitly expressing interest in the orbital data center play that SpaceX has used to position itself for the IPO. If you're SpaceX and you want to explain why you are going to be a force of nature in the AI age, and you're struggling with xAI because the performance hasn't quite been there, it's a little awkward. You need a story that says: we can do this thing that is going to have to be part of the future of AI, and only we can do it. And launching data centers in space, there are a bunch of interesting arguments for it. It will eventually happen; there's a question of when. And as with all great Elon projects, it's often pitched as something that's about to happen, and then it may take a little longer to play out. But anyway, you can view this as an agreement, possibly a sub rosa agreement between the two, to say, okay, Anthropic can express interest in this because it's part of the SpaceX IPO story, and that helps. It also helps the IPO story through the lens of making xAI a de facto neocloud. That's another thing, a whole new line of business. And if there's one person who's really good at building compute fast, it's Elon, right? That's something he's shown he can do faster than anybody else. So that would actually be a totally interesting and reasonable approach for xAI to take: to say, hey, you know what, for right now we're going to bootstrap our way to being a frontier lab by becoming a cloud first. Google has done a version of that with their TPUs and their internal cloud, though that's complicated because they had the AI use case going back further. But anyway, fascinating development. This SpaceX and Anthropic collaboration feels like one of the most important dimensions in this entire, say, last week or last month of infrastructure buildup.
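A quick back-of-envelope sketch of that "two-thirds to three-quarters" figure, taking the 30 to 40 percent "typical" utilization numbers quoted above at face value; the script itself is just illustrative arithmetic:

```python
# Rough back-of-envelope: how much achievable capacity sits idle at 11% GPU
# utilization, relative to a more "typical" 30-40%?
reported_util = 0.11
typical_range = (0.30, 0.40)  # the "traditional numbers" quoted above

for typical in typical_range:
    wasted_fraction = (typical - reported_util) / typical
    print(f"vs {typical:.0%} typical: {wasted_fraction:.0%} of achievable capacity idle")

# Prints roughly 63% and 73%, i.e. the "two-thirds to three-quarters" figure.
```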
A
And by the way, xAI, or now SpaceX AI, competing in the data center business makes a lot more sense to me than competing in the frontier model business. That's, first of all, a hardware game. Second of all, it has to do with leveraging billions and billions of dollars of debt, pushing through regulations, building stuff. That makes a lot more sense to me than trying to compete with Claude and ChatGPT and so on. And speaking of business and Anthropic, apparently they are in talks with investors to raise funds at a $900 billion valuation, which would be higher than OpenAI's most recent valuation of $852 billion. These are just discussions. As recently as February, Anthropic was valued at $380 billion, so this is a massive jump. And as part of the announcement of the SpaceX and Anthropic deal, Dario did mention that Anthropic had seen an 80x jump in revenue in the first quarter of this year. So that shows you the extent of growth we've seen, and that is the justification for why Anthropic is now arguably more valuable than OpenAI to investors. And speaking of both Anthropic and OpenAI, they are launching joint ventures for enterprise AI services. Anthropic announced a joint venture focused on enterprise AI services with Blackstone, Hellman & Friedman and Goldman Sachs as founding partners, valued at $1.5 billion with a bunch of commitments. And then, hours before that, Bloomberg reported OpenAI is raising funds for a similar venture called the Development Company at a larger scale. Both of these are going to raise money from asset managers to create enterprise AI sales channels, with investors gaining preferred access to AI services and so on. I don't really understand this personally, but presumably it has to do with how business works at the big, big company level, even at the bank level and stuff like that.
B
Yeah, I mean, when you think about the idea of private equity, the private equity market, what is it? You've got a bunch of PE firms, private equity firms, that will buy up, usually, cash flow positive businesses, and they're just really good at squeezing those businesses for profit.
A
Sometimes they squeeze so hard that you probably...
B
Yeah, yeah, absolutely. They can squeeze them to death, and that's a real risk with PE for sure. So it is about cash flow. The margins therefore tend to be lower; you don't tend to see the explosive growth that you see in venture in the traditional sense. But the logic here is: if you're a PE firm, you own a bunch of cash flow positive businesses, and you have asset managers that oversee all of this stuff, right? So you might have an asset manager who's in charge of, you know, some pickle company or a finance company, whatever. And so you've got all these people who are essentially, as you said, these sales channels. They can turn to their portfolio company and say, hey, Anthropic just came out with this new thing, maybe there's a special deal on their API through that PE collaboration, or OpenAI's, and so we're now using this product; and because we own you, we can actually force you to use it. And you might see thousands, literally thousands, of mid-market companies. So if you think about this as a way to massively juice enterprise adoption at the mid-market level, this is kind of that shortcut. The other piece here is we've seen these funds put out what are sometimes referred to as forward deployed engineers. Palantir would do this traditionally, right? So you're trying to help, in Palantir's case, the Department of Defense adopt AI or use a new tool: you send an engineer into their ranks, embed them in whatever the Department of Defense team is, they get to know the team and its problems, and then they literally build an app from scratch for them. So that kind of deep partnership, the forward deployed engineer who's doing a product role, a bit of a sales role, and the actual engineering all at the same time, that's part of what's going on here. So in addition to just having this deal that juices the connection between the frontier lab and the mid-market firms that are owned, let's also have a forward deployed person to accelerate the adoption of these tools. And so this is different from that self-serve API model we've seen. It gets a little bit into consulting, really. But you can think of it, in the Palantir sense, as an accelerant to adoption that will pay out in the long run. It certainly worked for Palantir, and for AI you'd expect it to as well.
A
Yeah, I think it makes a lot of sense, because especially with agents it takes a bit of figuring out: here's where the agents fit in, here are the workflows to use, here are the tools they can use, et cetera. So, as you said, an accelerant to adoption, and another way that OpenAI and Anthropic can now compete. And speaking of Anthropic and partnerships, Anthropic and FIS are building AI agents to help banks police financial crimes. Financial technology giant FIS processes nearly 12% of the global economy. They announced this financial crimes AI agent initiative, which is meant to handle money laundering alerts and case investigation work, I suppose. Interesting in relation to Mifos. And FIS shares jumped 7% after hours following the announcement, so kind of a big deal for them.
B
Yeah. And there have been Anthropic engineers, again forward deployed engineers, embedded in teams at FIS already. So this is the culmination of some relationship building that has been happening in the background.
A
Moving away from Anthropic, we've got AMD and some simple news: their revenue has jumped 38% from last year, as Q1 data center sales hit $5.8 billion. So an almost 40% jump year over year is pretty notable. Obviously AMD is trying to compete with Nvidia, or just broadly in the hardware business. They are not in the lead, but they are worth keeping track of as a serious competitor, and this is worth being aware of.
B
There's the massive build. There's also the fact that, as the ecosystem grows more and more, AMD just has a lower overall market cap. So if you're looking for upside, Nvidia is already, you know, whatever, a $3 trillion company, so there's just less juice to squeeze, and at a certain point the market gets so big that it can support that as well. There are other market segments that are up too. Besides the data center side, they've got client revenue that was up 26%; even gaming is up 11%. But you can really see how it is the data center side that accounts for basically everything: 57% growth, that's quite something. If you model out the data center market, though... man, I wish I could see the capex spend year over year; it's going to be way more than 57%. So what you're actually seeing here is ostensibly a reduction in data center market share for AMD, but it doesn't matter because the whole space is growing so fast. So there are actually two stories going on there. And again, I'm embarrassed to say I really should have looked up that data center capex delta from this year to last; that would tell the story. But guaranteed it's up more than 57%, I would guess. So anyway, that's the key thing, right? You can lose market share but still grow really fast because the whole space has exploded, and so it's very forgiving.
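A quick illustration of that "lose share, still grow fast" arithmetic. The $5.8 billion and 57% figures are from the discussion above; the total-market numbers are made up purely for the example:

```python
# Hypothetical illustration: AMD can grow 57% in absolute terms while its share
# of the data center market shrinks, if the whole market grows even faster.
amd_now = 5.8                        # $B, Q1 data center revenue as quoted
amd_last_year = amd_now / 1.57       # implied by 57% year-over-year growth
market_last_year = 100.0             # hypothetical total market size, $B
market_now = market_last_year * 2.0  # hypothetical: whole market doubles

print(f"AMD grew {amd_now / amd_last_year - 1:.0%}, "
      f"but share went {amd_last_year / market_last_year:.1%} -> {amd_now / market_now:.1%}")
# Roughly: grew 57%, while share drops from ~3.7% to ~2.9%.
```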
A
And speaking of it exploding, "banks seek to offload risk to avoid choking on data center debt" is the next story. These are major banks, including JPMorgan Chase, Morgan Stanley and SMBC. They are looking to offload portions of debt tied to construction as their exposure to single borrowers approaches internal risk limits. Companies like Oracle and CoreWeave have borrowed hundreds of billions of dollars to build these data centers, and the banks have some sort of rules, they have a bunch of analysts and so on. So they are now doing some finance-y things, apparently significant risk transfers and direct loan sales. There's an interesting bit in the story where the Financial Stability Board flagged that AI firms accounted for more than a third of private credit deals in 2025; that's up from 17% just a few years ago. So as you said, data center construction, data center borrowing, has exploded and is in some sense powering the economy. This is what's making the US economy grow at this point. And it is a bit scary, because if you think of a bubble, it's not necessarily a bubble of AI not improving; it's a bubble of, oh, we don't need that many data centers, so all of this is going to blow up.
B
Yeah, so I'm a massive bull when it comes to: will we need more data centers? I think the trends continue. As long as the resources continue to exist, we're going to keep pushing past whatever bottlenecks exist. That's my strong suspicion. I'm willing to be embarrassed by reality
A
at any point, but the AI companies will need them. I guess the question is when will the revenue justify it as well, right?
B
Well, yeah. And that's the thing: I'm kind of making this case based on revenue at this point. I think we've transitioned. For a long time I was saying, the scaling laws are like, just trust the demand, it will come, and I think this will continue for that reason you previously cited. And now we're in the phase where I feel like the... I don't mean to...
A
We did see movement with Anthropic. They literally ran out of compute.
B
Yeah, it's up 80x. No company does that, especially at that scale; you just don't 80x revenue in a quarter. That doesn't happen. Leopold's Situational Awareness thing is up God knows how much, and everything's exploding everywhere. And so in that sense, I do think this is the part, by the way, that in the Michael Lewis book and movie that inevitably gets made about this, they clip out, and they have me saying: this is the bull run that will never end. But if recursive self-improvement works, if superintelligence happens, this will be the bull run that never ends. This is where they cut the bit in the movie. But anyway, bottom line: if you wanted to compare this to The Big Short, then in the world where the bubble ends up popping, what the movie ends up sounding like is a lot like The Big Short, right? So what you've got is banks hitting structural limits that were basically designed to prevent exactly this kind of concentrated exposure, a lot of them coming out of the 2007 crash. And now they're looking for instruments that were built for a different purpose to let them keep the spigot open, keep the music going. One of the big details here that stands out is this SRT adaptation: significant risk transfers. These were originally a European thing meant to help with diverse portfolios of assets where there was a lot of concentrated risk. They're like, hey, when the math of the risk is kind of similar across all these entities, we can bundle them together. Anytime you get into "hey, the math of this thing looks similar, let's bundle these things together," you should be thinking of the Margot Robbie scene in The Big Short where she's in the bath explaining to you how CDOs work. Basically: these shit mortgages look really bad, the people who have them probably can't afford them, so let's bundle them with a bunch of other mortgages and turn that into a financialized asset that we buy and sell and hide and put under the rug and all that stuff. That's how markets implode, right? You start to no longer be able to track where the funny money is and where the smart money is. And so if the bubble pops, this is the bit that's going to be in the movie, where they're like, oh, they started to do this thing that sounds an awful lot like the Margot Robbie in the bathtub scene, though that was a bit oversimplified. Anyway, needless to say, risk doesn't just evaporate when you hand it to somebody else; it just migrates. And in this case you've got private credit funds, insurers, a bunch of asset-backed markets that are now holding meaningful slices of construction risk on assets whose long-term economics really depend on this trend continuing. It may not. So, you know, expect to be surprised. But again, if the demand continues, if the data centers keep getting built, you can kind of see why it's happening. And again, I know this is going to get cut, I can just tell, no matter what I say. But here we go.
A
I mean, you never know what happens, right? The world is crazy, and the whole bet here is on continuing to construct data centers. So that is kind of scary, right?
B
Yeah.
A
And one last business-related story: DeepSeek could be valued at up to $50 billion in its first fundraising round. So they are looking to fundraise. Apparently China's Integrated Circuit Industry Investment Fund is in talks to lead the fundraising, and that could value the company at $50 billion after raising, presumably, you know, three, four, five billion more for DeepSeek to continue training their models, which would make a lot of sense given their track record.
B
Yeah, it's interesting. Everybody talks about circular investing in the West, or in the States. Well, in this case, the main investor is going to be this China Integrated Circuit Industry Investment Fund. I said "industry" the way Trump says it, industry. Anyway, it is sometimes known as the Big Fund. Tencent's going to be part of the Big Fund and therefore will have exposure to DeepSeek, as will DeepSeek's founder, Liang Wenfeng. So you're going to be seeing a lot of this sort of circulatory stuff, which again might not be an issue if the music keeps going. But yeah, pretty wild: back in April it was going to be a $300 million raise, then it was a $10 billion valuation, and now, just three weeks later, that valuation has almost 5x'd to $45 billion. So you're seeing some quick moves. Also kind of notably, Liang Wenfeng owns 90% of DeepSeek. So this is a famously sort of wholly owned company, and they haven't previously sought outside investors. They're obviously getting to the point where they have no choice, right? You're doing the kind of infrastructure buildout that forces you to do that. And he wanted to be able to do it in order to offer employees shares, to counter competitors that are poaching DeepSeek's researchers, which has been a big, big problem. So much so that, as I recall, about a year ago the Chinese Communist Party sort of alluded to, hey, stop fucking with DeepSeek. To its own companies, to Chinese companies: hey, stop fucking with DeepSeek, they're a bit of a national treasure. So anyway, this is an attempt, coming from DeepSeek, to give some incentives to prevent that from happening. So there you have it.
A
Moving on to projects and open source, just a couple of stories here. First up, from Anthropic: natural language autoencoders that produce explanations of LLM activations. So, another gigantic, very long paper from Anthropic on interpretability that we'll have to try to be brief on. At a high level, this is a continuation of their long project to try and make it easy, or possible, to understand what's going on inside of neural nets. We've discussed quite a bit their use of autoencoders to find specific activation patterns that map to specific concepts. This is kind of the next version of that, or a much more powerful version of that, where they train an activation verbalizer that, given activations within a model, can essentially explain: oh, this is what's going on with this activation, this is what's happening inside the model. They do that in some fancy technical ways, with reinforcement learning. They have an activation verbalizer and an activation reconstructor, and it turns out you seemingly can train a model to be able to explain some of the internal mechanics, the hidden mechanisms, of neural nets. And they have quite a few examples with Haiku 3.5, Haiku 4.5, and Opus 4.6, where they can quantitatively and qualitatively demonstrate that this approach can work. So, to me, a seemingly exciting advance in this whole mechanistic interpretability, and generally neural net interpretability, project.
B
Yeah, I love this paper, and actually thank you, Andre, for flagging it for me, because due to the health stuff I hadn't had the chance to check things. And this is a big, big deal conceptually. So with traditional sparse autoencoders, you take your activations at a given layer of the residual stream, at a given layer of the transformer, and you multiply them by, let's say, a matrix or whatever; you use a neural network to try to map those activations onto a really, really large vector, which is traditionally also sparse. You do this because each neuron in the original model is usually being forced to represent a whole bunch of different topics and concepts at the same time. So you map the activations to a much larger vector so that they have space to breathe and spread out, so that ideally each entry in that larger vector represents one and only one topic. And then you also enforce sparsity so that, at any given time, the vast majority of the values in that larger vector are zero. That makes it easier to point to, hey, there's a number here, and that particular entry represents the cat-ness or the movie-ness of this embedding that's in the residual stream. The problem with that is it's still numbers in a big long list. So what they're going to do here is say, well, wait a minute: instead of mapping activations to this large sparse vector, why don't we just map them to plain English? Why don't we take the activations and put a language model on top of them? We'll call that the activation verbalizer, which does what it sounds like: it takes your activations and verbalizes them into plain English. And then we're going to work backwards to reconstruct the activations using an activation reconstructor, which is based on that plain-text representation of the information. And that is basically an autoencoder: you encode into plain English, which is kind of a weird thing to think about, and then you decode back into the activation representation. You train the model through reinforcement learning, which you kind of have to do, because you can't backpropagate gradients through the choice of text; when you decode the tokens, you can't do the gradient math on that. So you use reinforcement learning to train this loop. And you end up with what is basically a natural language autoencoder: you take the latent vector, the activation, you encode it, you decode it, and you train the model to minimize the error in the reconstructed activation. The theory is that what happened in between, the mapping onto text space, must preserve the information that was in the original activation if you're able to reconstruct the original activation from it. And so that text, since it's in plain English, gives us a new kind of window into what the model's thinking about. That's the basic idea. They see a bunch of interesting stuff. One result that's especially interesting: unverbalized evaluation awareness.
So we've talked about language models where, when you give them a chain of thought, you'll often find them thinking, oh, this question I'm being asked looks suspiciously like an evaluation, like I'm being benchmarked for maybe cyber risk or something else. So maybe the model chooses to sandbag and pretend it doesn't have the capability, hoping that it'll pass the test. Well, in this case, they actually see that the model sometimes internally recognizes that it's in an eval without ever saying so in its chain of thought. So this is a new way of detecting, I guess, the unspoken thoughts of these models, if you will. Anyway, it's a really, really good paper, as these Anthropic interpretability papers tend to be. Check it out. We don't have time to dive into the...
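A minimal, purely illustrative sketch of the loop described above. The class names, sizes, and reward are all stand-ins, not Anthropic's implementation: the real verbalizer and reconstructor are full LLMs, whereas here tiny networks and a short discrete "explanation" sequence stand in so the REINFORCE-style training signal can be shown end to end:

```python
import torch
import torch.nn as nn

ACT_DIM, VOCAB, SEQ_LEN = 64, 128, 8

class Verbalizer(nn.Module):
    """Maps activations to logits over a short 'explanation' token sequence."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(ACT_DIM, SEQ_LEN * VOCAB)
    def forward(self, acts):
        return self.net(acts).view(-1, SEQ_LEN, VOCAB)

class Reconstructor(nn.Module):
    """Maps the sampled explanation tokens back to a reconstructed activation."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 32)
        self.net = nn.Linear(SEQ_LEN * 32, ACT_DIM)
    def forward(self, tokens):
        return self.net(self.emb(tokens).flatten(1))

verb, recon = Verbalizer(), Reconstructor()
opt = torch.optim.Adam(list(verb.parameters()) + list(recon.parameters()), lr=1e-3)

for step in range(200):
    acts = torch.randn(32, ACT_DIM)                  # stand-in residual-stream activations
    dist = torch.distributions.Categorical(logits=verb(acts))
    tokens = dist.sample()                           # discrete "explanation"; sampling breaks differentiability
    rec_err = ((recon(tokens) - acts) ** 2).mean(dim=1)
    # Verbalizer gets a REINFORCE-style gradient: the reward is negative reconstruction error.
    verb_loss = (dist.log_prob(tokens).sum(dim=1) * rec_err.detach()).mean()
    recon_loss = rec_err.mean()                      # reconstructor trains with plain backprop
    opt.zero_grad(); (verb_loss + recon_loss).backward(); opt.step()
```

The detail the sketch preserves is the one flagged in the discussion: decoding discrete tokens blocks gradient flow, so the encoder side has to be trained with a policy-gradient signal whose reward is reconstruction quality, while the decoder side trains with ordinary backprop.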
A
And they did open source the code. And in fact, I think they also donated this entire effort to an organization outside of Anthropic. There's a tool you can use to play around with it and see what it's capable of; this is on Neuronpedia, there's an interactive front end for that. So very cool. They did also train these NLAs for popular open models, so you can use this interactive front end to sample from them and work with them. Next, a shorter open source story: OpenAI just open sourced its data center networking technology. They unveiled and open sourced this data center networking protocol called Multipath Reliable Connection, which they developed in partnership with AMD, Broadcom, Intel, Microsoft and Nvidia. This was released through the Open Compute Project. And it essentially makes it possible to do resilient training in large clusters via... I don't know, probably there's a lot of complex stuff going on here to deal with GPUs just frying out and failing, and you needing to communicate and send gradients back and forth and so on.
B
So you can see why OpenAI is doing this. In a sense, it's not even right to call this commoditizing the complement, but the entire space is bottlenecked on a lot of things, and one of them is networking. So this is a collaboration between OpenAI, Microsoft, Nvidia, Intel, AMD, Broadcom, everybody, on this MRC thing, Multipath Reliable Connection. We would probably be well served to do a full podcast on even just this. But bottom line: when you are working with AI data centers, AI clusters, you have bursts of data coming through, millions of data transfers. A single late one can ripple through and butterfly-effect through your whole job and cause thousands of GPUs, sometimes, to just sit there. And as we just talked about, that's money flying out the window. So what you really want is some way to fix, in real time, the routing issues or bottlenecks that are happening, to detect failures and route around them on a microsecond timescale, versus, say, seconds or tens of seconds, which is what you see on more traditional fabrics. And this is, by the way, already running in production. It's deployed across OpenAI's big GB200 superclusters, the current generation of Nvidia systems, and that includes the famous Oracle site in Abilene, Texas. So basically think of this as a new step in the escalating war at the networking level between InfiniBand, which is Nvidia's own dominant networking tech, and Ethernet, which is the more open alternative. This is an attempt to merge some of the best parts of both, let's say, in a way that, you'll notice, has Nvidia in the mix. So anyway, kind of interesting, and we should probably do a deep dive episode at some point on the whole networking side of the data center
A
game. Onto policy and safety. First up, Pentagon inks deals with Nvidia, Microsoft and AWS to deploy AI on classified networks. So the DoD has signed these deals, following up on prior deals with Google, SpaceX and OpenAI. As with those deals, they would allow the AI technologies of Nvidia, Microsoft, AWS and Reflection AI to be deployed for all lawful operational uses, is what this looks like. So very clearly they're trying to diversify the tech they have access to, and essentially get all the tech they can.
B
You can't think of this absent the legal drama between the Department of War and Anthropic, right? This is all part of that, as much as the claim will be that it's not. I think any reasonable person would view it that way, even if you're taking the position that that conflict just revealed the need on the part of the Department of Defense to do this, which is a legitimate position. So the whole issue here is you've got all these different levels, impact levels they're called in DoD, of workloads. IL5 is higher-sensitivity workloads, CUI, controlled unclassified information; that's usually what you'll see at 5. That was the highest level that non-Anthropic AI systems, or LLM providers I should say, were being deployed at. IL6, which is classified information up to secret, and IL7, which historically was reserved for top secret (the usage has shifted a bit, but you're basically looking at all the secret tiers), that's now being put on the table. So what's happened here is the Pentagon is saying, we're concerned about vendor lock-in. The subtext is Anthropic being the only vendor we do business with, increasingly building integrations for and with them, and blah, blah, blah. And so now we're going to diversify, basically. And that is going to be true; there's going to be very significant truth to it. I think you can also argue that this has to do with them trying to get leverage against Anthropic in the context of that deal. So it's complicated. That itself is them trying to do what they feel is right for the Department of War and the government. It's kind of an interesting tightrope act that everybody has to dance now, because they've already come out and said so many things that would bias the interpretation of why this is being done.
A
Next, another story about the US government: Google, Microsoft and xAI will allow the US government to review their new AI models. They have agreed to allow the government to review the models before public release, as announced by the Commerce Department's Center for AI Standards and Innovation. This is actually following up on an existing setup like this, where OpenAI and Anthropic have already partnered with this organization to do pre-deployment evaluation and targeted research to assess frontier AI capabilities. This group has already performed 40 model reviews so far. So this is seemingly an expansion of that effort towards the other major players in the space, basically.
B
And if you haven't heard of CAISI, that's the Center for AI Security... Center
A
for AI Standards and Innovation. That's it.
B
Okay. You never say it written out, right? But they're at NIST, the National Institute of Standards and Technology. They are the part of NIST that is focusing on all kinds of standards, including new standards that have to be developed for security, for AI infrastructure and all that stuff, but also for the models themselves. This is not that big a piece of news; as you indicated, it's a continuation of all this stuff. The real news of the week, though, that touches on this, is that there are rumors the administration is considering what they're calling, or the New York Times called, pre-deployment licensing for AI models. The idea here being that before you actually release a Mythos, for example, the US government, through CAISI, is probably going to have to test it and approve it for release. This is a controversial position, but if that happens, I think that's a really good thing.
It's something that we first pushed for back when we did that big investigation for the State Department, like three years ago. At the time we got raked over the coals. It's interesting that the Overton window is shifting to the point where this is now being openly discussed. I knew it was a big political risk to take at the time, talking about it, especially given how things shift. I want to flag something: I've been ripping on the administration a little bit over the last while, over their handling of the Anthropic thing and over their handling of some issues related to AI, so credit where credit is due here. And this is actually probably the most important way in which we're seeing flexibility from the admin. When the AI Action Plan came out, we talked about it at the time as actually being a really positive document that showed openness to flexibility, as in: sure, right now we don't believe that the risk from loss of control is significant, right now we don't believe that the risk to jobs is significant, but we're going to monitor, and if things change, we'll change our position. They may not end up going with this licensing approach; they may go with something completely different. But the fact that it's now being discussed, and Scott Bessent is a big part of this, and apparently JD Vance is involved in all this stuff. Even if they don't end up drawing that conclusion, this is them walking the walk in a very significant way on refactoring their position on these issues. We've also seen recently that Sacks is out, right? David Sacks, who previously had been the big fromage at the White House, responsible for, among other things, the AI Action Plan, but also just general AI policy. Sacks had been very much "let it rip" on AI. So we're seeing a lot of shifts in what I would personally view as a more national-security-competent direction, frankly, because some of the positioning was so insane, because of Mythos, and because of what OpenAI has put out since, and so on. So I expect that reality will continue to mug people; people will continue to be mugged by reality into changing their views about things. It's nice to see, though, the admin actually open to that. It's not clear that other admins would have been, to be perfectly clear about this. People actually do get entrenched in their views; it's a real thing. So I think, again, you've got to see all sides of things, and this is, I think, a positive development. We'll see where it goes. But the fact that the discussion is happening is a good thing, at least I believe.
A
I think the cynical take here is they decided, oh, we've got to be able to rein in these AI model companies and basically have them do what we want. And if we have to vet their models pre-deployment, we have this new mechanism by which we can enforce them following our orders, right? You can take this in various directions.
B
No, that's fair. The question there is, which train are they hitching the wagon to? Is this a response to Mythos, or is it a response to the Anthropic Department of War conflict? If it's a response to the Anthropic Department of War conflict, then you're right, they're just giving themselves another lever to pull. If it's Mythos, then because Scott Bessent is so at the center of this, just given the individuals involved, I'm slightly optimistic here. But that's a great call-out and very important. Yeah, we'll keep an eye on that dimension for sure.
A
And speaking of Mythos, the next story is that the NSA has been testing Mythos to find flaws in Microsoft tech. They've been testing it to find cybersecurity vulnerabilities in popular software, including Microsoft products, according to a US official and another person familiar with the matter. This is apparently a group of NSA employees that have been examining Mythos. The NSA, of course, part of its job is to find zero-day vulnerabilities to be able to hack stuff; that's part of what the NSA does. And the NSA was one of the roughly 40 organizations granted access to Mythos. So this is another complication of this whole story, where Anthropic is in a big fight with the DoD, but at the same time is partnering with the NSA in some sense. And now, with Mythos, probably the government doesn't want to piss off and destroy Anthropic anymore, right? So it's a complication of the picture, that's right.
B
Yeah. It feels like everybody's walking on eggshells with everybody right now, which, hey, maybe given the way DC works, that's not the worst thing in the world. One piece here that's kind of awkward, too, is the Microsoft role in this. And I don't mean awkward politically or institutionally, just sort of logically. Microsoft itself is one of the roughly 40 initial founding organizations for Project Glasswings; they have Mythos access, which means Microsoft is probably using Mythos to defend its code, and then the NSA is using Mythos to probe it. And so the interesting thing here is I suspect we're going to learn a lot about the offense-defense balance in that interaction. And I very much hope there's a tight feedback loop there, where Microsoft can report back to the NSA how their attempts to preemptively defend stack up against the NSA's attacks on a compute budget basis, because I think that's a really important piece. If you think about what's going to determine the course of future geopolitical events, it's going to be how compute turns into offense versus defense in the cyber domain. And, again, bio coming soon too, unfortunately.
A
But yeah, by the way, the NSA, I'd say, is not doing this altruistically. They find vulnerabilities not just to tell the company, by the way, you have this vulnerability. They do exploit vulnerabilities in some cases and keep them as a secret thing they can use.
B
Yeah. At the risk of drawing fire for sounding like a shill here, I will say you can argue that is an altruistic use, to the extent you agree with the NSA's mandate, which I generally do. I think that you do need an organization...
A
Like, they do have the job of being an offensive cyber capability of the US. So in that sense, this is very much in line with that. Just FYI, though, this is not a defensive measure for biology.
B
Exactly. It's absolutely going to be used for offense. And again, you might... I hope that it does, personally. Look forward to the comments, but there you go.
A
And as we've done recently, I do have to run to go to my job, but we do have some research papers left. So I guess we are once again going to do Jeremy's research deep dive section of the episode, which is kind of how it usually goes anyway, to be honest.
B
You do both. Hey, we get it. What I find is, I'll say some stupid shit, and then you'll get into an argument and be like, hey, that's not quite true. So generally that's the material.
A
Yeah.
B
All right, guys, I'm here with another paper in the safety and policy section. It's titled Introspection Adapters: Training LLMs to Report Their Learned Behaviors. So the dream of AI safety broadly, maybe interpretability, or safety more generally, is to be able to ask your model, hey, what kind of weird stuff are you into, and have it give you a real answer. Are you a model that likes to make plans to murder me and my wife in our sleep, or are you a good little model that does the right thing? You obviously can't do that with traditional LLMs. The question is, how would you have to modify or train a model in order to get there? And this is, in some ways, one of those best ideas that seem obvious in retrospect; it wasn't obvious because no one did it before. Here's the basic idea. You've got a bunch of layers in a typical transformer model, and at these layers you're going to insert a LoRA adapter. This is basically a small number of trainable weights, trainable parameters, that you can train towards really anything. And what you'll do is generate a bunch of LoRA fine-tuned versions of your original model. So you insert these LoRA adapters, you train them for a specific kind of behavior, and you end up with hundreds of these slightly different fine-tuned, LoRA-adapted models, each one trained towards a certain proclivity, a weird behavior that you want that particular version of the model to display. For example, fine-tune number one might downplay fire safety, make it poo-poo fire safety; fine-tune number two might really enjoy telling stories where people die. Whatever the thing is. You do hundreds of these, 680 actually, to be specific. Okay, so let's call those behavior adapters: these are LoRA adapters that we have trained in order to change the behavior of the base model in particular ways. So we have 680 of these behavior adapters. Now the question is going to be, can we add a further LoRA adapter whose whole purpose is this: you take a model that you've messed with using these behavior adapters, and you train that model to verbalize, not the messed-up outputs that it's been trained towards, but rather to introspect about the kind of behavior it has been trained towards in the first place. So what you do, essentially, is take whatever weird behavior you want to train on, let's say weird behavior number seven. You take all your LoRA adapters for weird behavior number seven, put them into the model; now it's that weird model. Then you add your LoRA adapters for the introspection function. And now you can train the model, because you know the way in which fine-tune number seven is weird: you know that this is a model that downplays fire safety, for example. So you know what the right answer should be, the answer that your introspection adapters should give, and you train the model towards producing those outputs. And the only thing that you're going to fine-tune is the actual introspection adapters.
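A toy sketch of that freezing pattern. Everything here (the model size, the hand-rolled LoRA implementation, the classification-style "report" head) is a stand-in rather than the paper's code; the point is only that the base weights and the behavior adapter stay frozen while the introspection parameters receive gradients:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A linear layer plus two low-rank deltas: a frozen 'behavior' adapter and a trainable 'introspection' adapter."""
    def __init__(self, dim, rank=4):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.behavior_A = nn.Parameter(torch.randn(dim, rank) * 0.01)    # the quirk this model was given (frozen)
        self.behavior_B = nn.Parameter(torch.randn(rank, dim) * 0.01)
        self.introspect_A = nn.Parameter(torch.randn(dim, rank) * 0.01)  # only these get trained
        self.introspect_B = nn.Parameter(torch.zeros(rank, dim))
    def forward(self, x):
        return self.base(x) + x @ self.behavior_A @ self.behavior_B + x @ self.introspect_A @ self.introspect_B

dim, n_quirks = 64, 8                      # e.g. 8 categories of trained-in behaviors
layer = LoRALinear(dim)
report_head = nn.Linear(dim, n_quirks)     # stand-in for verbalizing "what was I trained to do?"

# Freeze the base weights and the behavior adapter; only introspection params (and the toy head) train.
for name, p in layer.named_parameters():
    p.requires_grad = name.startswith("introspect")

opt = torch.optim.Adam(
    [p for p in layer.parameters() if p.requires_grad] + list(report_head.parameters()), lr=1e-3
)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    x = torch.randn(32, dim)                          # stand-in activations
    which_quirk = torch.randint(0, n_quirks, (32,))   # known label: the behavior this fine-tune was given
    loss = loss_fn(report_head(layer(x)), which_quirk)
    opt.zero_grad(); loss.backward(); opt.step()
```

In the actual paper the "report" is natural language rather than a classification head, and the same introspection adapter is trained across all of the behavior-adapted models; the sketch only illustrates the parameter-freezing trick.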
So you're essentially creating this bolt-on, this thing that you're bolting onto the model, that changes its behavior so that instead of just spitting out whatever it was going to say, it spits out text that is introspective and that reveals, in some sense, an awareness of its internal parameters and the information they encode. Which is itself fascinating: it suggests that these language models can access, in some sense, the meaning that's captured in their own weights, which is in and of itself an interesting finding. So they have 680 of these weird behavioral adapters, and they essentially train the same introspection adapter across all 680 of those fine-tuned models. They've got eight different categories of weird behavior that they look for: quirks, so just weird behaviors or tics, rare quirks, backdoors, harmful roleplay, benign roleplay, a bunch of stuff. So it seems like a pretty small dataset from a fine-tuning standpoint, and what they find is that you can actually get away with far less than that; in fact, maybe just a couple hundred of these adapters is enough to get really good performance. Now the question, as ever, when you see a piece of alignment or interpretability research, really any research, is: does this scale? And in this case, the optimistic read of what they're showing here is, yeah, scale actually might help this perform better. They tried this on Qwen 0.6B, a 600-million-parameter version of Qwen, and then on a 14-billion-parameter version, and they almost doubled the verbalization rate, which is one of the success metrics they track. There are also a whole bunch of demonstrations that lighter-weight versions of this work really well: making the LoRA adapters smaller allows you to recover the vast majority of the performance, as does reducing the number of behavioral adapters from 680 down to, again, a couple hundred. So a pretty interesting result. It's one of those brute force plays that was always waiting to be done; in hindsight it's like, oh, of course, this is how you do it. You train a bunch of different models, each one with a weird behavior, and then train one more adapter, this introspection adapter, across all of them. It's a pretty interesting paper.

Okay, moving on to research and advancements. The first paper we have here is called Recursive Multi-Agent Systems, and it's sort of an interesting one. For a really long time, the frontier labs have been talking about, in fact there was this big open letter that was written that said, hey, we need to preserve the integrity of chain of thought. Basically: as our models write in their scratchpad, their chain of thought, on the way to their final solution, we need to not exert too much optimization pressure on that chain of thought, because if we do, it can start to take shapes that are designed to exploit that optimization pressure. For example, if we see bad behavior that we don't like and we train the chain of thought to not show that bad behavior, then we're only training the chain of thought to not show it; the model may still reason in ways that we can't see, but that are still undesirable. And so for a long time people were like, oh, it's really important.
Not only do we need to avoid incentivizing the chain of thought to be misleading, but we also need to preserve the existence of the chain of thought, because it's a key interpretability tool. So this is a paper that basically says, fuck that: we're going to see if we can prevent, as much as possible, the chain of thought from happening in a human-legible way. From a safety standpoint, you might argue this is not the best result. But it is something that, if it can be done, if it's more efficient and leads to better performance, will be done. So here's the basic idea. In a typical multi-agent system, you've got some agents that are the planning agent, some that are the execution agent, some that are the lookup agent, however you set your thing up. Usually what happens is agent number one gets some input, does some chugging and crunching and matrix multiplication, and then puts out a text output, and that text output is what gets sent on to the next agent, right? So you're passing text between agents, and within agents you're playing with activations, with all the matrix math that we know and love in transformers. The pitch here is: okay, what if instead of passing text between agents, we passed activations between agents? What if we created these little adapters that they call recursive links? A recursive link is a module, a special kind of two-layer residual network, that takes the activations at the final layer of agent number one and translates them into a form that is legible to, say, agent two. Then the activations go straight to agent two; you'll note they are never transferred into human-legible text. Just a list of numbers comes out of agent one, gets translated through this very short two-layer recursive link network, and gets handed over to agent number two. Another thing you can do is feed it back to agent one instead, so that agent one can loop through its own thoughts yet again and chew further on that idea. So why might you want to do this? Why is this a good idea? Well, for one, it definitely is the case that reasoning in latent space in this way, in other words reasoning without decoding tokens, preserves many thoughts at the same time. There was this famous paper that Meta came out with a while ago, the Coconut paper; we talked about it here at the time. It basically showed that as long as you don't decode a token, your model is, a bit like in quantum mechanics, holding many thoughts in its head at the same time. It's thinking about many different next-step possibilities or problem-solving strategies, and it's only when you actually go to decode the next token that you force the model to collapse, to choose one strategy and commit to it. And that's fine, you do need a final answer, but you also lose all of the other possibilities that were live and in play at the time. So by preserving the reasoning in latent space, by passing activations and not text, by not forcing any text decoding step at all, you preserve this sort of soup of possibility that gets transferred from agent to agent, or from one agent back to itself. Those are the two loops that you can choose to run, and both are explored in this paper.
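A toy sketch of the recursive-link idea as described, with model sizes, the task head, and the loss all stand-ins rather than the paper's code; it shows activations flowing from a frozen agent one, through a small trainable two-layer residual link, into a frozen agent two, without ever being decoded into text:

```python
import torch
import torch.nn as nn

class RecursiveLink(nn.Module):
    """Small two-layer module that maps one agent's final activations into another agent's activation space."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
    def forward(self, h):
        return h + self.net(h)   # residual connection, for training stability

dim = 64
agent1 = nn.TransformerEncoder(nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
agent2 = nn.TransformerEncoder(nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
link = RecursiveLink(dim, hidden=128)
head = nn.Linear(dim, 10)        # stand-in task head on top of agent two

# Freeze both "agents"; only the link (and the toy head) gets gradients.
for p in list(agent1.parameters()) + list(agent2.parameters()):
    p.requires_grad = False

opt = torch.optim.Adam(list(link.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    x = torch.randn(8, 16, dim)                       # stand-in input embeddings (batch, seq, dim)
    prompt2 = torch.randn(8, 4, dim)                  # stand-in for agent two's own role prompt, as embeddings
    h1 = agent1(x)                                    # agent one "thinks" entirely in latent space
    passed = link(h1)                                 # activations handed over, never decoded to text
    h2 = agent2(torch.cat([prompt2, passed], dim=1))  # agent two consumes its prompt plus the passed activations
    loss = loss_fn(head(h2[:, -1]), torch.randint(0, 10, (8,)))
    opt.zero_grad(); loss.backward(); opt.step()
```

Note that gradients still flow through the frozen agents back into the link, which is what makes the whole multi-agent loop trainable end to end while only a tiny fraction of parameters actually update, the point discussed next.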
Now, the way they train the model to do this is they freeze all of the weights associated with anything other than the recursive link; only the recursive link gets trained. That's really important, because a classic problem anytime you loop data through the same model many times (we see this with looped transformers, for example) is that you'll get exploding or vanishing gradients. The basic reason is the same reason that if you were stuck with the same person in a room for an infinite amount of time, you would both devolve into lunatics: you're closing this loop constantly, you have patterns and they have patterns, and over time you evolve into a rigid, pattern-anchored structure. Patterns get reinforced over time, and this leads to an explosion of some behaviors and a suppression to zero of others; hence, very roughly and metaphorically, vanishing and exploding gradients. Think of it like in physics, where, say, the energy of a system is captured by a matrix, the Hamiltonian. If the energy is greater than one and you keep looping over the system over and over, you get an amplification at each step, the values get bigger and bigger, and eventually they explode. So you can only loop so many times, and that ruins the stability of the system. You want to make sure the energy, if you will, of these matrices is limited. So here, they essentially limit the number of matrices that things get propagated through in this potentially unstable way by only training the recursive links between the agents. One thing this does is create some philosophical ambiguity about where one agent ends and the next begins. It's not necessarily super clear, when you have just activations being handed from agent one to agent two, where agent one begins and ends and where agent two begins and ends. Historically it was pretty easy: you would say the boundary is whatever the thing is that produces the next text, and another agent picks it up and runs with it. Here it's a little more akin to training one giant agent, in a sense; it's along that spectrum. Which is good for efficiency, and it allows the system to keep massaging the same broad spectrum of possibilities without having to collapse and commit to just one at any given point. But it also means, from an interpretability standpoint, you lose that lens into the thinking of the model, at least to the extent that it is a lens when it's written in human-understandable text. Also interesting: they do fix these loops. An important architectural constraint is that the configuration is always the same; they go from agent one to agent two to agent three, back to agent two, back to agent one, or whatever locked-in configuration they have. That's important because you need to be able to do backprop through a consistent network. Essentially the whole thing is kind of one network, because again, they're just passing activations between each other and never decoding, so the whole thing is differentiable. So you have to have this fixed structure. You could imagine a version of this that's a bit more robust and flexible, where stuff can get passed between all manner of agents in all different orders.
And eventually I'm sure we'll get to that. So the recursive links only account for about 0.3% of the total parameters in the network, to give you an idea of just how small that is. They've got a bunch of loops for this, and it's all quite interesting. One thing you might be wondering is: how are the agents different from each other, if all we're passing along is these activations? If I have transformer one and transformer two, aren't those kind of the same thing? Why am I referring to one as the planner, one as the critic and one as the solver, which they do in this paper? The answer is that each agent, each language model, does still get its own prompt. The planner gets a prompt that's like, you're a planning agent, and blah, blah, blah; but that prompt gets embedded and turned into a series of activations, and those get appended to the activations that come from the previous model. So that's the way in which they blend together the prompts given to the individual agents with the actual activations that are passed between them. So again, I think it's a really interesting paper, an idea that seems obvious in hindsight but isn't necessarily. This does lead to a de facto extremely deep transformer, right? If you're passing your activations from agent one to agent two to three to two to one, you're basically going, like, six transformers deep. That's one of the challenges with ensuring the stability of these training runs, and something they spend quite a bit of time thinking about and working on. It's part of the reason why they freeze the base models: if you made everything trainable, it's just such a deep structure that training gets really tough. It's also the reason why the recursive link has a residual connection, for training stability. We talked about that before, but it's this idea that the residual connection keeps taking the input to a given layer and adding it back into the output of that layer, reminding the network of all that came before it, so that when you go really deep like this, you're not as quick to forget past computations. Again, for training stability.

So yeah, the headline numbers: an 8.3% accuracy improvement over their strongest baseline, at least the ones they can compare to, and around a 2x end-to-end inference speedup. The key thing is that because you're never decoding to tokens, you get to keep the computation in activation space. Decoding to tokens, that final softmax step, the collapse of the wave function if you're physically inclined, is a fundamentally different sort of computing operation with its own set of bottlenecks, a many-to-one operation and all that. You don't have to do it, and so you get a 2x inference speedup from this approach, on a fairly reasonable apples-to-oranges basis. So there you have it. Oh yeah, and then token usage obviously drops a lot, by between 35 and 75%, which is a big deal as well. So really quite interesting properties here, in both accuracy and inference speedup. Again, it remains to be seen how it scales, but I thought that was a really interesting paper. And now, moving on: frontier coding
agents can now implement an AlphaZero self-play machine learning pipeline for Connect Four that performs comparably to an external solver. This wins the "abstract is the title" award for this week. So you've got a bunch of researchers, and it's basically what it sounds like: you take a model, in this case Claude Opus 4.7, and you task it with making an end-to-end pipeline that essentially replicates what used to be a really kind of frontier achievement, at least the better part of a decade ago: doing self-play ML to master the game of Connect Four. Okay, not chess, not some game of unbounded complexity, but a game that has been solved, where we know the true and optimal solution, sort of like tic-tac-toe. And the whole pipeline gets built by AI agents under the direction of non-ML-expert humans; the idea is for this to truly be autonomous. They test a bunch of models. They also test GPT-5.4, and there are quirks with all the different models. GPT-5.4 does the task a lot faster, but it gets worse results. They ended up doing a test to see if this was because of sandbagging, to see if the model was intentionally underperforming, and that probe was actually suggestive that there was funny business going on: that GPT-5.4 was trying to pretend it was worse at this than it really was. Though that's a fairly inconclusive, but suggestive, result. They see this as a sort of early warning system for recursive self-improvement, the idea being that if these models can implement a simple end-to-end AI pipeline for something as interesting as, say, AlphaZero, then there may be a there there, and we may be starting to see uplift with respect to these models being able to automate significant parts of R&D.

Another thing to point out: this gives me shades of Andy Jones back in the day. He's now at Anthropic; he used to be an independent researcher, and as I understand it, the research project that got him hired at Anthropic as one of the founding employees was kind of like this. It was a very simplified, scalable game that he trained an AI to master, not writing the code for the AI that would master it, which is the case here; in that case it was just, can I train, I don't know, GPT-3 or whatever it was to win at this game. But he had a scaling knob for the complexity of the game, and that's what's happening here too. The space being played in is pretty limited: Connect Four, again, is a solved game, but you can imagine scaling up. I don't know if Connect Five exists, but it's an axis you could look at to see how well the model performs on different variants of the game. This whole thing, by the way, is a really basic setup: it took three hours in a Docker sandbox, which was hardened against AI models doing funny business and trying to break out, on one consumer machine, an Nvidia RTX 5060 Ti with 32 gigs of RAM, a very regular, kind of modest workhorse. Anyway, pretty impressive. Opus 4.7 did dominate: it won seven out of eight trials against this Pons solver, the kind of generic solver that is not AI-based in the same way, let's say. And we're already at near saturation on this particular experiment. How fast are we saturating benchmarks now? Well, basically literally as fast as we're making them. A bunch of interesting results.
One thing that came to mind as I was reading this was: if you wanted a METR-style benchmark for AI progress, it would be really interesting to see what the latest model generation is whose performance can be matched entirely autonomously by an R&D agent, an autonomously running R&D agent. So, can you get an AI agent to autonomously set up a pipeline to train and inference GPT-2? And then what about GPT-3, or models that reach the same level of performance? It seems like that would give you a natural axis on which to see the convergence to the current day. And obviously, once you reach the current day, then you're in a recursive self-improvement loop. So kind of interesting, and one of those useful conceptual papers. I hope to see more benchmarks that
A
are based on this. Through the magic of editing, I'm back. Thank you so much for listening to this week's episode of Last Week in AI. You can find the articles we discussed here today, and subscribe to our weekly newsletter with articles we did not cover, at lastweekin.ai. Subscribe wherever you listen to podcasts, share and comment; we would appreciate that. Review us on Apple Podcasts, that probably helps us. But more than anything, please do keep tuning in and listening.
B
Break down. Ahead lies upon data driven dreams they just don't stop Every brink every code on the electric ship Cipher smitten or machine learning marvels Coding kings teachers unfolding
A
see what it brings.
Episode Title: GPT-5.5 Instant, Grok 4.3, OpenAI vs Musk
Hosts: Andrei Karenkov (Astrocade), Jeremy Harris (Gladstone AI)
Date Recorded: May 8, 2026
Release Date: May 11, 2026
Theme: Weekly roundup of AI news, significant model launches, industry shakeups, and key policy and research updates.
This week’s episode offers a characteristically diverse and lively round-up of the most interesting developments in artificial intelligence, with a particular focus on several major LLM model releases (OpenAI GPT-5.5 Instant, xAI Grok 4.3, Mistral Medium 3.5), the ongoing OpenAI vs Elon Musk legal trial, striking new business moves (Anthropic & SpaceX, Anthropic’s valuation surge), advances in agent orchestration, and a slew of notable research in interpretability and agent architectures. The conversation weaves technical insight, candid critique, and industry context, with a few spicy personal anecdotes along the way.
Memorable Quote:
"The cyber threat aspect becomes more acute...not surprising that if you’re pushing that dimension, the cyber threat aspect becomes more acute than maybe the bio or chemical capabilities." – Jeremy (17:45)
This episode delivers a thorough and opinionated tour of the latest AI model breakthroughs, a revealing legal drama at OpenAI, transformative infrastructure deals, and notable research advances. The hosts grapple with both technical and societal implications, weighing optimism, caution, and the lessons of recent history. For anyone tracking the cutting edge — or drama — of artificial intelligence, this is a must-listen (or, now, a must-read).
Remember:
“Reality will continue to mug people; people will continue to be mugged by reality into changing their views.” – Jeremy (89:38)