Last Week in AI – Episode #221 Summary

Date: September 23, 2025
Hosts: Andrey "Andrej" Karpathy & Michelle Lee
Theme: Weekly round-up of the most impactful AI news, including new tools, product launches, key business maneuvers, research breakthroughs, robotic advancements, and AI policy updates.

Episode Overview

This episode covers a vibrant week in AI, from major updates in AI tools (like OpenAI Codex and Google Gemini), developments in both humanoid robotics and Robotaxi deployments, hot business news of fundraising and strategic deals, advances in open-source models and benchmarks, and policy and legal developments — including California’s newest AI bill and high-profile copyright lawsuits. Michelle Lee joins as a guest host while Jeremy is away.

Key Discussion Sections & Insights

1. AI Tools & Applications ([03:20])

OpenAI Codex with GPT-5

Upgrade Release: OpenAI introduces GPT-5 Codex, tailored for coding and now available in CLI, IDE, and their WebAgent.
Competitive Pressure: OpenAI is playing catch-up with Anthropic’s Claude Code (widely considered best in class).
Developer Reactions: Early users are switching over. "There's a bit of sentiment of like, oh, I've switched to Codex now, it's great, I'm trying it out. I'm not a convert yet, but I might become one." – Andrey [03:41]

Google Gemini Integrated into Chrome ([04:10])

Chrome now features a native "Gemini" AI button, following moves by Perplexity and Anthropic.
Discusses strategic implications for Google (owning both browser and AI) and potential regulatory sensitivity.

Anthropic Claude’s New File Capabilities ([06:14])

Claude can now generate spreadsheets and PDFs, moving into productivity/task automation territory.
"It sounds like we might be running like a little cloud code agent within it that if you upload a file it can do agentic stuff to it." – Andrey [06:46]

Luma’s Ray-Free Model ([07:23])

Launch of an "AI reasoning video model" promising more complex action sequences in generated video.
Debate over how much “reasoning” is real technical innovation vs. marketing.

2. Business Battles & Strategic Alliances ([08:25])

OpenAI–Microsoft–Anthropic Shuffle

OpenAI for-profit transition: OpenAI has secured a memorandum of understanding with Microsoft as it shifts toward a for-profit structure.
This “under-the-hood” move resolves months of tension but means Microsoft is diversifying, integrating Anthropic’s Claude into Office365 ([10:48]), and OpenAI is working with Oracle and others to reduce dependency on Microsoft.
"OpenAI is a never-ending fountain of business drama..." – Andrey [11:36]

Major AI Fundraising News

Figure AI: $1B+ committed capital, $39B post-money valuation – a sign of enormous venture confidence in humanoid robotics despite no commercial product yet ([12:30]).
Unitree Robotics (China): Planning IPO at $7B+ valuation. Their affordable $16K humanoids make cutting-edge hardware accessible to universities, propelling industry-wide hardware advancements ([13:52]).
Replit: Jumps from $2.8M to $150M ARR in a year on the back of integrating coding AIs; now valued at $3B ([19:24]).
Perplexity: $20B valuation after new $200M round, expanding into agents/browsers, with explosive ARR growth ([21:15]).

3. Robotics & Robotaxi Developments ([15:32])

Humanoid Robotics

The field is heating up with more startups, greater funding, and falling hardware costs.
"Most labs now in the US and university can very easily afford their own humanoid, which again was just not true several years ago." – Michelle [14:48]

Robotaxi Race

Tesla: Begins Robotaxi tests in Nevada, but reports of three early accidents and regulatory scrutiny (possible misreporting of crashes).
- Discussion on transparency: "I hope Tesla is able to be honest and report the accidents correctly so that they can continue building that trust with the government officials." – Michelle [16:47]
Waymo: Praised for reliability and “magical” user experience.
Zoox (Amazon): Public RoboTaxi service launches on the Las Vegas strip using their unique inward-facing vehicle; future expansion planned ([18:10]).
- "It looks so cool because you're facing each other so you can actually have meetings while you're in the cars..." – Michelle [18:36]

4. Open Source & Software Engineering Benchmarks ([22:08])

K2-Think: Efficient Reasoning Model ([22:08])

Developed by Mohamed Bin Zayed University (UAE), K2-Think is a 32B parameter open-source model beating established models in reasoning benchmarks despite fewer parameters.
Their edge: sophisticated techniques like “plan-before-you-think” prompt restructuring and agent-like prompt decompositions.
- "We're just seeing this like time to compute, rethinking prompts. Really, really trying to think of it almost as having different agents be able to think through different prompts and surfacing the best ideas..." – Michelle [23:43]

Locobench: Realistic Long-Context Coding Benchmark ([24:32])

New benchmark addresses the inadequacies of existing software engineering datasets by including much larger code contexts and more realistic multi-file engineering challenges.
Introduces 8 granular “Software Engineering Excellence” metrics (e.g., Architectural Coherence, Innovation Score).
- "It's great if we can start measuring how these models can work with longer and longer contexts." – Michelle [26:37]

5. Research & Advances ([28:17])

Self-Improving Embodied Foundation Models (Google DeepMind + Generalist) ([28:17])

Robots that continually self-improve by online rollouts: the model gets real-world trial results, self-evaluates, and updates.
Enables scalable data collection in robotics.
- "Having the self improvement basically is almost like simplified reinforcement learning by without needing to do reinforcement learning fully, where you only get supervision from the rewards itself..." – Michelle [30:23]

Physics Foundation Models ([31:47])

DeepMind introduces G phi T—a general-purpose physics simulator model trained on 1.8TB simulation data, performing across thermal flow, obstacle flow, etc.
Question remains whether strong generalization to “real” (not simulated) data will occur.
- "I am curious if they can generalize to real data." – Michelle [33:44]

Embodied Navigation Foundation Model ([34:48])

Cross-task, cross-embodiment navigation for diverse robots.
Hosts skeptical of real-world impact due to current low performance on benchmarks.

6. Policy & Safety ([38:00])

California's AI Safety Bill, SB 53 ([38:00])

Anthropic formally endorses the bill, marking a rare alignment between an AI lab and government regulation over “frontier” AI with potential for substantial harm.
- Pivotal quote: "The question isn't whether we have AI governance, it's whether we develop it thoughtfully today or reactively tomorrow. SB53 offers a solid path toward the former." – Quoted from Anthropic’s blog post [39:35]

Copyright Lawsuits: Generative AI vs. IP Holders

Warner Bros. sues Midjourney over alleged unsafe-guarded image generation related to major IP.
- "It's also interesting that all these companies are now kind of jumping in and dogpiling and going against Midjourney." – Michelle [41:10]
Rolling Stone sues Google over AI Overviews cannibalizing publisher traffic, a growing concern as AI summaries displace pageviews.
- Fun moment: Michelle points out Google’s own Gemini acknowledges overviews hurt traffic — contradicting Google’s stance. "If you actually ask Gemini if AI overviews result in less traffic... it actually contradicts Google's public stance and says yes, it does actually reduce traffic." – Michelle [43:20]

Notable Quotes & Moments

"OpenAI is a never ending fountain of business drama..." – Andrey [11:36]
"Most labs now in the US... can very easily afford their own humanoid, which again was just not true several years ago." – Michelle [14:48]
"Having the self improvement...is almost like simplified reinforcement learning..." – Michelle [30:23]
"The question isn't whether we have AI governance, it's whether we develop it thoughtfully today or reactively tomorrow. SB53 offers a solid path toward the former." – Anthropic (quoted by Andrey) [39:35]
"If you actually ask Gemini if AI overviews result in less traffic... it actually contradicts Google's public stance and says yes, it does actually reduce traffic." – Michelle [43:20]

Section Timestamps

[03:20] AI tools: OpenAI Codex, Google Gemini, Anthropic Claude updates, Luma video model
[08:25] OpenAI–Microsoft–Anthropic business developments
[10:48] Microsoft integrating Anthropic Claude into Office365
[12:30] Figure AI, Unitree humanoid robotics
[15:32] Robotaxi news: Tesla, Waymo, Zoox
[19:24] Fundraising: Replit, Perplexity
[22:08] Open source: K2-Think, Locobench benchmarks
[28:17] Research: Foundation models (robotics, physics, navigation)
[38:00] Policy/Safety: SB53, copyright lawsuits

Tone and Flow

The conversation is lively, slightly irreverent, and accessible while maintaining technical rigor. The co-hosts blend personal anecdotes (e.g., about using Waymo or robot arms in labs) with dry, sometimes wry commentary on AI industry drama and rapid progress. Technical explanations are interspersed with business and policy implications, always in plain language.

For Listeners

This episode is a rapid, balanced ride through the week’s AI news—capturing not just which headlines to know, but the why and the potential “what's next.” From tool launches and robotics hardware to open-source innovations, big-money deals, and emerging policy fights, #221 offers a deep-yet-digestible look at the evolving world of AI.

Last Week in AI – Episode #221 Summary

Episode Overview

Key Discussion Sections & Insights

1. AI Tools & Applications ([03:20])

OpenAI Codex with GPT-5

Upgrade Release: OpenAI introduces GPT-5 Codex, tailored for coding and now available in CLI, IDE, and their WebAgent.
Competitive Pressure: OpenAI is playing catch-up with Anthropic’s Claude Code (widely considered best in class).
Developer Reactions: Early users are switching over. "There's a bit of sentiment of like, oh, I've switched to Codex now, it's great, I'm trying it out. I'm not a convert yet, but I might become one." – Andrey [03:41]

Google Gemini Integrated into Chrome ([04:10])

Chrome now features a native "Gemini" AI button, following moves by Perplexity and Anthropic.
Discusses strategic implications for Google (owning both browser and AI) and potential regulatory sensitivity.

Anthropic Claude’s New File Capabilities ([06:14])

Claude can now generate spreadsheets and PDFs, moving into productivity/task automation territory.
"It sounds like we might be running like a little cloud code agent within it that if you upload a file it can do agentic stuff to it." – Andrey [06:46]

Luma’s Ray-Free Model ([07:23])

Launch of an "AI reasoning video model" promising more complex action sequences in generated video.
Debate over how much “reasoning” is real technical innovation vs. marketing.

2. Business Battles & Strategic Alliances ([08:25])

OpenAI–Microsoft–Anthropic Shuffle

OpenAI for-profit transition: OpenAI has secured a memorandum of understanding with Microsoft as it shifts toward a for-profit structure.
This “under-the-hood” move resolves months of tension but means Microsoft is diversifying, integrating Anthropic’s Claude into Office365 ([10:48]), and OpenAI is working with Oracle and others to reduce dependency on Microsoft.
"OpenAI is a never-ending fountain of business drama..." – Andrey [11:36]

Major AI Fundraising News

Figure AI: $1B+ committed capital, $39B post-money valuation – a sign of enormous venture confidence in humanoid robotics despite no commercial product yet ([12:30]).
Unitree Robotics (China): Planning IPO at $7B+ valuation. Their affordable $16K humanoids make cutting-edge hardware accessible to universities, propelling industry-wide hardware advancements ([13:52]).
Replit: Jumps from $2.8M to $150M ARR in a year on the back of integrating coding AIs; now valued at $3B ([19:24]).
Perplexity: $20B valuation after new $200M round, expanding into agents/browsers, with explosive ARR growth ([21:15]).

3. Robotics & Robotaxi Developments ([15:32])

Humanoid Robotics

The field is heating up with more startups, greater funding, and falling hardware costs.
"Most labs now in the US and university can very easily afford their own humanoid, which again was just not true several years ago." – Michelle [14:48]

Robotaxi Race

Tesla: Begins Robotaxi tests in Nevada, but reports of three early accidents and regulatory scrutiny (possible misreporting of crashes).
- Discussion on transparency: "I hope Tesla is able to be honest and report the accidents correctly so that they can continue building that trust with the government officials." – Michelle [16:47]
Waymo: Praised for reliability and “magical” user experience.
Zoox (Amazon): Public RoboTaxi service launches on the Las Vegas strip using their unique inward-facing vehicle; future expansion planned ([18:10]).
- "It looks so cool because you're facing each other so you can actually have meetings while you're in the cars..." – Michelle [18:36]

4. Open Source & Software Engineering Benchmarks ([22:08])

K2-Think: Efficient Reasoning Model ([22:08])

Developed by Mohamed Bin Zayed University (UAE), K2-Think is a 32B parameter open-source model beating established models in reasoning benchmarks despite fewer parameters.
Their edge: sophisticated techniques like “plan-before-you-think” prompt restructuring and agent-like prompt decompositions.
- "We're just seeing this like time to compute, rethinking prompts. Really, really trying to think of it almost as having different agents be able to think through different prompts and surfacing the best ideas..." – Michelle [23:43]

Locobench: Realistic Long-Context Coding Benchmark ([24:32])

New benchmark addresses the inadequacies of existing software engineering datasets by including much larger code contexts and more realistic multi-file engineering challenges.
Introduces 8 granular “Software Engineering Excellence” metrics (e.g., Architectural Coherence, Innovation Score).
- "It's great if we can start measuring how these models can work with longer and longer contexts." – Michelle [26:37]

5. Research & Advances ([28:17])

Self-Improving Embodied Foundation Models (Google DeepMind + Generalist) ([28:17])

Robots that continually self-improve by online rollouts: the model gets real-world trial results, self-evaluates, and updates.
Enables scalable data collection in robotics.
- "Having the self improvement basically is almost like simplified reinforcement learning by without needing to do reinforcement learning fully, where you only get supervision from the rewards itself..." – Michelle [30:23]

Physics Foundation Models ([31:47])

DeepMind introduces G phi T—a general-purpose physics simulator model trained on 1.8TB simulation data, performing across thermal flow, obstacle flow, etc.
Question remains whether strong generalization to “real” (not simulated) data will occur.
- "I am curious if they can generalize to real data." – Michelle [33:44]

Embodied Navigation Foundation Model ([34:48])

Cross-task, cross-embodiment navigation for diverse robots.
Hosts skeptical of real-world impact due to current low performance on benchmarks.

6. Policy & Safety ([38:00])

California's AI Safety Bill, SB 53 ([38:00])

Anthropic formally endorses the bill, marking a rare alignment between an AI lab and government regulation over “frontier” AI with potential for substantial harm.
- Pivotal quote: "The question isn't whether we have AI governance, it's whether we develop it thoughtfully today or reactively tomorrow. SB53 offers a solid path toward the former." – Quoted from Anthropic’s blog post [39:35]

Copyright Lawsuits: Generative AI vs. IP Holders

Warner Bros. sues Midjourney over alleged unsafe-guarded image generation related to major IP.
- "It's also interesting that all these companies are now kind of jumping in and dogpiling and going against Midjourney." – Michelle [41:10]
Rolling Stone sues Google over AI Overviews cannibalizing publisher traffic, a growing concern as AI summaries displace pageviews.
- Fun moment: Michelle points out Google’s own Gemini acknowledges overviews hurt traffic — contradicting Google’s stance. "If you actually ask Gemini if AI overviews result in less traffic... it actually contradicts Google's public stance and says yes, it does actually reduce traffic." – Michelle [43:20]

Notable Quotes & Moments

"OpenAI is a never ending fountain of business drama..." – Andrey [11:36]
"Most labs now in the US... can very easily afford their own humanoid, which again was just not true several years ago." – Michelle [14:48]
"Having the self improvement...is almost like simplified reinforcement learning..." – Michelle [30:23]
"The question isn't whether we have AI governance, it's whether we develop it thoughtfully today or reactively tomorrow. SB53 offers a solid path toward the former." – Anthropic (quoted by Andrey) [39:35]
"If you actually ask Gemini if AI overviews result in less traffic... it actually contradicts Google's public stance and says yes, it does actually reduce traffic." – Michelle [43:20]

Section Timestamps

[03:20] AI tools: OpenAI Codex, Google Gemini, Anthropic Claude updates, Luma video model
[08:25] OpenAI–Microsoft–Anthropic business developments
[10:48] Microsoft integrating Anthropic Claude into Office365
[12:30] Figure AI, Unitree humanoid robotics
[15:32] Robotaxi news: Tesla, Waymo, Zoox
[19:24] Fundraising: Replit, Perplexity
[22:08] Open source: K2-Think, Locobench benchmarks
[28:17] Research: Foundation models (robotics, physics, navigation)
[38:00] Policy/Safety: SB53, copyright lawsuits

#221 - OpenAI Codex, Gemini in Chome, K2-Think, SB 53

Summary

Last Week in AI – Episode #221 Summary

Episode Overview

Key Discussion Sections & Insights

1. AI Tools & Applications ([03:20])

OpenAI Codex with GPT-5

Google Gemini Integrated into Chrome ([04:10])

Anthropic Claude’s New File Capabilities ([06:14])

Luma’s Ray-Free Model ([07:23])

2. Business Battles & Strategic Alliances ([08:25])

OpenAI–Microsoft–Anthropic Shuffle

Major AI Fundraising News

3. Robotics & Robotaxi Developments ([15:32])

Humanoid Robotics

Robotaxi Race

4. Open Source & Software Engineering Benchmarks ([22:08])

K2-Think: Efficient Reasoning Model ([22:08])

Locobench: Realistic Long-Context Coding Benchmark ([24:32])

5. Research & Advances ([28:17])

Self-Improving Embodied Foundation Models (Google DeepMind + Generalist) ([28:17])

Physics Foundation Models ([31:47])

Embodied Navigation Foundation Model ([34:48])

6. Policy & Safety ([38:00])

California's AI Safety Bill, SB 53 ([38:00])

Copyright Lawsuits: Generative AI vs. IP Holders

Notable Quotes & Moments

Section Timestamps

Tone and Flow

For Listeners

Summary

Last Week in AI – Episode #221 Summary

Episode Overview

Key Discussion Sections & Insights

1. AI Tools & Applications ([03:20])

OpenAI Codex with GPT-5

Google Gemini Integrated into Chrome ([04:10])

Anthropic Claude’s New File Capabilities ([06:14])

Luma’s Ray-Free Model ([07:23])

2. Business Battles & Strategic Alliances ([08:25])

OpenAI–Microsoft–Anthropic Shuffle

Major AI Fundraising News

3. Robotics & Robotaxi Developments ([15:32])

Humanoid Robotics

Robotaxi Race

4. Open Source & Software Engineering Benchmarks ([22:08])

K2-Think: Efficient Reasoning Model ([22:08])

Locobench: Realistic Long-Context Coding Benchmark ([24:32])

5. Research & Advances ([28:17])

Self-Improving Embodied Foundation Models (Google DeepMind + Generalist) ([28:17])

Physics Foundation Models ([31:47])

Embodied Navigation Foundation Model ([34:48])

6. Policy & Safety ([38:00])

California's AI Safety Bill, SB 53 ([38:00])

Copyright Lawsuits: Generative AI vs. IP Holders

Notable Quotes & Moments

Section Timestamps

Tone and Flow

For Listeners