Last Week in AI Podcast – Episode #221 – October 7, 2025
Main Hosts: Andrey Kurenkov & Guest Michelle Lee
Theme: Major developments in AI tools, business, robotics, open source reasoning, and policy/safety—including OpenAI Codex, Gemini in Chrome, K2-Think, California’s SB 53, and more.
Overview
This week’s episode dives into some of the most impactful AI news, from OpenAI’s strategic pivots and new coding tools to the acceleration of robotics startups, the real-world impact of AI in browsers, developments in reasoning models, and the latest AI policy updates. Guest host Michelle Lee, founder of Medra, joins Andrey to dissect what these developments mean for the field and the wider world.
Tools & Apps: OpenAI Codex, Gemini in Chrome, Claude, and More
OpenAI Codex Upgraded with GPT-5
- [02:45] OpenAI has released GPT-5 Codex, designed specifically for coding, to keep pace with competitors such as Anthropic’s Claude Code.
- Available through the CLI and IDE, and now powers the web agent.
- Andrey: “If you go look on Reddit or Twitter, there’s a bit of sentiment of like, oh, I’ve switched to Codex now, it’s great, I’m trying it out. I’m not a convert yet, but I might become one.” [03:08]
- Michelle points out that, among devs, Anthropic’s Claude Code is still preferred, but this upgrade could shake things up.
Gemini Integrated into Chrome
- [03:55] Google has finally embedded AI directly into Chrome with a Gemini assistant button—catching up to competitors’ agents.
- Allows asking questions about the current tab; potential for broader agentic tasks.
- Michelle: “This definitely gives them a huge competitive advantage that they own both the browser…and also now can integrate that directly with their AI.” [04:53]
Anthropic Claude Adds Powerful File Creation
- [06:14] Claude now creates spreadsheets and PDFs, stepping up competitive pressure in productivity integration.
Luma’s “Reasoning” Video Model
- [07:08] Luma released a new model claiming enhanced reasoning for complex video clips.
- Michelle questions the use of “reasoning” as a differentiator: “I highly doubt that all the other video models don’t use reasoning at all.” [07:58]
Applications & Business: Shifting Alliances & Big Robotics Bets
OpenAI–Microsoft: Relationship Redefined
- [08:33] After months of negotiation, OpenAI has Microsoft’s blessing for its for-profit transition via a Memorandum of Understanding.
- Andrey: “If they don’t complete this transition, they’re in real trouble.” [09:58]
- Simultaneously, Microsoft begins integrating Anthropic models and diversifying partnerships; OpenAI partners with Oracle for cloud services.
Robotics Funding Frenzy
- Figure AI: $1B in Series C funding at $39B post-money, despite being pre-revenue. Shows continued VC faith in humanoid robotics R&D. [12:22]
- Michelle: “It’s very exciting to see…more and more efforts into hardware which has been very much a bottleneck right now in robotics.” [12:55]
- Unitree (China): Plans IPO at $7B valuation; notable for affordable humanoid robots (~$16k), pushing accessibility in research labs. [13:52]
- Michelle: “Most labs…can very easily afford their own humanoid. Which again was just not true several years ago.” [14:48]
Robotaxi: Race Intensifies
- Tesla: Begins testing in Nevada but faces scrutiny after accidents in Austin; NHTSA investigates crash reporting. [15:31]
- Michelle stresses the importance of transparency, referencing Cruise’s PR debacle: “I hope Tesla is able to be honest and report the accidents correctly so that they can continue building that trust.” [16:47]
- Zoox (Amazon): Launches US public Robotaxi rides in Las Vegas with highly distinctive shuttles. [17:49]
AI Dev Tool Growth
- Replit: $3B valuation, leap from $2.8M to $150M ARR in one year. [20:24]
- Perplexity: Now valued at $20B; ARR jumped from $150M to $200M in a month. [21:15]
- Michelle: “We’re definitely seeing a lot of AI tools now being able to go to 100, 150, 200 mil revenue in a very short amount of time.” [20:51]
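To put the ARR figures above on a common scale, here is a minimal arithmetic sketch. It uses only the numbers quoted in these notes; the function name `growth_multiple` is illustrative and not something from the episode.

```python
def growth_multiple(start: float, end: float) -> float:
    """Return how many times larger `end` is than `start`."""
    return end / start

# ARR figures in millions of USD, as quoted in the notes above.
replit = growth_multiple(2.8, 150)      # over roughly one year
perplexity = growth_multiple(150, 200)  # over roughly one month

print(f"Replit: {replit:.1f}x in a year")
print(f"Perplexity: {(perplexity - 1) * 100:.0f}% in a month")
```

This works out to about a 53.6x increase for Replit over the year and about a 33% increase for Perplexity over the month.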
Open Source & Benchmarks: Rise of Efficient Reasoning
K2-Think: Parameter-Efficient Reasoning Model
- [22:08] Open-source model from the Institute of Foundation Models (UAE). Performs well on math/reasoning benchmarks at just 32B parameters, relying on prompt engineering and inference-time techniques rather than scale.
- Michelle: “It’s not just more parameters, it’s actually thinking about…prompt restructuring…surfacing the best ideas come up as now one of the best ways to improve reasoning.” [23:43]
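These notes do not spell out how "surfacing the best ideas" is done. One common inference-time technique that fits the description is best-of-N selection: sample several candidate answers and keep the one a scorer rates highest. The sketch below is an assumption for illustration, not K2-Think's actual pipeline; `best_of_n`, `samples`, and `toy_score` are hypothetical names, and the toy scorer stands in for a real verifier or reward model.

```python
from typing import Callable, Sequence

def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the highest-scoring candidate answer."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)

# Toy stand-ins (hypothetical): in practice the candidates would be sampled
# from a model and the scorer would be a verifier or reward model.
samples = ["x = 3", "x = 4", "x = 5"]

def toy_score(answer: str) -> float:
    # Scores how closely the proposed x satisfies 2x + 1 = 9.
    x = int(answer.split("=")[1])
    return -abs(2 * x + 1 - 9)

chosen = best_of_n(samples, toy_score)
print(chosen)
```

The appeal of this approach is that the model's parameter count stays fixed while answer quality improves with extra sampling and a good scorer.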
LocoBench: Realistic Long-Context Code Benchmark
- [24:32] New benchmark assesses software engineering quality across long-context tasks (up to 1M+ tokens), simulating real-world complexity.
- Measures architectural coherence, dependency traversal, solution elegance, etc.
- Michelle: “That’s always so important because when the benchmarks aren’t realistic, we end up building what we can measure.” [26:37]
Research & Advancements: Building Generalist and Embodied Foundation Models
Self-Improving Embodied Foundation Models
- [28:18] DeepMind & Generalist team up on robotics models that keep improving from real-world feedback (not just simulation).
- Michelle: “You want to do imitation learning and behavior cloning…now you can just predict the reward function and detect the success and use that to supervise.” [30:23]
Physics Foundation Models
- [31:47] Google’s GPhyT learns from large, diverse simulation data (1.8TB) to solve a range of physics problems.
- Michelle: Curious if real-world generalization holds, since most training is on simulated data. [33:44]
Embodied Navigation Foundation Model (NAVFOM)
- [33:55] Generalizes across wheels, legs/humanoids, quadrupeds—and tasks—but current 64.4% performance shows real-world use is not there yet.
- Michelle: “I have to be honest, I feel like this is just like publishing for the sake of publishing a foundation model.” [36:07]
Policy & Safety: California Legislation and AI Copyright Wars
California’s AI Safety Bill SB 53
- [37:04] Anthropic endorses the bill, which targets advanced model risks (bio, cyber). Seen as a more balanced approach after earlier regulatory attempts were vetoed.
- Andrey reads from the blog: “The question isn’t whether we have AI governance, it’s whether we develop it thoughtfully today or reactively tomorrow. SB53 offers a solid path toward the former.” [39:35]
- Michelle: Surprised that Anthropic pushes regulatory approaches that increase compliance requirements for the sector. [39:11]
Copyright Lawsuits Escalate
- Warner Bros v. Midjourney: Over alleged removal of IP filters and generation of protected images. Disney and Universal also join lawsuits. [41:10]
- Michelle: “It does seem like Midjourney…doesn’t really have as many safeguards against intellectual property violations.” [41:10]
- Rolling Stone v. Google: Over AI-generated search summaries (“AI Overviews”) cannibalizing publisher traffic and revenue.
- Michelle: “If you actually ask Gemini if AI overviews result in less traffic, it…says yes, it does actually reduce click-throughs.” [43:20]
- Andrey: “Publishers are in a tough spot here…Google is cannibalizing on that business, on the clicks.” [43:37]
Notable Quotes & Moments
- On Coding Models: “Claude Code is still the best tool out there.” (Michelle) [03:20]
- On Humanoid Robotics: “Most labs…can very easily afford their own humanoid. Which again was just not true several years ago.” (Michelle) [14:48]
- On Self-Improving Robots: “This is almost like simplified reinforcement learning without needing to do RL fully.” (Michelle) [30:23]
- On Legislation: “The question isn’t whether we have AI governance, it’s whether we develop it thoughtfully today or reactively tomorrow. SB53 offers a solid path toward the former.” (Andrey, quoting Anthropic blog) [39:35]
Episode Highlights: Timestamps
- [02:45] OpenAI Codex with GPT-5
- [03:55] Gemini in Chrome
- [06:14] Claude’s new file features
- [07:08] Luma’s reasoning video model
- [08:33] OpenAI-Microsoft partnership update
- [12:22] Figure AI funding
- [13:52] Unitree going public
- [15:31] Tesla Robotaxi incidents
- [17:49] Amazon’s Zoox launches in Las Vegas
- [20:24] Replit’s $3B valuation
- [22:08] K2-Think reasoning model
- [24:32] LocoBench benchmark released
- [28:18] Self-improving embodied models (DeepMind)
- [31:47] Physics foundation model
- [33:55] Navigation foundation model/NAVFOM
- [37:04] California’s SB 53 and AI safety
- [41:10] Warner Bros lawsuit v. Midjourney
- [43:20] Rolling Stone lawsuit v. Google
Tone & Takeaways
This episode, marked by rapid-fire headlines and lucid commentary, highlights how the AI sector is simultaneously consolidating (giant models, big fundraises) and fragmenting (browser AI, robotics hardware, regulatory strategies). There’s optimism around hardware progress and applied tools, but a cautious note on business friction, legal battles, and emerging safety regulation. If you want to keep your finger on the pulse of AI’s fast-moving story, this episode offers concise, candid, and often humorous insight from two field insiders.
