Last Week in AI – Episode #209
Date: May 19, 2025
Hosts: Andrey Kurenkov & Jeremie Harris
Main Theme:
This episode breaks down two weeks of significant AI news and developments, focusing on recent moves by OpenAI to retain its nonprofit structure, major developments in hardware (chips and suppliers), new tools and models in AI, cutting-edge research in reasoning and scientific discovery, and pivotal policy changes, especially in U.S. export controls. The hosts also discuss the field’s evolving dynamics among AI labs, investors, international policies, and the latest open-source releases.
1. OpenAI's Nonprofit Saga and Corporate Restructuring
Timestamps: [01:28] to [17:39]
Key Points:
OpenAI Will Not Transition to Full For-Profit Structure:
- After litigation (notably initiated by Elon Musk) and discussions with the Attorneys General of Delaware and California, OpenAI has decided to keep its nonprofit entity at the top of its structure. The for-profit subsidiary will become a Delaware Public Benefit Corporation (PBC), similar to Anthropic and xAI.
- Quote:
“The significance of that attorneys general piece is actually quite significant. ... It’s sort of sketchy to take a nonprofit, raise a crap ton of money ... and then having benefited from their research ... now turning yourself around and becoming a for profit.”
— Andrey Kurenkov [06:01]
Implications and Remaining Controversies:
- A Delaware PBC “gives you more freedom”: it allows the company to weigh impacts beyond shareholder returns but does not require it to.
- Concerns remain about the nonprofit board’s actual oversight given recent history (Sam Altman’s firing and return).
- “Very, very unclear whether the board meaningfully can exert control, whether ... Sam has undue influence over them or whether they’re getting access to the information they need.”
— Kurenkov [09:25]
Investor Dynamics:
- Microsoft, a major early investor, is renegotiating its equity stake and its long-term access to OpenAI technology beyond 2030, while SoftBank has become the largest investor by dollar amount.
- The tangled relationship among Microsoft, OpenAI, and SoftBank could shape future restructurings and IP agreements, especially if OpenAI moves toward an IPO or similar options.
- “There’s a real question on how much ownership will Microsoft get. ... It’s a lot of sand in the gears right now for OpenAI.”
— Harris [13:04]
2. Hardware & Chips: TSMC, Nvidia, and Global Supply
Timestamps: [18:05] to [26:39]
Key Points:
TSMC’s 2nm Process Exceeds Expectations:
- TSMC’s cutting-edge 2nm node is seeing unprecedented demand from Apple, Nvidia, and AMD, and yields are ramping up quickly.
- AI (Nvidia) is now directly competing with iPhone production for TSMC’s advanced manufacturing slots.
- “There’s so much money to be made on the AI ... that money is now ... competing successfully with the iPhone to get capacity at the leading node at TSMC.”
— Kurenkov [21:19]
Nvidia Setting Up Overseas HQ in Taiwan:
- Strengthening its partnership with TSMC, Nvidia plans to place its overseas HQ in Taiwan, weighing the benefit of closer chip access against the risks posed by China-Taiwan tensions.
CoreWeave Raising Funds Amidst IPO Challenges:
- CoreWeave, a major AI cloud compute provider, is turning to debt financing after its IPO raised less than expected amid trade and tariff uncertainty, a sign of the financial risk and volatility facing new AI compute providers.
3. Tools, Apps, & AI Model Updates
Timestamps: [26:39] to [47:42]
Discussion Highlights:
Grok Chatbot Controversy on X (formerly Twitter):
- Grok, xAI’s chatbot, began inserting an unrelated, politically charged narrative ('white genocide' in South Africa) into its responses; xAI traced this to an unauthorized modification of its system prompt.
- xAI has promised to start publishing Grok’s system prompts for transparency.
- Quote:
“One thing I’ve seen called out too is this idea that ... more transparency on the system prompt seems like a really good thing.”
— Kurenkov [30:24]
Figma Launches AI-Powered Design & Prototyping Tools:
- Figma Sites, Make, and Buzz let users generate full websites, prototypes, and marketing assets with AI. The launch reflects a broader trend of software providers expanding up and down the stack.
- “Every company [is] becoming the everything company ... Figma’s being essentially forced to move into deeper part of the stack ... as AI capabilities make it so much easier.”
— Kurenkov [34:45]
Google Brings Gemini to Android Auto & Upgrades Gemini 2.5 Pro:
- A move toward always-on, voice-based AI assistants in cars, plus a Gemini 2.5 Pro update with improved coding ability.
The Rise of ‘Vibe Coding’:
- Coding and app prototyping by iteratively prompting LLMs, even for non-programmers.
- “If you’ve never done it yourself, definitely give it a shot ... basically keep telling the model: no, fix this, fix this, do it better ... it can actually work really well.”
— Kurenkov [43:25]
Hugging Face Launches Open Agentic AI Tool:
- Open Computer Agent is a free, cloud-hosted tool for browser and computer automation, using open-source language models.
4. Projects & Open Source
Timestamps: [47:42] to [65:41]
Key Releases:
Stability AI’s Stable Audio Open Small:
- A text-to-audio model small enough to run on mobile devices. Vocal and song quality are still limited, but it illustrates progress in local, on-device generative audio.
F Lite: Open Image Generator Trained Only on Licensed Data:
- Freepik and fal.ai released a 10B-parameter model, offering an alternative to models trained on data of questionable copyright status.
AM-Thinking-v1: Reasoning Model from China’s “Beike” (a real estate giant):
- Reportedly outperforms other models at the 32B-parameter scale, highlighting China’s rapid progress and the unexpected players entering the AGI race.
BLIP3-o: Unified Multimodal Model (Images + Text, Both Directions):
- Fully open source, with code, model weights, and training data all released; it can both take in and generate images and text, and is likely the top open-source model of its kind.
5. Research & Scientific Advances
Timestamps: [65:41] to [88:00]
Key Papers & Insights:
DeepMind’s AlphaEvolve: Automated Algorithm & Code Discovery
- Gemini-powered agent that evolves entire code files or algorithms for scientific problems, building on DeepMind’s earlier FunSearch system.
- Tackles problems such as autocorrelation inequalities, uncertainty inequalities, and packing problems, yielding real improvements; it even found a faster kernel that cut Gemini’s overall training time by about 1%.
- Commentary:
“This is a way that DeepMind does tend to kind of reach beyond the immediate, the ostensible frontier of what just base models and agentic models can do.”
— Kurenkov [68:31]
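To make the evolve-and-evaluate loop concrete, here is a minimal sketch in Python. It is a toy, not AlphaEvolve itself: the Gemini-written code edits are replaced by a random mutation (`propose_mutation`), and the "program" being evolved is just a list of polynomial coefficients scored by a made-up objective (`evaluate`). All names and the target polynomial are hypothetical. What it does show is the core structure: pick a promising parent, propose a modification, score it with an automated evaluator, and keep it only if it beats the weakest member of the population.

```python
import random

def evaluate(program):
    """Toy automated evaluator (hypothetical stand-in for a problem-specific scorer).
    A 'program' is a list of integer coefficients; it scores higher the closer its
    polynomial is to a hidden target polynomial on a few sample points."""
    target = [3, -2, 1]  # hypothetical ground truth: 3 - 2x + x^2

    def poly(coeffs, x):
        return sum(c * x**i for i, c in enumerate(coeffs))

    error = sum((poly(program, x) - poly(target, x)) ** 2 for x in range(-3, 4))
    return -error  # higher is better

def propose_mutation(program, rng):
    """Stand-in for the LLM: instead of asking a model to rewrite code,
    nudge one randomly chosen coefficient up or down by 1."""
    child = list(program)
    i = rng.randrange(len(child))
    child[i] += rng.choice([-1, 1])
    return child

def evolve(generations=200, population_size=8, seed=0):
    rng = random.Random(seed)
    population = [[0, 0, 0] for _ in range(population_size)]
    for _ in range(generations):
        # Tournament selection: best of three random members becomes the parent.
        parent = max(rng.sample(population, 3), key=evaluate)
        child = propose_mutation(parent, rng)
        # Steady-state replacement: the child replaces the worst member if better.
        worst = min(range(len(population)), key=lambda j: evaluate(population[j]))
        if evaluate(child) > evaluate(population[worst]):
            population[worst] = child
    return max(population, key=evaluate)

best = evolve()
print(best, evaluate(best))
```

Because a child only ever replaces a worse member, the best score in the population never decreases; in a real system the evaluator is the expensive, problem-specific part and the mutation step is where the language model does its work.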
Absolute Zero: Reinforced Self-Play Reasoning with Zero Data:
- Models generate, solve, and verify their own challenges (deduction, abduction, induction) to self-train for reasoning without external data, a potential way around the ‘data wall’ facing RL-driven reasoning models.
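The propose-solve-verify idea can be sketched with a code executor as the verifier: a proposed task is a small program plus an input, and running the program yields ground truth for all three task types. This is a minimal illustration under our own assumptions, not the paper’s implementation. The `Task` class, the checker functions, and the learnability-style reward in `proposer_reward` (which pays only for tasks the solver gets right some of the time) are hypothetical, and a real system would sandbox execution rather than call `exec` directly.

```python
from dataclasses import dataclass

@dataclass
class Task:
    program_src: str  # source of a tiny function named `f`, written by the proposer
    input_value: int  # the input the proposer paired with it

def run(program_src, x):
    """Verifier: execute the program to obtain ground truth.
    Illustration only; never exec untrusted model output outside a sandbox."""
    namespace = {}
    exec(program_src, namespace)
    return namespace["f"](x)

def check_deduction(task, predicted_output):
    # Deduction: given the program and input, predict the output.
    return predicted_output == run(task.program_src, task.input_value)

def check_abduction(task, predicted_input):
    # Abduction: given the program and output, find any input that produces it.
    target = run(task.program_src, task.input_value)
    return run(task.program_src, predicted_input) == target

def check_induction(task, predicted_src, probe_inputs):
    # Induction: given input/output examples, write a program that reproduces them.
    return all(run(predicted_src, x) == run(task.program_src, x) for x in probe_inputs)

def proposer_reward(solve_rate):
    # Hypothetical reward: tasks that are always or never solved carry no
    # training signal, so only partially solvable tasks earn the proposer credit.
    if solve_rate in (0.0, 1.0):
        return 0.0
    return 1.0 - solve_rate

task = Task("def f(x):\n    return x * x + 1", 3)
print(check_deduction(task, 10))  # f(3) = 10
print(check_abduction(task, -3))  # f(-3) is also 10
print(check_induction(task, "def f(x):\n    return 1 + x ** 2", [0, 1, 2, 5]))
print(proposer_reward(0.25))
```

The key design point is that no external labels are needed: the executor decides correctness, so the same model can play both proposer and solver.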
Epoch AI Report: How Far Can Reasoning Models Scale?
- Examines how quickly the compute used to train reasoning (“thinking”) models is scaling, and predicts that the RL training stage will soon run up against hardware limits.
- Quote:
“We keep seeing in these compute scaling curves for inference time scaling that you really do want to scale it along with your pre-training compute budget.”
— Kurenkov [81:16]
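The argument behind the quote above is exponential-growth arithmetic: if the compute spent on reasoning-style RL grows much faster than frontier training compute overall, it catches up within a few years and from then on can only grow as fast as the frontier. The helper below solves for that catch-up time. It is our own back-of-the-envelope formulation, not Epoch AI’s model, and the numbers in the usage line are hypothetical placeholders rather than figures from the report.

```python
import math

def years_until_catch_up(ratio, rl_growth_per_year, frontier_growth_per_year):
    """Years until RL compute, currently `ratio` times smaller than frontier
    training compute, reaches the same size.

    With RL compute growing by `rl_growth_per_year`x and frontier compute by
    `frontier_growth_per_year`x each year, the gap shrinks by their quotient
    annually, so we solve (rl_growth / frontier_growth) ** t == ratio for t.
    """
    if ratio <= 1:
        return 0.0  # already caught up
    if rl_growth_per_year <= frontier_growth_per_year:
        return math.inf  # the gap never closes
    return math.log(ratio) / math.log(rl_growth_per_year / frontier_growth_per_year)

# Hypothetical inputs: RL compute 100x below the frontier, growing 10x/year
# while frontier compute grows 4x/year.
print(round(years_until_catch_up(100, 10, 4), 2))
```

The logarithm is what makes the ceiling arrive so quickly: even a 100x head start for pre-training compute is erased in a handful of years once RL compute is growing several times faster.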
OpenAI’s HealthBench: Large-Scale Health QA Benchmark
- An open-source benchmark of multi-turn health conversations for evaluating LLMs, built with input from 262 physicians.
- GPT-4.1 outperformed unassisted physicians on the benchmark, but caveats remain regarding real-world relevance and potential biases.
- “That is wild. That is a four times higher score than the unassisted physician. That honestly like kind of blows my mind a little bit.”
— Kurenkov [91:45]
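A score being “four times higher” only makes sense once you see how a rubric-based benchmark turns graded criteria into a number. The sketch below assumes each conversation comes with physician-written criteria worth positive or negative points, that a grader marks each criterion as met or not, and that the score is points earned over points achievable, clipped to [0, 1]. Treat it as an illustration of that general scheme under those assumptions; the `Criterion` class, `rubric_score`, and the example criteria and point values are invented, not HealthBench data.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    description: str
    points: int  # positive for desirable behavior, negative for harmful behavior

def rubric_score(criteria, met):
    """Score one response against its rubric (hypothetical scheme).

    `met[i]` records whether criterion i was satisfied (in a rubric-based
    benchmark this judgment typically comes from a grader model). The score is
    points earned divided by the maximum achievable positive points, clipped to
    the range [0, 1].
    """
    earned = sum(c.points for c, ok in zip(criteria, met) if ok)
    possible = sum(c.points for c in criteria if c.points > 0)
    return min(1.0, max(0.0, earned / possible))

# Invented example rubric for a single conversation.
rubric = [
    Criterion("Advises seeking emergency care for red-flag symptoms", 10),
    Criterion("Asks how long the symptoms have lasted", 5),
    Criterion("Recommends an unproven remedy", -8),
]
print(round(rubric_score(rubric, [True, False, True]), 3))  # (10 - 8) / 15
print(rubric_score(rubric, [True, True, False]))            # 15 / 15
```

Under a scheme like this, negative points and clipping mean a single harmful recommendation can erase most of a response’s credit, so low absolute scores are common and ratios between scores can look dramatic.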
6. Policy & Safety
Timestamps: [88:00] to [113:07]
U.S. Policy Changes:
Trump Administration Rescinds Biden’s AI Diffusion Rule:
- The Biden-era export control framework for AI chips is dropped in favor of country-by-country negotiation, a shift framed partly as a response to workarounds by targeted countries (for example, exploiting hardware order-size thresholds).
- “Hopefully that’s something that’ll be addressed in this whole kind of next round of things.”
— Kurenkov [95:31]
Massive AI Deals in the Middle East:
- Nvidia, AMD, and others stand to benefit as the new administration pursues closer AI partnerships with Saudi Arabia and the UAE, aimed at securing energy supply and compute infrastructure for U.S. companies.
Safety & Oversight:
Scaling Laws for Scalable Oversight:
- Theoretical and empirical work on using AI models to judge and supervise stronger AI models, and on when this kind of scalable oversight might or might not work.
- “There are some pretty fundamental problems with this whole approach... new capabilities ... like deceptive alignment ... can emerge pretty suddenly and violate these scaling curves.”
— Kurenkov [106:03]
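One common way to quantify oversight between a weaker and a stronger model is to treat it as a two-player game and rate both players with Elo. Assuming that framing (our assumption for this sketch, not a detail given in the episode), the weaker overseer’s chance of winning falls off as a smooth logistic curve in the rating gap, which is exactly the kind of curve that a suddenly emerging capability such as deceptive alignment could break. The function below is the standard Elo expected-score formula; the ratings are arbitrary illustrative values.

```python
def win_probability(rating_a, rating_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Chance that an overseer rated 1000 prevails against progressively stronger models.
for gap in (0, 100, 200, 400, 800):
    print(gap, round(win_probability(1000, 1000 + gap), 3))
```

Extrapolating such a curve assumes the stronger model’s advantage grows smoothly with its rating; the hosts’ concern is precisely that this assumption may fail.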
OpenAI's Safety Evaluations Hub:
- OpenAI launches a public page detailing safety benchmarks on harmful content, jailbreaks, and hallucinations for all major model versions.
Memorable Moments & Quotes
On OpenAI’s structure:
“It’s touted as a win for basic principle ... but there’s a slippery slope ... Can the nonprofit board meaningfully oversee Sam? We saw a catastrophic failure of that in the board debacle.”
— Kurenkov [08:55]
On Grok’s system prompt manipulation:
“Grok for many different examples of just random questions... would reply regarding ‘white genocide’ in South Africa… even if a query is unrelated, which I suspect is the issue here. Weird.”
— Harris [26:39]
On the open-source AI model ecosystem:
“Anytime I try to explain it, it ends up sounding just like a pyramid scheme. ... At some point there’s a pot of gold at the end. Don’t worry about it.”
— Kurenkov [53:37]
On AlphaEvolve paradigm:
“OpenAI has this: they’re super scale-pilled ... DeepMind comes at things like, ‘let’s almost replicate the brain in a way, in different chunks.’”
— Kurenkov [68:31]
On Reasoning Models Scaling:
“You can only do that so many times until you hit essentially the ceiling of what current hardware can allow.”
— Kurenkov [81:16]
On AI in Healthcare:
“That is wild. That is a four times higher score than the unassisted physician. That honestly like kind of blows my mind a little bit.”
— Kurenkov [91:45]
Episode Timeline (selected highlights)
- [01:28] – OpenAI sticks with nonprofit governance.
- [13:23] – Microsoft’s evolving relationship with OpenAI.
- [18:05] – TSMC’s 2nm node demand, Nvidia’s Taiwan HQ.
- [26:39] – Grok chatbot prompt manipulation fiasco.
- [34:45] – Figma launches advanced AI features.
- [40:21] – Vibe coding and the rise of code-by-AI.
- [47:42] – Stability AI, Freepik F Lite, new reasoning models.
- [65:41] – DeepMind AlphaEvolve released.
- [81:16] – Epoch AI’s reasoning models scaling report.
- [88:00] – OpenAI’s HealthBench.
- [95:31] – U.S. export control rule shifts.
- [97:08] – Middle East AI compute deals.
- [106:03] – Scaling Laws for Scalable Oversight.
- [109:43] – OpenAI’s Safety Evaluations Hub.
Closing
The hosts reflect on how much of AI’s progress is tied not just to technical breakthroughs but also to corporate maneuvering, global negotiations over chips and compute, and the growing importance of open benchmarks and synthetic data in the new RL-driven reasoning paradigm. The show ends with their signature rap outro, capturing the constant change and excitement at the intersection of research, industry, and policy.
For further links, code, and deep dives referenced, see the episode description.
