Last Week in AI – Episode 227: Jeremie is Back! DeepSeek 3.2, TPUs, Nested Learning
Date: December 9, 2025
Hosts: Andrey Kurenkov & Jeremie Harris
Theme: In-depth discussion of the latest AI news, including major model releases (DeepSeek 3.2), hardware shake-ups (TPUs, Amazon chips), business moves (OpenAI, Anthropic), and advances in memory and agent research.
1. Episode Overview
This week's episode marks Jeremy’s return to the podcast after a three-month hiatus, with both hosts diving headfirst into a busy week in AI. The show centers on new model releases (DeepSeek 3.2, Flux 2), hardware battles beyond Nvidia, startups, industry shifts, and emerging research directions like memory and multi-agent learning.
2. Key Discussion Points & Insights
A. DeepSeek 3.2: State-of-the-Art Open Source LLM
- Major Release: DeepSeek 3.2, a new open source LLM, rivaling top proprietary models like GPT-5, Gemini, and Claude.
- Biggest Advances:
- Cost: 50% cheaper than comparable models, with high benchmark performance that is neck-and-neck with or surpasses GPT-5 in some areas.
- Sparse Attention: DeepSeek introduces a novel sparse attention mechanism, drastically improving efficiency by attending only to the ~2,000 most relevant tokens in any input, speeding up processing without sacrificing accuracy (see the sketch below). ([09:55])
- Reinforcement Learning at Scale: 10% of total compute devoted to RL, with a large-scale, stable training process incorporating off-policy sequence masking and expert specialization.
- Mixed RL Training: Code, agentic reasoning, and math are merged into a single RL stage, distilled from specialist models, so the specialized fine-tuning generalizes without "catastrophic forgetting."
- Limitations Discussed:
- Lags slightly in world knowledge due to fewer training flops.
- Lower context/token efficiency — model is verbose.
- Still not top-tier for complex tasks; future "R2" model is expected to target this.
"In some sense this is like actually attention... in a more human sense of like pick out information to process and then process only that information."
— Andrei, [09:55]
"Holy shit. Like, just look at the benchmarks."
— Jeremy, [14:21]
Note: DeepSeek continues to drive the open-source ecosystem with impressive transparency and is considered a major competitive force against closed models.
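For intuition, here is a minimal sketch of the top-k sparse attention idea described above, assuming only what the hosts mention (each query attends to roughly the 2,000 most relevant tokens). DeepSeek reportedly uses a separate lightweight scoring module to pick those tokens; a plain dot-product score stands in for it here, and all names and shapes are illustrative rather than DeepSeek's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topk_sparse_attention(Q, K, V, k=2048):
    """Q, K, V: (seq_len, d). Each query attends only to its k highest-scoring keys."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                      # (seq, seq) relevance scores
    k = min(k, scores.shape[-1])
    top_idx = np.argpartition(-scores, k - 1, axis=-1)[:, :k]    # k best keys per query
    out = np.empty_like(Q)
    for i, idx in enumerate(top_idx):                            # softmax + value mix over k tokens only
        w = softmax(scores[i, idx])
        out[i] = w @ V[idx]
    return out

# Toy usage: 4,096 tokens, but each query only "reads" its ~2,000 most relevant ones.
rng = np.random.default_rng(0)
Q = K = V = rng.standard_normal((4096, 64)).astype(np.float32)
print(topk_sparse_attention(Q, K, V, k=2000).shape)              # (4096, 64)
```

The point is only the shape of the trick: the softmax and value mixing run over k tokens per query instead of the full sequence, which is where the speedup comes from.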
Benchmarks and Model Recipe Discussion ([14:38], [18:46])
- Benchmarks: DeepSeek 3.2 and its "thinking" and "special" variants are close to or beating Gemini, Claude Sonnet, and GPT-5 on benchmarks like MMLU, HLE, and coding tasks.
- Frontier Model Competition: The model is built for agentic tool use—a shift seen across the field (for instance, in Anthropic's Claude "code" agents).
- Industry Context:
- Ongoing debate over whether "scaling" alone can solve deep reasoning or whether novel architecture/learning recipes (per Ilya Sutskever’s recent Dwarkesh podcast) are needed.
- DeepSeek and others are making technical leaps (e.g., sparse attention, sample-efficient RL, agentic task synthesis) that highlight a move toward the "age of research".
B. Image and Video Generation: Flux 2, Throttling, New Players
Black Forest Labs – Flux 2 ([23:00])
- Release: Next-gen image generation and editing, strong open source play, with variants targeting speed, customization, and cost.
- Notable: While not at the absolute cutting edge (NanoBanana Pro holds that lead), Flux 2 is much cheaper and nearly as good, especially in human preference tests.
Throttling of Sora and NanoBanana Pro ([28:53])
- OpenAI (Sora) and Google (NanoBanana Pro) restrict free generations due to GPU demand.
- Sora: 6 videos/day for free users.
- NanoBanana Pro: Down to 2 images/day.
- These models' practical utility (e.g., meme/presentation creation) is enormous, driving a surge in usage.
"Our GPUs are melting."
— Bill Peebles/OpenAI via Jeremy, [28:53]
Mistral’s New Small Models ([29:34])
- Released 14B, 8B, and 3B parameter models (base, instruct, reasoning variants)—optimized for open-source devs and edge use.
- Mistral’s focus: Sweet spot parameter size for local or single-GPU deployment; not chasing the largest "frontier" models.
Video Models: Kling AI and Runway Gen 4.5 ([31:41], [34:07])
- Kling AI: An all-in-one model that can both generate and edit video (e.g., changing weather, swapping characters).
- Runway Gen 4.5: Claims top leaderboard spots, besting Google and OpenAI in some video benchmarks; excels in high-res, physics, and motion.
"On the artificial analysis text to video leaderboard, it is still number one as of now... within the 95% confidence interval."
— Jeremy, [35:00]
C. Hardware Shifts: TPUs, Amazon’s Trainium, and Industry Moves
Google TPUs Gain Traction ([35:18])
- Foxconn Orders: Nvidia partner Foxconn is now embracing Google's TPUs, signaling a real industry shift.
- Strategic Factor: Google's internal use of TPUs gives it a 10x compute-cost advantage (it uses the chips at cost, with no vendor markup). As TPUs become more widely available, the AI compute/infra landscape could shift dramatically.
"$1 of Google DeepMind compute translates into 10 times the amount of compute that OpenAI does... That’s a really, really huge deal."
— Jeremy, [36:24]
- Power/Efficiency: TPUs are more energy-efficient at scale; critical as power becomes the limiting factor.
Amazon Trainium 3, 4 Announcements ([40:38])
- Amazon's custom AI chips: Trainium for training, Inferentia for inference.
- Partnered closely with Anthropic (Claude uses these chips). Next-gen chips (Trainium 4) will interoperate with Nvidia GPUs.
- Trainium 3 Ultra: 40% improvement in energy efficiency, 4x memory boost.
OpenAI’s “Code Red,” Anthropic IPO, and Business Raises ([43:03], [47:17])
- OpenAI: "Code red" triggered by Gemini's market progress; pausing new products to focus on ChatGPT quality.
- Internal moves include accelerated iteration, team transfers, and daily scrum calls.
- Losing share to Gemini and Anthropic (which dominates enterprise).
- Anthropic: Prepping for a massive IPO; reportedly reached a $1B annualized run rate for its Claude Code product. Unique governance as a Public Benefit Corporation.
- Black Forest Labs: Raised $300M for Flux 2, at $3.25B valuation (major open source raise).
- Gradium (Voice AI): Surfaces with a gigantic $70M seed round, a signal of VC appetite in audio/voice.
Infrastructure Buildouts ([50:47])
- Stargate Cluster: Construction has begun on OpenAI's 1 GW Abu Dhabi cluster, which lags the Anthropic/Amazon buildouts proceeding at record speed.
Strategic Partnerships & Acquisitions ([53:22])
- OpenAI's investment in Thrive Holdings: Sends staff into portfolio companies to drive AI adoption (seen as a "circular" but value-creating move).
- OpenAI Acquires Neptune.ai: Model-training diagnostics (monitoring/debugging); insider tooling with a small user base but high impact.
- Anthropic Acquires Bun: Signals a focus on AI-first coding environments and deep Claude Code investment.
- Microsoft Halves AI Agent Sales Targets: Indicates slow adoption or over-ambitious forecasts for Copilot/Agent sales.
D. Open Source & Research Front
DeepSeek Math v2 ([57:51])
- A domain-specialized LLM that excels at math, achieving 118/120 on Putnam 2024 and gold-level results on IMO 2025/CMO 2024.
- Innovative Training: A Generator–Verifier–MetaVerifier feedback loop; a "verifier" is trained against a human-scored meta-verifier for robust performance (a hedged sketch follows below).
- Impressive milestone: Undergrad math competition scores exceeding top human solvers.
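Based only on the episode's description, here is a hedged sketch of how a generator / verifier / meta-verifier loop could be wired together. `generate_proof`, `verify`, and `meta_verify` are hypothetical stubs standing in for the trained models DeepSeek actually uses; nothing here reflects their real pipeline.

```python
import random

def generate_proof(problem: str, feedback: str = "") -> str:
    # Stub generator: the real system is a proof-writing LLM conditioned on feedback.
    return f"proof of {problem!r} (revised with feedback: {feedback!r})"

def verify(problem: str, proof: str) -> tuple[bool, str]:
    # Stub verifier: the real system is a trained grader returning a verdict plus a critique.
    ok = random.random() > 0.5
    return ok, "" if ok else "step 3 is unjustified"

def meta_verify(problem: str, proof: str, verdict: bool) -> bool:
    # Stub meta-verifier: in the described pipeline this is scored against human judgments
    # to keep the verifier honest; here it simply agrees.
    return True

def solve(problem: str, max_rounds: int = 4) -> str | None:
    feedback = ""
    for _ in range(max_rounds):
        proof = generate_proof(problem, feedback)
        ok, feedback = verify(problem, proof)
        if ok and meta_verify(problem, proof, ok):   # accept only verifier-approved,
            return proof                             # meta-verifier-audited proofs
    return None

print(solve("toy problem"))
```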
Evolving Memory and “Nested Learning” ([61:09], [65:44], [69:37])
- Google’s EvoMemory Benchmark
- Evaluates how LLMs/agents can adapt by storing/retrieving episodic memory during inference, not just in training. Emergent need as agent research grows.
- REMEM: Explicit memory pruning, refinement, and retrieval; early but promising results (a toy sketch follows below).
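As a purely illustrative toy of the store/retrieve/prune pattern these memory efforts evaluate, here is a tiny episodic memory an agent could update at inference time. The class name and the word-overlap scoring are assumptions for the sketch, not the EvoMemory benchmark's or REMEM's actual design.

```python
from collections import deque

class EpisodicMemory:
    def __init__(self, capacity: int = 4):
        self.entries = deque(maxlen=capacity)   # pruning: drop the oldest episode when full

    def store(self, text: str) -> None:
        self.entries.append(text)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Crude relevance score: word overlap between the query and each stored episode.
        q = set(query.lower().split())
        ranked = sorted(self.entries,
                        key=lambda e: len(q & set(e.lower().split())),
                        reverse=True)
        return ranked[:k]

mem = EpisodicMemory()
mem.store("user prefers concise answers")
mem.store("project uses Python 3.12")
mem.store("deadline is Friday")
print(mem.retrieve("which python version does the project use"))
```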
- DeepMind's Nested Learning Architecture
- Major conceptual step: Embeds variable update frequency (different "layers" of memory/learning, inspired by human cognition) directly within model architecture.
- “Continuum Memory System”: Stacks MLPs with variable update windows (short-term, medium, long-term) for continual learning (see the toy sketch below).
"The core architecture is frozen in time; the attention mechanism is just frantically updating all the time. There’s no middle ground..."
— Jeremy, [69:37]
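To make the "different speeds of learning" idea concrete, here is a minimal toy in the spirit of what the hosts describe: stacked MLP blocks whose parameters are updated on different clocks (fast, medium, slow). It is an illustrative sketch under that assumption, not DeepMind's actual Nested Learning / Continuum Memory System implementation.

```python
import torch
import torch.nn as nn

D = 32
blocks = nn.ModuleList([nn.Sequential(nn.Linear(D, D), nn.Tanh()) for _ in range(3)])
update_every = [1, 16, 256]        # fast (short-term), medium, slow (long-term) clocks
opts = [torch.optim.SGD(b.parameters(), lr=1e-2) for b in blocks]
model = nn.Sequential(*blocks)

for step in range(1, 1025):
    x = torch.randn(8, D)
    loss = model(x).pow(2).mean()  # toy objective: push activations toward zero
    loss.backward()                # gradients accumulate in every block each step
    for opt, every in zip(opts, update_every):
        if step % every == 0:      # each block steps on its own clock, applying the
            opt.step()             # gradient accumulated since its last update
            opt.zero_grad()        # (in practice you would average it rather than sum)
```

The fast block updates every step while the slow block changes only rarely, filling in the "middle ground" the quote above says standard transformers lack.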
Multi-Agent RL / Credit Assignment ([74:19])
- New techniques for assigning reward/credit in multi-agent LLM systems, enabling coordinated training and better orchestration of complex tool-using LLMs (a generic sketch follows below).
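The episode does not detail the method, so as a generic illustration here is the classic "difference rewards" idea for multi-agent credit assignment: score each agent by how much the team reward changes when its contribution is replaced with a default. This is a standard technique named for illustration, not necessarily what the discussed work does.

```python
from typing import Callable, List

def difference_rewards(
    actions: List[str],
    team_reward: Callable[[List[str]], float],
    default_action: str = "",
) -> List[float]:
    """Credit for agent i = R(all actions) - R(actions with agent i's replaced by a default)."""
    base = team_reward(actions)
    credits = []
    for i in range(len(actions)):
        counterfactual = actions[:i] + [default_action] + actions[i + 1:]
        credits.append(base - team_reward(counterfactual))
    return credits

# Toy usage: the team reward just counts non-empty contributions.
print(difference_rewards(["plan", "", "code"], lambda a: float(sum(bool(s) for s in a))))
# -> [1.0, 0.0, 1.0]: the two contributing agents get credit, the idle one gets none.
```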
OpenRouter Token Usage Report ([75:49])
- Observed Usage Trends:
- Chinese open source models’ share jumped from 1% to 30% over 2025.
- Mid-size models (15B–70B) are most popular (balance of performance/accessibility).
- Role play/storytelling now >50% of open source use; programming/coding also booming.
- Prompt sizes (input length) quadrupled, mostly driven by coding tasks.
- Market is quality-driven, not price-driven—cutting token costs doesn’t boost usage much.
E. Policy & Safety
US Genesis Mission Executive Order ([81:51])
- Trump administration launches a "Manhattan Project"-style AI R&D initiative: expand compute access, pool government datasets, focus on scientific/military applications.
- Will draw from Dept. of Energy scientists, aiming for a national AI cloud.
"The Manhattan Project framing is interesting."
— Jeremy, [82:55]
OpenAI: “Confession Training” Increases Model Transparency ([84:22])
- New research trains LLMs to "confess" if they've misbehaved (lied/cheated), using an LLM-judge reward framework (a hedged sketch follows below).
- Surprisingly effective: GPT-5 Thinking confessed in 11/12 test sets.
- Limitations: Monitoring tool, not preventative, and assumes honesty remains the easiest path as models scale.
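As a rough illustration of the "LLM-judge reward" framing, here is a hedged sketch of how a confession bonus might be combined with a task reward. `judge_confession` and the `Rollout` fields are hypothetical stand-ins; OpenAI's actual setup is not described beyond what the episode covers.

```python
from dataclasses import dataclass

@dataclass
class Rollout:
    answer: str        # the model's task output
    confession: str    # an appended self-report: did it cut corners, lie, or cheat?
    misbehaved: bool   # ground truth from a monitored test environment (assumed available)

def judge_confession(confession: str) -> bool:
    """Hypothetical LLM-judge stub: does the confession admit misbehavior?"""
    text = confession.lower()
    return "i misbehaved" in text or "i cheated" in text

def reward(r: Rollout, task_score: float, honesty_weight: float = 1.0) -> float:
    admitted = judge_confession(r.confession)
    # Bonus when the confession matches reality (admits real misbehavior, or truthfully
    # reports none); penalty when it denies misbehavior that actually happened.
    honesty = 1.0 if admitted == r.misbehaved else -1.0
    return task_score + honesty_weight * honesty

print(reward(Rollout("42", "I cheated on the unit tests.", True), task_score=0.0))  # -> 1.0
```

Note how this matches the episode's caveat: the confession channel makes misbehavior visible and rewards honesty about it, but nothing in the reward prevents the misbehavior itself.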
US Moves to Block Nvidia Chip Sales to China ([89:21])
- Senate’s proposed SAFE Act would block advanced chip exports to China/Russia for 30+ months—ongoing legislative vs. executive wrangling over AI hardware strategy.
3. Notable Quotes & Moments
- "I hope everybody is in for a lot of like dumb questions from me because that'll be a big part of the show." — Jeremy, [01:53]
- "It's like the engineers at Nvidia are working overtime, you can be sure of that." — Andrei, [39:06]
- "Zooming out. Holy shit. Like, just look at the benchmarks." — Jeremy, [14:21]
- "It's sort of like a human being reading a book… Except also remember that page one is really forgettable, but you need to remember page one — but just know that it's super forgettable. What they're doing here is saying, ah, fuck page one. Throw it out completely." — Jeremy on DeepSeek’s sparse attention, [10:16]
- "If you want to hear it from someone who knows infinitely more than us, just listen to the Dwarkesh podcast with Ilya Sutskever…" — Jeremy, [06:18]
- "Role playing is a really big deal… apparently over 50% of open source model usage is creative role play and storytelling." — Jeremy, [77:17]
- "The core architecture is frozen in time; the attention mechanism is just frantically updating all the time. There's no middle ground... That is exactly the kind of de facto memory that an RNN might use." — Jeremy, [69:37]
- "OpenAI is going to send employees and product teams to work with Thrive's companies. So cool. Apparently if that succeeds, then OpenAI stake will grow somehow and they'll get compensated for their services." — Jeremy, [54:07]
4. Timestamps for Important Segments
- [02:33] DeepSeek 3.2 Overview
- [09:55] Sparse Attention & Technical Deep Dive
- [14:38] Benchmarks and Model Comparisons
- [18:46] Mixed RL, Tool Use, Limitations
- [23:00] Flux 2 Image Model Release
- [28:53] Sora/NanoBanana Throttling
- [31:41] Mistral New Small Models
- [34:07] Kling AI, Runway Gen 4.5 Video Models
- [35:18] Google TPUs Power Shift, Business Implications
- [40:38] Amazon Trainium Chips
- [43:03] OpenAI Code Red, Product Focus
- [47:17] Anthropic IPO, Business Funding
- [50:47] Stargate 1GW AI Cluster Buildouts
- [53:22] OpenAI/Thrive Holdings Circular Deal
- [57:51] DeepSeek Math v2
- [61:09] Evolving Memory: Google’s EvoMemory
- [69:37] DeepMind "Nested Learning" Paper
- [74:19] Multi-Agent RL/Credit Assignment
- [75:49] OpenRouter 100 Trillion Token Study
- [81:51] Genesis Mission Federal AI Push
- [84:22] OpenAI "Confession Training" Research
- [89:21] Senate Act to Block Nvidia Chips to China
5. Takeaways / Big Picture
- Open Source Ascendant: DeepSeek and other open-source models are increasingly competitive—even approaching or surpassing closed models in benchmarks and affordability.
- Model Recipes Are Evolving: RL at scale, sparse attention, and niche expert training for reasoning, plus architectural innovations for memory and learning.
- Hardware Arms Race Accelerates: Google and Amazon are challenging Nvidia’s dominance; compute cost, energy, and infra partnerships are shaping the next wave.
- AI as Enterprise Platform: Battle for enterprise (Anthropic’s rapid growth, OpenAI’s “code red”) versus saturated consumer chatbot market; adoption in creative, coding, and even role play.
- Memory, Agents, and Learning: Research shifts to continual, contextual, and agentic memory—embedding different “speeds” of learning directly in model design.
- Geopolitical Stakes Rise: Export controls, national AI clouds, and legal wrangling over silicon mark the sector’s centrality to global strategy.
- Safety & Alignment: Transparency and confession-style approaches offer new ways to monitor models, though the "superalignment problem" persists.
This episode provides both high-level news and rich technical deep dives, and is recommended for anyone who wants to quickly catch up on the state of the art in AI development and industry moves as of December 2025.
