Last Week in AI - Episode #230
Title: 2025 Retrospective, Nvidia buys Groq, GLM 4.7, METR
Date: January 7, 2026
Hosts: Andrei Kurenkov, Jeremy
Overview
This special episode opens 2026 with a comprehensive recap of key developments in AI throughout 2025, followed by a discussion of recent news—including the major Nvidia-Groq deal, advances in open source coding models, new benchmarks, and policy and safety updates. With one host returning after a several-month hiatus, the episode features both reflection and catch-up, bringing listeners up to speed on major commercial, technical, and policy trends in AI.
2025 in Retrospect: Key Trends and Developments
(00:11–24:06)
Major Themes in 2025
- Reasoning Models, Agentic AI, and Vibe Coding
- Significant progress in models that can execute multi-step reasoning and act as agents.
- Vibe coding and advanced terminal tools became widespread, building on releases like Claude Code, Gemini CLI, and Codex.
- World Models and Image Editing
- World modeling (models learning representations of reality) saw a surge in research.
- Image editing apps, especially Nano Banana and Nano Banana Pro, were a top highlight.
- AI Hardware and Data Centers
- 2025 was the “year of data centers,” marked by massive spending and buildout.
- The ROI of super-scale experiments is now a major question for labs and investors.
- Expansion of the HBM (High Bandwidth Memory) market, with Micron emerging as a challenger to SK Hynix and Samsung.
- Rise of Open Source and Smaller Models
- Open source AI models, especially from China (e.g., DeepSeek, GLM), now perform nearly at the level of proprietary frontier models.
- Ongoing advances in on-device models, with expectations they'll soon be viable for phones and laptops.
- Distributed Training
- While anticipated at the year’s start, distributed training (think: Together AI) did not break through in a transformative way.
Notable Quotes
- “Kind of boiled down to reasoning models and vibe coding. If you really want to compress it.” — Andrei (01:25)
- “Interpretability agenda... we’ve seen a sort of decreased appetite from all the major labs on using mechanistic interpretability for superalignment.” — Jeremy (03:35)
- “It’s no longer clear that you can keep pushing the envelope in pure dollar terms. Like we’re at the point where we’re tapping out sovereign wealth funds, the Saudis, the Emiratis... all the chips are on the table and we have to start seeing ROI for this kind of scale.” — Jeremy (04:45)
- “World models is on the rise. Everyone’s into world models now. Robotaxis continue to expand rapidly.” — Andrei (02:10)
Alignment and Interpretability
- The 'mechanistic interpretability' agenda has lost steam, with major labs shifting strategies.
- Anthropic’s “Tracing the Thoughts of a Large Language Model” was a high-water mark but also revealed challenges in making interpretability practical and generalizable.
- “Emergent misalignment” and persona vectors became focal research areas, improving understanding of LLM misbehaviors.
- Activation steering and vector inspection techniques have now broken into the mainstream, moving beyond niche alignment circles.
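At its simplest, activation steering means adding a fixed direction to a model's hidden state at inference time. A minimal PyTorch sketch on a toy network (the layer choice, vector, and strength here are illustrative, not taken from any specific paper):

```python
import torch
import torch.nn as nn

# Toy two-layer network standing in for a transformer block stack.
model = nn.Sequential(
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Linear(16, 16),
)

# A steering vector: in real work this is often derived from data, e.g. the
# difference of mean activations between two behaviors ("honest" vs. not).
steering_vector = torch.randn(16)
alpha = 4.0  # steering strength

def steer(module, inputs, output):
    # Add the (scaled) steering direction to the layer's output activations.
    return output + alpha * steering_vector

# Hook the first layer so every forward pass is nudged along the direction.
handle = model[0].register_forward_hook(steer)

x = torch.randn(1, 16)
steered = model(x)
handle.remove()
unsteered = model(x)
print((steered - unsteered).norm())  # nonzero: the intervention changed activations
```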
Notable Quotes
- “I think the big kind of research finding of the year for me has been persona vectors and the Waluigi effect and emergent misalignment as a whole.” — Andrei (10:24)
- “That paper represented a kind of high watermark for the mechanistic interpretability program.” — Jeremy (14:40)
Hardware and Infrastructure Shifts
- HBM Market Shakeup: Micron surpassed Samsung in high-bandwidth memory market share, with its products now favored for energy efficiency.
- Nvidia-Groq Acquisition: Closed out 2025 with Nvidia acquiring (functionally, if not legally) Groq, marking a major move to dominate the inference chip market.
- Corporate Maneuvers: 'Acquihire' strategies are now routine to avoid regulatory scrutiny (Microsoft-Inflection style deals).
Notable Quotes
- “HBM is becoming like the blocker to our ability to build more chips faster.” — Jeremy (06:06)
- “The design story... no longer the case. Like Samsung now is producing a lot of HBM. Like they’re looking competitive.” — Jeremy (03:35)
Geopolitics & Security
- Realization of supply chain dependencies (notably Chinese manufacturing) has heightened focus on data center and IP security.
- US-China competition increasingly shaped by differences in energy and manufacturing constraints; China is willing to use less efficient chips at scale.
- Enhanced interest in national security and handling "high-risk" models, with initial moves by OpenAI and Anthropic to bolster security postures.
Looking Ahead to 2026
- Will the transformer architecture finally be superseded or hybridized?
- Is true automated AI research on the horizon (as measured by ‘METR’-style task evaluations)?
- Open question: Will we see “unjagged intelligence”—models with common sense and robust continual learning?
Notable Quotes
- “2026, this is for real. All the scaling trends are adding up to actual automated AI research and this is the year it’s going to happen.” — Jeremy (19:48)
- “Can we make the intelligence unjagged in the right ways, in the right areas, to achieve automated AI research?” — Jeremy (22:22)
- “The big question for me for 2026 is will we move beyond the transformer architecture fully? Will we move to hybrid models?... And Nvidia just released some impressive results at scale.” — Andrei (17:03)
Key News Stories (Dec 2025–Jan 2026)
1. OpenAI Bets Big on Audio
(24:06–26:38)
- OpenAI consolidating teams to focus on audio models, anticipating voice-first devices.
- Jony Ive’s design firm io was acquired by OpenAI to lead hardware efforts, aiming to “reduce device addiction” via audio-first approaches.
2. Nvidia Buys Groq (The Acquihire Playbook in Action)
(26:38–33:26)
- Nvidia takes over Groq’s team and licenses their innovative inference technology (SRAM-heavy LPUs), critical for next-gen 'Vera Rubin' architecture.
- The deal structure dodges antitrust scrutiny (‘not-a-real-acquisition’) but effectively eliminates a top Nvidia competitor.
- Groq’s all-on-chip (SRAM) memory approach brings massive internal bandwidth, with major thermal and speed benefits for inference, but limited memory per chip (see the back-of-envelope arithmetic after this list).
- This acquisition is essential for Nvidia as inference workloads—supporting agentic AI—now eclipse training in commercial significance.
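To make the bandwidth point concrete: in single-stream decoding, every generated token requires streaming the active weights past the compute units, so memory bandwidth caps tokens per second. A hedged back-of-envelope in Python, using public ballpark figures that are assumptions rather than numbers from the episode:

```python
# Back-of-envelope: tokens/sec is roughly bounded by
#   memory_bandwidth / bytes_moved_per_token
# (ignoring batching, caching, and compute limits).

def max_tokens_per_sec(weight_bytes: float, bandwidth_bytes_per_sec: float) -> float:
    """Upper bound on single-stream decode speed for a bandwidth-bound model."""
    return bandwidth_bytes_per_sec / weight_bytes

# Assumed ballpark figures (not from the episode):
model_bytes = 70e9 * 2            # ~70B params at 2 bytes each (fp16/bf16)
hbm_bw = 3.35e12                  # ~3.35 TB/s, HBM3 on a modern GPU
sram_bw_per_chip = 80e12          # ~80 TB/s on-chip SRAM, Groq-style LPU

print(f"HBM-bound:  {max_tokens_per_sec(model_bytes, hbm_bw):.1f} tok/s")
print(f"SRAM-bound: {max_tokens_per_sec(model_bytes, sram_bw_per_chip):.1f} tok/s")

# The catch: ~230 MB of SRAM per LPU means a 140 GB model must be sharded
# across hundreds of chips, which is the "limited memory per chip" tradeoff.
print(f"Chips needed at 230 MB each: {model_bytes / 230e6:.0f}")
```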
Notable Quotes
- “[Nvidia is] trying to take Groq off the table because Groq was the company—if anyone had a shot at competing with Nvidia in that crucial inference segment, which already makes up 40% of Nvidia’s revenues, it’s only going to grow more and more.” — Jeremy (30:30)
- “Think of this as an acquisition, but our lawyers tell us we can’t say it’s an acquisition.” — Jeremy (28:16)
3. Meta Acquires Manus for AI Agents—Talent and Subscriptions
(33:26–37:30)
- Meta's $2B+ acquisition of Manus aims to bolster agentic AI development and brings in 100+ researchers to Meta’s "superintelligence" team.
- For the first time, Meta gains a recurring-revenue subscription model for AI tools.
4. Cursor Acquires Graphite for Coding AI Stack
(37:30–39:15)
- Cursor (AI coding IDE) buys Graphite (AI code review/debugging) to build a more vertically integrated coding assistant, aiming to keep pace with agentic CLI tools.
5. Micron Rises in the HBM Market
(39:15–41:48)
- Once a distant third, Micron now claims 20% share in the key HBM niche, leapfrogging Samsung due to better yields and power efficiency.
- “Micron’s HBM3e apparently uses 30% less power, which has just become a massive deal because obviously energy is at such a premium now.” — Jeremy (41:15)
6. China’s Chipmaking Workarounds
(41:48–46:08)
- Chinese fabs are upgrading older ASML machines and relying on unofficial engineering workarounds to reach smaller process nodes.
- Chinese chip design and fabrication now credibly competes from roughly two generations behind the West, accepting a different tradeoff between energy use and chip efficiency.
Key AI Research, Benchmarks & Model News
GLM 4.7: New Chinese Open Source Coding Model
(47:52–49:47)
- Zhipu AI released GLM 4.7, now highly competitive on coding benchmarks (often close to or beating proprietary models on basic tasks).
- Open source continues its rapid catch-up, particularly in China.
New Benchmarks & Evaluation
(51:18–58:41)
- Frontier Science: A new, harder benchmark for evaluating scientific/AI research capabilities, crafted with PhDs and Olympiad winners.
- GPT-5.2 leads at 77% on the Olympiad-level tasks but scores only 25% on the hardest research tasks, suggesting substantial room for growth.
Notable Quotes
- "The number of times we've had this conversation where it's like, okay, so we need a benchmark that is really, really hard and it's the hardest graduate level PhD research, fucking whatever... we keep going through the same ritual." — Jeremy (51:18)
Causal Reasoning from LLMs: Democritus
(55:12–57:34)
- A novel paper uses a “geometric transformer” layer to automatically construct causal graphs from LLM outputs, helping map fuzzy neural “reasoning” into explicit structures.
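The episode doesn't unpack the "geometric transformer" machinery itself, but the output artifact, an explicit causal graph distilled from model judgments, is easy to picture. A hypothetical illustration with networkx, where `llm_causal_score` is a made-up stand-in for whatever scoring the paper actually uses:

```python
import networkx as nx

def llm_causal_score(cause: str, effect: str) -> float:
    # Hypothetical stand-in: in practice this would come from the model
    # (e.g., a probability that `cause` directly influences `effect`).
    table = {
        ("smoking", "tar in lungs"): 0.95,
        ("tar in lungs", "cancer"): 0.90,
        ("smoking", "cancer"): 0.60,
        ("yellow fingers", "cancer"): 0.05,
        ("smoking", "yellow fingers"): 0.85,
    }
    return table.get((cause, effect), 0.0)

variables = ["smoking", "tar in lungs", "cancer", "yellow fingers"]
threshold = 0.5

# Build a directed graph, keeping only edges the scorer judges likely causal.
g = nx.DiGraph()
g.add_nodes_from(variables)
for cause in variables:
    for effect in variables:
        if cause != effect and llm_causal_score(cause, effect) >= threshold:
            g.add_edge(cause, effect)

print(sorted(g.edges()))
print("Is a DAG:", nx.is_directed_acyclic_graph(g))
```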
Representation Convergence in Scientific Models
(57:34–61:42)
- Noteworthy finding: as scientific AI models improve, they converge in how they internally represent complex materials / molecules, possibly reflecting deeper truths about the data's structure.
Meta RL: More Efficient Agentic Learning
(61:42–67:15)
- New frameworks (e.g., LAMR) introduce ‘meta reinforcement learning’ so language agents can improve their own learning process across episodes via in-context policy adaptation (a toy sketch follows after the quote below).
- "A really interesting, very simple approach. There’s all kinds of benefits to this that they measure, including these agents having very high trajectory diversity..." — Jeremy (63:07)
Costs of AI Agent Tasks and METR Evaluations
(67:15–76:19)
- Blog analyses show that as agentic models tackle longer tasks, their costs rise faster than task length—and may outstrip equivalent human labor, at least for now.
- Latest METR data: Claude Opus 4.5 can complete 5-hour software tasks with 50% success, but the 80%-success horizon still lags well behind, and the dataset is too small for strong claims (a curve-fitting sketch follows after the quote below).
- “Useful to look at the METR horizon plot, but people are kind of freaking out way too much given how noisy it is..." — Andrei (76:19)
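For context on what the horizon plot measures: METR-style evaluations fit success probability against (human) task length and report the length at which a model succeeds 50% of the time. A minimal curve-fitting sketch on made-up data (not METR's dataset or code):

```python
import numpy as np
from scipy.optimize import curve_fit

# Made-up (task_length_minutes, success 0/1) observations, NOT METR data.
lengths = np.array([2, 5, 10, 30, 60, 120, 240, 300, 480, 600], dtype=float)
success = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0], dtype=float)

def logistic(log_len, mid, slope):
    """P(success) as a logistic curve in log task length."""
    return 1.0 / (1.0 + np.exp(slope * (log_len - mid)))

params, _ = curve_fit(
    logistic, np.log(lengths), success,
    p0=[np.log(60.0), 1.0],
    bounds=([0.0, 0.1], [10.0, 20.0]),
)
mid, slope = params

# The "time horizon": task length where predicted success crosses 50%.
print(f"50% horizon ≈ {np.exp(mid):.0f} minutes")
```

With only a handful of tasks near the decision boundary, the fitted horizon swings substantially if a single observation flips, which is the noisiness Andrei is pointing at.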
Policy & Safety
New York State’s RAISE Act
(77:53–80:40)
- NY passes the second major US state law regulating AI safety; requires big AI developers to disclose protocols and report incidents.
- Public support from OpenAI and Anthropic is somewhat ambiguous—political alliances are murky and may diverge in private lobbying.
Notable Quotes
- “Trying to do the, you know, Pepe Silvia conspiracy meme chart and kind of figure out who supports what from this web of connections is really, really hard.” — Jeremy (78:19)
Activation Oracles: A New Tool for Interpretability
(80:40–86:05)
- Introducing “activation oracles”—models that can ‘read’ the internal activations of another model to infer intent or internal state, a potential step forward for model monitoring.
- Early results (e.g., extracting a secret word a model is ‘thinking’ of but refuses to say) are promising but remain proof of concept.
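As a deliberately simplified stand-in for the oracle idea, here is a linear probe trained to recover a hidden binary property from synthetic "activations"; the actual work reads activations with another LLM, not logistic regression:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "activations": 1,000 samples of a 64-dim hidden state where one
# direction secretly encodes a binary property (e.g., "model knows the word").
n, d = 1000, 64
secret_direction = rng.normal(size=d)
labels = rng.integers(0, 2, size=n)
activations = rng.normal(size=(n, d)) + np.outer(labels, secret_direction)

# The "oracle" here is just a linear probe; real activation oracles feed
# activations into another model that verbalizes what it reads.
probe = LogisticRegression(max_iter=1000).fit(activations[:800], labels[:800])
accuracy = probe.score(activations[800:], labels[800:])
print(f"held-out probe accuracy: {accuracy:.2f}")  # near 1.0 on this easy synthetic task
```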
Monitoring and Monitorability in Model Regulation
(86:05–91:09)
- OpenAI research introduces a “G mean squared” metric to quantify how easy a model is to monitor for misbehavior.
- Finding: smaller models with long chains of thought are easier to monitor than larger models, though more expensive per inference.
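The episode doesn't define the metric, and "G mean squared" may be a transcription of something else. If it is related to the classic G-mean from classification (the geometric mean of true-positive and true-negative rates), a monitor's score would be computed roughly as below; this is purely an assumption, not OpenAI's published definition:

```python
import math

def g_mean_squared(tp: int, fn: int, tn: int, fp: int) -> float:
    """G-mean^2 = TPR * TNR from a misbehavior monitor's confusion counts.
    ASSUMPTION: a guess at what "G mean squared" means, not OpenAI's definition."""
    tpr = tp / (tp + fn)  # caught misbehavior / all actual misbehavior
    tnr = tn / (tn + fp)  # correctly passed runs / all benign runs
    return tpr * tnr

# Toy confusion counts for a chain-of-thought monitor:
score = g_mean_squared(tp=90, fn=10, tn=950, fp=50)
print(f"G-mean: {math.sqrt(score):.3f}, G-mean^2: {score:.3f}")
```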
OpenAI’s Preparedness Effort
(92:44–93:37)
- OpenAI is hiring a “head of preparedness” to coordinate safety evaluations and mitigations, including risks beyond existential threats (e.g., mental health, cybersecurity).
AI Content Risks: Grok and Deepfake Image Generation
(93:37–95:53)
- Grok (X’s public LLM) can be prompted via tags to create NSFW deepfakes, which briefly flooded its media feed. X later wiped the feed, but the episode illustrates the challenges of model deployment and real-world safety risks.
Memorable Moments
- “I checked it myself. If you go to the media tab, it’s pretty full of people just saying, ‘put this girl in a bikini’, and [Grok is] happily obliging. And, oh, these are not like modest pictures. These are serious edits that Grok was executing.” — Andrei (93:37)
- “I've seen President Trump in a bikini and I wish I had not. But it did happen because of Grok.” — Andrei (95:53)
Timestamps for Key Segments
- 2025 Recap Themes & Trends: 00:11–24:06
- Nvidia-Groq Acquisition: 26:38–33:26
- Meta Buys Manus: 33:26–37:30
- Micron HBM Market Shift: 39:15–41:48
- China’s Chipmaking Strategies: 41:48–46:08
- GLM 4.7 (Open Source Coding): 47:52–49:47
- Frontier Science Benchmark: 51:18–54:05
- Causal Reasoning/Geometric Transformers: 55:12–57:34
- Meta RL for Language Agents: 61:42–67:15
- Agent Task Costs & METR Plot: 67:15–76:19
- NY RAISE Act (Policy): 77:53–80:40
- Activation Oracles (Interpretability): 80:40–86:05
- OpenAI Preparedness/Safety: 92:44–93:37
- Grok Deepfake Image Controversy: 93:37–95:53
Final Thoughts
With deep dives into technical, business, and policy realms, this episode offers a sweeping—and opinionated—look at the state of AI in early 2026. As always, the hosts balance skepticism with optimism, note the signal amid the hype, and provide candid (sometimes cheeky) commentary on news that will shape the next year.
For deeper dives, listeners are encouraged to review the referenced blog posts, papers, and the full transcript for technical details.
