Podcast Summary: "The End of GPU Scaling? Compute & The Agent Era — Tim Dettmers (Ai2) & Dan Fu (Together AI)"
The MAD Podcast with Matt Turck
January 22, 2026
Episode Overview
In this lively and deeply technical episode, host Matt Turck brings together Tim Dettmers (Assistant Professor at Carnegie Mellon, Research Scientist at AI2) and Dan Fu (VP of Kernels at Together AI, Assistant Professor at UC San Diego) for a "reality check" discussion on hardware bottlenecks, AGI definitions, the future of compute, and the explosive practical rise of AI agents.
The core tension: Dettmers believes we're rapidly approaching the limits of GPU scaling and thus a plateau in AI progress, while Fu takes the more optimistic view that vast untapped potential remains even in current and next-gen hardware, with models only now beginning to leverage the infrastructure already available.
They also break down the rise of agents, their impact in coding and other workflows, and the practical skills both technical and non-technical users need to thrive in the "agent era".
Key Discussion Points & Insights
1. Guest Backgrounds & Expertise
- Tim Dettmers: Specializes in efficient deep learning, quantization, and coding agents. Noted for reducing memory usage while maintaining performance.
  "My past research has been mostly on efficient deep learning quantization ... use up to 16 times less memory than if you have dense ... now I'm working on coding agents." (01:34)
- Dan Fu: Focuses on accelerating language models at the kernel/GPU level; co-author of FlashAttention; works on both kernel optimization and alternative model architectures, with a recent focus on deploying and accelerating models on the latest Nvidia hardware.
  "In industry, I focus a lot on basically making models go fast ... GPU kernels are the things that actually translate the models to how they run on the GPU." (02:25)
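Dettmers' "16 times less memory" refers to his research on efficient quantization. As a purely illustrative sketch of the basic idea (generic absmax int8 quantization, not his exact method), float32 weights can be stored as int8 codes plus one scale per tensor for roughly a 4x memory reduction, with larger savings from lower bit widths or sparsity:

```python
import numpy as np

# Hedged sketch: generic absmax int8 quantization (an illustration of the
# idea behind memory-efficient weights, NOT Dettmers' specific method).
# float32 weights (4 bytes each) become int8 codes (1 byte each) plus a
# single float scale per tensor.

def quantize_absmax(w):
    """Map float weights onto the int8 range [-127, 127]."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.0], dtype=np.float32)
q, s = quantize_absmax(w)
err = float(np.abs(w - dequantize(q, s)).max())
print(q.nbytes, w.nbytes, err)  # 4 bytes vs 16 bytes; small round-off error
```

The trade-off is a small, bounded round-off error (at most half a quantization step) in exchange for the memory savings.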
2. Defining AGI & The State of AI Today
- Dan Fu: Suggests that by many older definitions, we're at or extremely close to AGI.
  "By almost any definition anyone could have written down, let's say five years ago or 10 years ago ... we basically have the vision of AGI that we had back then." (00:02, 03:55)
- Tim Dettmers: Argues the term "AGI" is ambiguous and overhyped; prefers an economic definition: usefulness and the ability to trigger an industrial revolution (06:00).
  "We don't think carefully about the definition ... what I think makes sense is this economic angle. Can we get another industrial revolution?" (05:33–06:55)
3. AGI Hype vs. Computational Reality
- Roots of AGI Narratives: Dettmers traces their origins to "effective altruism" and rationalist circles, warning against "lazy thinking" and unexamined extrapolations.
  "There's always like, 'oh, we get AGI in two years' ... a little bit of being in a bubble ... not being exposed to different ideas." (07:31)
- Hard Physical Constraints: Dettmers details how exponential progress always meets diminishing returns. Core physical structure (latency, memory movement) imposes hard ceilings on how fast, cheap, or effective GPU computing can get.
  "Everything that grows exponential will level off. Because if you need resources, the resources will be exhausted." (11:40)
- The End of GPU Scaling:
  "GPUs will no longer improve, meaningfully. We have essentially seen the last generation of significant GPU improvements ... maxed out on the additional features ... that's the end of it." (11:26–16:12)
4. The Optimistic Case: Hardware and Software Underutilization
- Dan Fu: Counters that most current models woefully under-leverage even existing hardware (low chip utilization rates). There's easily "100x more compute" in the short-term pipeline, with ever-bigger clusters and more efficient training/inference methods emerging.
  "If you look at where the systems are today ... we are just so far from even using the last generation of hardware as efficiently as possible ... you can see up to two orders of magnitude more compute available." (16:16–17:50)
- Models Are Lagging Indicators:
  "The models that we see today ... are already trained on clusters that are a year and a half old ... The models we index on quality today are actually trained on pretty old hardware." (21:25)
- Post-training & Usefulness: Pre-training dominates compute cost, but post-training enables precise, domain-specific utility.
  "Pre-training is like the general strength training that you do in the gym ... post-training is like the specific drills that you run." (23:10)
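The "chip utilization" gap Fu describes is often quantified as model FLOPs utilization (MFU). Here is a hedged back-of-envelope sketch using the standard ~6N-FLOPs-per-token training approximation; the example numbers are illustrative, not figures from the episode:

```python
# Hedged sketch: model FLOPs utilization (MFU), one common measure of the
# "chip utilization" gap. Uses the standard ~6*N FLOPs-per-token training
# estimate; the example numbers below are illustrative assumptions only.

def mfu(tokens_per_sec, n_params, peak_flops_per_sec):
    """Fraction of a chip's peak FLOP/s a training run actually uses."""
    achieved = 6.0 * n_params * tokens_per_sec   # ~6*N FLOPs per token
    return achieved / peak_flops_per_sec

# e.g. a 7B-parameter model at 4,000 tokens/s/GPU on a ~1e15 FLOP/s chip:
print(f"{mfu(4_000, 7e9, 1e15):.1%}")  # prints "16.8%"
```

Utilization figures in this range (and lower, as Fu notes for inference) are what leave "two orders of magnitude" of headroom before hardware itself is the binding constraint.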
5. The Usefulness Convergence
- Matt Turck: Suggests that, regardless of where "true AGI" lands, practicality and usefulness win.
  "What ultimately matters is where you land in terms of usefulness in the industry ... We still have so much juice to squeeze." (25:30)
- Tim Dettmers:
  "You shouldn't pay too much attention to AGI but more about thinking about how can we make it most useful." (26:06)
- Dan Fu: Notes the transformative potential is already showing itself sector by sector (e.g., self-driving, healthcare), comparing the breakthrough to self-driving cars, where sudden leaps in reliability change perceptions overnight.
  "Progress is funny in this way ... it's not there, and then one day ... it's actually a lot better than the service that I'd get in an Uber." (27:25)
6. Hardware Ecosystem: Beyond Nvidia?
- Multi-Hardware, Multi-Chip Future:
Dan Fu envisions increasing specialization and diversity (AMD, Groq, Cerebras, etc.), especially for inference workloads: "You're going to see a lot more diversity, especially around inference ... training and inference are actually quite different computations and as a result you might want quite different chips to do it." (30:10)
7. The Agent Era: Are We at the Inflection Point?
- Coding Agents as the 'Switch Flip' Moment: Dan Fu recounts a transformation in 2025, when agents became strikingly good at GPU kernel work, dramatically accelerating even domain experts' productivity.
  "Last June we had this really interesting realization ... these agentic coding assistants, were actually very good at writing these kernels ... I was like, oh my God, this thing is making me five times more productive as a kernel expert." (32:32–34:15)
- Generalization of Agents: Tim Dettmers sees coding agents as general agents with broad impact outside of coding, rapidly accelerating all digital workflows.
  "Coding agents are general agents ... coding agents make things so easy ... you can parallelize a lot of different tasks." (35:15)
- The New Skill: Agent Literacy: Dettmers insists over 90% of code/text should now be agent-generated, with critical human review and customization.
  "If you don't know how to use agents well, you will be left behind. That will become a critical skill." (39:11)
8. Practical Advice: How to Harness Agents (Even as a Non-Coder)
- Start Small: Try automating minor tasks, use visual feedback, iterate, and play; agents can explain and adapt for non-coders.
  "With minimal learning, you can get there, execute programs, build websites ... The agents write good code." (39:25–41:10)
- How to Pick What to Automate: Use both a creative "what would be useful?" lens and a more analytical, process-oriented approach (cost-benefit, time saved vs. time to automate).
  "Look at how you work, you time each of these steps ... you can quickly realize that automating certain things will not make a difference." (41:17)
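The analytical "time saved vs. time to automate" check can be sketched as a tiny break-even calculation. All the numbers below are illustrative assumptions, not figures from the episode:

```python
# Hedged sketch of the "time saved vs. time to automate" cost-benefit
# check. Every number here is an illustrative assumption.

def automation_payoff(minutes_per_run, runs_per_week,
                      minutes_to_automate, horizon_weeks=52):
    """Net minutes saved over the horizon (negative = not worth automating)."""
    saved = minutes_per_run * runs_per_week * horizon_weeks
    return saved - minutes_to_automate

# A 5-minute task done 10x a week, automated in 8 hours, over one year:
print(automation_payoff(5, 10, 8 * 60))  # prints 2120
```

Running the same arithmetic on a rare task (say, a 1-minute step done once a week) quickly shows a negative payoff, which is exactly Dettmers' point that automating certain things will not make a difference.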
9. Managing & Learning with Agents
- Treat Agents Like Junior Employees: Break down tasks, supervise, and provide context; don't just give agents limitless freedom or mechanize everything blindly.
  "Making the agents effective ends up being a lot like managing junior folks on your team or at a company." (44:00)
- Expertise Multiplies Agent Power: The more domain expertise you have, the more agents can boost your productivity. The process for becoming an expert hasn't changed, but learning is now easier and more interactive with agents as teachers/collaborators. (44:00–47:38)
- Education Challenge: Current students can become reliant on agents before they've internalized core concepts. The future demands both foundational knowledge and agent fluency, a challenge for teachers and learners alike.
  "If we allow students to use agents, they are very productive. But sometimes the built solutions ... are actually very bad ... we don't want to have students that don't understand things, but we also want students that basically can use agents." (49:56)
10. What's Next: Research & Commercial Priorities
- AI2 (Dettmers): Upcoming open-source coding agent that:
  - Can be trained 100x cheaper than current SOTA agents.
  - Rapidly specializes to private codebases, automatically and locally.
  - Comes with a complete scientific breakdown of what actually moves the needle for coding agents.
  "You can just point our method to that repository ... quickly generate the data and then you have an agent that is as good as a frontier model, but you can deploy it locally." (52:44)
- Together AI (Fu): Focus on model efficiency at inference time (currently sub-5% hardware utilization), unveiling "mega kernels" (packing an entire model's forward pass into a single GPU kernel for big speed-ups) and "Together Atlas" (adaptive speculative decoding).
  "At inference time, when you have the model ... the hardware utilization is less than 5%. So it's at a place where there's so much more we can do." (54:44)
11. Looking Forward: The Rest of 2026
- Dettmers: Sees most surprises coming not at the "frontier" (biggest models likely to plateau; user experience to improve, not raw capabilities), but in efficient smaller/specialized models that are easier to deploy and own.
  "Performance on the frontier will stagnate. But on the smaller level we get more and more powerful models still ... smaller models might even be better because they're specialized." (58:19–60:49)
- Fu: Extremely optimistic about open-source model leaps, new hardware launches, and multi-modality (video, audio).
  "I think we're going to see another big jump in open source capabilities ... excited to just see what is that frontier of intelligence you can get on your laptop or on your phone." (60:49)
12. Post-Transformer Architectures?
- Fu: State-space and hybrid architectures are already here (the best audio models, new MiniMax/linear-attention hybrids), with Chinese labs leading on risky, innovative research.
  "You're going to see a lot more diversity in architectures ... kind of already seeing it." (62:25)
Notable Quotes & Memorable Moments
- Dan Fu: "By almost any definition ... we basically have the vision of AGI that we had back then." (00:02)
- Tim Dettmers: "Everything that grows exponential will level off. Because if you need resources, the resources will be exhausted." (11:26)
- Dan Fu: "You can see up to two orders of magnitude more compute available, 100x more compute." (17:50)
- Tim Dettmers: "If you don't know how to use agents well, you will be left behind." (22:22, 39:11)
- Dan Fu: "It might not generate the right thing for you. But if you give an expert programmer this set of tools, they can go 10 times faster than they were able to go before. And I think that's a really exciting place to be." (34:34)
- Tim Dettmers: "More than 90% of code and text should be written by agents. You need to do so or you will be left behind." (37:18)
Timestamps for Important Segments
- [00:02] – Dan Fu: “By almost any definition anyone could have written down ... we basically have the vision of AGI that we had back then.”
- [11:26] – Tim Dettmers: Exponential progress, GPU scaling bottlenecks, “Everything that grows exponential will level off.”
- [16:16] – Dan Fu: “We are just so far from even using the last generation of hardware as efficiently as possible ...” Plus specifics of utilization numbers and stalling progress.
- [22:47] – Matt Turck and Dan Fu: On pre-training, post-training, and model lag.
- [35:15] – Tim Dettmers: Why coding agents are general agents and the broader impact.
- [39:11] – Tim Dettmers: “If you don't know how to use agents well, you will be left behind. That will become a critical skill.”
- [49:56] – Tim Dettmers: Educational tradeoffs—can students master both foundational CS and agent skills?
- [52:44] – Tim Dettmers: Announcing upcoming low-cost, highly customizable coding agent.
- [54:44] – Dan Fu: Together AI’s priorities: “At inference time ... hardware utilization is less than 5% ... there’s so much more we can do.”
- [58:19–60:49] – Both: Predictions for the rest of 2026—stagnant frontier, rapidly improving smaller models, open-source explosion.
Tone
- Candid, technical, future-facing, but with a pragmatic and sometimes skeptical undercurrent.
- Interleaves deep technical observations with pragmatic advice for all listeners.
- Occasional playful ribbing at the hype cycles that dominate the AGI discourse.
- Insistent that the "agent era" is already here, urging all to develop agent skills on pain of irrelevance.
Takeaways for Listeners
- AI progress is at an inflection point not just in raw capability, but in practical, workplace and daily task automation via agents.
- While hardware scaling may plateau, massive optimization potential remains untapped—those who leverage it early will have superpowers.
- Agent skills—knowing how to harness, guide, manage, and collaborate with agents—are becoming the most essential digital literacy.
- Practical usefulness will matter much more than ambiguous definitions of AGI—focus on achieving tangible economic and workflow gain.
- Open-source models and small, specialized models will bloom in 2026 as deployment and ownership become practical and effective.
