Decoder with Nilay Patel: The Good, Bad, and Future of AI Agents
Date: October 2, 2025
Host: Hayden Field, Senior AI Reporter at The Verge (guest-hosting for Nilay Patel)
Guest: David Hershey, Lead of the Applied AI Team at Anthropic
Episode Overview
This episode dives deep into the current reality, challenges, and future potential of AI agents—autonomous AI systems capable of completing complex multi-step tasks over hours or even days. Hayden Field talks with David Hershey of Anthropic about their latest breakthrough model, Claude Sonnet 4.5, and what it truly means for the field. The conversation covers where agentic AI excels today, where it still struggles, surprising industry adoption stories, model limitations, and what the near future may hold.
Key Discussion Points & Insights
1. State of AI Agents Today
- Agents Are Improving, but Still Patchy Across Domains
- "I've seen agents come a long ways in the last year...there are places where we're starting to see what it looks like when agents work really well. And there are still a lot of places where they don't work really well." — David Hershey [04:50]
- Coding as the Standout Use Case
- The most visible progress with AI agents has come in software development, but agents still hit surprising snags with complex UIs and seemingly simple spreadsheet manipulations.
- Gaps Remain, Often in Unexpected Places
- "Sometimes there's stuff they're still not good at...they fall over and stumble over themselves on something sort of silly." — David Hershey [05:39]
- Progress Is Uneven
- "We're making really fast progress; it's just not necessarily visible in every part of the economy yet." — David Hershey [06:44]
2. Consumer-Facing Weaknesses and Lingering Challenges
- No Singular Blind Spot—Lots of Little Ones Instead
- "I can't quite put my finger on, 'There's just this one thing [that agents can't do].' I think this is why this field is hard. We're just constantly working on the whole universe of the stuff people do on their computer and trying to help them out." — David Hershey [07:30]
- Often Tripped Up by the Mundane in Specific Workflows
- Agents might do 99% of a finance task right but fail to properly manipulate a spreadsheet cell.
3. Surprising Industry Adoption
- Legal Sector as a Standout
- "One of the domains that surprised me the most...is the legal domain. At face value, it's apparent why that can be really useful...But actually there's so much depth and complexity to the legal field, which I didn't appreciate." — David Hershey [09:33]
- Rapid Legal AI Growth
- Despite being traditionally slow to adopt tech, the legal industry has seen a boom with AI agents thanks to both need (information volume) and integration of domain experts.
- Other Unexpected Surprises Appear with Model Releases
- Each new release leads to "micro surprises"—new things the model is suddenly capable of, showing progress is often domain-dependent and unpredictable.
4. Why Better Data and Specialist Feedback Matter
- Enhancing Niche Use Cases
- "We need great ways to learn from specialists...I think a lot about learning directly from our customers. I think there's a future where more companies can contribute more directly to making the models do the stuff they care about." — David Hershey [13:02]
- Anthropic’s Strengths Reflect Its Staff
- AI models excel at coding partly because the company is filled with software engineers; better performance in other domains will require bringing more domain experts into the development process.
5. Claude Sonnet 4.5: What’s New and What’s Next
Why Is Sonnet 4.5 a Big Deal?
- Unprecedented Agentic Autonomy
- Capable of working continuously for up to 30 hours on a single complex development task, far outstripping prior models.
- Smarter, More Capable, and Unexpected Utility
- "I'm really confident this model is the smartest model we've ever created... Some of it's stuff that we really don't know until our customers try to build cool new things... and they suddenly make it work." — David Hershey [18:35]
- Concrete Coding Milestones
- The model recreated Anthropic's own consumer chat application (Claude.ai) overnight, fully autonomously, including advanced features like rendering live documents ("Artifacts"). Hershey estimates the same work would have taken him months without Claude.
- "We woke up and it just did it. This beautiful clone of Claude.ai that works incredibly well... It would take me months to do if I did not have Claude. And overnight we sort of looked at it and watched it happen." — David Hershey [21:50]
The “30 Hour” Autonomy Example
- Slack/Teams-Style App Built Autonomously
- "It has DMs and threads and channels and a slick search functionality... multi user authentication and Claude even implemented a whole bunch of AI users for testing... It is not by any means Slack, but you'd look at it and think that was a pretty reasonable productivity app." — David Hershey [25:24]
Notable Memorable Moments
- Model’s Work Approach Feels “Pragmatic” and “Coworker-like”
- "It's just like pragmatic kind of. It's like, okay, right now I'm going to test does image upload work and then it's going to do that... It feels more natural, and funny enough, it cracks better jokes." — David Hershey [27:19]
- Sycophancy ("over-complimenting") is a known issue in LLMs; Sonnet 4.5 shows more willingness to push back, aligning better with real-world coworker expectations.
Concerns for the Future of Work
- Job Replacement vs. Collaboration
- "I'm not currently worried right now. Claude is a collaborator. It accelerates me... But to be really honest, watching Claude go for 30 hours, it does trigger a little bit like, oh, my God, that's a pretty different thing... If it's really going to, like, build the whole app itself, like, we have probably a different role that we need to play here." — David Hershey [30:14]
6. Current Limitations & Model "Dumbness"
- Spatial Awareness Remains a Major Stumbling Block
- "It's really bad at spatial awareness still. It just basically doesn't know the difference between left and right and up and down... it can do PhD level math... and just for it to not really understand that it can't walk straight through a building hurts my brain." — David Hershey [35:48]
- Continuous Gaps in Highly Specialized Fields
- Models are still better at coding than at being lawyers or accountants; constant iteration and feedback from domain experts are needed.
7. Anthropic’s Market Focus: Enterprise, Consumer, or Both?
- No Singular Focus: General Smarts Serve All Client Types
- "When we make our models generally smarter, they service all of those segments. They service enterprises... They service consumers... and they're useful for the public sector, too." — David Hershey [41:43]
- First-Party Applications vs. Ecosystem Partners
- First-party applications like Claude Code are important, but much of Anthropic’s reach comes through its ecosystem: startups and enterprises building their own consumer or business applications on Claude models.
8. The Current and Future State of "Vibe Coding"
- Productivity Gains Are Real, But Interfaces Still Evolve
- "I actually do think Sonnet 4.5 is the first model that could be that thing where anybody could build a sort of production-ready application...I have a feeling, though, we need one more interface that isn't Claude Code and isn't Cursor. But the next step past that I think needs to happen." — David Hershey [46:30]
Notable Quotes (with Timestamps)
- On Progress and Limits
- "I have a feeling sort of each model that comes out will get one bit closer to being something that everybody can sort of interact with and see." — David Hershey [06:44]
- On Weird Model Failures
- "It can do PhD level math and I can't. And just for it to not really understand that it can't walk straight through a building hurts my brain." — David Hershey [35:48]
- On the Nature of Progress
- "The fun thing about this space is it's really hard to guess where the next agent is going to take off because sometimes there's just this one little tiny thing that's not super obvious... that's blocking an agent from working." — David Hershey [10:18]
- On Job Impact
- "Claude is a collaborator... But to be really honest, watching Claude go for 30 hours, it does trigger a little bit like, oh, my God, that's a pretty different thing. It is a meaningful step change, and I think it does... change how we do jobs." — David Hershey [30:14]
Timestamps of Important Segments
- State of AI Agents / Progress in Coding: [04:50]–[07:04]
- Consumer-Facing Shortcomings: [07:04]–[09:05]
- Surprising Industry Adoption: [09:05]–[12:49]
- Specialist Contributions and Data: [12:49]–[14:46]
- Sonnet 4.5’s Distinction & Coding Demo: [18:35]–[26:48]
- Behavioral Improvements and Natural Collaboration: [27:19]–[29:13]
- Job Disruption & Human Roles: [30:14]–[31:41]
- Limitations (Spatial Awareness and Model "Dumbness"): [35:32]–[38:01]
- Anthropic’s Market Approach: [41:43]–[45:54]
- ‘Vibe Coding’ – Interface & Productivity: [45:54]–[48:23]
Conclusion
The episode spotlights both the rapid forward leaps and the “long tail” of unresolved quirks for agentic AI. Anthropic’s Claude Sonnet 4.5 is a watershed moment for autonomous software generation, but the broader vision of agents that seamlessly tackle arbitrary (and mundane) digital or real-world tasks is still on the horizon. Hershey’s optimism is tempered with realism; as fast as progress arrives, new challenges always emerge. Most meaningfully, the discussion frames AI agents less as job replacements and more as fundamentally new kinds of co-workers—at least for now.
