This Day in AI Podcast
Episode: Long Horizon Agents, State of MCPs, Meta's AI Glasses & Geoffrey Hinton is a LOVE RAT - EP99.17
Hosts: Michael Sharkey & Chris Sharkey
Date: September 19, 2025
Overview
In this characteristically self-deprecating episode, Michael and Chris Sharkey dig into three key topics: the reality behind model provider behavior (including model degradation and routing), the challenge of building genuinely autonomous "long horizon" AI agents, and the state of MCPs (Model Context Protocol servers) for practical business AI. The brothers also share their takes on Meta's latest smart glasses before closing on the viral "Love Rat" story about AI godfather Geoffrey Hinton.
1. Do AI Model Providers Degrade Their Models Over Time? (00:37–05:01)
- Key Issue: Community speculation that AI providers (e.g., OpenAI, Anthropic/Claude) intentionally degrade models or re-route queries to cheaper or lower-quality versions after a model launches to save costs.
- Latest Revelation:
- Anthropic's Statement: "We never intentionally degrade model quality as a result of demand or other factors."
- Reality Check: Anthropic admits that recent routing errors sent up to 16% of requests to the wrong (lower-quality) model, a bug tied to the rollout of their new 1-million-token context feature.
- Implication: There's more "behind-the-scenes" evaluation and routing based on token count, input content, etc., than most users realize.
“Even their post-mortem sort of admits that there is routing going on ... Even if they are telling the truth, it's still a little bit sneaky.”
—Michael, (01:57)
- Takeaway: Even if not malicious, AI providers do sometimes auto-route queries, leading to possible degradation or inconsistent quality.
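The kind of token-based routing described above can be pictured with a toy sketch. This is purely illustrative: the model names, thresholds, and the 4-characters-per-token heuristic are all invented here, not how Anthropic (or any provider) actually routes requests.

```python
# Toy request router: pick a model tier by estimated token count.
# All names and thresholds are hypothetical, for illustration only.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def route(prompt: str, long_context_enabled: bool = False) -> str:
    tokens = estimate_tokens(prompt)
    if long_context_enabled and tokens > 200_000:
        return "model-1m-context"   # hypothetical long-context variant
    if tokens > 8_000:
        return "model-large"
    return "model-small"            # cheap tier for short requests

print(route("short question"))   # model-small
print(route("x" * 100_000))      # model-large
```

The point of the sketch is the one the hosts make: even an honest router inspects your request (length, features enabled) before deciding which model answers it, so two "identical" calls can hit different models.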
2. The Real Bottleneck for LLM Agents: Execution, Not Reasoning (03:34–12:41)
2.1 Insights from "Illusion of Diminishing Returns" Paper (03:34–07:53)
- Paper Argument: LLMs fail at long-horizon tasks not because of poor reasoning, but due to compounding execution mistakes.
- As an agent proceeds step-by-step, a single error can snowball and lead to further errors—unless actively re-prompted or supervised.
- Experiment Results:
- GPT-5 can execute over 1,000 steps correctly; the next best model, Claude 4 Sonnet, manages 432, a huge gap.
“In the paper they can get a lot further than you would think. So GPT-5 can execute over a thousand steps correctly. ... The next best competitor is Claude 4 Sonnet at 432 steps.”
—Chris, (06:49)
2.2 Why the Agent Needs Supervision (07:53–12:41)
- Models need nudges: LLMs benefit hugely from human (or supervisory agent) interventions. Left unsupervised, they tend to compound their own errors.
- Key Approaches:
- Use smaller stepwise tasks with user or supervisor feedback.
- Consider multi-threaded execution (simultaneous different approaches) with meta-evaluation or voting among "sub-agents."
- Building systems where "supervisor" agents can intervene, or select/balance among outcomes, may matter more than scaling model step capacity.
“My preference has always been do it in smaller steps and let's evaluate and guide you along the way rather than thinking that some holy grail model is just going to fully solve an issue.”
—Michael, (12:41)
- Practical Example:
- Building AI support agents: best to connect specific tools, train on step-by-step examples, constrain context/actions tightly.
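The "smaller steps with supervision" approach described above can be sketched as a simple loop: run one step, have a supervisor check it, and feed corrections back before moving on, so a single error doesn't snowball. Here `step_fn` and `check_fn` are hypothetical stand-ins for a real model call and an evaluator.

```python
# Minimal sketch of supervised stepwise execution, assuming a step
# function (the "agent") and a check function (the "supervisor").

def run_supervised(steps, step_fn, check_fn, max_retries=2):
    history = []
    for step in steps:
        for _ in range(max_retries + 1):
            result = step_fn(step, history)
            ok, feedback = check_fn(step, result)
            if ok:
                history.append(result)
                break
            # Feed the supervisor's feedback back in instead of
            # letting the error compound into later steps.
            step = f"{step} (fix: {feedback})"
        else:
            raise RuntimeError(f"step failed after retries: {step}")
    return history

# Toy usage: the "model" uppercases; the supervisor demands a trailing "!".
out = run_supervised(
    ["greet", "farewell"],
    step_fn=lambda s, h: s.upper() + ("!" if "fix" in s else ""),
    check_fn=lambda s, r: (r.endswith("!"), "add emphasis"),
)
```

The same skeleton extends to the multi-threaded variant the hosts mention: run several `step_fn` candidates in parallel and have the supervisor vote on or select among the results instead of retrying a single one.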
3. Building AI Agents in Practice: Small Skills, Enterprise Context, and the State of MCPs (14:14–43:52)
3.1 Modular Design Over Monolithic Models (16:37–20:22)
- Even as models get longer context/memory, agentic tasks benefit from being decomposed into "skills" or subtasks, each tied to:
- Specific tools (MCPs)
- Well-defined workflow and feedback
- The agent’s role is then orchestrating discrete skills rather than being all-knowing.
3.2 Importance of Internal/Custom MCPs (26:48–32:58)
- MCPs (Model Context Protocol servers):
- MCPs act as plug-in tools for AI assistants, allowing connections to business data, external APIs, or custom functions.
- Key Opportunity:
- Building internal company MCPs (e.g., accessing internal databases, server commands, CRM data) is the next frontier for enterprise automation and decision-making.
- Practical examples: automatic gathering of support ticket context, AI-generated business metrics dashboards, and reports from internal data.
"Every company needs to have an internal MCP that exposes data.”
—Michael, (28:16)
- Challenge:
- Security, permissions, and ease of integration are still major hurdles, especially with sensitive data and complex business processes.
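At a sketch level, an internal MCP boils down to two operations: list the available tools and dispatch a call to one of them. The snippet below mimics that shape in plain Python rather than the official MCP SDK (which wraps this in JSON-RPC over stdio or HTTP); the tool name and returned data are invented for illustration.

```python
# Plain-Python sketch of the tool surface an internal MCP server exposes.
# Not the official SDK; tool names and data are hypothetical.

TOOLS = {}

def tool(name, description):
    """Decorator that registers a function as a named tool."""
    def register(fn):
        TOOLS[name] = {"description": description, "handler": fn}
        return fn
    return register

@tool("get_ticket_context", "Fetch a support ticket with customer history")
def get_ticket_context(ticket_id: str) -> dict:
    # In a real server this would query the ticketing system / CRM.
    return {"ticket_id": ticket_id, "customer": "ACME", "open_tickets": 3}

def list_tools():
    # Mirrors MCP's tool discovery: names plus descriptions for the model.
    return [{"name": n, "description": t["description"]}
            for n, t in TOOLS.items()]

def call_tool(name, **kwargs):
    # Mirrors MCP's tool invocation: dispatch by name with arguments.
    return TOOLS[name]["handler"](**kwargs)
```

In a real deployment, the hard part is exactly what the hosts flag next: who is allowed to call which tool, and with whose credentials, since the handler runs against sensitive internal systems.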
3.3 State of MCP Auth, Registries, and Adoption Barriers (36:18–43:52)
- Authentication Friction:
- Many enterprise SaaS APIs require developer accounts, pre-approvals, or awkward OAuth flows, limiting plug-and-play for “the rest of us.”
- Big platforms (Atlassian, Intercom, Asana, etc.) often only allow trusted “big name” clients in their MCP implementations.
“The state of MCPs is an absolute mess. It just needs a big overhaul...”
—Chris, (39:32)
- Registry Problems:
- Many “MCP registries” pop up, but typically list only a handful of actually useful or ready-to-use MCPs, most with inconsistent implementations or vague setup instructions.
- Path Forward:
- Drive toward more user-friendly, standard, plug-and-play protocols and marketplaces (possibly paid/curated) for high-quality MCPs.
- Allow business users to “train” or “define” their own agents, and have tools observe then automate actual workflows.
4. Meta’s Ray-Ban Display Glasses & Wearable AI: Innovation or Gimmick? (50:59–59:49)
- Meta Ray-Ban Display Launched (51:42):
- Smart glasses with a right-eye display, camera, audio, voice assistant, and a wristband for silent gesture input (e.g., scribbling on your leg to reply to texts).
- The demo revealed the flaws of “stealth texting” in meetings: awkward eye movements and obvious distraction give it away.
“What you can do is reply to a text by like scribbling on your leg ... Their eye movements were so weird.”
—Chris, (53:09)
- Potential Use Cases:
- Passively gathering context (e.g., in work environments or sports), live translation, and specialized scenarios (e.g., cycling with telemetry, hands-free documentation).
- Upcoming Oakley Meta Vanguard glasses shown for cycling/sports.
“They can live translate ... detect the direction of the voice that's speaking to you ... and put up on the screen like what that person’s saying.”
—Chris, (60:47)
- Hype vs. Reality:
- Skepticism about mainstream value; more optimism for vertical/specialist domains.
- Lack of an open SDK/app store currently, but if it arrives, could unleash creativity for custom business or accessibility apps.
5. The Viral Geoffrey Hinton "Love Rat" AI Breakup Story (61:44–64:52)
- Summary: Media reports revealed Geoffrey Hinton's girlfriend broke up with him using ChatGPT, giving him a chatbot-generated critique of his behavior.
- Hinton admitted, in his own words, "She got the chatbot to explain how awful my behavior was and gave it to me...I didn't think I had been a rat."
- The story prompted a running joke about Hinton's “love rat” status, reinforced with a jokey musical competition between the hosts.
“He voluntarily gave, he wanted it out in the media. That's what Geoffrey Hinton is, a player.”
—Chris, (63:44)
- Lighthearted Takeaway: Even AI godfathers aren’t immune to the social consequences (or comic possibilities) of their creations.
6. Notable Quotes & Moments
- On Model Degradation:
"There's more introspection going on than people expect—it's not just round robin routing. They're clearly looking at factors like the number of tokens, the content of your messages, to decide which model to send it to." —Michael, (03:05)
- On Execution over Reasoning:
“The real bottleneck in LLMs right now is actually execution, not reasoning.” —Chris, (03:34)
- On AI Copilot Impact:
“Six months ago I was writing a lot of manual code. ... Six months later, rarely. I'm just yelling, ‘do this! Plus, no, you're wrong, you're an idiot.' You know, like you're a director now.” —Chris, (48:02)
- On Automation and Work:
“Maybe six months from now, people will just start automating away different processes ... their job will be to control and supervise and run those things.” —Chris, (49:14)
- On Geoffrey Hinton:
“She got the chatbot to explain how awful my behavior was and gave it to me ... I didn't think I had been a rat.” —Hinton, quoted by Chris, (62:09)
7. Overall Tone
Casual, irreverent, and self-deprecating, but peppered with genuine insights and specific practical observations. The hosts consistently frame technical developments in terms of everyday usability and the real life messiness of tool integration, with a flair for dry humor and pop culture references. The episode closes with goofy AI-generated music poking fun at Geoffrey Hinton’s “love rat” public image.
8. Timestamps of Key Segments
- Model Provider Routing/Degradation: 00:37–05:01
- Long Horizon Agents & Execution Failure: 03:34–12:41
- Agent Design (Skills, Supervision, Memory): 12:41–24:22
- Custom/Internal MCPs & Enterprise Automation: 26:48–32:58
- MCP APIs/Auth/Registry Problems: 36:18–43:52
- Meta Ray-Ban Display Glasses Review: 50:59–59:49
- Geoffrey Hinton Breakup/Love Rat Song: 61:44–end
9. Memorable Moment
“Jeffrey the love rat, king of AI, swiping through the ladies like I optimize ... Got my deep learning charm and my neural net game, when I find someone better, I'm gone without shame.” —Excerpt from the hilarious "Geoffrey Hinton Love Rat" AI-generated song, (66:02–68:43)
Summary:
A lively exploration of the genuine day-to-day challenges in building useful AI—from the realities of model performance and agentic autonomy, through the messiness of tool integration and authorization, to the fun (and foibles) of gadgets and industry personalities. The episode is equal parts practical guide, skeptical commentary, and irreverent entertainment.
